Regular Basic Grammar
grammar | Explain | EXAMPLES OF EXPRESSIONS | Complete matching | ||
---|---|---|---|---|---|
. | Match any character | a.c | abc | ||
\ | Escape character | a\.c | a.c | ||
[] | character set | a[bcd]e | abe/ace/ade | ||
\d | Number: [0-9] | a\dc | a1c | ||
\D | Non-numeric: [^\ d] | a\Dc | abc | ||
\s | Blank characters: [<Blank>]\trn\f\v | a\sc | a c | ||
* | Match Fist A Character 0 or infinite times | abc* | ab/abccc | ||
+ | Match the previous character 1 or infinitely | abc* | abc/abccc | ||
? | Match the previous character 0 or 1 times | abc? | ab/abc | ||
\w | Word character [A-Za-z0-9_] | a\wc | abc | ||
{m} | Match the previous character m times | ab{2}c | abbc | ||
( ) | Matching expressions in parentheses also represents a group |
re.match
Trying to match a pattern from the actual position of the string, returning none if the match is not successful
Conventional matching:
import re content="Hello 123 456 World is a houzi" result = re.match("^Hello\s\d\d\d\s\d\d\d\s\w.*houzi$",content) print(result.group()) print(result) print(result.span()) #Return results >>>Hello 123 456 World is a houzi >>><_sre.SRE_Match object; span=(0, 30) >>>match='Hello 123 456 World is a houzi'> >>>(0, 30)
Universal Matching: * Matching any arbitrary character
import re content="Hello 123 456 World is a houzi" result = re.match("^Hello.*houzi$",content) print(result.group()) print(result) print(result.span()) # Return results >>>Hello 123 456 World is a houzi >>><_sre.SRE_Match object; span=(0, 30) >>>match='Hello 123 456 World is a houzi'> >>>(0, 30)
Matching target:
import re content="Hello 123 456 World is a houzi" result = re.match("^Hello\s(\d+)\s.*houzi$",content) print(result.group(1)) print(result) print(result.span()) # Output results >>> 123 >>> <_sre.SRE_Match object; span=(0, 30) >>> match='Hello 123 456 World is a houzi'> >>> (0, 30)
Greed Matches:
* Match as many characters as possible
import re content="Hello 123 456 World is a houzi" result = re.match("^Hello.*(\d+)\s.*houzi$",content) print(result.group(1)) print(result) print(result.span()) # Output results >>> 6 # 123 45 is all matched. >>> <_sre.SRE_Match object; span=(0, 30) >>> match='Hello 123 456 World is a houzi'> >>> (0, 30)
Non-greedy matching:
Match as few characters as possible
import re content="Hello 123 456 World is a houzi" result = re.match("^Hello.*?(\d+)\s.*houzi$",content) print(result.group(1)) print(result) print(result.span()) #Output results >>> 123 # * Stop when only numbers are matched >>> <_sre.SRE_Match object; span=(0, 30) >>> match='Hello 123 456 World is a houzi'> >>> (0, 30)
Matching mode:
import re content="Hello 123 456 World \n is a houzi" result = re.match("^Hello.*?(\d+).*houzi$",content,re.S) # re.S enables. * to match special characters print(result.group(1)) print(result) print(result.span()) # Output results 123 <_sre.SRE_Match object; span=(0, 32), match='Hello 123 456 World \n is a houzi'> (0, 32)
Transliteration:
Add escape before special symbols
import re content="Hello $5.00" result = re.match("Hello \$5\.00",content) print(result.group()) print(result.span())
re.search
- Search the entire string and return the first successful match
import re content="Hello $5.00" result = re.search("(Hello)",content) print(result.group()) print(result.span()) # Output results >>> Hello >>> (0, 5)
Findall (Find all)
- Search for strings and return all matchable substrings in list form
ret = re.findall("Matching rules",Matching object) # The return is a list re.findall(".","\n",re.S) #re.S can match newline characters
Re.sub (replacement)
- Replace each matching substring in the string and return the replaced string
import re content="Hello $5.00" result = re.sub("(Hello)","shut!",content) print(result) # Output results >>> shut! $5.00
import re content="Hello $5.00" # R "1 shut!" means to get the first group of matches. result = re.sub("(Hello)",r"\1 shut!",content) print(result) # Output results >>> Hello shut! $5.00
- You can use sub
import re content="<a src='www.baidu.com'></a>" \ "<li>ssss</li>" result = re.sub("<a.*?></a>","",content) print(result) # Output results >>> <li>ssss</li>
Compile
- Compiling regular strings into regular objects to facilitate reuse and change matching patterns
import re content="<a src='www.baidu.com'></a>" \ "<li>ssss</li>" pattern = re.compile("<a.*?></a>",re.S) result = re.match(pattern,content) print(type(pattern)) print(result.group()) # Output results <class '_sre.SRE_Pattern'> <a src='www.baidu.com'></a>
Finally, regular work is really used more, or study hard, has been constantly beyond their own. The first article to this bar