Regular expressions are not specific to Python. They are a powerful tool for string processing, with their own grammar and an independent matching engine. They may be less efficient than str's own methods, but they are far more expressive. Because of this independence, the syntax is largely the same in every language that provides regular expressions; the difference lies mainly in how much of the syntax each language supports. The unsupported parts are usually the rarely used ones. If you have used regular expressions in another language, a quick skim is enough.
Regular expression concept
A single string that describes a set of strings matching a certain syntactic rule
A logical formula for string manipulation
Application Scenarios: Processing Text and Data
Regular expression matching is a process: the expression is compared with the characters in the text in turn. If every character matches, the match succeeds; otherwise, the match fails.
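The process described above can be sketched with `re.match`, which returns a match object on success and `None` on failure. The pattern and the sample strings are illustrative assumptions, not taken from the original text.

```python
import re

# Every character of the pattern matches in turn, so the match succeeds.
ok = re.match("abc", "abcdef")

# The third character differs ('x' instead of 'c'), so the match fails.
failed = re.match("abc", "abxdef")

print(ok.group())  # abc
print(failed)      # None
```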
Character Matching
character | description |
---|---|
. | Match any character except \n |
\d \D | Digit / non-digit |
\s \S | Whitespace / non-whitespace character |
\w \W | Word character [a-zA-Z0-9_] / non-word character |
\b \B | Word boundary: the position between \w and \W, in either order / non-word boundary |
Match any character
# Match the string abc; the . stands for b
>>> re.match('a.c', 'abc').group()
'abc'
Digits and non-digits
# Match any digit
>>> re.match(r'\d', '1').group()
'1'
# Match any non-digit
>>> re.match(r'\D', 'a').group()
'a'
Whitespace and non-whitespace characters
# Match any whitespace character
>>> re.match(r"\s", " ").group()
' '
# Match any non-whitespace character
>>> re.match(r"\S", "1").group()
'1'
>>> re.match(r"\S", "a").group()
'a'
Word Characters and Non-Word Characters
A word character is any of [a-zA-Z0-9_]. For str patterns in Python 3, \w also matches Unicode letters and digits unless re.ASCII is set.
# Match any one word character
>>> re.match(r"\w", "a").group()
'a'
>>> re.match(r"\w", "1").group()
'1'
# Match any non-word character
>>> re.match(r"\W", " ").group()
' '
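The table lists \b and \B, but the examples above do not exercise them. A minimal sketch, using an invented sample string, of how word boundaries change what `cat` matches:

```python
import re

text = "cat concat category"

# \bcat\b matches "cat" only as a whole word.
whole_word = re.findall(r"\bcat\b", text)

# \bcat matches "cat" only at the start of a word ("cat", "category").
word_start = re.findall(r"\bcat", text)

# \Bcat matches "cat" only when it does not start a word ("concat").
inside_word = re.findall(r"\Bcat", text)

print(whole_word)   # ['cat']
print(word_start)   # ['cat', 'cat']
print(inside_word)  # ['cat']
```

Patterns containing \b should be written as raw strings, because "\b" in an ordinary Python string is the backspace character.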
Quantity matching
character | matching |
---|---|
* | Match the previous character 0 or more times |
+ | Match the previous character 1 or more times |
? | Match the previous character 0 or 1 times |
{m}/{m,n} | Match the previous character exactly m times / from m to n times |
*?/+?/?? | Change the matching mode to non-greedy mode (match as few characters as possible) |
Summary
character | matching |
---|---|
prev? | 0 or 1 prev |
prev* | 0 or more prev, match as many as possible |
prev*? | 0 or more prev, matching as little as possible |
prev+ | One or more prev, match as many as possible |
prev+? | One or more prev, matching as little as possible |
prev{m} | m consecutive prev |
prev{m,n} | m to n consecutive prev, matching as many as possible |
prev{m,n}? | m to n consecutive prev, matching as little as possible |
[abc] | a or b or c |
[^abc] | Any character other than a, b, or c |
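The last two rows of the table, character sets and negated sets, can be illustrated with a short sketch. The sample string is an assumption chosen for illustration.

```python
import re

sample = "a1b2c3d4"

# [abc] matches exactly one character out of a, b, or c.
in_set = re.findall(r"[abc]", sample)

# [^abc] matches exactly one character that is not a, b, or c.
not_in_set = re.findall(r"[^abc]", sample)

print(in_set)      # ['a', 'b', 'c']
print(not_in_set)  # ['1', '2', '3', 'd', '4']
```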
Match the previous character 0 or more times
>>> re.match('[A-Z][a-z]*', 'Aaa').group()
'Aaa'
>>> re.match('[A-Z][a-z]*', 'Aa').group()
'Aa'
>>> re.match('[A-Z][a-z]*', 'A').group()
'A'
Match the previous character 1 or more times
# The previous character must match at least once; if it does not, match() returns None and calling .group() raises an error
>>> re.match('[A-Z][a-z]+', 'A').group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match('[A-Z][a-z]+', 'Aa').group()
'Aa'
>>> re.match('[A-Z][a-z]+', 'Aaaaaaa').group()
'Aaaaaaa'
Match the previous character 0 or 1 times
>>> re.match('[A-Z][a-z]?', 'A').group()
'A'
# Only one a is matched
>>> re.match('[A-Z][a-z]?', 'Aaaa').group()
'Aa'
Match the previous character m times, or m to n times
# Match the previous character exactly 5 times
>>> re.match(r'\w{5}', 'asd234').group()
'asd23'
# Match the previous character 6 to 10 times
>>> re.match(r'\w{6,10}', 'asd234').group()
'asd234'
Change Matching Mode to Non-Greedy Mode
# Greedy by default: * matches as many characters as possible
>>> re.match(r'[0-9][a-z]*', '1bc').group()
'1bc'
# *? matches 0 or more times, as few as possible
>>> re.match(r'[0-9][a-z]*?', '1bc').group()
'1'
# +? matches 1 or more times, as few as possible
>>> re.match(r'[0-9][a-z]+?', '1bc').group()
'1b'
# ?? matches 0 or 1 times, as few as possible
>>> re.match(r'[0-9][a-z]??', '1bc').group()
'1'
Greedy matching and non-greedy matching
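A greedy quantifier consumes as much text as it can while still letting the whole pattern match; a non-greedy one consumes as little as it can. A minimal sketch of the difference, using an invented HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.search(r"<.*>", html).group()

# Non-greedy: .*? stops at the first '>' it can.
non_greedy = re.search(r"<.*?>", html).group()

print(greedy)      # <b>bold</b> and <i>italic</i>
print(non_greedy)  # <b>
```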
Boundary matching
character | matching |
---|---|
^ | Match the beginning of the string |
$ | Match the end of the string |
\A \Z | Match only at the very beginning / very end of the string |
Match the beginning of the string
# Must start with 4 to 6 word characters and end with @163.com
>>> re.match(r'^[\w]{4,6}@163\.com$', 'asdasd@163.com').group()
'asdasd@163.com'
Match the end of the string
# Must end with .me
>>> re.match(r'[\w]{1,20}\.me$', 'ansheng.me').group()
'ansheng.me'
The specified string must appear at the beginning/end
>>> re.match(r'\Awww[\w]*me\Z', 'wwwanshengme').group()
'wwwanshengme'
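On a single-line string, ^ and \A (and $ and \Z) behave alike. The difference shows up with the `re.MULTILINE` flag, which lets ^ and $ match at every line break while \A and \Z stay tied to the ends of the whole string. A small sketch with an invented two-line string:

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ matches at the start of every line.
caret = re.findall(r"^\w+", text, re.MULTILINE)

# \A still matches only at the start of the whole string.
anchor_a = re.findall(r"\A\w+", text, re.MULTILINE)

# With re.MULTILINE, $ matches at the end of every line.
dollar = re.findall(r"\w+$", text, re.MULTILINE)

# \Z still matches only at the end of the whole string.
anchor_z = re.findall(r"\w+\Z", text, re.MULTILINE)

print(caret)     # ['first', 'second']
print(anchor_a)  # ['first']
print(dollar)    # ['line', 'line']
print(anchor_z)  # ['line']
```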
Group Matching of Regular Expressions
| Match either the left or the right expression
>>> re.match("www|me", "www").group()
'www'
>>> re.match("www|me", "me").group()
'me'
(ab) Treat the expression in parentheses as a group
# Match 163 or 126 mailboxes
>>> re.match(r'[\w]{4,6}@(163|126)\.com', 'asdasd@163.com').group()
'asdasd@163.com'
>>> re.match(r'[\w]{4,6}@(163|126)\.com', 'asdasd@126.com').group()
'asdasd@126.com'
(?P<name>) Give a group its own name
>>> re.search("(?P<zimu>abc)(?P<shuzi>123)", "abc123").groups()
('abc', '123')
Referring to a matched group by its name
>>> res = re.search("(?P<zimu>abc)(?P<shuzi>123)", "abc123")
>>> res.group("shuzi")
'123'
>>> res.group("zimu")
'abc'
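Named groups remain numbered groups as well, and the match object can return all of them at once. A brief sketch that reuses the `zimu` and `shuzi` group names from the example above:

```python
import re

res = re.search(r"(?P<zimu>abc)(?P<shuzi>123)", "abc123")

# A named group can be read by name or by its position.
by_name = res.group("zimu")
by_index = res.group(2)

# groupdict() returns every named group as a dictionary.
as_dict = res.groupdict()

print(by_name)   # abc
print(by_index)  # 123
print(as_dict)   # {'zimu': 'abc', 'shuzi': '123'}
```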
Common functions of the re module
re.match()
Syntax:
match(pattern, string, flags=0)
Interpretation:
Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.
Example:
# Match from the start of the string and return the match object
>>> re.match("abc", "abc123def").group()
'abc'
# If nothing matches at the start, match() returns None, so calling .group() raises an error
>>> re.match(r"\d", "abc123def").group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
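Because `re.match` returns `None` when nothing matches, the `AttributeError` shown above can be avoided by checking the result before calling `.group()`. A minimal sketch; the helper name `first_digit` is hypothetical and introduced only for illustration.

```python
import re

def first_digit(text):
    """Return the leading digit of text, or None if it does not start with one."""
    m = re.match(r"\d", text)
    # A match object is truthy, None is falsy, so this guards the .group() call.
    return m.group() if m else None

print(first_digit("1abc"))       # 1
print(first_digit("abc123def"))  # None
```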
re.search()
Syntax:
search(pattern, string, flags=0)
Interpretation:
Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.
Example:
# Scan the whole string and return a match object for the first match found
>>> re.search(r"\d", "abc1123def").group()
'1'
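The practical difference between the two functions is where they are allowed to match: `re.match` only tries position 0, while `re.search` tries every position until one succeeds. A short sketch on the same sample string:

```python
import re

text = "abc1123def"

# match() only tries the start of the string, which is 'a', not a digit.
at_start = re.match(r"\d", text)

# search() scans forward and stops at the first digit it finds.
anywhere = re.search(r"\d", text)
position = anywhere.span()

print(at_start)          # None
print(anywhere.group())  # 1
print(position)          # (3, 4)
```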
re.findall()
Syntax:
findall(pattern, string, flags=0)
Interpretation:
Return a list of all non-overlapping matches in the string.
Example:
# Find every match in the string and return the matched strings as a list
>>> re.findall(r"\d", "abc123def456")
['1', '2', '3', '4', '5', '6']
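One detail worth knowing is that the shape of the list returned by `re.findall` depends on how many groups the pattern contains. A sketch, with patterns chosen for illustration:

```python
import re

text = "abc123def456"

# No groups: each item is the whole match.
no_group = re.findall(r"[a-z]+\d+", text)

# One group: each item is the text captured by that group.
one_group = re.findall(r"[a-z]+(\d+)", text)

# Several groups: each item is a tuple of the captured texts.
two_groups = re.findall(r"([a-z]+)(\d+)", text)

print(no_group)    # ['abc123', 'def456']
print(one_group)   # ['123', '456']
print(two_groups)  # [('abc', '123'), ('def', '456')]
```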
re.split()
Syntax:
split(pattern, string, maxsplit=0, flags=0)
Interpretation:
Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.
Example:
# Split on runs of digits, returning a list
>>> re.split(r"\d+", "abc123def4+-*/56")
['abc', 'def', '+-*/', '']
# Split on multiple characters
>>> re.split ("[\\\\\\\\\\\
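A few further behaviours of `re.split` can be sketched as follows. The character-class example uses an invented pattern and input and is not a reconstruction of the truncated call above.

```python
import re

text = "abc123def4+-*/56"

# maxsplit limits the number of splits; the rest of the string is kept whole.
limited = re.split(r"\d+", text, maxsplit=1)

# A capturing group in the pattern keeps the separators in the result.
with_seps = re.split(r"(\d+)", text)

# A character class splits on any one of several characters.
operands = re.split(r"[+\-*/]", "1+2-3*4/5")

print(limited)    # ['abc', 'def4+-*/56']
print(with_seps)  # ['abc', '123', 'def', '4', '+-*/', '56', '']
print(operands)   # ['1', '2', '3', '4', '5']
```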
re.sub()
Syntax:
sub(pattern, repl, string, count=0, flags=0)
Interpretation:
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
Example:
# Replace abc with def
>>> re.sub("abc", "def", "abc123abc")
'def123def'
# Replace only the first occurrence found
>>> re.sub("abc", "def", "abc123abc", count=1)
'def123abc'
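The description above mentions that `repl` may be a callable and that backslash escapes in a string `repl` are processed. A minimal sketch of both; the function name `double` and the sample strings are assumptions made for illustration.

```python
import re

def double(match):
    """Receive the match object and return the replacement text."""
    return str(int(match.group()) * 2)

# Callable repl: every run of digits is replaced by twice its value.
doubled = re.sub(r"\d+", double, "abc123abc4")

# String repl: \1 and \2 refer back to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(doubled)  # abc246abc8
print(swapped)  # host@user
```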
Example
The printable constant in the string module contains the 100 printable ASCII characters: digits, lowercase and uppercase letters, punctuation marks, and whitespace.
>>> import string
>>> printable = string.printable
>>> printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> import re
# Define the source string
>>> source = '''I wish I may, I wish I migth... Hava a dish of fish tonight.'''
# Find every wish in the string
>>> re.findall('wish', source)
['wish', 'wish']
# Find wish or fish anywhere in the source string
>>> re.findall('wish|fish', source)
['wish', 'wish', 'fish']
# Match wish at the beginning of the string
>>> re.findall('^wish', source)
[]
# Match I wish at the beginning of the string
>>> re.findall('^I wish', source)
['I wish']
# Match fish at the end of the string
>>> re.findall('fish$', source)
[]
# Match fish tonight. at the end of the string
>>> re.findall('fish tonight.$', source)
['fish tonight.']
# Find w or f followed by ish
>>> re.findall('[wf]ish', source)
['wish', 'wish', 'fish']
# Find runs of one or more of the characters w, s, and h
>>> re.findall('[wsh]+', source)
['w', 'sh', 'w', 'sh', 'h', 'sh', 'sh', 'h']
# Find ght followed by a non-word character
>>> re.findall(r'ght\W', source)
['ght.']
# Find 'I ' only where it is followed by wish (lookahead; wish is not part of the match)
>>> re.findall('I (?=wish)', source)
['I ', 'I ']
# Find ' wish' only where it is preceded by I (lookbehind; I is not part of the match)
>>> re.findall('(?<=I) wish', source)
[' wish', ' wish']
Case-insensitive matching
>>> re.match('a', 'Abc', re.I).group()
'A'
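The r prefix marks a raw string, in which backslashes are not treated as escape characters. Without the prefix, write two backslashes for every backslash in the pattern, for example '\\d' instead of r'\d'.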
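The effect of the r prefix can be checked directly in Python. A small sketch with illustrative values:

```python
import re

# A raw string and a doubled backslash produce the same pattern text.
same = r"\d+" == "\\d+"

# In an ordinary string "\b" is a single backspace character;
# in a raw string it is a backslash followed by the letter b.
plain_len = len("\b")
raw_len = len(r"\b")

# Only the raw form reaches the regex engine as a word boundary.
found = re.findall(r"\bfish\b", "a dish of fish")

print(same)       # True
print(plain_len)  # 1
print(raw_len)    # 2
print(found)      # ['fish']
```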
>>> import re
>>> pa = re.compile(r'yangwen')
>>> pa.match("yangwen.me")
<_sre.SRE_Match object; span=(0, 7), match='yangwen'>
>>> ma = pa.match("yangwen.me")
>>> ma
<_sre.SRE_Match object; span=(0, 7), match='yangwen'>
# The matched text is returned by group()
>>> ma.group()
'yangwen'
# span() returns the start and end positions of the match
>>> ma.span()
(0, 7)
# The string that was searched is stored in string
>>> ma.string
'yangwen.me'
# The compiled pattern object is stored in re
>>> ma.re
re.compile('yangwen')
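A compiled pattern is convenient when the same expression is applied to many strings, and flags such as `re.I` can be attached once at compile time. A minimal sketch; the list of candidate strings is an assumption made for illustration.

```python
import re

# Compile once, with the case-insensitive flag attached.
pa = re.compile(r"yangwen", re.I)

candidates = ["yangwen.me", "YangWen.me", "ansheng.me"]

# Reuse the compiled pattern for every candidate.
hits = [s for s in candidates if pa.match(s)]

print(hits)  # ['yangwen.me', 'YangWen.me']
```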