Regular expressions - basic use of common metacharacters

Keywords: Python

Common metacharacters are:

[]  ^  $  \  *  +  ?  {} .

To use regular expressions in Python, import the re module.

Here is how each metacharacter works.

[] specifies a character set. [ABC] matches any single character that is A, B, or C, and [^ABC] negates the set: it matches any single character except A, B, or C.

>>> import re
>>> r = r"ABC[ABC]"  # define a regular expression pattern
>>>
>>> re.findall(r,'ABCA')  # find all matches with findall
['ABCA']
>>> re.findall(r,'ABCB')
['ABCB']
>>> re.findall(r,'ABCD')
[]
>>> r = r"ABC[^ABC]"  # negate the character set
>>> re.findall(r,'ABCD')
['ABCD']
>>> re.findall(r,'ABCA')
[]
>>>
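
As a small sketch of the two forms side by side (the sample strings and variable names here are made up for illustration): a set always consumes exactly one character, and ^ only negates when it is the first character inside the brackets.

```python
import re

# [ABC] matches exactly one character out of A, B, C
in_set = re.findall(r"[ABC]", "ABCD")        # ['A', 'B', 'C']

# [^ABC] matches exactly one character that is NOT A, B, or C
not_in_set = re.findall(r"[^ABC]", "ABCD")   # ['D']

# ^ is only special at the start of the set; elsewhere it is a literal caret
literal_caret = re.findall(r"[A^]", "A^B")   # ['A', '^']

print(in_set, not_in_set, literal_caret)
```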

^ matches at the start of the string (and, with the re.MULTILINE flag, at the start of each line). With the pattern ^h, findall returns ['h'] if the string starts with h; otherwise it returns an empty list.

>>> r = r"^h"
>>> re.findall(r,'hello')
['h']
>>> re.findall(r,'ehllo')
[]
>>>

Similarly, $ matches at the end of the line; it is the mirror image of ^.

>>> r = r"h$"
>>> re.findall(r,'hello')
[]
>>> re.findall(r,'olleh')
['h']
>>>
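
The examples so far use single-line strings. For multi-line text, here is a minimal sketch of how the anchors behave with and without the re.MULTILINE flag (the sample text is invented for illustration):

```python
import re

text = "hello\nhigh\noh"   # three lines: hello / high / oh

# Default: ^ anchors only at the very start of the string
start_default = re.findall(r"^h", text)                   # ['h']

# With re.MULTILINE, ^ also matches right after each newline
start_multiline = re.findall(r"^h", text, re.MULTILINE)   # ['h', 'h']

# Default: $ anchors only at the very end of the string
end_default = re.findall(r"h$", text)                     # ['h']

# With re.MULTILINE, $ also matches right before each newline
end_multiline = re.findall(r"h$", text, re.MULTILINE)     # ['h', 'h']

print(start_default, start_multiline, end_default, end_multiline)
```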

\ is the escape character

Following the backslash with certain characters gives them a special meaning. For ASCII text the equivalences are:

\d matches any decimal digit, equivalent to [0-9]
\D matches any non-digit character, equivalent to [^0-9]
\s matches any whitespace character, equivalent to [ \t\n\r\f\v]
\S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
\w matches any alphanumeric character or underscore, equivalent to [a-zA-Z0-9_]
\W matches any character that is not alphanumeric or an underscore, equivalent to [^a-zA-Z0-9_]

>>> r1 = r"day=\d"
>>> r2 = r"day=\D"
>>> st = 'day=1 day=2 day=3 day=a day=b day=c'
>>> re.findall(r1,st)
['day=1', 'day=2', 'day=3']
>>> re.findall(r2,st)
['day=a', 'day=b', 'day=c']
>>>
>>> r3 = r"enter=\s"
>>> r4 = r"enter=\S"
>>>
>>> st = '''
enter=
enter=1 enter=  enter=3 enter=
enter=
'''
>>> re.findall(r3,st)
['enter=\n', 'enter= ', 'enter=\n', 'enter=\n']
>>> re.findall(r4,st)
['enter=1', 'enter=3']
>>>
>>> r5 = r"\w"
>>> r6 = r"\W"
>>> st = 'abcdefg1234567!@#$%^&'
>>> re.findall(r5,st)
['a', 'b', 'c', 'd', 'e', 'f', 'g', '1', '2', '3', '4', '5', '6', '7']
>>> re.findall(r6,st)
['!', '@', '#', '$', '%', '^', '&']
>>>
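
Two details of \w are easy to miss: it also matches the underscore, and on Python 3 str patterns it matches non-ASCII letters unless the re.ASCII flag is given. A short sketch, with a made-up sample string:

```python
import re

sample = "user_name1 caf\u00e9"   # \u00e9 is the letter e with an acute accent

# \w includes the underscore, so the identifier stays in one piece;
# by default it also matches the accented letter
words = re.findall(r"\w+", sample)
# ['user_name1', 'caf\u00e9']

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented letter is excluded
ascii_words = re.findall(r"\w+", sample, re.ASCII)
# ['user_name1', 'caf']

print(words, ascii_words)
```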

*, + and ? all control repetition

* repeats the preceding item 0 or more times, + repeats it 1 or more times, and ? repeats it 0 or 1 times.

>>> r1 = r"ab*"
>>> re.findall(r1,'a')
['a']
>>> re.findall(r1,'ab')
['ab']
>>> re.findall(r1,'abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
['abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb']
>>>
>>> r2 = r"ab+"
>>> re.findall(r2,'ab')
['ab']
>>> re.findall(r2,'a')
[]
>>> re.findall(r2,'abbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
['abbbbbbbbbbbbbbbbbbbbbbbbbbbbb']
>>>
>>> r3 = r"ab?"
>>> re.findall(r3,'a')
['a']
>>> re.findall(r3,'ab')
['ab']
>>> re.findall(r3,'abb')
['ab']
>>> re.findall(r3,'abbbbbbbbbb')
['ab']
>>>
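
To compare the three quantifiers directly, the following sketch runs each pattern over the same inputs (the loop and the names patterns, inputs and results are just for illustration):

```python
import re

patterns = [r"ab*", r"ab+", r"ab?"]
inputs = ["a", "ab", "abbb"]

# For each pattern, collect the findall result for every input
results = {pat: [re.findall(pat, s) for s in inputs] for pat in patterns}

for pat, matches in results.items():
    print(pat, matches)
# ab* [['a'], ['ab'], ['abbb']]
# ab+ [[], ['ab'], ['abbb']]
# ab? [['a'], ['ab'], ['ab']]
```

Only ab+ rejects a bare 'a', and only ab? stops after the first b.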

{} indicates the range of repetition: {m,n} repeats the preceding item at least m times and at most n times.

Here is an example with its results: at least 1 digit, at most 3.

>>> r1 = r"\d{1,3}"
>>> re.findall(r1,'100')
['100']
>>> re.findall(r1,'1')
['1']
>>> re.findall(r1,'')
[]
>>> re.findall(r1,'1000')
['100', '0']
>>> re.findall(r1,'1001')
['100', '1']
>>>

When the input has more digits than the upper bound allows, the leftover digits are matched separately and show up as later elements of the list.
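
One way to see where each piece comes from is to look at the match positions. This sketch uses finditer and fullmatch from the re module, which go a little beyond the findall calls used in this post:

```python
import re

pattern = re.compile(r"\d{1,3}")

# finditer exposes where each match starts and ends
spans = [(m.group(), m.span()) for m in pattern.finditer("1000")]
# [('100', (0, 3)), ('0', (3, 4))]

# fullmatch succeeds only if the whole string fits the pattern
fits_100 = pattern.fullmatch("100") is not None     # True
fits_1000 = pattern.fullmatch("1000") is not None   # False

print(spans, fits_100, fits_1000)
```

The first match takes as many digits as it is allowed (three), and the search then continues from position 3. If the goal is to reject inputs that are too long, fullmatch is one option.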

. matches any single character (except a newline, by default). See the following example.

.* matches any character 0 or more times and .+ matches 1 or more times; in general, .+ is used more often.

>>> r1 = r"src=.*"
>>> re.findall(r1,'src=img http qwerqwer')
['src=img http qwerqwer']
>>> re.findall(r1,'src=img http hello')
['src=img http hello']
>>>
>>> r1 = r".*\.com"  # \. matches a literal dot
>>> re.findall(r1,'www.com')
['www.com']
>>> re.findall(r1,'hello.com')
['hello.com']
>>>
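
An unescaped . in front of com matches any character, not only a literal dot. A minimal sketch comparing the two spellings (the sample string wwwXcom and the variable names are made up):

```python
import re

loose = r".*.com"     # the second . matches ANY character
strict = r".*\.com"   # \. matches only a literal dot

loose_hit = re.findall(loose, "wwwXcom")     # ['wwwXcom']
strict_hit = re.findall(strict, "wwwXcom")   # []
strict_ok = re.findall(strict, "www.com")    # ['www.com']

print(loose_hit, strict_hit, strict_ok)
```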


Studying or keeping fit: always be on the road with at least one of them.

Posted by factoring2117 on Tue, 28 Apr 2020 10:22:16 -0700