Common metacharacters are:
[] ^ $ \ * + ? {} .
Using regular expressions in Python requires importing the re module (import re). Here is how each metacharacter works.
[] specifies a character set. [ABC] matches any one of the characters A, B, or C, while [^ABC] is the negation: it matches any single character except A, B, or C.
>>> import re
>>> r = r"ABC[ABC]"  # Define a regular expression
>>> re.findall(r, 'ABCA')  # Match with findall
['ABCA']
>>> re.findall(r, 'ABCB')
['ABCB']
>>> re.findall(r, 'ABCD')
[]
>>> r = r"ABC[^ABC]"  # Take the inverse
>>> re.findall(r, 'ABCD')
['ABCD']
>>> re.findall(r, 'ABCA')
[]
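Inside [], a hyphen defines a range, so [a-c] is shorthand for [abc], and most other metacharacters lose their special meaning. Here is a minimal sketch. The sample strings and variable names are made up for illustration.

```python
import re

# [a-c] is a range: the same set as [abc]
letters = re.findall(r"[a-c]", "abcdef")
print(letters)      # ['a', 'b', 'c']

# [0-9] matches any single ASCII digit
digits = re.findall(r"[0-9]", "a1b2c3")
print(digits)       # ['1', '2', '3']

# [^0-9] negates the range: any character that is not a digit
non_digits = re.findall(r"[^0-9]", "a1b2c3")
print(non_digits)   # ['a', 'b', 'c']

# Inside a set, . and $ are ordinary characters
literals = re.findall(r"[.$]", "cost: $5.00")
print(literals)     # ['$', '.']
```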
^ matches at the start of the string. With the pattern ^h, findall returns ['h'] if the string begins with h and an empty list if it does not.
>>> r = r"^h"
>>> re.findall(r, 'hello')
['h']
>>> re.findall(r, 'ehllo')
[]
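Without flags, ^ anchors only to the start of the whole string, not to the start of every line. Passing re.MULTILINE makes it match after each newline as well. The following small sketch uses a made-up three-line string.

```python
import re

text = "hello\nhi there\nehllo"

# Without flags, ^ only matches at the very start of the string
default_hits = re.findall(r"^h", text)
print(default_hits)    # ['h']

# With re.MULTILINE, ^ also matches right after each newline
multiline_hits = re.findall(r"^h\w*", text, re.MULTILINE)
print(multiline_hits)  # ['hello', 'hi']
```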
Likewise, $ matches at the end of the string. It is the counterpart of ^.
>>> r = r"h$"
>>> re.findall(r, 'hello')
[]
>>> re.findall(r, 'olleh')
['h']
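Using ^ and $ together forces the pattern to cover the entire string, which is a common way to validate input. Here is a sketch. is_all_digits is a hypothetical helper name, not part of the re module.

```python
import re

def is_all_digits(s):
    """Hypothetical helper: True if s is one or more digits and nothing else."""
    # ^ pins the match to the start and $ to the end, so no other characters fit.
    # Note: $ also matches just before a trailing newline, so "2024\n" would pass;
    # re.fullmatch(r"\d+", s) avoids that edge case.
    return re.search(r"^\d+$", s) is not None

print(is_all_digits("2024"))   # True
print(is_all_digits("20x24"))  # False
print(is_all_digits(""))       # False
```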
\ is the escape character.
Placing different characters after the backslash gives them different special meanings:
\d matches any decimal digit, equivalent to [0-9]
\D matches any non-digit character, equivalent to [^0-9]
\s matches any whitespace character, equivalent to [ \t\n\r\f\v]
\S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v]
\w matches any alphanumeric character or the underscore, equivalent to [a-zA-Z0-9_]
\W matches any character that \w does not match, equivalent to [^a-zA-Z0-9_]
These bracket equivalents are exact for ASCII text. For str patterns, Python 3 also matches Unicode digits, letters, and whitespace.
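A backslash also turns a metacharacter back into a literal character. For example, \. matches a real dot and \$ a real dollar sign. re.escape() adds those backslashes automatically. Here is a short sketch. The input text and variable names are made up.

```python
import re

text = "price: $3.50, total: $12.75"

# \$ and \. are literal characters; \d+ matches one or more digits
prices = re.findall(r"\$\d+\.\d+", text)
print(prices)   # ['$3.50', '$12.75']

# re.escape() backslash-escapes every metacharacter in a literal string,
# so "a+b" is matched exactly instead of as "one or more a, then b"
escaped = re.findall(re.escape("a+b"), "a+b aab")
print(escaped)  # ['a+b']
```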
>>> r1 = r"day=\d"
>>> r2 = r"day=\D"
>>> st = 'day=1 day=2 day=3 day=a day=b day=c'
>>> re.findall(r1, st)
['day=1', 'day=2', 'day=3']
>>> re.findall(r2, st)
['day=a', 'day=b', 'day=c']
>>> r3 = r"enter=\s"
>>> r4 = r"enter=\S"
>>> st = 'enter=\nenter=1\nenter= \nenter=3\nenter=\nenter=\n'
>>> re.findall(r3, st)
['enter=\n', 'enter= ', 'enter=\n', 'enter=\n']
>>> re.findall(r4, st)
['enter=1', 'enter=3']
>>> r5 = r"\w"
>>> r6 = r"\W"
>>> st = 'abcdefg1234567!@#$%^&'
>>> re.findall(r5, st)
['a', 'b', 'c', 'd', 'e', 'f', 'g', '1', '2', '3', '4', '5', '6', '7']
>>> re.findall(r6, st)
['!', '@', '#', '$', '%', '^', '&']
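The sample above contains no underscore or accented letters, so it hides two details of \w. First, \w also matches the underscore. Second, for str patterns it is Unicode-aware; passing re.ASCII restricts it to [a-zA-Z0-9_]. Here is a minimal sketch with a made-up string.

```python
import re

sample = "snake_case café"

# \w+ keeps the underscore and the accented letter inside each word
words = re.findall(r"\w+", sample)
print(words)        # ['snake_case', 'café']

# With re.ASCII, é is no longer a word character
ascii_words = re.findall(r"\w+", sample, re.ASCII)
print(ascii_words)  # ['snake_case', 'caf']
```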
*, + and ? all control repetition of the preceding item.
* repeats it 0 or more times, + repeats it 1 or more times, and ? repeats it 0 or 1 times.
>>> r1 = r"ab*"
>>> re.findall(r1, 'a')
['a']
>>> re.findall(r1, 'ab')
['ab']
>>> re.findall(r1, 'abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
['abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb']
>>> r2 = r"ab+"
>>> re.findall(r2, 'ab')
['ab']
>>> re.findall(r2, 'a')
[]
>>> re.findall(r2, 'abbbbbbbbbbbbbbbbbbbbbbbbbbbbb')
['abbbbbbbbbbbbbbbbbbbbbbbbbbbbb']
>>> r3 = r"ab?"
>>> re.findall(r3, 'a')
['a']
>>> re.findall(r3, 'ab')
['ab']
>>> re.findall(r3, 'abb')
['ab']
>>> re.findall(r3, 'abbbbbbbbbb')
['ab']
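Because ? makes the preceding item optional, it works well for spelling variants. The following sketch uses colou?r as an illustrative pattern and compares it with + and *.

```python
import re

text = "color colour colouur"

# u? allows zero or one "u" between "colo" and "r"
optional = re.findall(r"colou?r", text)
print(optional)      # ['color', 'colour']

# u+ needs at least one "u"
one_or_more = re.findall(r"colou+r", text)
print(one_or_more)   # ['colour', 'colouur']

# u* accepts any number of "u", including none
zero_or_more = re.findall(r"colou*r", text)
print(zero_or_more)  # ['color', 'colour', 'colouur']
```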
{} specifies a range of repetition. {m,n} repeats the preceding item at least m times and at most n times.
In the following example, \d{1,3} matches at least 1 and at most 3 digits.
>>> r1 = r"\d{1,3}"
>>> re.findall(r1, '100')
['100']
>>> re.findall(r1, '1')
['1']
>>> re.findall(r1, '')
[]
>>> re.findall(r1, '1000')
['100', '0']
>>> re.findall(r1, '1001')
['100', '1']
Each match takes at most 3 digits. Any digits left over become additional matches at the end of the list.
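Besides {m,n}, the braces accept two shorter forms. {m} repeats exactly m times, and {m,} repeats at least m times with no upper bound. Here is a sketch with made-up digit strings.

```python
import re

# {3}: exactly three digits per match; a leftover shorter than 3 is dropped
exact = re.findall(r"\d{3}", "12345678")
print(exact)     # ['123', '456']

# {2,}: two or more digits, so the lone "1" is skipped
at_least = re.findall(r"\d{2,}", "1 22 333")
print(at_least)  # ['22', '333']

# {1,3} is greedy: it takes as many digits as allowed before starting a new match
greedy = re.findall(r"\d{1,3}", "12345")
print(greedy)    # ['123', '45']
```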
.
. matches any single character except a newline. See the following examples.
.* matches any characters 0 or more times, and .+ matches them 1 or more times; in general, + is the more common choice.
>>> r1 = r"src=.*"
>>> re.findall(r1, 'src=img http qwerqwer')
['src=img http qwerqwer']
>>> re.findall(r1, 'src=img http hello')
['src=img http hello']
>>> r1 = r".*.com"
>>> re.findall(r1, 'www.com')
['www.com']
>>> re.findall(r1, 'hello.com')
['hello.com']
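The second . in .*.com is also a wildcard, so the pattern accepts any character before com, not only a dot. Escaping it as \. restricts it to a literal dot. Separately, . does not match a newline unless re.DOTALL is passed. Here is a sketch. The strings and variable names are made up for illustration.

```python
import re

loose = r".*.com"    # the second . matches any character
strict = r".*\.com"  # \. matches only a literal dot

loose_hits = re.findall(loose, "wwwxcom")
strict_miss = re.findall(strict, "wwwxcom")
strict_hit = re.findall(strict, "www.com")
print(loose_hits, strict_miss, strict_hit)  # ['wwwxcom'] [] ['www.com']

# . stops at a newline by default; re.DOTALL lets it cross lines
text = "src=a\nb"
single_line = re.findall(r"src=.*", text)
across = re.findall(r"src=.*", text, re.DOTALL)
print(single_line, across)
```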