Shell Programming Regular Expressions

Keywords: Linux Google shell

This article focuses on the use of regular expressions in shell scripts.

Summary

Regular expressions come in two flavors: basic regular expressions (BRE) and extended regular expressions (ERE).

A regular expression is not a program in itself but a standard notation for string processing: a single pattern string describes, and is used to search for, every string that follows a given syntactic rule.

A pattern is built from ordinary characters (for example a-z) and special characters (metacharacters).

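Linux text processing tools

Tool        Basic regular expressions   Extended regular expressions
vi editor   Supported                   Not supported
grep        Supported                   Not supported
egrep       Supported                   Supported
sed         Supported                   Not supported
awk         Supported                   Supported

The table describes default behavior; grep -E (the equivalent of egrep) switches grep to extended syntax.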
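
To make the difference between the two columns concrete, the following is a minimal sketch of how one pattern behaves in basic and in extended mode. It assumes a grep that accepts -E and -c (count matching lines); the sample words are made up for illustration.

```shell
# Four made-up sample lines.
words=$(printf '%s\n' 'wd' 'wood' 'woood' 'wo+d')

# Basic mode: + is an ordinary character, so only the literal text "wo+d" matches.
basic_count=$(printf '%s\n' "$words" | grep -c 'wo+d')

# Extended mode (-E): + means "one or more of the preceding item",
# so "wood" and "woood" match and the literal "wo+d" does not.
extended_count=$(printf '%s\n' "$words" | grep -Ec 'wo+d')

echo "basic: $basic_count, extended: $extended_count"
```

The same pattern text selects different lines depending on the mode, which is why it matters which column a tool falls into.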

Basic regular expressions

  • Common metacharacters in basic regular expressions

^: Matches the start of the input string (in grep, the start of a line). Inside a bracket expression it negates the set. To match the ^ character itself, use \^
$: Matches the end of the input string (in grep, the end of a line). To match the $ character itself, use \$
.: Matches any single character except a newline
\: Escapes the next character, turning a metacharacter into a literal character; it also introduces back-references such as \1. For example, \. matches a literal dot and \\ matches a backslash. In basic regular expressions, braces and parentheses are special only when escaped: \{ \} and \( \)
*: Matches the preceding subexpression zero or more times. To match the * character itself, use \*
[]: Character set. Matches any one of the characters it contains. For example, [abc] matches the a in plain
[^]: Negated character set. Matches any one character that is not listed. For example, [^abc] matches the p, l, i, and n in plain
[n1-n2]: Character range. Matches any one character in the given range. For example, [a-z] matches any lowercase letter from a to z
{n}: n is a non-negative integer; matches exactly n times. For example, o{2} does not match the o in Bob but matches the two o's in food. In basic regular expressions this is written \{n\}
{n,}: n is a non-negative integer; matches at least n times. For example, o{2,} does not match the o in Bob but matches all the o's in foooood. o{1,} is equivalent to o+ and o{0,} is equivalent to o*
{n,m}: n and m are non-negative integers with n <= m; matches at least n and at most m times
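
The escaping rules in the list above can be tried out with a short sketch like the one below. The sample lines and variable names are made up for illustration; the patterns are single-quoted so that the shell passes the backslashes and the $ to grep unchanged.

```shell
# Three made-up lines containing characters that are special in a regex.
sample=$(printf '%s\n' 'price: $5' '2*3=6' 'a^b')

# \$ matches a literal dollar sign (a bare $ would anchor the end of the line).
dollar_line=$(printf '%s\n' "$sample" | grep '\$')

# \* matches a literal asterisk instead of repeating the preceding character.
star_line=$(printf '%s\n' "$sample" | grep '2\*3')

# \^ matches a literal caret.
caret_line=$(printf '%s\n' "$sample" | grep 'a\^b')

printf '%s\n' "$dollar_line" "$star_line" "$caret_line"
```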

  • Using grep as an example, first prepare a test file named test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words


#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
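
One way to create this file from a script is a here-document, sketched below. The variable name testfile is an assumption made for this example; the quoted 'EOF' delimiter keeps the shell from expanding $, backslashes, or quotes inside the text.

```shell
# Name of the sample file used by the grep commands in this article.
testfile=test.txt

# The quoted delimiter ('EOF') writes the text exactly as typed.
cat > "$testfile" <<'EOF'
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words


#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
EOF

# Sanity check: count the empty lines (there should be two).
empty_lines=$(grep -c '^$' "$testfile")
echo "empty lines: $empty_lines"
```

Lines 10 and 11 of the file are intentionally empty; they are used later to demonstrate the '^$' pattern.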

Specific string

-n: Display line numbers
-i: Ignore case
-v: Invert the match (show lines that do not match)

[root@localhost ~]# grep -n 'the' test.txt 
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@localhost ~]# grep -in 'the' test.txt 
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@localhost ~]# grep -vn 'the' test.txt 
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
7:PI=3.141592653589793238462643383249901429
8:a wood cross!
9:Actions speak louder than words
10:
11:
12:#woood #
13:#woooooood #
14:AxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
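
Inside a script, the exit status of grep is often more useful than its printed output. The following is a minimal sketch under that assumption: it uses -q (quiet) and -c (count matching lines) together with -i and -v, and the log lines are made up for illustration.

```shell
# Made-up log lines for the demonstration.
log_lines=$(printf '%s\n' 'service started' 'Error: disk full' 'retrying')

# -q prints nothing; the exit status alone says whether a match was found.
# The search is case-sensitive, so "Error" does not match 'error'.
if printf '%s\n' "$log_lines" | grep -q 'error'; then
    exact="found"
else
    exact="not found"
fi

# -i ignores case, -c counts matching lines, -v selects the lines that do not match.
matching=$(printf '%s\n' "$log_lines" | grep -ic 'error')
others=$(printf '%s\n' "$log_lines" | grep -vic 'error')

echo "case-sensitive search: $exact"
echo "matching lines: $matching, other lines: $others"
```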

Set "[]"

  • However many characters appear inside the brackets "[]", the set matches exactly one of them.
[root@localhost ~]# grep -n 'sh[io]rt' test.txt 
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
  • Match the repeated character oo
[root@localhost ~]# grep -n 'oo' test.txt 
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#woood #
13:#woooooood #
15:I bet this place is really spooky late at night!
  • Find oo that is not preceded by w
[root@localhost ~]# grep -n '[^w]oo' test.txt 
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
12:#woood #
13:#woooooood #
15:I bet this place is really spooky late at night!
  • Find oo that is not preceded by a lowercase letter
[root@localhost ~]# grep -n '[^a-z]oo' test.txt 
3:The home of Football on BBC Sport online.
  • Find lines that contain digits
[root@localhost ~]# grep -n '[0-9]' test.txt 
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
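
Ranges such as [a-z] can depend on the collation order of the current locale. POSIX bracket classes such as [[:digit:]], [[:lower:]], and [[:alpha:]] are a common alternative; the sketch below repeats the searches above with them, on made-up sample lines.

```shell
# Made-up sample lines.
sample=$(printf '%s\n' 'Football' 'room 12' 'spooky' '#tag')

# [[:digit:]] is the class form of [0-9].
with_digits=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# "oo" preceded by something other than a lowercase letter.
not_lower_before_oo=$(printf '%s\n' "$sample" | grep '[^[:lower:]]oo')

# Lines whose first character is not a letter.
non_letter_start=$(printf '%s\n' "$sample" | grep '^[^[:alpha:]]')

printf '%s\n' "$with_digits" "$not_lower_before_oo" "$non_letter_start"
```

Note the doubled brackets: the class name [:digit:] is only valid inside a bracket expression.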

Start of line "^"

  • Find lines that begin with the
[root@localhost ~]# grep -n '^the' test.txt 
4:the tongue is boneless but it breaks bones.12!
  • Find lines that begin with lowercase letters
[root@localhost ~]# grep -n '^[a-z]' test.txt 
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
  • Find lines that begin with capital letters
[root@localhost ~]# grep -n '^[A-Z]' test.txt 
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.141592653589793238462643383249901429
9:Actions speak louder than words
14:AxyzxyzxyzxyzC
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
  • Find lines that do not begin with letters
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt 
12:#woood #
13:#woooooood #

End of line "$"

  • Find lines that end with a period (.); the dot must be escaped
[root@localhost ~]# grep -n '\.$' test.txt 
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
16:Misfortunes never come alone/single.
17:I shouldn't have lett so tast.
  • Find blank lines
[root@localhost ~]# grep -n '^$' test.txt 
10:
11:
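
A typical script use of the two anchors is stripping comments and blank lines from a configuration file. The sketch below assumes a made-up configuration text and combines -v with two -e patterns.

```shell
# Made-up configuration text: comments, an empty line, and two real settings.
config=$(printf '%s\n' '# sample config' '' 'port=8080' '# another comment' 'host=example.org')

# -v inverts the match; each -e adds one pattern:
#   ^#   lines that start with #
#   ^$   empty lines (start of line immediately followed by end of line)
settings=$(printf '%s\n' "$config" | grep -v -e '^#' -e '^$')

printf '%s\n' "$settings"
```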

Any character "."

  • Find strings with exactly two characters between w and d
[root@localhost ~]# grep -n 'w..d' test.txt 
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
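
Because . matches any character, forgetting to escape a literal dot is a common source of false matches. A small sketch with made-up input shows the difference:

```shell
# Made-up sample lines.
sample=$(printf '%s\n' '3.14' '3x14' '314')

# Unescaped: . matches any single character, so both 3.14 and 3x14 match.
any_char=$(printf '%s\n' "$sample" | grep -c '3.14')

# Escaped: \. matches only a literal dot.
literal_dot=$(printf '%s\n' "$sample" | grep -c '3\.14')

echo "unescaped: $any_char, escaped: $literal_dot"
```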

Repeated character "*"

  • Find strings of two or more o's; * means zero or more repetitions of the preceding character
[root@localhost ~]# grep -n 'ooo*' test.txt 
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#woood #
13:#woooooood #
15:I bet this place is really spooky late at night!
  • Find strings that begin with w and end with d, with at least one o in between
[root@localhost ~]# grep -n 'woo*d' test.txt 
8:a wood cross!
12:#woood #
13:#woooooood #
  • Find strings that begin with w and end with d, with any characters (or none) in between
[root@localhost ~]# grep -n 'w.*d' test.txt 
1:he was short and fat.
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
12:#woood #
13:#woooooood #
  • Find any run of digits
[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt 
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429
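
The reason the examples write ooo*, woo*d, and [0-9][0-9]* rather than a bare o* is that "zero repetitions" also counts as a match. The following sketch, on made-up input, illustrates this:

```shell
# Made-up sample lines; the second one is empty.
sample=$(printf '%s\n' 'wd' '' 'wood' 'PI=3.14')

# o* alone also matches "zero o's", so every line matches, even the empty one.
star_only=$(printf '%s\n' "$sample" | grep -c 'o*')

# oo* requires one o followed by zero or more o's, that is, at least one o.
one_or_more=$(printf '%s\n' "$sample" | grep -c 'oo*')

# [0-9][0-9]* applies the same idea to digits: at least one digit.
digits=$(printf '%s\n' "$sample" | grep -c '[0-9][0-9]*')

echo "o*: $star_only, oo*: $one_or_more, digits: $digits"
```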

Repetition count "{}"

  • Find strings with two consecutive o's; in basic regular expressions the braces must be escaped
[root@localhost ~]# grep -n 'o\{2\}' test.txt 
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
12:#woood #
13:#woooooood #
15:I bet this place is really spooky late at night!
  • Find strings with 2 to 5 o's between w and d
[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt 
8:a wood cross!
12:#woood #
  • Find strings with two or more o's between w and d
[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt 
8:a wood cross!
12:#woood #
13:#woooooood #
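
Counted repetition is handy for checking fixed-width formats in a script. The sketch below is one possible use, assuming a hypothetical helper named is_iso_date; it checks only the shape YYYY-MM-DD, not whether the date actually exists.

```shell
# is_iso_date STRING: succeed if STRING looks like YYYY-MM-DD.
# Hypothetical helper for illustration; ^ and $ force the whole string to match.
is_iso_date() {
    printf '%s\n' "$1" | grep -q '^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}$'
}

for candidate in 2019-10-14 2019-1-4 20191014; do
    if is_iso_date "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

Without the two anchors the pattern would also accept longer strings that merely contain a date-shaped substring.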

Extended regular expressions

  • Common metacharacters in extended regular expressions

+: Matches the preceding item one or more times
?: Matches the preceding item zero or one time
|: Alternation; matches the expression on either side
(): Groups a subexpression
()+: Matches one or more repetitions of the group

  • Using egrep as an example, find strings with one or more o's between w and d
[root@localhost ~]# egrep -n 'wo+d' test.txt 
8:a wood cross!
12:#woood #
13:#woooooood #
  • Find the strings bet and best
[root@localhost ~]# egrep -n 'bes?t' test.txt 
5:google is the best tools for search keyword.
15:I bet this place is really spooky late at night!
  • Find the strings of, is, or on
[root@localhost ~]# egrep -n 'of|is|on' test.txt 
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
9:Actions speak louder than words
15:I bet this place is really spooky late at night!
16:Misfortunes never come alone/single.
  • Find the strings tast or test
[root@localhost ~]# egrep -n 't(a|e)st' test.txt 
6:The year ahead will test our political establishment to the limit.
17:I shouldn't have lett so tast.
  • Find strings that begin with A and end with C, with one or more repetitions of xyz in between
[root@localhost ~]# egrep -n 'A(xyz)+C' test.txt 
14:AxyzxyzxyzxyzC
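
The same extended patterns work in other tools. The sketch below, on made-up input, uses grep -E (equivalent to egrep), awk (whose patterns are extended regular expressions), and sed -E; it assumes a sed that supports the -E option, as GNU and BSD sed do.

```shell
# Made-up sample lines.
sample=$(printf '%s\n' 'AxyzxyzC' 'AC' 'a wood cross!' 'tast or test')

# grep -E: group plus repetition; "AC" has no xyz, so it does not match.
grouped=$(printf '%s\n' "$sample" | grep -E 'A(xyz)+C')

# awk: print the number of each line that contains tast or test.
alternation=$(printf '%s\n' "$sample" | awk '/t(a|e)st/ { print NR }')

# sed -E: collapse the first run of o's into a single o.
replaced=$(printf '%s\n' 'woooood' | sed -E 's/o+/o/')

printf '%s\n' "$grouped" "$alternation" "$replaced"
```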

Posted by cfemocha on Mon, 14 Oct 2019 17:45:41 -0700