Text processing tool -- grep

Keywords: Linux regex grep

1, Text processing tool grep

Linux has three tools, grep, sed, and awk, that are often called the three swordsmen of text processing. This article covers grep.

1. Brief introduction

grep is a text filtering tool. The name stands for Global search REgular expression and Print out the line. It checks each line of the input against a specified filter condition, called a pattern, and prints the lines that match. Patterns are filter conditions written as regular expressions.

2. Command usage:

Standard format: grep [OPTIONS] PATTERN [FILE...]

Common options:

Option        Meaning
-E            Use extended regular expressions (ERE)
-G            Use basic regular expressions (BRE); this is the default
-F            Treat the pattern as a fixed string, not a regular expression
-P            Use Perl-compatible regular expressions
-i            Ignore case
-o            Show only the matched text
-q            Quiet mode: print nothing and report the result through the exit status
-A #          Show each matched line and the # lines after it
-B #          Show each matched line and the # lines before it
-C #          Show each matched line and the # lines before and after it
-v            Invert the match: show the lines that do not match
--color=auto  Highlight the matched text in color
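
To make the option table concrete, here is a minimal sketch that runs a few of the options against a small made-up input. The sample lines and variable names are hypothetical and chosen only for illustration; GNU grep is assumed.

```shell
# Hypothetical sample data, not taken from any real system.
sample=$(printf '%s\n' 'Alpha one' 'beta two' 'ALPHA three' 'gamma four' 'delta five')

# -i: ignore case, so both the "Alpha" and the "ALPHA" line match.
ignore_case=$(printf '%s\n' "$sample" | grep -i 'alpha')

# -o: print only the matched text, one match per output line.
only_match=$(printf '%s\n' "$sample" | grep -io 'alpha')

# -v: invert the match and keep the lines that do NOT contain "alpha".
inverted=$(printf '%s\n' "$sample" | grep -iv 'alpha')

# -A 1: print the matching line plus one line of trailing context.
after=$(printf '%s\n' "$sample" | grep -A 1 'beta')

# -q: print nothing; only the exit status says whether a match was found.
if printf '%s\n' "$sample" | grep -q 'gamma'; then
  found=yes
else
  found=no
fi

printf '%s\n' "-i:" "$ignore_case" "-io:" "$only_match" "-iv:" "$inverted" "-A 1:" "$after" "-q: $found"
```

The -q form is the one to use inside scripts, where only the yes/no answer matters and the matching lines themselves are not needed.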

2, Regular expression:

1. What is a regular expression

A regular expression is a pattern written with a mix of special characters and ordinary text characters. The special characters (metacharacters) do not stand for themselves; they control how matching is done or act as wildcards. Regular expressions come in two flavors: basic regular expressions and extended regular expressions.

2. Basic regular expression: BRE

Character matching   Meaning
.                    Matches any single character
[ ]                  Matches any single character in the given set or range
[^ ]                 Matches any single character not in the given set or range
[:alnum:]            Letters and digits
[:alpha:]            Letters, upper or lower case
[:blank:]            Space and tab
[:cntrl:]            Control characters
[:digit:]            Digits
[:graph:]            Visible characters (printable, excluding space)
[:lower:]            Lowercase letters
[:print:]            Printable characters (including space)
[:punct:]            Punctuation characters
[:space:]            Whitespace characters (space, tab, newline, and so on)
[:upper:]            Uppercase letters
[:xdigit:]           Hexadecimal digits

Character classes such as [:digit:] are written inside a bracket expression, for example [[:digit:]].

Repetition matching  Placed after a character to say how many times it may occur
*                    Any number of times, including zero
\+                   At least once, no upper limit
\?                   Zero or one time (optional)
\{m\}                Exactly m times
\{m,n\}              At least m, at most n times
\{m,\}               At least m times, no upper limit

Position anchoring   Meaning
^                    Start of line
$                    End of line
\< or \b             Start of word
\> or \b             End of word

Grouping             Meaning
\( \)                Treats the enclosed characters as a single unit

Grouping also makes back-references possible: \1, \2, and so on refer to the text matched by the first group, the second group, and so on. Groups are numbered by their opening parentheses, counted from the left, so \1 refers to whatever was matched between the first opening parenthesis and its matching closing parenthesis.
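
As a rough illustration of BRE syntax, the sketch below uses a back-reference and an interval expression on made-up input. In BRE the grouping and interval operators are written with backslashes; the \+ operator and the \< \> word anchors are GNU grep extensions, so GNU grep is assumed. All sample strings are hypothetical.

```shell
# Hypothetical sample lines, made up for this example.
sample=$(printf '%s\n' 'abab' 'abcd' 'aa bb' 'xyzxyz' 'a1b22c333')

# \(.\+\)\1 : some non-empty text immediately followed by the same text again,
# anchored so that the whole line consists of exactly those two copies.
repeated=$(printf '%s\n' "$sample" | grep '^\(.\+\)\1$')

# \{2,3\} : two or three digits in a row, delimited by word boundaries,
# so one-digit and four-digit numbers are not reported.
digits=$(printf '%s\n' 'id 7' 'id 42' 'id 123' 'id 4567' | grep '\<[0-9]\{2,3\}\>')

printf '%s\n' "repeated:" "$repeated" "digits:" "$digits"
```

Only abab and xyzxyz are built from two identical halves, and only 42 and 123 are standalone numbers of two or three digits.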

3. Extended regular expression: ERE

Character matching   Meaning (same as in basic regular expressions)
.                    Matches any single character
[ ]                  Matches any single character in the given set or range
[^ ]                 Matches any single character not in the given set or range
[:alnum:]            Letters and digits
[:alpha:]            Letters, upper or lower case
[:blank:]            Space and tab
[:cntrl:]            Control characters
[:digit:]            Digits
[:graph:]            Visible characters (printable, excluding space)
[:lower:]            Lowercase letters
[:print:]            Printable characters (including space)
[:punct:]            Punctuation characters
[:space:]            Whitespace characters (space, tab, newline, and so on)
[:upper:]            Uppercase letters
[:xdigit:]           Hexadecimal digits

Repetition matching  Same as in basic regular expressions, but written without the backslash
*                    Any number of times, including zero
+                    At least once, no upper limit
?                    Zero or one time (optional)
{m}                  Exactly m times
{m,n}                At least m, at most n times
{m,}                 At least m times, no upper limit

Position anchoring   Meaning (same as in basic regular expressions)
^                    Start of line
$                    End of line
\< or \b             Start of word
\> or \b             End of word

Grouping             Meaning (no backslash required)
( )                  Treats the enclosed characters as a single unit
a|b                  Alternation: matches a or b; in a basic regular expression this is written \| and grouped with \( \)
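
The difference between the two flavors is easiest to see side by side. The following sketch writes the same alternation once as a BRE and once as an ERE and runs both on made-up passwd-style lines. The sample data is hypothetical; \| in a BRE is a GNU extension, and grep -E selects the same syntax that egrep uses.

```shell
# Hypothetical passwd-style lines, not read from a real /etc/passwd.
sample=$(printf '%s\n' 'root:x:0' 'centos:x:1000' 'user1:x:1001' 'rooter:x:1002' 'guest:x:1003')

# BRE: grouping and alternation need backslashes.
bre=$(printf '%s\n' "$sample" | grep '^\(root\|centos\|user1\):')

# ERE: the same pattern with the backslashes dropped.
ere=$(printf '%s\n' "$sample" | grep -E '^(root|centos|user1):')

printf '%s\n' "BRE result:" "$bre" "ERE result:" "$ere"
```

Both commands select the same three lines; rooter is skipped because the pattern requires a colon directly after the name.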

3, Exercise:

  1. Display the lines in /proc/meminfo that start with an upper- or lowercase s, in at least two ways;
[root@node1 ~]# grep -i '^s' /proc/meminfo
[root@node1 ~]# grep '^[Ss]' /proc/meminfo
[root@node1 ~]# grep -v '^[^Ss]' /proc/meminfo
  2. Display the lines in /etc/passwd that do not end with /bin/bash;
[root@node1 ~]# grep -v '\(/bin/bash\)$' /etc/passwd
  3. Display the user name of the user with the largest id in /etc/passwd;
[root@node1 ~]# sort -t: -nk3 /etc/passwd | cut -d: -f1 | tail -1
  4. If the user root exists, display its default shell;
[root@node1 ~]# id root &> /dev/null && grep '^root\>' /etc/passwd | cut -d: -f7 || echo 'no such user'
  5. Find the two- or three-digit numbers in /etc/passwd;
[root@node1 ~]# grep '\<[0-9]\{2,3\}\>' /etc/passwd
[root@node1 ~]# grep '\<[[:digit:]]\{2,3\}\>' /etc/passwd
[root@node1 ~]# egrep '\<[0-9]{2,3}\>' /etc/passwd
[root@node1 ~]# egrep '\<[[:digit:]]{2,3}\>' /etc/passwd
  6. Display the lines in /etc/rc.d/rc.sysinit that begin with at least one whitespace character followed by non-whitespace characters;
[root@node1 ~]# grep '^[[:space:]]\+[^[:space:]]\+' /etc/rc.d/rc.sysinit
[root@node1 ~]# egrep '^[[:space:]]+[^[:space:]]+' /etc/rc.d/rc.sysinit
  7. In the output of netstat -tan, find the lines that end with LISTEN followed by zero or more whitespace characters;
[root@node1 ~]# netstat -tan | grep 'LISTEN[[:space:]]*$'
[root@node1 ~]# netstat -tan | egrep 'LISTEN[[:space:]]*$'
  8. Add the users bash, testbash, basher, and nologin (the nologin user's shell must be /sbin/nologin), then find the users in /etc/passwd whose user name is the same as their shell name;
[root@node1 ~]# for u in bash testbash basher nologin; do useradd "$u"; done
[root@node1 ~]# usermod -s /sbin/nologin nologin
[root@node1 ~]# grep '^\([[:alnum:]]\+\>\).*\1$' /etc/passwd
[root@node1 ~]# egrep '^([[:alnum:]]+\>).*\1$' /etc/passwd
  9. Display the uid and default shell of the users root, centos, and user1 on the current system;
[root@node1 ~]# grep '^\(root\|centos\|user1\)\>' /etc/passwd | cut -d: -f3,7
[root@node1 ~]# egrep '^(root|centos|user1)\>' /etc/passwd | cut -d: -f3,7
  10. Find the lines in /etc/rc.d/init.d/functions where a word is followed by a pair of parentheses;
[root@node1 ~]# grep '[[:alpha:]]\+\>()' /etc/rc.d/init.d/functions
[root@node1 ~]# egrep '[[:alpha:]]+\>\(\)' /etc/rc.d/init.d/functions
  11. Use echo to output a path, and use grep or egrep to get the base name (-o is needed so that only the matched part is printed; a trailing slash, if present, is kept);
[root@node1 ~]# echo '/etc/sysconfig/network' | grep -o '[[:alnum:]]\+/\?$'
[root@node1 ~]# echo '/etc/sysconfig/network/' | egrep -o '[[:alnum:]]+/?$'
  12. Use echo to output a path, and use egrep to get the directory name;
(1) [root@node1 ~]# echo "/etc/rc.d/init.d/functions/" | grep -Eo ".*[^/]" | grep -Eo ".*/"
(2) [root@node1 ~]# echo "/etc/rc.d/init.d/functions/" | grep -Eo ".*\<"
I could not work out question 12 for a long time, so I found these solutions online. Briefly, the ideas are:
Idea 1: the hard part of extracting the directory name is the trailing slash. Statement 1 handles it with two grep commands. The first grep relies on greedy matching: .* matches as much of the path as possible, and [^/] forces the match to end on a non-slash character, so the result never ends with a slash, whether or not the given path did. The second grep again uses greedy .* to match everything up to and including the last remaining slash; whatever follows that slash is left out.
Idea 2: I have not fully understood this one. As far as I can tell, it combines greedy .* with the word-start anchor \<: the match extends as far as the start of the last word in the path, so the final component is left out.
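
The two ideas can be traced step by step by capturing each stage in a variable. This is only a sketch of the reasoning above, assuming GNU grep; the variable names are made up for illustration.

```shell
path='/etc/rc.d/init.d/functions/'

# Stage 1: greedy .* followed by one non-slash character.
# The match must end on a non-slash, so any trailing slash is dropped.
stage1=$(printf '%s\n' "$path" | grep -Eo '.*[^/]')

# Stage 2: greedy .* followed by a slash.
# The match runs up to and including the last slash that is left.
stage2=$(printf '%s\n' "$stage1" | grep -Eo '.*/')

# Variant: greedy .* followed by the word-start anchor \<.
# The last position where a word starts is just before "functions".
variant=$(printf '%s\n' "$path" | grep -Eo '.*\<')

printf '%s\n' "stage1:  $stage1" "stage2:  $stage2" "variant: $variant"
```

Stage 1 yields /etc/rc.d/init.d/functions, and both stage 2 and the variant yield /etc/rc.d/init.d/ for this path.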
  13. Find the values between 1 and 255 in the output of the ifconfig command (the first alternative starts at 1, so 0 is excluded);
[root@node1 ~]# ifconfig | egrep '\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>'
  14. Find the IP addresses in the output of the ifconfig command;
[root@node1 ~]# ifconfig | egrep -o '((25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])'
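
Because ifconfig output differs from machine to machine, the octet pattern is easier to check against fixed input. The sketch below reuses the 0 to 255 octet alternation from the IP address exercise; the sample line is hypothetical, the \< and \> word anchors are an addition that is not part of the original command, and GNU grep plus the seq utility are assumed.

```shell
# One octet in the range 0..255, as used in the IP address exercise.
octet='(25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])'

# Count how many of the integers 0..300 match the octet pattern as a whole line (-x).
count=$(seq 0 300 | grep -Ecx "$octet")

# Hypothetical input resembling one line of ifconfig output.
line='inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255'

# Four octets separated by dots; \< and \> (added here) keep the match on word boundaries.
addrs=$(printf '%s\n' "$line" | grep -Eo "\<($octet\.){3}$octet\>")

# With the word anchors, an out-of-range first octet is rejected outright.
if printf '%s\n' '999.1.1.1' | grep -Eq "\<($octet\.){3}$octet\>"; then
  rejected=no
else
  rejected=yes
fi

printf '%s\n' "octets matched: $count" "addresses:" "$addrs" "999.1.1.1 rejected: $rejected"
```

Without the word anchors, grep -o would still report 99.1.1.1 inside 999.1.1.1, which is why the sketch adds them.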

Posted by rocket on Thu, 11 Nov 2021 11:31:39 -0800