Text processing tool -- grep

1, Text processing tool grep

Linux has three tools, grep, sed and awk, that are known as the three swordsmen of text processing. This article covers grep.

1. Brief introduction

grep is a text filtering tool whose full name is Global search REgular expression and Print out the line. It searches the input text line by line for the specified filter condition, or pattern, and prints the lines that match. Patterns are filter conditions written as regular expressions.

2. Command usage:

Standard format: grep [OPTIONS] PATTERN [FILE...]

Common options	Meaning
-E	Use extended regular expressions (ERE)
-G	Use basic regular expressions (BRE); this is the default
-F	Treat the pattern as a fixed string, not a regular expression
-P	Use Perl-compatible regular expressions
-i	Ignore case
-o	Show only the matched text
-q	Quiet mode: print nothing and report the result through the exit status
-A #	Display each matched line and the # lines after it
-B #	Display each matched line and the # lines before it
-C #	Display each matched line and the # lines before and after it
-v	Invert the match: display the lines that do not match
--color=auto	Color the matched text
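
The options above can be tried on a small sample without touching any system file. The following is a minimal sketch, assuming GNU grep; the sample text and the variable names are made up for illustration.

```shell
# Hypothetical sample text, used only for illustration.
sample='alpha
Beta
gamma
beta test
delta'

# -i ignores case: matches both "Beta" and "beta test".
ignore_case=$(printf '%s\n' "$sample" | grep -i 'beta')

# -v inverts the match: keeps the lines that do NOT contain "eta".
inverted=$(printf '%s\n' "$sample" | grep -v 'eta')

# -o prints only the matched text, one match per output line.
only_match=$(printf '%s\n' "$sample" | grep -o 'eta')

# -A 1 prints each matched line plus the one line after it.
with_after=$(printf '%s\n' "$sample" | grep -A 1 'gamma')

# -q prints nothing; only the exit status says whether a match was found.
if printf '%s\n' "$sample" | grep -q 'delta'; then found=yes; else found=no; fi

printf '%s\n' "$ignore_case" "$inverted" "$only_match" "$with_after" "$found"
```

Because -q reports only through the exit status, it is mostly useful inside if statements and && / || chains in scripts.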

2, Regular expression:

1. What is a regular expression

A regular expression is a pattern written with special characters and ordinary text characters. Some of these characters do not stand for themselves; instead they act as control or wildcard characters. Regular expressions are divided into basic regular expressions and extended regular expressions.

2. Basic regular expression: BRE

Character matching	Meaning
.	Matches any single character
[ ]	Matches any single character within the specified range
[^ ]	Matches any single character outside the specified range
[:alnum:]	Letters and digits
[:alpha:]	Letters, upper or lower case
[:blank:]	Space and tab
[:cntrl:]	Control characters
[:digit:]	Digits
[:graph:]	Visible characters (printable, excluding space)
[:lower:]	Lowercase letters
[:print:]	Printable characters (including space)
[:punct:]	Punctuation characters
[:space:]	Whitespace characters (space, tab, newline and so on)
[:upper:]	Uppercase letters
[:xdigit:]	Hexadecimal digits
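
One detail that is easy to miss: a class name such as [:digit:] only works inside a bracket expression, so the pattern is written [[:digit:]]. A small sketch on made-up sample lines (the variable names are illustrative, and -c is the standard grep option that counts matching lines):

```shell
# Hypothetical sample lines, used only for illustration.
lines='abc
123
a1b2
  indented
END.'

# A class name goes inside a bracket expression: [[:digit:]], not [:digit:].
has_digit=$(printf '%s\n' "$lines" | grep '[[:digit:]]')

# [^[:alnum:]] matches one character that is neither a letter nor a digit;
# -c counts the lines that contain such a character.
non_alnum_count=$(printf '%s\n' "$lines" | grep -c '[^[:alnum:]]')

# Two classes can share one bracket expression: uppercase letter OR punctuation.
upper_or_punct=$(printf '%s\n' "$lines" | grep -o '[[:upper:][:punct:]]' | tr -d '\n')

printf '%s\n' "$has_digit" "$non_alnum_count" "$upper_or_punct"
```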

Times matching	Used after a character to specify how many times it may occur
*	Any number of times, including zero
\+	At least once, with no upper limit
\?	Zero or one time (optional)
\{m\}	Exactly m times
\{m,n\}	At least m times, at most n times
\{m,\}	At least m times, with no upper limit
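
In a basic regular expression, every repetition operator except * is written with a backslash, and \+ and \? are GNU extensions. The sketch below assumes GNU grep and uses a made-up word list:

```shell
# Hypothetical word list, used only for illustration.
words='color
colour
colouur
ab
aab
aaab
aaaab'

# \? makes the preceding character optional: color, colour.
optional_u=$(printf '%s\n' "$words" | grep '^colou\?r$')

# \+ requires at least one occurrence: colour, colouur.
one_or_more=$(printf '%s\n' "$words" | grep '^colou\+r$')

# \{2,3\} allows two or three occurrences: aab, aaab.
two_or_three=$(printf '%s\n' "$words" | grep '^a\{2,3\}b$')

# * allows any number of occurrences: ab, aab, aaab and aaaab all match.
star_count=$(printf '%s\n' "$words" | grep -c '^a*b$')

printf '%s\n' "$optional_u" "$one_or_more" "$two_or_three" "$star_count"
```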

Position anchoring	Meaning
^	Anchor at the start of the line
$	Anchor at the end of the line
\< or \b	Anchor at the start of a word
\> or \b	Anchor at the end of a word
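
The difference between line anchors and word anchors is easiest to see side by side. A minimal sketch on made-up lines, assuming GNU grep (the word anchors \< and \> are GNU extensions):

```shell
# Hypothetical sample lines, used only for illustration.
text='root:x:0:0
rooter:x:1:1
chroot:x:2:2
my root dir'

# ^ anchors at the start of the line: "root..." and "rooter...".
line_start=$(printf '%s\n' "$text" | grep '^root')

# \< and \> anchor at word boundaries: "root" must be a whole word,
# so "rooter" and "chroot" are rejected.
whole_word=$(printf '%s\n' "$text" | grep '\<root\>')

# $ anchors at the end of the line.
line_end=$(printf '%s\n' "$text" | grep 'dir$')

printf '%s\n' "$line_start" "$whole_word" "$line_end"
```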

Grouping	Meaning
\( \)	Treat the enclosed characters as a single unit

Grouping also makes back references possible: \1, \2 and so on refer to the text matched by the first group, the second group, and so on. Groups are numbered by their opening parenthesis, counted from the left: \1 refers to the first opening parenthesis and everything up to its matching closing parenthesis.
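
A back reference matches the same text that the group matched, not merely the same pattern. The sketch below, on made-up sample lines, also shows how nested groups are numbered by their opening parenthesis:

```shell
# Hypothetical sample lines, used only for illustration.
pairs='abcabc
abcabd
hello hello
hello world
ab-ab-b
ab-b-ab'

# \1 must repeat exactly what the group matched: the same word twice.
same_word=$(printf '%s\n' "$pairs" | grep '^\(.*\) \1$')

# Three lowercase letters followed by the same three letters again.
repeated=$(printf '%s\n' "$pairs" | grep '^\([a-z]\{3\}\)\1$')

# Nested groups: \1 is the outer group "ab", \2 is the inner group "b".
nested=$(printf '%s\n' "$pairs" | grep '^\(a\(b\)\)-\1-\2$')

printf '%s\n' "$same_word" "$repeated" "$nested"
```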

3. Extended regular expression: ERE

Character matching	Meaning (same as in basic regular expressions)
.	Matches any single character
[ ]	Matches any single character within the specified range
[^ ]	Matches any single character outside the specified range
[:alnum:]	Letters and digits
[:alpha:]	Letters, upper or lower case
[:blank:]	Space and tab
[:cntrl:]	Control characters
[:digit:]	Digits
[:graph:]	Visible characters (printable, excluding space)
[:lower:]	Lowercase letters
[:print:]	Printable characters (including space)
[:punct:]	Punctuation characters
[:space:]	Whitespace characters (space, tab, newline and so on)
[:upper:]	Uppercase letters
[:xdigit:]	Hexadecimal digits

Times matching	Same as in basic regular expressions, but written without the backslash
*	Any number of times, including zero
+	At least once, with no upper limit
?	Zero or one time (optional)
{m}	Exactly m times
{m,n}	At least m times, at most n times
{m,}	At least m times, with no upper limit

Position anchoring	Meaning (same as in basic regular expressions)
^	Anchor at the start of the line
$	Anchor at the end of the line
\< or \b	Anchor at the start of a word
\> or \b	Anchor at the end of a word

Grouping	Meaning (no backslash required)
( )	Treat the enclosed characters as a single unit
|	Or; in a basic regular expression it must be escaped as \| and the group written as \( \)
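
To see that the two syntaxes describe the same pattern, the same search can be written both ways and the results compared. A minimal sketch, assuming GNU grep; the passwd-style lines are invented for illustration and are not read from a real system:

```shell
# Hypothetical passwd-style lines, used only for illustration.
users='root:x:0:0:root:/root:/bin/bash
centos:x:1000:1000::/home/centos:/bin/bash
user1:x:1001:1001::/home/user1:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# BRE: grouping and alternation need backslashes.
bre=$(printf '%s\n' "$users" | grep '^\(root\|centos\)\>' | cut -d: -f1)

# ERE (grep -E): the same pattern without the backslashes.
ere=$(printf '%s\n' "$users" | grep -E '^(root|centos)\>' | cut -d: -f1)

printf '%s\n' "$bre" "$ere"
```

Both commands select the same two user names, root and centos.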

3, Exercise:

Display the lines in the /proc/meminfo file that start with an uppercase or lowercase s, in more than one way;

[root@node1 ~]# grep -i '^s' /proc/meminfo
[root@node1 ~]# grep '^[Ss]' /proc/meminfo
[root@node1 ~]# grep -v '^[^Ss]' /proc/meminfo

Display the lines in the /etc/passwd file that do not end with /bin/bash;

[root@node1 ~]# grep -v '\(/bin/bash\)$' /etc/passwd

Display the user name of the user with the largest UID in the /etc/passwd file;

[root@node1 ~]# sort -t: -nk3 /etc/passwd | cut -d: -f1 | tail -1

If the user root exists, display its default shell;

[root@node1 ~]# id root &> /dev/null && grep '^root\>' /etc/passwd | cut -d: -f7 || echo 'no such user'

Find the two-digit or three-digit numbers in the /etc/passwd file;

[root@node1 ~]# grep '\<[0-9]\{2,3\}\>' /etc/passwd
[root@node1 ~]# grep '\<[[:digit:]]\{2,3\}\>' /etc/passwd
[root@node1 ~]# egrep '\<[0-9]{2,3}\>' /etc/passwd
[root@node1 ~]# egrep '\<[[:digit:]]{2,3}\>' /etc/passwd

Display the lines in the /etc/rc.d/rc.sysinit file that begin with at least one whitespace character followed by non-whitespace characters;

[root@node1 ~]# grep '^[[:space:]]\+[^[:space:]]\+' /etc/rc.d/rc.sysinit
[root@node1 ~]# egrep '^[[:space:]]+[^[:space:]]+' /etc/rc.d/rc.sysinit

In the output of the netstat -tan command, find the lines that end with LISTEN followed by zero or more whitespace characters;

[root@node1 ~]# netstat -tan | grep 'LISTEN[[:space:]]*$'
[root@node1 ~]# netstat -tan | egrep 'LISTEN[[:space:]]*$'

Add the users bash, testbash, basher and nologin (the nologin user's shell must be /sbin/nologin), and then find the users in the /etc/passwd file whose user name is the same as their shell name;

[root@node1 ~]# usermod -s /sbin/nologin nologin
[root@node1 ~]# grep '^\([[:alnum:]]\+\>\).*\1$' /etc/passwd
[root@node1 ~]# egrep '^([[:alnum:]]+\>).*\1$' /etc/passwd

Display the default shell and UID of the users root, centos and user1 on the current system;

[root@node1 ~]# grep '^\(root\|centos\|user1\)\>' /etc/passwd | cut -d: -f3,7
[root@node1 ~]# egrep '^(root|centos|user1)\>' /etc/passwd | cut -d: -f3,7

Find the lines in the /etc/rc.d/init.d/functions file where a word is followed by a pair of parentheses;

[root@node1 ~]# grep '[[:alpha:]]\+\>()' /etc/rc.d/init.d/functions
[root@node1 ~]# egrep '[[:alpha:]]+\>\(\)' /etc/rc.d/init.d/functions

Use echo to output a path, and use grep or egrep to get the base name;

[root@node1 ~]# echo '/etc/sysconfig/network' | grep -o '[[:alnum:]]\+/\?$'
[root@node1 ~]# echo '/etc/sysconfig/network/' | egrep -o '[[:alnum:]]+/?$'

Use echo to output a path, and use egrep to get the directory name

1.[root@node1 ~]# echo "/etc/rc.d/init.d/functions/" | grep -Eo ".*[^/]" | grep -Eo ".*/"
2.[root@node1 ~]# echo "/etc/rc.d/init.d/functions/" | grep -Eo ".*\<"
I could not work out this exercise (the twelfth) for a long time, so I looked up solutions online. Here is a brief explanation of each statement:
Idea 1: the hardest part of extracting the directory name is the trailing slash. Statement 1 handles it neatly with two grep commands. The first grep relies on greedy matching: .* matches as much of the path as possible, and [^/] requires the match to end on a character that is not a slash, so whatever path is given, the result never ends with a slash. The second grep again relies on greedy matching: .*/ matches everything up to and including the last slash, and whatever follows that slash is left out.
Idea 2: I have not fully understood this one. As far as I can tell, it combines "any characters" (.*) with the word-start anchor (\<), which would make the greedy match end right before the last word in the path.
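
The two stages of statement 1 can be inspected separately to see what each grep contributes. This is only a sketch; the variable names are illustrative, and the comparison with the standard dirname command is added as a sanity check:

```shell
# The path from the exercise above.
path='/etc/rc.d/init.d/functions/'

# Stage 1: greedy .* followed by a non-slash character drops the trailing slash.
stage1=$(echo "$path" | grep -Eo '.*[^/]')     # /etc/rc.d/init.d/functions

# Stage 2: greedy .* followed by a slash keeps everything up to the last slash.
stage2=$(echo "$stage1" | grep -Eo '.*/')      # /etc/rc.d/init.d/

# Sanity check: dirname gives the same directory, without the trailing slash.
expected=$(dirname "$path")                     # /etc/rc.d/init.d

printf '%s\n' "$stage1" "$stage2" "$expected"
```

The grep result keeps a trailing slash that dirname does not; stripping it with ${stage2%/} makes the two identical.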

Find the values between 1 and 255 in the output of the ifconfig command;

[root@node1 ~]# ifconfig | egrep '\<([1-9][0-9]?|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>'
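
A range pattern like this is easy to get subtly wrong, so it is worth checking against boundary values before running it on real output. The sketch below tests one possible pattern for 1 to 255 on a made-up list of numbers; the variable names are illustrative:

```shell
# One possible ERE for the integers 1 to 255, written as whole words.
range='\<([1-9][0-9]?|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>'

# Hypothetical boundary values, one per line.
nums='0
1
99
100
255
256
1000'

# Only 1, 99, 100 and 255 should pass; 0, 256 and 1000 are out of range.
in_range=$(printf '%s\n' "$nums" | grep -E "$range")

printf '%s\n' "$in_range"
```

The word anchors matter here: without \< and \>, the pattern would also match the 25 inside 256.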

Find the ip address in the ifconfig command result;

[root@node1 ~]# ifconfig | egrep -o '((25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])'
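
The same kind of check works for the address pattern. The sketch below runs it on an invented ifconfig-style line and on an invalid address. It suggests that, without word anchors, -o can return part of an invalid address; wrapping the pattern in \< and \> is one possible way to avoid that, and is an addition here rather than part of the original exercise.

```shell
# The address pattern from the exercise above.
ip_re='((25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])'

# Hypothetical ifconfig-style line, used only for illustration.
iface='inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255'
found=$(echo "$iface" | grep -Eo "$ip_re")

# Without word anchors, part of an invalid address still matches.
loose=$(echo '999.1.1.1' | grep -Eo "$ip_re")

# With \< and \> around the pattern, the invalid address is rejected.
# "|| true" keeps the script going when grep finds nothing.
strict=$(echo '999.1.1.1' | grep -Eo "\<$ip_re\>" || true)

printf '%s\n' "$found" "loose: $loose" "strict: $strict"
```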

Posted by rocket on Thu, 11 Nov 2021 11:31:39 -0800

Programmer Group