The Three Swordsmen of Regular Expressions in Shell Programming: The awk Tool

Keywords: Linux shell Unix

Overview of awk

On Linux/UNIX systems, awk is a powerful text-processing tool. It reads input text line by line, searches it according to the specified pattern, and formats and outputs (or filters) the lines that match. It can carry out fairly complex text operations without any interaction, and it is widely used in shell scripts to automate configuration tasks.

1. Common usage of awk

awk is usually invoked in the format shown below, where single quotation marks plus braces "{}" enclose the actions to perform on the data. awk can take its program directly on the command line, or it can read the program from a script file specified with "-f".

awk options 'pattern or condition {actions}' file1 file2 // Filter and output the file content that matches the condition

awk -f script_file file1 file2 // Read the editing instructions from a script, then filter and output the content
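To make the two invocation forms concrete, here is a minimal sketch that runs the same program once inline and once through "-f". The file names people.txt and over28.awk and the sample data are hypothetical, created in a temporary directory just for the demonstration.

```shell
# Hypothetical sample data and script file, created only for this demo.
workdir=$(mktemp -d)
printf 'alice 30\nbob 25\ncarol 41\n' > "$workdir/people.txt"

# The awk program lives in its own file instead of on the command line.
cat > "$workdir/over28.awk" <<'EOF'
# Print the name (field 1) of everyone whose age (field 2) exceeds 28.
$2 > 28 { print $1 }
EOF

# The inline form and the -f form should produce the same result.
inline_out=$(awk '$2 > 28 { print $1 }' "$workdir/people.txt")
script_out=$(awk -f "$workdir/over28.awk" "$workdir/people.txt")

echo "$inline_out"
echo "$script_out"
rm -rf "$workdir"
```

Keeping the program in a file tends to pay off once it grows beyond one line, because the shell's quoting rules no longer apply to it.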

As mentioned earlier, the sed command is typically used to process entire lines, whereas awk tends to split a line into multiple "fields" and then process them; by default, the field separator is a space or a tab. The results of an awk run can be printed with the print function. awk commands can use the logical operators "&&" (and), "||" (or), and "!" (not). awk can also perform simple arithmetic with +, -, *, /, %, and ^, which stand for addition, subtraction, multiplication, division, remainder, and exponentiation, respectively.
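The operators can be tried without any input file by using a BEGIN block. The following is a small sketch; the numbers are arbitrary and the variable names math_out and logic_out are only for illustration.

```shell
# Arithmetic inside BEGIN needs no input file.
# The six results are: sum, difference, product, quotient, remainder, power.
math_out=$(awk 'BEGIN { print 7+2, 7-2, 7*2, 7/2, 7%2, 2^3 }')
echo "$math_out"

# Logical operators combine conditions: && (and), || (or), ! (not).
# The numbers 1 to 10 are fed in one per line, so $1 is the number itself.
logic_out=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 |
  awk '($1 > 3 && $1 < 6) || !($1 < 10) { print $1 }')
echo "$logic_out"
```

The second command keeps the numbers that are between 3 and 6 (exclusive) or that are not less than 10.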

On Linux, /etc/passwd is a typical formatted file in which the fields of each line are separated by ":". Most log files on Linux are formatted files as well, and extracting information from them is part of routine operations and maintenance work. To list columns of /etc/passwd such as the user name, user ID, and group ID, run the following awk command.

awk -F ':' '{print $1,$3,$4}' /etc/passwd

[root@localhost ~]# awk -F : '{print $1,$2,$3}' /etc/passwd
root x 0
bin x 1
daemon x 2
adm x 3

awk reads from an input file or from standard input and, like sed, reads it line by line. The difference is that awk treats each line of a text file as a record and each part (column) of a line as a field of that record. To manipulate these fields, awk borrows the shell's positional-parameter notation and refers to the fields of a record in order as $1, $2, $3, and so on. In addition, $0 stands for the entire line (record). Fields are separated by a specified character; the default separator is a space or a tab, and a different one can be given on the command line in the form "-F separator". In the example above, awk therefore processes the /etc/passwd file as shown in the figure.
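The record and field model can be checked on a single line. The sketch below uses a made-up passwd-style record (the user alice is hypothetical) so that it does not depend on the local /etc/passwd.

```shell
# Hypothetical passwd-style record, used instead of the real /etc/passwd.
line='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# With -F ':' each colon-separated column becomes $1, $2, and so on.
fields_out=$(printf '%s\n' "$line" | awk -F ':' '{ print $1, $3, $4 }')

# $0 is always the whole record, whatever the separator is.
whole_out=$(printf '%s\n' "$line" | awk -F ':' '{ print $0 }')

# Without -F, awk splits on whitespace, so $1 runs up to the first space.
default_out=$(printf '%s\n' "$line" | awk '{ print $1 }')

echo "$fields_out"
echo "$whole_out"
echo "$default_out"
```

Comparing the first and last commands shows why "-F" matters for colon-separated files: with the default separator, the first "field" swallows several columns.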

awk provides several special built-in variables that can be used directly:

FS: The field separator for each line of text; defaults to a space or a tab.
NF: The number of fields in the line being processed.
NR: The line number (ordinal number) of the line being processed.
$0: The entire content of the line being processed.
$n: The nth field (column n) of the line being processed.
FILENAME: The name of the file being processed.
RS: The record separator; defaults to \n, so each line is one record.
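A short sketch of the built-in variables in action follows. The file demo.txt and its contents are hypothetical and are created in a temporary directory.

```shell
workdir=$(mktemp -d)
printf 'a b c\nd e\n' > "$workdir/demo.txt"

# NR is the record number, NF the field count, and $NF the last field.
vars_out=$(awk '{ print NR, NF, $NF }' "$workdir/demo.txt")

# FILENAME holds the name of the file being read; print it once, on line 1.
name_out=$(cd "$workdir" && awk 'NR == 1 { print FILENAME }' demo.txt)

# Changing RS changes what counts as a record: here ";" ends a record,
# and FS="," splits each record into fields.
rs_out=$(printf 'x,1;y,2;z,3' |
  awk 'BEGIN { RS = ";"; FS = "," } { print NR ": " $1 "=" $2 }')

echo "$vars_out"
echo "$name_out"
echo "$rs_out"
rm -rf "$workdir"
```

Note that FS and RS are set in a BEGIN block so that they take effect before the first record is read.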

2. Examples of usage

1) Output text by line

awk '{print}' test.txt //Output all content, equivalent to cat test.txt
awk '{print $0}' test.txt //Output all content, equivalent to cat test.txt
awk 'NR==1,NR==3{print}' test.txt //Output lines 1-3
awk '(NR>=1)&&(NR<=3){print}' test.txt //Output lines 1-3
awk 'NR==1||NR==3{print}' test.txt //Output lines 1 and 3
awk '(NR%2)==1{print}' test.txt //Output the contents of all odd lines
awk '(NR%2)==0{print}' test.txt //Output the contents of all even lines
awk '/^root/{print}' /etc/passwd //Output lines starting with root
awk '/nologin$/{print}' /etc/passwd //Output lines ending with nologin
awk 'BEGIN {x=0} ; /\/bin\/bash$/{x++};END {print x}' /etc/passwd
//Count the lines ending in /bin/bash, equivalent to grep -c "/bin/bash$" /etc/passwd
awk 'BEGIN{RS=""};END{print NR}' /etc/squid/squid.conf
//Count the text paragraphs separated by blank lines
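The line-selection patterns above can be verified on a small file whose contents are known in advance. This sketch assumes a hypothetical six-line test.txt made of two paragraphs separated by one blank line; it prints line numbers rather than line contents so the selections are easy to read.

```shell
workdir=$(mktemp -d)
# Hypothetical sample: lines 1-2, a blank line 3, then lines 4-6.
printf 'one\ntwo\n\nthree\nfour\nfive\n' > "$workdir/test.txt"

# Range pattern: from the line where NR==1 through the line where NR==3.
range_out=$(awk 'NR==1,NR==3 { print NR }' "$workdir/test.txt")

# Odd-numbered lines only.
odd_out=$(awk '(NR % 2) == 1 { print NR }' "$workdir/test.txt")

# Count lines matching a regex; x + 0 prints 0 instead of an empty string
# when nothing matches.
count_out=$(awk '/e$/ { x++ } END { print x + 0 }' "$workdir/test.txt")

# RS="" switches to paragraph mode: blank lines separate records,
# so NR at the end is the number of paragraphs.
para_out=$(awk 'BEGIN { RS = "" } END { print NR }' "$workdir/test.txt")

echo "$range_out"; echo "$odd_out"; echo "$count_out"; echo "$para_out"
rm -rf "$workdir"
```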

2) Output text by field

awk '{print $3}' test.txt //Output the third field in each row (separated by spaces or tabs)
awk '{print $1,$3}' test.txt //Output the first and third fields in each row

[root@localhost ~]# awk '{print $3}' test.txt.bak 

best

cross

[root@localhost ~]# awk '{print $1,$3}' test.txt.bak 
the 
you best
PI=3.1415926535897

awk -F ":" '$2==""{print}' /etc/shadow //Output the shadow records of users whose password is empty
awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow //Output the shadow records of users whose password is empty

[root@localhost ~]# awk -F ":" '$2==""{print}' /etc/shadow
test::18179:0:99999:7:::
[root@localhost ~]# awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow
test::18179:0:99999:7:::

awk -F ":" '$7~"/bash"{print $1}' /etc/passwd
//Output the first field of colon-separated lines whose seventh field contains /bash
awk '($1~"nfs")&&(NF==8){print $1,$2}' /etc/services
//Output the first and second fields of lines that have eight fields and whose first field contains nfs
awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd
//Output all lines whose seventh field is neither /bin/bash nor /sbin/nologin

[root@localhost ~]# awk -F ":" '$7~"/bash"{print $1}' /etc/passwd
root
test1
lisi
tom
test
[root@localhost ~]# awk '($1~"nfs")&&(NF==8){print $1,$2}' /etc/services
nfs 2049/tcp
nfs 2049/udp
nfs 2049/sctp
[root@localhost ~]# awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
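Because the real /etc/passwd and /etc/shadow differ from machine to machine, the field tests can also be tried on a fixed sample. The sketch below uses a hypothetical four-line passwd.sample file; the users and values in it are invented for illustration.

```shell
workdir=$(mktemp -d)
# Hypothetical passwd-style sample; guest has an empty second field.
cat > "$workdir/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
guest::1002:1002::/home/guest:/bin/bash
EOF

# ~ is a regex match against one field.
bash_users=$(awk -F ':' '$7 ~ "/bash" { print $1 }' "$workdir/passwd.sample")

# != is an exact string comparison; && requires both conditions to hold.
other_shells=$(awk -F ':' \
  '($7 != "/bin/bash") && ($7 != "/sbin/nologin") { print $1 }' \
  "$workdir/passwd.sample")

# An empty field compares equal to the empty string.
empty_pw=$(awk -F ':' '$2 == "" { print $1 }' "$workdir/passwd.sample")

echo "$bash_users"; echo "$other_shells"; echo "$empty_pw"
rm -rf "$workdir"
```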

3) Call shell commands through pipes and double quotation marks

awk -F: '/bash$/{print | "wc -l"}' /etc/passwd
//Call the wc -l command to count the users whose shell is bash, equivalent to grep -c "bash$" /etc/passwd
awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}'
//Call the w command and count the online users (n-2 drops the two header lines of the w output)
awk 'BEGIN { "hostname" | getline ; print $0}'
//Call hostname and output the current hostname

[root@localhost ~]# awk -F: '/bash$/{print | "wc -l"}' /etc/passwd
5
[root@localhost ~]# awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}'
1
[root@localhost ~]# awk 'BEGIN { "hostname" | getline ; print $0}'
localhost.localdomain
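The output of w and hostname depends on the machine, so here is a sketch of the same two pipe directions with deterministic stand-in commands: echo replaces hostname and w, and sort replaces wc -l. The strings demo-host, pear, and apple are arbitrary.

```shell
# "command" | getline reads one line of the command's output into $0.
host_like=$(awk 'BEGIN { "echo demo-host" | getline; print $0 }')

# Looping until getline stops returning 1 counts the output lines.
# Reading into a variable (line) leaves $0 untouched; close() ends the pipe.
line_count=$(awk 'BEGIN {
  cmd = "echo a; echo b; echo c"
  while ((cmd | getline line) > 0) n++
  close(cmd)
  print n
}')

# print | "command" goes the other way: awk output feeds a shell command.
sorted_out=$(printf 'pear\napple\n' | awk '{ print $1 | "sort" }')

echo "$host_like"; echo "$line_count"; echo "$sorted_out"
```

Testing the getline result with "> 0" is a defensive habit: getline returns -1 on error, which a bare while condition would treat as true.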

Thank you for reading!!!

Posted by quecoder on Thu, 10 Oct 2019 02:14:45 -0700