1. What are the Linux three swordsmen?
First, let's establish what the Linux three swordsmen are.
- The first tool is grep (short for "global regular expression print"). It searches each file, or standard input, for lines that match a pattern: it finds content by regular expression and prints the matching lines.
- The second tool is awk, named after the initials of its three authors (Aho, Weinberger, Kernighan). It splits each selected line into fields and processes those fields.
- The third tool is sed, short for stream editor. It filters text from its input and can add, delete, modify and query the filtered data.
Used together, these three tools cover many data-analysis tasks in the shell, which is why they are collectively called the Linux three swordsmen.
2. What can the Linux three swordsmen do?
Next, let's compare the three swordsmen with SQL to see what each one does.
- grep is roughly equivalent to SQL select * from table where ...: it searches for and locates whole rows of data.
- awk is roughly equivalent to SQL select columns from table: it slices data into columns.
- sed is roughly equivalent to SQL select columns from table where columns = XX, and to update table set columns = new where columns = old: it can query data conditionally or modify it.
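To make the analogy concrete, here is a rough mapping on a small sample file. This is a sketch: the three-line test.txt is the same sample used later in this article, the scratch directory from mktemp only keeps the demo away from real files, and the SQL in the comments is an analogy rather than an exact equivalence.

```shell
#!/bin/sh
# Scratch directory plus the same three-line sample file used later on.
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# ~ SELECT * FROM test WHERE line LIKE '%seven%'   (grep: pick whole rows)
grep 'seven' test.txt

# ~ SELECT col3 FROM test                          (awk: pick a column)
awk '{print $3}' test.txt

# ~ UPDATE test SET col3 = 'xxx' WHERE col3 = 'hogwarts'
#   (sed: rewrite matching data; the result goes to stdout, test.txt is unchanged)
sed 's/hogwarts/xxx/' test.txt
```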
As you can see, grep and awk can be combined to find and slice data, and grep and sed can be combined to find and modify data. All three can also be chained to carry out a series of operations, somewhat like map-reduce in big-data processing. Let's look at how to use them.
3. How to use the Linux three swordsmen
First, create a file containing three lines of data; the examples below all operate on this file.
```shell
# Method 1: run vim test.txt to create a new file in the current directory, then enter these three lines:
#   hello from hogwarts
#   hello from sevenriby
#   hello from testerhome

# Method 2: use the echo command; the -e option enables interpretation of backslash escapes such as \n
echo -e 'hello from hogwarts\nhello from sevenriby\nhello from testerhome' > test.txt

# ------------------------- Check -------------------------
rosaany@Rosefinch:~$ cat test.txt
hello from hogwarts
hello from sevenriby
hello from testerhome
```
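echo -e works as shown in bash, but it is not portable: if the command runs under a shell whose echo does not accept -e, such as dash (commonly /bin/sh on Debian and Ubuntu), the -e is written into the file literally. A more portable sketch uses printf, which interprets \n in its format string in any POSIX shell. The scratch directory is only an assumption to keep the demo self-contained.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# printf interprets \n in its format string in any POSIX shell.
# The trailing \n ends the last line, as echo would.
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# Verify: three lines, and the first line has no stray "-e" prefix.
wc -l < test.txt
head -n 1 test.txt
```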
3.1 grep
grep finds the content that matches a regular-expression pattern and prints the corresponding lines.
- Check whether the word hogwarts appears in the file
```shell
rosaany@Rosefinch:~$ grep hogwarts test.txt
hello from hogwarts
# Matches the word hogwarts and prints the whole line
```
- Find the lines that begin with he (here, every line starts with hello)
```shell
rosaany@Rosefinch:~$ grep '^he' test.txt
hello from hogwarts
hello from sevenriby
hello from testerhome
```
- Find the lines that contain the letter i or y
```shell
rosaany@Rosefinch:~$ grep '[iy]' test.txt
hello from sevenriby
```
- Add the -v option to invert the match and print only the lines that do not match
```shell
rosaany@Rosefinch:~$ grep -v '[iy]' test.txt
hello from hogwarts
hello from testerhome
```
- The -o option prints only the matched part of each line, one match per output line
```shell
rosaany@Rosefinch:~$ grep -o '[iy]' test.txt
i
y
```
- The -E option enables extended regular expressions
The regular expressions grep accepts as its pattern fall into two categories. The first is basic regular expressions, which include the typical metacharacters:
- ^ matches the beginning of a line;
- $ matches the end of a line;
- [] matches any one of the characters inside the brackets;
- * matches zero or more of the preceding item;
- . matches any single character.
The second is extended regular expressions, which build on the basic syntax to express more complex conditions:
- ? matches zero or one of the preceding item;
- + matches one or more of the preceding item;
- () groups items together;
- {} specifies a repetition count or range, such as {2} or {1,3};
- | matches either of the expressions on its two sides.
- With -E, find the lines containing seven or home
```shell
rosaany@Rosefinch:~$ grep -E "(seven|home)" test.txt
hello from sevenriby
hello from testerhome
```
- Without -E, the same search needs escaped metacharacters
```shell
rosaany@Rosefinch:~$ grep "\(seven\|home\)" test.txt
hello from sevenriby
hello from testerhome
# In basic syntax, escaping ( | ) with \ gives them their special meaning, so the result is the same (\| is a GNU grep extension)
```
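The extended-syntax list above can be checked directly against the sample file. A minimal sketch, assuming GNU grep (a POSIX grep with -E should behave the same for these patterns); note in particular that ? makes the preceding item optional, it does not request non-greedy matching.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# ?  : the preceding item is optional (zero or one occurrence), so
#      'seven(ri)?by' matches "sevenriby" and would also match "sevenby".
grep -E 'seven(ri)?by' test.txt

# +  : one or more occurrences; -o prints each match on its own line.
#      Every "hello" contains one run of l's, so this prints "ll" three times.
grep -oE 'l+' test.txt

# {n}: exactly n occurrences; -c prints the number of matching lines.
grep -cE 'l{2}' test.txt

# |  : alternation, combined with () for grouping.
grep -E 'from (hog|tester)' test.txt
```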
3.2 awk
awk is a pattern-scanning and text-processing language. It is very powerful and has full programming-language features such as variables, conditions, loops and functions; it can even run shell commands and, in GNU awk, open network connections.
Its basic syntax is awk 'pattern{action}', where pattern is the matching condition and action is the processing applied to each line that matches. If the action is omitted, matching lines are printed.
- A regular expression enclosed in a pair of slashes is a pattern; with no action, awk prints the matching lines, so the effect is the same as the earlier grep -E example
```shell
rosaany@Rosefinch:~$ awk '/(seven|home)/' test.txt
hello from sevenriby
hello from testerhome
```
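So awk patterns can replace grep to some extent, although grep is more concise for plain searches.
- Print the data from line 3 onward (the file has only three lines, so just line 3 is printed)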
```shell
rosaany@Rosefinch:~$ awk 'NR>=3' test.txt
hello from testerhome
# NR is the number of the current record (line)
```
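Besides comparisons on NR, a pattern can be an exact line number, a range between two patterns, or a regular expression applied to a single field. A sketch on the same sample file; the particular patterns are only illustrations.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# Exactly the third record.
awk 'NR==3' test.txt

# Range pattern: from the first record matching /hogwarts/
# through the next record matching /seven/ (lines 1 and 2, inclusive).
awk '/hogwarts/,/seven/' test.txt

# A regex tested against one field only, combined with a condition on NR.
awk '$3 ~ /^s/ && NR > 1 {print NR, $3}' test.txt
```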
Patterns have a very rich syntax that is worth practicing on your own. awk also has several standard built-in variables:
- FS is the input field separator
- OFS is the output field separator
- RS is the input record separator
- ORS is the output record separator
- NF is the number of fields in the current record
- NR is the number of records read so far
- Print the record number and field count of each line
```shell
rosaany@Rosefinch:~$ awk '{print NR,NF}' test.txt
1 3
2 3
3 3
# By default fields are separated by whitespace, so each of the three lines has 3 fields
```
- Do the same with the letter o as the field separator, set with the -F option
```shell
rosaany@Rosefinch:~$ awk -Fo '{print NR,NF,$1,$2,$3,$4}' test.txt
1 4 hell  fr m h gwarts
2 3 hell  fr m sevenriby
3 4 hell  fr m testerh me
# $1 to $n print the corresponding fields; $2 is " fr" (with a leading space), and $4 is empty on line 2
```
- You can also set the separator in a BEGIN block, which runs before any input is read
```shell
rosaany@Rosefinch:~$ awk 'BEGIN{FS="o"}{print NR,NF,$1,$2,$3,$4}' test.txt
1 4 hell  fr m h gwarts
2 3 hell  fr m sevenriby
3 4 hell  fr m testerh me
# The FS variable sets the input field separator
```
- Set the output field separator to | in a BEGIN block
```shell
rosaany@Rosefinch:~$ awk 'BEGIN{OFS="|"}{print NR,NF,$1,$2,$3,$4}' test.txt
1|3|hello|from|hogwarts|
2|3|hello|from|sevenriby|
3|3|hello|from|testerhome|
```
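- The assignment OFS="|" also works in the pattern position: it is evaluated for every line, and because its value is a non-empty string the pattern is true and the action runs. The BEGIN form is clearer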
```shell
rosaany@Rosefinch:~$ awk 'OFS="|"{print NR,NF,$1,$2,$3,$4}' test.txt
1|3|hello|from|hogwarts|
2|3|hello|from|sevenriby|
3|3|hello|from|testerhome|
```
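The examples in this section exercise FS, OFS, NF and NR. RS and ORS work the same way for records instead of fields. A minimal sketch; the comma-separated input "a b,c d" is a made-up example.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# ORS: end each output record with ";" instead of a newline.
awk 'BEGIN{ORS=";"}{print $3}' test.txt
echo    # finish the line, since the last record ended with ";" rather than "\n"

# RS: treat "," as the record separator on a one-line input
# (no trailing newline, so the last record is exactly "c d").
printf 'a b,c d' | awk 'BEGIN{RS=","}{print NR": "$0}'
```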
- awk also supports arithmetic; a program consisting only of a BEGIN block needs no input file
```shell
rosaany@Rosefinch:~$ awk 'BEGIN{print 10/3}'
3.33333
```
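Arithmetic becomes more useful together with an END block, which runs after the last record has been read. A sketch that totals the field counts of the sample file and averages them per line, plus printf for explicit number formatting instead of print's default six significant digits:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# Add up NF for every record; in END, NR still holds the number of records read.
awk '{ total += NF } END { print "fields:", total, "average:", total / NR }' test.txt

# printf controls the format explicitly: two decimal places here.
awk 'BEGIN { printf "%.2f\n", 10 / 3 }'
```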
Beyond these, awk supports associative arrays (dictionaries), similar to a Java HashMap or a Python dict, which are handy for counting features and aggregating data. awk's syntax is very flexible; it is worth printing out the manual and reading it carefully, as it will make you more effective at data analysis.
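As a sketch of the dictionary feature, the following counts how often each word occurs in the sample file. The array name count is arbitrary; since "for (w in count)" visits keys in an unspecified order, the result is piped through sort.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# count[word]++ creates the key on first use (starting from 0) and increments it.
# After the last record, END prints every key with its count.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w, count[w] }' test.txt | sort
```

With this input, hello and from are each counted 3 times and the other three words once.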
3.3 sed
Common sed usage is as follows:
- The general form is sed '[addr]X[options]', where [addr] selects the lines to act on (a line number, a range or a regular expression), X is the command to run, and [options] are options for that command, such as the g flag of s.
- -e specifies a script expression; repeat it to run several expressions.
- sed -n '2p' prints only the second line: -n turns off automatic printing, and p prints the addressed line.
- s finds and replaces.
- -i edits the source file in place.
- -E enables extended regular expressions.
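The options above can be tried in one short script. A sketch assuming GNU sed: in particular, sed -i with no suffix is GNU syntax (BSD/macOS sed expects sed -i '' instead), and the copy.txt file is a hypothetical scratch copy so the sample file stays intact.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# -n with p: print only line 2. Without -n, every line would be printed
# and line 2 would appear twice.
sed -n '2p' test.txt

# -e: several expressions in one run, applied in order to each line.
sed -e 's/hello/hi/' -e '2d' test.txt

# -E: extended syntax, so ( ) and | need no backslashes.
sed -E 's/(hog|tester)/X/' test.txt

# -i: edit a file in place (GNU sed syntax). Work on a copy so test.txt survives.
cp test.txt copy.txt
sed -i 's/from/to/' copy.txt
cat copy.txt
```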
- Use s to find the first string and replace it with the second
```shell
rosaany@Rosefinch:~$ sed 's#test#testing#' test.txt
hello from hogwarts
hello from sevenriby
hello from testingerhome
# Either '#' or '/' can serve as the delimiter; testerhome becomes testingerhome
```
- Replace every t followed by any two characters with xxx
```shell
rosaany@Rosefinch:~$ sed 's/t../xxx/g' test.txt
hello from hogwarts
hello from sevenriby
hello from xxxxxxhome
# The g flag replaces every match on the line, not just the first; testerhome becomes xxxxxxhome
```
To limit the replacement to specific lines or a range, put a line address before the command:
```shell
rosaany@Rosefinch:~$ sed '3,$ s/t../xxx/g' test.txt
hello from hogwarts
hello from sevenriby
hello from xxxxxxhome
# 3,$ is the range from line 3 to the last line ($)
```
- Delete a specified line
```shell
rosaany@Rosefinch:~$ sed '1 d' test.txt
hello from sevenriby
hello from testerhome
```
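Addresses are not limited to line numbers: a regular expression between slashes selects matching lines, and ! negates an address. A sketch on the sample file; the chosen words are only illustrations.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# Regex address: delete every line containing "seven".
sed '/seven/d' test.txt

# Line-number range: delete lines 1 through 2.
sed '1,2d' test.txt

# "!" negates the address: with -n, print only lines NOT containing "home".
sed -n '/home/!p' test.txt
```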
awk focuses on extracting data, while sed focuses on modifying it. sed's main job is adding, deleting, modifying and querying data with commands such as:
- d deletes lines;
- p prints lines;
- s finds and replaces;
- \1, \2 and so on refer back to the groups captured in the matched pattern, so the replacement can reuse parts of the match.
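Grouping with back-references can be sketched as follows: each line is split into two captured groups and reassembled in a different order. The replacement text " says " is only an illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# Basic syntax: groups are written \( \) and recalled as \1, \2 in the replacement.
sed 's/^\(hello\) from \(.*\)$/\2 says \1/' test.txt

# The same with -E (extended syntax): plain ( ) create the groups.
sed -E 's/^(hello) from (.*)$/\2 says \1/' test.txt
```

Both commands turn "hello from hogwarts" into "hogwarts says hello", and likewise for the other two lines.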
4. Combining them with pipes
In the shell, the pipe symbol | feeds the output of the previous command into the next command as its input.
- Use awk to select lines by their field count, then sed to replace text in them
```shell
rosaany@Rosefinch:~$ awk 'NF<2' test.txt | sed 's/t../xxx/g'
rosaany@Rosefinch:~$ awk 'NF>2' test.txt | sed 's/t../xxx/g'
hello from hogwarts
hello from sevenriby
hello from xxxxxxhome
# The first pipeline prints nothing, because every line has 3 fields
```
- Combining grep, awk and sed
```shell
rosaany@Rosefinch:~$ cat test.txt | grep hogwarts | awk '{print $3}' | sed 's/h../xxx/g'
xxxwarts
# cat outputs the file, grep keeps only the line containing hogwarts, awk prints its third field, and sed replaces h followed by any two characters with xxx
```
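For comparison, the same result can be produced by a single awk program, since an awk pattern can do grep's filtering and gsub() can do sed's global replacement. This is a sketch of an alternative, not a replacement for the pipeline.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'hello from hogwarts\nhello from sevenriby\nhello from testerhome\n' > test.txt

# The four-stage pipeline from the example ...
cat test.txt | grep hogwarts | awk '{print $3}' | sed 's/h../xxx/g'

# ... and one awk program doing the same work: /hogwarts/ filters like grep,
# gsub() replaces every match in field 3 like sed's s///g.
awk '/hogwarts/ { gsub(/h../, "xxx", $3); print $3 }' test.txt
```

The pipeline is easier to build up and debug one stage at a time; the single awk call starts fewer processes. For small files either choice is fine.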
Pipes take the three swordsmen to a new level. Chaining them makes many operations simple, lets us handle fairly complex data-processing jobs in a single command line, and improves our efficiency.
References
- Partially quoted from: 46 Lectures on Core Technologies of Test Development, "Three Swordsmen of Linux", Lagou Education
- Manual pages: man sed, man grep, man awk