Detailed explanation of the string processing commands sed and cut

cut
- Syntax: cut OPTION... [FILE]...
- Understanding

  cut can be thought of as a split function.

Processing

Each input line is treated as a row of a table and split into columns. The columns of a row work like an array indexed from 1 and are selected by subscript.

Input
- FILE: a file name.
- Standard input: used when no FILE is given or when FILE is -, so input can also come from a pipe or a redirection.
Subscript selection
- n
Selects the nth item. For fields this is roughly equivalent to the following (subscripts count from 1):
```
string.split(delim)[n - 1]
```
- n0,n1,n2...
The subscripts are sorted and de-duplicated, then output in ascending order, whatever order they were listed in.
- n-m
From n to m, inclusive.
- -n
From 1 to n, inclusive.
- n-
From n to the end of the line. An n past the end of the line is not an error; it simply selects nothing.
Options
- -b byte-list|--bytes=byte-list

  Select by byte position.
- -c character-list|--characters=character-list

  Select by character position.
- -f field-list|--fields=field-list

  Select fields: the pieces obtained by splitting each line on the delimiter (TAB by default).
- -d input_delim_byte|--delimiter=input_delim_byte

  Set the input delimiter, a single character. Usually used together with -f.
- --output-delimiter=output_delim_string

  The separator written between the selected items in the output. Defaults to the input delimiter set by -d.
```
[root@localhost sed]# echo "hello world" | cut -d ' ' -f1- --output-delimiter='----'
hello----world
```

- --complement

  Invert the selection: output everything except the selected bytes, characters, or fields.

```
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -f2- -d ' '
root
root
root
root
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -b2-
r
r
r
r
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -b4-
roo
roo
roo
roo
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -b5-
root
root
root
root
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -b8-
root
root
root
root
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -b5-15
root 0.0  0.4 112924  4296 ?        Ss   Jun11   0:00 /usr/sbin/sshd -D
root 0.0  0.6 163292  6120 ?        Ss   Jun11   0:01 sshd: root@pts/0
root 0.0  0.6 163292  6116 ?        Ss   06:19   0:00 sshd: root@pts/1
root 0.0  0.1 112816   960 pts/1    S+   07:55   0:00 grep --color=auto sshd
[root@localhost sed]# ps aux | grep "sshd" | cut --complement -b120-
root      1028  0.0  0.4 112924  4296 ?        Ss   Jun11   0:00 /usr/sbin/sshd -D
root      1434  0.0  0.6 163292  6120 ?        Ss   Jun11   0:01 sshd: root@pts/0
root      1974  0.0  0.6 163292  6116 ?        Ss   06:19   0:00 sshd: root@pts/1
root      2135  0.0  0.1 112816   956 pts/1    S+   07:56   0:00 grep --color=auto sshd
```
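The field rules above (1-based subscripts, sorted and de-duplicated lists, open ranges, --output-delimiter, --complement) can be summarized in a small Python model. This is a sketch, not cut's implementation. `parse_list` and `cut_fields` are hypothetical names, and lines that contain no delimiter are not special-cased here; real cut prints such lines whole unless -s is given.

```python
from __future__ import annotations


def parse_list(spec: str, width: int) -> list[int]:
    """Turn a cut LIST such as "3,1,2-4,6-" into sorted, de-duplicated 1-based indexes.

    width is the number of items in the current line; positions past it select nothing.
    """
    picked: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            start, _, end = part.partition("-")
            lo = int(start) if start else 1        # "-n" means 1..n
            hi = int(end) if end else width        # "n-" means n..end of line
        else:
            lo = hi = int(part)                    # a single subscript n
        picked.update(range(lo, min(hi, width) + 1))
    return sorted(picked)                          # listing order does not matter


def cut_fields(line: str, spec: str, delim: str = "\t",
               out_delim: str | None = None, complement: bool = False) -> str:
    """Rough model of cut -f SPEC -d DELIM [--output-delimiter=...] [--complement]."""
    fields = line.split(delim)
    chosen = set(parse_list(spec, len(fields)))
    if complement:                                 # keep the fields that were NOT selected
        chosen = set(range(1, len(fields) + 1)) - chosen
    sep = delim if out_delim is None else out_delim
    return sep.join(fields[i - 1] for i in sorted(chosen))


print(cut_fields("a:b:c:d", "3,1,1", delim=":"))                     # a:c
print(cut_fields("hello world", "1-", delim=" ", out_delim="----"))  # hello----world
print(cut_fields("a:b:c:d", "2-3", delim=":", complement=True))      # a:d
```

The second call mirrors the `--output-delimiter='----'` example earlier in this section.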

sed
- Brief introduction

  sed is a stream editor: it reads its input as a stream of text and processes it line by line.

  It is most commonly used for text substitution.
Working principle
- sed uses two buffers, the pattern space and the hold space. Both start out empty, and input is processed one line at a time.
- The pattern space holds the line being processed. The hold space is auxiliary storage that only some commands use (such as h, H, g, G and x).
- One cycle
  - Read one line of input, remove its trailing newline, and put it in the pattern space.
  - Run the commands in order. Each command can have an address that limits which lines it applies to.
  - The address is the main filter: a command runs only on lines that match it. For example, a command addressed to line 1 is skipped on every other line.
  - When the end of the script is reached, the pattern space is written to standard output unless -n was given, with the trailing newline that was removed earlier added back.
  - Then the next cycle starts again from the first step. The pattern space is cleared between cycles, while the hold space keeps its contents. Some commands (such as D) start the next cycle without clearing the pattern space entirely.
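The cycle can be modeled in a few lines of Python. This is a simplified sketch, not how sed is implemented. It assumes that commands are plain Python functions taking a state object, and that a function returns False to end the cycle early, as d does. The names `State`, `run`, `delete_2_to_5` and `hold_append` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    pattern: str = ""      # pattern space: the current line
    hold: str = ""         # hold space: kept from one cycle to the next
    lineno: int = 0
    out: list = field(default_factory=list)


def run(lines, commands, quiet=False):
    """Apply command functions to each line, one sed-like cycle per line."""
    st = State()
    for line in lines:
        st.lineno += 1
        st.pattern = line.rstrip("\n")            # read a line and drop its trailing newline
        for cmd in commands:                      # each command checks its own address
            if cmd(st) is False:                  # False means "end this cycle now" (like d)
                break
        else:
            if not quiet:                         # autoprint unless -n
                st.out.append(st.pattern + "\n")  # the newline is added back
    return "".join(st.out)


def delete_2_to_5(st):                            # like the command 2,5d
    if 2 <= st.lineno <= 5:
        return False
    return None


def hold_append(st):                              # like H: append "\n" + pattern space to hold
    st.hold += "\n" + st.pattern


numbers = [f"{i}\n" for i in range(1, 11)]
print(run(numbers, [delete_2_to_5]), end="")      # 1, then 6 to 10
```

Because `hold_append` only touches the hold space, running it alone prints every line unchanged.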

Most commands that handle several lines at once are uppercase (for example N, D and P), in contrast to their lowercase single-line counterparts (n, d, p).

`N`

Appends a newline and the next line of input to the pattern space, so two lines are processed together with the newline between them. If there is no next line, sed exits without running the remaining commands; GNU sed prints the pattern space first unless -n is given.

```
# H appends each line to the hold space and leaves the pattern space alone,
# so s/\n/----/ finds no newline and every line is printed unchanged.
[root@localhost sed]# sed -e 'H;s/\n/----/' number.txt
1
2
3
4
5
6
7
8
9
10
# A loop (:label ... b label) keeps calling N, so a single cycle consumes all the input.
[root@localhost sed]# sed -e ':label;N;s/\n/ /;s/ 2$/22222-----/;b label' number.txt
122222----- 3 4 5 6 7 8 9 10
# Without g, only the first newline of "1\n2\n3" is replaced.
[root@localhost sed]# sed -e 'N;N;s/\n/ /;' number.txt
1 2
3
4 5
6
7 8
9
10
[root@localhost sed]# sed -e 'N;N;s/\n/ /g;' number.txt
1 2 3
4 5 6
7 8 9
10
# With -n, the leftover last line is not printed when N runs out of input.
[root@localhost sed]# sed -n -e 'N;N;s/\n/ /gp;' number.txt
1 2 3
4 5 6
7 8 9
```

D

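If the pattern space contains no newline, D deletes it and starts a new cycle, just like d. Otherwise it deletes only up to the first newline and restarts the cycle with what remains, without reading a new line of input.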

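```
# After s/\n/ /g the pattern space has no newline, so D acts like d and nothing is printed.
# Only the last line, 10, survives: N finds no next line, so GNU sed prints it and exits before D runs.
[root@localhost sed]# sed -e 'N;N;s/\n/ /g;' number.txt
1 2 3
4 5 6
7 8 9
10
[root@localhost sed]# sed -e 'N;N;s/\n/ /g;D;' number.txt
10
```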
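The N and D examples can be reproduced with a small Python model of the fixed script `N;N;s/\n/ /g`, with an optional p flag and an optional trailing D. This is a sketch under stated assumptions, not sed itself. `n_n_join` is a hypothetical name, and the model follows the GNU rule that when N finds no next line, the pattern space is printed (unless -n) and sed exits. The case where D finds a newline is not modeled, because after `s/\n/ /g` there never is one.

```python
def n_n_join(lines, quiet=False, p_flag=False, final_d=False):
    """Model of sed [-n] 'N;N;s/\\n/ /g[p][;D]' following GNU rules (illustrative only)."""
    out = []
    i = 0
    while i < len(lines):
        pattern = lines[i]                    # start of a cycle
        i += 1
        for _ in range(2):                    # the two N commands
            if i >= len(lines):               # no next line: GNU prints unless -n, then exits
                if not quiet:
                    out.append(pattern)
                return out
            pattern += "\n" + lines[i]        # N appends a newline and the next line
            i += 1
        pattern = pattern.replace("\n", " ")  # s/\n/ /g (always succeeds here)
        if p_flag:                            # the p flag prints the substituted text
            out.append(pattern)
        if final_d and "\n" not in pattern:   # D without a newline acts like d
            continue                          # next cycle, no autoprint
        if not quiet:                         # autoprint
            out.append(pattern)
    return out


nums = [str(n) for n in range(1, 11)]
print(n_n_join(nums))                           # ['1 2 3', '4 5 6', '7 8 9', '10']
print(n_n_join(nums, quiet=True, p_flag=True))  # ['1 2 3', '4 5 6', '7 8 9']
print(n_n_join(nums, final_d=True))             # ['10']
```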

Input
- File input
```
sed SCRIPT INPUTFILE INPUTFILE... 
```
- Standard input, through a pipe or a redirection (- as the file name means standard input)
```
echo hello | sed SCRIPT -
```

Command options

sed OPTIONS... [SCRIPT] [INPUTFILE...]
- -n|--quiet|--silent

  Disable automatic printing: only commands that print explicitly, such as p, produce output.
```
[root@localhost sed]# sed -e "2,5d" number.txt
1
6
7
8
9
10
[root@localhost sed]# sed -n -e "2,5d" -e "1,5p" number.txt
1
[root@localhost sed]# sed -n -e "2,4d" -e "1,5p" number.txt
1
5
[root@localhost sed]# cat number.txt
1
2
3
4
5
6
7
8
9
10
```
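Why does `sed -n -e "2,4d" -e "1,5p"` print only 1 and 5? The answer is that d ends the cycle, so the later p never runs on lines 2 to 4. The following is a minimal Python sketch of that control flow for line-number ranges. `delete_then_print` is a hypothetical name.

```python
def delete_then_print(lines, d_range, p_range, quiet=True):
    """Model of: sed [-n] -e 'A,Bd' -e 'C,Dp' for line-number ranges."""
    out = []
    for lineno, line in enumerate(lines, start=1):
        if d_range[0] <= lineno <= d_range[1]:
            continue                      # d: end this cycle, nothing printed
        if p_range[0] <= lineno <= p_range[1]:
            out.append(line)              # p: explicit print
        if not quiet:
            out.append(line)              # autoprint, disabled by -n
    return out


nums = [str(n) for n in range(1, 11)]
print(delete_then_print(nums, (2, 5), (1, 5)))   # ['1']
print(delete_then_print(nums, (2, 4), (1, 5)))   # ['1', '5']
```

Without -n (`quiet=False`), line 1 would appear twice, once from p and once from autoprint.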

- -e script|--expression=script

  The argument that follows is parsed as a script.
- -f script-file|--file=script-file

  The script is read from the file.

If no script is given with -e or -f, the first non-option argument is used as the script.
-i[SUFFIX]|--in-place[=SUFFIX]
- Edit files in place: the result is written to a temporary file that then replaces the original. If SUFFIX is given, the original file is kept as a backup whose name is formed with SUFFIX. Combining -i with -n is not recommended, because then only the explicitly printed lines are written back to the file.
```
sed -i.bak -e 's/^\(.*\)$/\1/' number.txt
#Equivalent to
sed -e 's/^\(.*\)$/\1/' number.txt > temp.txt
mv number.txt number.txt.bak
mv temp.txt number.txt
```
- If there is no suffix, no backup is kept and the result simply replaces the source file.
- If SUFFIX contains `*`, each `*` is replaced with the current file name, so the suffix works as a template (for example, to add a prefix). With several `*`, the file name appears several times.
```
# Plain suffix: appended to the file name
sed -i.bak -e 's/^\(.*\)$/\1/' number.txt
# Add a prefix (quoted so the shell does not expand *)
sed -i'bak*' -e 's/^\(.*\)$/\1/' number.txt
# Multiple *
sed -i'**.bak' -e 's/^\(.*\)$/\1/' number.txt

#[root@localhost sed]#  sed -i**.bak -e "s/$*$/\1/" number.txt
#[root@localhost sed]# ls
#baknumber.txt  input.txt  number.txt  number.txt.bak  number.txtnumber.txt.bak
```
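The backup naming rule can be written as a small Python function. This sketch follows the rule described above: if SUFFIX contains `*`, each `*` becomes the file name; otherwise SUFFIX is appended. The function name `backup_name` is hypothetical, and the file name is assumed to be a bare name without directories.

```python
def backup_name(filename: str, suffix: str) -> str:
    if "*" in suffix:
        return suffix.replace("*", filename)   # every * becomes the file name
    return filename + suffix                   # a plain suffix is appended


print(backup_name("number.txt", ".bak"))    # number.txt.bak
print(backup_name("number.txt", "bak*"))    # baknumber.txt
print(backup_name("number.txt", "**.bak"))  # number.txtnumber.txt.bak
```

The three results match the file names in the `ls` listing above.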
When combining short options, put the others before -i (for example -Ei, not -iE), because anything written after -i is taken as the suffix. -i also implies -s.
-E|-r|--regexp-extended
- Use extended regular expressions (ERE), in which characters such as ( ) { } + ? | are special without a backslash.
- The default is basic regular expressions (BRE).
-s|--separate
- By default, multiple input files are treated as one continuous stream.
- With this option, each file is processed as a separate stream, so line numbers and $ refer to each file.
- -i implies this option.
Exit code
- 0

  Successful completion.
- 1

  Invalid command, invalid syntax, or invalid regular expression.
- 2

  One or more input files could not be opened (for example, the file was not found or read permission was denied); processing continued with the other files.
- 4

  An I/O error at run time; processing was aborted.

Custom exit code

```
echo | sed 'Q42' ; echo $?
# Q exits with the given code, so echo $? prints 42.
```

Script
- How to specify a script

  There are five ways: -e, -f, --expression, --file, or, when none of these is used, the first non-option argument.

Every script command has the fixed form [addr]X[options]

```
# X is s; what follows it is its options
s/aaa//
# 1,5 is addr: substitute only on lines 1 to 5
1,5s/aaa//
# /regex/ is addr: select lines with a regular expression
/regex/s/aaa//
```

addr can be a line number, a regular expression, or a range.

Multiple scripts

Several commands can be given in one string, or with several options.

```
# 1. Several commands in one string, separated by ;
sed '/^foo/d ; s/hello/world/' input.txt > output.txt
# 2. Several -e options
sed -e '/^foo/d' -e 's/hello/world/' input.txt > output.txt
# 3. One command per line in a script file
echo '/^foo/d' > script.sed
echo 's/hello/world/' >> script.sed
sed -f script.sed input.txt > output.txt
# 4. -e and -f mixed
echo 's/hello/world/' > script2.sed
sed -e '/^foo/d' -f script2.sed input.txt > output.txt
```

The a, c and i commands take the rest of the line as their text, so they cannot be followed by ; and another command. Put them last, in a script file, or end them with a newline.
Quote scripts with single quotes rather than double quotes, so that the shell does not expand $, backslashes, or backticks inside them.

Script commands (X)

Overview

Every command has the fixed form [addr]X[options]. The commonly used commands are summarized below.

Append after a line: a

```
# The two forms are
a\
text\
text
# and the one-line form
a text
```

Example

```
[root@localhost sed]# sed -e '2,5ahello' number.txt
1
2
hello
3
hello
4
hello
5
hello
6
7
8
9
10
[root@localhost sed]# sed -e '2,5a\
oknice\
ojbk' number.txt
1
2
oknice
ojbk
3
oknice
ojbk
4
oknice
ojbk
5
oknice
ojbk
6
7
8
9
10
```

Delete: d

Deletes the pattern space and starts the next cycle, so the selected lines are not printed.

Insert: i

Same syntax as a, but the text is inserted before the line.

Print: p

Prints the pattern space. Usually used together with -n.
Substitute: s/regexp/replacement/[flags]

Substitution is very common and is covered in detail later.

Print the line number: =

Prints the current line number on its own line, before the line itself is printed.

```
[root@localhost sed]# sed -e "=" number.txt
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
```

Character mapping: y

```
y/src/dst/
# Example
[root@localhost sed]# sed -e 'y/123456789/abcdefghi/' number.txt
a
b
c
d
e
f
g
h
i
a0
# Each character of src is translated to the character at the same position in dst; src and dst must have the same length.
```
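y works like a per-character translation table. In Python, `str.maketrans` and `str.translate` produce the same mapping. This is an analogy, not how sed is implemented. Like y, `str.maketrans` rejects source and destination strings of different lengths by raising `ValueError`.

```python
# src and dst must have the same length, just as with y/src/dst/
table = str.maketrans("123456789", "abcdefghi")
lines = [str(n) for n in range(1, 11)]
translated = [s.translate(table) for s in lines]
print(translated)   # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'a0']
```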

Print unambiguously: l

Prints the pattern space in an unambiguous form: special characters are escaped (a newline is shown as \n) and the end is marked with $. The pattern space itself is not modified.

```
[root@localhost sed]# sed -n -e 'N;N;p;l' number.txt
1
2
3
1\n2\n3$
4
5
6
4\n5\n6$
7
8
9
7\n8\n9$
```
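A rough Python model of the l output shown above follows. The function name `sed_l` is hypothetical, and the model handles only backslash, newline and tab. Real sed also escapes other non-printable characters (for example as octal) and wraps long output lines (at 70 characters by default in GNU sed).

```python
def sed_l(pattern: str) -> str:
    escaped = (pattern.replace("\\", "\\\\")   # a literal backslash is shown as \\
                      .replace("\n", "\\n")    # an embedded newline is shown as \n
                      .replace("\t", "\\t"))   # a tab is shown as \t
    return escaped + "$"                       # $ marks the end of the pattern space


print(sed_l("1\n2\n3"))   # 1\n2\n3$
```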

Command s/regexp/replacement/flags

This is the most commonly used command.
Match and replace

Matches regexp against the line and replaces the matched part with replacement.

regexp

regexp is a regular expression, either basic (the default) or extended (with -E).

Match groups

Parentheses form groups, and the text each group matches can be used separately.
In replacement, \1 to \9 refer to the first nine groups, and & refers to the entire matched string.

```
# Note the difference between ERE (-E) and BRE: in BRE the group parentheses must be escaped
[root@localhost sed]# sed -E -e 's/([0-9])/\1\1/' number.txt
11
22
33
44
55
66
77
88
99
110
[root@localhost sed]# sed -e 's/\([0-9]\)/\1\1/' number.txt
11
22
33
44
55
66
77
88
99
110
[root@localhost sed]# sed -E -e "s/(.{5})/-&-/g" alpha.txt
-hello--world-
-hello--world-
[root@localhost sed]# sed -E -e "s/[hw](.{4})/-&-/g" alpha.txt
-hello--world-
-hello--world-
[root@localhost sed]# sed -E -e "s/[hw](.{4})/-\1&-/g" alpha.txt
-ellohello--orldworld-
-ellohello--orldworld-
[root@localhost sed]# sed -E -e "s/[hw](.{4})/-\1|&-/g" alpha.txt
-ello|hello--orld|world-
-ello|hello--orld|world-
```
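Python's `re.sub` follows the same idea and can help when checking a sed command. The mapping is an analogy: Python writes sed's `&` as `\g<0>`, and a single s without g corresponds to `count=1`. The line `helloworld` is an assumption about the contents of alpha.txt, inferred from the outputs above.

```python
import re

line = "helloworld"   # assumed content of each line of alpha.txt

whole = re.sub(r"(.{5})", r"-\g<0>-", line)          # sed: s/(.{5})/-&-/g
groups = re.sub(r"[hw](.{4})", r"-\1|\g<0>-", line)  # sed: s/[hw](.{4})/-\1|&-/g
first = re.sub(r"([0-9])", r"\1\1", "10", count=1)   # sed: s/([0-9])/\1\1/ (no g)
print(whole, groups, first)   # -hello--world- -ello|hello--orld|world- 110
```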

Delimiter /

The delimiter can be changed: the character right after s is used as the delimiter. A delimiter character that appears inside the regexp or the replacement must be escaped with a backslash.

```
[root@localhost sed]# sed -e 's#1#9#' number.txt
9
2
3
4
5
6
7
8
9
90
[root@localhost sed]# sed -e 's;1;9;' number.txt
9
2
3
4
5
6
7
8
9
90
[root@localhost sed]# sed -e 'sa1a9a' number.txt
9
2
3
4
5
6
7
8
9
90
```

Case conversion in the replacement

- \L: convert everything after it to lowercase.
- \l: convert the next character to lowercase.
- \U: convert everything after it to uppercase.
- \u: convert the next character to uppercase.
- \E: stop the conversion started by \L or \U.

Example

```
[root@localhost sed]# sed -E -e 's/(.*)/\U\1\E/' alpha.txt
HELLOWORLD
HELLOWORLD
[root@localhost sed]# sed -E -e 's/(.*)/\u\1\E/' alpha.txt
Helloworld
Helloworld
```
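Python's `re` has no `\U` or `\u` escapes, but a replacement function can imitate the two patterns used above. This is a sketch with hypothetical function names, not a sed API. It covers only "uppercase the whole group" and "uppercase its first character".

```python
import re


def upper_all(line: str) -> str:       # like s/(.*)/\U\1\E/
    return re.sub(r"(.*)", lambda m: m.group(1).upper(), line, count=1)


def upper_first(line: str) -> str:     # like s/(.*)/\u\1\E/
    return re.sub(r"(.*)", lambda m: m.group(1)[:1].upper() + m.group(1)[1:], line, count=1)


print(upper_all("helloworld"))    # HELLOWORLD
print(upper_first("helloworld"))  # Helloworld
```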

flags

Several flags are supported.

g

By default only the first match is replaced; g replaces every match.

```
[root@localhost sed]# sed -E -e 's/(.)/\u\1\E/' alpha.txt
Helloworld
Helloworld
[root@localhost sed]# sed -E -e 's/(.)/\u\1\E/g' alpha.txt
HELLOWORLD
HELLOWORLD
```

number

Replace only the Nth match. Combined with g, replace the Nth match and every match after it.

```
[root@localhost sed]# sed -E -e 's/(.)/\u\1\E/g' alpha.txt
HELLOWORLD
HELLOWORLD
[root@localhost sed]# sed -E -e 's/(.)/\u\1\E/2g' alpha.txt
hELLOWORLD
hELLOWORLD
[root@localhost sed]# sed -E -e 's/(.)/\u\1\E/3g' alpha.txt
heLLOWORLD
heLLOWORLD
```
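The N and Ng flags can be modeled by counting matches. The sketch below uses a hypothetical helper, `sub_nth`. It replaces only the Nth match, or the Nth and every later one when g is also given, and applies a Python function (here `str.upper`) to each replaced match in place of sed's `\u` replacement.

```python
import re


def sub_nth(pattern, transform, text, n=1, g=False):
    count = 0

    def replace(m):
        nonlocal count
        count += 1                        # matches are visited left to right
        if count == n or (g and count > n):
            return transform(m.group(0))  # replace this match
        return m.group(0)                 # leave other matches unchanged

    return re.sub(pattern, replace, text)


print(sub_nth(r".", str.upper, "helloworld"))               # Helloworld  (no flag)
print(sub_nth(r".", str.upper, "helloworld", n=2, g=True))  # hELLOWORLD  (2g)
print(sub_nth(r".", str.upper, "helloworld", n=3))          # heLloworld  (3)
```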

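Print the result: p

If a substitution was made, print the new pattern space. Useful together with -n.
Ignore case: I or i

Match the regexp case-insensitively (a GNU extension).
Multi-line mode: M or m

In multi-line mode, ^ and $ also match right after and right before each embedded newline, so they work per line when the pattern space holds several lines. Without it, they match only at the start and end of the whole pattern space.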

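Supplementary commands

`n`

If automatic printing is on (no -n), print the pattern space; then replace it with the next line of input. If there is no next line, sed exits without running the remaining commands.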

```
[root@localhost sed]# sed -n -e 'n;n;p' number.txt
3
6
9
# Equivalent to
[root@localhost sed]# sed -n -e '0~3p' number.txt
3
6
9
```
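A Python sketch of `sed -n 'n;n;p'` shows why it prints every third line. Each n replaces the pattern space with the next line, and when there is no next line, sed exits before reaching p. `n_n_p` is a hypothetical name.

```python
def n_n_p(lines):
    out = []
    it = iter(lines)
    for pattern in it:                 # start of a cycle
        try:
            pattern = next(it)         # first n (nothing printed because of -n)
            pattern = next(it)         # second n
        except StopIteration:          # no next line: sed exits, p is not reached
            break
        out.append(pattern)            # p
    return out


nums = [str(n) for n in range(1, 11)]
every_third = [s for i, s in enumerate(nums, start=1) if i % 3 == 0]   # like 0~3p
print(n_n_p(nums), every_third)        # ['3', '6', '9'] ['3', '6', '9']
```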

Command groups
Use {} to group several commands into one, so that they share a single address.
```
[root@localhost sed]# sed -e '1,3{s/1/2/;3d}' number.txt
2
2
4
5
6
7
8
9
10
```
Here the grouped commands apply only to lines 1 to 3: line 1 becomes 2, and line 3 is deleted.
Change: c text

Same syntax as a and i; the selected lines are replaced with the text.
Loops
A script runs like sequential code. Create a label with :label.

Then jump to it with b label or t label.
```
# Repeat the commands: jump back with b (always) or t (only after a successful s)
:label
commands
b label
```
- b label
Jump to the label unconditionally.
- t label
Conditional jump: jump to the label only if an s command has succeeded since the last input line was read or since the last t jump; otherwise continue with the next command.
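```
while (readline(data))          // each cycle reads a new line
{
label:                          // :label
    /*
     commands
    */
    if (jump)                   // b label always jumps; t label only after a successful s
        goto label;
    print(data);                // autoprint at the end of the script (unless -n)
}
```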

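How N detects the end of input

N appends the next line to the data already in the pattern space.

```
while (readline(data))                 // new cycle: the pattern space holds one line
{
    // N;
    if (!readline(next))               // no next line
    {
        print(data);                   // GNU sed prints the pattern space (unless -n)
        exit(0);                       // and exits without running the rest of the script
    }
    data = data + "\n" + next;
    /*
     commands
    */
    replace_first(data, "\n", "L");    // s/\n/L/;
    print(data);                       // autoprint at the end of the cycle
}
```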

Joining multiple lines into one line

```
[root@localhost sed]#  sed -e ":lable;N;s/\n/L/;b lable" number.txt
1L2L3L4L5L6L7L8L9L10
[root@localhost sed]# cat number.txt
1
2
3
4
5
6
7
8
9
10
```

Equivalent to

```
while (readline(data))
{
label:                                 // :label
    // N;
    if (!readline(next))
    {
        print(data);                   // input exhausted: GNU sed prints (unless -n)
        exit(0);
    }
    data = data + "\n" + next;
    replace_first(data, "\n", "L");    // s/\n/L/;
    goto label;                        // b label
}
```
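The loop `:label;N;s/\n/L/;b label` can also be run as real Python instead of pseudocode. This is a minimal model that assumes GNU behavior: when N runs out of input, the pattern space is printed and sed exits. `join_lines` is a hypothetical name.

```python
def join_lines(lines, sep="L"):
    it = iter(lines)
    pattern = next(it, None)
    if pattern is None:
        return []                                  # empty input: nothing to print
    while True:                                    # :label
        nxt = next(it, None)                       # N
        if nxt is None:                            # no next line: GNU sed prints and exits
            return [pattern]
        pattern += "\n" + nxt
        pattern = pattern.replace("\n", sep, 1)    # s/\n/L/
        # b label: jump back unconditionally


print(join_lines([str(n) for n in range(1, 11)]))   # ['1L2L3L4L5L6L7L8L9L10']
```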

Selecting lines
- Single line

  Only the specified line is processed: 1 is the first line and $ is the last. With multiple input files, line numbers and $ refer to the whole input stream; with -i or -s they refer to each file.
- Multiple lines

  A range, such as 1,3.
- Regular expression match

  Lines that match a regular expression; these are covered separately below.
- Step: first~step
  - first is the line to start from.
  - step is the period.
- Negation

  ! after an address inverts it, selecting the lines that do not match.

Example

```
[root@localhost sed]# sed -n -e '1p' number.txt
1
[root@localhost sed]# sed -n -e '1,3p' number.txt
1
2
3
[root@localhost sed]# sed -n -e '/1/p' number.txt
1
10
[root@localhost sed]# sed -n -e '/1/!p' number.txt
2
3
4
5
6
7
8
9
[root@localhost sed]# sed -n -e '3~2p' number.txt
3
5
7
9
```
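The address forms above can be expressed as predicates on the line number and the line text. This is a sketch, not sed's parser. The helper names are hypothetical, only line-number ranges are modeled, and the step is assumed to be greater than 0.

```python
import re


def line_is(n):
    return lambda no, text: no == n


def line_range(a, b):
    return lambda no, text: a <= no <= b


def matches(regex):
    return lambda no, text: re.search(regex, text) is not None


def step(first, stp):
    # first~step; a first of 0 selects step, 2*step, ... (0~3 gives 3, 6, 9)
    return lambda no, text: no >= first and (no - first) % stp == 0


def negate(addr):
    return lambda no, text: not addr(no, text)   # addr!


def select(lines, addr):
    """Like sed -n 'ADDRp': keep the lines whose address matches."""
    return [text for no, text in enumerate(lines, start=1) if addr(no, text)]


nums = [str(n) for n in range(1, 11)]
print(select(nums, line_range(1, 3)))   # ['1', '2', '3']
print(select(nums, matches("1")))       # ['1', '10']
print(select(nums, step(3, 2)))         # ['3', '5', '7', '9']
```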

Selecting lines with a regular expression
- The typical difference between basic and extended regular expressions is that extended ones need no backslash before special characters such as ( ) { } + ? |.
- /regexp/

  Selects every line that matches regexp. A / inside the regexp must be escaped as \/.

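Ranges

If addr2 is a line number less than or equal to the line that matched addr1, only that one line is selected.

```
# addr1,addr2: each side is a line number or /regexp/.
# If addr2 is a regexp, checking for the end starts on the line after addr1.
# 0,/regexp/ (GNU) lets the regexp end the range on line 1 itself.
addr1,addr2
# addr1 and the N lines that follow it
addr1,+N
# addr1 and the following lines, up to the next line whose number is a multiple of N
addr1,~N
```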
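The numeric range forms can be modeled as plain bounds on the line number. This is a simplified sketch with hypothetical helper names. It covers only a line-number addr1 (ranges that start on a regexp can match again later, which is not modeled), assumes N is greater than 0, and assumes that addr1,~N ends at addr1 rounded up to a multiple of N.

```python
def range_plus(lines, start, n):
    """Like sed -n 'start,+Np': start and the n lines after it."""
    return [t for no, t in enumerate(lines, start=1) if start <= no <= start + n]


def range_tilde(lines, start, n):
    """Like sed -n 'start,~Np': up to the next line number that is a multiple of n."""
    end = -(-start // n) * n          # round start up to a multiple of n
    return [t for no, t in enumerate(lines, start=1) if start <= no <= end]


def range_lines(lines, start, end):
    """Like sed -n 'start,endp'; if end <= start, only the start line is selected."""
    end = max(end, start)
    return [t for no, t in enumerate(lines, start=1) if start <= no <= end]


nums = [str(n) for n in range(1, 11)]
print(range_plus(nums, 2, 3))    # ['2', '3', '4', '5']
print(range_tilde(nums, 5, 4))   # ['5', '6', '7', '8']
print(range_lines(nums, 5, 2))   # ['5']
```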

Custom delimiter

```
# All three are equivalent: \cREGEXPc uses the character c as the delimiter
sed -n '/^\/home\/alice\/documents\//p'
sed -n '\%^/home/alice/documents/%p'
sed -n '\;^/home/alice/documents/;p'
```

Case-insensitive match: /regexp/I and \%regexp%I

Only uppercase I works here, because a lowercase i after an address is parsed as the i (insert) command.

```
# I: delete lines matching b, ignoring case
[root@localhost sed]# printf "%s\n" a b c | sed '/b/Id'
a
c
# Lowercase i is the insert command: the text "d" is inserted before the matching line
[root@localhost sed]# printf "%s\n" a b c | sed '/b/id'
a
d
b
c
```

Posted by eva21 on Sat, 13 Jun 2020 18:54:50 -0700

Programmer Group
