Detailed explanation of the string-processing commands sed and cut

Keywords: Programming

  1. cut

    • Syntax: cut OPTION... [FILE]...

    • Concept

      cut can be thought of as a split function: it splits each line into pieces and prints the pieces you select.

  2. How it works

    Each line is treated as a row of a table and split into columns (bytes, characters, or delimiter-separated fields). The columns of a row form an array indexed from 1, and you select elements by index.

  3. Input

    • File

      One or more file names given as arguments.

  4. Standard input

    Pipe or redirect data into cut, or give - as the file name.

  5. Selecting by index

    • n

    Equivalent to

    string.split(delim)[n]

    (indexes start at 1, and delim is a single character rather than a regex)
    
    • n0,n1,n2...

    The indexes are sorted and de-duplicated before output, so the selected pieces always appear in input order, each at most once.

    • n-m

    Output the range from n to m, both ends included.

    • -n

    From 1 to n

    • n-

    From n to the end of the line. An index past the end is not an error; it simply selects nothing.
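The index forms above can be tried on a single line. This is a minimal sketch assuming GNU coreutils cut; the sample string a:b:c:d and the variable names are only illustrative.

```shell
# Sample line with four ':'-separated fields (illustrative data)
line='a:b:c:d'

single=$(echo "$line" | cut -d: -f3)      # c
list=$(echo "$line" | cut -d: -f3,1,1)    # a:c  (sorted, duplicate index dropped)
range=$(echo "$line" | cut -d: -f2-3)     # b:c
head_part=$(echo "$line" | cut -d: -f-2)  # a:b
tail_part=$(echo "$line" | cut -d: -f3-)  # c:d
past_end=$(echo "$line" | cut -d: -f9)    # empty: index past the last field

printf '%s\n' "$single" "$list" "$range" "$head_part" "$tail_part" "[$past_end]"
```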

  6. Options

    • -b byte-list | --bytes=byte-list

      Select by byte position.

  7. -c character-list | --characters=character-list

    Select by character position.
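A small sketch of -b and -c, assuming GNU coreutils cut and plain ASCII input, where bytes and characters coincide, so both options give the same result; the word hello is sample data.

```shell
# Plain ASCII sample, so byte positions and character positions are identical
word='hello'
bytes=$(echo "$word" | cut -b 2-4)   # ell
chars=$(echo "$word" | cut -c 1,5)   # ho
echo "$bytes $chars"
```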

  8. -f field-list | --fields=field-list

    Split each line into fields on a delimiter (TAB by default, or the character given with -d) and select fields by position.

  9. -d input_delim_byte | --delimiter=input_delim_byte

    Used together with -f to set the field delimiter, which must be a single character.
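Because the delimiter is a single character, every occurrence starts a new field, so runs of spaces (such as those in ps aux output) produce empty fields. A sketch, assuming GNU cut and made-up sample text:

```shell
# Two spaces between the words: the empty string between them is field 2
text='alpha  beta'
f2=$(echo "$text" | cut -d ' ' -f2)   # empty field
f3=$(echo "$text" | cut -d ' ' -f3)   # beta
printf '[%s] [%s]\n' "$f2" "$f3"      # prints: [] [beta]
```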

  10. --output-delimiter=output_delim_string

    The string written between selected fields on output. By default, the input delimiter given with -d is reused.

    [root@localhost sed]# echo "hello world" | cut -d ' ' -f1- --output-delimiter='----'
    hello----world
    
  11. --complement

    Invert the selection: print everything except the selected bytes, characters, or fields.

    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -f2- -d ' '
    root
    root
    root
    root
    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -b2-
    r
    r
    r
    r
    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -b4-
    roo
    roo
    roo
    roo
    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -b5-
    root
    root
    root
    root
    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -b8-
    root
    root
    root
    root
    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -b5-15
    root 0.0  0.4 112924  4296 ?        Ss   Jun11   0:00 /usr/sbin/sshd -D
    root 0.0  0.6 163292  6120 ?        Ss   Jun11   0:01 sshd: root@pts/0
    root 0.0  0.6 163292  6116 ?        Ss   06:19   0:00 sshd: root@pts/1
    root 0.0  0.1 112816   960 pts/1    S+   07:55   0:00 grep --color=auto sshd
    [root@localhost sed]# ps aux | grep "sshd" | cut --complement -b120-
    root      1028  0.0  0.4 112924  4296 ?        Ss   Jun11   0:00 /usr/sbin/sshd -D
    root      1434  0.0  0.6 163292  6120 ?        Ss   Jun11   0:01 sshd: root@pts/0
    root      1974  0.0  0.6 163292  6116 ?        Ss   06:19   0:00 sshd: root@pts/1
    root      2135  0.0  0.1 112816   956 pts/1    S+   07:56   0:00 grep --color=auto sshd
    
  12. sed

    • Brief introduction

      sed is a stream editor: it reads its input as a stream of text and transforms it line by line.

      It is most often used for string substitution.

  13. How it works

    • sed uses two buffers, both empty at the start: the pattern space (the data area holding the current line) and the hold space (an auxiliary buffer). Input is processed one line at a time.

    • The hold space is rarely needed; only a few commands, such as h, H, g, G, and x, use it.

    • One cycle

      • Read one line of input, remove its trailing newline, and put it in the pattern space.
      • Execute the script's commands in order. Each command may have an address that limits which lines it applies to.
      • The address acts as a filter: a command runs only on lines that match it. For example, if a command is restricted to line 1, it is skipped for every other line.
      • When the end of the script is reached, unless -n was given, the pattern space is printed to standard output, followed by the newline that was removed on input.
      • The next cycle then starts with the next input line. The pattern space is cleared between cycles (unless a special command such as D is used), while the hold space keeps its contents.
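A sketch of the two buffers at work, assuming GNU sed: H appends each line to the hold space, which survives from cycle to cycle, and only on the last line ($) does x swap it into the pattern space for printing. The comma-joining task is an illustration, not from the original.

```shell
# H: append "\n" + pattern space to the hold space (kept between cycles)
# $: on the last line only, x swaps hold and pattern space,
#    s/\n/,/g turns the newlines into commas, s/^,// drops the leading comma
joined=$(printf '%s\n' 1 2 3 | sed -n 'H;${x;s/\n/,/g;s/^,//;p}')
echo "$joined"   # 1,2,3
```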
  14. Most commands that handle several lines at once are uppercase (for example N, D, and P), in contrast to their single-line lowercase counterparts (n, d, and p).

    • N

      Append a newline and the next input line to the pattern space, so two lines are processed together. If there is no next input line, GNU sed prints the pattern space (unless -n is given) and exits without running the remaining commands.

      # An endless loop (:label ... b label) lets a single cycle consume the whole input; read this together with the cycle description above.
      [root@localhost sed]# sed -e 'H;s/\n/----/' number.txt
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      [root@localhost sed]# sed -e ':label;N;s/\n/ /;s/ 2$/22222-----/;b label' number.txt
      122222----- 3 4 5 6 7 8 9 10
      [root@localhost sed]# sed -e 'N;N;s/\n/ /;' number.txt
      1 2
      3
      4 5
      6
      7 8
      9
      10
      [root@localhost sed]# sed -e 'N;N;s/\n/ /g;' number.txt
      1 2 3
      4 5 6
      7 8 9
      10
      [root@localhost sed]# sed -n -e 'N;N;s/\n/ /gp;' number.txt
      1 2 3
      4 5 6
      7 8 9
      
  15. D

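    If the pattern space contains no newline, D acts like d: it deletes the pattern space and starts a new cycle. Otherwise it deletes only up to the first newline and restarts the cycle with the remaining text, without reading a new input line.

    # Without D every joined group is printed. With D, each group (which has no newline left after s/\n/ /g) is discarded; only 10 remains, because on line 10 N finds no next line, so sed prints the pattern space and exits.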
    [root@localhost sed]# sed -e 'N;N;s/\n/ /g;' number.txt
    1 2 3
    4 5 6
    7 8 9
    10
    [root@localhost sed]# sed -e 'N;N;s/\n/ /g;D;' number.txt
    10
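When the pattern space does contain a newline, D keeps the rest and restarts the cycle on it. A classic use is dropping consecutive duplicate lines, like uniq. This sketch assumes GNU sed; the function name dedup_adjacent is hypothetical.

```shell
# Remove consecutive duplicate lines, similar to uniq.
# $!N               : unless on the last line, append the next line
# /^\(.*\)\n\1$/!P  : if the two lines differ, print the first one
# D                 : delete the first line, restart the cycle with the second
dedup_adjacent() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf '%s\n' a a b b b c | dedup_adjacent   # prints a, b and c, one per line
```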
    
  16. Input

    • From files
      sed SCRIPT INPUTFILE...
      
  17. From standard input (through a pipe or redirect; - names standard input explicitly)
    echo hello | sed SCRIPT -
    
  18. Command options

    • sed OPTIONS... [SCRIPT] [INPUTFILE...]
    • -n|--quiet|--silent
      • Suppress the automatic printing of the pattern space; only commands such as p produce output.
      [root@localhost sed]# sed -e "2,5d" number.txt
      1
      6
      7
      8
      9
      10
      [root@localhost sed]# sed -n -e "2,5d" -e "1,5p" number.txt
      1
      [root@localhost sed]# sed -n -e "2,4d" -e "1,5p" number.txt
      1
      5
      [root@localhost sed]# cat number.txt
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      
  19. -e script|--expression=script

    Add the following string to the commands to be executed.

  20. -f script-file|--file=script-file

    Read the commands from the given file.

  21. If no -e or -f option is given, the first non-option argument is used as the script.
  22. -i[SUFFIX]|--in-place[=SUFFIX]
    • Edit files in place: the result replaces the original file, and if SUFFIX is given, the original is kept as a backup named with that suffix. Be careful when combining it with -n: only explicitly printed lines are written back, so all other lines are lost.
     sed -i.bak -e "s/\(*\)/\1/" number.txt
     #Equivalent to
     sed -e "s/\(*\)/\1/" number.txt > temp.txt
     mv number.txt number.txt.bak
     mv temp.txt number.txt
    
    • Without a suffix, no backup is kept: the result simply replaces the original file.
    • If the suffix contains *, each * is replaced by the file name, so the suffix can also act as a prefix; without *, the suffix is appended to the file name.
    # Plain suffix: backup is number.txt.bak
    sed -i.bak -e "s/\(*\)/\1/" number.txt
    # Prefix: backup is baknumber.txt (quote * so the shell does not glob it)
    sed -i'bak*' -e "s/\(*\)/\1/" number.txt
    # Two *: backup is number.txtnumber.txt.bak
    sed -i'**.bak' -e "s/\(*\)/\1/" number.txt
    
    #[root@localhost sed]#  sed -i**.bak -e "s/\(*\)/\1/" number.txt
    #[root@localhost sed]# ls
    #baknumber.txt  input.txt  number.txt  number.txt.bak  number.txtnumber.txt.bak
    
    When combining short options, put the others before i; anything after i is read as the suffix (for example, -in enables in-place editing with the suffix n instead of setting -n). -i also implies -s.
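A sketch of the backup naming rules, assuming GNU sed (BSD and macOS sed parse -i differently). It works on a throwaway number.txt in a directory created with mktemp -d, so no real file is touched.

```shell
# Throwaway directory and file, so nothing real is modified
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/number.txt"

(
    cd "$tmp" || exit 1
    # Suffix without *: the backup is number.txt.bak
    sed -i.bak 's/hello/world/' number.txt
    # Suffix with *: * becomes the file name, so the backup is old_number.txt
    # (quoted so the shell does not treat * as a glob)
    sed -i'old_*' 's/world/again/' number.txt
)

current=$(cat "$tmp/number.txt")             # again
first_backup=$(cat "$tmp/number.txt.bak")    # hello
second_backup=$(cat "$tmp/old_number.txt")   # world
rm -r "$tmp"
echo "$current $first_backup $second_backup"
```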
  23. -E|-r|--regexp-extended
    • Use extended regular expressions (ERE) instead of basic regular expressions (BRE).
    • BRE is the default; in ERE, characters such as ( ) { } + ? | are special without a backslash.

  24. -s|--separate

    • By default, multiple input files are treated as one continuous stream.

    • With this option, each file is processed separately: line numbers restart and $ refers to the last line of each file.

    • -i implies -s.
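A sketch of the difference, assuming GNU sed; the two files are created in a temporary directory and their contents are made up:

```shell
# Two small input files in a throwaway directory
tmp=$(mktemp -d)
printf '1\n2\n' > "$tmp/a.txt"
printf '3\n4\n' > "$tmp/b.txt"

# Default: one continuous stream, so $ is only the very last line
stream_last=$(sed -n '$p' "$tmp/a.txt" "$tmp/b.txt")       # 4
# -s: separate files, so $ matches the last line of each file
separate_last=$(sed -s -n '$p' "$tmp/a.txt" "$tmp/b.txt")  # 2, then 4
rm -r "$tmp"
printf '%s\n---\n%s\n' "$stream_last" "$separate_last"
```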
  25. Exit code

    • 0

      Indicates successful processing.

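  26. Exit code 1

    Invalid command, invalid syntax, or invalid regular expression.

  27. Exit code 2

    One or more input files could not be opened (for example, not found or permission denied); sed continues with the remaining files.

  28. Exit code 4

    An I/O error or a serious processing problem at run time; sed aborts immediately.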

  29. Custom exit code

    echo | sed 'Q42' ; echo $?
    # Q (or q) can take an exit code: sed quits with status 42, which echo $? prints.
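The exit codes can be checked from the shell. A sketch assuming GNU sed; the path /nonexistent/input.txt is a made-up file that should not exist, and || keeps each expected failure from stopping a script that runs with set -e.

```shell
# Exit code 1: "k" is not a sed command, so the script is rejected
bad_script=0
sed 'k' </dev/null 2>/dev/null || bad_script=$?

# Exit code 2: the input file cannot be opened (made-up path)
missing_file=0
sed -n p /nonexistent/input.txt 2>/dev/null || missing_file=$?

# Custom exit code: Q42 quits without printing and returns status 42
custom=0
echo | sed 'Q42' || custom=$?

echo "$bad_script $missing_file $custom"   # 1 2 42
```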
    
  30. Scripts

    • Ways to supply a script

      Use -e or --expression, -f or --file, or, when none of these is given, the first non-option argument.

  31. General form of a command: [addr]X[options]

    # X is the command letter (here s), followed by its options
    s/aaa//
    # The address 1,5 limits the substitution to lines 1 through 5
    1,5s/aaa//
    # The address /regex/ selects the lines that match the regular expression
    /regex/s/aaa//
    
    addr can be a line number, a regular expression, or a range.
  32. Multiple scripts

    Several commands can be given in one string, through several -e or -f options, or through a mix of both.

    # 1. Several commands in one string, separated by ;
    sed '/^foo/d ; s/hello/world/' input.txt > output.txt
    # 2. One command per -e option
    sed -e '/^foo/d' -e 's/hello/world/' input.txt > output.txt
    # 3. One command per line in a script file
    echo '/^foo/d' > script.sed
    echo 's/hello/world/' >> script.sed
    sed -f script.sed input.txt > output.txt
    # 4. -e and -f mixed
    echo 's/hello/world/' > script2.sed
    sed -e '/^foo/d' -f script2.sed input.txt > output.txt
    
  33. The a, c, and i commands take the rest of the line as their text, so another command cannot follow them after a ;. Put them last, in their own -e option, in a script file, or end the text with a newline.
  34. Quote scripts with single quotes rather than double quotes, so that the shell does not expand $, backquotes, or backslashes inside them.
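A sketch of both points, assuming GNU sed and made-up input: in one string, everything after a becomes text, while a separate -e ends the text; single quotes also keep $ (the last-line address) away from the shell.

```shell
# a takes the rest of the line as its text, so "; s/2/two/" becomes text too
inline=$(printf '1\n2\n' | sed '1a added; s/2/two/')
# Ending the a text with its own -e lets the next command run normally
separate=$(printf '1\n2\n' | sed -e '1a added' -e 's/2/two/')
# Single quotes pass $d to sed unchanged (delete the last line);
# inside double quotes the shell would expand $d as a variable instead
last_deleted=$(printf '1\n2\n' | sed '$d')

printf '%s\n---\n%s\n---\n%s\n' "$inline" "$separate" "$last_deleted"
```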

  35. Script commands (X)

    • Overview

    Every command has the form [addr]X[options]; the commonly used commands are summarized below.

    • a: append text after a line
    # The two forms are
    a\
    text\
    text
    # and the one-line form (GNU)
    a text
    

    Example

    [root@localhost sed]# sed -e '2,5ahello' number.txt
    1
    2
    hello
    3
    hello
    4
    hello
    5
    hello
    6
    7
    8
    9
    10
    [root@localhost sed]# sed -e '2,5a\
    oknice\
    ojbk' number.txt
    1
    2
    oknice
    ojbk
    3
    oknice
    ojbk
    4
    oknice
    ojbk
    5
    oknice
    ojbk
    6
    7
    8
    9
    10
    
    • d: delete

    Delete the pattern space without printing it and start the next cycle.

    • i: insert text before a line

    Same syntax as a, but the text is output before the line instead of after it.

    • p: print the pattern space
    • s/regexp/replacement/[flags]: substitute

    Substitution is the most common command; it is covered in detail below.

    • =: print the line number

    Print the current line number on its own line, before the line itself is output.

    [root@localhost sed]# sed -e "=" number.txt
    1
    1
    2
    2
    3
    3
    4
    4
    5
    5
    6
    6
    7
    7
    8
    8
    9
    9
    10
    10
    
    • y: transliterate characters

    y/src/dst/
    # Example
    [root@localhost sed]# sed -e 'y/123456789/abcdefghi/' number.txt
    a
    b
    c
    d
    e
    f
    g
    h
    i
    a0
    # Each character of src is replaced by the character at the same position in dst; src and dst must have the same length
    
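    • l: print the pattern space unambiguously

    Print the pattern space in an unambiguous form: embedded newlines appear as \n, non-printable characters are escaped, and the end is marked with $. The pattern space itself is not changed.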

    [root@localhost sed]# sed -n -e 'N;N;p;l' number.txt
    1
    2
    3
    1\n2\n3$
    4
    5
    6
    4\n5\n6$
    7
    8
    9
    7\n8\n9$
    
  36. The s command: s/regexp/replacement/flags

    • This is the most commonly used command.
    • Match and replace

    It matches regexp against the pattern space and replaces the matched part with replacement.

    • regexp

    regexp is a regular expression, either basic (the default) or extended (with -E).

    • Capture groups
    • Parentheses form groups; the text matched by each group can be referenced separately.
    • In replacement, \1 through \9 refer to the text captured by the corresponding group, and & refers to the whole match.
    # Note the difference between extended and basic regular expressions: in BRE the parentheses must be escaped as \( \)
    [root@localhost sed]# sed -E -e 's/([0-9])/\1\1/' number.txt
    11
    22
    33
    44
    55
    66
    77
    88
    99
    110
    [root@localhost sed]# sed -e 's/\([0-9]\)/\1\1/' number.txt
    11
    22
    33
    44
    55
    66
    77
    88
    99
    110
    [root@localhost sed]# sed -E -e "s/(.{5})/-&-/g" alpha.txt
    -hello--world-
    -hello--world-
    [root@localhost sed]# sed -E -e "s/[hw](.{4})/-&-/g" alpha.txt
    -hello--world-
    -hello--world-
    [root@localhost sed]# sed -E -e "s/[hw](.{4})/-\1&-/g" alpha.txt
    -ellohello--orldworld-
    -ellohello--orldworld-
    [root@localhost sed]# sed -E -e "s/[hw](.{4})/-\1|&-/g" alpha.txt
    -ello|hello--orld|world-
    -ello|hello--orld|world-
    
    • Delimiter
    • The / delimiter can be changed: the character right after s is used as the delimiter. If that character appears in regexp or replacement, it must be escaped with a backslash.
    [root@localhost sed]# sed -e 's#1#9#' number.txt
    9
    2
    3
    4
    5
    6
    7
    8
    9
    90
    [root@localhost sed]# sed -e 's;1;9;' number.txt
    9
    2
    3
    4
    5
    6
    7
    8
    9
    90
    [root@localhost sed]# sed -e 'sa1a9a' number.txt
    9
    2
    3
    4
    5
    6
    7
    8
    9
    90
    
    • Case conversion in the replacement (GNU extension)

    • \L

    Convert the rest of the replacement to lowercase, until \U or \E.

    • \l

    Convert the next character to lowercase.

    • \U

    Convert the rest of the replacement to uppercase, until \L or \E.

    • \u

    Convert the next character to uppercase.

    • \E

    Stop the case conversion started by \L or \U.

    • Example

    [root@localhost sed]# sed -E -e 's/(.*)/\U\1\E/' alpha.txt
    HELLOWORLD
    HELLOWORLD
    [root@localhost sed]# sed -E -e 's/(.*)/\u\1\E/' alpha.txt
    Helloworld
    Helloworld
    
    • flags

    Several flags are supported.

    • g

      By default, only the first match is replaced; g replaces every match.

      [root@localhost sed]# sed -E -e 's/(.)/\u\1\E/' alpha.txt
      Helloworld
      Helloworld
      [root@localhost sed]# sed -E -e 's/(.)/\u\1\E/g' alpha.txt
      HELLOWORLD
      HELLOWORLD
      
  37. NUMBER flag

    Replace only the NUMBERth match. Combined with g, replace the NUMBERth match and every match after it.

    [root@localhost sed]# sed -E -e 's/(.)/\u\1\E/g' alpha.txt
    HELLOWORLD
    HELLOWORLD
    [root@localhost sed]# sed -E -e 's/(.)/\u\1\E/2g' alpha.txt
    hELLOWORLD
    hELLOWORLD
    [root@localhost sed]# sed -E -e 's/(.)/\u\1\E/3g' alpha.txt
    heLLOWORLD
    heLLOWORLD
    
  38. p flag

    If a substitution was made, print the new pattern space. This is mainly useful together with -n.
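A sketch of the p flag with -n, assuming GNU sed and made-up input: only the lines where s actually replaced something are printed.

```shell
# With -n, only lines where the substitution succeeded are printed
changed=$(printf '%s\n' cat dog cow | sed -n 's/c/C/p')
echo "$changed"   # Cat and Cow, one per line
```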

  39. I or i flag

    Match regexp case-insensitively (GNU extension).

  40. M or m flag

    Multi-line mode (GNU extension): ^ and $ also match right after and right before each embedded newline in the pattern space, not only at its start and end.
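A sketch of both flags, assuming GNU sed (I and M are GNU extensions); the input strings are made up. N first joins two lines so that the pattern space contains a newline for M to act on.

```shell
# I: case-insensitive match, combined here with g
ignore_case=$(echo 'Hello HELLO' | sed 's/hello/bye/Ig')   # bye bye

# Without M, ^ matches only at the start of the whole pattern space
without_m=$(printf '%s\n' foo bar | sed 'N;s/^/X/g')       # Xfoo, then bar
# With M, ^ also matches right after the embedded newline
with_m=$(printf '%s\n' foo bar | sed 'N;s/^/X/Mg')         # Xfoo, then Xbar

printf '%s\n---\n%s\n---\n%s\n' "$ignore_case" "$without_m" "$with_m"
```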

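  41. More commands

    • n

      If auto-print is enabled (no -n), print the pattern space; then replace it with the next input line. If there is no next input line, sed exits without running the remaining commands. With -n, 'n;n;p' therefore prints every third line.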

      [root@localhost sed]# sed -n -e 'n;n;p' number.txt
      3
      6
      9
      # Equivalent to
      [root@localhost sed]# sed -n -e '0~3p' number.txt
      3
      6
      9
      
  42. Command groups

    Braces {} group several commands so that they share one address and act as a single command.
    [root@localhost sed]# sed -e '1,3{s/1/2/;3d}' number.txt
    2
    2
    4
    5
    6
    7
    8
    9
    10
    

    The commands inside the braces run only on lines matching the group's address (here lines 1 to 3); an inner command can also carry its own address, such as 3d.

  43. c text: change lines

    Same syntax as a and i. The selected lines are deleted and text is output in their place.
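A sketch of i and c, assuming GNU sed and made-up input. With a range, c prints its text once, at the end of the range, rather than once per line.

```shell
# i: the text is output before line 2
inserted=$(printf '%s\n' a b | sed '2i new')                # a, new, b
# c on a single line: that line is replaced by the text
single=$(printf '%s\n' 1 2 3 | sed '2c changed')            # 1, changed, 3
# c on a range: the whole range is replaced by one copy of the text
ranged=$(printf '%s\n' 1 2 3 4 5 | sed '2,4c changed')      # 1, changed, 5

printf '%s\n---\n%s\n---\n%s\n' "$inserted" "$single" "$ranged"
```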

  44. Loops

    The commands of a script run in order, like sequential code. Define a label with :label, then jump to it with b label or t label.

    # Runs the commands repeatedly (use t label instead of b label for a conditional loop)
    :label
    commands
    b label
    
    • b label

    Jump to the label unconditionally.

    • t label

    Conditional jump: branch to the label only if an s command has made a successful substitution since the last input line was read or the last t branch was taken; otherwise continue with the next command.

    while (readline(data) != NULL)   /* each cycle starts by reading a line */
    {
    label:                           /* :label */
        /*
         commands
        */
        if (end)                     /* e.g. N found no more input */
        {
            exit(0);
        }
        goto label;                  /* b label */
    }
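As a runnable example of a t loop (a common sed idiom, not taken from the original article), the substitution below inserts one thousands separator per pass, and t repeats it until nothing more matches. It assumes GNU sed with -E; the function name add_commas is hypothetical.

```shell
# :a     defines the label
# s/...  matches a digit followed by three digits that end the line or precede a comma,
#        and inserts a comma between them
# ta     jumps back to :a as long as the substitution keeps succeeding
add_commas() {
    sed -E ':a; s/([0-9])([0-9]{3})($|,)/\1,\2\3/; ta'
}

echo 1234567 | add_commas   # 1,234,567
```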
    
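  45. How N detects the end of input

    N appends the next line to the existing pattern space.

    while (readline(data) != NULL)          /* start of a cycle */
    {
        /* N */
        if (!append_next_line(data))        /* appends "\n" and the next line */
        {
            print(data);                    /* GNU sed prints the pattern space, */
            exit(0);                        /* then exits without running the remaining commands */
        }
        /*
         commands
        */
        substitute(data);                   /* s/\n/L/ */
        print(data);                        /* auto-print at the end of the cycle */
    }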
    
  46. Joining all lines into one
    [root@localhost sed]#  sed -e ":lable;N;s/\n/L/;b lable" number.txt
    1L2L3L4L5L6L7L8L9L10
    [root@localhost sed]# cat number.txt
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    
    • Equivalent to
    while (readline(data) != NULL)          /* start of the only cycle */
    {
    label:                                  /* :label */
        /* N */
        if (!append_next_line(data))        /* appends "\n" and the next line */
        {
            print(data);                    /* GNU sed prints the joined text */
            exit(0);                        /* and exits */
        }
        substitute(data);                   /* s/\n/L/ */
        goto label;                         /* b label */
    }
    
  47. Selecting lines

    • Single line

      A number selects only that line: 1 is the first line and $ is the last line. With several input files, these refer to the whole combined stream; with -i or -s, they refer to each file separately.

  48. Multiple lines

    addr1,addr2 selects a range of lines, both ends included.

  49. Regular expression

    /regexp/ selects every line that matches; this is covered separately below.

  50. Step: first~step (GNU extension)
    • first is the first line selected.
    • step is the interval: lines first, first+step, first+2*step, and so on are selected.
  51. Negation

    Append ! to an address to select the lines that do not match it.

  52. Examples

    [root@localhost sed]# sed -n -e '1p' number.txt
    1
    [root@localhost sed]# sed -n -e '1,3p' number.txt
    1
    2
    3
    [root@localhost sed]# sed -n -e '/1/p' number.txt
    1
    10
    [root@localhost sed]# sed -n -e '/1/!p' number.txt
    2
    3
    4
    5
    6
    7
    8
    9
    [root@localhost sed]# sed -n -e '3~2p' number.txt
    3
    5
    7
    9
    
  53. Selecting lines by regular expression

    • The main practical difference between basic and extended regular expressions is that in ERE, metacharacters such as ( ) { } + ? | need no backslash.
    • /regexp/

      Every line that matches regexp is selected. A / inside the regexp must be escaped as \/.
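A sketch of the same address written in basic and extended syntax, plus an escaped /, assuming GNU sed and made-up input:

```shell
# Two a's in a row: BRE needs \{ \}, ERE (-E) does not
bre=$(printf '%s\n' aa ab | sed -n '/a\{2\}/p')      # aa
ere=$(printf '%s\n' aa ab | sed -E -n '/a{2}/p')     # aa
# A / inside the regexp must be escaped as \/
path=$(printf '%s\n' /usr/bin /home | sed -n '/^\/usr\//p')   # /usr/bin

printf '%s\n' "$bre" "$ere" "$path"
```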

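  54. Range

    If addr2 is a line number less than or equal to the line matched by addr1, only that one line is selected.

    # addr1,addr2: each side is a line number or /regexp/; a regexp addr2 ends the range at the first matching line after addr1
    # addr1 may also be 0 (GNU) when addr2 is a regexp, so the regexp is tried on the first line too
    (number|/regexp/),(number|/regexp/)
    # addr1 and the N lines following it
    addr1,+N
    # From addr1 up to the next line whose number is a multiple of N
    addr1,~N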
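A sketch of these range forms, assuming GNU sed (+N, ~N and 0,/regexp/ are GNU extensions) and seq from GNU coreutils for the numeric input; the x/y/z lines are made up.

```shell
# addr1,+N: addr1 and the N lines after it
plus=$(seq 10 | sed -n '2,+2p')                     # 2 3 4
# addr1,~N: from addr1 up to the next line number that is a multiple of N
tilde=$(seq 10 | sed -n '5,~4p')                    # 5 6 7 8
# addr2 smaller than addr1: only the addr1 line is selected
backwards=$(seq 10 | sed -n '5,3p')                 # 5
# 1,/x/ starts looking for the end on line 2; 0,/x/ also tries line 1
one=$(printf '%s\n' x y x z | sed -n '1,/x/p')      # x y x
zero=$(printf '%s\n' x y x z | sed -n '0,/x/p')     # x
```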
    
  55. Custom delimiter

    # All three are equivalent: \%regexp% (or \ followed by any other character) makes that character the delimiter, so / no longer needs escaping
    sed -n '/^\/home\/alice\/documents\//p'
    sed -n '\%^/home/alice/documents/%p'
    sed -n '\;^/home/alice/documents/;p'
    
  56. Case-insensitive matching: /regexp/I and \%regexp%I

    Only uppercase I works here, because a lowercase i after an address is the insert command.

    # /b/I matches b or B, and d deletes the matching line
    [root@localhost sed]# printf "%s\n" a b c | sed '/b/Id'
    a
    c
    # Here i is the insert command: the text d is inserted before the matching line
    [root@localhost sed]# printf "%s\n" a b c | sed '/b/id'
    a
    d
    b
    c
    

Posted by eva21 on Sat, 13 Jun 2020 18:54:50 -0700