Reading notes on Dahua Data Structure -- Chapter 5: Strings

Keywords: less ascii C


5.2 definition of string

  1. A string is a finite sequence of zero or more characters.

5.3 comparison of strings

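Strings are compared character by character, using the numeric code of each character (for example, its ASCII value). The first position where the two strings differ decides the order; if one string is a prefix of the other, the shorter one is smaller.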
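A minimal sketch of that comparison in Java, assuming plain char codes define the order. The class and method names are illustrative and do not come from the book.

```java
// Illustrative sketch: compare two strings by character code, position by position.
class StrCompareSketch {
    // Returns a negative number if s < t, zero if s equals t, a positive number if s > t.
    static int strCompare(String s, String t) {
        int n = Math.min(s.length(), t.length());
        for (int k = 0; k < n; k++) {
            if (s.charAt(k) != t.charAt(k)) {
                // The first differing position decides the order.
                return s.charAt(k) - t.charAt(k);
            }
        }
        // One string is a prefix of the other: the shorter one is smaller.
        return s.length() - t.length();
    }

    public static void main(String[] args) {
        System.out.println(strCompare("happen", "happy") < 0);  // 'e' (101) < 'y' (121)
        System.out.println(strCompare("hap", "happy") < 0);     // prefix is smaller
        System.out.println(strCompare("happy", "happy") == 0);
    }
}
```

All three lines print true. The built-in String.compareTo follows the same rule.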

5.4 abstract data types of strings

ADT
DATA
Operation 
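   StrAssign(T,*chars);//Generate string T from the character constant chars
   StrCopy(T,S);//Copy S to T
   ClearString(S);//Clear S to the empty string
   StringEmpty(S);//Test whether S is empty
   StrLength(S);//Return the number of characters in S
   StrCompare(S,T);//Return >0 if S>T, 0 if S=T, <0 if S<T
   Concat(T,S1,S2);//Store the concatenation of S1 and S2 in T
   SubString(Sub,S,pos,len);//Store in Sub the len characters of S starting at pos
   Index(S,T,pos);//Locate the first occurrence of T in S at or after pos
   Replace(S,T,V);//Replace the non-overlapping occurrences of T in S with V
   StrInsert(S,pos,T);//Insert T into S before position pos
   StrDelete(S,pos,len);//Delete len characters from S starting at pos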
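The operations are not all independent: some can be composed from others. As a hedged illustration, the sketch below builds Index from StrLength, SubString and StrCompare only, using the Java String methods length, substring and compareTo as stand-ins. It assumes 0-based positions and 0 <= pos, which the ADT itself does not specify; the class name is made up for the example.

```java
// Illustrative sketch: Index(S, T, pos) composed from three simpler ADT operations.
class AdtIndexSketch {
    // Returns the 0-based position of the first occurrence of t in s at or after pos, or -1.
    static int index(String s, String t, int pos) {
        int n = s.length();                      // StrLength(S)
        int m = t.length();                      // StrLength(T)
        for (int i = pos; i + m <= n; i++) {
            String sub = s.substring(i, i + m);  // SubString(Sub, S, i, m)
            if (sub.compareTo(t) == 0) {         // StrCompare(Sub, T) == 0
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(index("goodgoogle", "google", 0)); // 4
        System.out.println(index("goodgoogle", "google", 5)); // -1
    }
}
```

This version allocates a new substring for each candidate position, so it is meant to show the composition, not to be efficient.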

5.5 storage structure of string

5.5.1 sequential storage structure of strings

  1. In C, the storage is allocated dynamically on the heap with malloc() and released with free().
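Java has no malloc() or free(), but the same idea can be sketched with a char array that is reallocated when it runs out of room. The class below is an assumption-laden illustration (the name, the doubling policy and the method set are not from the book): the characters live in one contiguous heap array, and a separate length field records how many are in use.

```java
// Illustrative sketch: a string stored sequentially in a growable heap array.
class SeqStringSketch {
    private char[] data;   // contiguous storage
    private int length;    // number of characters actually in use

    SeqStringSketch(int capacity) {
        data = new char[Math.max(1, capacity)];
        length = 0;
    }

    void append(String chars) {
        int needed = length + chars.length();
        if (needed > data.length) {
            // Grow: allocate a larger array and copy the old contents.
            // In C this would be malloc (or realloc), a copy, then free of the old block.
            data = java.util.Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
        chars.getChars(0, chars.length(), data, length);
        length = needed;
    }

    int length() { return length; }

    int capacity() { return data.length; }

    @Override
    public String toString() { return new String(data, 0, length); }

    public static void main(String[] args) {
        SeqStringSketch s = new SeqStringSketch(4);
        s.append("abc");
        s.append("defg");
        System.out.println(s + " length=" + s.length() + " capacity=" + s.capacity());
    }
}
```

The old array is reclaimed by the garbage collector, which plays the role that an explicit free() has in C.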

5.5.2 chain storage structure of string

-> ABCD->IG##^
The last node may not be full. It is padded with "#" or another character that cannot be part of the string value.
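A small Java sketch of this layout, assuming four characters per node and '#' as the padding character. The class, the chunk size and the rendering format are choices made for this example only.

```java
// Illustrative sketch: a string stored as a chain of fixed-size character blocks.
class ChunkStringSketch {
    static final int CHUNK_SIZE = 4;  // characters per node (assumed for illustration)
    static final char PAD = '#';      // filler that is not part of the string value

    static class Node {
        char[] ch = new char[CHUNK_SIZE];
        Node next;
    }

    // Builds the chain for s; the last node is padded with PAD if it is not full.
    static Node build(String s) {
        Node head = null;
        Node tail = null;
        for (int start = 0; start < s.length(); start += CHUNK_SIZE) {
            Node node = new Node();
            for (int k = 0; k < CHUNK_SIZE; k++) {
                int idx = start + k;
                node.ch[k] = idx < s.length() ? s.charAt(idx) : PAD;
            }
            if (head == null) {
                head = node;
            } else {
                tail.next = node;
            }
            tail = node;
        }
        return head;
    }

    // Renders the chain as text; "->" is a link and '^' marks the final null link.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node p = head; p != null; p = p.next) {
            sb.append(p.ch);
            sb.append(p.next != null ? "->" : "^");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(build("ABCDEFGHIJ"))); // ABCD->EFGH->IJ##^
    }
}
```

A larger chunk size wastes less space on links but makes insertion and deletion inside a node more expensive, which is the usual trade-off of this structure.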

5.6 simple pattern matching algorithm

  1. Locating a substring within a main string is usually called string pattern matching.
//Returns the index of the first occurrence of substring t in main string s at or after index pos; returns -1 if there is none
public int index(String s,String t,int pos){
   int i = pos; //Current index in the main string, starting from pos
   int j = 0;   //Current index in the substring
   //Continue while i is within the main string and j is within the substring
   while(i < s.length() && j < t.length()){
      //If the characters are equal, advance both indexes
      if(s.charAt(i) == t.charAt(j)){
         i++;
         j++;
      } else { //Otherwise back off: i returns to the character after the previous starting position
        i = i - j + 1;
        j = 0;
      }
   }
   //j reached the end of the substring, so a match was found
   if(j == t.length()){
      return i - j;
   }
   return -1;
}
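//Here n = s.length() and m = t.length()
//Best case: the match is found at the first position tried, after m comparisons (O(1) if m is treated as a constant)
//Average O(n+m)
//Worst O((n-m+1) * m)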
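The worst-case figure can be checked on a small input. The sketch below is the same simple matching loop with a comparison counter added; the counter, the class name and the test strings are additions for illustration. With s = "0000000001" (n = 10) and t = "0001" (m = 4), every starting position fails only on the last character of t, so the count reaches (n-m+1) * m = 7 * 4 = 28.

```java
// Illustrative sketch: simple pattern matching with a counter for character comparisons.
class NaiveMatchCountSketch {
    static int comparisons;  // comparisons made by the most recent call to index

    static int index(String s, String t, int pos) {
        comparisons = 0;
        int i = pos;
        int j = 0;
        while (i < s.length() && j < t.length()) {
            comparisons++;
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1;  // back off to the character after the previous start
                j = 0;
            }
        }
        return j == t.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(index("0000000001", "0001", 0)); // 6
        System.out.println(comparisons);                     // 28
    }
}
```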

5.7 KMP pattern matching algorithm

  1. When the characters at positions i and j are the same, both pointers advance to the next position and the comparison continues.
  2. When the characters at positions i and j differ, i stays where it is and j falls back to next[j] for the next comparison. (For now, ignore how next[] is built; just remember that next[0] = -1 by definition.)
  3. When j has fallen back to subscript 0 and the characters at positions i and j still differ, step 2 gives j = next[0] = -1. Position -1 has no character to compare, so both i and j advance by one (the same move as in step 1) and the comparison resumes from the start of the substring.
/*
 * Returns the index of the first occurrence of substring t in main string s at or after index pos, or -1 if there is none.
 * Assumes t is non-empty, because getNext writes next[0].
 */
public int index_KMP(String s, String t, int pos) {
    int i = pos;  //Pointer to main string
    int j = 0;    //Pointer to substring
    int[] next = getNext(t);  //Get next array of substrings
    while (i < s.length() && j < t.length()) {
        // Advance both pointers when j == -1 or the current characters are equal
        if (j == -1 || s.charAt(i) == t.charAt(j)) {
            // j == -1 means the first character of the substring failed to match: the previous step set j = next[0] = -1.
            i++;
            j++;
        } else {
            j = next[j];
        }
    }
    if (j == t.length()){
       return i - j;
    } 
    return -1;
}
//next[j]: the subscript that j jumps to when the character at subscript j fails to match.
/*
 * Returns the next array of the string str (str must be non-empty)
 */
public int[] getNext(String str) {
    int length = str.length();
    int[] next = new int[length]; 
    int i = 0;   //i: suffix pointer
    int j = -1;  //j: prefix pointer
    next[0] = -1; //1. next[0] = -1 by definition
    while (i < length - 1) {         //The body writes next[++i], so the bound is length - 1 rather than length
        if (j == -1 || str.charAt(i) == str.charAt(j)) {
            //2. If j == -1, no prefix equals a suffix, the longest common length is 0, and next[i+1] = 0 = j+1.
            //   If the character at prefix position j equals the character at suffix position i, the common length grows by one, so again next[i+1] = j+1.
            next[++i] = ++j;  //Length of the matched prefix
        } else {
            //3. The characters at positions j and i differ, so the prefix ending at j cannot be extended; j falls back to the next candidate position, j = next[j].
            j = next[j];
        }
    }
    return next;
}
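To see the effect of never moving i backward, the two routines above can be instrumented with a comparison counter. The sketch below restates them as static methods (same logic; the counter, class name and test input are additions for illustration) and runs them on s = "0000000001", t = "0001", an input on which simple matching needs 28 comparisons.

```java
// Illustrative sketch: KMP matching with a counter for character comparisons.
class KmpCountSketch {
    static int comparisons;  // comparisons made by the most recent call to indexKmp

    static int[] getNext(String str) {
        int[] next = new int[str.length()];
        int i = 0;
        int j = -1;
        next[0] = -1;
        while (i < str.length() - 1) {
            if (j == -1 || str.charAt(i) == str.charAt(j)) {
                next[++i] = ++j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    static int indexKmp(String s, String t, int pos) {
        comparisons = 0;
        int i = pos;
        int j = 0;
        int[] next = getNext(t);
        while (i < s.length() && j < t.length()) {
            if (j == -1) {          // nothing to compare: restart the substring at the next i
                i++;
                j++;
                continue;
            }
            comparisons++;
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j];        // i is never decreased
            }
        }
        return j == t.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        String s = "0000000001";
        System.out.println(indexKmp(s, "0001", 0));          // 6
        System.out.println(comparisons <= 2 * s.length());   // true
    }
}
```

On this input the count stays within twice the length of the main string, against 28 for the simple algorithm.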

next[j] = the length of the longest proper prefix of str[0..j-1] that is also a suffix of str[0..j-1], for j > 0; next[0] = -1.
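The definition can be checked directly, without the pointer-jumping trick. The following sketch computes next[j] by brute force as the length of the longest proper prefix of str[0..j-1] that is also its suffix, with next[0] fixed at -1. It is slower than getNext and is only meant as a cross-check; the class and method names are hypothetical.

```java
// Illustrative sketch: the next array computed straight from its definition.
class NextDefinitionSketch {
    static int[] nextByDefinition(String str) {
        int[] next = new int[str.length()];
        for (int j = 0; j < str.length(); j++) {
            if (j == 0) {
                next[j] = -1;                 // fixed by definition
                continue;
            }
            String head = str.substring(0, j);  // the part before position j
            int best = 0;
            for (int len = 1; len < j; len++) {
                // Does the prefix of length len equal the suffix of length len?
                if (head.startsWith(head.substring(j - len))) {
                    best = len;
                }
            }
            next[j] = best;
        }
        return next;
    }

    public static void main(String[] args) {
        // Prints [-1, 0, 0, 1, 2, 3, 1, 1, 2]
        System.out.println(java.util.Arrays.toString(nextByDefinition("ababaaaba")));
    }
}
```

For "ababaaaba" the result is the same array that getNext produces when traced by hand.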

5.7.1 improvement of KMP pattern matching algorithm

public class KMP2 {
    public int[] getNextval(String str) {
        int length = str.length();
        int[] nextval = new int[length];
        int i = 0;   //i: suffix pointer
        int j = -1;  //j: prefix pointer
        nextval[0] = -1;
        while (i < length - 1) {        
            if (j == -1 || str.charAt(i) == str.charAt(j)) {   
                i++;
                j++;
                if(str.charAt(i)!=str.charAt(j)) { //Extra check: do the characters at the new i and j differ?
                    nextval[i] = j;  //They differ, so j is used, as in getNext
                }else {
                    nextval[i]=nextval[j];   //They are equal, so falling back to j would fail again; reuse nextval[j]
                }  
            } else {
                j = nextval[j];
            }
        }
        return nextval;
    }
}
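To see what the extra check changes, the sketch below computes both arrays for the pattern "aaaab". The two methods restate the routines from these notes as static methods; the class name and the choice of pattern are assumptions for the example.

```java
// Illustrative sketch: next versus nextval for the same pattern.
class NextvalCompareSketch {
    static int[] getNext(String str) {
        int[] next = new int[str.length()];
        int i = 0;
        int j = -1;
        next[0] = -1;
        while (i < str.length() - 1) {
            if (j == -1 || str.charAt(i) == str.charAt(j)) {
                next[++i] = ++j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    static int[] getNextval(String str) {
        int[] nextval = new int[str.length()];
        int i = 0;
        int j = -1;
        nextval[0] = -1;
        while (i < str.length() - 1) {
            if (j == -1 || str.charAt(i) == str.charAt(j)) {
                i++;
                j++;
                // If both positions hold the same character, a fallback to j would
                // fail for the same reason, so skip straight to nextval[j].
                nextval[i] = str.charAt(i) != str.charAt(j) ? j : nextval[j];
            } else {
                j = nextval[j];
            }
        }
        return nextval;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(getNext("aaaab")));     // [-1, 0, 1, 2, 3]
        System.out.println(java.util.Arrays.toString(getNextval("aaaab"))); // [-1, -1, -1, -1, 3]
    }
}
```

With next, a mismatch at subscript 3 falls back to 2, then 1, then 0, comparing the same 'a' each time. With nextval the fallback goes to -1 in one step.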
  1. Core idea: the partial match table (PMT), stored as an array. Each PMT value is the length of the longest string that belongs to both the set of proper prefixes and the set of proper suffixes of the string.
    How to better understand and master KMP algorithm?
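The PMT description translates almost literally into code: build the set of proper prefixes, walk over the proper suffixes, and keep the longest one found in both. The sketch below does exactly that for every prefix of a pattern. It is a brute-force illustration under assumed names (PmtSketch, pmtValue, pmt), not the efficient construction used by KMP.

```java
// Illustrative sketch: the partial match table computed from prefix and suffix sets.
class PmtSketch {
    // Length of the longest string that is both a proper prefix and a proper suffix of s.
    static int pmtValue(String s) {
        java.util.Set<String> prefixes = new java.util.HashSet<>();
        for (int len = 1; len < s.length(); len++) {
            prefixes.add(s.substring(0, len));
        }
        int best = 0;
        for (int len = 1; len < s.length(); len++) {
            String suffix = s.substring(s.length() - len);
            if (prefixes.contains(suffix)) {
                best = Math.max(best, len);
            }
        }
        return best;
    }

    // table[i] is the PMT value of pattern[0..i].
    static int[] pmt(String pattern) {
        int[] table = new int[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            table[i] = pmtValue(pattern.substring(0, i + 1));
        }
        return table;
    }

    public static void main(String[] args) {
        // Prints [0, 0, 1, 2, 3, 4, 0, 1]
        System.out.println(java.util.Arrays.toString(pmt("abababca")));
    }
}
```

Shifting this table one place to the right and putting -1 in front gives the next array used above, since next[j] describes the part of the pattern before position j.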

Posted by delboy1978uk on Wed, 05 Feb 2020 23:53:11 -0800