# Reading notes on Dahua Data Structure -- Chapter 5: Strings

Keywords: less ascii C

## 5.2 definition of string

1. A string is a finite sequence of zero or more characters.

## 5.3 comparison of strings

Strings are compared character by character, using each character's numeric code (for example its ASCII value).
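As a small illustration of comparing by character code, the sketch below walks two strings in parallel and lets the first differing character decide; if one string is a prefix of the other, the shorter one counts as smaller. The names `StrCompareDemo` and `strCompare` are made up for these notes. Java's built-in `String.compareTo` uses the same ordering rule.

```java
final class StrCompareDemo {
    // Returns a negative number, zero, or a positive number when s is
    // less than, equal to, or greater than t.
    static int strCompare(String s, String t) {
        int limit = Math.min(s.length(), t.length());
        for (int k = 0; k < limit; k++) {
            char a = s.charAt(k);
            char b = t.charAt(k);
            if (a != b) {
                return a - b; // the first differing character decides
            }
        }
        // One string is a prefix of the other: the shorter one is smaller.
        return s.length() - t.length();
    }

    public static void main(String[] args) {
        System.out.println(strCompare("happen", "happy") < 0); // 'e' < 'y'
        System.out.println(strCompare("hap", "happy") < 0);    // a prefix is smaller
        System.out.println(strCompare("happy", "happy") == 0); // equal strings
    }
}
```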

## 5.4 abstract data types of strings

```ADT
DATA
Operation
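StrAssign(T,*chars);      //Generate a string T whose value is the constant chars
StrCopy(T,S);             //Copy S to T
ClearString(S);           //Clear S to the empty string
StringEmpty(S);           //Return whether S is empty
StrLength(S);             //Return the number of characters in S
StrCompare(S,T);          //Return >0, 0 or <0 when S>T, S=T or S<T
Concat(T,S1,S2);          //Set T to S1 followed by S2
SubString(Sub,S,pos,len); //Set Sub to the len characters of S starting at pos
Index(S,T,pos);           //Return the position of the first occurrence of T in S from pos onward
Replace(S,T,V);           //Replace every non-overlapping occurrence of T in S with V
StrInsert(S,pos,T);       //Insert T into S before position pos
StrDelete(S,pos,len);     //Delete len characters from S starting at pos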
```
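To make a few of these operations concrete, here is a rough Java sketch built on `String.substring` and `String.indexOf`. It assumes 0-based positions (to match the Java code later in these notes) and returns new strings instead of modifying arguments, because Java strings are immutable. The class name `StringAdtSketch` and the method names are illustrative only.

```java
final class StringAdtSketch {
    // SubString: the len characters of s starting at index pos.
    static String subString(String s, int pos, int len) {
        return s.substring(pos, pos + len);
    }

    // Index: first occurrence of t in s at or after index pos, or -1.
    static int index(String s, String t, int pos) {
        return s.indexOf(t, pos);
    }

    // StrInsert: t inserted into s before index pos.
    static String strInsert(String s, int pos, String t) {
        return s.substring(0, pos) + t + s.substring(pos);
    }

    // StrDelete: s with len characters removed starting at index pos.
    static String strDelete(String s, int pos, int len) {
        return s.substring(0, pos) + s.substring(pos + len);
    }

    public static void main(String[] args) {
        System.out.println(subString("abcdef", 2, 3));  // cde
        System.out.println(index("abcabc", "bc", 2));   // 4
        System.out.println(strInsert("abef", 2, "cd")); // abcdef
        System.out.println(strDelete("abcdef", 1, 2));  // adef
    }
}
```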

## 5.5 storage structure of string

### 5.5.1 sequential storage structure of strings

1. The storage can be allocated dynamically on the heap, using malloc() and free() in C.
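A minimal sketch of sequential storage, written in Java to match the rest of the code in these notes: a contiguous `char[]` plus a separate length field, with the array reallocated when it runs out of room. Here `new char[]` and `Arrays.copyOf` stand in for the C calls `malloc()` and `realloc()`. The class name `SeqString` and the doubling growth policy are assumptions, not something the book prescribes.

```java
import java.util.Arrays;

final class SeqString {
    private char[] data;  // contiguous storage, allocated on the heap
    private int length;   // number of characters actually in use

    SeqString(int capacity) {
        data = new char[Math.max(capacity, 1)];
        length = 0;
    }

    // Appends chars, growing the backing array when it is too small.
    void append(String chars) {
        int needed = length + chars.length();
        if (needed > data.length) {
            // Allocate a larger block and copy the old contents over.
            data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
        chars.getChars(0, chars.length(), data, length);
        length = needed;
    }

    int length() {
        return length;
    }

    int capacity() {
        return data.length;
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }
}
```

The length is stored explicitly, so the unused tail of the array never needs a terminator character.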

### 5.5.2 chain storage structure of string

```
-> ABCD -> IG##^
```

The last node is not full, so it is padded with "#" or another character that cannot occur in the string.
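The chained layout can be sketched as follows, assuming four characters per node and `'#'` as the filler, as in the diagram above. `ChunkString`, `Node`, `fromString` and `render` are hypothetical names used only for this illustration.

```java
final class ChunkString {
    static final int CHUNK_SIZE = 4; // characters per node (assumed)
    static final char PAD = '#';     // filler that must not occur in the string

    static final class Node {
        final char[] chars = new char[CHUNK_SIZE];
        Node next; // null marks the end of the chain
    }

    // Builds the chain for s and returns its first node (null for an empty string).
    static Node fromString(String s) {
        Node head = null;
        Node tail = null;
        for (int start = 0; start < s.length(); start += CHUNK_SIZE) {
            Node node = new Node();
            for (int k = 0; k < CHUNK_SIZE; k++) {
                int idx = start + k;
                node.chars[k] = idx < s.length() ? s.charAt(idx) : PAD;
            }
            if (head == null) {
                head = node;
            } else {
                tail.next = node;
            }
            tail = node;
        }
        return head;
    }

    // Renders the chain in the style of the diagram: "->ABCD->IG##^".
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append("->").append(n.chars);
        }
        return sb.append('^').toString();
    }

    public static void main(String[] args) {
        System.out.println(render(fromString("ABCDIG"))); // ->ABCD->IG##^
    }
}
```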

## 5.6 simple pattern matching algorithm

1. Locating a substring within a main string is usually called string pattern matching.

```java
//Returns the index of the first occurrence of substring t in main string s
//at or after index pos, or -1 if there is none
public int index(String s, String t, int pos) {
    int i = pos; //Current index in the main string, starting from pos
    int j = 0;   //Current index in the substring
    //Continue while i is inside the main string and j is inside the substring
    while (i < s.length() && j < t.length()) {
        if (s.charAt(i) == t.charAt(j)) {
            //Characters are equal: advance both indices
            i++;
            j++;
        } else {
            //Mismatch: back i up to one past the start of this attempt and restart the substring
            i = i - j + 1;
            j = 0;
        }
    }
    //j reached the end of the substring, so a full match was found
    if (j == t.length()) {
        return i - j;
    }
    return -1;
}
//With n = s.length() and m = t.length():
//Best case O(m): the first attempt matches
//Average O(n+m)
//Worst O((n-m+1) * m)
```
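To see where the worst-case figure comes from, the sketch below runs the same simple matching loop with an added comparison counter. `NaiveMatchDemo` and its `comparisons` field are additions for this illustration. With a main string of nine `0`s followed by `1` (n = 10) and the pattern `0001` (m = 4), every one of the n - m + 1 = 7 attempts needs all 4 comparisons.

```java
final class NaiveMatchDemo {
    static int comparisons; // character comparisons made by the last call

    static int index(String s, String t, int pos) {
        comparisons = 0;
        int i = pos;
        int j = 0;
        while (i < s.length() && j < t.length()) {
            comparisons++;
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // back up to one past the start of this attempt
                j = 0;
            }
        }
        return j == t.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(index("0000000001", "0001", 0)); // 6
        System.out.println(comparisons);                    // 28 = (10 - 4 + 1) * 4
    }
}
```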

## 5.7 KMP pattern matching algorithm

1. When the characters at positions i and j are the same, both pointers move to the next position and the comparison continues.
2. When the characters at positions i and j differ, i stays where it is and j falls back to position next[j] for another comparison. (For now, don't worry about how next[] is built; just remember that next[0] is defined as -1.)
3. When j has fallen back to index 0 and the characters at positions i and j still differ, step 2 gives j = next[0] = -1. In that case both i and j simply move forward by one and the comparison continues, exactly as in step 1.
```java
/*
 * Returns the index of the first occurrence of substring t in main string s
 * at or after index pos, or -1 if there is none
 */
public int index_KMP(String s, String t, int pos) {
    int i = pos;  //Pointer into the main string
    int j = 0;    //Pointer into the substring
    int[] next = getNext(t);  //next array of the substring
    while (i < s.length() && j < t.length()) {
        //Advance both pointers when j == -1 or the characters are equal
        if (j == -1 || s.charAt(i) == t.charAt(j)) {
            //j == -1 means even the first character of the substring did not match
            //(the previous step set j = next[0] = -1)
            i++;
            j++;
        } else {
            //Mismatch: i stays where it is, j falls back
            j = next[j];
        }
    }
    if (j == t.length()) {
        return i - j;
    }
    return -1;
}

//next[j]: the index that j jumps to when the character at index j does not match
/*
 * Returns the next array of the string str (assumes str is not empty)
 */
public int[] getNext(String str) {
    int length = str.length();
    int[] next = new int[length];
    int i = 0;   //Suffix pointer
    int j = -1;  //Prefix pointer
    next[0] = -1;
    while (i < length - 1) {  //next[++i] is written below, so the bound is length - 1, not length
        if (j == -1 || str.charAt(i) == str.charAt(j)) {
            //j == -1: no prefix equals a suffix, the longest common length is 0, so next[i+1] = 0 = j + 1
            //Characters at i and j equal: the common prefix/suffix grows by one, so again next[i+1] = j + 1
            next[++i] = ++j;
        } else {
            //Characters at i and j differ: the prefix ending at j cannot be extended,
            //so j falls back to the next candidate position, next[j]
            j = next[j];
        }
    }
    return next;
}
```
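As a worked example, the sketch below restates the two routines above as static methods (with `next[0] = -1` and a guard for an empty pattern) and prints the next array for the pattern `ababaaaba`. The class name `KmpNextDemo` and the method name `indexKmp` are chosen for this illustration.

```java
import java.util.Arrays;

final class KmpNextDemo {
    static int[] getNext(String t) {
        int[] next = new int[t.length()];
        if (t.isEmpty()) {
            return next; // nothing to fill in for an empty pattern
        }
        int i = 0;   // suffix pointer
        int j = -1;  // prefix pointer
        next[0] = -1;
        while (i < t.length() - 1) {
            if (j == -1 || t.charAt(i) == t.charAt(j)) {
                next[++i] = ++j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    static int indexKmp(String s, String t, int pos) {
        int[] next = getNext(t);
        int i = pos;
        int j = 0;
        while (i < s.length() && j < t.length()) {
            if (j == -1 || s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j]; // i never moves backwards
            }
        }
        return j == t.length() ? i - j : -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(getNext("ababaaaba")));
        // [-1, 0, 0, 1, 2, 3, 1, 1, 2]
        System.out.println(indexKmp("abcabcabx", "abcabx", 0)); // 3
    }
}
```

For instance, the value 3 at index 5 says that the five characters before index 5, `ababa`, start and end with the same three characters, `aba`.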

next[j] = the length of the longest prefix of the substring before position j that is also a suffix of it (the whole substring itself does not count).

### 5.7.1 improvement of KMP pattern matching algorithm

```java
public class KMP2 {
    public int[] getNextval(String str) {
        int length = str.length();
        int[] nextval = new int[length];
        int i = 0;   //Suffix pointer
        int j = -1;  //Prefix pointer
        nextval[0] = -1;
        while (i < length - 1) {
            if (j == -1 || str.charAt(i) == str.charAt(j)) {
                i++;
                j++;
                //Extra check compared with getNext: does the fallback position hold the same character?
                if (str.charAt(i) != str.charAt(j)) {
                    nextval[i] = j;  //Different: same value as next[i], the length of the matched prefix
                } else {
                    nextval[i] = nextval[j];  //Same: falling back to j would fail again, so reuse nextval[j]
                }
            } else {
                j = nextval[j];
            }
        }
        return nextval;
    }
}
```
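The effect of the improvement is easiest to see on a repetitive pattern. The sketch below computes both tables for `aaaab` and counts character comparisons when searching `aaabaaaab` with each. `NextvalDemo` and `search` are hypothetical names for this illustration; `search` returns the match index together with the comparison count.

```java
import java.util.Arrays;

final class NextvalDemo {
    static int[] getNext(String t) {
        int[] next = new int[t.length()];
        if (t.isEmpty()) return next;
        next[0] = -1;
        int i = 0, j = -1;
        while (i < t.length() - 1) {
            if (j == -1 || t.charAt(i) == t.charAt(j)) {
                next[++i] = ++j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    static int[] getNextval(String t) {
        int[] nextval = new int[t.length()];
        if (t.isEmpty()) return nextval;
        nextval[0] = -1;
        int i = 0, j = -1;
        while (i < t.length() - 1) {
            if (j == -1 || t.charAt(i) == t.charAt(j)) {
                i++;
                j++;
                // Skip fallback positions that hold the same character.
                nextval[i] = (t.charAt(i) != t.charAt(j)) ? j : nextval[j];
            } else {
                j = nextval[j];
            }
        }
        return nextval;
    }

    // Returns {match index or -1, number of character comparisons}.
    static int[] search(String s, String t, int[] table) {
        int i = 0, j = 0, comparisons = 0;
        while (i < s.length() && j < t.length()) {
            if (j == -1) {
                i++;
                j++;
            } else {
                comparisons++;
                if (s.charAt(i) == t.charAt(j)) {
                    i++;
                    j++;
                } else {
                    j = table[j];
                }
            }
        }
        return new int[] { j == t.length() ? i - j : -1, comparisons };
    }

    public static void main(String[] args) {
        String s = "aaabaaaab", t = "aaaab";
        System.out.println(Arrays.toString(getNext(t)));    // [-1, 0, 1, 2, 3]
        System.out.println(Arrays.toString(getNextval(t))); // [-1, -1, -1, -1, 3]
        System.out.println(Arrays.toString(search(s, t, getNext(t))));    // [4, 12]
        System.out.println(Arrays.toString(search(s, t, getNextval(t)))); // [4, 9]
    }
}
```

With the plain next table, the mismatch on `b` is retried against three more `a`s that are bound to fail; nextval jumps past them in one step.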
1. Core idea: the partial match table (PMT). For each prefix of the pattern, the PMT value is the length of the longest string in the intersection of that prefix's set of proper prefixes and its set of proper suffixes.
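The definition can be checked directly with a brute-force sketch that tries every candidate length, longest first. This is for illustration only and much slower than the KMP construction. The names `PmtDemo`, `pmt` and `nextFromPmt` are assumptions. The second method shows how the next array used earlier relates to the PMT: it is the PMT shifted right by one position, with -1 placed in front.

```java
import java.util.Arrays;

final class PmtDemo {
    // pmt[k] = length of the longest string that is both a proper prefix
    // and a proper suffix of t[0..k].
    static int[] pmt(String t) {
        int[] pmt = new int[t.length()];
        for (int k = 0; k < t.length(); k++) {
            String head = t.substring(0, k + 1);
            for (int len = k; len > 0; len--) { // longest candidate first
                String suffix = head.substring(head.length() - len);
                if (head.startsWith(suffix)) {
                    pmt[k] = len;
                    break;
                }
            }
        }
        return pmt;
    }

    // next[0] = -1, and next[j] = pmt[j - 1] for j >= 1.
    static int[] nextFromPmt(int[] pmt) {
        int[] next = new int[pmt.length];
        for (int j = 0; j < pmt.length; j++) {
            next[j] = (j == 0) ? -1 : pmt[j - 1];
        }
        return next;
    }

    public static void main(String[] args) {
        int[] table = pmt("abababca");
        System.out.println(Arrays.toString(table));              // [0, 0, 1, 2, 3, 4, 0, 1]
        System.out.println(Arrays.toString(nextFromPmt(table))); // [-1, 0, 0, 1, 2, 3, 4, 0]
    }
}
```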
How to better understand and master KMP algorithm?

Posted by delboy1978uk on Wed, 05 Feb 2020 23:53:11 -0800