[Data structures and algorithms] String matching: the KMP algorithm

String matching algorithm (KMP algorithm)

Tags: strStr implementation, KMP algorithm

The previous article outlined the BF (brute-force) algorithm and its drawbacks. This article explains the KMP algorithm and its optimization over BF.

If you are not familiar with the BF algorithm, first read the previous article on brute-force string matching.

KMP algorithm

KMP is an optimization of the BF algorithm. Why is it an optimization? In BF, when two characters are not equal, both i and j backtrack. In KMP, i never backtracks; only j does. Where does j backtrack to? Let's set that question aside for now and first walk through an example of the KMP algorithm.

Suppose str is a b a b c a b c a c b c b
and pattern is a b c a c

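The derivation, step by step:

Step one:

i is the current index in str and j is the current index in pattern, both starting at 0. While the characters are equal, i++ and j++. At i == 2, j == 2 the characters differ (str[2] is a, pattern[2] is c).

Step two:

i stays at 2 and j moves back to 0, since nothing matched before the c can be reused. Comparing from there, str[2..5] matches pattern[0..3] (a b c a), and the next mismatch is at i == 6, j == 4 (str[6] is b, pattern[4] is c).

Step three:

i stays at 6 and j moves back to 1, because the matched part a b c a begins and ends with a, so that a does not need to be compared again. From there the rest of the pattern matches (str[6..9] against pattern[1..4]), so pattern is found at index 5 of str.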
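The three steps above can be replayed with a small tracing sketch. It assumes the next table has the form the code later in this article builds (next[0] = -1, followed by the prefix values shifted right by one); `buildNext` and `kmpTrace` are hypothetical helper names used only for illustration.

```javascript
// Sketch: replay KMP on the example and record every mismatch.
// buildNext / kmpTrace are hypothetical names; buildNext mirrors the
// article's getPrefixTable + moveTable (next[0] = -1, then shifted values).
function buildNext(pattern) {
  const prefix = new Array(pattern.length).fill(0)
  let len = 0
  let i = 1
  while (i < pattern.length) {
    if (pattern[i] === pattern[len]) {
      len++
      prefix[i] = len
      i++
    } else if (len > 0) {
      len = prefix[len - 1] // fall back to the next shorter prefix-suffix
    } else {
      prefix[i] = 0
      i++
    }
  }
  return [-1, ...prefix.slice(0, -1)] // shift right, -1 in front
}

function kmpTrace(str, pattern) {
  const next = buildNext(pattern)
  const steps = [] // one entry per mismatch: where it happened, where j went
  let i = 0, j = 0
  while (i < str.length && j < pattern.length) {
    if (j === -1 || str[i] === pattern[j]) {
      i++
      j++
    } else {
      steps.push({ i, j, jumpTo: next[j] })
      j = next[j] // i is not touched
    }
  }
  return { steps, index: j === pattern.length ? i - j : -1 }
}

const { steps, index } = kmpTrace("ababcabcacbcb", "abcac")
steps.forEach((s, n) =>
  console.log(`step ${n + 1}: mismatch at i=${s.i}, j=${s.j}; j -> ${s.jumpTo}`))
console.log(`step ${steps.length + 1}: match at index ${index}`)
// step 1: mismatch at i=2, j=2; j -> 0
// step 2: mismatch at i=6, j=4; j -> 1
// step 3: match at index 5
```

Note that i only ever increases in the trace; every jump is recorded purely as a change in j.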

Why?

In short, it comes down to the longest prefix of the already-matched part that is also a suffix of it. Let's think this through in detail with another example.

Now let str be a b c x y z a b c x y z a b c d c d
and let pattern be a b c x y z a b c d

The comparison fails at the last character of pattern: x in str against d in pattern. At this point every character before d in pattern has already matched the corresponding character before x in str, so we only need to decide where j moves within pattern, i.e. the next position for the index j of d. That is what the next table records: for each position j, where j moves when pattern[j] does not match str[i]. On a mismatch, j jumps back to the length of the longest common prefix and suffix of the characters before j, which is the value next[j]. Here the matched part is abcxyzabc, whose longest prefix that is also a suffix is abc (length 3), so next[9] = 3: j moves to index 3 (the x) while i stays at 9, and the comparison resumes with x against x.
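The jump described above can be checked with a brute-force sketch. `longestBorder` is a hypothetical helper that tries every proper prefix length directly; it is far slower than the next table, but it makes the definition concrete for this example.

```javascript
// Sketch: longest proper prefix of s that is also a suffix of s, by brute force.
function longestBorder(s) {
  for (let len = s.length - 1; len > 0; len--) {
    if (s.slice(0, len) === s.slice(s.length - len)) return len
  }
  return 0
}

const str = "abcxyzabcxyzabcdcd"
const pattern = "abcxyzabcd"

// The first attempt fails at i = 9, j = 9: str[9] is "x", pattern[9] is "d".
const j = 9
const matched = pattern.slice(0, j)          // "abcxyzabc", already matched
const nextJ = longestBorder(matched)         // "abc" is both prefix and suffix
console.log(matched, nextJ, pattern[nextJ])  // abcxyzabc 3 x

// i stays at 9; the pattern is now aligned to start at str index 9 - 3 = 6.
const start = 9 - nextJ
console.log(str.slice(start, start + pattern.length) === pattern) // true
```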

The BF algorithm looks like this:

	const strStr = (str, pattern) => {
		let i = 0, j = 0
		while (i < str.length && j < pattern.length) {
			if (str[i] === pattern[j]) {
				i++
				j++
			} else {
				i = i - j + 1 // move i back to one past where this attempt started
				j = 0
			}
		}
		if (j === pattern.length) return i - j
		return -1
	}
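To make BF's backtracking visible, here is a sketch of the same loop instrumented to count character comparisons (`bfCount` is a hypothetical name). On an input like aaaaaaaab with pattern aaab, every attempt matches three a's before failing, and i is then sent back almost to where it started.

```javascript
// Sketch: the BF strStr, instrumented to count character comparisons.
function bfCount(str, pattern) {
  let i = 0, j = 0, comparisons = 0
  while (i < str.length && j < pattern.length) {
    comparisons++
    if (str[i] === pattern[j]) {
      i++
      j++
    } else {
      i = i - j + 1 // i moves back to one past where this attempt started
      j = 0
    }
  }
  return { index: j === pattern.length ? i - j : -1, comparisons }
}

console.log(bfCount("aaaaaaaab", "aaab")) // { index: 5, comparisons: 24 }
```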

With KMP, i never moves backward; only j backtracks. The code becomes:

	const strStr = (str, pattern) => {
		const next = getPrefixTable(pattern) // next table, built by getPrefixTable below
		let i = 0, j = 0
		while (i < str.length && j < pattern.length) {
			// j == -1 means even pattern[0] did not match str[i]:
			// advance i and restart the pattern at j = 0
			if (j === -1 || str[i] === pattern[j]) {
				i++
				j++
			} else {
				// Before optimization (BF):
				// i = i - j + 1 // move i back to one past where this attempt started
				// j = 0

				// After optimization (KMP): i stays, only j moves back
				j = next[j] // next[j] is the position j falls back to on a mismatch
			}
		}
		if (j === pattern.length) return i - j
		return -1
	}
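To see the effect of never moving i backward, here is a sketch of the KMP loop instrumented to count character comparisons. `buildNext` and `kmpCount` are hypothetical names; `buildNext` reproduces what the article's getPrefixTable + moveTable compute so the block runs on its own. Iterations where j == -1 are not counted, since they compare nothing.

```javascript
// Sketch: KMP strStr that counts character comparisons.
function buildNext(pattern) {
  const prefix = new Array(pattern.length).fill(0)
  let len = 0
  let i = 1
  while (i < pattern.length) {
    if (pattern[i] === pattern[len]) {
      len++
      prefix[i] = len
      i++
    } else if (len > 0) {
      len = prefix[len - 1] // fall back to the next shorter prefix-suffix
    } else {
      prefix[i] = 0
      i++
    }
  }
  return [-1, ...prefix.slice(0, -1)] // what moveTable does
}

function kmpCount(str, pattern) {
  const next = buildNext(pattern)
  let i = 0, j = 0, comparisons = 0
  while (i < str.length && j < pattern.length) {
    if (j === -1) { // even pattern[0] failed: move past str[i]
      i++
      j++
      continue
    }
    comparisons++
    if (str[i] === pattern[j]) {
      i++
      j++
    } else {
      j = next[j] // only j moves back; i stays
    }
  }
  return { index: j === pattern.length ? i - j : -1, comparisons }
}

console.log(kmpCount("aaaaaaaab", "aaab")) // { index: 5, comparisons: 14 }
```

The count stays at most 2 × str.length: every match advances i, and every mismatch moves j back, which can only happen as often as j has moved forward.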

So the remaining problem is how to build the next table, i.e. the position j backtracks to for each position j. The code is as follows:

	function getPrefixTable(pattern) {
		const prefix = []
		prefix[0] = 0
		// len: length of the longest proper prefix of pattern[0..i] that is also
		// a suffix of it (the current "prefix value", including the letter at i)
		let len = 0
		// i starts at the second letter: prefix[0] is always 0, because a single
		// letter has no proper prefix that is also a suffix
		let i = 1
		while (i < pattern.length) {
			/**
			 *  https://www.youtube.com/watch?v=3IFxpozBs2I  7:59s
			 *  Example: for a pattern starting with ABABCA, the longest prefix-suffix
			 *  of ABABCA is "A", so len = 1 when we reach the next letter.
			 *  If that letter equals pattern[len] (= pattern[1], "B"), the
			 *  prefix-suffix grows to "AB", so its value is len + 1 = 2.
			 *  So each position's value builds on the value of the position before it.
			 */
			if (pattern[i] === pattern[len]) {
				len++
				prefix[i] = len
				i++
			} else {
				// WRONG: len = prefix[len - 1] on its own, because when len == 0
				// that reads prefix[-1] (undefined) and i never advances
				if (len > 0) {
					// fall back to the next shorter prefix-suffix and compare again
					len = prefix[len - 1]
				} else {
					// len == 0 and no match: the value here is 0, move on.
					// Without this branch, e.g. t = 'ababcabaa' at i = 1, len = 0,
					// i would never advance and the loop would run forever
					prefix[i] = 0
					i++
				}
			}
		}
		return moveTable(prefix)
	}

	// next[j] is the length of the longest proper prefix of pattern[0..j-1]
	// that is also a suffix of it. Before moveTable is called, prefix[i]
	// covers pattern[0..i], i.e. it includes the letter at i itself, so
	// every value is shifted right by one and next[0] is set to -1.
	function moveTable(prefix) {
		for (let i = prefix.length - 1; i > 0; i--) {
			prefix[i] = prefix[i - 1]
		}
		prefix[0] = -1
		return prefix
	}

	let next = getPrefixTable(pattern) // e.g. for pattern 'abcac': [-1, 0, 0, 0, 1]
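One way to convince yourself the fallback loop is right is to compare its output with the definition computed by brute force. The sketch below re-implements getPrefixTable + moveTable compactly as `nextTable` (a hypothetical name) so it runs on its own, and `nextBrute` (also hypothetical) applies the definition directly; the third pattern is the 'ababcabaa' from the comment in the code.

```javascript
// Compact version of getPrefixTable + moveTable.
function nextTable(pattern) {
  const prefix = new Array(pattern.length).fill(0)
  let len = 0
  let i = 1
  while (i < pattern.length) {
    if (pattern[i] === pattern[len]) {
      len++
      prefix[i] = len
      i++
    } else if (len > 0) {
      len = prefix[len - 1]
    } else {
      prefix[i] = 0
      i++
    }
  }
  return [-1, ...prefix.slice(0, -1)]
}

// Brute force: next[j] = length of the longest proper prefix of
// pattern[0..j-1] that is also a suffix of it; next[0] = -1.
function nextBrute(pattern) {
  const next = [-1]
  for (let j = 1; j < pattern.length; j++) {
    const s = pattern.slice(0, j)
    let best = 0
    for (let len = 1; len < s.length; len++) {
      if (s.slice(0, len) === s.slice(s.length - len)) best = len
    }
    next.push(best)
  }
  return next
}

for (const p of ["abcac", "abcxyzabcd", "ababcabaa"]) {
  console.log(p, nextTable(p).join(" "), nextTable(p).join(" ") === nextBrute(p).join(" "))
}
// abcac -1 0 0 0 1 true
// abcxyzabcd -1 0 0 0 0 0 0 1 2 3 true
// ababcabaa -1 0 0 1 2 0 1 2 3 true
```

The second line confirms next[9] = 3 for abcxyzabcd, the jump used in the x/d example earlier.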

OK, I know this is probably confusing. The branch of the while loop where the characters are equal is easy to follow; the branch where they are not equal is the confusing part. For that, go straight to Huang Haojie's explanation on YouTube: of all the videos and articles I went through, it is the easiest to understand.

References:

Wang Zhuo: KMP algorithm
Huang Haojie: KMP explanation on YouTube

Mr. Wang Zhuo's explanation of the BF algorithm is easy to follow, and so is his explanation of the KMP principle (the key is understanding how KMP skips comparisons that cannot succeed). His derivation of the next table, i.e. the array of positions j backtracks to, is harder to follow, though. He also indexes from 1, while arrays usually start at 0, so his values need to be converted.
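The index conversion can be written down directly. The sketch below assumes Mr. Wang's next table follows the common textbook 1-based convention (strings indexed from 1, next[1] = 0, and otherwise next[j] equal to the longest prefix-suffix length of the first j - 1 characters plus 1); if his definition differs, the helpers would need adjusting. Under that assumption each 1-based value is the 0-based value at the previous index plus 1, and indices into str and pattern shift by one as well. `toOneBased` and `toZeroBased` are hypothetical names.

```javascript
// Sketch, assuming the 1-based convention described above.
// next1[j] = next0[j - 1] + 1 for j = 1..n; slot 0 is unused (null).
function toOneBased(next0) {
  return [null, ...next0.map(v => v + 1)]
}

// Inverse: drop the unused slot and subtract 1.
function toZeroBased(next1) {
  return next1.slice(1).map(v => v - 1)
}

const next0 = [-1, 0, 0, 0, 1]                        // 0-based next table of "abcac"
console.log(toOneBased(next0).slice(1).join(" "))     // 0 1 1 1 2
console.log(toZeroBased(toOneBased(next0)).join(" ")) // -1 0 0 0 1
```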

I suggest watching Mr. Wang's explanation of the principle first, and then Huang Haojie's explanation of how the next table is built.


Posted by imurkid on Sat, 18 Jan 2020 01:21:43 -0800