String matching algorithms

Keywords: C++ Algorithm data structure leetcode string

28. Implement strStr()

Determine whether one string is a substring of another, and return its position.

The input is a parent string and a substring. The output is an integer giving the position of the first occurrence of the substring in the parent string, or -1 if the substring does not occur.

Input: haystack = "hello", needle = "ll"
Output: 2

Solution:

A simple way to solve this problem is brute-force matching. First, align the left end of the substring with the left end of the parent string, then compare the corresponding characters one by one. If a mismatch is found, move the starting position of the substring one position to the right in the parent string and reset the comparison pointer to the head of the substring. Repeat until a full match is found. If no alignment matches, return -1.

class Solution {
public:
    int strStr(string haystack, string needle) {
        int m = haystack.length(), n = needle.length();
        for(int i=0;i+n<=m;++i){
            bool flag = true;
            for(int j=0;j<n;++j){
                if(haystack[j+i]!=needle[j]){
                    flag = false;
                    break;
                }
            }
            if(flag){
                return i;
            }
        }
        return -1;
    }
};

In the method above, the comparison pointer backtracks after every mismatch, which raises the time complexity to O(mn) in the worst case. The KMP algorithm is one optimization. Its core idea is to find the longest prefix of the already matched part of the substring that is also a suffix of that part. On a mismatch, the substring is shifted so that this prefix lands where the suffix was, which avoids sending the comparison pointer back to the head of the substring.
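To make the cost of backtracking concrete, the sketch below counts the character comparisons performed by the brute-force approach. The helper name countBruteForceComparisons and the sample inputs are illustrative assumptions, not part of the original solution.

```cpp
#include <string>

// Hypothetical helper: runs the brute-force search and returns how many
// character comparisons it performed before stopping.
long long countBruteForceComparisons(const std::string& haystack,
                                     const std::string& needle) {
    const int m = static_cast<int>(haystack.length());
    const int n = static_cast<int>(needle.length());
    long long comparisons = 0;
    for (int i = 0; i + n <= m; ++i) {
        bool matched = true;
        for (int j = 0; j < n; ++j) {
            ++comparisons;
            if (haystack[i + j] != needle[j]) {
                matched = false;
                break;  // mismatch: the next alignment starts again at j = 0
            }
        }
        if (matched) {
            break;  // stop at the first full match, as strStr does
        }
    }
    return comparisons;
}
```

For a haystack of twenty 'a' characters and the needle "aaab", each of the 17 alignments compares all 4 characters before failing, so the function returns 68, which is (m - n + 1) * n. KMP never moves the parent-string pointer backward, so its search phase needs at most about 2m comparisons on the same input.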

In the example below, the parent string and the substring mismatch at the 8th character. With brute-force matching, the comparison pointer returns to the head of the substring, and the substring moves one position to the right before being compared again. With KMP, the matched part of the substring is ABCFABC, whose longest common prefix and suffix is ABC. The substring is moved so that its prefix ABC lines up with where the suffix ABC was, a shift of four positions in one step, and the comparison continues from the character after ABC instead of from the beginning.

Original string    Brute-force matching    KMP algorithm
ABCFABCFABCA       ABCFABCFABCA            ABCFABCFABCA
ABCFABCA           0ABCFABCA               0000ABCFABCA

Each 0 marks one position by which the substring has been shifted to the right.
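The shift of four positions can be checked directly. The following is a minimal sketch that finds the longest common prefix and suffix by trying every candidate length. It is deliberately naive and is not how KMP computes the value. The names longestBorder and shiftAfterMismatch are hypothetical.

```cpp
#include <string>

// Hypothetical helper: length of the longest proper prefix of s that is
// also a suffix of s, found by testing each length from longest to shortest.
int longestBorder(const std::string& s) {
    const int len = static_cast<int>(s.length());
    for (int k = len - 1; k > 0; --k) {
        if (s.compare(0, k, s, len - k, k) == 0) {
            return k;
        }
    }
    return 0;
}

// Hypothetical helper: how far the substring moves after a mismatch when
// its first `matched` characters agreed with the parent string.
// Assumes matched > 0; with no matched characters the shift is simply 1.
int shiftAfterMismatch(const std::string& needle, int matched) {
    return matched - longestBorder(needle.substr(0, matched));
}
```

With needle "ABCFABCA" and 7 matched characters, longestBorder("ABCFABC") is 3 (the prefix and suffix ABC), so shiftAfterMismatch returns 7 - 3 = 4.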

The key to the KMP algorithm is finding, for each prefix of the substring, its longest proper prefix that is also a suffix. This can be approached with dynamic programming:

State: build an array next, where next[i] is the index of the last character of the longest proper prefix of needle[0..i] that is also a suffix of needle[0..i] (that is, its length minus one), or -1 if there is none.

State transition: for position i, if the character after the current prefix equals needle[i], extend the common prefix and suffix by one. Otherwise, fall back through next to the next shorter candidate and try again.

Initial condition: a single character has no proper prefix or suffix, so next[0] = -1. The prefix pointer starts at -1, and the suffix pointer starts at 1 and traverses the substring.

// Calculate prefix table next
void getNext(string needle, vector<int>& next){
    int head = -1;
    next[0] = -1;
    for(int tail = 1; tail<needle.length();++tail){
        // If the next bit is different, go back until there is no prefix (head=-1)
        while(head>-1 && needle[head+1]!=needle[tail]){
            head = next[head];
        }
        // If the next bit is the same, update the same maximum prefix and maximum suffix length, and move the prefix pointer at the same time
        if(needle[head+1]==needle[tail]){
            ++head;
        }
        next[tail] = head;
    }
}
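As a quick way to inspect the routine, the sketch below returns the table by value so it can be compared against a hand calculation. The function name buildNext is an assumption. The logic follows the convention used in this article, where next[i] is an index and -1 means no common prefix and suffix.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of getNext that returns the table instead of
// filling an output parameter. next[i] is the index of the last character
// of the longest proper prefix of needle[0..i] that is also its suffix,
// or -1 if there is none.
std::vector<int> buildNext(const std::string& needle) {
    const int n = static_cast<int>(needle.length());
    std::vector<int> next(n, -1);
    int head = -1;  // last index of the currently matched prefix
    for (int tail = 1; tail < n; ++tail) {
        // On a mismatch, fall back to the next shorter candidate prefix.
        while (head > -1 && needle[head + 1] != needle[tail]) {
            head = next[head];
        }
        // On a match, extend the prefix by one character.
        if (needle[head + 1] == needle[tail]) {
            ++head;
        }
        next[tail] = head;
    }
    return next;
}
```

For example, buildNext("aabaaf") yields {-1, 0, -1, 0, 1, -1}: the prefix "aabaa" has the common prefix and suffix "aa", whose last character is at index 1.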

An example of calculating the prefix table for ABCFABCA:

next index    Partial substring    next[i] (index of the last element of the longest prefix)
0             A                    -1
1             AB                   -1
2             ABC                  -1
3             ABCF                 -1
4             ABCFA                0
5             ABCFAB               1
6             ABCFABC              2
7             ABCFABCA             0

class Solution {
public:
    // Calculate prefix table next
    void getNext(string needle, vector<int>& next){
        int head = -1;
        next[0] = -1;
        for(int tail=1;tail<needle.length();++tail){
            // If the next bit is different, go back until there is no prefix (head=-1)
            while(head>-1 && needle[head+1]!=needle[tail]){
                head = next[head];
            }
            if(needle[head+1]==needle[tail]){
                ++head;
            }
            next[tail] = head;
        }
    }

    int strStr(string haystack, string needle) {
        int cur = -1;
        int m = haystack.size(), n = needle.size();
        // If the substring is empty, return 0 
        if(n==0) return 0;
        // Get prefix table
        vector<int> next(n,-1);
        getNext(needle,next);
        for(int i=0;i<m;++i){
            while(cur>-1 && haystack[i]!=needle[cur+1]){
                cur = next[cur];
            }
            if(haystack[i]==needle[cur+1]){
                ++cur;
            }
            // cur has reached the end of needle, so i points to the last character of the match in haystack; return the leftmost position of the match
            if(cur == n-1){
                return i - cur;
            }
        }
        return -1;
    }
};
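One way to gain confidence in a KMP implementation is to compare it with std::string::find on a few inputs. The sketch below restates the same logic as a free function so that it is self-contained. The names kmpSearch and agreesWithStdFind are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function version of the KMP solution above.
int kmpSearch(const std::string& haystack, const std::string& needle) {
    const int m = static_cast<int>(haystack.size());
    const int n = static_cast<int>(needle.size());
    if (n == 0) return 0;
    // Build the prefix table (index convention, -1 means no prefix).
    std::vector<int> next(n, -1);
    for (int tail = 1, head = -1; tail < n; ++tail) {
        while (head > -1 && needle[head + 1] != needle[tail]) head = next[head];
        if (needle[head + 1] == needle[tail]) ++head;
        next[tail] = head;
    }
    // Scan the haystack; i never moves backward.
    int cur = -1;
    for (int i = 0; i < m; ++i) {
        while (cur > -1 && haystack[i] != needle[cur + 1]) cur = next[cur];
        if (haystack[i] == needle[cur + 1]) ++cur;
        if (cur == n - 1) return i - cur;
    }
    return -1;
}

// Hypothetical check: does kmpSearch agree with std::string::find?
bool agreesWithStdFind(const std::string& haystack, const std::string& needle) {
    const std::size_t pos = haystack.find(needle);
    const int expected =
        pos == std::string::npos ? -1 : static_cast<int>(pos);
    return kmpSearch(haystack, needle) == expected;
}
```

For the example in this article, kmpSearch("ABCFABCFABCA", "ABCFABCA") returns 4, which matches the four-position shift shown in the earlier table.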

Reference material

LeetCode 101: Easily Brush Questions with You (C++), Chapter 12: Troublesome Strings

Posted by jlpulido on Thu, 04 Nov 2021 21:04:14 -0700