Detailed explanation of the AC (Aho-Corasick) automaton (multi-pattern string matching with mismatch pointers)

Keywords: C Algorithm acm

The AC (Aho-Corasick) automaton is an algorithm for matching multiple pattern strings at once. As we know, KMP cleverly solves single-pattern matching (asking whether one pattern string occurs in a text). Now we want multi-pattern matching (asking which of several pattern strings occur in a text), and KMP alone is clearly not enough. We need a new algorithm that inherits the idea of KMP and is built on top of a data structure: the trie (dictionary tree).
A trie is a data structure that breaks strings into single characters and stores them along the paths of a tree, so that strings with a common prefix share a path. For example, given the four strings
{"ab", "ac", "ba", "cba"}, we can build the following tree to store them: the root has three children a, b and c; a has children b and c; b has child a; and c has child b, which in turn has child a.

PS: since the trie is prerequisite knowledge, I won't go into it further here.
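For readers who want a quick reminder anyway, here is a minimal sketch of how such a trie can be stored. The names `Trie`, `insert` and `contains` are hypothetical and chosen only for illustration, and the sketch assumes the strings contain lowercase letters only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration type; assumes lowercase 'a'..'z' input.
struct Trie {
    struct Node {
        int next[26];  // child index per letter, 0 means "no child"
        bool isEnd;    // true if some inserted string ends at this node
        Node() : isEnd(false) { for (int i = 0; i < 26; ++i) next[i] = 0; }
    };
    std::vector<Node> nodes;

    Trie() : nodes(1) {}  // node 0 is the root

    void insert(const std::string& word) {
        int now = 0;
        for (char c : word) {
            assert(c >= 'a' && c <= 'z');  // assumption: lowercase only
            int k = c - 'a';
            if (!nodes[now].next[k]) {
                // record the index of the new node, then create it
                nodes[now].next[k] = (int)nodes.size();
                nodes.push_back(Node());
            }
            now = nodes[now].next[k];
        }
        nodes[now].isEnd = true;
    }

    bool contains(const std::string& word) const {
        int now = 0;
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            now = nodes[now].next[c - 'a'];
            if (!now) return false;  // fell off the tree
        }
        return nodes[now].isEnd;
    }
};
```

With the four strings above inserted, the trie has nine nodes including the root. `contains("cba")` is true, while `contains("cb")` is false because cb is only a prefix of a stored string.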

Next, how do we match these strings against a text such as cbba? We could restart a search from the root of the tree at every position of the text, but that wastes a lot of time. Therefore we introduce the concept of the mismatch pointer (also called the failure pointer). For example, when scanning cbba we first follow c and b down the rightmost subtree, but the next character of the text is b, which does not match a, the only child of that node. Starting over from scratch would be wasteful. Instead, we notice that the middle subtree starts with b, which is a suffix of the cb we have already matched, so we can jump to the b node of the middle subtree and continue from there. In general, the mismatch pointer of a node points to the node whose string (read from the root) is the longest proper suffix of the current node's string that also exists in the trie. On a mismatch we follow this pointer and keep matching, which avoids redundant work as far as possible. Since the mismatch pointer of a node is derived from the mismatch pointers of shallower nodes, we compute them level by level with a queue (BFS). For the trie above this gives: the depth-1 nodes a, b and c point to the root, ab points to b, ac points to c, ba points to a, cb points to b, and cba points to ba.
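To make the jump concrete, the following is a minimal sketch that builds the mismatch pointers with a queue and then scans a text. It is a compact illustration under stated assumptions, not the template used for the problem further down: `MiniAC`, `addNode`, `wordId` and `search` are hypothetical names, and the input is assumed to contain lowercase letters only.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical illustration type; assumes lowercase 'a'..'z' input.
struct MiniAC {
    std::vector<std::vector<int>> next;  // next[u][k]: child of u on letter k, 0 if absent
    std::vector<int> fail;               // fail[u]: mismatch pointer of u
    std::vector<int> wordId;             // wordId[u]: id of the pattern ending at u, or -1

    MiniAC() { addNode(); }  // node 0 is the root

    int addNode() {
        next.push_back(std::vector<int>(26, 0));
        fail.push_back(0);
        wordId.push_back(-1);
        return (int)next.size() - 1;
    }

    void insert(const std::string& w, int id) {
        int now = 0;
        for (char c : w) {
            assert(c >= 'a' && c <= 'z');  // assumption: lowercase only
            int k = c - 'a';
            if (!next[now][k]) { int v = addNode(); next[now][k] = v; }
            now = next[now][k];
        }
        wordId[now] = id;
    }

    // BFS: depth-1 nodes keep fail = 0 (root); deeper nodes climb the parent's chain.
    void build() {
        std::queue<int> q;
        for (int k = 0; k < 26; ++k)
            if (next[0][k]) q.push(next[0][k]);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int k = 0; k < 26; ++k) {
                int v = next[u][k];
                if (!v) continue;
                int f = fail[u];
                while (f && !next[f][k]) f = fail[f];
                fail[v] = next[f][k];  // 0 (root) if no suffix has a k-child
                q.push(v);
            }
        }
    }

    // Returns the ids of all pattern occurrences, in the order their ends are reached.
    std::vector<int> search(const std::string& text) const {
        std::vector<int> found;
        int now = 0;
        for (char c : text) {
            assert(c >= 'a' && c <= 'z');
            int k = c - 'a';
            while (now && !next[now][k]) now = fail[now];  // jump on mismatch
            now = next[now][k];
            for (int u = now; u; u = fail[u])              // report every suffix that is a pattern
                if (wordId[u] != -1) found.push_back(wordId[u]);
        }
        return found;
    }
};
```

If the patterns are inserted in the order ab, ac, ba, cba with ids 0 to 3 and `build()` is called, `search("cbba")` reports only id 2 (ba), and `search("cba")` reports ids 3 and 2 (cba, then ba through the mismatch pointer).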

Here is a template problem and the accepted (AC) code:
Problem link: hdu2222 Keywords Search
AC code:

#include<bits/stdc++.h>

using namespace std;

#define MAX 1000009
#define TOTAL 500009 

struct Aho{
	struct state{
		int next[26];
		int fail,cnt;
	}stateTable[TOTAL];
	int size; 
	queue<int>q;
	void init(){
		for(int i=0;i<TOTAL;i++){
			memset(stateTable[i].next,0,sizeof stateTable[i].next);
			stateTable[i].fail=stateTable[i].cnt=0;
		}
		size = 1;
	}
	void insert(char *S){
		int n = strlen(S);
		int now = 0;
		for(int i=0;i<n;++i){
			char c = S[i];
			if(!stateTable[now].next[c-'a'])
			    stateTable[now].next[c-'a'] = size++;
			now = stateTable[now].next[c-'a'];
		}
		stateTable[now].cnt++;
	}
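	// Compute the mismatch (fail) pointers level by level with a BFS over the trie.
	void build(){
		stateTable[0].fail = -1; // the root has no fallback; -1 means "fell off the root"
		q.push(0);

		while(q.size()){
			int u = q.front();
			q.pop();
			for(int i=0;i<26;++i){
				if(stateTable[u].next[i]){
					// children of the root always fall back to the root
					if(u == 0) stateTable[stateTable[u].next[i]].fail = 0;
					else{
						// climb the parent's fail chain until a node with an i-child is found
						int v = stateTable[u].fail;
						while(v!=-1){
							if(stateTable[v].next[i]){
								stateTable[stateTable[u].next[i]].fail = stateTable[v].next[i];
								break;
							}
							v = stateTable[v].fail;
						}
						// nothing found: fall back to the root
						if(v == -1) stateTable[stateTable[u].next[i]].fail = 0;
					}
					q.push(stateTable[u].next[i]);
				}
			}
		}
	}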
	
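	// Walk the fail chain from u and collect every pattern that ends on it.
	// cnt = -1 marks a node as already collected, so each node is visited at most once.
	int Get(int u){
		int res = 0;
		while(u && stateTable[u].cnt != -1){
			res = res + stateTable[u].cnt;
			stateTable[u].cnt = -1;
			u = stateTable[u].fail;
		}
		return res;
	}
	
	int match(char *S){
		int n = strlen(S);
		int res = 0, now = 0;
		for(int i=0;i<n;++i){
			char c =S[i];
			if(stateTable[now].next[c-'a']){
			    now = stateTable[now].next[c-'a'];
			}
			else{
				// mismatch: follow fail pointers until some state has a c-child
				int p = stateTable[now].fail;
				while(p!=-1&&stateTable[p].next[c-'a']==0) p = stateTable[p].fail;
				if(p==-1) now = 0;
				else now = stateTable[p].next[c-'a'];
			}
			// Always collect along the fail chain: a shorter pattern may end here even
			// if no pattern ends at now itself (e.g. patterns "abcd" and "bc", text "abc").
			res += Get(now);
		}
		return res;
	}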
}aho;

char S[MAX];

int main(){
	int T;
	scanf("%d",&T);
	while(T--){
		aho.init();
		int n;scanf("%d",&n);
		for(int i=0;i<n;++i){
			scanf("%s",S);
			aho.insert(S);
		}
		aho.build();
		scanf("%s",S);
		printf("%d\n",aho.match(S));
	}
	
	
	return 0;
}
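When testing a template like this one, a brute-force cross-check on small inputs can be handy. The sketch below is not part of the solution above. `bruteForceCount` is a hypothetical helper, and it assumes the expected answer is the number of keywords that occur somewhere in the text, with duplicate keywords counted separately (which is what the `cnt++` in `insert` suggests).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: counts how many entries of `patterns` occur in `text`.
// Each entry is counted on its own, so duplicate patterns add up.
int bruteForceCount(const std::vector<std::string>& patterns, const std::string& text) {
    int res = 0;
    for (const std::string& p : patterns)
        if (text.find(p) != std::string::npos) ++res;
    return res;
}
```

A case worth trying is the pattern set abcd, bc with the text abc: bc occurs inside abc although no pattern ends exactly at the state reached after abc, so the expected answer is 1.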

That's all.

Posted by Hopps on Sat, 20 Nov 2021 14:12:33 -0800