POJ2778 DNA Sequence (AC automaton + fast matrix exponentiation)

Problem link:
http://poj.org/problem?id=2778

Problem statement:
Given the DNA segments of m diseases, count how many DNA sequences of length n contain none of these segments as a substring. Sequences consist only of the characters A, T, C and G, and the answer is taken modulo 100000.

Constraints:
0 ≤ m ≤ 10, 1 ≤ n ≤ 2,000,000,000, and each disease segment has length at most 10.

Explanation:
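Prerequisite:
In a (directed or undirected) graph with adjacency matrix A, the number of walks of length n from u to v (vertices may repeat) equals entry [u][v] of the nth power A^n.
A detailed proof was linked in the original post. In short, it follows from the multiplication and addition principles: every walk of length n from u to v is a walk of length n-1 from u to some vertex w followed by an edge w→v, so A^n[u][v] = ∑_w A^(n-1)[u][w]·A[w][v], which is exactly one step of matrix multiplication; induction on n finishes the proof.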
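A minimal sketch of this prerequisite in C++, assuming a dense matrix stored as nested `std::vector` and the same modulus 100000 as the problem. The names `Matrix`, `multiply`, `power` and `countWalks` are hypothetical helpers for illustration, not part of the solution below. `power` uses binary exponentiation, so A^n costs O(k^3 log n) for a k-state graph instead of n separate multiplications.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration: dense square matrix, entries mod MOD.
using Matrix = std::vector<std::vector<std::int64_t>>;
const std::int64_t MOD = 100000;  // same modulus as the problem

// C = A * B (mod MOD); both operands must be square and of equal size.
Matrix multiply(const Matrix& A, const Matrix& B) {
    assert(A.size() == B.size());
    std::size_t k = A.size();
    Matrix C(k, std::vector<std::int64_t>(k, 0));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t t = 0; t < k; ++t) {
            if (A[i][t] == 0) continue;  // zero term contributes nothing
            for (std::size_t j = 0; j < k; ++j)
                C[i][j] = (C[i][j] + A[i][t] * B[t][j]) % MOD;
        }
    return C;
}

// A^e by binary exponentiation: O(k^3 log e) instead of e multiplications.
Matrix power(Matrix A, std::uint64_t e) {
    std::size_t k = A.size();
    Matrix R(k, std::vector<std::int64_t>(k, 0));
    for (std::size_t i = 0; i < k; ++i) R[i][i] = 1;  // identity = A^0
    while (e > 0) {
        if (e & 1) R = multiply(R, A);  // include this power of two
        A = multiply(A, A);             // A, A^2, A^4, ...
        e >>= 1;
    }
    return R;
}

// Number of walks of length len from u to v, modulo MOD.
std::int64_t countWalks(const Matrix& adj, int u, int v, std::uint64_t len) {
    return power(adj, len)[u][v];
}
```

For the directed 3-cycle 0→1→2→0, `countWalks(cycle, 0, 0, 3)` is 1 and `countWalks(cycle, 0, 0, 2)` is 0; for the all-ones 2×2 matrix every entry of A^n is 2^(n-1), e.g. 512 for n = 10.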

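For this problem, first build the AC automaton over all disease segments and mark the dangerous states.
A state is dangerous if it ends a disease segment, or if its fail link points to a dangerous state, because then some suffix of the text read so far is a disease segment. Fail links always point to shallower states, so the marker can be passed down during the same BFS that computes the fail links.
All dangerous states are then removed.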
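To see why the marker must be propagated, here is a minimal sketch of the BFS over fail links, assuming the same A/G/C/T → 0..3 mapping as the code below; the `Automaton` struct and its `walk` helper are hypothetical. With the patterns ACG and C, the trie state for "AC" ends no pattern by itself, but its fail link points to the state for "C", so it must be marked dangerous as well.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical minimal AC automaton over {A,G,C,T}, for illustration only.
struct Automaton {
    std::vector<std::array<int, 4>> next;  // goto table, completed by build()
    std::vector<int> fail;                 // fail links
    std::vector<bool> danger;              // true if some suffix is a pattern

    int newState() {
        next.push_back({0, 0, 0, 0});  // 0 means "no edge yet" (0 is the root)
        fail.push_back(0);
        danger.push_back(false);
        return static_cast<int>(next.size()) - 1;
    }
    Automaton() { newState(); }  // state 0 is the root

    static int code(char ch) {
        switch (ch) {
            case 'A': return 0;
            case 'G': return 1;
            case 'C': return 2;
            default:  return 3;  // 'T' (input assumed valid)
        }
    }

    void insert(const std::string& s) {
        int cur = 0;
        for (char ch : s) {
            int c = code(ch);
            if (next[cur][c] == 0) {
                int t = newState();  // create first: newState() may reallocate
                next[cur][c] = t;
            }
            cur = next[cur][c];
        }
        danger[cur] = true;  // this state spells a whole pattern
    }

    // BFS: compute fail links, fill missing edges, propagate the danger flag.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 4; ++c)
            if (next[0][c]) q.push(next[0][c]);  // depth-1 states fail to root
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int c = 0; c < 4; ++c) {
                int v = next[u][c];
                if (v == 0) { next[u][c] = next[fail[u]][c]; continue; }
                fail[v] = next[fail[u]][c];
                // fail[v] is shallower, so its flag is already final here
                if (danger[fail[v]]) danger[v] = true;
                q.push(v);
            }
        }
    }

    // State reached from the root after reading s.
    int walk(const std::string& s) const {
        int cur = 0;
        for (char ch : s) cur = next[cur][code(ch)];
        return cur;
    }
};
```

Without the `danger[fail[v]]` line, the state for "AC" would stay safe and strings such as "ACA" (which contain "C") would wrongly be counted.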

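After all missing transitions are filled in, the AC automaton becomes a directed graph in which every state has exactly four outgoing edges, one per character. It is generally not a DAG: the filled-in edges create cycles, for example the root loops back to itself on every character that does not start any pattern.
Keep only the safe (non-dangerous) states and build the adjacency matrix M of that subgraph, where M[u][v] is the number of characters leading from u to v.
Raise M to the nth power with fast exponentiation.
Every valid sequence of length n corresponds to exactly one walk of length n from the root that never leaves the safe states, so the answer is the number of such walks summed over all end states:
ans = ∑_v M^n[0][v] (mod 100000)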
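A worked check on a small case: for the patterns AA, AC, AG and AT, the only safe states are the root and the state for "A". The root has three edges back to itself (C, G, T) and one edge to "A", while every edge out of "A" reaches a dangerous state, so M = [[3,1],[0,0]], M^2 = [[9,3],[0,0]], M^3 = [[27,9],[0,0]], and the row-0 sum for n = 3 is 27 + 9 = 36. A brute-force counter is a convenient way to cross-check such results on tiny inputs; `bruteForce` below is a hypothetical helper written for this purpose, not part of the original solution.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical brute-force reference, only usable for tiny n:
// counts strings of length n over {A,C,G,T} that contain no pattern.
long long bruteForce(const std::vector<std::string>& patterns, int n) {
    const char alphabet[4] = {'A', 'C', 'G', 'T'};
    long long total = 1;
    for (int i = 0; i < n; ++i) total *= 4;  // 4^n candidate strings
    long long good = 0;
    std::string s(n, 'A');
    for (long long id = 0; id < total; ++id) {
        long long x = id;
        for (int i = 0; i < n; ++i) {  // decode id as n base-4 digits
            s[i] = alphabet[x % 4];
            x /= 4;
        }
        bool ok = true;
        for (const std::string& p : patterns)
            if (s.find(p) != std::string::npos) { ok = false; break; }
        if (ok) ++good;
    }
    return good;
}
```

Enumeration is only practical up to about n = 10 (4^10 = 1,048,576 strings), far below the real limit of 2·10^9, which is why the matrix method is needed.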

Code:

#include<cstdio>
#include<iostream>
#include<cstring>
#include<algorithm>
#include<queue>
#define LL long long
using namespace std;
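const int N=110;       // at most 10 patterns * 10 characters + root = 101 states
const int mod=100000;  // the answer is required modulo 100000
queue<int> Q;
// ch: goto table over {A,G,C,T}, fail: fail links, isword: 1 if the state is dangerous
int ch[N][4],fail[N],isword[N],num[130],n,m,tail,root,sz;
char s[N];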
struct Mat
{
    long long a[N][N];
    void init(){memset(a,0,sizeof(a));}
}ret,base;
// Multiply two matrices modulo mod; only states 0..sz are used
Mat operator*(const Mat &A,const Mat &B)
{
    Mat C; C.init();
    for(int i=0;i<=sz;i++)
    for(int j=0;j<=sz;j++)
    for(int k=0;k<=sz;k++)
    C.a[i][j]=(C.a[i][j]+1LL*A.a[i][k]*B.a[k][j])%mod;
    return C;
}
void init()
{
    memset(fail,0,sizeof(fail));
    memset(isword,0,sizeof(isword));
    memset(ch,0,sizeof(ch));
    tail=0; root=0; ret.init(); base.init();
    while(!Q.empty()) Q.pop();
}
void insert()
{
    int len=strlen(s); int tmp=root;
    for(int i=0;i<len;i++)
    {
        int c=num[(unsigned char)s[i]]; // map A/G/C/T to 0..3
        if(!ch[tmp][c]){ch[tmp][c]=++tail;}
        tmp=ch[tmp][c];
    }
    isword[tmp]=1; // this state ends a disease segment
}
void getfail()
{
    for(int i=0;i<4;i++) 
    if(ch[root][i]) {fail[ch[root][i]]=root; Q.push(ch[root][i]);}
    while(!Q.empty())
    {
        int top=Q.front(); Q.pop();
        for(int i=0;i<4;i++)
        {
            // missing edge: reuse the fail state's transition (completes the automaton)
            if(!ch[top][i]) {ch[top][i]=ch[fail[top]][i]; continue;}
            int u=ch[top][i]; 
            fail[u]=ch[fail[top]][i]; 
            // a suffix of u's string is a disease segment, so u is dangerous too
            if(isword[fail[u]]) isword[u]=1;
            Q.push(u);
        }
    }
}
int main()
{
    num['A']=0; num['G']=1; num['C']=2; num['T']=3;
    while(~scanf("%d%d",&m,&n))
    {
        init();
        for(int i=1;i<=m;i++) {scanf("%s",s); insert();}
        getfail();
        // base.a[u][v] = number of characters leading from safe state u to safe state v
        for(int i=0;i<=tail;i++)
        {
            if(isword[i]) continue;
            for(int c=0;c<4;c++)  
            {
                if(isword[ch[i][c]]) continue;
                base.a[i][ch[i][c]]++;
            }
        }
        for(int i=0;i<=tail;i++) ret.a[i][i]=1; // ret starts as the identity matrix
        sz=tail;
        for(int j=n;j;j>>=1) // binary exponentiation: ret = base^n
        {
            if(j&1) ret=ret*base;
            base=base*base;
        }
        int ans=0;
        for(int i=0;i<=tail;i++) ans=(ans+ret.a[0][i])%mod; // walks of length n from the root
        printf("%d\n",ans);
    }
    return 0;
}

Posted by el_quijote on Thu, 14 May 2020 08:04:07 -0700