Given two strings A and B, find the longest common subsequence of A and B (the subsequence does not need to be contiguous).
For example, take the two strings:
abcicba
abdkscab
ab is a common subsequence of the two strings, and so are abc and abca. abca is a longest common subsequence of the two strings.
Input
Line 1: string A. Line 2: string B. (The lengths of A and B are at most 1000.)
Output
Output a longest common subsequence. If there is more than one, output any one of them.
Input example
abcicba
abdkscab
Output example
abca
Problem definition
Subsequence
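- A subsequence of X is obtained by deleting zero or more elements of X without changing the order of the remaining elements.
- X = (A, B, C, B, D, B)
- Z = (B, C, D, B) is a subsequence of X.
- W = (B, D, A) is not a subsequence of X.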
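The definition above can be checked mechanically. The following is a minimal sketch, assuming sequences are stored as std::string; the function name is_subsequence is hypothetical and does not come from the original text. It scans x once and advances through z whenever the next needed element is found.

```cpp
#include <cstddef>
#include <string>

// Returns true if z can be obtained from x by deleting zero or more
// characters without reordering the rest.
// (is_subsequence is an illustrative name, not part of the original code.)
bool is_subsequence(const std::string& z, const std::string& x) {
    std::size_t k = 0;  // number of characters of z matched so far
    for (std::size_t i = 0; i < x.size() && k < z.size(); ++i) {
        if (x[i] == z[k]) {
            ++k;  // x[i] supplies the next character of z
        }
    }
    return k == z.size();
}
```

With X = "ABCBDB", the check accepts Z = "BCDB" and rejects W = "BDA", matching the two cases listed above.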
Common subsequences
- Z is a common subsequence of sequences X and Y if Z is a subsequence of X and also a subsequence of Y.
Longest common subsequence (LCS) problem
Input: X = (x1, x2, ..., xm), Y = (y1, y2, ..., yn)
Output: Z = a longest common subsequence of X and Y
Structural analysis of longest common subsequence
Prefix
Let X = (x1, x2, ..., xn) be a sequence. The i-th prefix of X, written X_i, is the sequence X_i = (x1, ..., xi).
Example: X = (A, B, D, C, A), X_1 = (A), X_2 = (A, B), X_3 = (A, B, D).
Optimal substructure
Theorem 1 (optimal substructure). Let X = (x1, ..., xm) and Y = (y1, ..., yn) be two sequences, and let Z = (z1, ..., zk) be an LCS of X and Y. Then:
(1) If xm = yn, then zk = xm = yn, and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}; that is, LCS(X, Y) = LCS(X_{m-1}, Y_{n-1}) + <xm = yn>.
(2) If xm ≠ yn and zk ≠ xm, then Z is an LCS of X_{m-1} and Y; that is, LCS(X, Y) = LCS(X_{m-1}, Y).
(3) If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Y_{n-1}; that is, LCS(X, Y) = LCS(X, Y_{n-1}).
Proof:
(1) Let X = <x1, ..., x_{m-1}, xm> and Y = <y1, ..., y_{n-1}, yn> with xm = yn. We show that LCS(X, Y) = LCS(X_{m-1}, Y_{n-1}) + <xm = yn>.
Suppose zk ≠ xm. Then appending xm = yn to Z gives a common subsequence of X and Y of length k+1, which contradicts the assumption that Z is an LCS of X and Y. Therefore zk = xm = yn.
Now we prove that Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}. Clearly Z_{k-1} is a common subsequence of X_{m-1} and Y_{n-1}, so we only need to show that it is a longest one. Suppose it is not. Then there is a common subsequence W of X_{m-1} and Y_{n-1} whose length is greater than k-1. Appending xm = yn to W gives a common subsequence of X and Y whose length is greater than k, which contradicts Z being an LCS. Hence Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
(2) Let X = <x1, ..., x_{m-1}, xm> and Y = <y1, ..., y_{n-1}, yn> with xm ≠ yn and zk ≠ xm. We show that LCS(X, Y) = LCS(X_{m-1}, Y).
Because zk ≠ xm, Z is a common subsequence of X_{m-1} and Y. We prove that Z is an LCS of X_{m-1} and Y. Suppose X_{m-1} and Y have a common subsequence W whose length is greater than k. Then W is also a common subsequence of X and Y, which contradicts Z being an LCS of X and Y.
(3) The proof is symmetric to (2).
The optimal substructure of the LCS of X and Y is therefore:
LCS(X, Y) = LCS(X_{m-1}, Y_{n-1}) + <xm = yn>   if xm = yn
LCS(X, Y) = LCS(X_{m-1}, Y)   if xm ≠ yn and zk ≠ xm
LCS(X, Y) = LCS(X, Y_{n-1})   if xm ≠ yn and zk ≠ yn
Recurrence for the LCS length
* C[i, j] = the length of an LCS of X_i and Y_j
Recurrence:
C[i, j] = 0   if i = 0 or j = 0
C[i, j] = C[i-1, j-1] + 1   if i, j > 0 and xi = yj
C[i, j] = max(C[i, j-1], C[i-1, j])   if i, j > 0 and xi ≠ yj
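The recurrence can be transcribed almost literally as a recursive function. The sketch below is one possible top-down version with memoization, not the bottom-up method this text develops; the names lcs_rec and lcs_length_memo are hypothetical, and -1 is used as an assumed "not yet computed" marker.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Computes C[i, j] for the prefixes X_i and Y_j, caching results in memo.
// (lcs_rec and lcs_length_memo are illustrative names.)
static int lcs_rec(const std::string& x, const std::string& y, int i, int j,
                   std::vector<std::vector<int>>& memo) {
    if (i == 0 || j == 0) {
        return 0;  // C[i, j] = 0 if i = 0 or j = 0
    }
    int& cell = memo[i][j];
    if (cell != -1) {
        return cell;  // already computed
    }
    if (x[i - 1] == y[j - 1]) {
        // xi = yj: extend an LCS of X_{i-1} and Y_{j-1}
        cell = lcs_rec(x, y, i - 1, j - 1, memo) + 1;
    } else {
        // xi != yj: drop the last element of one of the two prefixes
        cell = std::max(lcs_rec(x, y, i, j - 1, memo),
                        lcs_rec(x, y, i - 1, j, memo));
    }
    return cell;
}

int lcs_length_memo(const std::string& x, const std::string& y) {
    std::vector<std::vector<int>> memo(
        x.size() + 1, std::vector<int>(y.size() + 1, -1));
    return lcs_rec(x, y, static_cast<int>(x.size()),
                   static_cast<int>(y.size()), memo);
}
```

For the strings of the problem statement, lcs_length_memo("abcicba", "abdkscab") returns 4, the length of abca. Each table entry is computed at most once, so the running time is O(mn), the same as the bottom-up computation.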
Computing the LCS length bottom-up
Algorithm for computing the LCS length
- Data structures
C[0:m, 0:n]: C[i, j] is the length of an LCS of X_i and Y_j.
B[1:m, 1:n]: B[i, j] is a pointer to the entry of table C that holds the optimal solution of the subproblem chosen when computing C[i, j].
LCS-Length(X, Y)
m←length(X); n←length(Y);
For i←0 To m Do C[i,0]←0;
For j←0 To n Do C[0,j]←0;
For i←1 To m Do
    For j←1 To n Do
        If xi = yj
            Then C[i,j]←C[i-1,j-1]+1; B[i,j]←"↖";
        Else If C[i-1,j]≥C[i,j-1]
            Then C[i,j]←C[i-1,j]; B[i,j]←"↑";
        Else C[i,j]←C[i,j-1]; B[i,j]←"←";
Return C and B.
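Constructing an optimal solution
Basic idea
- Starting from B[m, n], follow the pointers.
- If B[i, j] = "↖", then xi = yj is an element of the LCS.
- The search runs from the end of the sequences toward the beginning, so the elements are found in reverse order. The elements found this way form an LCS of X and Y.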
Print-LCS(B, X, i, j)
IF i=0 or j=0 THEN Return;
IF B[i, j]="↖"
    THEN Print-LCS(B, X, i-1, j-1);
         Print xi;
ELSE If B[i, j]="↑"
    THEN Print-LCS(B, X, i-1, j);
ELSE Print-LCS(B, X, i, j-1).
Calling Print-LCS(B, X, length(X), length(Y)) prints an LCS of X and Y.
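The same traceback can also be done without the B table, because each pointer can be re-derived from the neighbouring entries of C. The following is a sketch of that variation, not the method used in the listings of this text; the function name lcs_string is hypothetical, and it uses std::vector instead of fixed-size arrays.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Builds the length table C bottom-up, then walks back from C[m][n]
// to collect one LCS. (lcs_string is an illustrative name.)
std::string lcs_string(const std::string& x, const std::string& y) {
    const std::size_t m = x.size();
    const std::size_t n = y.size();
    std::vector<std::vector<int>> c(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (x[i - 1] == y[j - 1]) {
                c[i][j] = c[i - 1][j - 1] + 1;
            } else {
                c[i][j] = std::max(c[i - 1][j], c[i][j - 1]);
            }
        }
    }

    std::string out;  // LCS elements, collected back to front
    std::size_t i = m;
    std::size_t j = n;
    while (i > 0 && j > 0) {
        if (x[i - 1] == y[j - 1]) {
            out.push_back(x[i - 1]);  // plays the role of the "↖" pointer
            --i;
            --j;
        } else if (c[i - 1][j] >= c[i][j - 1]) {
            --i;  // plays the role of the "↑" pointer
        } else {
            --j;  // plays the role of the "←" pointer
        }
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

The loop takes at most m+n steps, the same bound as Print-LCS. When several longest common subsequences exist, which one is returned depends on the tie-breaking rule in the else-if branch.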
/* Function: calculate the optimal value
 * Parameters:
 *   x: string x        X: maximum length of string x
 *   y: string y        Y: maximum length of string y
 *   b: tag array (1 = diagonal, 2 = up, 3 = left)
 *   xlen: length of string x
 *   ylen: length of string y
 * Return value: length of the longest common subsequence
 */
int Lcs_Length(string x, string y, int b[][Y+1], int xlen, int ylen)
{
    int i = 0;
    int j = 0;

    // static: a (X+1) x (Y+1) int table is too large to put on the stack
    static int c[X+1][Y+1];
    for (i = 0; i <= xlen; i++)
    {
        c[i][0] = 0;
    }
    for (i = 0; i <= ylen; i++)
    {
        c[0][i] = 0;
    }
    for (i = 1; i <= xlen; i++)
    {
        for (j = 1; j <= ylen; j++)
        {
            if (x[i - 1] == y[j - 1])
            {
                c[i][j] = c[i-1][j-1] + 1;
                b[i][j] = 1;
            }
            else if (c[i-1][j] > c[i][j-1])
            {
                c[i][j] = c[i-1][j];
                b[i][j] = 2;
            }
            else
            {
                c[i][j] = c[i][j-1];
                b[i][j] = 3;
            }
        }
    }

    cout << "The table of optimal values is as follows:" << endl;
    for (i = 1; i <= xlen; i++)
    {
        for (j = 1; j <= ylen; j++)   // <= so that the last column is printed too
        {
            cout << c[i][j] << " ";
        }
        cout << endl;
    }

    return c[xlen][ylen];
}
Complete code
// Prints only one longest common subsequence
#include <iostream>
#include <string>
using namespace std;

const int X = 1000, Y = 1000;  // Maximum lengths of the strings
char result[X + 1];            // Holds the resulting subsequence
int c[X + 1][Y + 1];           // c[i][j]: LCS length of the first i chars of x and the first j chars of y
int b[X + 1][Y + 1];           // b[i][j]: tag (1 = diagonal, 2 = up, 3 = left)

/* Function: calculate the optimal value
 * Parameters:
 *   x: string x
 *   y: string y
 *   b: tag array
 *   xlen: length of string x
 *   ylen: length of string y
 * Return value: length of the longest common subsequence
 */
int Lcs_Length(const string& x, const string& y, int b[][Y + 1], int xlen, int ylen)
{
    for (int i = 0; i <= xlen; i++)
    {
        c[i][0] = 0;
    }
    for (int j = 0; j <= ylen; j++)
    {
        c[0][j] = 0;
    }
    for (int i = 1; i <= xlen; i++)
    {
        for (int j = 1; j <= ylen; j++)
        {
            if (x[i - 1] == y[j - 1])
            {
                c[i][j] = c[i - 1][j - 1] + 1;
                b[i][j] = 1;
            }
            else if (c[i - 1][j] > c[i][j - 1])
            {
                c[i][j] = c[i - 1][j];
                b[i][j] = 2;
            }
            else
            {
                c[i][j] = c[i][j - 1];
                b[i][j] = 3;
            }
        }
    }
    /* Uncomment to print the table of optimal values:
    for (int i = 1; i <= xlen; i++)
    {
        for (int j = 1; j <= ylen; j++)
        {
            cout << c[i][j] << " ";
        }
        cout << endl;
    }
    */
    return c[xlen][ylen];
}

/* Follows the tags back from b[i][j] and fills result[] from the end.
 * current_Len is the number of LCS characters still to be placed.
 */
void Display_Lcs(int i, int j, const string& x, int b[][Y + 1], int current_Len)
{
    if (i == 0 || j == 0)
    {
        return;
    }
    if (b[i][j] == 1)
    {
        current_Len--;
        result[current_Len] = x[i - 1];
        Display_Lcs(i - 1, j - 1, x, b, current_Len);
    }
    else if (b[i][j] == 2)
    {
        Display_Lcs(i - 1, j, x, b, current_Len);
    }
    else
    {
        Display_Lcs(i, j - 1, x, b, current_Len);
    }
}

int main()
{
    string x;
    string y;
    cin >> x >> y;
    int xlen = static_cast<int>(x.length());
    int ylen = static_cast<int>(y.length());

    int lcs_max_len = Lcs_Length(x, y, b, xlen, ylen);
    Display_Lcs(xlen, ylen, x, b, lcs_max_len);

    // Print the result
    for (int i = 0; i < lcs_max_len; i++)
    {
        cout << result[i];
    }
    cout << endl;
    return 0;
}
Algorithmic complexity:
* Time complexity
- Time to compute the table: the (i, j) double loop runs m steps of i, each with n steps of j, so O(mn)
- Time to construct the optimal solution: O(m+n)
- Total time complexity: O(mn)
* Space complexity
- Arrays C and B are used
- Space required: O(mn)
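If only the length of the LCS is needed, the O(mn) space can be reduced, because row i of C depends only on row i-1. The sketch below keeps two rows and is a variation on the algorithm above, not part of the original listings; the function name lcs_length_two_rows is hypothetical. Since table B is not kept, this version cannot print the subsequence itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Computes only the LCS length, keeping two rows of the table C.
// (lcs_length_two_rows is an illustrative name.)
int lcs_length_two_rows(const std::string& a, const std::string& b) {
    // Put the shorter string along the columns so each row is as short as possible.
    const std::string& x = (a.size() >= b.size()) ? a : b;  // rows
    const std::string& y = (a.size() >= b.size()) ? b : a;  // columns

    std::vector<int> prev(y.size() + 1, 0);  // row i-1 of C
    std::vector<int> cur(y.size() + 1, 0);   // row i of C
    for (std::size_t i = 1; i <= x.size(); ++i) {
        for (std::size_t j = 1; j <= y.size(); ++j) {
            if (x[i - 1] == y[j - 1]) {
                cur[j] = prev[j - 1] + 1;
            } else {
                cur[j] = std::max(prev[j], cur[j - 1]);
            }
        }
        std::swap(prev, cur);  // the finished row becomes "row i-1"
    }
    return prev[y.size()];
}
```

The running time stays O(mn), while the extra space drops to O(min(m, n)).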