WOJ 1047 LCS problem (LCS algorithm summary)
http://acm.whu.edu.cn/land/problem/detail?problem_id=1047
Description
Recently, Flymouse has been reading a book about Algorithms and Data Structures. The book says there are two types of LCS problems. One is the Longest Common
Subsequence problem, which can be solved by Dynamic Programming. The other is the Longest Common Substring problem, which is
to find the longest string that is a substring of both strings. For example, given the following two strings:
1. flymouseEnglishpoor
2. comeonflymouseinenglish
The longest common substring is flymouse, the length of this string is 8.
Input
The first line contains a single integer t (1 <= t <= 100), the number of test cases. There will be two lines for each test case, and each line contains
a string (the length of each string is no more than 1000, and you can assume that no string contains any punctuation or other separators).
Output
For each test case, you should output one line containing the longest common substring’s length of the two strings of the test case.
Sample Input
1
flymouseEnglishpoor
comeonflymouseinenglish
Sample Output
8
C++
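The LCS algorithm:
The longest common substring of two strings is usually found as follows. Lay string 1 (length m) out horizontally and string 2 (length n) vertically, giving a matrix c with n rows and m columns. Each element is defined by comparing the two characters it corresponds to: if s1[i] == s2[j], then c[j][i] = 1; otherwise c[j][i] = 0. Then find the longest diagonal run of consecutive 1s in the matrix; the length of that run is the length of the longest common substring.
Below is the match matrix for the strings 21232523311324 and 312123223445, the former along the X direction and the latter along the Y direction (the matrix as printed carries one extra all-zero column and row at the end). It is easy to spot the longest match, highlighted in red in the original: the diagonal of five 1s that starts in the third row, first column. Reading off its position gives the longest common substring: 21232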
0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 1 1 0 0 0 0
1 0 1 0 1 0 1 0 0 0 0 0 1 0 0
0 1 0 0 0 0 0 0 0 1 1 0 0 0 0
1 0 1 0 1 0 1 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
1 0 1 0 1 0 1 0 0 0 0 0 1 0 0
1 0 1 0 1 0 1 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
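The two-step method above (build the 0/1 matrix, then search its diagonals) can be sketched roughly as follows. This is only an illustrative sketch: the function name longestDiagonalRun and its std::string interface are assumptions made for the example, not part of the original solution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: builds the 0/1 match matrix, then walks every
// diagonal and records the longest run of consecutive 1s.
int longestDiagonalRun(const std::string& x, const std::string& y) {
    const int m = static_cast<int>(x.size());
    const int n = static_cast<int>(y.size());

    // c[j][i] == 1 exactly when x[i] == y[j] (x across, y down).
    std::vector<std::vector<int>> c(n, std::vector<int>(m, 0));
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            c[j][i] = (x[i] == y[j]) ? 1 : 0;

    int best = 0;
    // A diagonal is identified by d = i - j, ranging over -(n-1) .. (m-1).
    for (int d = -(n - 1); d <= m - 1; ++d) {
        int run = 0;
        int j = std::max(0, -d);  // first row on this diagonal
        int i = j + d;            // matching column
        for (; j < n && i < m; ++j, ++i) {
            run = c[j][i] ? run + 1 : 0;  // a 0 breaks the run
            best = std::max(best, run);
        }
    }
    return best;
}
```

The second pass visits every cell once more, which is the extra time the next paragraph sets out to remove.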
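However, searching the 0/1 matrix for the longest diagonal run of 1s takes additional time. By changing how the matrix is generated and keeping a few marker variables, this extra pass can be avoided. Below is the matrix produced by the new scheme: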
0 0 0 1 0 0 0 1 1 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0 2 1 0 0 0 0
1 0 2 0 1 0 1 0 0 0 0 0 1 0 0
0 2 0 0 0 0 0 0 0 1 1 0 0 0 0
1 0 3 0 1 0 1 0 0 0 0 0 1 0 0
0 0 0 4 0 0 0 2 1 0 0 1 0 0 0
1 0 1 0 5 0 1 0 0 0 0 0 2 0 0
1 0 1 0 1 0 1 0 0 0 0 0 1 0 0
0 0 0 2 0 0 0 2 1 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
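You have probably already seen the idea. When two characters match, we do not simply set the corresponding element to 1; we set it to the value of its upper-left neighbor plus one. We use two marker variables to record the position of the largest element in the matrix. While the matrix is being generated, we check whether the element just produced is the largest so far and update the markers accordingly, so that by the time the matrix is complete, the position and length of the longest matching substring are already known.
This is fast, but it uses too much memory. Notice that in the improved scheme, once a row has been generated, the row before it is no longer needed. A one-dimensional array therefore suffices, provided each row is filled from right to left so that c[j-1] still holds the previous row's value when c[j] is computed.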
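A minimal sketch of this improved generation with the full two-dimensional matrix is shown here. The Match struct and the function name longestCommonSubstring are hypothetical names introduced for illustration, and the extra row and column of zeros is a convenience chosen for this sketch so that the upper-left neighbor always exists.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: length of the best match, plus the index in x
// and in y of its last character (-1 when nothing matches).
struct Match {
    int len;
    int endX;
    int endY;
};

Match longestCommonSubstring(const std::string& x, const std::string& y) {
    const int m = static_cast<int>(x.size());
    const int n = static_cast<int>(y.size());

    // Row 0 and column 0 stay zero, so c[j-1][i-1] is always valid.
    std::vector<std::vector<int>> c(n + 1, std::vector<int>(m + 1, 0));

    Match best{0, -1, -1};
    for (int j = 1; j <= n; ++j) {
        for (int i = 1; i <= m; ++i) {
            if (x[i - 1] == y[j - 1]) {
                // On a match, extend the run from the upper-left neighbor.
                c[j][i] = c[j - 1][i - 1] + 1;
                // Update the markers while the matrix is being filled.
                if (c[j][i] > best.len) {
                    best = {c[j][i], i - 1, j - 1};
                }
            }
        }
    }
    return best;
}
```

With the result r in hand, the substring itself can be recovered as x.substr(r.endX - r.len + 1, r.len); for the two example strings above this yields 21232.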
#include <iostream>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#define MAXN 1111
using namespace std;

char s1[MAXN], s2[MAXN];

// Returns the length of the longest common substring of s1 and s2.
int LCS(char *s1, char *s2)
{
    int L1 = strlen(s1), L2 = strlen(s2);
    // c[j]: length of the common run ending at s1[i] and s2[j]; zero-initialized.
    int* c = new int[L2]();
    int end = 0, len = 0;   // end: index in s2 of the last character of the best match
    for(int i=0; i<L1; i++) {
        // Go from right to left so that c[j-1] still holds the previous row's value.
        for(int j=L2-1; j>=0; j--) {
            if(s1[i] == s2[j]) {
                // First row or first column: there is no upper-left neighbor.
                if(i==0 || j==0) c[j] = 1;
                else c[j] = c[j-1] + 1;
            }else c[j] = 0;
            if(c[j] > len) {
                len = c[j];
                end = j;
            }
        }
    }
    delete[] c;
    return len;
    /* To return the substring itself instead of its length:
    char* pos = new char[len+1];
    int begin = end - len + 1;
    for(int i=begin; i<=end; i++) {
        pos[i-begin] = s2[i];
    }
    pos[len] = '\0';
    return pos;
    */
}

int main()
{
    int cas;
    scanf("%d", &cas);
    getchar();
    while(cas--) {
        scanf("%s %s", s1, s2);
        printf("%d\n", LCS(s1, s2));
    }
    return 0;
}