首页 > 代码库 > poj-3080 blue jeans

poj-3080 blue jeans

Description

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated. 

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers. 

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC. 

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.

Input

Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:
  • A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
  • m lines each containing a single base sequence consisting of 60 bases.

Output

For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.

Sample Input

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample Output

no significant commonalities
AGATAC
CATCATCAT

题目大意:

给出一个DNA碱基序列的集合,确定在所有序列中都出现的最长的碱基序列!


说一下输出的问题吧:输出最长的相同的碱基子序列,如果最长的相同的碱基子序列的长度小于3,则输出那一串英文;如果相同的最长长度的子序列有多个,则仅输出按字典序排序的第一个即可!


这个题我用的是BF算法,很黄很暴力的说!

但是由于字符串的长度仅为60,所以不用担心超时的问题了!

这个题我使用了#include<cstring>头文件里的几个函数-->strncpy(),strstr(),strcpy()这三个函数不知道的话可以百度下,这样同时可以加深自己对这三个函数的理解!至于是干什么的,下面的代码里我会有注释的!


这样写起来,方便得多,代码看起来也一目了然,不会太繁琐、枯燥!

代码如下:(关键步骤都是有详细的注释的!)


#include<stdio.h>
#include<string.h>
#include<iostream>
using namespace std;
int main()
{
    int T,m;
    char ss[13][520];
    cin>>T;
    while(T--)
    {
        cin>>m;
        for(int i=0; i<m; i++)
            cin>>ss[i];
        char str[520],te[520];//te[]为临时字符串!
        int len=strlen(ss[0]);
        int maxlong=0;//设当前最长公共字串为0!
        for(int i=0; i<len; i++)
        {
            //由于要求最长字符串长度必须大于3,所以j=i+2;
            for(int j=i+2; j<len; j++)
            {
                int telen=j-i+1;//临时字符串长度!
                strncpy(te,ss[0]+i,telen);//这个函数作用是复制指定长度的字符串字串!
                te[telen]='\0';
                int flag=0;//标记te字符串是不是p[1]到p[m-1]的公共字串!
                for(int k=1; !flag&&k<m; k++)
                {
                    //如果未找到
                    if(strstr(ss[k],te)==NULL)
                        flag=1;//strstr函数是搜索一个字符串在另一个字符串中是否第一次出现!如果没出现过则返回null或者false!
                }//这个函数前后的参数分别是strstr(另一个字符串首地址,一个字符串首地址)!
                if(!flag)//搜索到了!
                    if(telen>maxlong||(telen==maxlong&&strcmp(te,str)<0))//按字典序排序!
                    {
                        maxlong=telen;
                        strcpy(str,te);
                    }
            }
        }
        if(maxlong>=3)
            cout<<str<<endl;
        else
            cout<<"no significant commonalities"<<endl;
    }
    return 0;
}

下面附上用KMP算法解决的代码!

#include<cstring>
#include<stdio.h>
#include<iostream>
using namespace std;
const int maxn = 70;
char b[10][maxn], p[maxn];
int n, ma, lenp, next[maxn];
void kk()
{
    int i = 0, j = -1;
    next[0] = -1;
    while(i < lenp)
    {
        if(j == -1 || p[i] == p[j])
        {
            i ++; j ++;
            next[i] = j;
        }
        else j = next[j];
    }
    int k, m;
    ma = 100;
    for(int k = 1; k < n; k ++)
    {
        i = 0; j = 0; m = 0;
        while(i < 60 && j < lenp)
        {
            if(j == -1 || b[k][i] == p[j])
            {
                i ++; j ++;
            }
            else j = next[j];
            if(j > m) m = j;
        }
        if(m < ma) ma = m;
    }
}
int main()
{
    int t;
    char a[maxn];
    scanf("%d", &t);
    while(t --)
    {
        scanf("%d", &n);
        for(int i = 0; i < n; i ++)
        scanf("%s", b[i]);
        int ans = 0;
        for(int i = 0; i <= 57; i ++)
        {
            strcpy(p, b[0] + i);

            lenp = 60 - i;
            kk();
            if(ans < ma)
            {
                ans = ma;
                strncpy(a, b[0] + i, ans);
                a[ans] = '\0';
            }
            else if(ans == ma)
            {

                char tmp[maxn];
                strncpy(tmp, b[0] + i, ans);
                tmp[ans] = '\0';
                if(strcmp(tmp, a) < 0)
                strcpy(a, tmp);
            }
        }
        if(ans >= 3)
        printf("%s\n", a);
        else
        printf("no significant commonalities\n");
    }
    return 0;
}