7-12 Base Sequence Matching (25 points)

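The Genographic Project is a research collaboration between IBM and the National Geographic Society that analyzes DNA donated by many thousands of people to trace how humans populated the Earth. As an IBM researcher, you are asked to write a program that finds what given DNA fragments have in common, so that surveys of individuals can be correlated. A DNA base sequence lists, in order, the nitrogenous bases found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). For example, a 6-base DNA sequence might be written TAGACC. Given a set of DNA base sequences, determine the longest base sequence that occurs in every one of them.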

Input format:
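The first line of input gives an integer n, the number of test data sets. Each data set consists of two parts: a positive integer m (2 ≤ m ≤ 10), the number of base sequences in the set, followed by m lines, each containing one base sequence of 60 bases.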

Output format:
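For each test data set, output the longest base substring (a contiguous run of bases) that is common to all of its sequences. If the longest common substring is shorter than 3 bases, output "no significant commonalities" instead. If several common substrings share the maximum length, output only the one that comes first alphabetically.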

Sample input 1:

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample output 1:

no significant commonalities
AGATAC
CATCATCAT

Sample input 2:

5
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
2
GATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGAT
GATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATT
10
GATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGAT
GATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATT
GATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTT
GATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTT
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAT
AAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGAT
AAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGAT
CGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGAT
CCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGAT
CCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGAT

Sample output 2:

no significant commonalities
AGATAC
CATCATCAT
TGAT
GAT

Sample input 3:

11
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
2
GATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGAT
GATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATT
10
GATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGATGAT
GATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATTGATT
GATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTTGATTT
GATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTTGATTTT
AGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGAT
AAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGATAAGAT
AAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGATAAAGAT
CGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGAT
CCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGATCCGAT
CCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGATCCCGAT
10
GATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
GCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTA
TAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
CGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGAT
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
GAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
GCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGC
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
ACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAC
TCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTC
10
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
10
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
GAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
GAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
GGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
GGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGG
GGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGG
GGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGG
GGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGG
GGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGG
GGGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGGG
4
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
10
AAAAAATAAAAAATAAAAAATAAAAAATAAAAAATAAAAAATAAAAAATAAAAAATAAAA
AAAAACAAAAACAAAAACAAAAACAAAAACAAAAACAAAAACAAAAACAAAAACAAAAAC
AAAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAAAGAAAAAG
AAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAATAAAAAT
AAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAACAAAAC
AAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAG
AAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAATAAAAT
AAACAAACAAACAAACAAACAAACAAACAAACAAACAAACAAACAAACAAACAAACAAAC
AAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAATAAAT
AAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAG
2
GATGATGCATCATGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGACTACTAA
GATGATCATCATACTACTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACTACTAGATGATGCATCATGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGA
ACTACTCGATGATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

Sample output 3:

no significant commonalities
AGATAC
CATCATCAT
TGAT
GAT
no significant commonalities
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
no significant commonalities
AAA
ACTACT

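I solved this problem with strings, the KMP algorithm, and brute-force enumeration of substrings: candidate substrings are taken from one sequence, tried from the longest possible length down to 3, and KMP checks whether each candidate occurs in the other n-1 sequences.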
The code is as follows:

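#include <iostream>  // standard headers for cin/cout, string and vector
#include <string>
#include <vector>
using namespace std;

void getNext(const string& S, int* next)  // build the KMP next array of pattern S
{
    int j = 0, k = -1;
    next[0] = -1;  // next[0] is -1 by convention
    while (j < (int)S.size() - 1)  // fill next[] for every position of the pattern
    {
        if (k == -1 || S[j] == S[k])  // k fell back past the start, or S[j] matches S[k]
        {
            j++; k++;
            next[j] = k;  // S[0..j-1] has a proper prefix/suffix match of length k
        }
        else
            k = next[k];  // fall back to the next shorter prefix/suffix match
    }
}

// Reconstructed: the bodies below were damaged in the source listing and are rebuilt
// from the surviving fragments and comments, so details may differ from the original.
bool bijiao(const string& T, const string* S, int n)  // is T a substring of every sequence?
{
    int a = T.size();  // length of the candidate substring T
    vector<int> next(a);  // KMP next array of T
    getNext(T, next.data());
    for (int i = 1; i < n; i++)  // T is cut from S[0], so check the other n-1 sequences
    {
        int p = 0, q = 0, len = S[i].size();  // p walks S[i], q walks T
        while (p < len && q < a)
        {
            if (q == -1 || S[i][p] == T[q]) { p++; q++; }
            else q = next[q];
        }
        if (q < a) return false;  // T does not occur in S[i]
    }
    return true;
}

void chuli(const string* S, int n)  // process one data set
{
    const string& key = S[0];  // candidate substrings are cut from the first sequence
    for (int i = key.size(); i >= 3; i--)  // start from the longest length as the substring length
    {
        string best;  // alphabetically smallest common substring of length i
        for (int j = 0; j + i <= (int)key.size(); j++)
        {
            string try1 = key.substr(j, i);
            if ((best.empty() || try1 < best) && bijiao(try1, S, n))
                best = try1;
        }
        if (!best.empty()) { cout << best << endl; return; }  // first length with a hit is the longest
    }
    cout << "no significant commonalities" << endl;
}

int main()
{
    int N; cin >> N;  // number of data sets N
    for (int z = 0; z < N; z++)
    {
        int n; cin >> n;  // number of base sequences n in this data set
        vector<string> jjsz(n);  // jjsz holds the base sequences of this data set
        for (int x = 0; x < n; x++)
            cin >> jjsz[x];  // read each base sequence
        chuli(jjsz.data(), n);  // start processing
    }
    return 0;
}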
