DNA Sorting
Time Limit: 1000MS | Memory Limit: 10000K
Total Submissions: 79623 | Accepted: 31994
Description
One measure of "unsortedness" in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence "DAABEC", this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence "AACEDGG" has only one inversion (E and D), so it is nearly sorted, while the sequence "ZWQM" has 6 inversions (it is as unsorted as can be: exactly the reverse of sorted).
You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of "sortedness", from "most sorted" to "least sorted". All the strings are of the same length.
Input
The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.
Output
Output the list of input strings, arranged from "most sorted" to "least sorted". If two strings are equally sorted, output them in their original order.
Sample Input
10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
Sample Output
CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA
Source
East Central North America 1998
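Analysis:
When solving this problem I first tried storing the strings in a two-dimensional array, but that proved very cumbersome to work with. I then switched to a struct, since C++ provides the string type.
Solution 1: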
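#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
// One input string together with its inversion count.
struct DNA
{
    string str;
    int count;
};
// Comparator: returns true if a has fewer inversions than b, so a is placed first.
bool compare(const DNA& a,const DNA& b)
{
    return a.count<b.count;
}
int main()
{
    DNA a[100];
    int n,m; // n: length of each string, m: number of strings
    cin>>n>>m;
    int i;
    for(i=0;i<m;i++)
    {
        cin>>a[i].str;
        a[i].count=0;
        int k,j;
        // Count the pairs (k,j) with k<j whose characters are out of order.
        for(j=1;j<n;j++)
        {
            for(k=0;k<j;k++)
            {
                if(a[i].str[j]<a[i].str[k])
                    a[i].count++;
            }
        }
    }
    // stable_sort keeps strings with equal counts in input order, as the problem requires;
    // plain sort does not guarantee this.
    stable_sort(a,a+m,compare);
    for(i=0;i<m;i++)
    {
        cout<<a[i].str<<endl;
    }
    return 0;
}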
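Solution 2:
I came across this solution in a post online. Its time complexity is O(n) per string, which is faster, so I borrow it here:
Source: http://blog.csdn.net/china8848/article/details/2227131
What is the inversion count:
It is the total number of pairs that appear in the opposite order to a reference sequence. For example, if the reference sequence is 1 2 3 4 5, the inversion count of 5 4 3 2 1 is computed as follows. Look at the second element: before 4 there is one 5, and 5 comes after 4 in the reference sequence, so count 1. Similarly, the third element 3 is preceded by 4 and 5, both of which come after 3 in the reference sequence, so count 2. Likewise, 2 has 3 such elements before it and 1 has 4.
Adding these up gives the inversion count: 1+2+3+4=10.
Another example: 2 4 3 1 5. There are 0 such elements before 4, 1 before 3, 3 before 1, and 0 before 5,
so the inversion count is 1+3=4.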
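The definition above can be checked directly with a brute-force count. The following is only a minimal sketch and is not taken from the original post; the function name countInversionsBrute is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the pairs (i, j) with i < j and v[i] > v[j].
long long countInversionsBrute(const std::vector<int>& v)
{
    long long count = 0;
    for (std::size_t j = 1; j < v.size(); j++)
    {
        // Every larger element standing before v[j] contributes one inversion.
        for (std::size_t i = 0; i < j; i++)
        {
            if (v[i] > v[j])
                count++;
        }
    }
    return count;
}
```

Called on 5 4 3 2 1 it returns 10, and on 2 4 3 1 5 it returns 4, matching the two worked examples.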
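Counting inversions with merge sort (MergeSort):
Comparing every pair of numbers takes O(n^2) time, which is unacceptable for large n, whereas MergeSort runs in O(n log n).
During the merge step, one pointer i points to an element l in the left half and another pointer j points to an element r in the right half. When r is less than l, every element from i to mid in the left half forms an inversion with r, so it suffices to add mid-i+1 to the running total. For an application of this, see the previous problem report.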
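The merge step described above might be sketched as follows. This is an illustrative version rather than the code from the referenced report; the names mergeCount and countInversionsMerge, and the choice to work on a copy of the input, are assumptions.

```cpp
#include <vector>

// Hypothetical helper: sorts v[lo..hi] (inclusive) and returns its inversion count.
long long mergeCount(std::vector<int>& v, std::vector<int>& tmp, int lo, int hi)
{
    if (lo >= hi)
        return 0;
    int mid = lo + (hi - lo) / 2;
    long long count = mergeCount(v, tmp, lo, mid) + mergeCount(v, tmp, mid + 1, hi);
    int i = lo, j = mid + 1, k = lo;
    while (i <= mid && j <= hi)
    {
        if (v[j] < v[i])
        {
            // v[j] is smaller than every remaining left element v[i..mid].
            count += mid - i + 1;
            tmp[k++] = v[j++];
        }
        else
        {
            tmp[k++] = v[i++];
        }
    }
    while (i <= mid) tmp[k++] = v[i++];
    while (j <= hi) tmp[k++] = v[j++];
    for (k = lo; k <= hi; k++) v[k] = tmp[k];
    return count;
}

// Hypothetical wrapper: takes the vector by value so the caller's data stays unchanged.
long long countInversionsMerge(std::vector<int> v)
{
    std::vector<int> tmp(v.size());
    return mergeCount(v, tmp, 0, static_cast<int>(v.size()) - 1);
}
```

Equal elements are taken from the left half first, so ties are never counted as inversions.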
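For this problem, the characters in each string come from a fixed alphabet of just A, C, G, and T, so the inversion count can be computed in O(n). The method is to insert the characters from back to front while maintaining four counters, left_A, left_C, left_G, and left_T, which record how much the inversion count will grow if A, C, G, or T, respectively, is inserted on the left. (left_A is always 0, so the code does not need it.) All counters start at 0, and the characters are then inserted from back to front:
If the inserted character is A: any C, G, or T inserted in front of it later will add an inversion, so left_C++; left_G++; left_T++.
If the inserted character is C: add left_C to the inversion count; any G or T inserted in front of it later will add an inversion, so left_G++; left_T++.
If the inserted character is G: add left_G to the inversion count; any T inserted in front of it later will add an inversion, so left_T++.
If the inserted character is T: add left_T to the inversion count; no character inserted in front of it later adds an inversion on its account.
This gives the code: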
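#include <string>
using namespace std;
// Returns the inversion count of str, which must contain only A, C, G, T.
// The function name and signature are not in the original fragment;
// they are added here so that the fragment compiles on its own.
int inversions(const string& str)
{
    int left_C=0,left_G=0,left_T=0;
    int count=0;
    int length=str.size();
    for(int i=length-1;i>=0;i--)
    {
        char a=str[i];
        switch(a)
        {
        case 'A':
            left_C++;
            left_G++;
            left_T++;
            break;
        case 'C':
            left_G++;
            left_T++;
            count+=left_C;
            break;
        case 'G':
            left_T++;
            count+=left_G;
            break;
        case 'T':
            count+=left_T;
            break;
        }
    }
    return count;
}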
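To tie the linear-time count back to the original problem, here is a rough sketch of the sorting step. It is not the referenced author's code: it uses an equivalent variant of the counter idea that records how many of each letter have already been scanned, and the names countInversionsLinear and sortBySortedness are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: inversion count of a string over the alphabet A, C, G, T.
// seen[r] is the number of letters of rank r already scanned (to the right of s[i]).
int countInversionsLinear(const std::string& s)
{
    int seen[4] = {0, 0, 0, 0};
    int count = 0;
    for (int i = static_cast<int>(s.size()) - 1; i >= 0; i--)
    {
        int rank;
        switch (s[i])
        {
        case 'A': rank = 0; break;
        case 'C': rank = 1; break;
        case 'G': rank = 2; break;
        default:  rank = 3; break; // assumption: any other character is 'T'
        }
        // Every smaller letter to the right forms an inversion with s[i].
        for (int r = 0; r < rank; r++)
            count += seen[r];
        seen[rank]++;
    }
    return count;
}

// Hypothetical helper: returns the strings ordered from most sorted to least sorted.
std::vector<std::string> sortBySortedness(const std::vector<std::string>& strs)
{
    std::vector<std::pair<int, std::string> > keyed;
    for (std::size_t i = 0; i < strs.size(); i++)
        keyed.push_back(std::make_pair(countInversionsLinear(strs[i]), strs[i]));
    // stable_sort compares the counts only, so ties keep their input order.
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const std::pair<int, std::string>& a,
                        const std::pair<int, std::string>& b) { return a.first < b.first; });
    std::vector<std::string> result;
    for (std::size_t i = 0; i < keyed.size(); i++)
        result.push_back(keyed[i].second);
    return result;
}
```

On the sample input the six strings have 10, 36, 37, 11, 9, and 17 inversions respectively, so the resulting order matches the sample output. A complete submission would read n and m, read the m strings into a vector, and print the result of sortBySortedness line by line.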