J. Jurassic Jigsaw
Problem Description
The famous Jurassic Park biologist Dean O’Saur has discovered new samples of what he expects to be the DNA of a dinosaur. With the help of his assistant Petra Dactil, he managed to sequence the samples, and now they are ready for analysis. Dean thinks this dinosaur was affected by a particular disease that mutated the DNA of some cells.
To verify his theory, he needs to compute the most likely evolutionary tree from the samples, where the nodes are the samples of DNA. Because there is no temporal data of the DNA samples, he is not concerned where the root of the tree is.
Dean considers the most likely evolutionary tree to be the tree with the smallest unlikeliness. The unlikeliness of a tree is defined as the sum of the weights of all its edges, where the weight of an edge is the number of positions at which the two DNA strings differ.
As a world expert in data trees, he asks you to reconstruct the most likely evolutionary tree.
In the first sample, the optimal tree is AA - AT - TT - TC. The weight of the edge between AA and AT is 1, because the strings AA and AT differ in exactly 1 position. The weights of the other two edges are also 1, so the unlikeliness of the entire tree is 3. Since there is no tree with unlikeliness less than 3, the minimal unlikeliness of an evolutionary tree for this case is 3.
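The edge weight described above is the Hamming distance between two strings of equal length. A minimal sketch of that computation follows; the function name `edgeWeight` is an illustrative choice and does not appear in the original solution.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of positions at which two equal-length DNA strings differ
// (their Hamming distance). `edgeWeight` is an illustrative name.
int edgeWeight(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());  // the problem guarantees every sample has length k
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++diff;
    }
    return diff;
}
```

For the first sample, `edgeWeight("AA", "AT")` is 1, matching the weight quoted for that edge.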
Input
• The first line consists of two integers 1 ≤ n ≤ 1000 and 1 ≤ k ≤ 10, the number of samples and the length of each sample respectively.
• Each of the next n lines contains a string of length k consisting of the characters in ACTG.
Output
• On the first line, print the minimal unlikeliness of the evolutionary tree.
• Then, print n − 1 lines, each consisting of two integers 0 ≤ u, v < n, indicating that in the most likely evolutionary tree, there is an edge between DNA strings u and v (numbered from 0 in input order, as in the samples). If multiple answers are possible, any of them will be accepted.
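Because any optimal tree is accepted, it can help to check a candidate answer locally. The sketch below is one possible checker, not part of the original solution: it verifies that a list of n − 1 edges forms a spanning tree and returns its unlikeliness. The name `treeUnlikeliness` and the convention of returning -1 for an invalid tree are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Returns the unlikeliness of the proposed tree, or -1 if the edges do not
// form a spanning tree over all samples. All names here are illustrative.
long long treeUnlikeliness(const std::vector<std::string>& samples,
                           const std::vector<std::pair<int, int>>& edges) {
    const int n = static_cast<int>(samples.size());
    if (static_cast<int>(edges.size()) != n - 1) return -1;

    // Union-find over sample indices, used to detect cycles.
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    auto find = [&](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };

    long long total = 0;
    for (const auto& e : edges) {
        if (e.first < 0 || e.first >= n || e.second < 0 || e.second >= n) return -1;
        const int ru = find(e.first);
        const int rv = find(e.second);
        if (ru == rv) return -1;  // this edge would close a cycle
        parent[ru] = rv;

        const std::string& a = samples[e.first];
        const std::string& b = samples[e.second];
        assert(a.size() == b.size());  // all samples share the same length k
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (a[i] != b[i]) ++total;
        }
    }
    // n - 1 edges without a cycle on n nodes necessarily connect all nodes.
    return total;
}
```

The checker confirms that an answer is a valid tree and reports its weight; it does not prove that the weight is minimal.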
Sample Input
4 2
AA
AT
TT
TC
Sample Output
3
0 1
1 2
2 3
Sample Input 2
4 1
A
A
G
T
Sample Output 2
2
0 1
0 2
0 3
Sample Input 3
5 6
GAACAG
AAAAAA
AACATA
GAAAAG
ATAAAT
Sample Output 3
7
0 3
1 2
1 3
1 4
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
typedef long long ll;
int father[10000];
string s[10000];
int n,m,maxn,k=0,q;
vector<pair<int,int> >ed;
bool cmp(const pair<int,int> pp, const pair<int,int> qq)
{
    return pp.first<qq.first; // order edges by their first endpoint, smaller first
}
struct node
{
    int x,y,z; // endpoints x, y and edge weight z
    bool operator<(const node &other) const {
        return z<other.z; // compare by weight only
    }
}a[1000010];
void init()
{
    for(int i=0;i<=n;i++)
    {
        father[i]=i;
    }
}
int found(int x)
{
    if(x==father[x])
        return x;
    else
    {
        father[x]=found(father[x]); // path compression
        return father[x];
    }
}
void unite(int x,int y)
{
    int c=found(x);
    int d=found(y);
    if(c!=d)
        father[d]=c; // the root of y's set now points to the root of x's set
}
int main()
{
    cin>>n>>q; // n samples, each of length q
    init();
    for(int i=0;i<n;i++)
    {
        cin>>s[i];
    }
    // Build an edge for every pair (i, j), weighted by the number of differing positions.
    int w,cnt=0;
    for(int i=0;i<n-1;i++)
    {
        for(int j=i+1;j<n;j++)
        {
            w=0;
            for(int c=0;c<q;c++)
            {
                if(s[i][c]!=s[j][c])w++;
            }
            cnt++;
            a[cnt].x=i;
            a[cnt].y=j;
            a[cnt].z=w;
        }
    }
    // Kruskal: take edges in increasing weight, skipping those that would close a cycle.
    sort(a+1,a+cnt+1);
    ll ans=0;
    for(int i=1;i<=cnt;i++)
    {
        if(found(a[i].x)!=found(a[i].y))
        {
            unite(a[i].x,a[i].y);
            ans+=a[i].z;
            ed.push_back(make_pair(a[i].x,a[i].y));
            k++; // count the edges added so far
        }
        if(k==n-1)break; // a spanning tree has n-1 edges, so the minimum spanning tree is complete
    }
    sort(ed.begin(),ed.end(),cmp);
    cout<<ans<<endl;
    for(size_t i=0;i<ed.size();i++)
    {
        cout<<ed[i].first<<" "<<ed[i].second<<endl;
    }
    return 0;
}
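The solution above uses Kruskal's algorithm on all n(n − 1)/2 pairs. Since the graph is complete, an array-based Prim's algorithm is a possible alternative that runs in O(n² · k) time without storing or sorting the edge list. The following is a sketch of that alternative, not the method used above; `MstResult`, `hamming`, and `primMst` are illustrative names.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative result type: total unlikeliness plus the chosen edges.
struct MstResult {
    long long total = 0;
    std::vector<std::pair<int, int>> edges;
};

// Number of differing positions between two equal-length strings.
static int hamming(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] != b[i]);
    return d;
}

// Array-based Prim on the complete graph over the samples.
MstResult primMst(const std::vector<std::string>& s) {
    const int n = static_cast<int>(s.size());
    MstResult res;
    if (n == 0) return res;

    std::vector<int> best(n, INT_MAX);  // cheapest known edge into the tree
    std::vector<int> from(n, -1);       // tree endpoint of that edge
    std::vector<bool> inTree(n, false);
    best[0] = 0;                        // start growing the tree from sample 0

    for (int step = 0; step < n; ++step) {
        // Pick the cheapest node that is not yet in the tree.
        int u = -1;
        for (int v = 0; v < n; ++v) {
            if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
        }
        inTree[u] = true;
        if (from[u] != -1) {
            res.total += best[u];
            res.edges.emplace_back(from[u], u);
        }
        // Relax the edges from u to every node still outside the tree.
        for (int v = 0; v < n; ++v) {
            if (inTree[v]) continue;
            const int w = hamming(s[u], s[v]);
            if (w < best[v]) {
                best[v] = w;
                from[v] = u;
            }
        }
    }
    return res;
}
```

With n ≤ 1000 and k ≤ 10 both approaches are fast enough; the Prim variant mainly avoids the array of roughly 500,000 edges.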