http://www.codeforces.com/contest/644/problem/C
There are some websites that are accessible through several different addresses. For example, for a long time Codeforces was accessible with two hostnames codeforces.com and codeforces.ru.
You are given a list of page addresses being queried. For simplicity we consider all addresses to have the form http://<hostname>[/<path>], where:
<hostname>: server name (consists of words and maybe some dots separating them),
/<path>: optional part, where <path> consists of words separated by slashes.
We consider two <hostname> values to correspond to one website if for each query to the first <hostname> there is exactly the same query to the second one, and vice versa: for each query to the second <hostname> there is the same query to the first one. Take a look at the samples for further clarification.
Your goal is to determine the groups of server names that correspond to one website. Ignore groups consisting of a single server name.
Please note that, according to the above definition, the queries http://<hostname> and http://<hostname>/ are different.
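The distinction between http://<hostname> and http://<hostname>/ is easy to lose when parsing, so here is a minimal sketch of one way to split an address. The names splitAddress and demoSplitAddress are hypothetical and not part of the problem or the solution further down. The sketch assumes every input starts with "http://", as the statement guarantees.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "http://<hostname>[/<path>]" into (hostname, rest).
// `rest` is empty when nothing follows the hostname and begins with '/'
// otherwise, so "http://a.ru" and "http://a.ru/" produce different results.
std::pair<std::string, std::string> splitAddress(const std::string& url) {
    const std::string prefix = "http://";          // assumed to always be present
    std::string body = url.substr(prefix.size());  // "<hostname>[/<path>]"
    std::size_t slash = body.find('/');
    if (slash == std::string::npos) {
        return {body, ""};                         // no slash at all
    }
    return {body.substr(0, slash), body.substr(slash)};
}

// Small usage example on addresses taken from the sample input.
void demoSplitAddress() {
    assert(splitAddress("http://abacaba.com").second.empty());
    assert(splitAddress("http://abacaba.com/").second == "/");
    assert(splitAddress("http://abacaba.com/test").second == "/test");
}
```

Keeping the leading slash inside the second component is what makes the "no path" and "empty path" queries compare as different strings.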
The first line of the input contains a single integer n (1 ≤ n ≤ 100 000), the number of page queries. Then follow n lines, each containing exactly one address. Each address is of the form http://<hostname>[/<path>], where:
<hostname> consists of lowercase English letters and dots; there are no two consecutive dots, and <hostname> doesn't start or finish with a dot. The length of <hostname> is positive and doesn't exceed 20.
<path> consists of lowercase English letters, dots and slashes. There are no two consecutive slashes, <path> doesn't start with a slash, and its length doesn't exceed 20.
Addresses are not guaranteed to be distinct.
First print k — the number of groups of server names that correspond to one website. You should count only groups of size greater than one.
Next k lines should contain the description of groups, one group per line. For each group print all server names separated by a single space. You are allowed to print both groups and names inside any group in arbitrary order.
10
http://abacaba.ru/test
http://abacaba.ru/
http://abacaba.com
http://abacaba.com/test
http://abacaba.de/
http://abacaba.ru/test
http://abacaba.de/test
http://abacaba.com/
http://abacaba.com/t
http://abacaba.com/test
1
http://abacaba.de http://abacaba.ru
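You are given n website addresses, each made up of a hostname and a path under it.
If two hostnames were queried with exactly the same set of paths, the two hostnames are really the same website.
You are asked how many groups of such identical hostnames there are, and to print those groups.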
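Hashing works, but note that this problem's tests break a single hash.
Actually, we can just brute-force it with map...
First store, for each hostname, the set of paths queried on it.
Then use that path set as the key and store every hostname that has it.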
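For readers who want to try the hashing route anyway, here is a minimal sketch of a double hash over a hostname's path set. This is not the approach used in the solution below. The names polyHash, hashPathSet and demoHashPathSet, and the bases and moduli, are assumptions chosen for illustration. Using two independent hashes only makes collisions less likely; it does not rule them out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using HashPair = std::pair<std::uint64_t, std::uint64_t>;

// Polynomial hash of s; with mod near 1e9 and a small base the
// intermediate product always fits in 64 bits.
std::uint64_t polyHash(const std::string& s, std::uint64_t base, std::uint64_t mod) {
    std::uint64_t h = 0;
    for (unsigned char c : s) {
        h = (h * base + c + 1) % mod;
    }
    return h;
}

// Hashes a set of paths: sort, drop duplicates, join with '\n'
// (a character that cannot occur in a path), then hash the joined
// string under two different (base, modulus) pairs.
HashPair hashPathSet(std::vector<std::string> paths) {
    std::sort(paths.begin(), paths.end());
    paths.erase(std::unique(paths.begin(), paths.end()), paths.end());
    std::string joined;
    for (const std::string& p : paths) {
        joined += p;
        joined += '\n';
    }
    return {polyHash(joined, 131, 1000000007ULL),
            polyHash(joined, 137, 998244353ULL)};
}

// Usage example: order and repetition of queries must not matter.
void demoHashPathSet() {
    assert(hashPathSet({"/test", "/", "/test"}) == hashPathSet({"/", "/test"}));
}
```

Hostnames whose path sets give the same HashPair would then be placed in the same group, for example by using the pair as a map key.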
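#include<bits/stdc++.h>
using namespace std;
// hostname -> all paths queried on it
map<string,vector<string> >mp;
// sorted, deduplicated path list -> hostnames that have exactly this path set
map<vector<string>,vector<string> >ans;
int main()
{
    int n;scanf("%d",&n);
    for(int i=0;i<n;i++)
    {
        string s;
        cin>>s;
        // drop "http://"; the appended '/' guarantees that a slash is found,
        // while "host" (stored as "/") stays different from "host/" (stored as "//")
        s=s.substr(7)+'/';
        size_t pos = s.find_first_of('/');
        mp[s.substr(0,pos)].push_back(s.substr(pos));
    }
    for(auto &p:mp)
    {
        // turn the query list into a canonical set: sorted, no duplicates
        sort(p.second.begin(),p.second.end());
        p.second.erase(unique(p.second.begin(),p.second.end()),p.second.end());
        ans[p.second].push_back(p.first);
    }
    // count only groups with more than one hostname
    int tot = 0;
    for(auto &p:ans)
        if(p.second.size()>1)
            tot++;
    printf("%d\n",tot);
    for(auto &p:ans)
    {
        if(p.second.size()>1)
        {
            for(auto &t:p.second)
                cout<<"http://"<<t<<" ";
            cout<<endl;
        }
    }
}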