Cracking the Coding Interview, Chapter 10: Scalability and Memory Limits, Question 6

2014-04-24 22:01

Problem: You have 10 billion URLs. How do you detect whether any of them are duplicates?

Solution: Hash each URL to compute a signature, then store the signatures in a K-V database and use it to check for duplicates.

Code:

// 10.6 You have 10 billion URLs. How would you detect duplicates among them?
// Answer:
//    1. Use a digest (signature) algorithm to convert each URL string into a numeric checksum.
//    2. Use this signature as the hash key. If memory allows, use an in-memory hash table to detect duplicates.
//    3. If the data won't fit in memory, use a K-V database instead. Data on the order of 10 GB should be manageable on one machine, so I won't distribute the work across machines.
int main()
{
    return 0;
}
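The listing above leaves `main` empty, so here is a minimal sketch of steps 1 and 2 (the in-memory case). It is an illustration under stated assumptions, not the original author's implementation. The 64-bit FNV-1a hash stands in for the "signature" algorithm, and the names `fnv1a64` and `findDuplicates` are hypothetical. Because two different URLs can share a 64-bit signature, a match here is only a candidate duplicate; a real system would confirm it by comparing the full strings.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// 64-bit FNV-1a hash, used here as an assumed stand-in for the
// "signature" step; any well-distributed digest would serve.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;                // FNV-1a 64-bit prime
    }
    return h;
}

// Returns every URL whose signature was already seen earlier in the input.
// Only the 8-byte signatures are kept in the set, not the URL strings.
std::vector<std::string> findDuplicates(const std::vector<std::string>& urls) {
    std::unordered_set<std::uint64_t> seen;
    std::vector<std::string> dups;
    for (const auto& url : urls) {
        // insert() returns {iterator, false} when the key already exists.
        if (!seen.insert(fnv1a64(url)).second) {
            dups.push_back(url);
        }
    }
    return dups;
}
```

For step 3, the same loop would apply with the `std::unordered_set` replaced by lookups and inserts against a K-V store keyed on the signature.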

 
