Find the two identical numbers among 1 million random numbers

Problem:

  A set of 1 million random numbers contains exactly two equal numbers; all the others are unique. Find the two equal numbers.

Ideas:

  If the range of the numbers is small, two bitmaps solve the problem simply. If the range is too large for that, a reasonable approach is to use a hash function that maps the numbers into the range [0, 1m). Because the numbers in the set are random, we can simply use "modulo 1m" as the hash function.
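  For the small-range case, a minimal sketch might look like the following. It assumes every value lies in [0, range) and, as a simplification, uses a single "seen" bitmap, which is enough when only one value repeats. The names find_duplicate_small_range and range are illustrative only and not part of the solution below.

```cpp
#include <cstddef>
#include <vector>

// Returns the first value that occurs twice, or -1 if there is none.
// Assumption: every value lies in [0, range); values outside it are skipped.
int find_duplicate_small_range(const std::vector<int>& values, std::size_t range) {
  std::vector<bool> seen(range, false);  // one bit per possible value
  for (std::size_t i = 0; i < values.size(); ++i) {
    int x = values[i];
    if (x < 0 || static_cast<std::size_t>(x) >= range) {
      continue;  // outside the assumed range
    }
    if (seen[x]) {
      return x;  // bit already set: x occurred earlier
    }
    seen[x] = true;
  }
  return -1;
}
```

  The bitmap costs one bit per possible value, so this only pays off when the range is not much larger than the number of inputs.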

  Let's compute the probability that a number collides under this hash function, that is, that it lands in a slot already taken by an earlier number. It is not small: about 0.37 (roughly 1/e) when the random integers are drawn from [1, MAX_INT]. Since the input is already uniformly random, choosing a different hash function into the same range cannot lower this probability. Only the colliding numbers have to go into the tree map, so this method saves about 3/5 of the memory compared with putting every number into a tree map directly.
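  The collision rate can be checked empirically. The sketch below is only an illustration under stated assumptions: it draws n independent values uniformly from [1, 2^31 - 1] (so a few values may repeat, which barely affects the estimate), hashes them with a modulo, and counts how many land in a slot that is already taken. The function name collision_fraction and its parameters are made up for this example. For n equal to the number of slots, the expected fraction is roughly 1/e, about 0.368.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fraction of n uniform random values whose slot (value % buckets) was
// already occupied by an earlier value.
double collision_fraction(std::size_t n, std::size_t buckets, std::uint32_t seed) {
  std::mt19937 gen(seed);
  // Assumption: MAX_INT is 2^31 - 1.
  std::uniform_int_distribution<std::uint32_t> dist(1u, 2147483647u);
  std::vector<bool> occupied(buckets, false);
  std::size_t collisions = 0;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t pos = dist(gen) % buckets;
    if (occupied[pos]) {
      ++collisions;  // slot already taken by an earlier value
    } else {
      occupied[pos] = true;
    }
  }
  return static_cast<double>(collisions) / static_cast<double>(n);
}
```

  Calling collision_fraction(1000000, 1000000, seed) with any seed should give a value close to that estimate, which is where the memory saving of about 3/5 comes from.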

  The methods above assume that the random numbers are drawn uniformly from a range.

Solution:

  

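#include <bitset>
#include <cstddef>
#include <iostream>
#include <map>
#include <utility>

// Maps a number to a slot in [0, size) with a simple modulo,
// as described in the Ideas section.
template <std::size_t size>
std::size_t hash(int x) {
  return static_cast<unsigned int>(x) % size;
}

// Returns the value that appears twice in v[0..size), or -1 if none is found.
// Note: -1 is ambiguous if -1 itself can be an input value.
template <std::size_t size>
int find(const int v[]) {
  std::bitset<size> indicator;   // slots already taken by some number
  std::map<int, int> collision;  // candidates: numbers that hit a taken slot
  // Pass 1: collect every number whose slot was already taken.
  for (std::size_t i = 0; i < size; i++) {
    std::size_t pos = hash<size>(v[i]);
    if (indicator.test(pos)) {
      collision.insert(std::make_pair(v[i], 0));
    } else {
      indicator.set(pos);
    }
  }
  std::cerr << "map size:" << collision.size() << std::endl;
  // Pass 2: count occurrences of the candidates; only the duplicate reaches 2.
  for (std::size_t i = 0; i < size; i++) {
    std::map<int, int>::iterator iter = collision.find(v[i]);
    if (iter != collision.end()) {
      iter->second += 1;
      if (iter->second == 2) {
        return v[i];
      }
    }
  }
  return -1;
}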

