Google Interview - Compute the h-index of a list of papers

Google:

Compute the h-index of a list of papers, given their citation count. Can you do it in linear time? How about a distributed algorithm for the task?


Facebook:

Given: a citation-count vector with one entry for every paper a researcher has authored. The h-index is a measure of researcher importance: it is the largest number i such that there are i papers each with at least i citations.

1. Suppose that the citation vector is sorted. How can the h-index be computed efficiently?

2. Suppose that the citation vector is not sorted. How can the h-index be computed efficiently? What is the time complexity? Is there an algorithm with O(n) time complexity?
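
For the sorted case in question 1, one possible approach is a binary search, which takes O(log n) time. The sketch below assumes the vector is sorted in non-increasing order (most-cited paper first); the function name `h_index_sorted` is made up for illustration. It relies on the fact that the test "the paper at 0-based position i has at least i + 1 citations" holds for a prefix of the sorted vector and then fails for everything after it, so the h-index is the length of that prefix.

```python
def h_index_sorted(citations):
    """h-index of a citation list sorted in non-increasing order (assumed)."""
    lo, hi = 0, len(citations)
    # Invariant: citations[i] >= i + 1 holds for every i < lo
    # and fails for every i >= hi.
    while lo < hi:
        mid = (lo + hi) // 2
        if citations[mid] >= mid + 1:
            lo = mid + 1   # paper at position mid still counts toward h
        else:
            hi = mid       # paper at position mid (and all after it) fails
    return lo              # length of the prefix that passes the test
```

If the vector is sorted in increasing order instead, the same idea applies after reversing it or mirroring the index arithmetic.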


Princeton algorithm:

Given an array of N positive integers, its h-index is the largest integer h such that there are at least h entries in the array greater than or equal to h. Design an algorithm to compute the h-index of an array. 

Hint: median or quicksort-like partitioning and divide-and-conquer.
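
One way to read the hint is a quickselect-style search, sketched below under a few assumptions: pivots are picked at random (so the running time is linear in expectation, not in the worst case), new lists are built instead of partitioning in place, and the name `h_index_partition` is hypothetical. The idea is to think of the array in decreasing order and keep a count `base` of entries already known to satisfy "value >= 1-based rank"; each round discards the side of the pivot that cannot change the answer.

```python
import random

def h_index_partition(citations):
    """Expected-linear-time h-index via quicksort-like partitioning (sketch)."""
    items = list(citations)
    base = 0  # entries already known to satisfy value >= rank (descending order)
    while items:
        pivot = random.choice(items)
        greater = [c for c in items if c > pivot]
        less = [c for c in items if c < pivot]
        equal_count = len(items) - len(greater) - len(less)
        # 1-based descending ranks occupied by the copies of the pivot value.
        first_rank = base + len(greater) + 1
        last_rank = base + len(greater) + equal_count
        if pivot >= last_rank:
            # Every entry >= pivot passes; keep looking among smaller entries.
            base = last_rank
            items = less
        elif pivot < first_rank:
            # No copy of the pivot passes; only larger entries can still pass.
            items = greater
        else:
            # The cut falls inside the run of pivot copies, so h equals pivot.
            return pivot
    return base
```

Each round removes at least the pivot's copies from `items`, so the loop always terminates.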


Solution:

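- Create an int[] Histogram of size N + 1, where N is the number of publications (for several scientists, use the maximum number of publications of any particular scientist). The h-index can never exceed N, so any reference count above N can be treated as N.

- If all publication reference counts are stored in another int[] references, then go over this array and, for each publication with reference count R, do Histogram[min(R, N)]++. While doing this, keep the largest index used in Max.

- After building the histogram, do a decreasing loop on int[] Histogram from i = Max down to 1, adding the Histogram[i] values to a running total int count. At each step, count is the number of publications with at least i references. As soon as count >= i, return i as the h-index. If the loop ends without returning, the h-index is 0.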
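
The histogram steps above can be sketched in Python as follows. This is a minimal version under one stated assumption: reference counts larger than the number of papers are clamped to that number so the histogram stays at size n + 1, which is safe because the h-index cannot exceed the number of papers. The names `references` and `histogram` mirror the solution text; `h_index_counting` is a hypothetical name.

```python
def h_index_counting(references):
    """O(n) h-index using a histogram of reference counts (sketch)."""
    n = len(references)
    histogram = [0] * (n + 1)
    for r in references:
        # Clamp: a paper with more than n references counts the same as n.
        histogram[min(r, n)] += 1
    count = 0  # papers seen so far with at least i references
    for i in range(n, 0, -1):
        count += histogram[i]
        if count >= i:
            return i
    return 0  # no paper has any references, or the list is empty
```

Both loops touch each array slot once, so the total work is linear in the number of papers.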

As for the distributed part: let several machines each build a histogram over a disjoint subset of somebody's publications, then have one machine add up those histograms and compute the h-index with the decreasing loop described above.
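
The distributed idea can be illustrated with a single-process sketch in which plain function calls stand in for machines. Everything here is an assumption for illustration: the function names `local_histogram` and `merge_and_compute` are hypothetical, the shards are ordinary lists, and each worker reports a sparse `Counter` keyed by reference count so that it does not need to know the total number of papers in advance. The coordinator learns that total only after merging.

```python
from collections import Counter

def local_histogram(shard):
    """Worker step: count how many papers in this shard have each reference count."""
    return Counter(shard)

def merge_and_compute(histograms):
    """Coordinator step: add up the partial histograms, then scan for the h-index."""
    total = Counter()
    for partial in histograms:
        total.update(partial)          # Counter.update adds counts together
    n = sum(total.values())            # total number of papers across shards
    clamped = [0] * (n + 1)
    for references, papers in total.items():
        clamped[min(references, n)] += papers
    count = 0                          # papers with at least i references
    for i in range(n, 0, -1):
        count += clamped[i]
        if count >= i:
            return i
    return 0

# Usage: two "machines" each hold a disjoint part of the publication list.
shards = [[3, 0], [6, 1, 5]]
h = merge_and_compute(local_histogram(s) for s in shards)
```

Because histogram addition is associative and commutative, the partial results could also be combined pairwise in a tree instead of on one machine.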


Reference:

http://www.careercup.com/question?id=14585874

http://algs4.cs.princeton.edu/25applications/

http://www.glassdoor.com/Interview/Compute-the-h-index-of-a-list-of-papers-given-their-citation-count-Can-you-do-it-in-linear-time-How-about-a-distributed-QTN_572531.htm

