Reservoir Sampling Example Question:
Randomly choosing k samples from a list of n items, where n is either a very large or unknown number. Typically n is large enough that the list doesn’t fit into main memory.
Start with the simplest approach:
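#include <algorithm>
#include <cstdlib>
#include <vector>
using namespace std;
// Naive approach: draw random indices until k distinct ones are chosen. Each
// duplicate check is O(k), so this is about O(k^2) when k is much smaller than n.
vector<int> naiveSample(const vector<int>& input, int k) {  // seed srand() once beforehand
    int n = input.size();
    vector<int> picked, r;         // chosen indices, chosen items
    while ((int)r.size() != k) {   // assumes k <= n
        int j = rand() % n;        // random index in [0, n)
        if (find(picked.begin(), picked.end(), j) != picked.end()) continue;
        picked.push_back(j);
        r.push_back(input[j]);
    }
    return r;
}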
How do we get this down to O(n), in a single pass?
Use the principle of reservoir (pronounced "rehzrvwaar") sampling.
This video walks through the case of choosing 1 item out of n:
https://www.youtube.com/watch?v=A1iwzSew5QY
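In short: the m-th item is picked when it arrives with probability 1/m, and each later item i (i = m+1, ..., n) fails to replace it with probability (i-1)/i. So the probability that the m-th item is the final choice is (1/m) * (m/(m+1)) * ((m+1)/(m+2)) * ... * ((n-1)/n), where n is the total number of items. The product telescopes, so the final probability is 1/n.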
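A minimal C++ sketch of the one-item case, assuming the items arrive as whitespace-separated integers on a std::istream so that n is never known in advance. The function name sampleOne is hypothetical; it draws with std::mt19937 and std::uniform_int_distribution rather than rand() % m, which avoids modulo bias.

```cpp
#include <cassert>
#include <istream>
#include <random>
#include <sstream>

// Reservoir sampling with k = 1 (hypothetical helper): reads integers until the
// stream ends and returns one of them, each with probability 1/n, without
// knowing n up front. Assumes the stream contains at least one integer.
// Usage: std::mt19937 gen(std::random_device{}());
//        std::istringstream in("4 8 15 16 23 42");
//        int x = sampleOne(in, gen);
int sampleOne(std::istream& in, std::mt19937& gen) {
    int chosen = 0;
    int value;
    long long m = 0;  // number of items seen so far
    while (in >> value) {
        ++m;
        // Keep the m-th item with probability 1/m.
        std::uniform_int_distribution<long long> dist(1, m);
        if (dist(gen) == 1) chosen = value;
    }
    assert(m > 0 && "stream must contain at least one item");
    return chosen;
}
```

Only the current choice and a counter are kept, so memory use stays O(1) no matter how long the stream is.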
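For the k-out-of-n problem above: first copy the first k items into the reservoir. Then, for each later item at index i = k, k + 1, ..., n - 1, generate a random number j = rand() % (i + 1); if j < k, replace reservoir[j] with the new item. Why does every item end up in the reservoir with probability k/n?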
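Start with the n-th item (index n - 1): it is selected with probability k/n (from the random number drawn at that step). Now the (n-1)-th item: it is selected at its own step with probability k/(n-1), and it must then not be replaced at the last step. The last step overwrites its slot with probability (k/n) * (1/k) = 1/n, so it survives with probability (n-1)/n, and the total is k/(n-1) * (n-1)/n = k/n.
Finally, consider the k items copied in at the start. Each survives to the end with probability (k/(k+1)) * ((k+1)/(k+2)) * ... * ((n-1)/n), i.e., the probability that none of the random numbers drawn for items k+1, k+2, ..., n picks its slot. That is again k/n.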
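The same argument written out for a general position, using 0-based indices: item i arrives at step i, and at each later step t its slot is overwritten with probability (k/(t+1)) * (1/k) = 1/(t+1).

```latex
% 0-based indices: n items, reservoir size k, item i arrives at step i.
% At a later step t, a given slot is overwritten with probability 1/(t+1).
\text{For } i \ge k:\quad
\Pr[\text{item } i \text{ kept}]
  = \frac{k}{i+1}\prod_{t=i+1}^{n-1}\frac{t}{t+1}
  = \frac{k}{i+1}\cdot\frac{i+1}{n}
  = \frac{k}{n}

\text{For } i < k:\quad
\Pr[\text{item } i \text{ kept}]
  = \prod_{t=k}^{n-1}\frac{t}{t+1}
  = \frac{k}{k+1}\cdot\frac{k+1}{k+2}\cdots\frac{n-1}{n}
  = \frac{k}{n}
```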
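#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Selects k items uniformly at random from stream[0..n-1] in a single pass.
// Assumes 0 <= k <= n. The caller seeds rand() once (see main); reseeding with
// time(NULL) on every call would repeat results within the same second.
void selectKItems(const int stream[], int n, int k)
{
    int i; // index for elements in stream[]
    // reservoir is the output array. Initialize
    // it with the first k elements from stream[]
    vector<int> reservoir(stream, stream + k);
    // Iterate from the (k+1)th element to the nth element
    for (i = k; i < n; i++)
    {
        // Pick a random index from 0 to i.
        int j = rand() % (i + 1);
        // If the randomly picked index is smaller than k,
        // then replace the element present at that index
        // with the new element from the stream
        if (j < k)
            reservoir[j] = stream[i];
    }
    cout << "Following are k randomly selected items\n";
    for (int x : reservoir)
        cout << x << ' ';
    cout << '\n';
}

int main()
{
    // Use a different seed value on each run so that
    // we don't get the same result every time
    srand(time(NULL));
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    selectKItems(stream, 12, 5);
    return 0;
}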
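When the stream length is truly unknown, the same k-of-n logic can be wrapped in a small class that receives items one at a time. This is a sketch under assumptions: the class name ReservoirSampler, its offer/sample methods and the int item type are hypothetical, and it swaps rand() for std::mt19937 with std::uniform_int_distribution.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical reservoir sampler: call offer() once per item; at any point,
// sample() holds k items chosen uniformly from all items offered so far.
class ReservoirSampler {
public:
    ReservoirSampler(std::size_t k, unsigned seed) : k_(k), gen_(seed) {
        assert(k > 0);  // an empty reservoir would never hold anything
        reservoir_.reserve(k);
    }

    void offer(int item) {
        ++seen_;
        if (reservoir_.size() < k_) {  // the first k items fill the reservoir
            reservoir_.push_back(item);
            return;
        }
        // Item index i = seen_ - 1: draw j uniformly from [0, i], replace slot j if j < k.
        std::uniform_int_distribution<std::size_t> dist(0, seen_ - 1);
        std::size_t j = dist(gen_);
        if (j < k_) reservoir_[j] = item;
    }

    const std::vector<int>& sample() const { return reservoir_; }

private:
    std::size_t k_;
    std::size_t seen_ = 0;  // number of items offered so far
    std::mt19937 gen_;
    std::vector<int> reservoir_;
};
```

Calling sample() after any prefix of the stream gives a uniform k-subset of that prefix, so the caller never needs to know n.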
LeetCode examples:
398. Random Pick Index
Given an array of integers with possible duplicates, randomly output the index of a given target number. You can assume that the given target number must exist in the array.
Note:
The array size can be very large. Solution that uses too much extra space will not pass the judge.
Example:
int[] nums = new int[] {1,2,3,3,3};
Solution solution = new Solution(nums);

// pick(3) should return either index 2, 3, or 4 randomly. Each index should have equal probability of returning.
solution.pick(3);

// pick(1) should return 0. Since in the array only nums[0] is equal to 1.
solution.pick(1);
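#include <cstdlib>
#include <ctime>
#include <vector>
using namespace std;

class Solution {
public:
    Solution(vector<int>& nums) {
        srand(time(NULL));
        num_size_ = nums.size();
        store_ = nums;
    }

    // Reservoir sampling with k = 1 over the indices whose value equals target:
    // the c-th match replaces the current answer with probability 1/c.
    int pick(int target) {
        int res = -1;
        int found_num = 0;  // number of matches seen so far
        for (size_t i = 0; i < num_size_; ++i) {
            if (store_.at(i) != target) {
                continue;
            }
            ++found_num;
            if (rand() % found_num == 0) {
                res = static_cast<int>(i);
            }
        }
        return res;
    }
private:
    size_t num_size_ = 0;
    vector<int> store_;
};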
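Explanation:
The first matching index is chosen with probability 1, and it survives to the end with probability 1 * 1/2 * 2/3 * ... * (m-1)/m = 1/m, where m is the number of elements equal to target. The same telescoping argument gives 1/m for every other match.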
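To check the 1/m claim empirically, here is a self-contained sketch of the same one-pass logic as a free function. pickIndex and its std::mt19937 parameter are assumptions for illustration, not the LeetCode interface. Calling it many times on {1,2,3,3,3} with target 3 should return indices 2, 3 and 4 roughly one third of the time each.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical free-function version of pick(): one pass over nums, keeping
// the c-th match seen so far with probability 1/c (reservoir sampling, k = 1).
// Assumes target occurs in nums at least once, as the problem guarantees.
int pickIndex(const std::vector<int>& nums, int target, std::mt19937& gen) {
    int res = -1;
    int count = 0;  // matches seen so far
    for (int i = 0; i < static_cast<int>(nums.size()); ++i) {
        if (nums[i] != target) continue;
        ++count;
        std::uniform_int_distribution<int> dist(1, count);
        if (dist(gen) == 1) res = i;  // replace with probability 1/count
    }
    assert(res != -1 && "target must be present");
    return res;
}
```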
382. Linked List Random Node
Given a singly linked list, return a random node's value from the linked list. Each node must have the same probability of being chosen.
Follow up:
What if the linked list is extremely large and its length is unknown to you? Could you solve this efficiently without using extra space?
Example:
// Init a singly linked list [1,2,3].
ListNode head = new ListNode(1);
head.next = new ListNode(2);
head.next.next = new ListNode(3);
Solution solution = new Solution(head);

// getRandom() should return either 1, 2, or 3 randomly. Each element should have equal probability of returning.
solution.getRandom();
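#include <cstdlib>
#include <ctime>

// Definition for singly-linked list (supplied by LeetCode):
// struct ListNode {
//     int val;
//     ListNode *next;
//     ListNode(int x) : val(x), next(nullptr) {}
// };
class Solution {
public:
    /** @param head The linked list's head.
        Note that the head is guaranteed to be not null, so it contains at least one node. */
    Solution(ListNode* head) {
        head_ = head;
        srand(time(NULL));
    }

    /** Returns a random node's value. */
    int getRandom() {
        int index = 1;          // 1-based position of the current node
        ListNode* cur = head_;
        int res = cur->val;     // the first node is kept with probability 1
        cur = cur->next;
        while (cur != nullptr) {
            ++index;
            // Replace the answer with probability 1/index
            if (rand() % index == 0) {
                res = cur->val;
            }
            cur = cur->next;
        }
        return res;
    }
private:
    ListNode* head_ = nullptr;
};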
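The reasoning is the same as for 398: the i-th node (counting from 1) replaces the current answer with probability 1/i, and it is not replaced by any later node with probability i/(i+1) * ... * (n-1)/n = i/n, so each node is returned with probability 1/i * i/n = 1/n.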
Shuffle a given array using the Fisher-Yates shuffle algorithm
Given an array, write a program to generate a random permutation of the array elements. This question is also asked as “shuffle a deck of cards” or “randomize a given array”. Here shuffle means that every permutation of the array elements should be equally likely.
The brute-force approach:
Let the given array be arr[]. A simple solution is to create an auxiliary array temp[] which is initially a copy of arr[]. Randomly select an element from temp[], copy the randomly selected element to arr[0] and remove the selected element from temp[]. Repeat the same process n times and keep copying elements to arr[1], arr[2], … . The time complexity of this solution will be O(n^2).
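A minimal C++ sketch of this brute-force approach, assuming std::vector<int> input and a caller-supplied std::mt19937 (the name naiveShuffle is hypothetical). Erasing from the middle of temp is the O(n) step that makes the whole shuffle O(n^2).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Brute-force shuffle (hypothetical helper): repeatedly remove a uniformly
// chosen element from a copy (temp) and write it to the next output slot.
void naiveShuffle(std::vector<int>& arr, std::mt19937& gen) {
    std::vector<int> temp = arr;  // auxiliary copy
    for (std::size_t out = 0; out < arr.size(); ++out) {
        // temp still holds arr.size() - out > 0 elements here
        std::uniform_int_distribution<std::size_t> dist(0, temp.size() - 1);
        std::size_t pick = dist(gen);
        arr[out] = temp[pick];
        temp.erase(temp.begin() + pick);  // O(n) removal
    }
    assert(temp.empty());
}
```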
How can we do it in O(n)?
To shuffle an array a of n elements (indices 0..n-1):
for i from n - 1 downto 1 do
j = random integer with 0 <= j <= i
exchange a[j] and a[i]
Explanation:
The probability that the ith element (including the last one) ends up in the last position is 1/n, because the first iteration picks j uniformly from all n indices.
The probability that the ith element ends up in the second-to-last position can be shown to be 1/n by splitting into two cases.
Case 1: i = n-1 (index of the last element):
The probability that the last element ends up in the second-to-last position = (probability that the last element does not stay at its original position) x (probability that, in the second iteration, the index it was moved to is picked, so that it is swapped into position n-2)
So the probability = ((n-1)/n) x (1/(n-1)) = 1/n
Case 2: 0 <= i < n-1 (index of a non-last element):
The probability that the ith element ends up in the second-to-last position = (probability that the ith element is not picked in the first iteration) x (probability that it is picked in the second iteration)
So the probability = ((n-1)/n) x (1/(n-1)) = 1/n
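The two cases above only cover the last two positions. A counting argument handles every position at once: iteration i (from n - 1 down to 1) draws j uniformly from the i + 1 values 0..i, the draws are independent, and different sequences of draws produce different permutations, so every permutation has the same probability:

```latex
% n - 1 independent uniform draws with 2, 3, ..., n choices give n! equally
% likely draw sequences, each mapping to a distinct permutation.
\Pr[\text{a given permutation}]
  = \prod_{i=1}^{n-1} \frac{1}{i+1}
  = \frac{1}{2}\cdot\frac{1}{3}\cdots\frac{1}{n}
  = \frac{1}{n!}
```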
#include <cstdlib>
#include <utility>
using namespace std;

// Assumes the caller has seeded rand() once, e.g. srand(time(NULL)) at
// program start; reseeding on every call would repeat shuffles within a second.
void randomize(int arr[], int n)
{
    // Start from the last element and swap
    // one by one. We don't need to run for
    // the first element; that's why i > 0
    for (int i = n - 1; i > 0; i--)
    {
        // Pick a random index from 0 to i
        int j = rand() % (i + 1);
        // Swap arr[i] with the element
        // at the random index
        swap(arr[i], arr[j]);
    }
}
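A rough empirical check, assuming std::vector<int> input and std::mt19937 in place of rand(): fisherYates mirrors the pseudocode, and permutationCounts (a hypothetical helper) shuffles {0, 1, 2} many times and tallies each of the 3! = 6 orderings, which should each appear about one sixth of the time.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <utility>
#include <vector>

// Fisher-Yates with <random>: for i from n-1 down to 1, swap a[i] with a[j],
// where j is uniform on [0, i].
void fisherYates(std::vector<int>& a, std::mt19937& gen) {
    if (a.size() < 2) return;  // nothing to shuffle
    for (std::size_t i = a.size() - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> dist(0, i);
        std::swap(a[i], a[dist(gen)]);
    }
}

// Shuffles {0, 1, 2} `trials` times and counts how often each ordering appears.
std::map<std::vector<int>, int> permutationCounts(int trials, unsigned seed) {
    std::mt19937 gen(seed);
    std::map<std::vector<int>, int> counts;
    for (int t = 0; t < trials; ++t) {
        std::vector<int> a{0, 1, 2};
        fisherYates(a, gen);
        ++counts[a];
    }
    assert(counts.size() <= 6);  // only 3! distinct orderings exist
    return counts;
}
```

Outside of an exercise, std::shuffle from <algorithm> (for example std::shuffle(v.begin(), v.end(), gen)) already provides a uniform in-place shuffle given a uniform random bit generator, so a hand-written loop is rarely needed.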