Notes on the minhash computation procedure

The following explanation is excerpted from a textbook. It is very clear, so I am recording it here:

[Image 1]


Now, let us simulate the algorithm for computing the signature matrix.
Initially, this matrix consists of all ∞’s:

[Image 2: the signature matrix initialized to all ∞]
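The initialization step can be sketched in Python as follows. The matrix shape is an assumption inferred from the names used in the walk-through: two hash functions (h1, h2) give two rows, and four sets (S1 to S4) give four columns. The helper name `init_signature` is hypothetical.

```python
import math

NUM_HASHES = 2  # assumed: one signature row per hash function (h1, h2)
NUM_SETS = 4    # assumed: one signature column per set (S1..S4)

def init_signature(num_hashes, num_sets):
    """Return a num_hashes x num_sets matrix with every entry set to infinity."""
    # Build each row separately so the rows do not share the same list object.
    return [[math.inf] * num_sets for _ in range(num_hashes)]

sig = init_signature(NUM_HASHES, NUM_SETS)
```

Using `math.inf` means any real hash value compares as smaller, so the first row that touches a column always overwrites its entries.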

First, we consider row 0 of Fig. 3.4. We see that the values of h1(0) and h2(0) are both 1. The row numbered 0 has 1’s in the columns for sets S1 and S4, so only these columns of the signature matrix can change. As 1 is less than ∞, we do in fact change both values in the columns for S1 and S4. The current estimate of the signature matrix is thus:

[Image 3: the signature matrix after processing row 0]
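As a rough illustration, the update for a single row might look like the sketch below. It assumes four sets S1 to S4 (so row 0 is `[1, 0, 0, 1]`) and takes the hash values h1(0) = h2(0) = 1 from the text. The function name `update_row` is made up for this example.

```python
import math

def update_row(sig, row_bits, hash_values):
    """Apply one row of the characteristic matrix to the signature matrix.

    For every column whose bit is 1, each signature entry is replaced by the
    smaller of its current value and that row's hash value.
    """
    for c, bit in enumerate(row_bits):
        if bit == 1:
            for i, h in enumerate(hash_values):
                sig[i][c] = min(sig[i][c], h)
    return sig

# Start from the all-infinity matrix (2 hash functions x 4 sets, assumed shape).
sig = [[math.inf] * 4 for _ in range(2)]

# Row 0: 1's in the columns for S1 and S4; h1(0) = 1, h2(0) = 1.
update_row(sig, [1, 0, 0, 1], [1, 1])
print(sig)  # [[1, inf, inf, 1], [1, inf, inf, 1]]
```

Columns S2 and S3 stay at infinity because row 0 has 0 in those columns.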


Now, we move to the row numbered 1 in Fig. 3.4. This row has 1 only in S3, and its hash values are h1(1) = 2 and h2(1) = 4. Thus, we set SIG(1, 3) to 2 and SIG(2, 3) to 4. All other signature entries remain as they are because their columns have 0 in the row numbered 1. The new signature matrix:


[Image 4: the signature matrix after processing row 1]
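Putting the two steps so far together, here is a minimal sketch that processes rows 0 and 1 in order. Only the data stated in the text is used: row 0 has 1's in S1 and S4 with hash values (1, 1), and row 1 has a 1 only in S3 with hash values (2, 4). The count of four sets and the function name `minhash_signature` are assumptions for illustration, and the hash values are supplied as precomputed pairs rather than computed from hash functions.

```python
import math

def minhash_signature(rows, num_sets):
    """Compute a signature matrix from a list of (bits, hash_values) pairs.

    bits        : the 0/1 entries of one row, one per set
    hash_values : the values h1(r), h2(r), ... for that row
    """
    num_hashes = len(rows[0][1])
    sig = [[math.inf] * num_sets for _ in range(num_hashes)]
    for bits, hashes in rows:
        for c in range(num_sets):
            if bits[c] == 1:
                # Keep the minimum hash value seen so far for this column.
                for i in range(num_hashes):
                    sig[i][c] = min(sig[i][c], hashes[i])
    return sig

rows = [
    ([1, 0, 0, 1], (1, 1)),  # row 0: S1 and S4; h1(0) = 1, h2(0) = 1
    ([0, 0, 1, 0], (2, 4)),  # row 1: S3 only;   h1(1) = 2, h2(1) = 4
]
sig = minhash_signature(rows, num_sets=4)
print(sig)  # [[1, inf, 2, 1], [1, inf, 4, 1]]
```

The entries for S3 become 2 and 4, matching SIG(1, 3) = 2 and SIG(2, 3) = 4 in the text, while S1 and S4 keep the value 1 from row 0. Further rows could be appended to `rows` in the same format.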


[Image 5]

[Image 6]
