fingerprinting algorithm

    In computer science, a fingerprinting algorithm is a procedure that maps an arbitrarily large data item (such as a computer file) to a much shorter bit string, its fingerprint, that uniquely identifies the original data for all practical purposes, just as human fingerprints uniquely identify people. Fingerprints may also be used for data deduplication.
    Fingerprints are typically used to avoid the comparison and transmission of bulky data. For instance, a web browser or proxy server can efficiently check whether a remote file has been modified by fetching only its fingerprint and comparing it with that of the previously fetched copy.
    Fingerprint functions may be seen as high-performance hash functions used to uniquely identify substantial blocks of data where cryptographic hash functions may be unnecessary. Audio fingerprint algorithms should not be confused with this type of fingerprint function.
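
    As a rough illustration of the cache check described above, the following sketch compares fingerprints instead of full contents. The names `fingerprint`, `CachedCopy` and `needs_refetch` are hypothetical, and CRC-32 from Python's standard library is only a stand-in fingerprint function chosen for this example; the original text does not prescribe one.

```python
import zlib


def fingerprint(data: bytes) -> int:
    # Stand-in fingerprint (assumption for this sketch): CRC-32 from the stdlib.
    return zlib.crc32(data)


class CachedCopy:
    """A locally cached file together with the fingerprint of its contents."""

    def __init__(self, data: bytes):
        self.data = data
        self.fp = fingerprint(data)


def needs_refetch(cached: CachedCopy, remote_fp: int) -> bool:
    # Only the short fingerprint crosses the network; the bulky body is
    # transferred again only when the two fingerprints differ.
    return cached.fp != remote_fp


cached = CachedCopy(b"version 1")
unchanged = needs_refetch(cached, fingerprint(b"version 1"))
changed = needs_refetch(cached, fingerprint(b"version 2"))
```

    Here `unchanged` is False and `changed` is True, so only the second case would trigger a full download.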


The idea behind Rabin's fingerprinting algorithm is as follows:

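    Suppose $A = (b_1, \ldots, b_m)$ is a binary string of m bits. From A we can construct a corresponding polynomial of degree m-1 as follows, where t is the indeterminate.
    $A(t) = b_1 t^{m-1} + b_2 t^{m-2} + \cdots + b_{m-1} t + b_m$                  (1)
    Let P(t) be a given polynomial of degree k (so $a_1 = 1$):
    $P(t) = a_1 t^k + a_2 t^{k-1} + \cdots + a_k t + a_{k+1}$                      (2)
    Then the remainder f(t) of A(t) divided by P(t) has degree at most k-1. For a given string A, the fingerprint f(A) is defined as this remainder, as stated in equation (3) below.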


Wikipedia's explanation:

    Given an n-bit message $m_0, \ldots, m_{n-1}$, we view it as a polynomial of degree n-1 over the finite field GF(2): $f(x) = m_0 + m_1 x + \cdots + m_{n-1} x^{n-1}$.
    We then pick a random irreducible polynomial p(x) of degree k over GF(2), and we define the fingerprint of the message m to be the remainder r(x) after division of f(x) by p(x) over GF(2), which can be viewed as a polynomial of degree k-1 or as a k-bit number.

    $f(A) = A(t) \bmod P(t)$                                                 (3)
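
    The definition in equation (3) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a production implementation: polynomials over GF(2) are encoded as integers (bit i holds the coefficient of t^i), the function names `poly_mod` and `rabin_fingerprint` are hypothetical, and P(t) is a fixed irreducible polynomial passed in by the caller rather than a randomly chosen one.

```python
def poly_mod(a: int, p: int) -> int:
    """Remainder of a(t) divided by p(t) over GF(2).

    Both polynomials are encoded as ints: bit i is the coefficient of t^i.
    """
    if p == 0:
        raise ValueError("modulus polynomial must be non-zero")
    deg_p = p.bit_length() - 1
    # Long division: cancel the leading term of a until deg(a) < deg(p).
    while a.bit_length() - 1 >= deg_p:
        shift = (a.bit_length() - 1) - deg_p
        a ^= p << shift  # subtraction over GF(2) is XOR
    return a


def rabin_fingerprint(data: bytes, p: int) -> int:
    """Fingerprint f(A) = A(t) mod P(t), computed one bit at a time.

    The first bit of `data` is taken as b_1, the highest-degree coefficient,
    matching equation (1).
    """
    k = p.bit_length() - 1  # degree of P(t)
    fp = 0
    for byte in data:
        for i in range(7, -1, -1):
            # Multiply the running remainder by t and add the next bit.
            fp = (fp << 1) | ((byte >> i) & 1)
            # If the degree reached k, reduce by P(t).
            if (fp >> k) & 1:
                fp ^= p
    return fp


# Worked example: A = 1101, so A(t) = t^3 + t^2 + 1, and P(t) = t^2 + t + 1.
small = poly_mod(0b1101, 0b111)  # remainder t + 1, i.e. 0b11

# Degree-8 example; t^8 + t^4 + t^3 + t + 1 (0x11B) is irreducible over GF(2).
fp8 = rabin_fingerprint(b"hello", 0x11B)
```

    In the worked example, t * P(t) = t^3 + t^2 + t, and adding it to A(t) over GF(2) leaves t + 1, so `small` equals 0b11. The bit-at-a-time loop never holds more than k+1 bits, which is why the fingerprint can be computed over a stream without building the whole polynomial A(t) first.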
