Hamming distance

In information theory, the Hamming distance, named after Richard Hamming, is the number of positions at which the corresponding symbols of two equal-length strings differ. Put another way, it measures the number of substitutions required to change one string into the other.

    For example:

    The Hamming distance between 1011101 and 1001001 is 2. 
    The Hamming distance between 2143896 and 2233796 is 3. 
    The Hamming distance between "toned" and "roses" is 3. 
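The definition translates directly into code. Below is a minimal Python sketch; the function name `hamming_distance` is our own choice for illustration, not a library API. It rejects inputs of unequal length, since the distance is only defined for equal-length strings, and it reproduces the three examples above.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for strings of equal length")
    # zip pairs up corresponding symbols; each mismatch contributes 1 to the sum
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("1011101", "1001001"))  # 2
print(hamming_distance("2143896", "2233796"))  # 3
print(hamming_distance("toned", "roses"))      # 3
```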

    The Hamming weight of a string is its Hamming distance from the zero string (the string of all zeros) of the same length. That is, it is the number of elements in the string that are not zero: for a binary string this is just the number of 1s, so, for instance, the Hamming weight of 11101 is 4.
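Because the Hamming weight is just the distance from the all-zero string, it reduces to counting non-zero symbols. A minimal sketch, assuming the zero symbol is the character "0"; the helper names and the `zero` parameter are hypothetical conveniences added here. The second function computes the same value literally as a distance from the zero string, to show the two formulations agree.

```python
def hamming_weight(s: str, zero: str = "0") -> int:
    """Count the symbols of s that differ from the zero symbol (hypothetical helper)."""
    return sum(ch != zero for ch in s)


def hamming_weight_via_distance(s: str, zero: str = "0") -> int:
    """Same value, computed as the distance from the all-zero string of equal length."""
    return sum(x != y for x, y in zip(s, zero * len(s)))


print(hamming_weight("11101"))               # 4
print(hamming_weight_via_distance("11101"))  # 4
print(bin(0b11101).count("1"))               # 4, weight of an integer's binary form
```

For Python integers, `bin(n).count("1")` gives the weight of the binary representation of a non-negative `n`; Python 3.10 and later also provide `int.bit_count()` for the same purpose.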

    The Hamming distance between two words a and b, viewed as elements of a vector space, can then be seen as the Hamming weight of a - b. If a and b are binary strings, arithmetic is done modulo 2, so a - b is the same as a + b and as a XOR b; the distance is therefore the number of 1s in a XOR b. For binary words, the Hamming distance is also equal to the Manhattan distance between the two corresponding vertices of an n-dimensional hypercube, where n is the length of the words.
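For binary words stored as integers, the XOR identity gives a one-line distance: XOR the two words and count the 1 bits. The sketch below (function names are our own, for illustration) also computes the Manhattan distance between the same words treated as hypercube vertices, to show that the two agree. It assumes both words have the same fixed length; leading zeros dropped by the integer representation do not change the XOR.

```python
def hamming_distance_bits(a: int, b: int) -> int:
    """Hamming distance between two non-negative integers viewed as bit strings."""
    # Modulo 2, a - b equals a XOR b, so the distance is the weight of a ^ b
    return bin(a ^ b).count("1")


def manhattan_on_hypercube(u: str, v: str) -> int:
    """Manhattan (L1) distance between two vertices of the n-cube given as 0/1 strings."""
    return sum(abs(int(x) - int(y)) for x, y in zip(u, v))


print(hamming_distance_bits(0b1011101, 0b1001001))  # 2
print(manhattan_on_hypercube("1011101", "1001001"))  # 2
```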

    In telecommunication, the Hamming distance is used to count the flipped bits in a fixed-length binary word as an estimate of transmission error, which is why it is sometimes called the signal distance. Hamming weight analysis of bits is used in several disciplines, including information theory, coding theory, and cryptography. For comparing strings of different lengths, or strings where insertions and deletions are expected in addition to substitutions, a more sophisticated metric such as the Levenshtein distance is more appropriate.
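To make the contrast with the Levenshtein distance concrete, here is a minimal sketch of the standard dynamic-programming (Wagner–Fischer) computation, keeping only two rows of the table. The function name and the example strings are our own, chosen for illustration: a single deletion between strings of different lengths, and a one-position shift that makes every position differ under Hamming distance while Levenshtein counts only one deletion and one insertion.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the symbols match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("toned", "roses"))     # 3, same as the Hamming distance here
print(levenshtein("1011101", "101101"))  # 1: one deletion; Hamming is undefined
a, b = "0101010", "1010101"
print(sum(x != y for x, y in zip(a, b)))  # 7: every position differs
print(levenshtein(a, b))                  # 2: delete the leading 0, append a 1
```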
