Details: http://www.oschina.net/code/snippet_99767_1217
1. A hash algorithm first needs a hash table:

    // Hash index table entry
    typedef struct _HASHTABLE
    {
        long nHashA;   // first verification hash
        long nHashB;   // second verification hash
        bool bExists;  // true once the slot is occupied
    } HASHTABLE, *PHASHTABLE;

    // Initialization (for example in the constructor):
    // allocate the table and mark every slot as empty
    m_tablelength = nTableLength;
    m_HashIndexTable = new HASHTABLE[nTableLength];
    for ( int i = 0; i < nTableLength; i++ )
    {
        m_HashIndexTable[i].nHashA = -1;
        m_HashIndexTable[i].nHashB = -1;
        m_HashIndexTable[i].bExists = false;
    }
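2. We also need a compression function that reduces a string to a 32-bit unsigned integer (the code uses unsigned long, which is 32 bits wide on Windows but 64 bits wide on LP64 systems such as 64-bit Linux):

    unsigned long StringHash::HashString(const string& lpszString, unsigned long dwHashType)
    {
        // Read the bytes as unsigned char so they are valid arguments to toupper()
        const unsigned char *key = reinterpret_cast<const unsigned char*>(lpszString.c_str());
        unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
        int ch;
        while (*key != 0)
        {
            ch = toupper(*key++);  // hashing is case-insensitive
            // dwHashType selects one 0x100-entry block of cryptTable
            seed1 = cryptTable[(dwHashType << 8) + ch] ^ (seed1 + seed2);
            seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
        }
        return seed1;
    }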
3. The compression function above relies on cryptTable, which has to be filled in first:

    void StringHash::InitCryptTable()
    {
        unsigned long seed = 0x00100001, index1 = 0, index2 = 0, i;
        // Fills indices 0 .. 0x4FF, so cryptTable must hold at least 0x500 entries
        for ( index1 = 0; index1 < 0x100; index1++ )
        {
            for ( index2 = index1, i = 0; i < 5; i++, index2 += 0x100 )
            {
                unsigned long temp1, temp2;
                seed = (seed * 125 + 3) % 0x2AAAAB;
                temp1 = (seed & 0xFFFF) << 0x10;  // high 16 bits
                seed = (seed * 125 + 3) % 0x2AAAAB;
                temp2 = (seed & 0xFFFF);          // low 16 bits
                cryptTable[index2] = ( temp1 | temp2 );
            }
        }
    }
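To make steps 2 and 3 easier to try out in isolation, here is a minimal standalone sketch. The names MakeCryptTable, HashString32 and kCryptTableSize are made up for this example and are not part of the original class. It also assumes std::uint32_t in place of unsigned long, so that the arithmetic wraps at 32 bits on every platform, and it walks the whole std::string rather than stopping at the first NUL byte.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical standalone variant of InitCryptTable()/HashString().
// Assumption: std::uint32_t replaces unsigned long to force 32-bit wrap-around.
constexpr std::size_t kCryptTableSize = 0x500;
using CryptTable = std::array<std::uint32_t, kCryptTableSize>;

CryptTable MakeCryptTable() {
    CryptTable table{};
    std::uint32_t seed = 0x00100001;
    for (std::uint32_t index1 = 0; index1 < 0x100; ++index1) {
        std::uint32_t index2 = index1;
        for (int i = 0; i < 5; ++i, index2 += 0x100) {
            seed = (seed * 125 + 3) % 0x2AAAAB;
            const std::uint32_t high = (seed & 0xFFFF) << 16;
            seed = (seed * 125 + 3) % 0x2AAAAB;
            const std::uint32_t low = seed & 0xFFFF;
            table[index2] = high | low;
        }
    }
    return table;
}

std::uint32_t HashString32(const std::string& text, std::uint32_t hashType,
                           const CryptTable& table) {
    assert(hashType < 5);  // the table only has five 0x100-entry blocks
    std::uint32_t seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    for (unsigned char c : text) {
        // Upper-case each byte so that hashing is case-insensitive
        const std::uint32_t ch = static_cast<std::uint32_t>(std::toupper(c));
        seed1 = table[(hashType << 8) + ch] ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}
```

Build the table once with MakeCryptTable() and pass it to every HashString32() call. Hash types 0, 1 and 2 correspond to HASH_OFFSET, HASH_A and HASH_B in the steps that follow.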
4. Now we can hash a string. We call the compression function three times to obtain three hash values: the first one determines the string's index in the hash table, and the other two are stored to handle collisions (the probability that two different strings produce the same result for all three hashes is essentially zero).

    bool StringHash::Hash( string lpszString,    // url
                           const PLinkNode node  // node that belongs to the url (not used in this snippet)
                         )
    {
        const unsigned long HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2;
        unsigned long nHash = HashString(lpszString, HASH_OFFSET);
        unsigned long nHashA = HashString(lpszString, HASH_A);
        unsigned long nHashB = HashString(lpszString, HASH_B);
        unsigned long nHashStart = nHash % m_tablelength, nHashPos = nHashStart;

        // Linear probing: step to the next slot until a free one is found
        while ( m_HashIndexTable[nHashPos].bExists )
        {
            nHashPos = (nHashPos + 1) % m_tablelength;
            if (nHashPos == nHashStart)  // went all the way around
            {
                // no free slot left in the hash table, the string cannot be hashed
                return false;
            }
        }
        m_HashIndexTable[nHashPos].bExists = true;
        m_HashIndexTable[nHashPos].nHashA = nHashA;
        m_HashIndexTable[nHashPos].nHashB = nHashB;
        return true;
    }
5. We also need an interface to check whether a string has already been hashed:

    unsigned long StringHash::Hashed(const string& url)
    {
        const unsigned long HASH_OFFSET = 0, HASH_A = 1, HASH_B = 2;
        // the chance that different strings still collide on all three hashes is vanishingly small
        unsigned long nHash = HashString(url, HASH_OFFSET);
        unsigned long nHashA = HashString(url, HASH_A);
        unsigned long nHashB = HashString(url, HASH_B);
        unsigned long nHashStart = nHash % m_tablelength, nHashPos = nHashStart;

        while ( m_HashIndexTable[nHashPos].bExists )
        {
            if (m_HashIndexTable[nHashPos].nHashA == nHashA && m_HashIndexTable[nHashPos].nHashB == nHashB)
                return nHashPos;
            else
                nHashPos = (nHashPos + 1) % m_tablelength;

            if (nHashPos == nHashStart)
                break;
        }
        // Not found. The return type is unsigned, so -1 becomes the largest
        // unsigned long value; callers must compare against that value.
        return static_cast<unsigned long>(-1);
    }
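The insert and lookup logic of steps 4 and 5 can be sketched on its own, without the string hashing. This is only an illustration under a few assumptions: the names Slot, FingerprintTable, Insert and Find are hypothetical, the three hash values are passed in by the caller, std::vector replaces the raw new[] array, and std::optional (C++17) replaces the -1 return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical names; only the probing logic mirrors Hash() and Hashed().
struct Slot {
    std::uint32_t hashA = 0;
    std::uint32_t hashB = 0;
    bool exists = false;
};

class FingerprintTable {
public:
    explicit FingerprintTable(std::size_t length) : slots_(length) {
        assert(length > 0);  // the modulo below needs a non-empty table
    }

    // Mirrors Hash(): returns false when every slot is already taken.
    bool Insert(std::uint32_t hashOffset, std::uint32_t hashA, std::uint32_t hashB) {
        const std::size_t start = hashOffset % slots_.size();
        std::size_t pos = start;
        while (slots_[pos].exists) {
            pos = (pos + 1) % slots_.size();
            if (pos == start) return false;  // wrapped around: table is full
        }
        slots_[pos].hashA = hashA;
        slots_[pos].hashB = hashB;
        slots_[pos].exists = true;
        return true;
    }

    // Mirrors Hashed(): returns the slot index, or std::nullopt if not found.
    std::optional<std::size_t> Find(std::uint32_t hashOffset, std::uint32_t hashA,
                                    std::uint32_t hashB) const {
        const std::size_t start = hashOffset % slots_.size();
        std::size_t pos = start;
        while (slots_[pos].exists) {
            if (slots_[pos].hashA == hashA && slots_[pos].hashB == hashB) return pos;
            pos = (pos + 1) % slots_.size();
            if (pos == start) break;  // wrapped around without a match
        }
        return std::nullopt;
    }

private:
    std::vector<Slot> slots_;
};
```

Returning std::optional makes the "not found" case explicit, so a caller cannot mistake it for a valid slot index.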
With that, a simple string hash algorithm is complete.
---------------------------------------------------------------------------------------------------------------------------------------------------
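Summary:
At first I did not fully understand the following code:

    while ( m_HashIndexTable[nHashPos].bExists )
    {
        nHashPos = (nHashPos + 1) % m_tablelength;
        if (nHashPos == nHashStart)  // went all the way around
        {
            // no free slot left in the hash table, the string cannot be hashed
            return false;
        }
    }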
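In fact, it simply means that one hash value has collided with another, so it is first come, first served: the newcomer takes the next free slot further down. This does not break matching in Hashed(), which probes the slots in the same order and also compares the two extra hash values. It is a bit like starting at one room and knocking on each door in turn to see whether the person you are looking for is inside!
So this is an algorithm that trades space for time: the larger the hash table is relative to the number of stored strings, the less likely collisions become, and the closer the average cost gets to O(1).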
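The space-for-time trade-off can be observed with a small experiment. The sketch below is not from the original code: AverageProbes is a made-up helper that uses pseudo-random slot indices in place of real string hashes, inserts them with the same linear probing, and reports how many slots were inspected per insertion on average.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: inserts itemCount pseudo-random keys into a table of
// tableLength slots using linear probing and returns the average number of
// slots inspected per insertion (1.0 means no collisions at all).
double AverageProbes(std::size_t tableLength, std::size_t itemCount) {
    assert(itemCount <= tableLength);  // otherwise probing could never finish
    if (itemCount == 0) return 0.0;

    std::vector<bool> occupied(tableLength, false);
    std::mt19937 rng(12345);  // fixed seed so that runs are repeatable
    std::size_t totalProbes = 0;

    for (std::size_t i = 0; i < itemCount; ++i) {
        std::size_t pos = rng() % tableLength;  // stands in for nHash % m_tablelength
        std::size_t probes = 1;
        while (occupied[pos]) {                 // collision: try the next slot
            pos = (pos + 1) % tableLength;
            ++probes;
        }
        occupied[pos] = true;
        totalProbes += probes;
    }
    return static_cast<double>(totalProbes) / static_cast<double>(itemCount);
}
```

Comparing, say, AverageProbes(1024, 100) with AverageProbes(1024, 900) should show the average growing as the table fills up, which is why an oversized table keeps the cost close to one probe per operation.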