Here is what the visualization looks like when it is first initialized.
At the start of the algorithm, we have two storage areas: the directory and the buckets.
The directory has a property associated with it called the directory depth (abbreviated as dd) and each bucket has a property associated with it called the bucket depth (abbreviated as bd).
The directory is an array of pointers to buckets. A color code indicates which bucket each directory element points to.
The three primary operations on a hash table are adding an element, removing an element, and searching for an element. Regardless of which operation is being done, the first steps are always the same. First, the element in question is sent to a hash function that returns a bit string. From this bit string, we take the dd most significant bits to obtain the directory element. Once we have the directory element, we load the bucket it references into memory.
In this example we are performing an operation on 125. When we send the value 125 to the hash function it returns 01111101. Since the directory depth is 3, we look at the most significant 3 bits to obtain the directory element 011. Directory element 011 points to bucket B3. This is the bucket that needs to be loaded into memory.
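The directory lookup in this example can be sketched in a few lines of Python. This is only an illustration: `toy_hash` and `directory_index` are hypothetical names, and the hash is assumed to be the value's own 8-bit binary representation, which happens to reproduce the bit string shown for 125.

```python
def toy_hash(value: int, width: int = 8) -> str:
    """Assumed hash: the value itself, written as a `width`-bit string."""
    return format(value % (1 << width), f"0{width}b")


def directory_index(hash_bits: str, dd: int) -> str:
    """Return the dd most significant bits of the hash bit string."""
    return hash_bits[:dd]


bits = toy_hash(125)
entry = directory_index(bits, 3)
print(bits)   # 01111101
print(entry)  # 011
```

The resulting string `011` selects the directory element, which in turn points to the bucket to load.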
Searching for an Element
- With the bucket loaded into memory, it is searched for the element we are looking for. If the element is found in the bucket, then the element exists in the database; otherwise it does not exist.
Adding an Element
- If the bucket is not full, then the element we are adding is placed in an available location.
- If the bucket is full, then a new bucket is created and a Bucket Split is performed:
- Both of these buckets are given a new bucket depth. It is equal to the number of most significant bits that all of the elements, including the element being added, share in common, plus 1.
- The elements in the full bucket are redistributed between their current bucket and the newly created bucket. If an element's bd-th most significant bit is 0, the element stays where it is; if it is 1, the element is moved to the new bucket.
- If the new bd is larger than the directory depth, the directory depth must be set to the bucket depth of this bucket and the directory must be expanded to 2 raised to the power of the new directory depth.
- The pointers in the directory need to be adjusted. In my code I adjusted the pointers in a manner similar to the one demonstrated in a video on Google Videos.
Removing an Element
- The pointers that used to point to this bucket are now pointed to the bucket it has merged into.
- The bucket depth of the merged bucket is decreased until it matches the number of most significant address bits shared by all directory elements that point to the merged bucket.
- If the bucket depths of all buckets are now smaller than the directory depth, then the directory depth must be set to the largest remaining bucket depth and the directory must be shrunk to 2 raised to the power of the new directory depth.
- The pointers in the directory need to be adjusted. Again I used the same method for readjustment as demonstrated in the Google video.
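The steps listed under Adding an Element can be sketched as follows. This is a minimal sketch under stated assumptions, not the code behind the visualization: the class and method names, the 8-bit toy hash, and the bucket capacity of 2 are all made up for illustration. It also swaps in the textbook variant of the split, which raises the bucket depth by one per split and retries, instead of computing the shared-prefix length in a single step as described above.

```python
WIDTH = 8      # assumed number of bits returned by the hash function
CAPACITY = 2   # assumed number of elements a bucket can hold


def h(key: int) -> int:
    """Assumed toy hash: the key itself, reduced to WIDTH bits."""
    return key % (1 << WIDTH)


class Bucket:
    def __init__(self, bd: int):
        self.bd = bd        # bucket depth
        self.items = []


class ExtendibleHash:
    def __init__(self):
        self.dd = 0                     # directory depth
        self.directory = [Bucket(0)]    # 2**dd pointers to buckets

    def _index(self, key: int) -> int:
        # The dd most significant bits of the hash select the directory element.
        return h(key) >> (WIDTH - self.dd)

    def search(self, key: int) -> bool:
        return key in self.directory[self._index(key)].items

    def add(self, key: int) -> None:
        bucket = self.directory[self._index(key)]
        if key in bucket.items:
            return
        if len(bucket.items) < CAPACITY:
            bucket.items.append(key)    # room left: place the element
            return
        self._split(bucket)             # bucket full: split, then retry
        self.add(key)

    def _split(self, bucket: Bucket) -> None:
        if bucket.bd == WIDTH:
            raise OverflowError("no hash bits left to split on")
        if bucket.bd == self.dd:
            # Expand the directory to 2**(dd + 1) entries. With most
            # significant bit indexing, each old entry becomes two neighbours.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.dd += 1
        bucket.bd += 1
        new_bucket = Bucket(bucket.bd)
        shift = WIDTH - bucket.bd
        old_items, bucket.items = bucket.items, []
        for k in old_items:
            # bd-th most significant bit 0: stay; 1: move to the new bucket.
            target = new_bucket if (h(k) >> shift) & 1 else bucket
            target.items.append(k)
        # Adjust the directory pointers that referenced the split bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (self.dd - bucket.bd)) & 1:
                self.directory[i] = new_bucket


table = ExtendibleHash()
for key in (125, 10, 200, 130, 64):
    table.add(key)
print(table.dd, len(table.directory), table.search(125), table.search(7))
```

Raising the depth one step at a time keeps the invariant that every bucket of depth bd is referenced by exactly the directory elements sharing its first bd bits, at the cost of possibly splitting more than once for a single insertion.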
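(1) Look up h(x)=18=10010. Take the last two bits, 10. Since 10 lies between Next=1 and N=4, the corresponding bucket has not yet been split, so 10 is used directly as the bucket number and that bucket is searched.
(2) Look up h(x)=32=100000. Take the last two bits, 00. Since 00 does not lie between Next=1 and N=4, the bucket has already been split, so one more bit is taken. The bucket number is therefore 000, and that bucket is searched.
(3) Look up h(x)=44=101100. Take the last two bits, 00. Since 00 does not lie between Next=1 and N=4, the bucket has already been split, so one more bit is taken. The bucket number is therefore 100, and that bucket is searched.
Deletion in linear hashing is the inverse of insertion. If an overflow block becomes empty, it can be freed. If a deletion leaves a bucket empty, Next moves back to the previous bucket. When Next has decreased to 0 and the last bucket is also empty, Next is set to position N/2 - 1 and Level is decreased by 1.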
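The three lookups above follow one rule, which can be sketched as a small function. The name `bucket_number` and its parameters are hypothetical. `level_bits` is the number of low-order bits used in the current round (2 here, so N = 4), and `next_ptr` is the Next pointer.

```python
def bucket_number(hash_value: int, level_bits: int, next_ptr: int) -> int:
    """Linear hashing address for hash_value, given the round's bit count and Next."""
    n = 1 << level_bits              # N: number of buckets at the start of the round
    b = hash_value % n               # the last level_bits bits
    if b < next_ptr:                 # this bucket was already split in this round
        b = hash_value % (2 * n)     # take one more bit
    return b


for hv in (18, 32, 44):
    print(hv, format(bucket_number(hv, 2, 1), "b"))
```

With Next=1 and N=4 this yields buckets 10, 0 and 100 in binary, that is buckets 2, 0 and 4, matching cases (1) to (3).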
Linear hash
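I. Before introducing linear hash, the concepts of dynamic hashing and static hashing need some explanation:
Static hash: the number of buckets is fixed when the hashtable is initialized. To insert an element, the hash function is used to find the corresponding bucket number and the element is inserted there. Whatever collision resolution method is used, lookups in such a table become less and less efficient as more elements are inserted.
Dynamic hash: the number of buckets in the hashtable is not fixed but grows and shrinks with the number of inserted elements. When the element count grows, buckets are added dynamically, which solves the poor lookup efficiency of static hashing. When the element count drops, buckets are removed dynamically, which reduces wasted space. Linear hash is one kind of dynamic hash.
II. Implementing linear hash.
A hash container has three main operations: insert, find, and erase. The three operations of linear_hash are explained below: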
1. Find.
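2. Insert.
For an insert, the client program supplies a key and its satellite data. The operation increases numElements, the number of elements in the Linearhash. When the bucket load (numElements/numBuckets) exceeds a certain threshold, the number of buckets must grow dynamically: one bucket is added, and then some of the elements of one particular bucket are moved into the new bucket.
3. Erase.
For an erase, the client program supplies a key and asks the container to delete the element whose key matches. The operation also decreases numElements. When the bucket load (numElements/numBuckets) falls below a certain threshold, the number of buckets must shrink dynamically: one bucket is removed, and all of its elements are returned to the oldBucket they originally came from.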
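The insert and erase behaviour described above can be sketched as follows. This is a minimal illustration and not the linear_hash implementation that was benchmarked: the class layout, the load thresholds, and the use of Python's built-in `hash` are assumptions, and overflow blocks are simplified to unbounded Python lists.

```python
class LinearHash:
    def __init__(self, initial_buckets=4, max_load=2.0, min_load=0.5):
        self.n0 = initial_buckets    # bucket count at level 0
        self.level = 0
        self.next = 0                # next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.num_elements = 0
        self.max_load = max_load     # assumed growth threshold
        self.min_load = min_load     # assumed shrink threshold

    def _address(self, key):
        n = self.n0 << self.level
        b = hash(key) % n
        if b < self.next:            # bucket already split in this round
            b = hash(key) % (2 * n)
        return b

    def find(self, key):
        for k, v in self.buckets[self._address(key)]:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)    # overwrite existing key
                return
        bucket.append((key, value))
        self.num_elements += 1
        if self.num_elements / len(self.buckets) > self.max_load:
            self._grow()

    def _grow(self):
        # Add one bucket and redistribute the elements of bucket `next`.
        n = self.n0 << self.level
        old, self.buckets[self.next] = self.buckets[self.next], []
        self.buckets.append([])
        for k, v in old:
            self.buckets[hash(k) % (2 * n)].append((k, v))
        self.next += 1
        if self.next == n:           # round finished: start the next level
            self.level += 1
            self.next = 0

    def erase(self, key):
        bucket = self.buckets[self._address(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.num_elements -= 1
                if (len(self.buckets) > self.n0 and
                        self.num_elements / len(self.buckets) < self.min_load):
                    self._shrink()
                return True
        return False

    def _shrink(self):
        # Undo the most recent split: fold the last bucket back into its origin.
        if self.next == 0:
            self.level -= 1
            self.next = self.n0 << self.level
        self.next -= 1
        self.buckets[self.next].extend(self.buckets.pop())


lh = LinearHash(initial_buckets=2)
for i in range(50):
    lh.insert(i, i * i)
print(lh.find(7), len(lh.buckets))
```

Splitting the bucket at `next` rather than the bucket that overflowed is what makes the growth linear: buckets are split in a fixed order, so only `level` and `next` are needed to compute any address.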
Linearhash insert time consume: 160
Linearhash find time consume: 50
Linearhash erase time consume: 70
map insert time consume: 80
map find time consume: 40
map erase time consume: 110
hash_map insert time consume: 150
hash_map find time consume: 40
hash_map erase time consume: 2130