#Study Notes# Concept of Hashing

http://www.cs.cmu.edu/~adamchik/15-121/lectures/Hashing/hashing.html

Introduction

How can we make searching faster? To search an unsorted array we have to scan the whole array, which is O(n). If the array is sorted, we can use binary search, which is O(log n).
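The two baseline strategies can be sketched in Java as follows. The class name SearchDemo is just a placeholder for this illustration. The JDK already provides Arrays.binarySearch, but the loop is written out here so the halving step is visible.

```java
import java.util.Arrays;

// Contrasts the two baseline search strategies.
class SearchDemo {
    // O(n): may have to inspect every element of an unsorted array.
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) {
                return i;
            }
        }
        return -1; // not found
    }

    // O(log n): halves the search range on each step; the array must be sorted.
    static int binarySearch(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow of (lo + hi)
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        int[] unsorted = {42, 7, 19, 3, 25};
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted); // {3, 7, 19, 25, 42}
        System.out.println(linearSearch(unsorted, 19)); // 2
        System.out.println(binarySearch(sorted, 19));   // 2
    }
}
```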

If we knew in advance at which position (index) in the array the value we are looking for is stored, we could find it in O(1) time. A hash function is a way of achieving exactly that.

  • A hash function is a function which when given a key, generates an address in the table.
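The simplest case is when the key already is the address. The sketch below assumes integer keys in the range [0, capacity), so the "hash function" is just the identity. The class name DirectAddressTable is hypothetical.

```java
// Direct addressing: the key itself is the array index, so lookup is O(1).
// Assumption for this sketch: keys are ints in the range [0, capacity).
class DirectAddressTable {
    private final String[] slots;

    DirectAddressTable(int capacity) {
        slots = new String[capacity];
    }

    // The "hash function" here is the identity: h(key) = key.
    private int address(int key) {
        if (key < 0 || key >= slots.length) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        return key;
    }

    void put(int key, String value) {
        slots[address(key)] = value;
    }

    String get(int key) {
        return slots[address(key)]; // one array access, no scanning
    }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable(100);
        table.put(42, "answer");
        System.out.println(table.get(42)); // answer
        System.out.println(table.get(7));  // null (empty slot)
    }
}
```

This only works when the key range is small. For keys like arbitrary strings, a hash function is needed to squeeze them into a table of manageable size.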



A hash function has the following properties:

  • It always returns a number for an object.
  • Two equal objects will always have the same number.
  • Two unequal objects do not always have different numbers.
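The three properties can be checked with a small class. Point and its deliberately weak hashCode (x + y) are hypothetical and chosen only so that the third property becomes visible. A real class would usually mix the fields better, for example with java.util.Objects.hash.

```java
// Hypothetical Point class used only to illustrate the three properties.
final class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Deliberately simple hash: equal points always get equal numbers,
    // but unequal points such as (1, 2) and (2, 1) can share a number.
    @Override
    public int hashCode() {
        return x + y;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        Point c = new Point(2, 1);
        System.out.println(a.equals(b) + " " + (a.hashCode() == b.hashCode())); // true true
        System.out.println(a.equals(c) + " " + (a.hashCode() == c.hashCode())); // false true
    }
}
```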

Steps (procedure) for using a hash function:

Create an array of size M. Choose a hash function h, that is, a mapping from objects into integers 0, 1, 2, ... , M - 1. Put these objects into the array at indexes computed via the hash function index = h(object). Such an array is called a hash table.
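The procedure maps almost line for line onto code. In this sketch the class name SimpleHashTable is hypothetical, and h is assumed to be hashCode() reduced into [0, M) with Math.floorMod, which keeps the index non-negative even when hashCode() is negative. Collisions are deliberately ignored here.

```java
// Minimal sketch of the procedure: an array of size M plus a hash
// function h mapping objects to indexes 0..M-1. Collisions are NOT handled;
// a second object hashing to an occupied slot simply overwrites it.
class SimpleHashTable {
    private final Object[] table;

    SimpleHashTable(int m) {
        table = new Object[m];
    }

    // h(object): reduce hashCode() into the range [0, M).
    // Math.floorMod keeps the result non-negative even if hashCode() < 0.
    int h(Object o) {
        return Math.floorMod(o.hashCode(), table.length);
    }

    void add(Object o) {
        table[h(o)] = o;
    }

    boolean contains(Object o) {
        Object slot = table[h(o)];
        return slot != null && slot.equals(o);
    }

    public static void main(String[] args) {
        SimpleHashTable t = new SimpleHashTable(11);
        t.add("ABC");
        System.out.println(t.h("ABC"));        // 8, since 64578 mod 11 = 8
        System.out.println(t.contains("ABC")); // true
        System.out.println(t.contains("XYZ")); // false
    }
}
```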



How do we choose a hash function? One option is Java's hashCode() method, which is implemented in the Object class and is therefore inherited by every class in Java. It provides a numeric representation of an object (whereas the toString() method provides a text representation of the object).
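For a plain Object the two representations are directly linked: the Object.toString() Javadoc specifies it as the class name, "@", and the hash code in hexadecimal. A minimal check follows. The class name ObjectReprDemo is a placeholder, and the printed numbers vary from run to run.

```java
// Contrasts the numeric representation (hashCode) with the text
// representation (toString) that every class inherits from Object.
class ObjectReprDemo {
    public static void main(String[] args) {
        Object o = new Object();
        int number = o.hashCode();  // numeric representation
        String text = o.toString(); // text representation
        System.out.println(number);
        System.out.println(text);   // class name + "@" + hex digits; varies per run
        // Object.toString() is specified as getClass().getName() + "@" + Integer.toHexString(hashCode()).
        System.out.println(text.equals(o.getClass().getName() + "@" + Integer.toHexString(number))); // true
    }
}
```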

Example:


The hashCode() method is implemented differently in different classes. For String it is:

$$s[0] \cdot 31^{n-1} + s[1] \cdot 31^{n-2} + \cdots + s[n-1]$$

where s is a string, n is its length, and s[i] is s.charAt(i). Example:

$$\text{"ABC"} \mapsto \text{'A'} \cdot 31^2 + \text{'B'} \cdot 31 + \text{'C'} = 65 \cdot 31^2 + 66 \cdot 31 + 67 = 64578$$
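The formula can be evaluated without computing any powers by using Horner's rule, h = 31 * h + s.charAt(i). This sketch (polyHash and StringHashDemo are made-up names) compares it against the built-in String.hashCode().

```java
// Recomputes the String hash formula with Horner's rule:
// h = 31 * h + s.charAt(i), which expands to
// s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
class StringHashDemo {
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // int arithmetic wraps on overflow, just like String.hashCode()
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ABC"));  // 64578
        System.out.println("ABC".hashCode()); // 64578
    }
}
```

Tracing "ABC": h = 65, then 65 * 31 + 66 = 2081, then 2081 * 31 + 67 = 64578.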

Note that Java's hashCode() method might return a negative integer. hashCode() returns a Java int, which is always 32 bits (independent of the CPU). If a string is long enough, the exact value of the formula above exceeds the largest int, Integer.MAX_VALUE = 2^31 - 1. The arithmetic then overflows and wraps around, so the value returned by hashCode() can be negative.

Review the code in HashCodeDemo.java.
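HashCodeDemo.java itself is not reproduced in these notes. The following is a separate minimal sketch (OverflowDemo and exactHash are hypothetical names) that computes the exact value of the formula with BigInteger. BigInteger.intValue() keeps only the low-order 32 bits, which is exactly what int overflow does, so it should always agree with String.hashCode().

```java
import java.math.BigInteger;

// Shows why hashCode() can be negative: the exact polynomial value is
// computed with BigInteger, and String.hashCode() equals its low 32 bits
// interpreted as a signed int (i.e. the result after int overflow).
class OverflowDemo {
    static BigInteger exactHash(String s) {
        BigInteger h = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(31);
        for (int i = 0; i < s.length(); i++) {
            h = h.multiply(base).add(BigInteger.valueOf(s.charAt(i)));
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ABC", "hashing", "separate chaining"}) {
            BigInteger exact = exactHash(s);
            // intValue() keeps only the low-order 32 bits, just like int overflow.
            System.out.println(s + ": exact=" + exact
                    + ", hashCode=" + s.hashCode()
                    + ", low32=" + exact.intValue()
                    + ", negative=" + (s.hashCode() < 0));
        }
    }
}
```

This is also why the index should be computed with something like Math.floorMod(hash, M) rather than hash % M, which stays negative for negative hashes.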


Collisions

When we put objects into a hashtable, it is possible that different objects (according to the equals() method) have the same hashcode. This is called a collision. Here is an example: the two different strings "Aa" and "BB" have the same hashcode:
"Aa" = 'A' * 31 + 'a' = 2112
"BB" = 'B' * 31 + 'B' = 2112
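A quick check of this collision in Java (CollisionDemo is a placeholder name). Because "Aa" and "BB" have equal length and equal hashes, any concatenation of them built in the same pattern also collides, e.g. "AaBB" and "BBAa".

```java
// Checks the collision: "Aa" and "BB" are unequal strings
// whose hashCode() values are both 2112.
class CollisionDemo {
    public static void main(String[] args) {
        String a = "Aa", b = "BB";
        System.out.println(a.equals(b));                       // false
        System.out.println(a.hashCode() + " " + b.hashCode()); // 2112 2112
        // Equal hashes land in the same slot for every table size M.
        int m = 16;
        System.out.println(Math.floorMod(a.hashCode(), m) == Math.floorMod(b.hashCode(), m)); // true
        // Same-length colliding blocks can be combined into more collisions.
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // true
    }
}
```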
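How do we resolve collisions?
One approach is to put the colliding keys into a linked list, which is called separate chaining collision resolution.
The advantage of a hashtable is that the basic operations add, remove, contains, and size run in constant time. However, because of collisions we cannot guarantee constant runtime in the worst case. Imagine that all objects collide into the same index: searching the hashtable then amounts to searching a linked list, which takes linear time. What we can guarantee is expected constant runtime, assuming the hash function spreads the keys evenly and the table stays large enough relative to the number of stored objects.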
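Separate chaining can be sketched with one java.util.LinkedList per slot. This is a minimal illustration, not java.util.HashMap's implementation: the class name ChainedHashSet and the helper chainLength are hypothetical, and the table never resizes.

```java
import java.util.LinkedList;

// Minimal separate-chaining hash set: each array slot holds a linked list
// of all keys whose hash maps to that slot. Fixed number of buckets
// (no resizing) to keep the sketch short.
class ChainedHashSet<T> {
    private final LinkedList<T>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int m) {
        buckets = (LinkedList<T>[]) new LinkedList[m];
        for (int i = 0; i < m; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private LinkedList<T> bucketFor(Object key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    boolean add(T key) {
        LinkedList<T> chain = bucketFor(key);
        if (chain.contains(key)) { // equals() tells colliding keys apart
            return false;
        }
        chain.add(key);
        size++;
        return true;
    }

    boolean remove(Object key) {
        boolean removed = bucketFor(key).remove(key); // LinkedList.remove(Object)
        if (removed) size--;
        return removed;
    }

    boolean contains(Object key) {
        return bucketFor(key).contains(key);
    }

    int size() {
        return size;
    }

    // Hypothetical helper: length of the chain the key hashes into.
    int chainLength(Object key) {
        return bucketFor(key).size();
    }

    public static void main(String[] args) {
        ChainedHashSet<String> set = new ChainedHashSet<>(8);
        set.add("Aa");
        set.add("BB"); // collides with "Aa" but is kept in the same chain
        System.out.println(set.contains("Aa") + " " + set.contains("BB")); // true true
        System.out.println(set.chainLength("Aa")); // 2
        System.out.println(set.size());            // 2
    }
}
```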
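To make the worst case concrete, the sketch below counts the longest chain when 1000 keys go into 1024 buckets, once with String.hashCode() and once with a degenerate hash that sends every key to the same bucket. WorstCaseDemo, longestChain, and the "key" + i test keys are all hypothetical choices for this illustration.

```java
import java.util.function.ToIntFunction;

// Compares the longest chain produced by two hash functions when n keys
// are placed into m buckets. Searching a bucket costs time proportional
// to its chain length.
class WorstCaseDemo {
    static int longestChain(String[] keys, int m, ToIntFunction<String> hash) {
        int[] chainLengths = new int[m];
        int longest = 0;
        for (String k : keys) {
            int i = Math.floorMod(hash.applyAsInt(k), m);
            chainLengths[i]++;
            longest = Math.max(longest, chainLengths[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int n = 1000, m = 1024;
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = "key" + i;
        }
        // String.hashCode() spreads these keys over many buckets: short chains.
        System.out.println(longestChain(keys, m, String::hashCode));
        // A degenerate hash sends every key to one bucket: a single chain of
        // length n, so a search is a linear scan of a linked list.
        System.out.println(longestChain(keys, m, k -> 42)); // 1000
    }
}
```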


=========TO BE CONTINUED...

