js数据结构之字典和哈希表

1. 字典

1.1 基本概念

字典是一种以[键,值]形式储存数据的数据结构, 其中键是用来查找特定元素的.字典和集合很类似,字典和集合很相似,集合以[值,值]的形式储存数据,字典则是以[键值]的形式来储存数据.字典也叫映射,符号表或关联数组.

就像字典里面每个字(键)对应的有它的解释值,在不然电话簿,名字(键)对应电话值.JavaScript中的Object类就是基于字典的形式结构实现的, 与Set类似,ES6中同样包含一个Map类的实现,它是对原JavaScript对象的加强和补充,Map也是以键值对的形式来进行储存数据,不同于对象,Map的键可以是任何一个类型的的值,包括了函数,对象,或其他任意基本类型.在某些场景下,使用Map结构会比Object更加合理和方便.

1.2 创建基本的字典结构

一个字典大概会有以下的功能需求

1. 设置数据 (set);
2. 获取数据 (get);
3. 删除数据 (delete);
4. 是否存在数据(has);
5. 返回长度 (size);

接下来我们以ES6的Map类为基础,实现自己的Map,你会发现他们之间会非常类似.

class MyMap{
    constructor(){
        this.table = []
    }
    set(key,val){
        // 如果key为空字符串则 返回false
        if(key==="") return false
        // 遍历以检查键值是否存在, 如果存在则进行覆盖
        for(let item of this.table){
            if(item[0] === key){
                item[1] = value;
                return false
            }
        }
        // 往数组中添加数组键值对
        this.table.push([key,val]);
        // 返回true 表示设置成功
        return true
    }
    get(key) {
        if (!key) return undefined;
        // 遍历查找出对应的键值,并返回键值
        for (let item of this.table) {
            if(item[0] === key){
                return item[1]
            }
        }
        return undefined
    }
    delete(key){
        if (!key) return undefined;
        // 遍历查找出对应的键值,并从数组中删除
        for(let index of Object.keys(this.table)){
            let item = this.table[index];
            if(item[0] === key){
                this.table.splice(index,1);
                return true
            }
        }
    }
    has(key){
        // 返回字典中是否存在对应的key
        return this.table.some(item => item[0] === key)
    }
    size(){
        return this.table.length
    }
}

上面代码是对字典的简单实现,只供学习和了解字典数据结构的使用,在实际开发中我们直接使用原生的Map数据结构就行了.

接下来我们简单测试一下上面的代码

let map = new MyMap();
let box = document.querySelector('.box')
console.log(map.set(box, "box元素"));// true
console.log(map.get(box))//box元素.
console.log(map.has(box));// true
console.log(map.set('name', "张三"))// true
console.log(map.delete(box))// true
console.log(map.has('box'));// true
console.log(map.get("name"));//张三
console.log(map.size())//1

2. 哈希表

2.1基本概念

哈希表也叫散列表也是一种基于键值对存储值的一种数据结构，与之前的区别在于，哈希表能快速的定位值的位置，而不需要逐个遍历匹配。

JavaScript中的对象就具有哈希表的特性, 我们通过数组来模拟一个哈希表.通过一系列的计算通过key值得到对应的序号,这个过程就是哈希表最重要的过程,叫做哈希函数.

哈希表的作用就是尽可能快的在数据结构中找到一个值,在我们上面的字典中,我们如果要在其中获取到一个值,需要迭代整个数据结构来获取值,如果我们使用哈希函数,就可以知道值,因此能够快速检索到该值.函数函数的作用就是给定一个键值,然后返回值在表中的地址.

2.1创建哈希表的基本结构

正常的哈希表都有如下的几个方法

put(key,value): 向哈希表增加一个新的项.
remove(key):根据键值从哈希表中删除值.
get(key): 返回根据键值检索到的特定的值.

 const HashTable = (function () {
     function HashCode(key) {
         if (typeof key === "number") return key
         if (typeof key !== "string") throw TypeError('key error');
         let hash = 0;//保存对应的每个字符ASCII码值之和
         // 遍历每个字符的并将其ASCII码值累加
         for (let i = 0, len = key.length; i < len; i++) {
             hash += key.charCodeAt(i);
         };
         return hash;
     }
     return class {
         constructor() {
             this.table = []
         }
         put(key, value){
             let hashkey = HashCode(key);
             this.table[hashkey] = value;
             console.log(this.table)
         }
         get(key){
             let hashkey = HashCode(key);
             return this.table[hashkey]
         }
          remove(key){
              return Reflect.deleteProperty(this.table,HashCode(key))
          }
     }
 })();

let hashTable = new HashTable();
hashTable.put('person',"张三");
console.log(hashTable.get('person'))//张三

很显然,我们这个哈希函数是很不合理的,第一,哈希值过大,只需要存一条数据却需要建立一个长度好几百的数组,一般合理的利用率是0.6~0.9, 意思就是数据个数与总长度的比例在这个之间.第二,容易出现重复,要是两个字符码加起来刚好应用就会重新哈希重复,从而覆盖前一个数据,

解决第一个问题就是, 我们可以预先设置一个较为合理的长度,然后规定序号不会超过长度.我们会用hash值和任意一个数进行取模运算(余数),这样就可以规避操作数超过数值变量最大表示范围的风险.

return hash % 37;

解决第二个问题我们可以使用分离链接法,它是解决哈希冲突最简单的方法,我们利用链表的数据结构来储存哈希表每一个位置上的元素.但是在hashTable实例之外还需要额外的储存空间.

下面是我们的基本链表结构代码

const LinkedList = (function(){
    let HEAD = Symbol();
    // 创建节点的基本类
    class Node{
        constructor(element){
            this.element = element;
            // 指向下一个节点的指针
            this.next = null;
        }
    }
    return class{
        constructor(){
            // 头节点
            this[HEAD] = null;
            // 链表的个数
            this.count = 0;
        }
        //往链表链尾追加数据
        append(element){
            // 创建对应的链节点
            const node = new Node(element);
            // 获取链头
            let head = this[HEAD];
            // 如果链头是null;
            if(this[HEAD] === null){
                this[HEAD] = node;
            }else{
                // 如果不是链头则循环找到链尾
                while(head.next !== null){
                    head = head.next;
                }
                // 将其next指向新节点, 建立链接
                head.next = node
            }
            this.count++;
            return this
        }
        // 查找指定元素对应的节点
        find(element,index){
            let result = [];
            let head = this[HEAD];
            // 如果头节点为 null 则返回空数组
            if (head === null) return result;
            // index没有传值, 将按元素是是否相等进行查找
            if(index === undefined){
                while(head !== null){
                    if(head.element === element){
                        result.push(head);
                    }
                    head = head.next;
                }
            }
            // 如果index有中则按照下标进行查找
            if(typeof index === "number"){
                let i = 0;
                while(head !== null){
                    if(index === i){
                        result.push(head)
                    }
                    head = head.next;
                    i++;
                }
            }
            return result
        }
        //在链表指定位置进行插入元素
        insert(element,index){
            // 检测所以值, 防止越界
            if(index >=0 && index <= this.count){
                const node = new Node(element);
                if(index === 0) {// 在第一个位置添加
                    const current = this[HEAD];
                    node.next = current;
                    this[HEAD] = node;
                }else{
                    // 将上一个节点的next指向插入的节点 
                    // 并且当插入的节点的next指向上一个节点的next
                    const pre = this.find("",index-1)[0];
                    const current = pre.next;
                    node.next = current;
                    pre.next = node;
                    this.count++
                    return true;
                }
            }
            // 表示插入失败
            return false;
        }
        // 通过指定数据或者下标删除指定节点
        remove(element,index){
            //获取当前删除的节点
            let head = this[HEAD];
            // 如果index没有传值则, 则通过查找元素是否相等来进行删除
            if(index === undefined){
                // 如果为头节点则将头节点指向下一个节点
                if(head.element === element){
                    this[HEAD] = head.next;
                     this.count--
                      return
                }
                // 遍历链表直到指针不为空, 如果当前节点的元素的下一个节点元素与目标元素相等
                // 则将其指针指向指针得下一个的下一个,来进行清空节点
                while(head.next !== null){
                    if(head.next.element = element){
                        head.next = head.next.next;
                        this.count--
                    }
                    return;
                    head = head.next
                }
            }
            else if (typeof index === "number") {
                //如果为头节点
                if(index === 0){
                    this[HEAD] = head.next;
                    return
                }
                let i = 0;
                while(head.next !== null){
                    if(i === index){
                        head.next = head.next.next;
                        this.count--
                        return
                    }
                    i++;
                    head = head.next;
                }
            }
        }
        // 返回链表个数
        size(){
            return this.count
        }
        // 打印链表
        print(){
            let current = this[HEAD];
            while(current !== null){
                console.log(current.element);
                current = current.next
            }
        }
        //返回元素在链表中的索引, 如果没有则返回-1
        indexOf(element){
            let head = this[HEAD],
                index = 0;
            while(head.next !== null){
                if(head.element ===  element){
                    return index
                }
                index++;
                head = head.next
            }
            return -1
        }
          //返回链表头
        getHead(){
            return this[HEAD]
        }
    }
})();

对于分离链接来说,需要重写对应的三个方法:put,remove,get.这三个方法在不同的技术实现都是不同的,接下来我们分别重写对应的三个方法.

put方法

首先我们声明一个保存键值对的类,用来生成一个保存键值对的实例对象.

 class ValuePari{
     constructor(key,val){
         this.key = key;
         this.value = val;
     }
 }

  put(key, value){
      // 判断kay和value是否存在
      if(key !==null && value !== null){
          let index = HashCode(key)
          // 如果哈希不存在对应的哈希表则创建一个链表储存在哈希表
          if(!this.table[index]){
              this.table[index] = new LinkedList()
          }
          // 当哈希冲突会自动添加链表的next节点中
          this.table[index].append(new ValuePari(key, value));
          return true
      }
      return false
  }

在这个方法中我们将验证要加的新元素的位置是否被占据,如果是第一个向该位置添加元素,我们则会在该位置初始化一个LinkList实例,然后我们直接向LinkList实例中利用append方法向链表添加对应的保存键值对的ValuePari实例.当哈希发生冲突时,我们直接向链表中添加新节点,它会自动保存在上一个节点的next节点上.

get方法

  get(key){
      const index = HashCode(key),
            linkedList = this.table[index];
      // 检测是否存在对应的链表
      if(!linkedList) return undefined
      // 获取链头并进行遍历, 直到head不为null,
      // 当查找当元素的key与传入的key相等时则返回.
      let head = linkedList.getHead();
      while(head){
          if(head.element.key === key){
              return head.element.value
          }
          head = head.next
      }
  }

在get方法中我们先检测对应位置的元素是否存在,如果没有我们返回undefined,如果该位置有值存在,我们知道这是一个链表结构,所以我们需要迭代链表直到找到我们需要的键并直接返回对应的value,如果不相同我们则会进行迭代链表,直到haad为时null停止.

remove方法

 remove(key){
     const index = HashCode(key),
           linkedList = this.table[index];
     if(!linkedList) return false;
     let head = linkedList.getHead();
     while(head){
         if(head.element.key === key){
             linkedList.remove(head.element);
             // 如果对应的链表长度为0, 则删除这个链表
             if(!linkedList.count){
                 Reflect.deleteProperty(this.table,index)
             }
             return true;
         }
         head = head.next
     }
     return false
 }

上面的remove方法和get方法一样的步骤,找到需要删除的元素,迭代ListedList实例时, 如果我们找到了对应的元素,则利用链表的remove方法删除指定的元素,在进行进一步的判断,如果链表为空了,就使用ES6的反射Reflect对象的deletePropertyapi从哈希表指定位置删除这个空链表.释放内存.

另一种解决冲突的方法就是线性探查.之所以称线性,是因为它处理冲突的方法是将元素直接储存到表中,而不是在单独的数据结构中.

当我们向表中每个位置添加元素时, 如果索引为position的位置已经被占据了,就会尝试position+1的位置,如果position+1的位置也被占据了,就尝试position+2的位置,以此类推, 直到在哈希表中找到一个空闲的位置.

const HashTable = (function(){
    let symbol = Symbol();
    function HashCode(key) {
        if (typeof key === "number") return key
        if (typeof key !== "string") throw TypeError('key error');
        let hash = 0;//保存对应的每个字符ASCII码值之和
        // 遍历每个字符的并将其ASCII码值累加
        for (let i = 0, len = key.length; i < len; i++) {
            hash += key.charCodeAt(i);
        };
        return hash % 37;
    }
    class Valuepair{
        constructor(key,value){
            this.key = key;
            this.value = value;
        }
    }
    return class{
        constructor(){
            this[symbol]= [];
        }
        put(key,value){
            if(key !== null && value !== null){
                const position = HashCode(key);
                if(!this[symbol][position]){
                    this[symbol][position] = new Valuepair(key,value);
                }else{
                    let index = position + 1;
                    while(this[symbol][index]){
                        index++;
                        console.log(index)
                    }
                    this[symbol][index] = new Valuepair(key,value);
                    return true
                }
            } 
            return false
        }
        get(key){
            const position = HashCode(key);
            // 判断对应位置是否占据了元素
            if (this[symbol][position] !== null) {
                // 如果已经占据了元素且key键相等则直接返回
                if (this[symbol][position].key === key) {
                    return this[symbol][position].value
                } else {
                    let index = position + 1;
                    // 迭代哈希表,直到查找出key值与目标key值相等的位置.
                    while (this[symbol][index] !== null && this[symbol][index].key !== key) {
                        index++;
                    }
                    // 返回对应位置的元素
                    return this[symbol][index].value
                }
            }
            return undefined
        }
        remove(key){
            const position = HashCode(key);
            if(this[symbol][position]) {
                if(this[symbol][position].key === key){
                    this[symbol].splice(position, 1)
                }
                let index = position + 1;
                // 迭代哈希表, 直到找到对应要删除的元素所在的下标为止
                while(this[symbol][index] && this[symbol][index].key !== key){
                    index++
                }
                // 从哈希表数组中指定下标删除元素
                this[symbol].splice(index,1)
                // 返回true 表示删除成功
                return true
            }
            return false
        }
    }
})()

2.3 创建更好的哈希函数

我们实现的hash函数并不是一个好的哈希函数,因为它会产生太多的冲突.一个表现良好的哈希函数是由几个方面组成的:插入和检索元素的时间,以及较低的冲突可能性.我们可以在网上找到很多不同的实现方法,下面我们来实现自己的哈希函数.

下面是一种可以实现, 比lose lose更好的哈希函数djb2

function djb2HashCode(key){
   let tableKey = tableKey(key);
   let hash = 5381;
    for(let i = 0,len = tableKey.length;i < len; i++){
        hash = (hash % 33) + tableKey+ charCodeAt(i);
    }
    return hash % 1013
}
//将key键名的数据类型转为字符串
  function toStrFn(key) {
        if (key === null) {
            return 'Null'
        } else if (key === undefined) {
            return 'UNDEFIEND'
        } else if (typeof key === 'string' || key instanceof String) {
            return `${key}`
        }else if(typeof key === "object"){
            return JSON.stringify(key)
        }
        return key.toSting()
    }

在键转为字符之后,djb2HashCode方法包括初始化一个hash变量并赋值给一个质数, 大多数实现都是使用5381,然后迭参数key, 将hash与33 相乘(用作一个幻数),并和当前迭代的字符串的ASCII码值相加,最终我们将使用相加的和与另一个随机质数相除的余数, 比我们任务的哈希表的大小要大.在本例中, 我们认为我们的哈希表的大小为1000.

最后我们简单测试一下

console.log(djb2HashCode('jack'));//848
console.log(djb2HashCode('jakc'));//91

我们可以明显的看到两个类似的字符串对应的经过djb2HashCode哈希函数返回的键值不同, 他们之间没有冲突, 当然这不是最好的哈希函数, 但这是社区最推崇的哈希函数之一.

2.4 总结

哈希表的作用就是在数据结构中快速找到一个对应的值,哈希表是一种字典的实现.所以我们可以用作关联数组,它也可以用来对数据库进行索引.哈希表的实现关键就是哈希函数,哈希函数的作用也非常简单,就是给定一个键值,返回对应值在表中的对应位置.JavaScript中内部就是使用哈希表来表示每个对象.此外,对象的每个属性和方法(成员) 被储存为key对象类型,每个key指向对应的对象成员.