算法与数据结构系列之[字典树-Trie]

1.什么是字典树-Trie

百科解释：(之所以引用百度百科的解释，是因为百科的解释概括性已经很好，也很全面，只需要少许补充即可)

字典树又称单词查找树，Trie树，是一种树形结构，是一种哈希树的变种。典型应用是用于统计，排序和保存大量的字符串（但不仅限于字符串），所以经常被搜索引擎系统用于文本词频统计。它的优点是：利用字符串的公共前缀来减少查询时间，最大限度地减少无谓的字符串比较，查询效率比哈希树高。

Trie树是一种专门处理字符串匹配的数据结构，用来解决在一组字符串集合中快速查找某个字符串的问题。下面我们用一个例子说明。我们有5个字符串，分别是：cat, dog, deer, pan, panda.我们希望在这里面多次查找某个字符是否存在，如果我们将字符串用数组存储，每次查找都是拿要查找的字符串跟这五个字符依次进行比较，需要遍历这个字符数组，如果要查找的是数组中最后一个元素，那么只有比到最后一个单词才能找到结果，需要比较的次数是前面每个单词比较过的次数（先用我们要查询单词的首字母和前面单词比较，如果首字母不同，直接跳出本次循环比下一个单词，上个单词比较的次数为1次，如果首字母相同，则接着比较下面的字母，如果有一个字母不相同，就进入下一个单词的比较）+最后一个单词的字母数，因而时间复杂度为O(n),显然效率是比较低的。倘若我们用trie树，先将这五个字符串组织成trie树，构建trie树的过程，需要扫描所有字符串，时间复杂度为O(n),之后每次查找都是在trie树中进行匹配查找,每次查询时，如果要查询的字符串长度为k，我们只需要对比大约k个节点，就能查找到结果，和字符串的个数没有关系，此时查询的时间复杂度为O(k),显然能够提升效率。

2.Trie树的性质

根节点不包含字符，除根节点外每一个节点都只包含一个字符。
从根节点到某一节点，路径上经过的字符连接起来，为该节点对应的字符串。
每个节点的所有子节点包含的字符都不相同。

Trie树的本质就是利用字符串之间的公共前缀，将重复的前缀合并在一起，最后构造成如下图的多叉树。

图示：（维护一个单词结束标志isEnd，每个单词结束后，将该标志设置为true）

图一

3.Trie树的构建过程

图二

4.Trie树的查询过程

图三

5.Trie代码实现：

public class Trie {
   private class Node{

       public boolean isEnd; //单词结束标志
       public TreeMap next;

       public Node(boolean isEnd){
           this.isEnd = isEnd;
           next = new TreeMap<>();
       }

       public Node(){
           this(false);
       }
   }

   private Node root;
   private int size;

   public Trie(){
       root = new Node();
       this.size = 0;
   }

   //获得trie中的单词数量
   public int getSize(){
       return size;
   }

   //向trie中添加一个新的单词word
   public void add(String word){
       Node cur = root;
       for (int i = 0; i < word.length(); i++) {
           char c = word.charAt(i);
           if(cur.next.get(c) == null)
               cur.next.put(c,new Node());
           cur = cur.next.get(c);
       }
       if(!cur.isEnd){
           cur.isEnd = true;
           size++;
       }
   }

   //查询单词word是否在trie中
   public boolean contains(String word){
       Node cur = root;
       for (int i = 0; i < word.length(); i++) {
           char c = word.charAt(i);
           if(cur.next.get(c) == null)
               return false;
           cur = cur.next.get(c);
       }
       return cur.isEnd;     //这里不直接返回true,是因为即使上面的循环结束后，这个单词的每一个字母在trie中都有，
                               // 依然不能确定trie中有我们要查的这个单词
                               // 比如trie中存有一个单词panda,而我们要查询的单词为pan时，
                               // 虽然前三个字母都找到了，但是trie中并不存在pan这个单词，要看单词结束标志是否为true。
   }

   //查询是否在trie中是否有单词以prefix为前缀(前缀查询)
   public boolean isPrefix(String prefix){
       Node cur = root;
       for (int i = 0; i < prefix.length(); i++) {
           char p = prefix.charAt(i);
           if(cur.next.get(p) == null)
               return false;
           cur = cur.next.get(p);
       }
       return true;
   }

   public static void main(String[] args) {
       Trie trie = new Trie();
       String[] str ={"cat","dog","deer","pan","panda"};
       for (String s : str) {
           trie.add(s);
       }
       System.out.println(trie.contains("pan"));
   }
}

6.Tire应用

1.LeetCode第211题：添加与搜索单词-数据结构设计

设计一个支持以下两种操作的数据结构：
void addWord(word)
bool search(word)
search(word) 可以搜索文字或正则表达式字符串，字符串只包含字母 . 或 a-z 。 . 可以表示任何一个字母。
示例:
addWord("bad")
addWord("dad")
addWord("mad")
search("pad") -> false
search("bad") -> true
search(".ad") -> true
search("b..") -> true
说明:
你可以假设所有单词都是由小写字母 a-z 组成的。
来源：力扣（LeetCode）
链接：https://leetcode-cn.com/problems/add-and-search-word-data-structure-design

public class WordDictionary{
    /**
     * trie的节点
     */
    private class Node{

        public boolean isEnd; //单词结束标志
        public TreeMap next;

        public Node(boolean isEnd){
            this.isEnd = isEnd;
            next = new TreeMap<>();
        }

        public Node(){
            this(false);
        }
    }

    private Node root;  //根节点

    public WordDictionary(){
        root = new Node();
    }

    //向trie中添加单词
    public void addWord(String word) {
        Node  cur = root;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if(cur.next.get(c) == null)
                cur.next.put(c,new Node());
            cur = cur.next.get(c);
        }
        if(!cur.isEnd)
            cur.isEnd = true;
    }

    //在trie中查询是否存在单词word
    public boolean search(String word) {
       return match(root,word,0);
    }

    //通过递归查找
    private boolean match(Node node,String word,int index){
        if(index == word.length())   //递归到底的情况
            return node.isEnd;
        char c = word.charAt(index);
        if(c != '.'){
            if(node.next.get(c) == null)
                return false;
            return match(node.next.get(c),word,index+1);
        }
        else {
            for (Character character : node.next.keySet()) {
                if(match(node.next.get(character),word,index+1))
                    return true;
            }
            return false;
        }
    }

}

2.LeetCode第677题：键值映射

实现一个 MapSum 类里的两个方法，insert 和 sum。

对于方法 insert，你将得到一对（字符串，整数）的键值对。字符串表示键，整数表示值。如果键已经存在，那么原来的键值对将被替代成新的键值对。

对于方法 sum，你将得到一个表示前缀的字符串，你需要返回所有以该前缀开头的键的值的总和。

示例 1:
输入: insert("apple", 3), 输出: Null
输入: sum("ap"), 输出: 3
输入: insert("app", 2), 输出: Null
输入: sum("ap"), 输出: 5
来源：力扣（LeetCode）
链接：https://leetcode-cn.com/problems/map-sum-pairs

public class MapSum{
    /**
     * trie的节点
     */
    private class Node{

        public int value;
        public TreeMap next;


        public Node(int  value){
            this.value = value;
            next = new TreeMap<>();
        }

        public Node(){
            this(0);
        }
    }


    private Node root; //根节点

    publicMapSum(){
        root = new Node();
    }

    //添加元素
    public void insert(String key, int val) {
        Node cur = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if(cur.next.get(c) == null)
                cur.next.put(c,new Node());
            cur = cur.next.get(c);
        }
       cur.value = val;
    }

    //查询
    public int sum(String prefix) {
      Node cur = root;
        for (int i = 0; i < prefix.length(); i++) {
            char c = prefix.charAt(i);
            if(cur.next.get(c) == null)
                return 0;
            cur = cur.next.get(c);
        }
        return sum(cur);
    }

    //递归查询
    private int sum(Node node){
        int res = node.value;
        for (Character character : node.next.keySet()) {
            res += sum(node.next.get(character));
        }
        return res;
    }
}

7.总结

这篇我们介绍了trie树，trie树是一种字符串快速匹配问题的数据结构，但trie是一种以空间换时间的数据结构，如果用来构建trie树的字符串中，前缀重复的情况不是很多，那么用trie这种结构存储还是比较费内存的，因为trie树的每个节点都只存储一个字符，这时可以选择用散列表或红黑树。当然，实际中用到trie树时，为了解决trie耗费内存的问题，也很多用到trie树的变种数据结构，将trie树进行适当优化，牺牲一些查询的性能，来降低trie树的内存占用。这篇我们也只是介绍了trie树的构建，查询和前缀查询方法，其实与trie有关的操作还有很多，比如trie节点的删除以及后缀查询也用的比较多，还有trie树的代码实现也不唯一，大家可以按照自己的思路来自己实现以及更为深入的学习trie树的其他操作方法，然后对trie树进行优化。

最后在提一点trie的实际应用，搜索引擎搜索框输入搜索关键字后的提示功能底层就用到trie树，当然谷歌和百度也一定做了优化。

本人微信公众号，点关注，不迷路

算法与数据结构系列之[字典树-Trie]

你可能感兴趣的:(算法与数据结构系列之[字典树-Trie])