姬霓钛镁

[力扣刷题总结]（字典树篇）

文章目录

字典树
- 字典树的概念
- 字典树的功能
- 字典树的实现及代码实现
208. 实现 Trie (前缀树)
- 解法1：实现Trie
472. 连接词
- 解法1：字典树+DFS
820. 单词的压缩编码
- 解法1：字典树

字典树

字典树的概念

本小节主要参考参考链接

字典树也叫Trie树、前缀树。顾名思义，它是一种针对字符串进行维护的数据结构。

字典树，顾名思义，是关于“字典”的一棵树。即：它是对于字典的一种存储方式（所以是一种数据结构而不是算法）。这个词典中的每个“单词”就是从根节点出发一直到某一个目标节点的路径，路径中每条边的字母连起来就是一个单词。

（标橙色的节点是“目标节点“，即根节点到这个目标节点的路径上的所有字母构成了一个单词。）

从这张图我们可以看出，字典树就是一棵树，只不过，这棵树的每条边上都有一个字母，然后这棵树的一些节点被指定成了标记节点（目标节点）而已。

这就是字典树的概念。结合上面说的概念，上图所示的字典树包括的单词分别为：

a
abc
bac
bbc
ca

字典树的功能

根据字典树的概念，我们可以发现：字典树的本质是把很多字符串拆成单个字符的形式，以树的方式存储起来。所以我们说字典树维护的是”字典“。那么根据这个最基本的性质，我们可以由此延伸出字典树的很多妙用。简单总结起来大体如下：

1、维护字符串集合（即字典）。

2、向字符串集合中插入字符串（即建树）。

3、查询字符串集合中是否有某个字符串（即查询）。

4、统计字符串在集合中出现的个数（即统计）。

5、将字符串集合按字典序排序（即字典序排序）。

6、求集合内两个字符串的LCP（Longest Common Prefix，最长公共前缀）（即求最长公共前缀）。

我们可以发现，以上列举出的功能都是建立在“字符串集合”的基础上的。再一次强调，字典树是“字典”的树，一切功能都是“字典”的功能。这也为我们使用字典树的时候提供了一个准则：看到一大堆字符串同时出现，就往哈希和Trie树那边想一下。

字典树的实现及代码实现

字典树的两种基本操作分别是建树和查询。其中建树操作就是把一个新单词插入到字典树里。查询操作就是查询给定单词是否在字典树上。

Trie 是一颗非典型的多叉树模型，多叉好理解，即每个结点的分支数量可能为多个。为什么说非典型呢？因为它和一般的多叉树不一样，尤其在结点的数据结构设计上，比如一般的多叉树的结点是这样的：

C++

struct TreeNode {
    VALUETYPE value;    //结点值
    TreeNode* children[NUM];    //指向孩子结点
};

而 Trie 的结点是这样的(假设只包含’a’~'z’中的字符)：

C++

struct TrieNode {
    bool isEnd; //该结点是否是一个串的结束
    TrieNode* next[26]; //字母映射表
};

定义类 Trie：
C++

class Trie {
private:
    bool isEnd;
    Trie* next[26];
public:
    //方法将在下文实现...
};

插入：
描述：向 Trie 中插入一个单词 word

实现：这个操作和构建链表很像。首先从根结点的子结点开始与 word 第一个字符进行匹配，一直匹配到前缀链上没有对应的字符，这时开始不断开辟新的结点，直到插入完 word 的最后一个字符，同时还要将最后一个结点isEnd = true;，表示它是一个单词的末尾。

C++

void insert(string word) {
    Trie* node = this;
    for (char c : word) {
        if (node->next[c-'a'] == NULL) {
            node->next[c-'a'] = new Trie();
        }
        node = node->next[c-'a'];
    }
    node->isEnd = true;
}

查找：
描述：查找 Trie 中是否存在单词 word

实现：从根结点的子结点开始，一直向下匹配即可，如果出现结点值为空就返回 false，如果匹配到了最后一个字符，那我们只需判断 node->isEnd即可。

C++

bool search(string word) {
    Trie* node = this;
    for (char c : word) {
        node = node->next[c - 'a'];
        if (node == NULL) {
            return false;
        }
    }
    return node->isEnd;
}

前缀匹配：
描述：判断 Trie 中是或有以 prefix 为前缀的单词

实现：和 search 操作类似，只是不需要判断最后一个字符结点的isEnd，因为既然能匹配到最后一个字符，那后面一定有单词是以它为前缀的。

C++

bool startsWith(string prefix) {
    Trie* node = this;
    for (char c : prefix) {
        node = node->next[c-'a'];
        if (node == NULL) {
            return false;
        }
    }
    return true;
}

到这我们就已经实现了对 Trie 的一些基本操作，这样我们对 Trie 就有了进一步的理解。

208. 实现 Trie (前缀树)

力扣链接

Trie（发音类似 “try”）或者说前缀树是一种树形数据结构，用于高效地存储和检索字符串数据集中的键。这一数据结构有相当多的应用情景，例如自动补完和拼写检查。

请你实现 Trie 类：

Trie() 初始化前缀树对象。
void insert(String word) 向前缀树中插入字符串 word 。
boolean search(String word) 如果字符串 word 在前缀树中，返回 true（即，在检索之前已经插入）；否则，返回 false 。
boolean startsWith(String prefix) 如果之前已经插入的字符串 word 的前缀之一为 prefix ，返回 true ；否则，返回 false 。

示例：

输入
[“Trie”, “insert”, “search”, “search”, “startsWith”, “insert”, “search”]
[[], [“apple”], [“apple”], [“app”], [“app”], [“app”], [“app”]]
输出
[null, null, true, false, true, null, true]

解释
Trie trie = new Trie();
trie.insert(“apple”);
trie.search(“apple”); // 返回 True
trie.search(“app”); // 返回 False
trie.startsWith(“app”); // 返回 True
trie.insert(“app”);
trie.search(“app”); // 返回 True

提示：

1 <= word.length, prefix.length <= 2000
word 和 prefix 仅由小写英文字母组成
insert、search 和 startsWith 调用次数总计不超过 3 * 104 次

解法1：实现Trie

class Trie {
private:
    bool isEnd;
    Trie* next[26];
public:
    Trie() {
        isEnd = false;
        memset(next,0,sizeof(next));
    }
    
    void insert(string word) {
        Trie* node = this;
        for(auto& w:word){
            if(node->next[w-'a'] == nullptr){
                node->next[w-'a'] = new Trie();
            }
            node = node->next[w-'a'];
        }
        node->isEnd = true;
    }
    
    bool search(string word) {
        Trie* node = this;
        for(auto& w:word){
            node = node->next[w-'a'];
            if(node == nullptr){
                return false;
            }
        }
        return node->isEnd;
    }
    
    bool startsWith(string prefix) {
        Trie* node = this;
        for(auto& w:prefix){
            node = node->next[w-'a'];
            if(node == nullptr) return false;
        }
        return true;
    }
};

/**
 * Your Trie object will be instantiated and called as such:
 * Trie* obj = new Trie();
 * obj->insert(word);
 * bool param_2 = obj->search(word);
 * bool param_3 = obj->startsWith(prefix);
 */

472. 连接词

力扣链接
给你一个不含重复单词的字符串数组 words ，请你找出并返回 words 中的所有连接词。

连接词定义为：一个完全由给定数组中的至少两个较短单词组成的字符串。

示例 1：

输入：words = [“cat”,“cats”,“catsdogcats”,“dog”,“dogcatsdog”,“hippopotamuses”,“rat”,“ratcatdogcat”]
输出：[“catsdogcats”,“dogcatsdog”,“ratcatdogcat”]
解释：“catsdogcats” 由 “cats”, “dog” 和 “cats” 组成;
“dogcatsdog” 由 “dog”, “cats” 和 “dog” 组成;
“ratcatdogcat” 由 “rat”, “cat”, “dog” 和 “cat” 组成。

示例 2：

输入：words = [“cat”,“dog”,“catdog”]
输出：[“catdog”]

提示：

1 <= words.length <= 104
0 <= words[i].length <= 1000
words[i] 仅由小写字母组成
0 <= sum(words[i].length) <= 105

解法1：字典树+DFS

思路：

1.对单词按长度排序；长的单词一定由短的单词构成

2.为之前出现的所有单词构建一个前缀树不了解的同学可以参考OI-wiki

3.搜索过程就是对单词的每个位置在前缀树上遍历； 每次遇到前缀树上的end节点，就可以考虑尝试从目标单词的下个位置重新从前缀树根节点匹配，这代表着s=[try:remain],s中找到了一个之前出现过的连接词try；当然也要考虑接着在树上继续匹配。

代码：

struct Trie{
bool isEnd;
vector<Trie*> next;
Trie(){
    this->isEnd = false;
    this->next = vector<Trie*>(26,nullptr);
}
};

class Solution {
public:
    Trie* trie = new Trie();
    vector<string> findAllConcatenatedWordsInADict(vector<string>& words) {
        vector<string> result;
        sort(words.begin(),words.end(),[&](const string& a, const string& b){
            return a.size() < b.size();
        });

        for(int i = 0;i<words.size();i++){
            string word = words[i];
            if(word.size() == 0) continue;
            if(dfs(word,0)) result.emplace_back(word);
            else insert(word);
        }
        return result;
    }

    bool dfs(const string& word, int start){
        //终止条件
        if(word.size() == start) return true;
        Trie* node = trie;
        for(int i = start;i<word.size();i++){
            node = node->next[word[i]-'a'];
            if(node == nullptr) return false;
            if(node->isEnd){
                if(dfs(word,i+1)) return true;
            }
        }
        return false;
    }
    
    void insert(const string& word){
        Trie* node = trie;
        for(auto& w:word){
            if(node->next[w-'a'] == nullptr){
                node->next[w-'a'] = new Trie();
            }
            node = node->next[w-'a'];
        }
        node->isEnd = true;
    }
};

820. 单词的压缩编码

力扣链接
单词数组 words 的有效编码由任意助记字符串 s 和下标数组 indices 组成，且满足：

words.length == indices.length
助记字符串 s 以 ‘#’ 字符结尾
对于每个下标 indices[i] ，s 的一个从 indices[i] 开始、到下一个 ‘#’ 字符结束（但不包括 ‘#’）的子字符串恰好与 words[i] 相等
给你一个单词数组 words ，返回成功对 words 进行编码的最小助记字符串 s 的长度。

示例 1：

输入：words = [“time”, “me”, “bell”]
输出：10
解释：一组有效编码为 s = “time#bell#” 和 indices = [0, 2, 5] 。
words[0] = “time” ，s 开始于 indices[0] = 0 到下一个 ‘#’ 结束的子字符串，如加粗部分所示 “time#bell#”
words[1] = “me” ，s 开始于 indices[1] = 2 到下一个 ‘#’ 结束的子字符串，如加粗部分所示 “time#bell#”
words[2] = “bell” ，s 开始于 indices[2] = 5 到下一个 ‘#’ 结束的子字符串，如加粗部分所示 “time#bell#”

示例 2：

输入：words = [“t”]
输出：2
解释：一组有效编码为 s = “t#” 和 indices = [0] 。

提示：

1 <= words.length <= 2000
1 <= words[i].length <= 7
words[i] 仅由小写字母组成

解法1：字典树

思路：
（1）如果单词 X 是 Y 的后缀，那么单词 X 就不需要考虑了，因为编码 Y 的时候就同时将 X 编码了。例如，如果 words 中同时有 “me” 和 “time”，我们就可以在不改变答案的情况下不考虑 “me”。

如果单词 Y 不在任何别的单词 X 的后缀中出现，那么 Y 一定是编码字符串的一部分。

因此，目标就是保留所有不是其他单词后缀的单词，最后的结果就是这些单词长度加一的总和，因为每个单词编码后后面还需要跟一个 # 符号。

（2）去找到是否不同的单词具有相同的后缀，我们可以将其反序之后插入字典树中。例如，我们有 “time” 和 “me”，可以将 “emit” 和 “em” 插入字典树中。