Suffix Tree

Reposted from: http://www.cnblogs.com/gaochundong/p/suffix_tree.html

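In the article "String Matching Algorithms", we covered the formal definition of the string matching problem:

The text is an array T[1..n] of length n;
The pattern is an array P[1..m] of length m, with m ≤ n;
The elements of T and P are drawn from a finite alphabet Σ;
If 0 ≤ s ≤ n-m and T[s+1..s+m] = P[1..m], that is, T[s+j] = P[j] for 1 ≤ j ≤ m, we say that pattern P occurs in text T with shift s, and we call s a valid shift.

For example, in the figure above the goal is to find all occurrences of the pattern P = abaa in the text T = abcabaabcabac. The pattern occurs only once in this text, at shift s = 3, so s = 3 is the only valid shift.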
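
The definition of a valid shift can be checked directly with a brute-force scan. The sketch below is only an illustration of the definition, not one of the optimized algorithms discussed in this article; the class name NaiveMatchDemo and the method FindValidShifts are made up for this example, and indices are 0-based, so T[s+1..s+m] corresponds to text[s..s+m-1].

```csharp
using System;
using System.Collections.Generic;

public static class NaiveMatchDemo
{
    // Returns every valid shift s (0 <= s <= n - m) at which pattern occurs in text.
    public static List<int> FindValidShifts(string text, string pattern)
    {
        var shifts = new List<int>();
        int n = text.Length, m = pattern.Length;
        for (int s = 0; s + m <= n; s++)
        {
            // Compare P[1..m] with T[s+1..s+m], character by character.
            int j = 0;
            while (j < m && text[s + j] == pattern[j])
                j++;
            if (j == m)
                shifts.Add(s);
        }
        return shifts;
    }

    public static void Main()
    {
        var shifts = FindValidShifts("abcabaabcabac", "abaa");
        Console.WriteLine(string.Join(", ", shifts)); // prints "3"
    }
}
```

This scan does no preprocessing and takes O((n-m+1)m) time in the worst case, which is what the preprocessing-based algorithms try to improve on.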

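Common algorithms for solving the string matching problem include:

The naive string matching algorithm
The Knuth-Morris-Pratt string matching algorithm (KMP)
The Boyer-Moore string matching algorithm

A string matching algorithm usually has two phases: preprocessing and matching. Its total running time is the sum of the preprocessing time and the matching time. The figure below lists the preprocessing and matching times of the common string matching algorithms.

Algorithms such as KMP and Boyer-Moore speed up the search by preprocessing the pattern, and the best complexity for preprocessing a pattern is O(m), where m is the length of the pattern. Is there an algorithm that preprocesses the text instead? This article introduces one such approach: the suffix tree.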

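Properties of a suffix tree:

A text of length n has n suffixes with a total length of n(n+1)/2 characters, yet the suffix tree stores all of them in O(n) space;
Building the suffix tree takes O(dn) time, where d is the size of the alphabet;
Searching for a pattern takes O(dm) time, where m is the length of the pattern;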

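The article "Trie" introduced a special tree-shaped data structure for information retrieval: the trie. A trie adds the characters of each keyword to the tree nodes in order, so that walking down from the root determines whether a given keyword exists in the trie.

Below is the trie built from the set {bear, bell, bid, bull, buy, sell, stock, stop}.

Looking at this trie, for the keyword "bear" the nodes holding the characters "a" and "r" have no other children, so the two nodes can be merged, as shown in the figure below.

In this way we obtain a compressed trie.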
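
To make the effect of compression concrete, here is a minimal sketch that builds the trie for the word set above and counts how many nodes survive the merge. The names TrieDemo, TrieNode, CountNodes and CountCompressedNodes are invented for this example, and the compression rule is an assumption: a non-root node is merged away when it has exactly one child and does not end a keyword.

```csharp
using System;
using System.Collections.Generic;

public static class TrieDemo
{
    public sealed class TrieNode
    {
        public readonly Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
        public bool IsWord; // true if a keyword ends at this node
    }

    public static TrieNode Build(IEnumerable<string> words)
    {
        var root = new TrieNode();
        foreach (var word in words)
        {
            var node = root;
            foreach (var c in word)
            {
                if (!node.Children.TryGetValue(c, out TrieNode child))
                {
                    child = new TrieNode();
                    node.Children.Add(c, child);
                }
                node = child;
            }
            node.IsWord = true;
        }
        return root;
    }

    public static bool Contains(TrieNode root, string word)
    {
        var node = root;
        foreach (var c in word)
            if (!node.Children.TryGetValue(c, out node))
                return false;
        return node.IsWord;
    }

    // Counts all nodes of the plain trie, including the root.
    public static int CountNodes(TrieNode node)
    {
        int count = 1;
        foreach (var child in node.Children.Values)
            count += CountNodes(child);
        return count;
    }

    // Counts the nodes that remain after merging single-child chains.
    public static int CountCompressedNodes(TrieNode node, bool isRoot = true)
    {
        bool mergedAway = !isRoot && !node.IsWord && node.Children.Count == 1;
        int count = mergedAway ? 0 : 1;
        foreach (var child in node.Children.Values)
            count += CountCompressedNodes(child, false);
        return count;
    }

    public static void Main()
    {
        var root = Build(new[] { "bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop" });
        Console.WriteLine(Contains(root, "bell"));      // True
        Console.WriteLine(CountNodes(root));            // 22
        Console.WriteLine(CountCompressedNodes(root));  // 14
    }
}
```

For this word set the plain trie has 22 nodes including the root, and 14 remain after compression.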

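A suffix tree is first of all a compressed trie, and second, the keywords stored in it are all the suffixes of a text. This gives us an abstract procedure for building a suffix tree:

Generate the set of all suffixes of the text;
Treat each suffix as a separate keyword and build a compressed trie.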

A suffix tree is a compressed trie for all the suffixes of a text.

Take the text "banana\0", where "\0" marks the end of the text. The suffixes of this text are listed below.

banana\0
anana\0
nana\0
ana\0
na\0
a\0
\0

Treat each suffix as a keyword and build a trie.

Then merge the nodes that have only one child to form a compressed trie.

The resulting tree is the suffix tree of the text "banana\0".
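
The abstract procedure can be sketched naively by inserting every suffix into an uncompressed trie. This takes O(n²) time and space, so it is only a way to see the structure and not the linear-time construction described later; SuffixTrieDemo and its members are names made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SuffixTrieDemo
{
    public sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
    }

    // Inserts every suffix of text + terminator into a plain (uncompressed) trie.
    public static Node Build(string text, char terminator = '\0')
    {
        string s = text + terminator;
        var root = new Node();
        for (int start = 0; start < s.Length; start++)
        {
            var node = root;
            for (int i = start; i < s.Length; i++)
            {
                if (!node.Children.TryGetValue(s[i], out Node child))
                {
                    child = new Node();
                    node.Children.Add(s[i], child);
                }
                node = child;
            }
        }
        return root;
    }

    // A pattern occurs in the text exactly when it labels a path starting at the root.
    public static bool ContainsSubstring(Node root, string pattern)
    {
        var node = root;
        foreach (var c in pattern)
            if (!node.Children.TryGetValue(c, out node))
                return false;
        return true;
    }

    // Every suffix ends with the unique terminator, so each one ends at its own leaf.
    public static int CountLeaves(Node node)
    {
        if (node.Children.Count == 0) return 1;
        int sum = 0;
        foreach (var child in node.Children.Values)
            sum += CountLeaves(child);
        return sum;
    }

    public static void Main()
    {
        var root = Build("banana");
        Console.WriteLine(ContainsSubstring(root, "nan")); // True
        Console.WriteLine(ContainsSubstring(root, "nab")); // False
        Console.WriteLine(CountLeaves(root));              // 7
    }
}
```

The lookup walks at most m characters of the pattern, which is where the query cost proportional to the pattern length comes from.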

Next, two concepts: the explicit suffix tree and the implicit suffix tree.

The string "xabxa" illustrates the difference. Its suffixes are listed below.

xabxa
abxa
bxa
xa
a

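Notice that the suffixes "xa" and "a" are already prefixes of the suffixes "xabxa" and "abxa" respectively, so they do not end at leaves of their own. A suffix tree built this way is called an implicit suffix tree.

To avoid this, we can append a special character that occurs nowhere else in the text, such as "$" or "#". Then no suffix is a prefix of another suffix, and every suffix ends at its own leaf.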

xabxa$
abxa$
bxa$
xa$
a$
$
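
The difference between the two suffix lists can be checked mechanically. The sketch below lists the suffixes that are prefixes of a longer suffix, which are exactly the ones that stay implicit; ImplicitSuffixDemo and HiddenSuffixes are illustrative names, and the quadratic scan is only meant for small examples.

```csharp
using System;
using System.Collections.Generic;

public static class ImplicitSuffixDemo
{
    // Returns the suffixes of text that are prefixes of some longer suffix.
    // Such suffixes end inside an edge instead of at a leaf of their own.
    public static List<string> HiddenSuffixes(string text)
    {
        var hidden = new List<string>();
        for (int i = 1; i < text.Length; i++)
        {
            string suffix = text.Substring(i);
            for (int j = 0; j < i; j++)
            {
                // Suffixes starting before i are the longer ones.
                if (text.Substring(j).StartsWith(suffix, StringComparison.Ordinal))
                {
                    hidden.Add(suffix);
                    break;
                }
            }
        }
        return hidden;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", HiddenSuffixes("xabxa"))); // prints "xa, a"
        Console.WriteLine(HiddenSuffixes("xabxa$").Count);             // prints "0"
    }
}
```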

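In 1995, Esko Ukkonen published the paper "On-line construction of suffix trees", which describes how to build a suffix tree in linear time. The following describes the basic principle of Ukkonen's algorithm, starting with a simple string and then extending to more complex cases.

A suffix tree differs from a trie in that an edge no longer represents a single character. Instead, it is labeled with a pair of integers [from, to], where from and to are positions in the text. Each edge can therefore represent a string of any length while needing only two pointers, that is, O(1) space.

We start with the simplest possible string, Text = "abc". It contains no repeated characters, which keeps the construction simple. The procedure works from left to right, one character at a time.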

abc

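The first character is "a". Create an edge from the root to a leaf and label it [0, #], meaning that it starts at position 0 of the text. The "#" stands for the current end. Think of "#" as sitting just to the right of "a": with positions counted from 0, "#" is currently at position 1.

The suffixes it represents are as follows.

With the first character "a" done, we process the second character "b". This involves:

Extending the existing edge "a" to "ab";
Inserting a new edge for "b";

The suffixes it represents are as follows.

Two observations:

The label of the "ab" edge, [0, #], is the same as before. When "#" moves from position 1 to 2, the meaning of [0, #] changes automatically;
Each edge takes O(1) space, just two pointers, regardless of how many characters it represents;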
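
The two observations can be sketched with a tiny edge type whose end is left open. OpenEdgeDemo and Edge are hypothetical names for this illustration; a null To stands for "#", and the current position of "#" is passed in as an exclusive end index.

```csharp
using System;
using System.Collections.Generic;

public static class OpenEdgeDemo
{
    public sealed class Edge
    {
        public readonly int From;
        public readonly int? To; // exclusive end; null means the open end "#"

        public Edge(int from, int? to = null)
        {
            From = from;
            To = to;
        }

        // Resolves the label against the text, given the current position of "#".
        public string Label(string text, int end)
        {
            int stop = To ?? end;
            return text.Substring(From, stop - From);
        }
    }

    public static void Main()
    {
        const string text = "abc";
        var leaves = new List<Edge>();
        for (int end = 1; end <= text.Length; end++)
        {
            // One new leaf edge per step; the existing edges are not touched at all.
            leaves.Add(new Edge(end - 1));
            var labels = leaves.ConvertAll(e => e.Label(text, end));
            Console.WriteLine("# = " + end + ": " + string.Join(", ", labels));
        }
        // # = 1: a
        // # = 2: ab, b
        // # = 3: abc, bc, c
    }
}
```

Because the labels are resolved lazily, moving "#" costs nothing per edge, which is why each step on a text without repeated characters is O(1).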

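Next we process the third character "c" in the same way, and "#" moves to position 3:

The suffixes it represents are as follows.

At this point we observe:

After the steps above we have a correct suffix tree;
There are as many steps as there are characters in the text;
Each step takes O(1) work: the existing edges are updated automatically as "#" moves, and only one new edge has to be added for the last character. So building the suffix tree of a text of length n takes O(n) time.

Of course, things went this smoothly only because Text = "abc" is so simple and contains no repeated characters. Let's now handle a more complex string, Text = "abcabxabcd".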

abcabxabcd

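Like the previous example, this text starts with "abc", but it continues with "ab", "x", "abc" and "d", so characters repeat.

The first three characters "abc" are processed exactly as above, which gives the following tree:

When "#" moves one position further, to position 4, the existing edges are implicitly extended to:

That is, [0, #], [1, #] and [2, #] are all extended automatically. By the logic above, we should now create a separate edge for the remaining suffix "a". Before doing so, we introduce two concepts.

The active point is a triple (active_node, active_edge, active_length);
The remainder is an integer that says how many new suffixes still have to be inserted;

How these two are used will become clear step by step. For now, we can establish two things:

In the Text = "abc" example, the active point was always (root, '\0x', 0). That is, active_node was always the root, active_edge was the edge designated by the null character '\0x', and active_length was 0.
At the start of each step, remainder was always 1. That is, each step had to insert exactly one new suffix, namely the last character.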

# = 3, active_point = (root, '\0x', 0), remainder = 1

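When processing the fourth character "a", we notice that the tree already contains an edge, "abca", whose prefix is the suffix "a". In this case:

We do not insert a brand-new edge [3, #] at the root. Since the suffix "a" is already contained in the edge "abca", we leave the tree as it is.
We set the active point to (root, 'a', 1): active_node is still the root, active_edge is 'a', and active_length is 1. The active point is now one character into the edge that leaves the root and starts with 'a'. Identifying the active edge by its first character is enough, because at most one edge leaving a given node starts with a particular character.
We increment remainder by 1, to 2.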

# = 4, active_point = (root, 'a', 1), remainder = 2

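We also observe that when the suffix to be inserted already exists in the tree, the tree itself does not change at all. Only the active point and remainder are updated. The tree then no longer represents every suffix up to the current position explicitly, but it still contains all of them, some only implicitly. So apart from updating the variables, this step does no other work, and updating the variables takes O(1) time.

We continue with the next character "b". When "#" moves one position further, to position 5, the tree is automatically updated to:

Because remainder is 2, two final suffixes have to be inserted at the current position: "ab" and "b". The reasons are:

The "a" from the previous step was never actually inserted into the tree, so it remained. Since we have moved one step forward, it has now grown from "a" to "ab";
We also need to insert the new final suffix "b";

In practice, we go to the active point, which points just after the "a" on what is now the edge "abcab", and try to insert the current final character "b". But the same thing happens again: "b" is already the next character on that edge. The operations can be summarized as:

Change the active point to (root, 'a', 2). It is still the same edge as before. Only active_length changes, so the active point now sits after "b", that is, after "ab".
Increase remainder to 3, because once again we did not insert a new edge for "b".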

# = 5, active_point = (root, 'a', 2), remainder = 3

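To be more specific: we were supposed to insert two final suffixes, "ab" and "b", but since "ab" already exists as a prefix of an edge, we only updated the active point. We did not even consider inserting "b". Why? Because if "ab" is in the tree, every suffix of it must be in the tree as well. It may be there only implicitly, but it must be there, because of the way we have built the tree so far.

We continue with the next character "x". When "#" moves one position further, to position 6, the tree is automatically updated to:

Because remainder is 3, three final suffixes have to be inserted at the current position: "abx", "bx" and "x".

The active point tells us where "ab" ends, so we only need to jump there and insert the new final character "x". Since "x" is not there yet, we split the edge "abcabx" and insert an internal node:

Splitting the edge and inserting the new internal node takes O(1) time.

We have now handled "abx" and decrement remainder to 2. Next we have to insert the following suffix, "bx", but the active point must be updated first. Let's summarize part of what we have done.

The way the active point is updated after splitting an edge and inserting a new edge as above can be summarized as Rule 1. It applies when active_node is the root.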

Rule 1

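After an insertion from the root:

active_node remains the root;

active_edge is set to the first character of the next suffix to be inserted;

active_length is decremented by 1;

The new active point is therefore (root, 'b', 1), which means the next insertion will happen on the edge "bcabx", one character in, that is, right after "b".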

# = 6, active_point = (root, 'b', 1), remainder = 2

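We need to check whether "x" already follows "b". If it does, we are in the situation seen earlier and only update the active point. If it does not, we split the edge and insert a new edge. Here "b" is followed by "c", so the edge is split.

This operation again takes O(1) time. We then update remainder to 1 and, by Rule 1, the active point to (root, 'x', 0).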

# = 6, active_point = (root, 'x', 0), remainder = 1

At this point we can formulate Rule 2.

Rule 2

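If we split an edge and insert a new node, and this new node is not the first node created during the current step, we connect the previously inserted node to the new node with a special pointer called a suffix link. A suffix link is drawn as a dashed line.

Continuing, we insert the final suffix "x". Because active_length has dropped to 0, the insertion happens directly at the root. No edge leaving the root starts with "x", so we insert a new edge:

This completes all the operations of this step.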

# = 6, active_point = (root, '\0x', 0), remainder = 1

We continue with the next character "a", and "#" moves one position further. The suffix "a" is already on an edge of the tree, so we only update the active point and remainder.

# = 7, active_point = (root, 'a', 1), remainder = 2

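We continue with the next character "b", and "#" moves one position further. The suffixes "ab" and "b" are both already in the tree, so we only update the active point and remainder. We call the node at the end of the "ab" edge node1.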

# = 8, active_point = (root, 'a', 2), remainder = 3

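We continue with the next character "c", and "#" moves one position further. Because remainder = 3, the suffixes "abc", "bc" and "c" have to be inserted. But "c" is already present on an edge below node1, so we again only update the active point and remainder.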

# = 9, active_point = (node1, 'c', 1), remainder = 4

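We continue with the next character "d", and "#" moves one position further. Because remainder = 4, the four suffixes "abcd", "bcd", "cd" and "d" have to be inserted.

In the figure above, the active_node is marked in red at the moment the edge is about to be split. This leads to Rule 3.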

Rule 3

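After splitting an edge from an active_node that is not the root, we follow the suffix link leaving that node. If there is one, we set active_node to the node it points to. If there is none, we set active_node to the root. active_edge and active_length stay unchanged.

So the active point is now (node2, 'c', 1), where node2 is the red node in the figure below: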

# = 10, active_point = (node2, 'c', 1), remainder = 3

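Since "abcd" has been inserted, remainder is decremented to 3 and we move on to the next remaining suffix, "bcd". This requires splitting the edge "cabxabcd" and inserting a new edge "d". By Rule 2, we must create a new suffix link from the previously inserted node to the newly inserted one.

We observe that suffix links let us reset the active point so that the next suffix can be inserted in O(1) time. The figure above also confirms that the node for "ab" links to the node for its suffix "b", and the node for "abc" links to the node for its suffix "bc".

The current step is not finished yet: remainder is now 2, and by Rule 3 we have to reset the active point again. The red active_node in the figure above has no suffix link, so active_node is set to the root, which gives the active point (root, 'c', 1).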

# = 10, active_point = (root, 'c', 1), remainder = 2

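The next insertion, "cd", therefore starts at the root and looks for the edge that starts with "c", "cabxabcd". This causes another split:

Since another internal node has been created, Rule 2 requires a suffix link from the previously created internal node to this one.

Then remainder is decremented to 1. Since active_node is the root, Rule 1 gives the active point (root, 'd', 0). In other words, we only need to insert a new edge "d" at the root.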

# = 10, active_point = (root, 'd', 0), remainder = 1

This completes the whole step.
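
The walkthrough for "abcabxabcd" can be replayed with a compact sketch of the three rules. This is not the implementation listed later in this article; it follows a widely used compact formulation, and all names (CompactSuffixTree, Trace, _pending and so on) are invented for this example. Two assumptions differ from the text: the active point is moved down an edge lazily, at the start of the next insertion, and the trace reports pending, the number of suffixes still waiting at the end of a step, which is one less than the remainder shown at the start of the following step.

```csharp
using System;
using System.Collections.Generic;

public sealed class CompactSuffixTree
{
    public sealed class Node
    {
        public int Start, End; // edge into this node is text[Start, End); int.MaxValue acts as "#"
        public Node Link;      // suffix link, or null
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
    }

    public readonly Node Root = new Node { Start = -1, End = -1 };
    public readonly List<string> Trace = new List<string>();
    private readonly string _text;
    private Node _activeNode, _needLink;
    private int _activeEdge, _activeLength, _pending, _pos;

    public CompactSuffixTree(string text)
    {
        _text = text;
        _activeNode = Root;
        for (int i = 0; i < text.Length; i++)
        {
            Extend(i);
            string node = _activeNode == Root ? "root" : "internal";
            string edge = _activeLength == 0 ? "none" : "'" + _text[_activeEdge] + "'";
            Trace.Add($"# = {i + 1}, active_point = ({node}, {edge}, {_activeLength}), pending = {_pending}");
        }
    }

    private int EdgeLength(Node n) { return Math.Min(n.End, _pos + 1) - n.Start; }

    private void AddLink(Node node) // Rule 2: chain the nodes handled within one step
    {
        if (_needLink != null) _needLink.Link = node;
        _needLink = node;
    }

    private void Extend(int pos)
    {
        _pos = pos;
        _needLink = null;
        _pending++;
        char c = _text[pos];
        while (_pending > 0)
        {
            if (_activeLength == 0) _activeEdge = pos;
            char first = _text[_activeEdge];
            if (!_activeNode.Next.TryGetValue(first, out Node next))
            {   // No edge starts with this character: add a leaf [pos, #].
                _activeNode.Next[first] = new Node { Start = pos, End = int.MaxValue };
                AddLink(_activeNode);
            }
            else
            {
                int length = EdgeLength(next);
                if (_activeLength >= length)
                {   // The active point lies beyond this edge: walk down and retry.
                    _activeEdge += length; _activeLength -= length; _activeNode = next;
                    continue;
                }
                if (_text[next.Start + _activeLength] == c)
                {   // Already present implicitly: end the step.
                    _activeLength++;
                    AddLink(_activeNode);
                    break;
                }
                // Split the edge and hang a new leaf off the new internal node.
                var split = new Node { Start = next.Start, End = next.Start + _activeLength };
                _activeNode.Next[first] = split;
                split.Next[c] = new Node { Start = pos, End = int.MaxValue };
                next.Start += _activeLength;
                split.Next[_text[next.Start]] = next;
                AddLink(split);
            }
            _pending--;
            if (_activeNode != Root) _activeNode = _activeNode.Link ?? Root;                   // Rule 3
            else if (_activeLength > 0) { _activeLength--; _activeEdge = pos - _pending + 1; } // Rule 1
        }
    }

    public static int CountLeaves(Node n)
    {
        if (n.Next.Count == 0) return 1;
        int sum = 0;
        foreach (var child in n.Next.Values) sum += CountLeaves(child);
        return sum;
    }

    public static void Main()
    {
        var tree = new CompactSuffixTree("abcabxabcd");
        foreach (var line in tree.Trace) Console.WriteLine(line);
        Console.WriteLine("leaves: " + CountLeaves(tree.Root));
    }
}
```

Since "d" occurs only once, every one of the 10 suffixes ends at its own leaf when the build finishes, and nothing is left pending.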

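Overall, we can make a series of observations:

In each step, moving "#" one position to the right updates all leaf nodes automatically in O(1) time;
But this does not take care of two things:
- the suffixes left over from previous steps;
- the final character of the current step;
remainder tells us how many more suffixes have to be inserted. These insertions correspond one-to-one to the final suffixes of the string that ends at the current position "#", and we process them one after another. More importantly, each insertion takes O(1) time, because the active point tells us exactly where to go, and we only need to add a single character at the active point. Why? Because the other characters are already contained implicitly; otherwise the active point would not be where it is.
After each insertion, remainder is decremented and we follow the suffix link if there is one. If there is none, we go back to the root (Rule 3). If we are already at the root, we update the active point by Rule 1. In either case this takes only O(1) time.
If, during one of these insertions, the character to be inserted is already there, we do nothing and end the current step, even if remainder > 0. The reason is that every insertion that remains is a suffix of the one we just tried to make, so it is already implicitly in the tree. The fact that remainder > 0 makes sure the remaining suffixes are dealt with later.
What if remainder > 0 at the end of the algorithm? This happens whenever the end of the text is a substring that occurred somewhere before. In that case we must append one extra character that has not occurred before, usually "$". Why does this matter? If we later use the finished suffix tree to search for suffixes, a match must be accepted only if it ends at a leaf. Otherwise we would get many spurious matches, because the tree implicitly contains many strings that are not actual suffixes of the text. Forcing remainder to 0 at the end ensures that every suffix ends at a leaf. However, if we only want to use the tree to search for general substrings rather than suffixes, this final step is not required.
So what is the complexity of the whole algorithm? If the text has length n, there are n steps, or n+1 with the "$". In each step we either do nothing other than update the variables, or we make remainder insertions, each taking O(1) time. Since remainder records how many times we did nothing in earlier steps and is decremented with every insertion we make now, the total number of insertions is n (or n+1). The total complexity is therefore O(n).
However, there is one small thing that has not been properly explained yet. When we follow a suffix link and update the active point, the active_length may turn out not to fit the new active_node. Consider, for example, the following situation: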

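Suppose the active point is (red, 'd', 3), so it points just after the "f" on the edge leaving the red node that begins with "def". Now suppose we made the necessary updates and followed the suffix link according to Rule 3, so that the new active point is (green, 'd', 3). However, the "d" edge leaving the green node is "de", which has only 2 characters. To find the correct active point, we need to follow that edge to the blue node and reset the active point to (blue, 'f', 1).

In the worst case, active_length could be as large as remainder, which can be as large as n. And it may well happen that, to find the correct active point, we have to jump over not just one internal node but many, up to n in the worst case. Since remainder is O(n) in each step and the adjustment of the active point after following a suffix link could be O(n) too, does this mean the algorithm potentially takes O(n²) time?

I believe not. The reason is that whenever we do have to adjust the active point (for example, from the green node to the blue node above), we arrive at a new node that has its own suffix link, and active_length is reduced. As we follow the chain of suffix links to make the remaining insertions, active_length can only decrease, so the number of active point adjustments we can make along the way cannot exceed active_length at any given time. Since active_length never exceeds remainder, and remainder is O(n) not only in every single step, but the total sum of increments ever made to remainder over the whole process is O(n) as well, the number of active point adjustments is also bounded by O(n).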
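
The adjustment from the green node to the blue node can be sketched as a small walk-down routine. The tree below is a hypothetical fragment that mirrors only the green and blue nodes of the example. The label "fg" on the edge leaving the blue node is an assumption, and the names WalkDownDemo, Canonize and BuildExample are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class WalkDownDemo
{
    public sealed class Node
    {
        public readonly string Name;
        // Outgoing edges keyed by their first character: (edge label, child node).
        public readonly Dictionary<char, (string Label, Node Child)> Edges =
            new Dictionary<char, (string Label, Node Child)>();

        public Node(string name) { Name = name; }
    }

    // path holds the active_length characters that the active point spells out below node.
    public static (Node Node, char? Edge, int Length) Canonize(Node node, string path)
    {
        int offset = 0;
        while (offset < path.Length)
        {
            var (label, child) = node.Edges[path[offset]];
            if (path.Length - offset < label.Length)
                break;              // the active point lies inside this edge
            offset += label.Length; // skip the whole edge in one jump
            node = child;
        }
        int length = path.Length - offset;
        return (node, length == 0 ? (char?)null : path[offset], length);
    }

    // Builds the fragment: green --"de"--> blue --"fg"--> leaf.
    public static Node BuildExample()
    {
        var green = new Node("green");
        var blue = new Node("blue");
        green.Edges['d'] = ("de", blue);
        blue.Edges['f'] = ("fg", new Node("leaf"));
        return green;
    }

    public static void Main()
    {
        var result = Canonize(BuildExample(), "def");
        // prints "(blue, f, 1)"
        Console.WriteLine("(" + result.Node.Name + ", " + result.Edge + ", " + result.Length + ")");
    }
}
```

Each jump skips a whole edge without comparing its characters, so the cost of an adjustment is proportional to the number of nodes passed, not to active_length.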

Code Example

The code below is by Nathan Ridley, published on GitHub.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace SuffixTreeAlgorithm
{
  class Program
  {
    static void Main(string[] args)
    {
      var tree = new SuffixTree("abcabxabcd");
      tree.Message += (f, o) => { Console.WriteLine(f, o); };
      tree.Changed += (t) =>
      {
        Console.WriteLine(
          Environment.NewLine
          + t.RenderTree()
          + Environment.NewLine);
      };
      tree.Build('$');

      //SuffixTree.Create("abcabxabcd");
      //SuffixTree.Create("abcdefabxybcdmnabcdex");
      //SuffixTree.Create("abcadak");
      //SuffixTree.Create("dedododeeodo");
      //SuffixTree.Create("ooooooooo");
      //SuffixTree.Create("mississippi");

      Console.ReadKey();
    }
  }

  public class SuffixTree
  {
    public char? CanonizationChar { get; set; }
    public string Word { get; private set; }
    private int CurrentSuffixStartIndex { get; set; }
    private int CurrentSuffixEndIndex { get; set; }
    private Node LastCreatedNodeInCurrentIteration { get; set; }
    private int UnresolvedSuffixes { get; set; }
    public Node RootNode { get; private set; }
    private Node ActiveNode { get; set; }
    private Edge ActiveEdge { get; set; }
    private int DistanceIntoActiveEdge { get; set; }
    private char LastCharacterOfCurrentSuffix { get; set; }
    private int NextNodeNumber { get; set; }
    private int NextEdgeNumber { get; set; }

    public SuffixTree(string word)
    {
      Word = word;
      RootNode = new Node(this);
      ActiveNode = RootNode;
    }

    // Raised whenever the tree or the active point changes; the handler receives the tree.
    public event Action<SuffixTree> Changed;
    private void TriggerChanged()
    {
      var handler = Changed;
      if (handler != null)
        handler(this);
    }

    public event Action<string, object[]> Message;
    private void SendMessage(string format, params object[] args)
    {
      var handler = Message;
      if (handler != null)
        handler(format, args);
    }

    public static SuffixTree Create(string word, char canonizationChar = '$')
    {
      var tree = new SuffixTree(word);
      tree.Build(canonizationChar);
      return tree;
    }

    public void Build(char canonizationChar)
    {
      // If the last character also occurs earlier, append a terminator so every suffix ends at a leaf.
      var n = Word.IndexOf(Word[Word.Length - 1]);
      var mustCanonize = n < Word.Length - 1;
      if (mustCanonize)
      {
        CanonizationChar = canonizationChar;
        Word = string.Concat(Word, canonizationChar);
      }

      for (CurrentSuffixEndIndex = 0; CurrentSuffixEndIndex < Word.Length; CurrentSuffixEndIndex++)
      {
        SendMessage("=== ITERATION {0} ===", CurrentSuffixEndIndex);
        LastCreatedNodeInCurrentIteration = null;
        LastCharacterOfCurrentSuffix = Word[CurrentSuffixEndIndex];

        for (CurrentSuffixStartIndex = CurrentSuffixEndIndex - UnresolvedSuffixes; CurrentSuffixStartIndex <= CurrentSuffixEndIndex; CurrentSuffixStartIndex++)
        {
          var wasImplicitlyAdded = !AddNextSuffix();
          if (wasImplicitlyAdded)
          {
            UnresolvedSuffixes++;
            break;
          }
          if (UnresolvedSuffixes > 0)
            UnresolvedSuffixes--;
        }
      }
    }

    private bool AddNextSuffix()
    {
      var suffix = string.Concat(Word.Substring(CurrentSuffixStartIndex, CurrentSuffixEndIndex - CurrentSuffixStartIndex), "{", Word[CurrentSuffixEndIndex], "}");
      SendMessage("The next suffix of '{0}' to add is '{1}' at indices {2},{3}", Word, suffix, CurrentSuffixStartIndex, CurrentSuffixEndIndex);
      SendMessage(" => ActiveNode:             {0}", ActiveNode);
      SendMessage(" => ActiveEdge:             {0}", ActiveEdge == null ? "none" : ActiveEdge.ToString());
      SendMessage(" => DistanceIntoActiveEdge: {0}", DistanceIntoActiveEdge);
      SendMessage(" => UnresolvedSuffixes:     {0}", UnresolvedSuffixes);
      if (ActiveEdge != null && DistanceIntoActiveEdge >= ActiveEdge.Length)
        throw new Exception("BOUNDARY EXCEEDED");

      if (ActiveEdge != null)
        return AddCurrentSuffixToActiveEdge();

      if (GetExistingEdgeAndSetAsActive())
        return false;

      ActiveNode.AddNewEdge();
      TriggerChanged();

      UpdateActivePointAfterAddingNewEdge();
      return true;
    }

    private bool GetExistingEdgeAndSetAsActive()
    {
      Edge edge;
      if (ActiveNode.Edges.TryGetValue(LastCharacterOfCurrentSuffix, out edge))
      {
        SendMessage("Existing edge for {0} starting with '{1}' found. Values adjusted to:", ActiveNode, LastCharacterOfCurrentSuffix);
        ActiveEdge = edge;
        DistanceIntoActiveEdge = 1;
        TriggerChanged();

        NormalizeActivePointIfNowAtOrBeyondEdgeBoundary(ActiveEdge.StartIndex);
        SendMessage(" => ActiveEdge is now: {0}", ActiveEdge);
        SendMessage(" => DistanceIntoActiveEdge is now: {0}", DistanceIntoActiveEdge);
        SendMessage(" => UnresolvedSuffixes is now: {0}", UnresolvedSuffixes);

        return true;
      }
      SendMessage("Existing edge for {0} starting with '{1}' not found", ActiveNode, LastCharacterOfCurrentSuffix);
      return false;
    }

    private bool AddCurrentSuffixToActiveEdge()
    {
      var nextCharacterOnEdge = Word[ActiveEdge.StartIndex + DistanceIntoActiveEdge];
      if (nextCharacterOnEdge == LastCharacterOfCurrentSuffix)
      {
        SendMessage("The next character on the current edge is '{0}' (suffix added implicitly)", LastCharacterOfCurrentSuffix);
        DistanceIntoActiveEdge++;
        TriggerChanged();

        SendMessage(" => DistanceIntoActiveEdge is now: {0}", DistanceIntoActiveEdge);
        NormalizeActivePointIfNowAtOrBeyondEdgeBoundary(ActiveEdge.StartIndex);

        return false;
      }

      SplitActiveEdge();
      ActiveEdge.Tail.AddNewEdge();
      TriggerChanged();

      UpdateActivePointAfterAddingNewEdge();

      return true;
    }

    private void UpdateActivePointAfterAddingNewEdge()
    {
      if (ReferenceEquals(ActiveNode, RootNode))
      {
        if (DistanceIntoActiveEdge > 0)
        {
          SendMessage("New edge has been added and the active node is root. The active edge will now be updated.");
          DistanceIntoActiveEdge--;
          SendMessage(" => DistanceIntoActiveEdge decremented to: {0}", DistanceIntoActiveEdge);
          ActiveEdge = DistanceIntoActiveEdge == 0 ? null : ActiveNode.Edges[Word[CurrentSuffixStartIndex + 1]];
          SendMessage(" => ActiveEdge is now: {0}", ActiveEdge);
          TriggerChanged();

          NormalizeActivePointIfNowAtOrBeyondEdgeBoundary(CurrentSuffixStartIndex + 1);
        }
      }
      else
        UpdateActivePointToLinkedNodeOrRoot();
    }

    private void NormalizeActivePointIfNowAtOrBeyondEdgeBoundary(int firstIndexOfOriginalActiveEdge)
    {
      var walkDistance = 0;
      while (ActiveEdge != null && DistanceIntoActiveEdge >= ActiveEdge.Length)
      {
        SendMessage("Active point is at or beyond edge boundary and will be moved until it falls inside an edge boundary");
        DistanceIntoActiveEdge -= ActiveEdge.Length;
        ActiveNode = ActiveEdge.Tail ?? RootNode;
        if (DistanceIntoActiveEdge == 0)
          ActiveEdge = null;
        else
        {
          walkDistance += ActiveEdge.Length;
          var c = Word[firstIndexOfOriginalActiveEdge + walkDistance];
          ActiveEdge = ActiveNode.Edges[c];
        }
        TriggerChanged();
      }
    }

    private void SplitActiveEdge()
    {
      ActiveEdge = ActiveEdge.SplitAtIndex(ActiveEdge.StartIndex + DistanceIntoActiveEdge);
      SendMessage(" => ActiveEdge is now: {0}", ActiveEdge);
      TriggerChanged();
      if (LastCreatedNodeInCurrentIteration != null)
      {
        LastCreatedNodeInCurrentIteration.LinkedNode = ActiveEdge.Tail;
        SendMessage(" => Connected {0} to {1}", LastCreatedNodeInCurrentIteration, ActiveEdge.Tail);
        TriggerChanged();
      }
      LastCreatedNodeInCurrentIteration = ActiveEdge.Tail;
    }

    private void UpdateActivePointToLinkedNodeOrRoot()
    {
      SendMessage("The linked node for active node {0} is {1}", ActiveNode, ActiveNode.LinkedNode == null ? "[null]" : ActiveNode.LinkedNode.ToString());
      if (ActiveNode.LinkedNode != null)
      {
        ActiveNode = ActiveNode.LinkedNode;
        SendMessage(" => ActiveNode is now: {0}", ActiveNode);
      }
      else
      {
        ActiveNode = RootNode;
        SendMessage(" => ActiveNode is now ROOT", ActiveNode);
      }
      TriggerChanged();

      if (ActiveEdge != null)
      {
        var firstIndexOfOriginalActiveEdge = ActiveEdge.StartIndex;
        ActiveEdge = ActiveNode.Edges[Word[ActiveEdge.StartIndex]];
        TriggerChanged();
        NormalizeActivePointIfNowAtOrBeyondEdgeBoundary(firstIndexOfOriginalActiveEdge);
      }
    }

    public string RenderTree()
    {
      var writer = new StringWriter();
      RootNode.RenderTree(writer, "");
      return writer.ToString();
    }

    public string WriteDotGraph()
    {
      var sb = new StringBuilder();
      sb.AppendLine("digraph {");
      sb.AppendLine("rankdir = LR;");
      sb.AppendLine("edge [arrowsize=0.5,fontsize=11];");
      for (var i = 0; i < NextNodeNumber; i++)
        sb.AppendFormat("node{0} [label=\"{0}\",style=filled,fillcolor={1},shape=circle,width=.1,height=.1,fontsize=11,margin=0.01];",
          i, ActiveNode.NodeNumber == i ? "cyan" : "lightgrey").AppendLine();
      RootNode.WriteDotGraph(sb);
      sb.AppendLine("}");
      return sb.ToString();
    }

    public HashSet<string> ExtractAllSubstrings()
    {
      var set = new HashSet<string>();
      ExtractAllSubstrings("", set, RootNode);
      return set;
    }

    private void ExtractAllSubstrings(string str, HashSet<string> set, Node node)
    {
      foreach (var edge in node.Edges.Values)
      {
        var edgeStr = edge.StringWithoutCanonizationChar;
        var edgeLength = !edge.EndIndex.HasValue && CanonizationChar.HasValue ? edge.Length - 1 : edge.Length; // assume tailing canonization char
        for (var length = 1; length <= edgeLength; length++)
          set.Add(string.Concat(str, edgeStr.Substring(0, length)));
        if (edge.Tail != null)
          ExtractAllSubstrings(string.Concat(str, edge.StringWithoutCanonizationChar), set, edge.Tail);
      }
    }

    public List<string> ExtractSubstringsForIndexing(int? maxLength = null)
    {
      var list = new List<string>();
      ExtractSubstringsForIndexing("", list, maxLength ?? Word.Length, RootNode);
      return list;
    }

    private void ExtractSubstringsForIndexing(string str, List<string> list, int len, Node node)
    {
      foreach (var edge in node.Edges.Values)
      {
        var newstr = string.Concat(str, Word.Substring(edge.StartIndex, Math.Min(len, edge.Length)));
        if (len > edge.Length && edge.Tail != null)
          ExtractSubstringsForIndexing(newstr, list, len - edge.Length, edge.Tail);
        else
          list.Add(newstr);
      }
    }

    public class Edge
    {
      private readonly SuffixTree _tree;

      public Edge(SuffixTree tree, Node head)
      {
        _tree = tree;
        Head = head;
        StartIndex = tree.CurrentSuffixEndIndex;
        EdgeNumber = _tree.NextEdgeNumber++;
      }

      public Node Head { get; private set; }
      public Node Tail { get; private set; }
      public int StartIndex { get; private set; }
      // null means the edge is open-ended and grows with the text (the "#" in the walkthrough).
      public int? EndIndex { get; set; }
      public int EdgeNumber { get; private set; }
      public int Length { get { return (EndIndex ?? _tree.Word.Length - 1) - StartIndex + 1; } }

      public Edge SplitAtIndex(int index)
      {
        _tree.SendMessage("Splitting edge {0} at index {1} ('{2}')", this, index, _tree.Word[index]);
        var newEdge = new Edge(_tree, Head);
        var newNode = new Node(_tree);
        newEdge.Tail = newNode;
        newEdge.StartIndex = StartIndex;
        newEdge.EndIndex = index - 1;
        Head = newNode;
        StartIndex = index;
        newNode.Edges.Add(_tree.Word[StartIndex], this);
        newEdge.Head.Edges[_tree.Word[newEdge.StartIndex]] = newEdge;
        _tree.SendMessage(" => Hierarchy is now: {0} --> {1} --> {2} --> {3}", newEdge.Head, newEdge, newNode, this);
        return newEdge;
      }

      public override string ToString()
      {
        return string.Concat(_tree.Word.Substring(StartIndex, (EndIndex ?? _tree.CurrentSuffixEndIndex) - StartIndex + 1), "(",
          StartIndex, ",", EndIndex.HasValue ? EndIndex.ToString() : "#", ")");
      }

      public string StringWithoutCanonizationChar
      {
        get { return _tree.Word.Substring(StartIndex, (EndIndex ?? _tree.CurrentSuffixEndIndex - (_tree.CanonizationChar.HasValue ? 1 : 0)) - StartIndex + 1); }
      }

      public string String
      {
        get { return _tree.Word.Substring(StartIndex, (EndIndex ?? _tree.CurrentSuffixEndIndex) - StartIndex + 1); }
      }

      public void RenderTree(TextWriter writer, string prefix, int maxEdgeLength)
      {
        var strEdge = _tree.Word.Substring(StartIndex, (EndIndex ?? _tree.CurrentSuffixEndIndex) - StartIndex + 1);
        writer.Write(strEdge);
        if (Tail == null)
          writer.WriteLine();
        else
        {
          var line = new string(RenderChars.HorizontalLine, maxEdgeLength - strEdge.Length + 1);
          writer.Write(line);
          Tail.RenderTree(writer, string.Concat(prefix, new string(' ', strEdge.Length + line.Length)));
        }
      }

      public void WriteDotGraph(StringBuilder sb)
      {
        if (Tail == null)
          sb.AppendFormat("leaf{0} [label=\"\",shape=point]", EdgeNumber).AppendLine();
        string label, weight, color;
        if (_tree.ActiveEdge != null && ReferenceEquals(this, _tree.ActiveEdge))
        {
          if (_tree.ActiveEdge.Length == 0)
            label = "";
          else if (_tree.DistanceIntoActiveEdge > Length)
            label = "<" + String + " (" + _tree.DistanceIntoActiveEdge + ")>";
          else if (_tree.DistanceIntoActiveEdge == Length)
            label = "<" + String + ">";
          else if (_tree.DistanceIntoActiveEdge > 0)
            label = "<" + String.Substring(0, _tree.DistanceIntoActiveEdge) + " " + String.Substring(_tree.DistanceIntoActiveEdge) + ">";
          else
            label = "\"" + String + "\"";
          color = "blue";
          weight = "5";
        }
        else
        {
          label = "\"" + String + "\"";
          color = "black";
          weight = "3";
        }
        var tail = Tail == null ? "leaf" + EdgeNumber : "node" + Tail.NodeNumber;
        sb.AppendFormat("node{0} -> {1} [label={2},weight={3},color={4},size=11]", Head.NodeNumber, tail, label, weight, color).AppendLine();
        if (Tail != null)
          Tail.WriteDotGraph(sb);
      }
    }

    public class Node
    {
      private readonly SuffixTree _tree;

      public Node(SuffixTree tree)
      {
        _tree = tree;
        Edges = new Dictionary<char, Edge>();
        NodeNumber = _tree.NextNodeNumber++;
      }

      public Dictionary<char, Edge> Edges { get; private set; }
      public Node LinkedNode { get; set; }
      public int NodeNumber { get; private set; }

      public void AddNewEdge()
      {
        _tree.SendMessage("Adding new edge to {0}", this);
        var edge = new Edge(_tree, this);
        Edges.Add(_tree.Word[_tree.CurrentSuffixEndIndex], edge);
        _tree.SendMessage(" => {0} --> {1}", this, edge);
      }

      public void RenderTree(TextWriter writer, string prefix)
      {
        var strNode = string.Concat("(", NodeNumber.ToString(new string('0', _tree.NextNodeNumber.ToString().Length)), ")");
        writer.Write(strNode);
        var edges = Edges.Select(kvp => kvp.Value).OrderBy(e => _tree.Word[e.StartIndex]).ToArray();
        if (edges.Any())
        {
          var prefixWithNodePadding = prefix + new string(' ', strNode.Length);
          var maxEdgeLength = edges.Max(e => (e.EndIndex ?? _tree.CurrentSuffixEndIndex) - e.StartIndex + 1);
          for (var i = 0; i < edges.Length; i++)
          {
            char connector, extender = ' ';
            if (i == 0)
            {
              if (edges.Length > 1)
              {
                connector = RenderChars.TJunctionDown;
                extender = RenderChars.VerticalLine;
              }
              else
                connector = RenderChars.HorizontalLine;
            }
            else
            {
              writer.Write(prefixWithNodePadding);
              if (i == edges.Length - 1)
                connector = RenderChars.CornerRight;
              else
              {
                connector = RenderChars.TJunctionRight;
                extender = RenderChars.VerticalLine;
              }
            }
            writer.Write(string.Concat(connector, RenderChars.HorizontalLine));
            var newPrefix = string.Concat(prefixWithNodePadding, extender, ' ');
            edges[i].RenderTree(writer, newPrefix, maxEdgeLength);
          }
        }
      }

      public override string ToString()
      {
        return string.Concat("node #", NodeNumber);
      }

      public void WriteDotGraph(StringBuilder sb)
      {
        if (LinkedNode != null)
          sb.AppendFormat("node{0} -> node{1} [label=\"\",weight=.01,style=dotted]", NodeNumber, LinkedNode.NodeNumber).AppendLine();
        foreach (var edge in Edges.Values)
          edge.WriteDotGraph(sb);
      }
    }

    public static class RenderChars
    {
      public const char TJunctionDown = '┬';
      public const char HorizontalLine = '─';
      public const char VerticalLine = '│';
      public const char TJunctionRight = '├';
      public const char CornerRight = '└';
    }
  }
}

Output

Test with Text = "abcabxabcd".

=== ITERATION 0 ===
The next suffix of 'abcabxabcd' to add is '{a}' at indices 0,0
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'a' not found
Adding new edge to node #0
 => node #0 --> a(0,#)

(0)──a


=== ITERATION 1 ===
The next suffix of 'abcabxabcd' to add is '{b}' at indices 1,1
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'b' not found
Adding new edge to node #0
 => node #0 --> b(1,#)

(0)┬─ab
   └─b


=== ITERATION 2 ===
The next suffix of 'abcabxabcd' to add is '{c}' at indices 2,2
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'c' not found
Adding new edge to node #0
 => node #0 --> c(2,#)

(0)┬─abc
   ├─bc
   └─c


=== ITERATION 3 ===
The next suffix of 'abcabxabcd' to add is '{a}' at indices 3,3
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'a' found. Values adjusted to:

(0)┬─abca
   ├─bca
   └─ca


 => ActiveEdge is now: abca(0,#)
 => DistanceIntoActiveEdge is now: 1
 => UnresolvedSuffixes is now: 0
=== ITERATION 4 ===
The next suffix of 'abcabxabcd' to add is 'a{b}' at indices 3,4
 => ActiveNode:             node #0
 => ActiveEdge:             abcab(0,#)
 => DistanceIntoActiveEdge: 1
 => UnresolvedSuffixes:     1
The next character on the current edge is 'b' (suffix added implicitly)

(0)┬─abcab
   ├─bcab
   └─cab


 => DistanceIntoActiveEdge is now: 2
=== ITERATION 5 ===
The next suffix of 'abcabxabcd' to add is 'ab{x}' at indices 3,5
 => ActiveNode:             node #0
 => ActiveEdge:             abcabx(0,#)
 => DistanceIntoActiveEdge: 2
 => UnresolvedSuffixes:     2
Splitting edge abcabx(0,#) at index 2 ('c')
 => Hierarchy is now: node #0 --> ab(0,1) --> node #1 --> cabx(2,#)
 => ActiveEdge is now: ab(0,1)

(0)┬─ab────(1)──cabx
   ├─bcabx
   └─cabx


Adding new edge to node #1
 => node #1 --> x(5,#)

(0)┬─ab────(1)┬─cabx
   │          └─x
   ├─bcabx
   └─cabx


New edge has been added and the active node is root. The active edge will now be updated.
 => DistanceIntoActiveEdge decremented to: 1
 => ActiveEdge is now: bcabx(1,#)

(0)┬─ab────(1)┬─cabx
   │          └─x
   ├─bcabx
   └─cabx


The next suffix of 'abcabxabcd' to add is 'b{x}' at indices 4,5
 => ActiveNode:             node #0
 => ActiveEdge:             bcabx(1,#)
 => DistanceIntoActiveEdge: 1
 => UnresolvedSuffixes:     1
Splitting edge bcabx(1,#) at index 2 ('c')
 => Hierarchy is now: node #0 --> b(1,1) --> node #2 --> cabx(2,#)
 => ActiveEdge is now: b(1,1)

(0)┬─ab───(1)┬─cabx
   │         └─x
   ├─b────(2)──cabx
   └─cabx


 => Connected node #1 to node #2

(0)┬─ab───(1)┬─cabx
   │         └─x
   ├─b────(2)──cabx
   └─cabx


Adding new edge to node #2
 => node #2 --> x(5,#)

(0)┬─ab───(1)┬─cabx
   │         └─x
   ├─b────(2)┬─cabx
   │         └─x
   └─cabx


New edge has been added and the active node is root. The active edge will now be updated.
 => DistanceIntoActiveEdge decremented to: 0
 => ActiveEdge is now:

(0)┬─ab───(1)┬─cabx
   │         └─x
   ├─b────(2)┬─cabx
   │         └─x
   └─cabx


The next suffix of 'abcabxabcd' to add is '{x}' at indices 5,5
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'x' not found
Adding new edge to node #0
 => node #0 --> x(5,#)

(0)┬─ab───(1)┬─cabx
   │         └─x
   ├─b────(2)┬─cabx
   │         └─x
   ├─cabx
   └─x


=== ITERATION 6 ===
The next suffix of 'abcabxabcd' to add is '{a}' at indices 6,6
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'a' found. Values adjusted to:

(0)┬─ab────(1)┬─cabxa
   │          └─xa
   ├─b─────(2)┬─cabxa
   │          └─xa
   ├─cabxa
   └─xa


 => ActiveEdge is now: ab(0,1)
 => DistanceIntoActiveEdge is now: 1
 => UnresolvedSuffixes is now: 0
=== ITERATION 7 ===
The next suffix of 'abcabxabcd' to add is 'a{b}' at indices 6,7
 => ActiveNode:             node #0
 => ActiveEdge:             ab(0,1)
 => DistanceIntoActiveEdge: 1
 => UnresolvedSuffixes:     1
The next character on the current edge is 'b' (suffix added implicitly)

(0)┬─ab─────(1)┬─cabxab
   │           └─xab
   ├─b──────(2)┬─cabxab
   │           └─xab
   ├─cabxab
   └─xab


 => DistanceIntoActiveEdge is now: 2
Active point is at or beyond edge boundary and will be moved until it falls inside an edge boundary

(0)┬─ab─────(1)┬─cabxab
   │           └─xab
   ├─b──────(2)┬─cabxab
   │           └─xab
   ├─cabxab
   └─xab


=== ITERATION 8 ===
The next suffix of 'abcabxabcd' to add is 'ab{c}' at indices 6,8
 => ActiveNode:             node #1
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     2
Existing edge for node #1 starting with 'c' found. Values adjusted to:

(0)┬─ab──────(1)┬─cabxabc
   │            └─xabc
   ├─b───────(2)┬─cabxabc
   │            └─xabc
   ├─cabxabc
   └─xabc


 => ActiveEdge is now: cabxabc(2,#)
 => DistanceIntoActiveEdge is now: 1
 => UnresolvedSuffixes is now: 2
=== ITERATION 9 ===
The next suffix of 'abcabxabcd' to add is 'abc{d}' at indices 6,9
 => ActiveNode:             node #1
 => ActiveEdge:             cabxabcd(2,#)
 => DistanceIntoActiveEdge: 1
 => UnresolvedSuffixes:     3
Splitting edge cabxabcd(2,#) at index 3 ('a')
 => Hierarchy is now: node #1 --> c(2,2) --> node #3 --> abxabcd(3,#)
 => ActiveEdge is now: c(2,2)

(0)┬─ab───────(1)┬─c─────(3)──abxabcd
   │             └─xabcd
   ├─b────────(2)┬─cabxabcd
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


Adding new edge to node #3
 => node #3 --> d(9,#)

(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─cabxabcd
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


The linked node for active node node #1 is node #2
 => ActiveNode is now: node #2

(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─cabxabcd
   │             └─xabcd
   ├─cabxabcd
   └─xabcd



(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─cabxabcd
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


The next suffix of 'abcabxabcd' to add is 'bc{d}' at indices 7,9
 => ActiveNode:             node #2
 => ActiveEdge:             cabxabcd(2,#)
 => DistanceIntoActiveEdge: 1
 => UnresolvedSuffixes:     2
Splitting edge cabxabcd(2,#) at index 3 ('a')
 => Hierarchy is now: node #2 --> c(2,2) --> node #4 --> abxabcd(3,#)
 => ActiveEdge is now: c(2,2)

(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─c─────(4)──abxabcd
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


 => Connected node #3 to node #4

(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─c─────(4)──abxabcd
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


Adding new edge to node #4
 => node #4 --> d(9,#)

(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─c─────(4)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


The linked node for active node node #2 is [null]
 => ActiveNode is now ROOT

(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─c─────(4)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─cabxabcd
   └─xabcd



(0)┬─ab───────(1)┬─c─────(3)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─b────────(2)┬─c─────(4)┬─abxabcd
   │             │          └─d
   │             └─xabcd
   ├─cabxabcd
   └─xabcd


The next suffix of 'abcabxabcd' to add is 'c{d}' at indices 8,9
 => ActiveNode:             node #0
 => ActiveEdge:             cabxabcd(2,#)
 => DistanceIntoActiveEdge: 1
 => UnresolvedSuffixes:     1
Splitting edge cabxabcd(2,#) at index 3 ('a')
 => Hierarchy is now: node #0 --> c(2,2) --> node #5 --> abxabcd(3,#)
 => ActiveEdge is now: c(2,2)

(0)┬─ab────(1)┬─c─────(3)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─b─────(2)┬─c─────(4)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─c─────(5)──abxabcd
   └─xabcd


 => Connected node #4 to node #5

(0)┬─ab────(1)┬─c─────(3)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─b─────(2)┬─c─────(4)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─c─────(5)──abxabcd
   └─xabcd


Adding new edge to node #5
 => node #5 --> d(9,#)

(0)┬─ab────(1)┬─c─────(3)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─b─────(2)┬─c─────(4)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─c─────(5)┬─abxabcd
   │          └─d
   └─xabcd


New edge has been added and the active node is root. The active edge will now be updated.
 => DistanceIntoActiveEdge decremented to: 0
 => ActiveEdge is now:

(0)┬─ab────(1)┬─c─────(3)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─b─────(2)┬─c─────(4)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─c─────(5)┬─abxabcd
   │          └─d
   └─xabcd


The next suffix of 'abcabxabcd' to add is '{d}' at indices 9,9
 => ActiveNode:             node #0
 => ActiveEdge:             none
 => DistanceIntoActiveEdge: 0
 => UnresolvedSuffixes:     0
Existing edge for node #0 starting with 'd' not found
Adding new edge to node #0
 => node #0 --> d(9,#)

(0)┬─ab────(1)┬─c─────(3)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─b─────(2)┬─c─────(4)┬─abxabcd
   │          │          └─d
   │          └─xabcd
   ├─c─────(5)┬─abxabcd
   │          └─d
   ├─d
   └─xabcd
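
The trace above can be reproduced with a compact sketch of Ukkonen's construction. This is not the program that produced the log: the names (`Node`, `build_suffix_tree`, `edges`) are illustrative assumptions, edge indices are half-open (`text[start:end]`) where the trace prints inclusive ends, `remainder` also counts the character currently being added (so it is one larger than the trace's UnresolvedSuffixes), and the active point is moved past an edge boundary lazily, at the start of the next step, rather than eagerly.

```python
class Node:
    """A tree node together with the edge that leads into it.

    The edge label is text[start:end]. end is None for a leaf, meaning the
    edge is open-ended ('#' in the trace) and runs to the end of the text.
    """

    def __init__(self, start, end=None):
        self.start = start
        self.end = end
        self.children = {}  # first character of a child edge -> child Node
        self.link = None    # suffix link, only set on internal nodes


def build_suffix_tree(text):
    root = Node(0, 0)
    active_node, active_edge, active_length = root, 0, 0
    remainder = 0  # suffixes still to insert, including the current one
    for i, ch in enumerate(text):
        remainder += 1
        last_new = None  # internal node created earlier in this iteration
        while remainder > 0:
            if active_length == 0:
                active_edge = i
            first = text[active_edge]
            if first not in active_node.children:
                # No edge starts with this character: add a new leaf.
                active_node.children[first] = Node(i)
                if last_new is not None:
                    last_new.link = active_node
                    last_new = None
            else:
                nxt = active_node.children[first]
                end = i + 1 if nxt.end is None else nxt.end
                edge_len = end - nxt.start
                if active_length >= edge_len:
                    # Active point lies beyond this edge: walk down and retry.
                    active_edge += edge_len
                    active_length -= edge_len
                    active_node = nxt
                    continue
                if text[nxt.start + active_length] == ch:
                    # Character already present: the suffix is added implicitly.
                    active_length += 1
                    if last_new is not None:
                        last_new.link = active_node
                    break
                # Mismatch inside the edge: split it and hang a new leaf.
                split = Node(nxt.start, nxt.start + active_length)
                active_node.children[first] = split
                split.children[ch] = Node(i)
                nxt.start += active_length
                split.children[text[nxt.start]] = nxt
                if last_new is not None:
                    last_new.link = split
                last_new = split
            remainder -= 1
            if active_node is root and active_length > 0:
                active_length -= 1
                active_edge = i - remainder + 1
            elif active_node is not root:
                active_node = active_node.link or root
    return root


def edges(node, text):
    """Return the labels of the edges leaving node, sorted."""
    return sorted(text[c.start:c.end] for c in node.children.values())


text = "abcabxabcd"
root = build_suffix_tree(text)
print(edges(root, text))  # ['ab', 'b', 'c', 'd', 'xabcd']
```

The printed labels match the five edges leaving node (0) in the final tree of the trace. The sketch appends no terminator, so for a text whose last character also occurs earlier it yields an implicit suffix tree.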

Applications of Suffix Trees

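Check whether a string Pattern occurs in a string Text
- Approach: build a suffix tree from Text, then search for Pattern from the root exactly as you would look up a key in a Trie. If Pattern occurs in Text, it must be a prefix of some suffix of Text, so the walk consumes all of Pattern without falling off the tree.
Count the occurrences of a string Pattern in a string Text
- Approach: build a suffix tree from Text+'$' and walk down to the point where Pattern ends; the number of leaves below that point is the number of occurrences. If Pattern occurs c times in Text, exactly c suffixes of Text have Pattern as a prefix.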
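
Both lookups can be sketched in a few lines. To keep the example short it builds a naive, uncompressed suffix trie (quadratic time and space) instead of a linear-time suffix tree; the search logic is the same. The function names are illustrative, and '$' is assumed not to occur in the text.

```python
def build_suffix_trie(text):
    """Naive suffix trie as nested dicts: one root-to-leaf path per suffix."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def find_node(root, pattern):
    """Walk down from the root along pattern; None if the walk falls off."""
    node = root
    for ch in pattern:
        if ch not in node:
            return None
        node = node[ch]
    return node


def count_leaves(node):
    """Number of leaves at or below node (a leaf is an empty dict)."""
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


def contains(root, pattern):
    return find_node(root, pattern) is not None


def count_occurrences(root, pattern):
    node = find_node(root, pattern)
    return 0 if node is None else count_leaves(node)


trie = build_suffix_trie("banana$")
print(contains(trie, "nan"))           # True
print(count_occurrences(trie, "ana"))  # 2
```

The terminator matters for counting: without it, the suffix "ana" would end in the middle of the path for "anana" and would not contribute a leaf of its own.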
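Find the longest repeated substring of a string Text
- Approach: build a suffix tree from Text+'$' and find the deepest non-leaf node, measuring depth by the number of characters on the path from the root rather than by the number of nodes. The string spelled out from the root to that node is the longest repeated substring.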
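
A minimal sketch of this search, again on a naive suffix trie rather than a compressed suffix tree. In the trie, the internal nodes of the suffix tree correspond to the nodes with at least two children, so the code looks for the deepest such branching node. The function name and the `terminator` parameter are assumptions made for illustration.

```python
def longest_repeated_substring(text, terminator="$"):
    """Deepest branching node of the suffix trie of text + terminator."""
    full = text + terminator  # terminator is assumed not to occur in text
    root = {}
    for start in range(len(full)):
        node = root
        for ch in full[start:]:
            node = node.setdefault(ch, {})

    best = ""
    stack = [(root, "")]  # (node, string spelled out from the root)
    while stack:
        node, prefix = stack.pop()
        # Two or more children: the prefix is followed by different
        # characters, so it occurs at least twice in the text.
        if len(node) >= 2 and len(prefix) > len(best):
            best = prefix
        for ch, child in node.items():
            stack.append((child, prefix + ch))
    return best


print(longest_repeated_substring("banana"))      # ana
print(longest_repeated_substring("abcabxabcd"))  # abc
```

When several repeated substrings share the maximum length, this sketch returns whichever one the traversal reaches first.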
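Find the longest common substring of two strings Text1 and Text2
- Approach: concatenate Text1+'#' + Text2+'$' into one string and build its suffix tree, then find the deepest non-leaf node whose subtree contains both leaves for suffixes that start in Text1 (their labels contain '#') and leaves for suffixes that start in Text2 (their labels contain only '$').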
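
The same idea as a sketch on a naive suffix trie: every node records which of the two texts the suffixes passing through it start in, and the answer is the deepest node reached from both. The function name and the node layout (`kids`, `src`) are illustrative assumptions, and neither '#' nor '$' may occur in the inputs.

```python
def longest_common_substring(text1, text2):
    """Deepest trie node reached by suffixes starting in both texts."""
    combined = text1 + "#" + text2 + "$"
    root = {"kids": {}, "src": set()}
    for start in range(len(combined)):
        # Suffixes starting at or before '#' belong to text1, the rest to text2.
        source = 0 if start <= len(text1) else 1
        node = root
        for ch in combined[start:]:
            node = node["kids"].setdefault(ch, {"kids": {}, "src": set()})
            node["src"].add(source)

    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) > len(best):
            best = prefix
        for ch, child in node["kids"].items():
            # Only descend into nodes shared by both texts. A path containing
            # '#' or '$' is never shared, because '#' occurs only once.
            if len(child["src"]) == 2:
                stack.append((child, prefix + ch))
    return best


print(longest_common_substring("banana", "ananas"))  # anana
```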
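Find the longest palindrome in a given string Text
- A palindrome is a substring that reads the same in both directions, such as "defgfed" in "abcdefgfed".
- The palindrome radius: for the palindrome "defgfed", the radius is "defg", with length 4, centered on the letter "g".
- Approach: reverse Text as a whole to form a new string Text2, for example "abcdefgfed" => "defgfedcba". Concatenate Text+'#' + Text2+'$' into one string and build its suffix tree; the problem then becomes finding the longest common substring of Text and Text2. A common substring counts only if it is itself a palindrome: in "abacdfgdcaba", for instance, "abacd" is common to the string and its reverse but is not a palindrome.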
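
A sketch of this reduction on a naive suffix trie. It walks only the nodes shared by Text and its reverse and keeps the longest path that is itself a palindrome; that extra check is an addition of this sketch, needed because a common substring of a string and its reverse is not always a palindrome. The function name and node layout are illustrative, and '#' and '$' are assumed not to occur in the text.

```python
def longest_palindrome(text):
    """Longest palindromic substring via text + '#' + reversed(text) + '$'."""
    combined = text + "#" + text[::-1] + "$"
    root = {"kids": {}, "src": set()}
    for start in range(len(combined)):
        # 0: suffix starts in the original text, 1: in the reversed copy.
        source = 0 if start <= len(text) else 1
        node = root
        for ch in combined[start:]:
            node = node["kids"].setdefault(ch, {"kids": {}, "src": set()})
            node["src"].add(source)

    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        # prefix is common to both strings; accept it only if it is symmetric.
        if len(prefix) > len(best) and prefix == prefix[::-1]:
            best = prefix
        for ch, child in node["kids"].items():
            if len(child["src"]) == 2:
                stack.append((child, prefix + ch))
    return best


print(longest_palindrome("abcdefgfed"))    # defgfed
print(longest_palindrome("abacdfgdcaba"))  # aba
```

The palindrome check at every shared node makes this slower than a position-aware search on a real suffix tree; it is meant to show the reduction, not to be efficient.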

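This article, "Suffix Tree" (《后缀树》), was published by Dennis Gao on cnblogs (博客园). Reproduction in any form without the author's consent is prohibited, and any automated or manual scraping is considered rogue behavior.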

Tags: Algorithm, Data Structures, Trie, Suffix Tree

Posted 2014-10-27 22:12 by Dennis Gao
