TRIE - Data Structure

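Introduction

A trie, also known as a word lookup tree, is a tree structure for storing large numbers of strings. Its main advantage is that it exploits the common prefixes of those strings to save storage space.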

A trie is an ordered tree data structure that uses strings as keys. Unlike a binary search tree, a trie does not store the whole key in the node; the key is determined by the node's position in the tree. All descendants of a node share the key string associated with that node as a common prefix, which is why a trie is also called a prefix tree. The word "trie" comes from "retrieval", and it is pronounced "try".

Because it is a prefix tree, a trie is commonly used in dictionaries, phone directories, and matching algorithms. It suits a phone directory (or any other prefix-matching application) well, because the cost of a lookup depends on the length of the string being matched rather than on the number of entries stored.

So I decided to implement a trie myself in C#. I created three classes:

  • Node: represents a single tree node;
  • NodeCollection: represents the children of a node;
  • Trie: the trie implementation, used to insert and search for nodes.

Implementation

Node: Node represents a basic tree node. It implements both depth-first and breadth-first algorithms for searching its children, and it holds references to its Parent node and its Children. A Node has a Key and a Value: the Key holds a single character, and the Value holds the actual value of the node. The full key of a node is obtained by appending its single character to its parent's key. Node also has a property called IsTerminal, which is set to true when the key or value represents a complete string.

See the picture below:

[Figure: TRIE - Data Structure]

NodeCollection: This is a simple collection class that implements the IEnumerable interface so it can be iterated. Internally it holds a List<Node>, and it provides the standard list methods such as Add(), Contains(), and Remove().

Trie: This class implements the trie algorithms, namely the Insert and Find methods.
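The article describes Node and NodeCollection but does not list them, so here is a minimal sketch of what they might look like. The member names (Key, Value, IsTerminal, Parent, Children, GetNodeByKey) follow the descriptions above and the calls made by the sample code; the bodies, the constructor wiring, and the Count property are assumptions and may differ from the attached source. The depth-first and breadth-first search methods are left out.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch: names follow the article, bodies are assumptions.
public class Node
{
    public string Key { get; }            // the single character held by this node
    public string Value { get; }          // the string spelled from the root to this node
    public bool IsTerminal { get; set; }  // true when Value is a complete inserted string
    public Node Parent { get; set; }      // null for the root
    public NodeCollection Children { get; }

    public Node(string key, string value)
    {
        Key = key;
        Value = value;
        Children = new NodeCollection(this);
    }
}

public class NodeCollection : IEnumerable<Node>
{
    private readonly List<Node> nodes = new List<Node>();
    private readonly Node owner;

    public NodeCollection(Node owner)
    {
        this.owner = owner;
    }

    public int Count => nodes.Count;

    // Adds a child and records the owning node as its parent.
    public void Add(Node node)
    {
        node.Parent = owner;
        nodes.Add(node);
    }

    public bool Contains(Node node) => nodes.Contains(node);

    public bool Remove(Node node) => nodes.Remove(node);

    // Returns the child whose Key equals the given key, or null if there is none.
    public Node GetNodeByKey(string key) => nodes.Find(n => n.Key == key);

    public IEnumerator<Node> GetEnumerator() => nodes.GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Under these assumptions a root can be created as new Node("", ""), and the members used by the InsertNode and Find methods in this article (the constructor, IsTerminal, Children.Add, and Children.GetNodeByKey) are all available. Because the children live in a plain list, GetNodeByKey scans them linearly; that is cheap while each node has only a handful of children.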

Sample code:

// Inserts a name into the trie data structure
public static Node InsertNode(string name, Node root)
{
    // Reject null or empty names
    if (string.IsNullOrEmpty(name))
        throw new ArgumentNullException(nameof(name));

    // 1-based position of the character being inserted
    int index = 1;

    // start with the root node
    Node currentNode = root;

    // loop over all characters in the name
    while (index <= name.Length)
    {
        // get the key character
        string key = name[index - 1].ToString();

        // does a node with the same key already exist?
        Node resultNode = currentNode.Children.GetNodeByKey(key);

        if (resultNode == null)
        {
            // No, this is a new key: add a node whose value is the prefix so far
            Node newNode = new Node(key, name.Substring(0, index));

            // add the node to currentNode (i.e. the root node the first time)
            currentNode.Children.Add(newNode);

            // set as the current node
            currentNode = newNode;
        }
        else
        {
            // node already exists, set it as the current node
            currentNode = resultNode;
        }

        // If this is the last character, the node represents a complete name.
        // Checked for existing nodes too, so that inserting "Jo" after "John"
        // still marks the "o" node as terminal.
        if (index == name.Length)
            currentNode.IsTerminal = true;

        // move to the next character in the name
        index++;
    }

    // all done, return root node
    return root;
}

The Insert method inserts the string one character at a time. It starts with the first character: if no child of the root node holds that character, it adds a new node for it and continues from the new node; otherwise it continues from the existing node. It repeats this for the remaining characters until the entire string has been added. When it reaches the last character, it marks that node as a terminal node, because that node represents a complete string in the tree hierarchy.

The Find method is implemented as a recursive, depth-first descent: at each level it matches one character of the key against the current node's children, and it stops when the whole key has been matched or a character is missing. Below is the code.


// Find a node given the key ("Jo")
public static bool Find(Node node, string key)
{
    // Key exhausted: every character has been matched
    if (string.IsNullOrEmpty(key))
        return true;

    // get the first character
    string first = key.Substring(0, 1);

    // get the tail: key minus the first character
    string tail = key.Substring(1);

    Node curNode = node.Children.GetNodeByKey(first);

    // recurse until the whole key, e.g. "Jo", has been located
    if (curNode != null)
    {
        return Find(curNode, tail);
    }
    else
    {
        // not found, return false
        return false;
    }
}
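Find returns true as soon as the key is exhausted, so it also succeeds for a key such as "Jo" when only "John" was inserted. That is what a directory look-up wants, but an exact-match test would also need to check IsTerminal. The sketch below contrasts the two. To stay self-contained it uses a compact stand-in node, SketchNode, with a dictionary of children; that type and the TrieSearch helper are hypothetical names introduced here and are not part of the article's library.

```csharp
using System.Collections.Generic;

// Compact stand-in for the article's Node class (an assumption made so the
// sketch is self-contained): children are keyed by character.
public class SketchNode
{
    public Dictionary<char, SketchNode> Children { get; } = new Dictionary<char, SketchNode>();
    public bool IsTerminal { get; set; }
}

public static class TrieSearch
{
    // Adds a name one character at a time and marks the last node as terminal.
    public static void Insert(SketchNode root, string name)
    {
        SketchNode current = root;
        foreach (char c in name)
        {
            if (!current.Children.TryGetValue(c, out SketchNode next))
            {
                next = new SketchNode();
                current.Children.Add(c, next);
            }
            current = next;
        }
        current.IsTerminal = true;
    }

    // Follows the key character by character; returns null if the path breaks.
    private static SketchNode Walk(SketchNode root, string key)
    {
        SketchNode current = root;
        foreach (char c in key)
        {
            if (!current.Children.TryGetValue(c, out SketchNode next))
                return null;
            current = next;
        }
        return current;
    }

    // True when at least one stored name starts with the key
    // (the same answer the recursive Find above gives).
    public static bool ContainsPrefix(SketchNode root, string key)
    {
        return Walk(root, key) != null;
    }

    // True only when the key itself was inserted as a complete name.
    public static bool ContainsExact(SketchNode root, string key)
    {
        SketchNode node = Walk(root, key);
        return node != null && node.IsTerminal;
    }
}
```

For example, after TrieSearch.Insert(root, "John"), ContainsPrefix(root, "Jo") is true while ContainsExact(root, "Jo") is false. The loop form is used here instead of recursion only to avoid creating a new substring for every character; the result is the same.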

 

 

I've attached the entire source code above. It contains the Trie class library and a console application for testing it. The console application loads a set of names (stored in names.txt in the debug folder) into the tree and offers options to run the depth-first and breadth-first algorithms, as well as Directory Look-Up and Find options.
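The attached console application is not reproduced here, so the following is only a sketch of how a directory look-up could list every stored name under a prefix, once depth first and once breadth first. LookupNode and DirectoryLookup are hypothetical names; the sorted dictionary of children is an assumption chosen to make the output order predictable, and the attached library may do this differently.

```csharp
using System.Collections.Generic;

// Compact stand-in node (an assumption): children are kept sorted by character.
public class LookupNode
{
    public SortedDictionary<char, LookupNode> Children { get; } = new SortedDictionary<char, LookupNode>();
    public bool IsTerminal { get; set; }
}

public static class DirectoryLookup
{
    // Adds a name one character at a time and marks the last node as terminal.
    public static void Insert(LookupNode root, string name)
    {
        LookupNode current = root;
        foreach (char c in name)
        {
            if (!current.Children.TryGetValue(c, out LookupNode next))
            {
                next = new LookupNode();
                current.Children.Add(c, next);
            }
            current = next;
        }
        current.IsTerminal = true;
    }

    // Returns the node reached by following the prefix, or null if it is absent.
    private static LookupNode Walk(LookupNode root, string prefix)
    {
        LookupNode current = root;
        foreach (char c in prefix)
        {
            if (!current.Children.TryGetValue(c, out LookupNode next))
                return null;
            current = next;
        }
        return current;
    }

    // Depth first: finish one branch completely before moving to the next.
    public static List<string> DepthFirst(LookupNode root, string prefix)
    {
        var results = new List<string>();
        LookupNode start = Walk(root, prefix);
        if (start != null)
            Collect(start, prefix, results);
        return results;
    }

    private static void Collect(LookupNode node, string text, List<string> results)
    {
        if (node.IsTerminal)
            results.Add(text);
        foreach (KeyValuePair<char, LookupNode> child in node.Children)
            Collect(child.Value, text + child.Key, results);
    }

    // Breadth first: visit all nodes at one depth before going deeper,
    // so shorter names are reported before longer ones.
    public static List<string> BreadthFirst(LookupNode root, string prefix)
    {
        var results = new List<string>();
        LookupNode start = Walk(root, prefix);
        if (start == null)
            return results;

        var queue = new Queue<KeyValuePair<string, LookupNode>>();
        queue.Enqueue(new KeyValuePair<string, LookupNode>(prefix, start));
        while (queue.Count > 0)
        {
            KeyValuePair<string, LookupNode> item = queue.Dequeue();
            if (item.Value.IsTerminal)
                results.Add(item.Key);
            foreach (KeyValuePair<char, LookupNode> child in item.Value.Children)
                queue.Enqueue(new KeyValuePair<string, LookupNode>(item.Key + child.Key, child.Value));
        }
        return results;
    }
}
```

With the names Jo, John, Joan, Joe, Joanna, and Amy inserted, a look-up for "Jo" yields Jo, Joan, Joanna, Joe, John depth first and Jo, Joe, Joan, John, Joanna breadth first; Amy is never visited because the search starts at the node reached by the prefix.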

The class library could be developed further into a web-based phone directory. Because the data set is small, it could also be stored on the client, with the trie implemented in JavaScript.

Happy Coding!

Original post: http://www.codeproject.com/KB/recipes/PhoneDirectory.aspx

 
