Python SuffixTree: a suffix tree AutoComplete algorithm with Chinese support

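The Python board on JavaEye has been far too quiet lately, so here is a fun open-source program for everyone to play with. The code implements a suffix tree, a structure commonly used for autocomplete. If you have not come across it yet, take a look :)
Source: https://github.com/edisonlz/suffixTree_ch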



o SuffixTree.SuffixTree -- The suffix tree structure. This is a
thin wrapper around strmat's stree data structure. This isn't a
complete wrapper yet; I need to find some time to complete this.
The wrapper appears to be good enough for simple stuff.

Methods of SuffixTree:

o SuffixTree(alphabet=STREE_ASCII)

Construct a new SuffixTree. By default, the alphabet
used by the SuffixTree is ASCII. Other choices include
STREE_DNA, STREE_RNA, and STREE_PROTEIN.

o add(string, id)

Adds a string to the suffix tree with an id.

o root()

Returns the root SuffixNode of the tree.

o num_nodes()

Returns the total number of nodes held in the tree.

o match(string)

Given a string, traverse the suffix tree and return a
3-tuple (match_length, suffix_node, endpos)
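
To make the add/match behaviour described above more concrete, here is a minimal pure-Python sketch. It is not the strmat-backed wrapper: `NaiveSuffixTrie`, `Node`, and the `string_id` parameter are hypothetical names chosen for illustration, and the trie is uncompressed, so it needs quadratic space. Its `match` returns only a 2-tuple (match_length, node), because the exact meaning of the wrapper's endpos is not documented here.

```python
class Node:
    """One trie node (hypothetical helper, not SuffixTree.SuffixNode)."""

    def __init__(self):
        self.children = {}  # char -> Node
        self.ids = set()    # ids of strings with a suffix passing through this node


class NaiveSuffixTrie:
    """Uncompressed suffix trie: every suffix is inserted character by character."""

    def __init__(self):
        self._root = Node()

    def root(self):
        return self._root

    def add(self, string, string_id):
        # Insert each suffix string[start:] starting from the root.
        for start in range(len(string)):
            node = self._root
            for ch in string[start:]:
                node = node.children.setdefault(ch, Node())
                node.ids.add(string_id)

    def match(self, string):
        """Follow `string` from the root as far as possible.

        Returns (match_length, node_reached).
        """
        node = self._root
        length = 0
        for ch in string:
            child = node.children.get(ch)
            if child is None:
                break
            node = child
            length += 1
        return length, node


trie = NaiveSuffixTrie()
trie.add("banana", 1)
trie.add("bandana", 2)
length, node = trie.match("anax")
print(length, sorted(node.ids))  # 3 [1, 2]
```

A real suffix tree compresses chains of single-child nodes into labelled edges, which is what keeps the node count linear in the input length.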


o SuffixTree.SuffixNode (I need to fix the documentation here)

Methods of SuffixNode:
num_children()
find_child(char ch)
children()
next()
parent()
suffix_link()
edgelen()
edgestr()
getch()
labellen()
labelstr()
ident()
num_leaves()
leaf(int leafnum)


o SuffixTree.SubstringDict -- An application of suffix trees toward
substring matching. An example might help:

>>> #coding=utf-8
>>> from SuffixTree import SubstringDict


>>> sd = SubstringDict()
>>> sd.__setitem__("我是python程序员",1)
>>> sd.__setitem__("我是ruby程序员",2)
>>> sd.__setitem__("我是javascript程序员",3)
>>> sd.__setitem__("我是android程序员",4)
>>> sd.__setitem__("我还是DBA",5)
>>> print sd["我是"]
>>> print sd["我还是"]


>>> sd = SubstringDict()
>>> sd["我是python程序员"] = 1
>>> sd["我是ruby程序员"] = 2
>>> sd["我是javascript程序员"] = 3
>>> sd["我是android程序员"] = 4
>>> sd["我还是DBA"] = 5
>>> print sd["我还是"]


SubstringDict provides a mapping that allows for substrings of
keys. The keys do need to be strings though.
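
The mapping idea can be imitated without the C extension. The sketch below is a hypothetical stand-in, not the library's implementation: `NaiveSubstringDict` simply indexes every distinct substring of each key in a plain dict, and it assumes that a lookup returns the values of all keys containing the query. It is written for Python 3, where strings are Unicode, so Chinese keys need no extra encoding.

```python
class NaiveSubstringDict:
    """Hypothetical stand-in: maps every substring of a key to that key's value."""

    def __init__(self):
        self._index = {}  # substring -> list of values, in insertion order

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        # Collect distinct substrings first so a repeated substring
        # records the value only once per key.
        substrings = {
            key[i:j]
            for i in range(len(key))
            for j in range(i + 1, len(key) + 1)
        }
        for sub in substrings:
            self._index.setdefault(sub, []).append(value)

    def __getitem__(self, query):
        # Unknown substrings yield an empty list instead of raising KeyError.
        return list(self._index.get(query, []))


sd = NaiveSubstringDict()
sd["我是python程序员"] = 1
sd["我是ruby程序员"] = 2
sd["我还是DBA"] = 5
print(sd["我是"])    # [1, 2]
print(sd["还是"])    # [5]
print(sd["程序员"])  # [1, 2]
```

Storing every substring costs quadratic space per key; a suffix tree gives the same lookups in space linear in the key length, which is the reason to use it.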

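Chinese is supported by base64-encoding the strings. This increases the data size by about a third (base64 emits 4 characters for every 3 bytes) and costs some performance, but the overhead is small.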
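
The size overhead is easy to check. The following is a small sketch using only the standard `base64` module; `encode_key` and `decode_key` are hypothetical helper names, and the sketch shows only the round trip and the size increase, not how the linked project wires the encoding into the tree.

```python
import base64


def encode_key(text):
    """Encode a Unicode string as ASCII-only base64 text (via its UTF-8 bytes)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_key(encoded):
    """Reverse encode_key: base64 text back to the original Unicode string."""
    return base64.b64decode(encoded).decode("utf-8")


key = "我是python程序员"
raw = key.encode("utf-8")
encoded = encode_key(key)
print(len(raw), len(encoded))  # 21 28
print(decode_key(encoded) == key)  # True
```

Each Chinese character here takes 3 bytes in UTF-8, so the 21-byte key becomes 28 base64 characters, a ratio of 4/3.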

64-bit installation:
ARCHFLAGS="-arch i386 -arch x86_64" python setup.py install
