搜索常用到的“Do you mean"功能的算法(收藏)

文章来自:build-your-own-finite-state-transducer

Have you always wanted your very own Lucene finite state transducer (FST) but you couldn't figure out how to use Lucene's crazy APIs?

Then today is your lucky day! I just built a simple web application that creates an FST from the input/output strings that you enter.

If you just want a finite state automaton (no outputs) then enter only inputs, such as this example:



If all of your outputs are non-negative integers then the FST will use numeric outputs, where you sum up the outputs as you traverse a path to get the final output:

Finally, if the outputs are non-numeric then they are treated as strings, in which case you concatenate as you traverse the path:

The red arcs are the ones with the NEXT optimization: these arcs do not store a pointer to a node because their to-node is the very next node in the FST. This is a good optimization: it generally results in a large reduction of the FST size. The bolded arcs tell you the next node is final; this is most interesting when a prefix of another input is accepted, such as this example:



Here the "r" arc is bolded, telling you that "star" is accepted. Furthermore, that node following the "r" arc has a final output, telling you the overall output for "star" is "abc".

The web app is a simple Python WSGI app; source code is here. It invokes a simple Java tool as a subprocess; source code (including generics violations!) is here.

你可能感兴趣的:(Lucene,FST,whoosh)