How to Use a Trie to Design and Build Google-Style Input Suggestions


Source | 搜索技术
Editor | 小白

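Both Google and Baidu offer input suggestions, which help you enter what you are looking for quickly and accurately.

For example, typing “五一” (May Day) brings up suggestions such as “五一劳动节” (May Day Labor Day).

So how can we implement input suggestions like Google's?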

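Analyzing the requirements for input suggestions

When the user types a leading term A, we want to suggest all highly relevant terms that have A as a prefix. This is prefix matching. A trie, also known as a prefix tree, is an ordered search tree, which makes it a good fit for implementing input suggestions.

Below we use Python 3 and a trie to build an input suggestion service.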

# Python3 program to demonstrate auto-complete
# feature using Trie data structure.
# Note: This is a basic implementation of Trie
# and not the most optimized one.

class TrieNode():
    def __init__(self):
        # Initialising one node for trie
        self.children = {}
        self.last = False

class Trie():
    def __init__(self):
        # Initialising the trie structure.
        self.root = TrieNode()
        self.word_list = []

    def formTrie(self, keys):
        # Forms a trie structure with the given set of strings
        # if it does not exist already, else it merges the key
        # into it by extending the structure as required
        for key in keys:
            self.insert(key)  # inserting one key to the trie.

    def insert(self, key):
        # Inserts a key into trie if it does not exist already.
        # And if the key is a prefix of the trie node, just
        # marks it as leaf node.
        node = self.root

        for a in key:
            if not node.children.get(a):
                node.children[a] = TrieNode()

            node = node.children[a]

        node.last = True

    def search(self, key):
        # Searches the given key in trie for a full match
        # and returns True on success else returns False.
        node = self.root

        for a in key:
            if not node.children.get(a):
                return False

            node = node.children[a]

        return node.last

    def suggestionsRec(self, node, word):
        # Method to recursively traverse the trie
        # and collect every whole word below the given node.
        if node.last:
            self.word_list.append(word)

        for a, n in node.children.items():
            self.suggestionsRec(n, word + a)

    def printAutoSuggestions(self, key):
        # Prints all the words in the trie whose common
        # prefix is the given key, thus listing out all
        # the suggestions for autocomplete.
        # Returns 0 if the prefix is absent, -1 if the prefix
        # has no longer completions, and 1 otherwise.
        node = self.root
        temp_word = ''

        for a in key:
            if not node.children.get(a):
                return 0

            temp_word += a
            node = node.children[a]

        if node.last and not node.children:
            return -1

        # Reset the results so repeated calls do not accumulate.
        self.word_list = []
        self.suggestionsRec(node, temp_word)

        for s in self.word_list:
            print(s)
        return 1

# Driver Code
keys = ["五一", "五一劳动节", "五一放假安排", "五一劳动节图片",
        "五一劳动节图片 2020", "五一劳动节快乐", "五一晚会", "五一假期",
        "五一快乐", "五一节快乐", "五花肉",
        "五行", "五行相生"]  # keys to form the trie structure.
key = "五一"  # key for autocomplete suggestions.

# creating trie object
t = Trie()

# creating the trie structure with the
# given set of strings.
t.formTrie(keys)

# autocompleting the given key using
# our trie structure.
comp = t.printAutoSuggestions(key)

if comp == -1:
    print("No other strings found with this prefix\n")
elif comp == 0:
    print("No string found with this prefix\n")

# This code is contributed by amurdia

Input: 五一. The suggestions are as follows:

[Image: program output for the input “五一”]

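All the results are there, but the order of our suggestions differs somewhat from Google's. What can we do about that?

The data source for input suggestions is typically the log of queries that users have entered, with a count kept for each query so that suggestions can be ranked by popularity.

Now let's add counts to the log vocabulary to simulate Google's behavior.

Sample queries from the log, with their counts: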

五一劳动节 10
五一劳动节图片 9
五一假期 8
五一劳动节快乐 7
五一放假安排 6
五一晚会 5
五一 4
五一快乐 3
五一劳动节图片 2020 2
五一节快乐 1
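Counts like these would typically be aggregated from raw query logs. The sketch below shows one minimal way to do that, assuming a hypothetical log format with one query per entry. `count_queries` and `raw_log` are illustrative names that do not appear in the original code, and the sample log is made up for the example. The result is emitted in the `query,count` key form that the adjusted trie code expects.

```python
from collections import Counter

def count_queries(log_lines):
    """Count how often each non-empty query appears in the log."""
    counts = Counter()
    for line in log_lines:
        query = line.strip()
        if query:
            counts[query] += 1
    return counts

# Hypothetical raw log: one user query per entry.
raw_log = ["五一劳动节", "五一假期", "五一劳动节", "五一", "五一劳动节", "五一假期"]

counts = count_queries(raw_log)

# Build "query,count" keys, most frequent query first.
keys = [f"{query},{n}" for query, n in counts.most_common()]
print(keys)
```

In a real service the counts would likely be recomputed periodically so that suggestions follow current query popularity.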

Adjust the suggestion code so that it supports query counts:

# Python3 program to demonstrate auto-complete
# feature using Trie data structure.
# Note: This is a basic implementation of Trie
# and not the most optimized one.
import operator

class TrieNode():
    def __init__(self):
        # Initialising one node for trie
        self.children = {}
        self.last = False

class Trie():
    def __init__(self):
        # Initialising the trie structure.
        self.root = TrieNode()
        # Maps each suggested query to its count.
        self.word_list = {}

    def formTrie(self, keys):
        # Forms a trie structure with the given set of strings
        # if it does not exist already, else it merges the key
        # into it by extending the structure as required
        for key in keys:
            self.insert(key)  # inserting one key to the trie.

    def insert(self, key):
        # Inserts a key into trie if it does not exist already.
        # And if the key is a prefix of the trie node, just
        # marks it as leaf node.
        node = self.root

        for a in key:
            if not node.children.get(a):
                node.children[a] = TrieNode()

            node = node.children[a]

        node.last = True

    def search(self, key):
        # Searches the given key in trie for a full match
        # and returns True on success else returns False.
        node = self.root

        for a in key:
            if not node.children.get(a):
                return False

            node = node.children[a]

        return node.last

    def suggestionsRec(self, node, word):
        # Method to recursively traverse the trie
        # and collect every whole word below the given node.
        if node.last:
            # Each key has the form "query,count"; split it into the
            # query text and its count (0 when no count is given).
            ll = word.split(',')
            if len(ll) >= 2:
                self.word_list[ll[0]] = int(ll[1])
            else:
                self.word_list[ll[0]] = 0

        for a, n in node.children.items():
            self.suggestionsRec(n, word + a)

    def printAutoSuggestions(self, key):
        # Prints all the words in the trie whose common
        # prefix is the given key, ordered by count (highest first).
        # Returns 0 if the prefix is absent, -1 if the prefix
        # has no longer completions, and 1 otherwise.
        node = self.root
        temp_word = ''

        for a in key:
            if not node.children.get(a):
                return 0

            temp_word += a
            node = node.children[a]

        if node.last and not node.children:
            return -1

        # Reset the results so repeated calls do not accumulate.
        self.word_list = {}
        self.suggestionsRec(node, temp_word)

        # sort by count, descending
        sorted_d = dict(sorted(self.word_list.items(),
                               key=operator.itemgetter(1), reverse=True))
        for s in sorted_d.keys():
            print(s)
        return 1

# Driver Code
keys = ["五一,4", "五一劳动节,10", "五一放假安排,6", "五一劳动节图片,9",
        "五一劳动节图片 2020,2", "五一劳动节快乐,7", "五一晚会,5", "五一假期,8",
        "五一快乐,3", "五一节快乐,1", "五花肉,0",
        "五行,0", "五行相生,0"]  # keys to form the trie structure.
key = "五一"  # key for autocomplete suggestions.

# creating trie object
t = Trie()

# creating the trie structure with the
# given set of strings.
t.formTrie(keys)

# autocompleting the given key using
# our trie structure.
comp = t.printAutoSuggestions(key)

if comp == -1:
    print("No other strings found with this prefix\n")
elif comp == 0:
    print("No string found with this prefix\n")

# This code is contributed by amurdia

The output order now matches Google's exactly:

[Image: program output for the input “五一”, ordered by count]
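Encoding the count inside the key string (for example `"五一,4"`) works for this demo, but it would misparse any query that itself contains a comma. One possible alternative, sketched here as an assumption rather than as part of the original code, stores the count on the node where a query ends and returns only the top `k` suggestions. `Node`, `CountingTrie`, `suggest`, and the `count` field are hypothetical names introduced for illustration.

```python
import heapq

class Node:
    def __init__(self):
        self.children = {}
        self.count = None  # None means no query ends at this node

class CountingTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, word, count):
        """Insert a query and record its count on the final node."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.count = count

    def suggest(self, prefix, k=10):
        """Return up to k (word, count) pairs under prefix, highest count first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # iterative depth-first walk of the subtree
            cur, word = stack.pop()
            if cur.count is not None:
                found.append((word, cur.count))
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return heapq.nlargest(k, found, key=lambda item: item[1])

trie = CountingTrie()
for word, count in [("五一", 4), ("五一劳动节", 10), ("五一假期", 8),
                    ("五一晚会", 5), ("五花肉", 0)]:
    trie.insert(word, count)

top = trie.suggest("五一", k=3)
print(top)
```

Keeping the count as a number on the node avoids parsing strings during traversal, and limiting the result to `k` entries mirrors how a suggestion box only shows a handful of completions.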

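Summary:

The above uses a trie to implement Google-style input suggestions. Are there other approaches besides a trie? Is there another kind of search index that handles input suggestions well?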
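As one possible answer to that question, and purely as an assumption rather than the approach the author has in mind, a sorted list with binary search can also serve prefix lookups: in sorted order, all strings that start with a given prefix sit in one contiguous run. The sketch below uses Python's standard `bisect` module; `prefix_range` is a hypothetical helper name.

```python
from bisect import bisect_left

def prefix_range(sorted_words, prefix):
    """Return all words in sorted_words that start with prefix."""
    # Binary search for the first word that is >= prefix.
    start = bisect_left(sorted_words, prefix)
    # Matching words are contiguous, so scan forward until one does not match.
    end = start
    while end < len(sorted_words) and sorted_words[end].startswith(prefix):
        end += 1
    return sorted_words[start:end]

words = sorted(["五一", "五一劳动节", "五一假期", "五花肉", "五行"])
matches = prefix_range(words, "五一")
print(matches)
```

This only finds the candidates; ranking them by query count would still have to happen afterwards, as in the trie version.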
