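Understand proximity search in search systems and implement the search algorithm that proximity search runs over two postings lists.
Gain a thorough understanding of the two-postings-list algorithm for proximity search and implement it in Python. Once the user has entered the query terms at the prompts, the program runs the algorithm on the postings lists of those terms.
The system reads a preset document and lists all searchable terms. The user enters the query terms at the prompts. The system computes, for every term, the documents it occurs in and its positional postings list, then runs the proximity intersection of the two postings lists and prints the result.
The program is split into two functional modules: a prompt-and-input module, and a module implementing the proximity intersection of two postings lists.
createdict is a helper function that builds the document dictionary. It uses Python's re module to tokenize the preset text, treating each line as one document. It returns a dictionary whose keys are the terms (shown to the user as the available choices) and whose values are the postings lists, each mapping a document ID to the positions of the term in that document.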
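import re

def createdict(f0):
    # Each line of the input text is one document; document IDs start at 1.
    dict0 = {}
    for doc_id, line in enumerate(f0.split('\n'), start=1):
        # Split on the delimiter set and discard empty tokens, so that
        # consecutive delimiters (e.g. ", ") neither create an empty term
        # nor shift the positions of the words that follow.
        d0 = [t for t in re.split('[ \n?!,.;]', line) if t]
        for pos, word in enumerate(d0, start=1):
            # dict0[word] is the postings list: {doc_id: [positions]}.
            dict0.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return dict0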
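To make the shape of the returned structure concrete, here is a minimal sketch that builds the same kind of positional index for a two-line toy text. The function name build_positional_index and the toy text are illustrative assumptions, not part of the original program. Tokens are split on the same delimiter set as in createdict.

```python
import re

def build_positional_index(text):
    """Hypothetical helper: map word -> {doc_id: [positions]}, both 1-based."""
    index = {}
    for doc_id, line in enumerate(text.split('\n'), start=1):
        # Same delimiters as createdict; empty tokens are discarded.
        tokens = [t for t in re.split('[ \n?!,.;]', line) if t]
        for pos, word in enumerate(tokens, start=1):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index

toy = "to be or not to be\nbe happy"   # two lines = two documents
index = build_positional_index(toy)
print(index["be"])   # {1: [2, 6], 2: [1]}
print(index["to"])   # {1: [1, 5]}
```

Each term maps to a dictionary keyed by document ID, and document IDs are inserted in ascending order, which is what the merge step of the intersection relies on.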
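PositionalIntersect implements the proximity intersection of two postings lists. It first collects the document IDs of both input postings lists and walks through them in parallel. For every document ID that appears in both, it takes the two position lists and scans them, checking whether the distance between a position of term 1 and a position of term 2 is at most the preset distance k. Each match is stored as [document ID, position of term 1, position of term 2]; otherwise the scan continues. The function finally returns the list of all matches.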
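def PositionalIntersect(p1, p2, k):
    # p1, p2: {doc_id: [positions]}, with doc IDs and positions ascending.
    # Returns a list of [doc_id, pos1, pos2] with |pos1 - pos2| <= k.
    r = []
    k1, k2 = list(p1), list(p2)  # document IDs of each postings list
    i, j = 0, 0
    while i < len(k1) and j < len(k2):
        if k1[i] == k2[j]:
            l = []  # window of term-2 positions near the current term-1 position
            pp1, pp2 = p1[k1[i]], p2[k2[j]]
            i1, j1 = 0, 0
            while i1 < len(pp1):
                # Extend the window with term-2 positions within distance k.
                while j1 < len(pp2):
                    if abs(pp1[i1] - pp2[j1]) <= k:
                        l.append(pp2[j1])
                    elif pp2[j1] > pp1[i1]:
                        break
                    j1 = j1 + 1
                # Drop window entries that are now too far behind.
                while l != [] and abs(l[0] - pp1[i1]) > k:
                    del l[0]
                for n in range(len(l)):
                    r.append([k1[i], pp1[i1], l[n]])
                i1 = i1 + 1
            i = i + 1
            j = j + 1
        elif k1[i] > k2[j]:
            j = j + 1
        else:
            i = i + 1
    return r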
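The inner loops above handle one document that contains both terms. As a cross-check on that step, the following minimal sketch finds the same position pairs with binary search instead of a sliding window. This is a swapped-in technique, not the algorithm used in this report, and the name positions_within_k is hypothetical. The two lists are the postings for document 4 in the debugging data of this report, and both are assumed to be sorted in ascending order.

```python
from bisect import bisect_left, bisect_right

def positions_within_k(pp1, pp2, k):
    """Hypothetical helper: pairs (a, b), a in pp1, b in pp2, with |a - b| <= k.

    Both lists are assumed to be sorted in ascending order.
    """
    pairs = []
    for a in pp1:
        lo = bisect_left(pp2, a - k)    # index of the first b >= a - k
        hi = bisect_right(pp2, a + k)   # one past the last b <= a + k
        pairs.extend((a, b) for b in pp2[lo:hi])
    return pairs

pp1 = [8, 16, 190, 429, 433]
pp2 = [17, 191, 291, 430, 434]
print(positions_within_k(pp1, pp2, 1))
# [(16, 17), (190, 191), (429, 430), (433, 434)]
print(positions_within_k(pp1, pp2, 5))
# [(16, 17), (190, 191), (429, 430), (429, 434), (433, 430), (433, 434)]
```

Widening k from 1 to 5 adds the pairs (429, 434) and (433, 430), which shows that the order of the two terms does not matter, only their distance.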
The p1 and p2 defined below are postings lists used for debugging.
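import re

# Postings lists for debugging; the user's input below overwrites them.
p1 = {1: [7, 18, 33, 72, 86, 231], 2: [1, 17, 74, 222, 255], 4: [8, 16, 190, 429, 433], 5: [363, 367], 7: [13, 23, 191]}
p2 = {1: [17, 25], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]}

# Read the preset document and build the positional index.
with open("document.txt", "r") as f:
    f0 = f.read()
dict0 = createdict(f0)
terms = list(dict0)
print("Searchable terms:", terms, "\n")
print("Enter the first term for the proximity search: ", end='')
p1 = dict0[input()]
print("Enter the second term for the proximity search: ", end='')
p2 = dict0[input()]
# The last argument is the proximity distance k.
print("Proximity search result:\n", PositionalIntersect(p1, p2, 1))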
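The script indexes dict0 directly with whatever the user types, so a term that is not in the dictionary raises a KeyError. One possible way to tolerate this is sketched below. The helper name lookup_postings and the toy index are assumptions made for illustration only. Returning an empty dictionary is convenient because PositionalIntersect then skips its outer loop and returns an empty list.

```python
def lookup_postings(index, term):
    """Hypothetical helper: postings for `term`, or {} if the term is unknown."""
    # strip() removes accidental spaces around the typed term.
    return index.get(term.strip(), {})

# Toy index for illustration; it is not output of the real program.
index = {"apple": {1: [2]}, "pie": {1: [3]}}
print(lookup_postings(index, " apple "))   # {1: [2]}
print(lookup_postings(index, "banana"))    # {}
```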
The sample document.txt is shown below; any English text should work just as well.
There are moments in life when you miss someone so much that you just want to pick them from your dreams and hug them for real! Dream what you want to dream;go where you want to go;be what you want to be,because you have only one life and one chance to do all the things you want to do.
May you have enough happiness to make you sweet,enough trials to make you strong,enough sorrow to keep you human,enough hope to make you happy? Always put yourself in others’shoes.If you feel that it hurts you,it probably hurts the other person, too.
The happiest of people don’t necessarily have the best of everything;they just make the most of everything that comes along their way.Happiness lies for those who cry,those who hurt, those who have searched,and those who have tried,for only they can appreciate the importance of people
Who have touched their lives.Love begins with a smile,grows with a kiss and ends with a tear.The brightest future will always be based on a forgotten past, you can’t go on well in life until you let go of your past failures and heartaches.
When you were born,you were crying and everyone around you was smiling.Live your life so that when you die,you’re the one who is smiling and everyone around you is crying.
Please send this message to those people who mean something to you,to those who have touched your life in one way or another,to those who make you smile when you really need it,to those that make you see the brighter side of things when you are really down,to those who you want to let them know that you appreciate their friendship.And if you don’t, don’t worry,nothing bad will happen to you,you will just miss out on the opportunity to brighten someone’s day with this message.
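Enter the two terms at the prompts. With the proximity distance, that is, the k argument of PositionalIntersect(p1, p2, k), set to 1, the result is as shown in the figure below.
The three numbers in a search result mean, in order: document 6, send at position 2 of that document, and this at position 3 of that document. Setting the distance to 5 and running the query again gives the result shown in the figure below.
During debugging, with the preset p1 = {1: [7, 18, 33, 72, 86, 231], 2: [1, 17, 74, 222, 255], 4: [8, 16, 190, 429, 433], 5: [363, 367], 7: [13, 23, 191]} and p2 = {1: [17, 25], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]}, running the program gives the result shown in the figure below.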
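As an independent check on the debugging data, the sketch below enumerates every position pair by brute force rather than with the merge-based algorithm. The function name proximity_bruteforce is hypothetical, and k = 1 is an assumption taken from the call in the script, since the text does not state which distance was used for the debugging run.

```python
def proximity_bruteforce(p1, p2, k):
    """Hypothetical reference: all [doc_id, pos1, pos2] with |pos1 - pos2| <= k."""
    result = []
    # Only documents present in both postings lists can contain a match.
    for doc_id in sorted(set(p1) & set(p2)):
        for a in p1[doc_id]:
            for b in p2[doc_id]:
                if abs(a - b) <= k:
                    result.append([doc_id, a, b])
    return result

p1 = {1: [7, 18, 33, 72, 86, 231], 2: [1, 17, 74, 222, 255],
      4: [8, 16, 190, 429, 433], 5: [363, 367], 7: [13, 23, 191]}
p2 = {1: [17, 25], 4: [17, 191, 291, 430, 434], 5: [14, 19, 101]}

result = proximity_bruteforce(p1, p2, 1)   # k = 1 is assumed
print(result)
# [[1, 18, 17], [4, 16, 17], [4, 190, 191], [4, 429, 430], [4, 433, 434]]
```

Documents 2 and 7 occur only in p1 and are skipped, and document 5 occurs in both lists but has no pair of positions within distance 1.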