Today I wanted to use Google's Spelling-Suggestion feature, i.e. the "Did you mean *" prompt (shown as "您是不是要找:*" in the Chinese interface).
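After a round of searching online, I found that most of the methods described there rely on a service Google has since shut down: the "/tbproxy/spell?lang=en" endpoint.
Reference 1 shows this approach; Google no longer supports it.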
I then found the approach described in reference 2: using the Google Custom Search API for spelling checks.
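After reading its documentation carefully (see reference 3), I finally got it working in the browser. For example, querying "丁军辉" returns the suggestion "丁俊晖", as shown in reference 4.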
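I then implemented it in Python and found that the Chinese text in the response comes back as character references beginning with &#x, which have to be converted back into Chinese characters. Following reference 5, this worked.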
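To illustrate the decoding step in isolation: assuming the response uses hexadecimal references of the form &#x4e01; (the raw response bytes are not reproduced in this post), Python 3's standard html.unescape turns them back into characters; it is the Python 3 counterpart of the HTMLParser.HTMLParser().unescape call used in the original script. The helper to_char_refs is hypothetical and only exists to produce sample input in that format.

```python
import html


def to_char_refs(text):
    # Hypothetical helper: encode each character as a hexadecimal
    # character reference, e.g. 丁 (U+4E01) -> &#x4e01;
    return ''.join('&#x%x;' % ord(ch) for ch in text)


encoded = to_char_refs('丁俊晖')
print(encoded)                 # &#x4e01;&#x4fca;&#x6656;
# html.unescape resolves numeric references (hex or decimal, any hex case)
print(html.unescape(encoded))  # 丁俊晖
```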
The complete code is as follows:
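```python
# Python 3 version: urllib2 and HTMLParser.unescape from the Python 2
# original are replaced by urllib.request and html.unescape.
import html
import urllib.parse
import urllib.request


class getGoogleSuggestion:
    def __init__(self):
        # Custom Search Engine ID (cx) used for the query
        self.cx = '012080660999116631289:zlpj9ypbnii'

    def getSuggestion(self, query):
        # Build the XML-output search URL; the query is UTF-8 percent-encoded
        url = ('http://www.google.com/search?'
               'q=%s'
               '&hl=zh'
               '&output=xml'
               '&client=google-csbe'
               '&cx=%s') % (urllib.parse.quote(query), self.cx)
        request = urllib.request.Request(url)
        raw = urllib.request.urlopen(request).read()
        # The response declares ISO-8859-1 and carries the Chinese text as
        # &#x...; character references: decode, then unescape them
        response = raw.decode('iso-8859-1')
        result = html.unescape(response)
        print(result)
        return result


if __name__ == '__main__':
    test = getGoogleSuggestion()
    keyword = '丁军辉'
    test.getSuggestion(keyword)
```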
The result is the following XML:
```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE GSP SYSTEM "google.dtd">
<GSP VER="3.2">
<ERROR>403</ERROR><TM>0.028615</TM><Q>丁军辉</Q>
<PARAM name="q" value="丁军辉" original_value="%E4%B8%81%E5%86%9B%E8%BE%89" url_escaped_value="%E4%B8%81%E5%86%9B%E8%BE%89" js_escaped_value="丁军辉"></PARAM>
<PARAM name="hl" value="zh" original_value="zh" url_escaped_value="zh" js_escaped_value="zh"></PARAM>
<PARAM name="output" value="xml" original_value="xml" url_escaped_value="xml" js_escaped_value="xml"></PARAM>
<PARAM name="client" value="google-csbe" original_value="google-csbe" url_escaped_value="google-csbe" js_escaped_value="google-csbe"></PARAM>
<PARAM name="cx" value="012080660999116631289:zlpj9ypbnii" original_value="012080660999116631289:zlpj9ypbnii" url_escaped_value="012080660999116631289%3Azlpj9ypbnii" js_escaped_value="012080660999116631289:zlpj9ypbnii"></PARAM>
<Spelling><Suggestion q="丁俊晖"><em>丁俊晖</em></Suggestion></Spelling>
</GSP>
```
If there is no spelling suggestion, the Spelling element is absent, as shown below:
```xml
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<!DOCTYPE GSP SYSTEM "google.dtd">
<GSP VER="3.2">
<ERROR>403</ERROR><TM>0.037868</TM><Q>丁俊晖</Q>
<PARAM name="q" value="丁俊晖" original_value="%E4%B8%81%E4%BF%8A%E6%99%96" url_escaped_value="%E4%B8%81%E4%BF%8A%E6%99%96" js_escaped_value="丁俊晖"></PARAM>
<PARAM name="hl" value="zh" original_value="zh" url_escaped_value="zh" js_escaped_value="zh"></PARAM>
<PARAM name="output" value="xml" original_value="xml" url_escaped_value="xml" js_escaped_value="xml"></PARAM>
<PARAM name="client" value="google-csbe" original_value="google-csbe" url_escaped_value="google-csbe" js_escaped_value="google-csbe"></PARAM>
<PARAM name="cx" value="012080660999116631289:zlpj9ypbnii" original_value="012080660999116631289:zlpj9ypbnii" url_escaped_value="012080660999116631289%3Azlpj9ypbnii" js_escaped_value="012080660999116631289:zlpj9ypbnii"></PARAM>
</GSP>
```
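Since the suggestion sits in the q attribute of Spelling/Suggestion and that element is simply missing when there is nothing to suggest, one option (a sketch, not part of the original code) is to parse the raw response with the standard xml.etree.ElementTree rather than printing it. An XML parser resolves &#x...; character references on its own, so html.unescape is not needed on this path. The function name extract_suggestion is hypothetical, and the sample bytes are an abbreviated, hand-written response assumed to follow the structure shown above.

```python
import xml.etree.ElementTree as ET


def extract_suggestion(xml_bytes):
    """Return the spelling suggestion from a GSP XML response, or None."""
    # Parse raw bytes: the parser honours the declared encoding and turns
    # character references such as &#x4e01; into real characters
    root = ET.fromstring(xml_bytes)  # root element is <GSP>
    node = root.find('Spelling/Suggestion')
    if node is None:
        return None  # no <Spelling> element: nothing to suggest
    return node.get('q')


# Abbreviated sample, assumed to mirror the response structure above
sample = (b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>'
          b'<GSP VER="3.2"><Q>&#x4e01;&#x519b;&#x8f89;</Q>'
          b'<Spelling><Suggestion q="&#x4e01;&#x4fca;&#x6656;">'
          b'<em>&#x4e01;&#x4fca;&#x6656;</em></Suggestion></Spelling></GSP>')
print(extract_suggestion(sample))                             # 丁俊晖
print(extract_suggestion(b'<GSP VER="3.2"><Q>x</Q></GSP>'))  # None
```

Returning None for the missing element keeps the "no suggestion" case explicit for the caller instead of relying on string searches in the printed XML.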
References:
1. http://stackoverflow.com/questions/8428767/how-to-implement-python-spell-checker-using-googles-did-you-mean
2. http://stackoverflow.com/questions/11948945/google-custom-search-api-for-spelling-check
3. https://developers.google.com/custom-search/docs/xml_results?hl=en#wsAdvancedSearch (the "XML Results for Regular and Advanced Search Queries" section)
4. http://www.google.com/search?q=%E4%B8%81%E5%86%9B%E8%BE%89&output=xml&client=google-csbe&cx=00255077836266642015:u-scht7a-8i
5. http://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string