decruft (A library to extract meaningful data from a webpage): source code analysis
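decruft is an open-source Python module: http://code.google.com/p/decruft/

Example usage of decruft:

    from decruft import Document

    # To fetch a live page instead of reading a local file (Python 2, like decruft itself):
    # import urllib2
    # f = urllib2.urlopen('url')

    # Open the saved page for reading and print the extracted main content.
    with open('index.html') as f:
        print(Document(f.read()).summary())

Now let's analyze how summary is implemented. Overall, it does not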
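decruft is generally described as a Python port of arc90's Readability. To show the kind of heuristic a Readability-style `summary()` relies on, here is a minimal sketch that uses only the Python 3 standard library. It is a simplified illustration and not decruft's actual code. The names `ParagraphScorer`, `best_block`, and `VOID_TAGS`, along with the exact scoring constants (skip paragraphs shorter than 25 characters, then add 1 point, 1 point per comma, and 1 point per 100 characters up to 3), are assumptions chosen for illustration. Each `<p>` adds its score to its parent element, and the parent with the highest total is returned as the "main content".

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical constant: elements that never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ParagraphScorer(HTMLParser):
    """Hypothetical helper: credits each <p>'s parent with a Readability-style score."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # open elements as (tag, node_id) pairs
        self.next_id = 0                  # unique id assigned to every opened element
        self.p_text = None                # text buffer while inside a <p>, else None
        self.scores = defaultdict(float)  # node_id -> accumulated score
        self.texts = defaultdict(list)    # node_id -> paragraphs credited to that node

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, self.next_id))
        self.next_id += 1
        if tag == "p":
            self.p_text = []

    def handle_data(self, data):
        if self.p_text is not None:
            self.p_text.append(data)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag (tolerates unclosed inline tags).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                if tag == "p" and self.p_text is not None and i > 0:
                    self._credit(self.stack[i - 1][1])  # the element just below <p> is its parent
                del self.stack[i:]
                break
        if tag == "p":
            self.p_text = None

    def _credit(self, parent_id):
        text = " ".join("".join(self.p_text).split())  # collapse whitespace
        if len(text) < 25:  # skip short fragments such as link bars
            return
        self.scores[parent_id] += 1 + text.count(",") + min(len(text) // 100, 3)
        self.texts[parent_id].append(text)


def best_block(html):
    """Return the paragraphs of the highest-scoring container, or "" if none qualify."""
    scorer = ParagraphScorer()
    scorer.feed(html)
    scorer.close()
    if not scorer.scores:
        return ""
    best = max(scorer.scores, key=scorer.scores.get)
    return "\n".join(scorer.texts[best])


page = """
<html><body>
  <div id="nav"><p>Home | About | Contact us today</p></div>
  <div id="article">
    <p>Readability-style extractors score paragraphs, count commas, and favour long text.</p>
    <p>The parent element that collects the most points, in total, is kept as the main content.</p>
  </div>
</body></html>
"""
print(best_block(page))
```

In this sample, the navigation paragraph earns 1 point. Each article paragraph earns 3 points (1 base plus 2 commas), so the `#article` div wins and its two paragraphs are printed. The real library works on a full parse tree and applies more rules, such as class/id hints and removal of unlikely candidates. The sketch only shows the core idea of scoring paragraphs and rolling those scores up to a parent.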