[SearchEngine] Text summarizer technology

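Kevin's tool, the Tailrank text summarizer, is a demo of language categorization and text summarizer technology. In other words, when you give it the permalink of a blog post, it looks the page up, works out which language the post is written in, and automatically extracts a summary of the post.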
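To make the two steps concrete, here is a minimal sketch in Python. It is not Tailrank's actual algorithm, which this post does not describe. It assumes a very naive approach: guess the language by counting stopword hits, then pick the sentences whose non-stopword terms are most frequent in the text. The names `STOPWORDS`, `guess_language`, and `summarize` are hypothetical, and the stopword lists are tiny samples for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stopword lists (an assumption, far from complete).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "a", "for", "it", "that"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ein", "zu", "mit", "den"},
    "fr": {"le", "la", "et", "les", "des", "est", "un", "une", "pour", "que"},
}


def tokenize(text):
    """Lowercase the text and return its runs of letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(text):
    """Return the language whose stopwords occur most often, or 'unknown'."""
    words = tokenize(text)
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def summarize(text, lang, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    stop = STOPWORDS.get(lang, set())
    # Term frequency over the whole text, ignoring stopwords.
    freq = Counter(w for w in tokenize(text) if w not in stop)

    def score(sentence):
        # Average frequency of the sentence's non-stopword terms.
        words = [w for w in tokenize(sentence) if w not in stop]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

A call would look like `summarize(post_text, guess_language(post_text))`, where `post_text` is the plain text of the post after the HTML has been stripped. A scheme this simple only copies existing sentences, so its summaries can easily miss the point of a post.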

I gave it a try at http://tools.tailrank.com/, for example with this address:

http://weblogs.java.net/blog/tomwhite/archive/2005/09/mapreduce.html

The result Tailrank returned was:

Resulting Summary

summary:

NDFS provides a fault-tolerant environment for working with very large files using cheap commodity hardware. This processing model is ideal for the operations a search engine indexer like Nutch or Google needs to perform - like computing inlinks for URLs, or building inverted indexes - and it will transform Nutch into a scalable, distributed search engine. Currently MapReduce is a part of Nutch, but it has been proposed that it and NDFS be moved into a separate project.

title: Tom White's Blog

lang: en

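Of course, it has two problems: first, the extracted summary is not accurate; second, for many blog URLs it cannot produce a result at all.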


