Key/value types of the SequenceFile and MapFile data files that Nutch stores


crawldb
(org.apache.hadoop.io.Text,org.apache.nutch.crawl.CrawlDatum)
segments/content
(org.apache.hadoop.io.Text,org.apache.nutch.protocol.Content)
segments/crawl_fetch
(org.apache.hadoop.io.Text,org.apache.nutch.crawl.CrawlDatum)
segments/parse_data
(org.apache.hadoop.io.Text,org.apache.nutch.parse.ParseData)
segments/parse_text
(org.apache.hadoop.io.Text,org.apache.nutch.parse.ParseText)
segments/crawl_generate
(org.apache.hadoop.io.Text,org.apache.nutch.crawl.CrawlDatum)
segments/crawl_parse
(org.apache.hadoop.io.Text,org.apache.nutch.crawl.CrawlDatum)
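
Each of these files records its key and value class names in its own header, and Hadoop's readers use that to decide what to deserialize. To check the list above against real data, here is a minimal sketch (not Nutch or Hadoop code) that reads only that header using nothing but the JDK. It assumes a local SequenceFile of format version 4 or later (Hadoop 1.x writes version 6). The class name `SequenceFileHeaderPeek` is hypothetical. For a MapFile, each `part-NNNNN` is a directory holding `data` and `index` files, so point it at the `data` file inside.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Reads only the header of a Hadoop SequenceFile (or the "data" file inside a
 * MapFile directory) and returns the key/value class names stored there.
 * Hypothetical helper: re-implements just the header bytes it needs, no Hadoop jars.
 */
class SequenceFileHeaderPeek {

    // From version 4 on, class names are stored like Text.writeString: vint length + UTF-8.
    private static final int FIRST_TEXT_NAME_VERSION = 4;

    /** Returns {keyClassName, valueClassName}. */
    static String[] readKeyValueClasses(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("not a SequenceFile: bad magic");
        }
        int version = in.readUnsignedByte();
        if (version < FIRST_TEXT_NAME_VERSION) {
            throw new IOException("unsupported (old) SequenceFile version " + version);
        }
        String keyClass = readText(in);
        String valueClass = readText(in);
        return new String[] {keyClass, valueClass};
    }

    // Same layout as Hadoop's Text.readString: vint length, then that many UTF-8 bytes.
    private static String readText(DataInputStream in) throws IOException {
        long len = readVLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) {
            throw new IOException("bad string length " + len);
        }
        byte[] bytes = new byte[(int) len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Same decoding as Hadoop's WritableUtils.readVLong (zero-compressed variable-length long).
    private static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] fit in the first byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? (-120 - first) : (-112 - first);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF); // big-endian payload
        }
        return negative ? ~value : value;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SequenceFileHeaderPeek <path-to-data-file>");
            System.exit(1);
        }
        try (InputStream in = new FileInputStream(args[0])) {
            String[] kv = readKeyValueClasses(in);
            System.out.println("key:   " + kv[0]);
            System.out.println("value: " + kv[1]);
        }
    }
}
```

For example, `java SequenceFileHeaderPeek crawl/crawldb/current/part-00000/data` (the `crawl/` prefix is a placeholder for your crawl directory) should print `org.apache.hadoop.io.Text` and `org.apache.nutch.crawl.CrawlDatum` if the list above matches your data. Inside Hadoop code, `SequenceFile.Reader#getKeyClassName()` and `getValueClassName()` return the same information without hand-parsing.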

[url]https://github.com/apache/nutch/blob/branch-1.7/src/java/org/apache/nutch/crawl/CrawlDatum.java[/url]

[url]https://github.com/apache/nutch/blob/branch-1.7/src/java/org/apache/nutch/protocol/Content.java[/url]

[url]https://github.com/apache/nutch/blob/branch-1.7/src/java/org/apache/nutch/parse/ParseData.java[/url]

[url]https://github.com/apache/nutch/blob/branch-1.7/src/java/org/apache/nutch/parse/ParseText.java[/url]