Nutch 1.3 and Solr 3.4 integration: successfully deployed and run in Eclipse
The program arguments used for the run in Eclipse are:
crawl urls -solr http://localhost:8080/l-nutch-solr -depth 3 -topN 10
Log output from the run:
crawl started in: crawl-20111107123624
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=http://localhost:8080/solr/
topN = 10
Injector: starting at 2011-11-07 12:36:25
Injector: crawlDb: crawl-20111107123624/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-11-07 12:36:30, elapsed: 00:00:05
Generator: starting at 2011-11-07 12:36:30
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 10
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl-20111107123624/segments/20111107123633
Generator: finished at 2011-11-07 12:36:35, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-11-07 12:36:35
Fetcher: segment: crawl-20111107123624/segments/20111107123633
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching http://www.amazon.cn/
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04
ParseSegment: starting at 2011-11-07 12:36:39
ParseSegment: segment: crawl-20111107123624/segments/20111107123633
ParseSegment: finished at 2011-11-07 12:36:42, elapsed: 00:00:02
CrawlDb update: starting at 2011-11-07 12:36:42 CrawlDb update: db: crawl-20111107123624/crawldb CrawlDb update: segments: [crawl-20111107123624/segments/20111107123633] CrawlDb update: additions allowed: true CrawlDb update: URL normalizing: true CrawlDb update: URL filtering: true CrawlDb update: Merging segment data into db. CrawlDb update: finished at 2011-11-07 12:36:44, elapsed: 00:00:01 Generator: starting at 2011-11-07 12:36:44 Generator: Selecting best-scoring urls due for fetch. Generator: filtering: true Generator: normalizing: true Generator: topN: 10 Generator: jobtracker is 'local', generating exactly one partition. Generator: Partitioning selected urls for politeness. Generator: segment: crawl-20111107123624/segments/20111107123646 Generator: finished at 2011-11-07 12:36:48, elapsed: 00:00:04 Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property. Fetcher: starting at 2011-11-07 12:36:48 Fetcher: segment: crawl-20111107123624/segments/20111107123646 Fetcher: threads: 10 QueueFeeder finished: total 10 records + hit by time limit :0 fetching http://www.amazon.cn/%E4%B8%89%E6%98%9FS5838-3G%E6%89%8B%E6%9C%BA/dp/B005KP4AFG?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9 fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005OPL41A?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 fetching 
http://www.amazon.cn/b?ie=UTF8&node=79553071 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 fetching http://www.amazon.cn/%E5%B0%8F%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=814224051 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=6 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6 fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-IdeaPad-Y470N-%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91/dp/B005LT2VIE?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5 fetching http://www.amazon.cn/ThinkPad-E40-0579-A22-14-0%E8%8B%B1%E5%AF%B8%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91-%E9%80%81%E5%8E%9F%E8%A3%85%E5%8C%85/dp/B005LFRMVY?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640644496 now = 1320640639907 0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071 1. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640644496 now = 1320640640909 0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640644496 now = 1320640641910 0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640644496 now = 1320640642911 0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071 1. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640644496 now = 1320640643912 0. http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 fetching http://www.amazon.cn/%E5%A4%A7%E5%AE%B6%E7%94%B5/b?ie=UTF8&node=80207071 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 1 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640644496 now = 1320640644913 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640650546 now = 1320640645914 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 1. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640650546 now = 1320640646915 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640650546 now = 1320640647916 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640650546 now = 1320640648918 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 2. 
http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640650546 now = 1320640649919 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMVOK?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640655698 now = 1320640650919 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640655698 now = 1320640651921 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640655698 now = 1320640652923 0. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640655698 now = 1320640653924 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640655698 now = 1320640654925 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-Lepad-A1%E5%B9%B3%E6%9D%BF%E7%94%B5%E8%84%91/dp/B005PSMV54?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640660855 now = 1320640655926 0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640660855 now = 1320640656927 0. 
http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640660855 now = 1320640657928 0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640660855 now = 1320640658929 0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640660855 now = 1320640659930 0. http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 fetching http://www.amazon.cn/%E7%94%B5%E8%A7%86-%E9%9F%B3%E5%93%8D/b?ie=UTF8&node=874259051 -finishing thread FetcherThread, activeThreads=9 -finishing thread FetcherThread, activeThreads=8 -finishing thread FetcherThread, activeThreads=7 -finishing thread FetcherThread, activeThreads=6 -finishing thread FetcherThread, activeThreads=5 -finishing thread FetcherThread, activeThreads=4 -finishing thread FetcherThread, activeThreads=3 -finishing thread FetcherThread, activeThreads=2 -finishing thread FetcherThread, activeThreads=1 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0 -finishing thread FetcherThread, activeThreads=0 -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0 -activeThreads=0 Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55 ParseSegment: starting at 2011-11-07 12:37:43 ParseSegment: segment: crawl-20111107123624/segments/20111107123646 ParseSegment: finished at 2011-11-07 12:37:45, elapsed: 00:00:01 CrawlDb update: starting at 
2011-11-07 12:37:45 CrawlDb update: db: crawl-20111107123624/crawldb CrawlDb update: segments: [crawl-20111107123624/segments/20111107123646] CrawlDb update: additions allowed: true CrawlDb update: URL normalizing: true CrawlDb update: URL filtering: true CrawlDb update: Merging segment data into db. CrawlDb update: finished at 2011-11-07 12:37:47, elapsed: 00:00:01 Generator: starting at 2011-11-07 12:37:47 Generator: Selecting best-scoring urls due for fetch. Generator: filtering: true Generator: normalizing: true Generator: topN: 10 Generator: jobtracker is 'local', generating exactly one partition. Generator: Partitioning selected urls for politeness. Generator: segment: crawl-20111107123624/segments/20111107123749 Generator: finished at 2011-11-07 12:37:51, elapsed: 00:00:04 Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property. Fetcher: starting at 2011-11-07 12:37:51 Fetcher: segment: crawl-20111107123624/segments/20111107123749 Fetcher: threads: 10 QueueFeeder finished: total 10 records + hit by time limit :0 fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-THINK%E9%BB%91/dp/B005GZ0I5G?_encoding=UTF8&s=electronics fetching http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8 fetching http://www.amazon.cn/%E8%81%94%E6%83%B3-P90W-WCDMA-%E6%95%B0%E5%AD%97%E7%A7%BB%E5%8A%A8%E7%94%B5%E8%AF%9D%E6%9C%BA-%E7%84%89%E7%B2%89/dp/B005GZ0IC4?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, 
fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7 fetching http://www.amazon.cn/gp/yourstore/home fetching http://www.amazon.cn/gp/css/homepage.html fetching http://www.amazon.cn/%E6%89%8B%E8%A1%A8-%E6%97%B6%E9%92%9F/b?ie=UTF8&node=1953164051 -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 1 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640683363 now = 1320640684037 0. http://www.amazon.cn/gp/registry/wishlist 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640689186 now = 1320640685037 0. http://www.amazon.cn/gp/registry/wishlist 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640689186 now = 1320640686039 0. http://www.amazon.cn/gp/registry/wishlist 1. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640689186 now = 1320640687043 0. http://www.amazon.cn/gp/registry/wishlist 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640689186 now = 1320640688044 0. http://www.amazon.cn/gp/registry/wishlist 1. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640689186 now = 1320640689045 0. http://www.amazon.cn/gp/registry/wishlist 1. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 3. http://www.amazon.cn/gp/help/customer/display.html fetching http://www.amazon.cn/gp/registry/wishlist -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 1 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640689186 now = 1320640690047 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640695079 now = 1320640691048 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640695079 now = 1320640692049 0. 
http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640695079 now = 1320640693049 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640695079 now = 1320640694051 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640695079 now = 1320640695053 0. http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics 1. 
http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 2. http://www.amazon.cn/gp/help/customer/display.html fetching http://www.amazon.cn/Lenovo-%E8%81%94%E6%83%B3-A60-WCDMA-GSM-3G%E6%89%8B%E6%9C%BA/dp/B005GZ0IZG?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640700231 now = 1320640696053 0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640700231 now = 1320640697054 0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640700231 now = 1320640698056 0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 1. 
http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640700231 now = 1320640699057 0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640700231 now = 1320640700058 0. http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics 1. http://www.amazon.cn/gp/help/customer/display.html fetching http://www.amazon.cn/%E8%AF%BA%E5%9F%BA%E4%BA%9AN9-%E5%85%A8%E8%A7%A6%E5%B1%8F3G%E6%99%BA%E8%83%BD%E6%89%8B%E6%9C%BA-%E5%85%A8%E6%96%B0MEEGO%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F/dp/B005MWLXU2?_encoding=UTF8&s=electronics -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640705384 now = 1320640701058 0. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640705384 now = 1320640702060 0. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640705384 now = 1320640703060 0. 
http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640705384 now = 1320640704061 0. http://www.amazon.cn/gp/help/customer/display.html -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1 * queue: http://www.amazon.cn maxThreads = 1 inProgress = 0 crawlDelay = 5000 minCrawlDelay = 0 nextFetchTime = 1320640705384 now = 1320640705063 0. http://www.amazon.cn/gp/help/customer/display.html fetching http://www.amazon.cn/gp/help/customer/display.html -finishing thread FetcherThread, activeThreads=8 -finishing thread FetcherThread, activeThreads=8 -finishing thread FetcherThread, activeThreads=7 -finishing thread FetcherThread, activeThreads=6 -finishing thread FetcherThread, activeThreads=5 -finishing thread FetcherThread, activeThreads=4 -finishing thread FetcherThread, activeThreads=3 -finishing thread FetcherThread, activeThreads=2 -finishing thread FetcherThread, activeThreads=1 -finishing thread FetcherThread, activeThreads=0 -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0 -activeThreads=0 Fetcher: finished at 2011-11-07 12:38:26, elapsed: 00:00:35 ParseSegment: starting at 2011-11-07 12:38:26 ParseSegment: segment: crawl-20111107123624/segments/20111107123749 Error parsing: http://g-ec4.images-amazon.com/images/G/28/x-locale/common/transparent-pixel._V192562247_.gif: failed(2,0): Can't retrieve Tika parser for mime-type image/gif ParseSegment: finished at 2011-11-07 12:38:28, elapsed: 00:00:01 CrawlDb update: starting at 2011-11-07 12:38:28 CrawlDb update: db: crawl-20111107123624/crawldb CrawlDb update: segments: [crawl-20111107123624/segments/20111107123749] CrawlDb update: additions allowed: true CrawlDb update: URL normalizing: true CrawlDb update: URL filtering: true CrawlDb update: Merging segment data into db. 
CrawlDb update: finished at 2011-11-07 12:38:30, elapsed: 00:00:01
LinkDb: starting at 2011-11-07 12:38:30
LinkDb: linkdb: crawl-20111107123624/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123633
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123646
LinkDb: adding segment: file:/E:/Workspaces/workspace1/L-nutch/crawl-20111107123624/segments/20111107123749
LinkDb: finished at 2011-11-07 12:38:32, elapsed: 00:00:01
SolrIndexer: starting at 2011-11-07 12:38:32
SolrIndexer: finished at 2011-11-07 12:38:37, elapsed: 00:00:05
SolrDeleteDuplicates: starting at 2011-11-07 12:38:37
SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
SolrDeleteDuplicates: finished at 2011-11-07 12:38:39, elapsed: 00:00:01
crawl finished: crawl-20111107123624
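A log this long is easier to read once it is summarized. The sketch below is not part of Nutch; it is a small standalone Python helper (the names `FINISHED` and `elapsed_by_stage` are made up for this post) that sums the `elapsed` values of the `finished at` lines per stage. It assumes the stage names and line format are exactly those seen in the log above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55"
# The stage names are the ones that appear in the log above.
FINISHED = re.compile(
    r"(Injector|Generator|Fetcher|ParseSegment|CrawlDb update|LinkDb|"
    r"SolrIndexer|SolrDeleteDuplicates): finished at "
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}, elapsed: (\d{2}):(\d{2}):(\d{2})"
)


def elapsed_by_stage(log_text):
    """Return {stage name: total elapsed seconds} for a crawl log."""
    totals = defaultdict(int)
    for stage, hours, minutes, seconds in FINISHED.findall(log_text):
        totals[stage] += int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return dict(totals)


# Three lines copied from the log above, joined the way the console output
# appears when its line breaks are lost.
sample = (
    "Fetcher: finished at 2011-11-07 12:36:39, elapsed: 00:00:04 "
    "ParseSegment: finished at 2011-11-07 12:36:42, elapsed: 00:00:02 "
    "Fetcher: finished at 2011-11-07 12:37:43, elapsed: 00:00:55 "
)
summary = elapsed_by_stage(sample)
print(summary)
```

Because the regular expression does not depend on line breaks, the same function works on the console output whether or not it has been split into one entry per line. In the full log above, the three Fetcher rounds report 4, 55, and 35 seconds, so fetching accounts for most of the run.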
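Crawl data model
1. CrawlDB: stores information about every URL, including its fetch schedule, fetch status, page fingerprint, and metadata.
2. LinkDB: stores, for each URL, its incoming anchor links and their anchor text.
3. Segment: the raw page content, the parsed pages, metadata, outlinks, and the text used for indexing.
Reference: http://blog.csdn.net/amuseme_lu/article/details/5993916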
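To make the relationship between the three structures more concrete, here is a rough in-memory sketch in Python. It is only an illustration under simplifying assumptions: the class and function names (`CrawlEntry`, `Segment`, `record_fetch`) are invented for this post and are not Nutch classes, the MD5 fingerprint and the 30-day interval are placeholder choices, and real Nutch performs these updates in separate jobs (the `CrawlDb update` and `LinkDb` steps in the log) rather than in one function.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CrawlEntry:
    """One CrawlDB record (hypothetical, heavily simplified)."""
    status: str = "unfetched"
    fetch_interval_days: int = 30       # illustrative value only
    fingerprint: Optional[str] = None   # hash of the fetched content
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Segment:
    """Data produced by one generate/fetch/parse round (simplified)."""
    raw_content: Dict[str, bytes] = field(default_factory=dict)
    parsed_text: Dict[str, str] = field(default_factory=dict)
    outlinks: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)


def record_fetch(crawldb, linkdb, segment, url, content, text, outlinks):
    """Store one fetched page and propagate its links.

    outlinks is a list of (target_url, anchor_text) pairs.
    """
    # Segment: raw content, parsed text, and outlinks of this page.
    segment.raw_content[url] = content
    segment.parsed_text[url] = text
    segment.outlinks[url] = list(outlinks)

    # CrawlDB: mark the page as fetched and remember its fingerprint.
    entry = crawldb.setdefault(url, CrawlEntry())
    entry.status = "fetched"
    entry.fingerprint = hashlib.md5(content).hexdigest()

    for target, anchor in outlinks:
        # CrawlDB: newly discovered URLs start out unfetched.
        crawldb.setdefault(target, CrawlEntry())
        # LinkDB: invert the outlink into an inlink of the target.
        linkdb.setdefault(target, []).append((url, anchor))


# Example with made-up page content and anchor text.
crawldb = {"http://www.amazon.cn/": CrawlEntry()}
linkdb = {}
segment = Segment()
record_fetch(
    crawldb, linkdb, segment,
    url="http://www.amazon.cn/",
    content=b"<html>example</html>",
    text="example",
    outlinks=[("http://www.amazon.cn/gp/yourstore/home", "example anchor")],
)
print(sorted(crawldb))
print(linkdb)
```

The point of the sketch is the direction of the data flow: a segment holds what one round fetched, the CrawlDB grows with the URLs discovered in that round, and the LinkDB is the inverted view of the outlinks.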