Output of a Single Nutch Crawl Run
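
The log below is long, so a quick tally of fetch attempts, fetch failures, and parse errors can make it easier to read. The following is a minimal sketch, not part of Nutch itself: the `summarize` function and its regular expressions are assumptions derived only from the line shapes visible in this log (`fetching <url>`, `fetch of <url> failed with: <reason>`, `Error parsing: <url>: <reason>`), and other Nutch versions may format these lines differently.

```python
import re
from collections import Counter

# Line shapes assumed from this particular log, not from any Nutch specification.
FETCH_RE = re.compile(r"^fetching (\S+)")
FAIL_RE = re.compile(r"^fetch of (\S+) failed with: (.+)$")
PARSE_ERR_PREFIX = "Error parsing: "


def summarize(lines):
    """Count fetch attempts, fetch failures by reason, and parse errors."""
    fetched = 0
    parse_errors = 0
    failures = Counter()
    for raw in lines:
        line = raw.strip()
        if FETCH_RE.match(line):
            fetched += 1
            continue
        failed = FAIL_RE.match(line)
        if failed:
            # Drop the trailing ", url=..." so that equal HTTP codes group together.
            reason = failed.group(2).split(", url=")[0]
            failures[reason] += 1
            continue
        if line.startswith(PARSE_ERR_PREFIX):
            parse_errors += 1
    return {"fetched": fetched, "failures": dict(failures), "parse_errors": parse_errors}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python summarize_log.py < crawl.log
    print(summarize(sys.stdin))
```

Status lines such as `-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=44` and the per-queue dumps match none of the patterns and are ignored.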

test@admin:~/programs/nutch-1.2$ ./bin/nutch  crawl urls/seed.txt -dir localweb -depth 5 -threads 4 -topN 50
crawl started in: localweb
rootUrlDir = urls/seed.txt
threads = 4
depth = 5
indexer=lucene
topN = 50
Injector: starting at 2014-05-23 14:20:11
Injector: crawlDb: localweb/crawldb
Injector: urlDir: urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2014-05-23 14:20:15, elapsed: 00:00:03
Generator: starting at 2014-05-23 14:20:15
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: localweb/segments/20140523142018
Generator: finished at 2014-05-23 14:20:19, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-05-23 14:20:19
Fetcher: segment: localweb/segments/20140523142018
Fetcher: threads: 4
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://www.cafepress.com/nutch/
fetching http://www.baidu.com/
fetching http://www.apache.org/dist/nutch/2.2.1/CHANGES-2.2.1.txt
fetching http://xw.qq.com/simple/s/finance/index.htm
fetching http://issues.apache.org/jira/browse/NUTCH
fetch of http://www.cafepress.com/nutch/ failed with: java.net.SocketException: Connection reset
fetching http://forrest.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=44
fetching http://house60.3g.qq.com/g/welcome.jsp
fetching http://www.oschina.net/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=42
fetching http://www.elasticsearch.org/
fetch of http://www.oschina.net/ failed with: Http code=403, url=http://www.oschina.net/
fetching http://xw.qq.com/a/ent/20140418023432/ENT20140418023432AE
fetching http://mat1.gtimg.com/
fetching http://wiki.apache.org/nutch/
fetching http://nutch.apache.org/credits.html
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=37
fetching http://xw.qq.com/simple/s/house/index.htm
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=36
fetching http://search.maven.org/
fetching http://code.google.com/p/crawler-commons/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=34
fetching http://s.apache.org/oHY
fetching http://xw.qq.com/simple/s/games/index.htm
fetching http://find.searchhub.org/p:nutch
fetching http://nutch.apache.org/skin/fontsize.js
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=30
fetching http://qgame.3g.qq.com/
Error parsing: http://nutch.apache.org/skin/fontsize.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
fetching http://www.apache.org/dyn/closer.cgi/nutch/
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=28
fetching http://xw.qq.com/simple/s/fashion/index.htm
fetching http://nutch.apache.org/nightly.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=26
fetching http://s.apache.org/PGa
fetching http://xw.qq.com/a/ent/20140523003834/ENT2014052300383403
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=24
fetching http://nutch.apache.org/old_downloads.html
fetching http://xw.qq.com/a/ent/20131128004407/ENT201311280044077O
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=22
fetching http://nutch.apache.org/skin/breadcrumbs.js
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=21
fetching http://xw.qq.com/a/ent/20140523001154/ENT2014052300115404
fetching http://www.apache.org/security/
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=19
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=19
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=19
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=19
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=19
fetching http://www.apache.org/foundation/thanks.html
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
fetch of http://nutch.apache.org/skin/breadcrumbs.js failed with: java.net.SocketTimeoutException: connect timed out
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=18
fetching http://www.apache.org/foundation/sponsorship.html
fetching http://nutch.apache.org/menu_1.3
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=16
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=16
fetching http://nutch.apache.org/skin/getMenu.js
Error parsing: http://nutch.apache.org/skin/getMenu.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=15
fetching http://nutch.apache.org/index.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=14
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=14
fetching http://www.apache.org/dist/nutch/1.8/CHANGES.txt
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=13
fetching http://nutch.apache.org/apidocs-1.8/index.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=12
fetching http://nutch.apache.org/sonar.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=11
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=11
fetching http://nutch.apache.org/apidocs-2.2.1/index.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=10
fetching http://nutch.apache.org/bot.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=9
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=9
fetching http://nutch.apache.org/issue_tracking.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=8
fetching http://nutch.apache.org/faq.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=7
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=7
fetching http://nutch.apache.org/version_control.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=6
fetching http://nutch.apache.org/menu_1.4
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=5
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=5
fetching http://nutch.apache.org/tutorial.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=4
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826062347
  now           = 1400826061417
  0. http://nutch.apache.org/menu_1.2
  1. http://nutch.apache.org/skin/getBlank.js
  2. http://nutch.apache.org/index.pdf
  3. http://nutch.apache.org/menu_selected_1.1
fetching http://nutch.apache.org/menu_1.2
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826062347
  now           = 1400826062418
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/menu_selected_1.1
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826063932
  now           = 1400826063420
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/menu_selected_1.1
fetching http://nutch.apache.org/skin/getBlank.js
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826063932
  now           = 1400826064425
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/menu_selected_1.1
Error parsing: http://nutch.apache.org/skin/getBlank.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826065480
  now           = 1400826065427
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/menu_selected_1.1
fetching http://nutch.apache.org/index.pdf
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826065480
  now           = 1400826066430
  0. http://nutch.apache.org/menu_selected_1.1
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826065480
  now           = 1400826067431
  0. http://nutch.apache.org/menu_selected_1.1
Error parsing: http://nutch.apache.org/index.pdf: failed(2,0): expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@65d43d1e
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826068954
  now           = 1400826068433
  0. http://nutch.apache.org/menu_selected_1.1
fetching http://nutch.apache.org/menu_selected_1.1
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-05-23 14:21:12, elapsed: 00:00:52
CrawlDb update: starting at 2014-05-23 14:21:12
CrawlDb update: db: localweb/crawldb
CrawlDb update: segments: [localweb/segments/20140523142018]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-05-23 14:21:14, elapsed: 00:00:02
Generator: starting at 2014-05-23 14:21:14
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: localweb/segments/20140523142116
Generator: finished at 2014-05-23 14:21:18, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-05-23 14:21:18
Fetcher: segment: localweb/segments/20140523142116
Fetcher: threads: 4
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://isdspeed.qq.com/
fetching http://nutch.apache.org/apidocs-2.2.1/overview-summary.html
fetching http://pnewsapp.tc.qq.com/newsapp_ls/0/17978428_150110/0
fetching http://pingjs.qq.com/ping.js
fetching http://lucene.apache.org/hadoop
fetching http://m.v.qq.com/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=44
fetching http://xw.qq.com/a/ent/20140523013234/ENT2014052301323407
fetching http://imgcache.qq.com/
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=42
fetching http://nutch.apache.org/apidocs-1.8/overview-summary.html
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=41
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=41
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=41
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=41
fetching http://xw.qq.com/a/fashion/20140523005888/FAS2014052300588802
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=40
fetching http://xw.qq.com/a/finance/20140510015766/FIN2014051001576635
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=39
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=39
fetching http://xw.qq.com/a/fashion/20140523018320/FAS2014052301832001
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=38
fetching http://xw.qq.com/a/news/20140523016639/NEW2014052301663902
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=37
fetching http://xw.qq.com/a/house/20140523005159/HOS2014052300515906
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=36
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=36
fetching http://xw.qq.com/m/house/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=35
fetching http://xw.qq.com/a/auto/20140522044474/AUT2014052204447404
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=34
fetching http://xw.qq.com/a/digi_tech/20140523022831/DIG2014052302283101
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=33
fetching http://xw.qq.com/m/photo/
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=32
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=32
fetching http://xw.qq.com/a/finance/20140516032679/FIN201405160326790Q
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=31
fetching http://xw.qq.com/m/auto/
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=30
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=30
fetching http://xw.qq.com/a/auto/20140523006942/AUT2014052300694202
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=29
fetching http://xw.qq.com/m/news/
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=28
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=28
fetching http://xw.qq.com/a/auto/20140523007567/AUT2014052300756702
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=27
fetching http://xw.qq.com/a/digi_tech/20140523013949/DIG2014052301394901
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=26
fetching http://xw.qq.com/a/finance/20140523009913/FIN201405230099130A
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=25
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=25
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=25
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=25
fetching http://xw.qq.com/c/ent/20140523003834/ENT2014052300383403
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=24
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=24
fetching http://xw.qq.com/a/news/20140523022774/NEW201405230227740H
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=23
fetching http://xw.qq.com/m/digi/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=22
fetching http://xw.qq.com/a/news/20140523007240/MIL2014052300724002
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=21
fetching http://xw.qq.com/c/news/20140523022986/NEW2014052302298605
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=20
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=20
fetching http://xw.qq.com/a/finance/20140521060120/FIN201405210601200T
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=19
fetching http://xw.qq.com/a/news/20140523005827/MIL2014052300582702
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=18
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=18
fetching http://xw.qq.com/a/house/20140523005305/HOS2014052300530501
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=17
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=17
fetching http://xw.qq.com/a/news/20140523024720/NEW2014052302472004
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=16
fetching http://xw.qq.com/a/news/20140523008280/MIL2014052300828002
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=15
fetching http://xw.qq.com/c/zt/201312100083161/SPO201312100083161R
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=14
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=14
fetching http://xw.qq.com/m/fashion/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=13
fetching http://xw.qq.com/a/news/20140523017079/MIL2014052301707904
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=12
fetching http://xw.qq.com/a/digi_tech/20140521009545/DIG201405210095451E
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=11
fetching http://xw.qq.com/a/auto/20140523007267/AUT2014052300726702
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=10
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=10
fetching http://xw.qq.com/a/finance/20140523008356/FIN2014052300835605
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=9
fetching http://xw.qq.com/m/astro/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=8
fetching http://xw.qq.com/a/house/20140523014070/HOS2014052301407005
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=7
fetching http://xw.qq.com/a/news/20140523022986/NEW2014052302298605
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=6
fetching http://xw.qq.com/m/games/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=5
fetching http://xw.qq.com/a/digi_tech/20140523019030/DIG2014052301903001
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=4
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826148455
  now           = 1400826147670
  0. http://xw.qq.com/a/house/20140523005188/HOS2014052300518801
  1. http://xw.qq.com/m/finance/
  2. http://xw.qq.com/m/tech/
  3. http://xw.qq.com/a/news/20140523017822/MIL2014052301782203
fetching http://xw.qq.com/a/house/20140523005188/HOS2014052300518801
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=3
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826148455
  now           = 1400826148672
  0. http://xw.qq.com/m/finance/
  1. http://xw.qq.com/m/tech/
  2. http://xw.qq.com/a/news/20140523017822/MIL2014052301782203
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=3
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826149689
  now           = 1400826149675
  0. http://xw.qq.com/m/finance/
  1. http://xw.qq.com/m/tech/
  2. http://xw.qq.com/a/news/20140523017822/MIL2014052301782203
fetching http://xw.qq.com/m/finance/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=2
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826150958
  now           = 1400826150677
  0. http://xw.qq.com/m/tech/
  1. http://xw.qq.com/a/news/20140523017822/MIL2014052301782203
fetching http://xw.qq.com/m/tech/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=1
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826152184
  now           = 1400826151680
  0. http://xw.qq.com/a/news/20140523017822/MIL2014052301782203
fetching http://xw.qq.com/a/news/20140523017822/MIL2014052301782203
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-05-23 14:22:34, elapsed: 00:01:16
CrawlDb update: starting at 2014-05-23 14:22:34
CrawlDb update: db: localweb/crawldb
CrawlDb update: segments: [localweb/segments/20140523142116]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-05-23 14:22:36, elapsed: 00:00:02
Generator: starting at 2014-05-23 14:22:36
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: localweb/segments/20140523142238
Generator: finished at 2014-05-23 14:22:39, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-05-23 14:22:39
Fetcher: segment: localweb/segments/20140523142238
Fetcher: threads: 4
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://btrace.qq.com/
fetching http://pingfore.qq.com/
fetching http://www.xw.qq.com/
fetching http://qt.gtimg.cn/
fetch of http://pingfore.qq.com/ failed with: java.io.EOFException
fetching http://jsqmt.qq.com/
fetch of http://btrace.qq.com/ failed with: Http code=403, url=http://btrace.qq.com/
fetching http://pnewsapp.tc.qq.com/newsapp_ls/0/17977385_640330/0
fetching http://sns.video.qq.com/
fetching http://xw.qq.com/a/sports/20131218015367/SPO20131218015367SL
fetching http://i.gtimg.cn/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=41
fetch of http://www.xw.qq.com/ failed with: Http code=500, url=http://www.xw.qq.com/
fetching http://openapi.inews.qq.com/
fetching http://vliveachy.tc.qq.com/
fetching http://jqmt.qq.com/
fetch of http://vliveachy.tc.qq.com/ failed with: Http code=405, url=http://vliveachy.tc.qq.com/
fetching http://mat1.gtimg.com/www/mobi/js/simple.article.js
fetching http://trace.qq.com/
fetching http://tajs.qq.com/stats
fetching http://ui.ptlogin2.qq.com/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=34
fetching http://vv.video.qq.com/
fetching http://live.qq.com/
fetching http://img.gtimg.cn/
fetch of http://img.gtimg.cn/ failed with: Http code=403, url=http://img.gtimg.cn/
fetching http://xw.qq.com/a/sports/20140522050024/SPO2014052205002408
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=30
fetching http://api.t.qq.com/
fetching http://vpic.video.qq.com/
fetching http://check.ptlogin2.qq.com/
fetch of http://check.ptlogin2.qq.com/ failed with: Http code=403, url=http://check.ptlogin2.qq.com/
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=27
fetching http://xw.qq.com/a/sports/20140523020282/SPO2014052302028205
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=26
fetching http://xw.qq.com/c/sports/20140522048392/SPO2014052204839205
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=25
fetching http://xw.qq.com/a/news/20140523008556/NEW2014052300855606
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=24
fetching http://xw.qq.com/c/sports/20140523023486/SPO2014052302348601
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=23
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=23
fetching http://xw.qq.com/a/news/20140523024872/NEW2014052302487201
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=22
fetching http://xw.qq.com/c/ent/20140522045677/ENT2014052204567701
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=21
fetching http://pnewsapp.tc.qq.com/newsapp_ls/0/17976606_640330/0
fetching http://xw.qq.com/c/ent/20140523001109/ENT2014052300110906
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=19
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=19
fetching http://xw.qq.com/a/news/20140523007960/NEW2014052300796006
fetch of http://vpic.video.qq.com/ failed with: java.net.SocketTimeoutException: Read timed out
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=18
fetching http://xw.qq.com/c/zt/20131218015367/SPO20131218015367SL
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=17
fetching http://xw.qq.com/a/sports/20140523023486/SPO2014052302348601
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=16
fetching http://xw.qq.com/c/sports/20140522050024/SPO2014052205002408
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=15
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=15
fetching http://xw.qq.com/c/zt/201405100152772/SPO201405100152772X
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=14
fetching http://xw.qq.com/simple/s/index/qq.com
-activeThreads=4, spinWaiting=1, fetchQueues.totalSize=13
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=13
fetching http://xw.qq.com/c/sports/20140523009111/SPO2014052300911102
fetch of http://jqmt.qq.com/ failed with: java.net.SocketTimeoutException: Read timed out
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=12
fetching http://xw.qq.com/a/news/20140523024749/NEW2014052302474905
fetching http://pnewsapp.tc.qq.com/
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=10
fetching http://xw.qq.com/a/news/20140523013685/NEW2014052301368505
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=9
fetching http://xw.qq.com/c/ent/20140522048291/ENT2014052204829104
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=8
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=8
fetching http://xw.qq.com/c/ent/20140523000009/ENT2014052300000904
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=7
fetching http://xw.qq.com/c/sports/20140523024606/SPO2014052302460603
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
fetching http://xw.qq.com/a/sports/20131210008316/SPO201312100083161R
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=5
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=5
fetching http://xw.qq.com/c/mil/20140523005827/MIL2014052300582702
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=4
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826191797
  now           = 1400826191299
  0. http://xw.qq.com/a/fashion/20140522044069/FAS201405220440690L
  1. http://xw.qq.com/c/zt/20140418023432/ENT20140418023432AE
  2. http://xw.qq.com/a/fashion/20140507013249/FAS201405070132490V
  3. http://xw.qq.com/c/zt/201311280044077/ENT201311280044077O
fetching http://xw.qq.com/a/fashion/20140522044069/FAS201405220440690L
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=3
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826193100
  now           = 1400826192303
  0. http://xw.qq.com/c/zt/20140418023432/ENT20140418023432AE
  1. http://xw.qq.com/a/fashion/20140507013249/FAS201405070132490V
  2. http://xw.qq.com/c/zt/201311280044077/ENT201311280044077O
fetching http://xw.qq.com/c/zt/20140418023432/ENT20140418023432AE
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=2
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826193100
  now           = 1400826193305
  0. http://xw.qq.com/a/fashion/20140507013249/FAS201405070132490V
  1. http://xw.qq.com/c/zt/201311280044077/ENT201311280044077O
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=2
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826194379
  now           = 1400826194307
  0. http://xw.qq.com/a/fashion/20140507013249/FAS201405070132490V
  1. http://xw.qq.com/c/zt/201311280044077/ENT201311280044077O
fetching http://xw.qq.com/a/fashion/20140507013249/FAS201405070132490V
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=1
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826195692
  now           = 1400826195308
  0. http://xw.qq.com/c/zt/201311280044077/ENT201311280044077O
fetching http://xw.qq.com/c/zt/201311280044077/ENT201311280044077O
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
fetch of http://pnewsapp.tc.qq.com/ failed with: java.net.SocketTimeoutException: Read timed out
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-05-23 14:23:25, elapsed: 00:00:45
CrawlDb update: starting at 2014-05-23 14:23:25
CrawlDb update: db: localweb/crawldb
CrawlDb update: segments: [localweb/segments/20140523142238]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-05-23 14:23:27, elapsed: 00:00:02
Generator: starting at 2014-05-23 14:23:27
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: localweb/segments/20140523142330
Generator: finished at 2014-05-23 14:23:31, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-05-23 14:23:31
Fetcher: segment: localweb/segments/20140523142330
Fetcher: threads: 4
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://www.qq.com/mobile/appios.htm
fetching http://nutch.apache.org/menu_1.1
fetching http://pnewsapp.tc.qq.com/newsapp_ls/0/17954361_150110/0
fetching http://lucene.apache.org/
fetching http://xw.qq.com/c/news/20140523007960/NEW2014052300796006
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=45
fetch of http://nutch.apache.org/menu_1.1 failed with: java.net.SocketException: Connection reset
fetching http://people.apache.org/committer-index.html
fetching http://s.apache.org/1zE
fetching http://v.qq.com/index.html
fetching http://ubook.qq.com/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=41
fetching http://xw.qq.com/iphone/m/finance/finance.htm
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=40
fetching http://xw.qq.com/c/news/20140523024720/NEW2014052302472004
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=39
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=39
fetching http://xw.qq.com/iphone/m/view/52292e0c763fd027c6eba6b8f494d2eb.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=38
fetching http://xw.qq.com/iphone/m/caijingguancha/dabd8d2ce74e782c65a973ef76fd540b.html
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=37
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=37
fetching http://xw.qq.com/iphone/m/tmtdecode/a3fb4fbf9a6f9cf09166aa9c20cbc1ad.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=36
fetching http://xw.qq.com/c/finance/20140523017080/FIN2014052301708006
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=35
fetching http://xw.qq.com/c/finance/20140523012251/FIN2014052301225102
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=34
fetching http://pnewsapp.tc.qq.com/newsapp_ls/0/17965405_150110/0
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=33
fetching http://xw.qq.com/c/yc/820140523/tmtdecode
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=32
fetching http://xw.qq.com/c/finance/20140523008356/FIN2014052300835605
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=31
fetching http://xw.qq.com/c/news/20140523024749/NEW2014052302474905
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=30
fetching http://xw.qq.com/c/zt/2014051001576635/FIN2014051001576635
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=29
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=29
fetching http://xw.qq.com/c/zt/201405210601200/FIN201405210601200T
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=28
fetching http://xw.qq.com/c/news/20140523013685/NEW2014052301368505
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=27
fetching http://xw.qq.com/c/zt/201403030155684/SPO20140303015568J4
-activeThreads=4, spinWaiting=2, fetchQueues.totalSize=26
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=26
fetching http://xw.qq.com/c/yc/520140523/guiquan
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=25
fetching http://xw.qq.com/m/shehui/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=24
fetching http://xw.qq.com/c/zhibo/10000300/FIN201405230099130A
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=23
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=23
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=23
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=23
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=23
fetching http://xw.qq.com/c/news/20140523026482/NEW2014052302648202
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=22
fetching http://xw.qq.com/iphone/m/dy/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=21
fetching http://xw.qq.com/c/ent/20140523001154/ENT2014052300115404
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=20
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=20
fetching http://xw.qq.com/c/yc/220140523/view
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=19
fetching http://xw.qq.com/c/ent/20140523013234/ENT2014052301323407
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=18
fetching http://xw.qq.com/c/sports/20140523020282/SPO2014052302028205
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=17
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=17
fetching http://xw.qq.com/c/finance/20140523009486/FIN2014052300948601
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=16
fetching http://xw.qq.com/c/news/20140523016639/NEW2014052301663902
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=15
fetching http://xw.qq.com/c/sports/20140523006605/SPO2014052300660502
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=14
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=14
fetching http://xw.qq.com/c/zt/201405160326790/FIN201405160326790Q
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=13
fetching http://xw.qq.com/c/news/20140523008556/NEW2014052300855606
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=12
fetching http://xw.qq.com/iphone/m/guiquan/7d771e0e8f3633ab54856925ecdefc5d.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=11
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=11
fetching http://xw.qq.com/c/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=10
fetching http://xw.qq.com/c/finance/20140516032679/FIN201405160326790Q
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=9
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=9
fetching http://xw.qq.com/c/ent/20140523000360/ENT2014052300036003
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=8
fetching http://xw.qq.com/iphone/m/sports/nba/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=7
fetching http://xw.qq.com/m/
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=6
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=6
fetching http://xw.qq.com/c/yc/1120140523/caijingguancha
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=5
fetching http://xw.qq.com/c/ent/20140523007189/ENT2014052300718904
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=4
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826265915
  now           = 1400826265267
  0. http://xw.qq.com/c/news/20140523024872/NEW2014052302487201
  1. http://xw.qq.com/c/news/20140523022774/NEW201405230227740H
  2. http://xw.qq.com/c/news/20140523026425/NEW2014052302642501
  3. http://xw.qq.com/m/shijiebei/
fetching http://xw.qq.com/c/news/20140523024872/NEW2014052302487201
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=3
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826267201
  now           = 1400826266269
  0. http://xw.qq.com/c/news/20140523022774/NEW201405230227740H
  1. http://xw.qq.com/c/news/20140523026425/NEW2014052302642501
  2. http://xw.qq.com/m/shijiebei/
fetching http://xw.qq.com/c/news/20140523022774/NEW201405230227740H
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=2
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826267201
  now           = 1400826267269
  0. http://xw.qq.com/c/news/20140523026425/NEW2014052302642501
  1. http://xw.qq.com/m/shijiebei/
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=2
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826268467
  now           = 1400826268271
  0. http://xw.qq.com/c/news/20140523026425/NEW2014052302642501
  1. http://xw.qq.com/m/shijiebei/
fetching http://xw.qq.com/c/news/20140523026425/NEW2014052302642501
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=1
* queue: http://xw.qq.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826269716
  now           = 1400826269272
  0. http://xw.qq.com/m/shijiebei/
fetching http://xw.qq.com/m/shijiebei/
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-05-23 14:24:32, elapsed: 00:01:00
CrawlDb update: starting at 2014-05-23 14:24:32
CrawlDb update: db: localweb/crawldb
CrawlDb update: segments: [localweb/segments/20140523142330]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-05-23 14:24:34, elapsed: 00:00:02
Generator: starting at 2014-05-23 14:24:34
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: localweb/segments/20140523142436
Generator: finished at 2014-05-23 14:24:37, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2014-05-23 14:24:37
Fetcher: segment: localweb/segments/20140523142436
Fetcher: threads: 4
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://search-lucene.com/nutch
fetching http://lenya.apache.org/
fetching http://gump.apache.org/
fetching http://zookeeper.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=46
fetching http://repo1.maven.org/maven2/org/apache/gora/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=45
fetching http://gora.apache.org/downloads.html
fetching http://cassandra.apache.org/
fetching http://hadoop.apache.org/releases.html
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=42
fetching http://www.isi.edu/~koehn/europarl/
fetching http://lucene.apache.org/solr/books.html
fetching http://hive.apache.org/
fetching http://openmeetings.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=38
fetching http://s.apache.org/LPB
fetching http://giraph.apache.org/
fetching http://projects.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=35
fetching http://mahout.apache.org/
fetching http://www.eu.apachecon.com/c/aceu2009/
fetch of http://www.eu.apachecon.com/c/aceu2009/ failed with: java.net.UnknownHostException: www.eu.apachecon.com
fetching http://accumulo.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=32
fetching http://avro.apache.org/
fetching http://www.apache.org/licenses/LICENSE-2.0
fetching http://sqt.gtimg.cn/
fetching http://people.apache.org/committers-by-project.html
fetching http://imgcache.qq.com/ihome.qzone.qq.com
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=27
fetching http://incubator.apache.org/
fetching http://i.gtimg.cn/ihome.qzone.qq.com
fetching http://hbase.apache.org/
fetching http://ace.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=23
fetching http://mat1.gtimg.com/www/mobi/js/vote.v1.0.js
fetching http://cocoon.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=21
fetching http://wiki.apache.org/nutch/FrontPage
fetching http://nutch.apache.org/apidocs-2.2.1/allclasses-frame.html
fetching http://abdera.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=18
fetching http://tika.apache.org/download.html
fetching http://mat1.gtimg.com/www/mobi/js/template.min.js
fetching http://pig.apache.org/
fetching http://url.cn/JVoGlc
fetching http://activemq.apache.org/
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=13
fetching http://ant.apache.org/
fetching http://www.apache.org/dyn/closer.cgi
fetching http://mat1.gtimg.com/www/mobi/js/zepto.min.js
-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=10
fetching http://nutch.apache.org/apidocs-1.8/overview-frame.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=9
fetching http://mat1.gtimg.com/www/mobi/js/article.v2.js
fetching http://nutch.apache.org/apidocs-2.2.1/overview-frame.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=7
fetching http://mat1.gtimg.com/www/mobi/js/QMobiFoot.js
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=6
fetching http://nutch.apache.org/apidocs-1.8/allclasses-frame.html
fetching http://mat1.gtimg.com/www/mobi/js/touch.min.js
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=4
* queue: http://www.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 4000
  minCrawlDelay = 0
  nextFetchTime = 1400826294822
  now           = 1400826294043
  0. http://www.apache.org/foundation/
  1. http://www.apache.org/licenses/LICENSE-2.0.html
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826293413
  now           = 1400826294045
  0. http://nutch.apache.org/menu_selected_1.2
  1. http://nutch.apache.org/menu_selected_1.3
fetching http://www.apache.org/foundation/
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=3
* queue: http://www.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 4000
  minCrawlDelay = 0
  nextFetchTime = 1400826294822
  now           = 1400826295046
  0. http://www.apache.org/licenses/LICENSE-2.0.html
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826295239
  now           = 1400826295047
  0. http://nutch.apache.org/menu_selected_1.2
  1. http://nutch.apache.org/menu_selected_1.3
fetching http://nutch.apache.org/menu_selected_1.2
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=2
* queue: http://www.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 4000
  minCrawlDelay = 0
  nextFetchTime = 1400826299973
  now           = 1400826296050
  0. http://www.apache.org/licenses/LICENSE-2.0.html
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 1000
  minCrawlDelay = 0
  nextFetchTime = 1400826296808
  now           = 1400826296051
  0. http://nutch.apache.org/menu_selected_1.3
fetching http://nutch.apache.org/menu_selected_1.3
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=1
* queue: http://www.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 4000
  minCrawlDelay = 0
  nextFetchTime = 1400826299973
  now           = 1400826297052
  0. http://www.apache.org/licenses/LICENSE-2.0.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=1
* queue: http://www.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 4000
  minCrawlDelay = 0
  nextFetchTime = 1400826299973
  now           = 1400826298054
  0. http://www.apache.org/licenses/LICENSE-2.0.html
-activeThreads=4, spinWaiting=4, fetchQueues.totalSize=1
* queue: http://www.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 4000
  minCrawlDelay = 0
  nextFetchTime = 1400826299973
  now           = 1400826299055
  0. http://www.apache.org/licenses/LICENSE-2.0.html
fetching http://www.apache.org/licenses/LICENSE-2.0.html
-activeThreads=4, spinWaiting=3, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2014-05-23 14:25:02, elapsed: 00:00:25
CrawlDb update: starting at 2014-05-23 14:25:02
CrawlDb update: db: localweb/crawldb
CrawlDb update: segments: [localweb/segments/20140523142436]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2014-05-23 14:25:06, elapsed: 00:00:03
LinkDb: starting at 2014-05-23 14:25:06
LinkDb: linkdb: localweb/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523135555
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523140148
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523140159
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523135939
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523142330
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523135546
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523142436
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523140007
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523142116
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523135609
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523140136
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523142238
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523135953
LinkDb: adding segment: file:/home/test/programs/nutch-1.2/localweb/segments/20140523142018
LinkDb: merging with existing linkdb: localweb/linkdb
LinkDb: finished at 2014-05-23 14:25:12, elapsed: 00:00:06
Deleting old indexes: localweb/indexes
Deleting old merged index: localweb/index
Indexer: starting at 2014-05-23 14:25:12
Indexer: finished at 2014-05-23 14:25:25, elapsed: 00:00:12
Dedup: starting at 2014-05-23 14:25:25
Dedup: adding indexes in: localweb/indexes
Dedup: finished at 2014-05-23 14:25:28, elapsed: 00:00:03
IndexMerger: starting at 2014-05-23 14:25:28
IndexMerger: merging indexes to: localweb/index
Adding file:/home/test/programs/nutch-1.2/localweb/indexes/part-00000
IndexMerger: finished at 2014-05-23 14:25:30, elapsed: 00:00:01
crawl finished: localweb
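
A log of this length is hard to scan by eye, so a short script can tally it. The following is a minimal sketch, not part of Nutch: the function name `summarize` and the two regular expressions are assumptions based only on the `fetching ...` and `fetch of ... failed with: ...` line shapes visible in the output above. It counts fetch attempts and groups failures by the leading part of the reason (the exception class or the HTTP code).

```python
import re
from collections import Counter

# Line shapes assumed from the log above; other Nutch versions may differ.
FETCH_RE = re.compile(r"^fetching (\S+)")
FAIL_RE = re.compile(r"^fetch of (\S+) failed with: (.+)$")


def summarize(lines):
    """Return (number of fetch attempts, Counter of failure kinds)."""
    attempts = 0
    failures = Counter()
    for raw in lines:
        line = raw.strip()
        if FETCH_RE.match(line):
            attempts += 1
            continue
        m = FAIL_RE.match(line)
        if m:
            # Keep only the text before the first ':' or ',' so that
            # "java.net.SocketException: Connection reset" and
            # "Http code=403, url=..." are grouped by kind, not by URL.
            kind = re.split(r"[:,]", m.group(2), maxsplit=1)[0].strip()
            failures[kind] += 1
    return attempts, failures


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "fetching http://www.eu.apachecon.com/c/aceu2009/",
        "fetch of http://www.eu.apachecon.com/c/aceu2009/ failed with: "
        "java.net.UnknownHostException: www.eu.apachecon.com",
        "fetching http://accumulo.apache.org/",
        "-activeThreads=4, spinWaiting=0, fetchQueues.totalSize=32",
    ]
    attempts, failures = summarize(sample)
    print("attempts:", attempts)
    for kind, count in failures.most_common():
        print(kind, count)
```

To run it on a whole crawl, the output could be saved to a file first (for example by redirecting the `bin/nutch crawl` command) and the open file object passed to `summarize`.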
