Downloading, installing and using pyspider

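pyspider is a powerful web crawler system written in Python. From a browser-based interface you can write crawl scripts, schedule them, and watch crawl results in real time. On the back end it stores results in common databases, and it also supports scheduled tasks and task priorities.
First, the GitHub repository: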

https://github.com/binux/pyspider

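Download and installation:
1. Install Python or Anaconda, and add it to the PATH environment variable (both the installation directory and its Scripts directory).
2. In the command prompt (cmd), run pip install pyspider to install pyspider.
3. In the command prompt, run pip install pycurl to install pycurl.
4. In the command prompt, run pyspider all to start pyspider.
5. Open http://localhost:5000/ in a browser. If the page loads, pyspider has been installed successfully.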
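Steps 3 to 5 can also be checked from Python. This is a minimal sketch, assuming Python 3 and the default web UI address http://localhost:5000/ from step 5; missing_modules and webui_is_up are hypothetical helper names, not part of pyspider.

```python
import importlib.util
import urllib.request


def missing_modules(names):
    """Return the module names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def webui_is_up(url="http://localhost:5000/", timeout=3.0):
    """Return True if something answers at `url` with a 2xx/3xx HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    print("missing modules:", missing_modules(["pyspider", "pycurl"]))
    print("web UI reachable:", webui_is_up())
```

Run it after pyspider all has started; an empty "missing modules" list and True for the web UI correspond to a successful step 5.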

Usage:
Using CSS selectors

Problem 1: running pyspider all fails with the following error:

[I 180629 07:08:29 result_worker:49] result_worker starting...
[I 180629 07:08:31 processor:211] processor starting...
[I 180629 07:08:31 tornado_fetcher:638] fetcher starting...
[I 180629 07:08:31 scheduler:675] scheduler starting...
[I 180629 07:08:31 scheduler:614] in 5m: new:0,success:0,retry:0,failed:0
[I 180629 07:08:31 scheduler:810] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 180629 07:08:32 app:84] webui exiting...
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.6.5/bin/pyspider", line 11, in 
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 754, in main
    cli()
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 497, in all
    ctx.invoke(webui, **webui_config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/run.py", line 384, in webui
    app.run(host=host, port=port)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/app.py", line 59, in run
    from .webdav import dav_app
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider-0.3.10-py3.6.egg/pyspider/webui/webdav.py", line 216, in 
    dav_app = WsgiDAVApp(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 122, in __init__
    _check_config(config)
  File "/root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/WsgiDAV-3.0.0a2-py3.6.egg/wsgidav/wsgidav_app.py", line 104, in _check_config
    raise ValueError("Invalid configuration:\n  - " + "\n  - ".join(errors))
ValueError: Invalid configuration:
  - Deprecated option 'dir_browser.enable': use 'middleware_stack' instead.
  - Deprecated option 'domaincontroller': use 'domain_controller' instead.

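Solution:
This error appears because WsgiDAV has released a 3.x version whose configuration options (see the deprecated 'dir_browser.enable' and 'domaincontroller' options above) no longer match what pyspider passes in. For now, switching back to a 2.x release fixes it. Uninstall and reinstall from the command line: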

pip uninstall wsgidav  
python -m pip install wsgidav==2.4.1
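To confirm which WsgiDAV version is installed after the downgrade, a small check can be run. This sketch assumes Python 3.8+ (for importlib.metadata), which is newer than the Python 3.6.5 in the traceback; on older interpreters pip show wsgidav reports the same version. major_version and wsgidav_status are hypothetical helper names.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string, e.g. '3.0.0a2' -> 3."""
    return int(version.split(".")[0])


def wsgidav_status():
    """Return (installed_version, below_3) or None if WsgiDAV is not installed.

    below_3 is True for any release older than 3.0, matching the downgrade advice.
    """
    try:
        version = metadata.version("WsgiDAV")
    except metadata.PackageNotFoundError:
        return None
    return version, major_version(version) < 3


print(wsgidav_status())
```

After python -m pip install wsgidav==2.4.1 this should print ('2.4.1', True).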

Problem 2:
HTTP 599: Could not resolve host: START_URL
Solution: curl does not support Windows 7 and similar systems; use Windows 8.1 or later, or Linux, instead.
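The host that curl failed to resolve in this message is the literal string START_URL. A quick way to see whether a URL passed to self.crawl contains a resolvable host is sketched below; resolvable_host is a hypothetical helper built on the standard library, not a pyspider API.

```python
import socket
from urllib.parse import urlsplit


def resolvable_host(url):
    """Return (host, resolves) for `url`.

    host is None when the URL has no scheme/host part (e.g. a bare placeholder
    string); resolves is True only if DNS lookup of host succeeds.
    """
    host = urlsplit(url).hostname
    if not host:
        return None, False
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return host, False
    return host, True


print(resolvable_host("START_URL"))
print(resolvable_host("http://localhost/"))
```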

