Python Crawler Troubleshooting Notes (continuously updated)

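@The slave node of a distributed crawler cannot find scrapy_redis:

  • When running the slave, use: sudo scrapy crawl spidername, or sudo scrapy runspider mycrawler_redis.py; in short, run it with sudo;
  • Without sudo it reports that the module cannot be found, which is a real pain; a likely cause is that scrapy_redis was installed with sudo into a Python environment that the normal user's scrapy does not use;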
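
To see why sudo makes a difference, it can help to check which interpreter is running and whether that interpreter can locate the module. The following is a minimal diagnostic sketch using only the standard library; the function name module_report is an illustrative choice, not part of Scrapy or scrapy_redis.

```python
import importlib.util
import sys


def module_report(module_name="scrapy_redis"):
    """Describe where (or whether) this interpreter can find a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,   # path of the Python binary in use
        "module": module_name,
        "found": spec is not None,       # False means an import would fail
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in module_report().items():
        print(f"{key}: {value}")
```

Running the script once as the normal user and once with sudo, then comparing the two reports, should show whether the two runs use different interpreters or different site-packages directories. If they do, installing scrapy_redis into the environment the normal user actually runs is probably a cleaner fix than running the crawler as root.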

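@The distributed crawler is refused when connecting to a remote redis:

  • Error: redis.exceptions.ResponseError: DENIED Redis is running in protected mode…:
  • Fix: https://www.cnblogs.com/nzbbody/p/6389619.html (protected mode rejects non-local clients when the server has neither a bind address nor a password configured)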
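
One of the remedies the Redis error message itself suggests is setting an authentication password. Assuming the master's redis.conf sets requirepass and the server is reachable from the slave, the slave can pass the password through scrapy-redis's REDIS_URL setting. The sketch below builds such a URL; build_redis_url is a hypothetical helper, and the host and password are placeholders.

```python
from urllib.parse import quote


def build_redis_url(host, port=6379, password=None, db=0):
    """Build a redis:// URL, percent-encoding the password if one is given."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


# In the slave's settings.py (placeholder host and password):
REDIS_URL = build_redis_url("192.0.2.10", password="change-me")
```

Percent-encoding matters when the password contains characters such as @ or /, which would otherwise be misread as URL delimiters.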

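@The crawler reports a connection lost error

  • Error: twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.
  • The site's anti-crawling measures have kicked in; configure request headers or an IP proxy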
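
A common way to set headers and a proxy in Scrapy is a small downloader middleware. The sketch below is one possible shape, not a definitive implementation: the class name, the User-Agent strings, and the proxy addresses are all placeholders introduced for illustration.

```python
import random

# Placeholder values; substitute real browser strings and working proxies.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
    "Gecko/20100101 Firefox/121.0",
]
PROXIES = ["http://192.0.2.20:8080", "http://192.0.2.21:8080"]


class RandomHeaderProxyMiddleware:
    """Hypothetical downloader middleware: rotate User-Agent and proxy."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Scrapy's built-in HttpProxyMiddleware honours request.meta["proxy"].
        request.meta["proxy"] = random.choice(PROXIES)
        return None  # let Scrapy continue processing the request
```

To take effect, the class would need to be registered under DOWNLOADER_MIDDLEWARES in settings.py, for example with priority 543 under a module path such as myproject.middlewares (the path depends on the project layout). Lowering the request rate with DOWNLOAD_DELAY may also help.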

@Installing the Chrome browser on Ubuntu 16:

  • http://www.linuxidc.com/Linux/2016-05/131096.htm

@Installing chromedriver and phantomjs:

  • https://www.cnblogs.com/Lin-Yi/p/7658001.html
  • Now that Chrome supports headless mode, phantomjs is outdated and no longer really worth learning
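
For reference, headless Chrome can be driven from Selenium roughly as follows. This is a minimal sketch assuming the selenium package is installed and a matching chromedriver is discoverable; fetch_title and HEADLESS_ARGS are illustrative names, and the window size is an arbitrary choice.

```python
HEADLESS_ARGS = ["--headless", "--disable-gpu", "--window-size=1280,800"]


def fetch_title(url):
    """Open url in headless Chrome and return the page title."""
    # Imported here so the module loads even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser process
```

Calling fetch_title with a page URL should return that page's title without opening a visible window.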

@The chromedriver version must match the Chrome version, otherwise an invalid context error is raised (Runtime.executionContextCreated has invalid 'context'):

  • http://blog.csdn.net/c08762/article/details/70339587
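
A quick way to spot a mismatch is to compare the major version numbers printed by google-chrome --version and chromedriver --version. The sketch below does this, assuming a recent chromedriver whose version number tracks Chrome's; older 2.x chromedriver releases used a separate compatibility table, so this check does not apply to them. The function names are illustrative.

```python
import re


def major_version(version_output):
    """Return the leading major version number found in a --version string."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

For example, versions_match("Google Chrome 114.0.5735.90", "ChromeDriver 114.0.5735.90") returns True; the strings here are illustrative samples of the two commands' output.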
