Installing pyspider on Linux (CentOS 7) [Summary Version]

 

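Over the National Day holiday I rented a new Alibaba Cloud server and had to install pyspider on it for crawling, but the installation did not go smoothly. This post records the process and how I resolved each error.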

 

[1] First make sure pip is installed on the system. If it is not, search online for detailed instructions; here are just the commands I used:

  

       wget https://pypi.python.org/packages/source/p/pip/pip-7.1.2.tar.gz#md5=3823d2343d9f3aaab21cf9c917710196
       tar -xvf pip-7.1.2.tar.gz
       cd pip-7.1.2
       python setup.py install
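
Before building pip from the tarball, it can be worth checking whether pip is already on the PATH. The sketch below is only an illustration; the helper names `have_cmd` and `pip_status` are made up for this post and are not part of pip.

```shell
# Return success if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether pip is available; the source install above is only
# needed in the "missing" case.
pip_status() {
    if have_cmd pip; then
        echo "pip found: $(pip --version 2>/dev/null)"
    else
        echo "pip missing: install it from the source tarball"
    fi
}

pip_status
```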

 

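[2] With pip in place, pyspider can be installed directly. Run: pip install pyspider

       It failed! Each error I ran into is recorded below:

(1) Error 1: pip itself had a problem, and installing Flask failed, as follows: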

[root@iZ28jyxu47dZ fancy]# pip install pyspider
Collecting pyspider
/usr/lib/python2.7/site-packages/pip-7.1.2-py2.7.egg/pip/_vendor/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading pyspider-0.3.5.tar.gz (94kB)
    100% |████████████████████████████████| 98kB 41kB/s
Collecting Flask>=0.10 (from pyspider)
  Downloading Flask-0.10.1.tar.gz (544kB)
    15% |████▉                           | 81kB 250bytes/s eta 0:30:44
  Hash of the package https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz#md5=378670fe456957eb3c27ddaef60b2b24 (from https://pypi.python.org/simple/flask/) (e11c5569eb68d582ce1c85154b9b48c9) doesn't match the expected hash 378670fe456957eb3c27ddaef60b2b24!
Bad md5 hash for package https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz#md5=378670fe456957eb3c27ddaef60b2b24 (from https://pypi.python.org/simple/flask/)

The cause is that urllib3 could not set up its SSL connection properly. The fix is to install the dependencies it needs.

References: http://blog.csdn.net/henulwj/article/details/48131393

                 https://www.phodal.com/blog/python-pip-openssl-issue/
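
The InsecurePlatformWarning text says that "a true SSLContext object is not available". One way to see whether that applies to a given interpreter is to ask Python's ssl module directly; upstream Python 2 only gained `ssl.SSLContext` in 2.7.9. This is a rough sketch, and the function name `has_sslcontext` is invented here for illustration.

```shell
# Print "yes" if the given interpreter (default: python) exposes
# ssl.SSLContext, otherwise "no". A missing interpreter also yields "no".
has_sslcontext() {
    py=${1:-python}
    if "$py" -c "import ssl, sys; sys.exit(0 if hasattr(ssl, 'SSLContext') else 1)" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

has_sslcontext
```

If this prints "no", pip's bundled urllib3 will probably keep warning until the packages in the commands that follow are installed.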

Commands:

               

yum install python-devel libffi-devel openssl-devel  
pip install pyopenssl ndg-httpsclient pyasn1

(Note: Ubuntu does not have yum; use apt-get there instead, where the package names may also differ.)
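
To make the yum/apt-get difference concrete, here is a small sketch. The yum package names are the ones used above; the apt-get names (python-dev, libffi-dev, libssl-dev) are the usual Debian/Ubuntu equivalents for Python 2 and are my assumption, since I only tested on CentOS. The helper names `deps_for` and `detect_pm` are hypothetical.

```shell
# Print the build dependencies for the given package manager.
# yum names come from this post; apt-get names are assumed equivalents.
deps_for() {
    case "$1" in
        yum)     echo "python-devel libffi-devel openssl-devel" ;;
        apt-get) echo "python-dev libffi-dev libssl-dev" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the first package manager found on PATH; fail if neither exists.
detect_pm() {
    for candidate in yum apt-get; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (as root; commented out so nothing is installed by accident):
# pm=$(detect_pm) && "$pm" install -y $(deps_for "$pm")
```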


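Even after this, pip install pyspider still does not go through, because the step above only gets rid of the InsecurePlatformWarning that pip prints.

Flask still would not install, so at this point the only option was to install Flask by hand. I also took a detour searching for "bad md5 hash for package"; I will not reproduce that here.

Reference: http://www.169it.com/tech-python/article-539019800.html After installing the relevant packages, running: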

            

easy_install flask 

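          installed Flask successfully. This suggests that once pip install flask has failed, a retry just reads the package from pip's cache, so it keeps failing even after the dependencies are in place. In that situation, installing with easy_install may be a good alternative.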

 

[3] Run pip install pyspider again.

Everything went smoothly until lxml failed to install. Here are the key parts of the error output:

#Message 1#:

Installing collected packages: chardet, cssselect, lxml, pyquery, requests, certifi, tornado, Flask-Login, u-msgpack-python, click, pyspider
  Found existing installation: chardet 2.0.1
    DEPRECATION: Uninstalling a distutils installed project (chardet) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling chardet-2.0.1:
      Successfully uninstalled chardet-2.0.1
  Running setup.py install for chardet
  Running setup.py install for cssselect
  Running setup.py install for lxml
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-dtraef/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5Tyn0R-record/install-record.txt --single-version-externally-managed --compile:
    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'bugtrack_url'
      warnings.warn(msg)
    Building lxml version 3.4.4.
    Building without Cython.
    ERROR: /bin/sh: xslt-config: command not found

    ** make sure the development packages of libxml2 and libxslt are installed **

#Message 2#:

gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/tmp/pip-build-dtraef/lxml/src/lxml/includes -I/usr/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.linux-x86_64-2.7/src/lxml/lxml.etree.o -w
    In file included from src/lxml/lxml.etree.c:239:0:
    /tmp/pip-build-dtraef/lxml/src/lxml/includes/etree_defs.h:14:31: fatal error: libxml/xmlversion.h: No such file or directory
     #include "libxml/xmlversion.h"
                                   ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-dtraef/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-5Tyn0R-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-dtraef/lxml

 

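Message 1 shows that all of pyspider's dependencies had been downloaded and the install phase had started; chardet and cssselect installed successfully, and lxml was the one that failed. Judging from the error text, the libxml2 and libxslt development packages are probably missing. Message 2 suggests that gcc might be the problem instead. I started with Message 2: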

Reference: http://hxl2009.blog.51cto.com/779549/980421

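      So I reinstalled python-devel and gcc with yum install python-devel gcc. Running pip install lxml afterwards still produced the same output, which means the real problem had to be the one in Message 1. (I am really grateful that install logs like this exist; otherwise I would have had no idea where things went wrong!)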
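
Each of the two messages names something that is missing, and on CentOS each one is shipped by a -devel package. The sketch below maps only the two error strings quoted in this post to the yum packages that provide them; the function name `suggest_lxml_deps` is made up for illustration and the mapping is not meant to be complete.

```shell
# Read a pip build log on stdin and print the yum packages that the
# known error messages point to (one per line, nothing if no match).
suggest_lxml_deps() {
    log=$(cat)
    case "$log" in
        *"xslt-config: command not found"*) echo "libxslt-devel" ;;
    esac
    case "$log" in
        *"libxml/xmlversion.h: No such file or directory"*) echo "libxml2-devel" ;;
    esac
}

# Usage: pip install lxml 2>&1 | suggest_lxml_deps
```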

References: http://stackoverflow.com/questions/5178416/pip-install-lxml-error

                 http://blog.csdn.net/azhao_dn/article/details/7501432

Run:

             

 yum install libxslt-devel libxml2-devel

Then run:

            

 pip install lxml

The installation succeeded!

 

[4] At this point, running pip install pyspider again finally succeeds!!! Enjoy your crawling!
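
As a quick sanity check after the install, you can confirm that the packages that caused trouble actually import. This is just a sketch; `check_modules` is a helper name invented for this post, and it assumes `python` is the interpreter that pip installed into.

```shell
# Print "<module>: ok" or "<module>: missing" for each module name given.
check_modules() {
    for mod in "$@"; do
        if python -c "import $mod" >/dev/null 2>&1; then
            echo "$mod: ok"
        else
            echo "$mod: missing"
        fi
    done
}

check_modules flask lxml pyspider
```

Any line that says "missing" points back to the step in this post that installs that package.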

 

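If you want to see the full details of my installation process, see this other post of mine:

      Installing pyspider on Linux: the detailed process and commands [Unsummarized Version]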

 

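[Summary]

     1. Pay close attention to the messages printed during installation; they are the clues for tracking down bugs.

     2. Try to understand the underlying principles and what each command does; otherwise you may end up installing a pile of useless things on your system.

     3. You can look up an error by searching for the command that produced it, which seems more efficient and more targeted.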

 

 

 

 

 

 
