Without further ado, let's get straight to it.
I. Install the development tools group and update the system
yum groupinstall "Development Tools" -y
yum update -y
Notes:
1. If the Python on your system is older than 2.7, upgrade it to 2.7 or later first (Scrapy requires Python 2.7 or later).
# Download Python 2.7
wget http://python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
# Extract
tar -jxvf Python-2.7.3.tar.bz2
cd Python-2.7.3
# Build and install
./configure
make all
make install
make clean
make distclean
# Check the Python version
/usr/local/bin/python2.7 -V
# Create a symlink so that the system default python points to python2.7
mv /usr/bin/python /usr/bin/python2.6.6
ln -s /usr/local/bin/python2.7 /usr/bin/python
# Once /usr/bin/python points to Python 2.7, yum stops working: it relies on modules installed only for the system Python 2.6.
# Pin yum to the old interpreter: edit /usr/bin/yum and change its first line from #!/usr/bin/python to #!/usr/bin/python2.6.6
vim /usr/bin/yum
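2. I strongly recommend upgrading to Python 2.7 before installing pip and setuptools. A pip installed under the old Python installs packages for that old interpreter, and getting the order wrong causes all sorts of baffling problems that will keep you up until dawn!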
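3. If you upgrade to Python 2.7, you will most likely end up building and installing everything from source with python setup.py. The packages you need include, but are not limited to:
lxml, zope.interface, Twisted, characteristic, pyasn1-modules, service-identity, Scrapy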
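A quick way to see whether note 1 applies to you is to compare the interpreter's version against 2.7 from the shell. The sketch below is not part of the original procedure: version_ge and python_version are hypothetical helper names, and the comparison assumes GNU sort -V, which CentOS provides.

```shell
#!/bin/sh
# Hypothetical helpers for note 1: is the system Python already >= 2.7?

# version_ge A B: succeed (exit status 0) when version A >= version B.
# Relies on GNU coreutils "sort -V" for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# python_version [INTERPRETER]: print X.Y.Z for the given interpreter
# (default "python"); the one-liner runs under both Python 2 and Python 3.
python_version() {
    "${1:-python}" -c 'import sys; print(".".join(str(n) for n in sys.version_info[:3]))'
}

# Example:
#   if version_ge "$(python_version)" 2.7; then echo "no upgrade needed"
#   else echo "upgrade to Python 2.7 first"; fi
```

With the Python 2.6.6 that ships with CentOS 6, the example takes the second branch.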
PS: I started out building from source, and the most frequent problem was:
error:command 'gcc' failed with exit status 1
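Eventually I realized that this error almost always means either a missing -devel package or a missing library. What left me not knowing whether to laugh or cry was that Scrapy reported a successful installation, yet I could not create a project or even run the test examples. In the end I gave up without hesitation and switched to CentOS 7!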
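When gcc fails like this, the real cause usually appears a few lines earlier in the build output, in a line such as "fatal error: ffi.h: No such file or directory". The hypothetical helper below pulls those missing header names out of a saved build log, so you can look up which -devel package provides each one. The function name and the log file name build.log are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the header files gcc reported as missing in a
# saved build log (one per line, duplicates removed).
missing_headers() {
    grep -o 'fatal error: [^:]*: No such file or directory' "$1" \
        | sed -e 's/^fatal error: //' -e 's/: No such file or directory$//' \
        | sort -u
}

# Example: save the build output, then ask yum which package ships each header:
#   python setup.py install > build.log 2>&1
#   missing_headers build.log | while read -r h; do yum provides "*/$h"; done
```

A missing ffi.h, for instance, points to libffi-devel.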
################### Everything below was done on CentOS 7; if you upgraded to Python 2.7, skip this part ##############
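II. Create /etc/yum.repos.d/rpmforge.repo (for example with vim) to add the rpmforge repository, then install libffi-devel from it [without this repository, yum install libffi-devel reports that the package cannot be found]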
[rpmforge]
name = Red Hat Enterprise $releasever - RPMforge.net - dag
#baseurl = http://apt.sw.be/redhat/el5/en/$basearch/dag
mirrorlist = http://apt.sw.be/redhat/el7/en/mirrors-rpmforge
#mirrorlist = file:///etc/yum.repos.d/mirrors-rpmforge
enabled = 1
protect = 0
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-rpmforge-dag
gpgcheck = 1
rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
yum install libffi-devel -y
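To confirm that the repository file was saved with the intended settings (for example enabled = 1 and gpgcheck = 1), a small awk helper can read a single key from a single section. This is a hypothetical sketch: the name repo_get is made up, and it only handles the simple "key = value" layout used in the rpmforge file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY in [SECTION] of a yum .repo file.
# Usage: repo_get FILE SECTION KEY
# Handles only the simple "key = value" layout; commented lines are ignored.
repo_get() {
    awk -v section="$2" -v key="$3" '
        # Work on a copy of the line with surrounding whitespace trimmed.
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }
        # Skip blank lines and comments such as "#baseurl = ...".
        line == "" || line ~ /^#/ { next }
        # A "[name]" line starts a new section; remember whether it is ours.
        substr(line, 1, 1) == "[" && substr(line, length(line), 1) == "]" {
            in_section = (substr(line, 2, length(line) - 2) == section)
            next
        }
        # Inside our section, split on the first "=" and trim both halves.
        in_section && index(line, "=") > 0 {
            eq = index(line, "=")
            k = substr(line, 1, eq - 1)
            v = substr(line, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Example (prints 1 when the repository is enabled):
#   repo_get /etc/yum.repos.d/rpmforge.repo rpmforge enabled
```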
III. If the audit package is installed on your system, remove it first; it interferes with the Scrapy installation.
yum remove audit
IV. Install the development packages Scrapy needs
yum install -y python-devel openssl-devel libxslt-devel libxml2-devel
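When a build still fails after this step, it helps to know which of these packages is actually absent. The sketch below asks rpm about each package and prints the ones it does not find. The function name and the RPM override variable are hypothetical additions (the override lets you substitute a stub for testing); rpm -q --quiet is a standard rpm query that reports mainly through its exit status.

```shell
#!/bin/sh
# Hypothetical helper: print each package from the argument list that rpm
# does not report as installed. RPM may name a replacement command
# (e.g. a stub when testing); it defaults to the real rpm.
missing_packages() {
    for pkg in "$@"; do
        "${RPM:-rpm}" -q --quiet "$pkg" || echo "$pkg"
    done
}

# Example: install only what is still missing.
#   pkgs=$(missing_packages python-devel openssl-devel libxslt-devel libxml2-devel)
#   [ -n "$pkgs" ] && yum install -y $pkgs
```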
V. Install pip and setuptools
yum install python-pip -y
pip install setuptools
pip install setuptools --upgrade
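Note 2 stresses that pip must belong to the interpreter you actually use. The output of pip --version ends with that interpreter's version in parentheses, so a sed filter can extract it. pip_python_version is a hypothetical name; the sample line in the comment illustrates pip's usual output format.

```shell
#!/bin/sh
# Hypothetical helper: read "pip --version" output on stdin and print the
# Python version pip is bound to, i.e. the "2.7" in
#   pip 7.1.0 from /usr/lib/python2.7/site-packages (python 2.7)
pip_python_version() {
    sed -n 's/.*(python \([0-9][0-9.]*\)).*/\1/p'
}

# Example:
#   [ "$(pip --version | pip_python_version)" = "2.7" ] || echo "pip is not bound to Python 2.7"
```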
VI. Install Scrapy
# pip install Scrapy
Collecting Scrapy
  Using cached Scrapy-1.0.3-py2-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): cssselect>=0.9 in /usr/lib/python2.7/site-packages (from Scrapy)
Requirement already satisfied (use --upgrade to upgrade): queuelib in /usr/lib/python2.7/site-packages (from Scrapy)
Requirement already satisfied (use --upgrade to upgrade): pyOpenSSL in /usr/lib/python2.7/site-packages (from Scrapy)
Requirement already satisfied (use --upgrade to upgrade): w3lib>=1.8.0 in /usr/lib/python2.7/site-packages (from Scrapy)
Collecting lxml (from Scrapy)
  Using cached lxml-3.4.4.tar.gz
Collecting Twisted>=10.0.0 (from Scrapy)
  Using cached Twisted-15.4.0.tar.bz2
Requirement already satisfied (use --upgrade to upgrade): six>=1.5.2 in /usr/lib/python2.7/site-packages (from Scrapy)
Collecting service-identity (from Scrapy)
  Using cached service_identity-14.0.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): cryptography>=0.7 in /usr/lib64/python2.7/site-packages (from pyOpenSSL->Scrapy)
Collecting zope.interface>=3.6.0 (from Twisted>=10.0.0->Scrapy)
  Using cached zope.interface-4.1.3.tar.gz
Collecting characteristic>=14.0.0 (from service-identity->Scrapy)
  Using cached characteristic-14.3.0-py2.py3-none-any.whl
Collecting pyasn1-modules (from service-identity->Scrapy)
  Using cached pyasn1_modules-0.0.8-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): pyasn1 in /usr/lib/python2.7/site-packages (from service-identity->Scrapy)
Requirement already satisfied (use --upgrade to upgrade): idna>=2.0 in /usr/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->Scrapy)
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->Scrapy)
Requirement already satisfied (use --upgrade to upgrade): enum34 in /usr/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->Scrapy)
Requirement already satisfied (use --upgrade to upgrade): ipaddress in /usr/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->Scrapy)
Requirement already satisfied (use --upgrade to upgrade): cffi>=1.1.0 in /usr/lib64/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->Scrapy)
Requirement already satisfied (use --upgrade to upgrade): pycparser in /usr/lib/python2.7/site-packages (from cffi>=1.1.0->cryptography>=0.7->pyOpenSSL->Scrapy)
Installing collected packages: lxml, zope.interface, Twisted, characteristic, pyasn1-modules, service-identity, Scrapy
  Running setup.py install for lxml
  Running setup.py install for zope.interface
  Running setup.py install for Twisted
Successfully installed Scrapy-1.0.3 Twisted-15.4.0 characteristic-14.3.0 lxml-3.4.4 pyasn1-modules-0.0.8 service-identity-14.0.0 zope.interface-4.1.3
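The final "Successfully installed" line lists every package pip actually installed, with its version. If you save pip's output to a file, the hypothetical helper below turns that line into one name/version pair per line, which makes it easy to record exactly what ended up on the machine. The function name and the log file name pip.log are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: turn pip's "Successfully installed A-1.0 B-2.0 ..."
# line into "name version" pairs, one per line. Each token is split at its
# LAST hyphen, so names like pyasn1-modules-0.0.8 stay intact.
installed_versions() {
    grep -o 'Successfully installed .*' "$1" \
        | sed 's/^Successfully installed //' \
        | tr ' ' '\n' \
        | sed -n 's/^\(.*\)-\([^-]*\)$/\1 \2/p'
}

# Example:
#   pip install Scrapy 2>&1 | tee pip.log
#   installed_versions pip.log | grep '^Scrapy '
```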
VII. Create a project
[root@localhost workspace]# scrapy startproject tutorial
2015-10-15 21:54:24 [scrapy] INFO: Scrapy 1.0.3 started (bot: scrapybot)
2015-10-15 21:54:24 [scrapy] INFO: Optional features available: ssl, http11
2015-10-15 21:54:24 [scrapy] INFO: Overridden settings: {}
New Scrapy project 'tutorial' created in:
    /workspace/tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com
VIII. Directory structure
[root@localhost workspace]# tree
.
└── tutorial
    ├── scrapy.cfg
    └── tutorial
        ├── __init__.py
        ├── items.py
        ├── pipelines.py
        ├── settings.py
        └── spiders
            └── __init__.py

3 directories, 6 files
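To double-check a newly generated project against this tree listing, the hypothetical function below tests for each of the six files. It assumes, as in this example, that the inner package has the same name as the project directory (tutorial/tutorial). Later Scrapy releases may generate extra files, which this check simply ignores.

```shell
#!/bin/sh
# Hypothetical helper: verify the files "scrapy startproject NAME" created.
# Usage: check_project_layout PROJECT_DIR   (e.g. /workspace/tutorial)
# Prints "missing: FILE" for each absent file; exits 1 if any are missing.
check_project_layout() {
    root=$1
    name=$(basename "$root")
    status=0
    for f in scrapy.cfg \
             "$name/__init__.py" \
             "$name/items.py" \
             "$name/pipelines.py" \
             "$name/settings.py" \
             "$name/spiders/__init__.py"; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```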
IX. Scrapy documentation and references
http://www.tuicool.com/articles/URNVV3E [building Scrapy from source; unfortunately it did not work for me]
https://scrapy-chs.readthedocs.org/zh_CN/0.24/intro/overview.html [an early Chinese translation of the Scrapy docs]
http://scrapy.org/
http://doc.scrapy.org/en/master/
X. Summary
This proved once again that Baidu really isn't much help;
Always read the official documentation; what search engines turn up is incomplete. This saves a lot of detours and unnecessary work;
When you hit a problem, think first and stay calm for 3 seconds [about as long as one Q, Three Talon Strike], then search for it;
Write up how you solved each problem; it helps both you and others.