The Scrapy Command Line for Python Crawlers (Detailed Guide)

Topic 1. Create a project

scrapy startproject testproject
#  testproject is the project name; choose any name you like

Output:


C:\Users\qs418>scrapy startproject testproject
New Scrapy project 'testproject', using template directory 'd:\\python_exe\\lib\\site-packages\\scrapy\\templates\\project', created in:
    C:\Users\qs418\testproject

You can start your first spider with:
    cd testproject
    scrapy genspider example example.com

Topic 2. Enter the project directory:

cd testproject

Topic 3. Generate a spider

scrapy genspider baidu www.baidu.com
 # generate a spider named baidu for www.baidu.com

Output:

Created spider 'baidu' using template 'basic' in module:
  testproject.spiders.baidu

Topic 4. List the available templates

scrapy genspider -l

Output:

Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

Topic 5. Specify a template

scrapy genspider -t crawl zhihu www.zhihu.com

Output:



C:\Users\qs418>scrapy genspider -t crawl zhihu www.zhihu.com
Created spider 'zhihu' using template 'crawl'

Topic 6. Other commands

crawl: run a spider inside a project, identified by its name (not its filename).
For example:

scrapy crawl zhihu

check: run contract checks on a spider to catch errors (again, takes the spider name, not a filename):

scrapy check zhihu

scrapy list: print the names of all spiders in the project
scrapy edit <spider>: edit a spider from the command line
fetch: download a URL with the Scrapy downloader and print the response body (the page source)

scrapy fetch http://www.baidu.com

Suppress logging and print only the headers:

scrapy fetch --nolog --headers http://www.baidu.com

输出结果:


C:\Users\qs418>scrapy fetch --nolog --headers http://www.baidu.com
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en
> User-Agent: Scrapy/1.5.1 (+https://scrapy.org)
> Accept-Encoding: gzip,deflate
>
< Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
< Content-Type: text/html
< Date: Thu, 02 Aug 2018 04:36:31 GMT
< Last-Modified: Mon, 23 Jan 2017 13:27:32 GMT
< Pragma: no-cache
< Server: bfe/1.0.8.18
< Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/

Disable redirects with --no-redirect:

scrapy fetch --no-redirect http://www.baidu.com

view: save the page to a file and open it in the browser, so you can inspect the page exactly as Scrapy received it; useful when debugging or testing spiders

scrapy view http://www.baidu.com

shell: open an interactive shell against a URL, with useful objects (such as request and response) already populated

scrapy shell http://www.baidu.com

parse: fetch a URL and parse it with the project's spider, printing the extracted results in a structured form; handy for quickly checking a callback
settings: get the current configuration values

scrapy settings -h
# -h prints the help message

Output:

C:\Users\qs418\quotetutorial>scrapy settings -h
Usage
=====
  scrapy settings [options]

Get settings values

Options
=======
--help, -h              show this help message and exit
--get=SETTING           print raw setting value
--getbool=SETTING       print setting value, interpreted as a boolean
--getint=SETTING        print setting value, interpreted as an integer
--getfloat=SETTING      print setting value, interpreted as a float
--getlist=SETTING       print setting value, interpreted as a list

Global Options
--------------
--logfile=FILE          log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile=FILE          write python cProfile stats to FILE
--pidfile=FILE          write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure

C:\Users\qs418\quotetutorial>

runspider: run a self-contained spider file directly, without needing a project (unlike crawl, it takes a filename)

scrapy runspider baidu.py

version: print the Scrapy version

scrapy version -v
# -v also prints the versions of Python, Twisted, and other dependencies

bench: run a quick benchmark to test how fast Scrapy can crawl on the current machine
