Using scrapyd

Remote access setup

Locate the configuration file

sudo find / -name default_scrapyd.conf

The command prints the path to the configuration file:

(screenshot: scrapyd configuration file path)

Edit the configuration file. The default is bind_address = 127.0.0.1; to allow remote access, change it to bind_address = 0.0.0.0:

[scrapyd]
eggs_dir    = eggs
logs_dir    = logs
items_dir   =
jobs_to_keep = 5
dbs_dir     = dbs
max_proc    = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
# bind_address = 127.0.0.1
bind_address = 0.0.0.0
http_port   = 6800
debug       = off
runner      = scrapyd.runner
application = scrapyd.app.application
launcher    = scrapyd.launcher.Launcher
webroot     = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
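As a sanity check, the edited file can be parsed with Python's standard-library configparser to confirm the bind address took effect. This is a minimal sketch: the inline string stands in for the real default_scrapyd.conf, whose path the find command above prints.

```python
from configparser import ConfigParser

# Stand-in for the contents of default_scrapyd.conf; in practice,
# pass the path printed by the find command to parser.read() instead.
CONF = """
[scrapyd]
bind_address = 0.0.0.0
http_port   = 6800
"""

parser = ConfigParser()
parser.read_string(CONF)

print(parser.get("scrapyd", "bind_address"))  # 0.0.0.0
print(parser.getint("scrapyd", "http_port"))  # 6800
```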

Start scrapyd

scrapyd

Open a browser and visit http://<server-ip>:6800; it looks like this:

(screenshot: scrapyd web UI)

Configure the local Scrapy project

Switch to the project directory, open the scrapy.cfg file, and change # url = http://localhost:6800/ to url = http://<target-ip>:6800/. Here the target IP is 192.168.137.239; the edited file reads:

# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.io/en/latest/deploy.html

[settings]
default = meizhuang.settings

[deploy]
url = http://192.168.137.239:6800/
project = meizhuang
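An unnamed [deploy] section is what scrapyd-deploy lists under the target name default. scrapyd-client also supports multiple named targets via [deploy:targetname] sections; a sketch with a second, purely hypothetical production host:

```ini
[deploy]
url = http://192.168.137.239:6800/
project = meizhuang

# A second, named target (hypothetical host, for illustration only):
[deploy:production]
url = http://scrapyd.example.com:6800/
project = meizhuang
```

A named target is then selected on the command line, e.g. scrapyd-deploy production.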

Install scrapyd-client

pip install scrapyd-client

Copy the scrapyd-deploy file into the project root directory. After installing scrapyd-client with Anaconda on Windows, the scrapyd-deploy file is located at C:\ProgramData\Anaconda3\Scripts.

Run python scrapyd-deploy -l; the output is:

D:\project\python\meizhuang>python scrapyd-deploy -l
default              http://192.168.137.239:6800/

Deploy the spider

scrapyd-deploy <target> -p <project> --version <version>

  • target is the deploy target name configured in the [deploy] section of scrapy.cfg above.
  • project can be any name; it does not have to match the crawler project's name.
  • version is a custom version string; if omitted, it defaults to the current timestamp.

Note: do not keep unrelated .py files in the spider directory, as they will cause the deploy to fail. After a successful deploy, a setup.py file is generated in the current directory; it is safe to delete.

D:\project\python\meizhuang>python scrapyd-deploy
Packing version 1542160941
Deploying to project "meizhuang" in http://192.168.137.239:6800/addversion.json
Server response (200):
{"project": "meizhuang", "status": "ok", "version": "1542160941", "node_name": "raspberrypi", "spiders": 2}

Run a spider job

D:\project\python\meizhuang>curl http://192.168.137.239:6800/schedule.json -d project=meizhuang -d spider=gcfts
{"status": "ok", "jobid": "6a7b48b4e7b411e89e4db827eb8b4dc9", "node_name": "raspberrypi"}
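The same schedule.json call can be made from Python with only the standard library. build_schedule_request is a helper name of my own; the host, project, and spider values match the curl example above. This is a sketch: the actual POST is left commented out so the snippet runs without a live scrapyd server.

```python
from urllib.parse import urlencode

def build_schedule_request(base_url, project, spider):
    """Return the schedule.json endpoint URL and urlencoded POST body."""
    url = base_url.rstrip("/") + "/schedule.json"
    data = urlencode({"project": project, "spider": spider}).encode()
    return url, data

url, data = build_schedule_request(
    "http://192.168.137.239:6800/", "meizhuang", "gcfts"
)
print(url)   # http://192.168.137.239:6800/schedule.json
print(data)  # b'project=meizhuang&spider=gcfts'

# To actually start the job (requires a reachable scrapyd server):
# from urllib.request import urlopen
# with urlopen(url, data=data) as resp:
#     print(resp.read())  # JSON with "status" and "jobid"
```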

Run results

(screenshot: run results)
