1. Create a new directory, e.g. dockerfile.
2. Enter the dockerfile directory and create the following three files:
Create scrapyd.conf with the following content:
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 10
finished_to_keep = 100
poll_interval = 5.0
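#bind_address = 127.0.0.1
# 0.0.0.0 listens on all interfaces, so the port published with docker run is reachable from outside the container
bind_address = 0.0.0.0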
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
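With max_proc = 0, Scrapyd does not use a fixed process limit; it multiplies max_proc_per_cpu by the number of CPUs it detects. The sketch below reproduces that calculation with configparser so you can see what limit the file above gives on a given machine. The function name effective_max_proc is hypothetical, and the fallback of 4 assumes Scrapyd's documented default for max_proc_per_cpu.

```python
import configparser
import os
from typing import Optional


def effective_max_proc(conf_text: str, cpus: Optional[int] = None) -> int:
    """Hypothetical helper: the process limit Scrapyd would derive from scrapyd.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["scrapyd"]
    max_proc = section.getint("max_proc", fallback=0)
    if max_proc:
        # A non-zero max_proc is used as-is.
        return max_proc
    # max_proc = 0: scale with the CPU count (assume 1 CPU if it cannot be detected).
    cpus = cpus or os.cpu_count() or 1
    # 4 is assumed to be Scrapyd's default for max_proc_per_cpu.
    return cpus * section.getint("max_proc_per_cpu", fallback=4)


if __name__ == "__main__":
    conf = "[scrapyd]\nmax_proc = 0\nmax_proc_per_cpu = 10\n"
    print(effective_max_proc(conf, cpus=4))  # 40
```

With the configuration above, a 4-CPU host would therefore allow up to 40 concurrent Scrapy processes.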
Create requirements.txt with the following content:
requests
selenium
beautifulsoup4
pyquery
pymysql
pymongo
redis
flask
django
scrapy
scrapyd
scrapyd-client
scrapy-redis
scrapy-splash
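Create the Dockerfile with the following content (it is best to use a PyPI mirror inside China, otherwise downloads are very slow; here I use the douban mirror):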
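# Base image with Python 3.7
FROM python:3.7
# Copy the build context (scrapyd.conf, requirements.txt) into /code
COPY . /code
WORKDIR /code
# Scrapyd reads /etc/scrapyd/scrapyd.conf at startup
COPY ./scrapyd.conf /etc/scrapyd/
EXPOSE 6800
# Install the dependencies from the douban PyPI mirror
RUN pip3 install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
CMD ["scrapyd"]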
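3. Build the image from this directory:
docker build -t scrapyd:latest .
4. Start a container from it, mapping port 6800 to the host:
docker run -d -p 6800:6800 scrapyd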
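To confirm that the container answers on the mapped port, you can query Scrapyd's daemonstatus.json endpoint, which returns a JSON object with a status field and pending/running/finished counts. A minimal standard-library sketch, assuming the container runs on the local machine (localhost:6800); daemon_status is a hypothetical name, and its opener parameter is only there so the function can be tried without a live server.

```python
import json
from urllib.request import urlopen


def daemon_status(base_url: str = "http://localhost:6800", opener=urlopen) -> dict:
    """Hypothetical helper: fetch and decode Scrapyd's daemonstatus.json."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    with opener(url, timeout=5) as resp:
        # The body is JSON such as {"status": "ok", "running": 0, ...}
        return json.load(resp)
```

Calling daemon_status() against the running container should return a dict whose "status" is "ok".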
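To use the image on other servers, push it to Docker Hub. Log in first:
docker login
Tag the image so it can be identified in your repository (*** stands for your Docker Hub username):
docker tag scrapyd:latest ***/scrapyd:latest
Then push it:
docker push ***/scrapyd:latest
On another server, run it (Docker pulls the image from Docker Hub automatically):
docker run -d -p 6800:6800 ***/scrapyd:latest
Scrapyd is now running on the other server.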
How to deploy a Scrapy project to Scrapyd
1. First, start the Scrapyd service from the image on Docker Hub:
docker run -d -p 6800:6800 ***/scrapyd:latest
2. Go to the Scrapy project, open scrapy.cfg and change it as follows (192.168.99.100 is the IP address of the Docker virtual machine):
[deploy]
url = http://192.168.99.100:6800/
project = tutorial
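scrapyd-deploy takes its target from the [deploy] section of scrapy.cfg: url is the Scrapyd server and project is the name the project is uploaded under. The sketch below reads those two values with configparser and builds the address of Scrapyd's addversion.json endpoint, the service scrapyd-deploy uploads the project egg to. deploy_target is a hypothetical helper; the optional name argument assumes the named-target form [deploy:<name>].

```python
import configparser
from urllib.parse import urljoin


def deploy_target(cfg_text: str, name: str = "") -> dict:
    """Hypothetical helper: read url/project from [deploy] or [deploy:<name>] in scrapy.cfg."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser[("deploy:" + name) if name else "deploy"]
    url = section["url"]
    return {
        "url": url,
        "project": section["project"],
        # The egg upload goes to addversion.json under the same base URL.
        "addversion": urljoin(url, "addversion.json"),
    }
```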
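3. In the directory containing scrapy.cfg, run scrapyd-deploy to deploy the current Scrapy project to Scrapyd:
scrapyd-deploy
A JSON response containing "status": "ok" means the deployment succeeded.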
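Besides the response printed by scrapyd-deploy, you can check what was uploaded by asking Scrapyd which spiders it knows for the project through listspiders.json (a GET request with a project parameter; the response carries a spiders list). A minimal sketch, assuming the host and project from the scrapy.cfg above; list_spiders and its opener parameter are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_spiders(project: str, base_url: str = "http://192.168.99.100:6800",
                 opener=urlopen) -> list:
    """Hypothetical helper: spider names Scrapyd reports for a deployed project."""
    url = base_url.rstrip("/") + "/listspiders.json?" + urlencode({"project": project})
    with opener(url, timeout=5) as resp:
        data = json.load(resp)
    if data.get("status") != "ok":
        # Scrapyd reports failures as {"status": "error", "message": ...}
        raise RuntimeError(f"Scrapyd error: {data}")
    return data["spiders"]
```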
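4. Run a spider remotely by sending a POST request to schedule.json with curl (replace SPIDER_NAME with the name of one of the project's spiders):
curl http://192.168.99.100:6800/schedule.json -d project=tutorial -d spider=SPIDER_NAME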
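schedule.json accepts only POST requests with at least the project and spider form fields, and on success returns a jobid. The same request can be sent from Python; in this sketch build_schedule_request and schedule are hypothetical helpers, the host is assumed from the steps above, and example_spider is a placeholder spider name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project: str, spider: str,
                           base_url: str = "http://192.168.99.100:6800") -> Request:
    """Hypothetical helper: form-encoded request for schedule.json (data makes it a POST)."""
    body = urlencode({"project": project, "spider": spider}).encode()
    return Request(base_url.rstrip("/") + "/schedule.json", data=body)


def schedule(project: str, spider: str, opener=urlopen, **kwargs) -> str:
    """Hypothetical helper: schedule a spider run and return Scrapyd's job id."""
    with opener(build_schedule_request(project, spider, **kwargs), timeout=5) as resp:
        data = json.load(resp)
    if data.get("status") != "ok":
        raise RuntimeError(f"Scrapyd error: {data}")
    return data["jobid"]
```

For example, schedule("tutorial", "example_spider") would start a run and return its job id.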
5. Open http://192.168.99.100:6800/ in a browser to see the jobs that were run, and check their logs to confirm that the results are OK.
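The job list on the web page is also available as JSON from listjobs.json (a GET request with a project parameter, returning pending, running and finished lists), and each job's log is served under /logs/<project>/<spider>/<jobid>.log. A sketch under those assumptions; job_states and log_url are hypothetical helpers, and the opener parameter only exists so the code can be tried without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Scrapyd address, taken from the scrapy.cfg above.
BASE_URL = "http://192.168.99.100:6800"


def job_states(project: str, base_url: str = BASE_URL, opener=urlopen) -> dict:
    """Hypothetical helper: map each job id of the project to pending/running/finished."""
    url = base_url.rstrip("/") + "/listjobs.json?" + urlencode({"project": project})
    with opener(url, timeout=5) as resp:
        data = json.load(resp)
    states = {}
    for state in ("pending", "running", "finished"):
        for job in data.get(state, []):
            states[job["id"]] = state
    return states


def log_url(project: str, spider: str, job_id: str, base_url: str = BASE_URL) -> str:
    """Hypothetical helper: URL of the log file Scrapyd keeps for a job."""
    return f"{base_url.rstrip('/')}/logs/{project}/{spider}/{job_id}.log"
```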