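I ran into a lot of pitfalls while running selenium in docker, so here is a record of how I worked through them.
Taking web page screenshots from Python with selenium + chromedriver + chrome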
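For orientation, here is a minimal sketch of the kind of screenshot code the container ends up running. The function names `build_chrome_args` and `take_screenshot`, the window size, and the exact set of flags are assumptions made for illustration, not code taken from the snapshot project. It also assumes the `selenium` package is installed and a chromedriver matching the installed chrome can be found.

```python
from typing import List


def build_chrome_args(width: int = 1280, height: int = 1024) -> List[str]:
    """Command-line switches commonly used for Chrome inside a container."""
    return [
        "--headless",    # there is no display server in the container
        "--no-sandbox",  # Chrome refuses to start as root with the sandbox on
        f"--window-size={width},{height}",
    ]


def take_screenshot(url: str, path: str) -> bool:
    """Open url in headless Chrome and save a PNG screenshot to path."""
    # Imported lazily so build_chrome_args() works without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.save_screenshot(path)
    finally:
        # quit() shuts down both the browser and the chromedriver process.
        driver.quit()
```

Calling `take_screenshot("https://example.com", "/tmp/page.png")` would be a typical use. Putting `driver.quit()` in a `finally` block matters in a long-running consumer, since a browser left behind after an exception keeps its processes around.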
Dockerfile
FROM python:3.10-buster
# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list)
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
WORKDIR /code
RUN mkdir /code/depends
# Download and install chrome. TIP: dpkg does not resolve dependencies, so install the deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
COPY install.py /code/
RUN python install.py
RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/
Let's go through it line by line.
RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
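This line switches to Aliyun's Debian apt mirror; the reason, of course, is the Great Firewall.
RUN (apt update) && (apt upgrade -y)
This refreshes the apt package index and upgrades the installed packages. You can keep only apt-get update and drop apt-get upgrade; the latter is not required.
RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
What are these packages for? Apart from lsb-release and wget, they install Chinese fonts; why that matters is covered below.
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
Remember to install chrome with apt, not dpkg.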
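Fixing Chinese text rendered as square boxes:
On the Chinese-language internet, people will teach you to download ttf files by hand, copy them into place, and go through a whole pile of steps. I find that baffling: do they really not understand Linux at all?
None of that is necessary. A Linux desktop install ships with Chinese fonts, doesn't it? Why would you have to hunt for font files online?
It is simple: the apt repository already has the fonts packaged, so a single apt command installs them!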
apt-get install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*
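To confirm that the fonts actually landed in the image, one option is to ask fontconfig which families cover Chinese. The sketch below assumes the `fc-list` tool is present in the container (it may or may not be, depending on what else is installed); the helper names `parse_fc_list` and `installed_chinese_fonts` are made up for this example.

```python
import shutil
import subprocess
from typing import List


def parse_fc_list(output: str) -> List[str]:
    """Extract unique font family names from `fc-list ... family` output."""
    families = set()
    for line in output.splitlines():
        # One line may hold several comma-separated aliases of the same font.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def installed_chinese_fonts() -> List[str]:
    """Return the font families fontconfig reports as covering Chinese."""
    if shutil.which("fc-list") is None:
        return []  # fontconfig tools are not installed
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"],
        capture_output=True, text=True, check=False,
    )
    return parse_fc_list(result.stdout)
```

If `installed_chinese_fonts()` returns an empty list inside the container, screenshots of Chinese pages will most likely still show boxes.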
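How do you install chrome in Docker?
On the Chinese-language internet, people love installing chrome with dpkg, but that is a really bad idea! They may understand neither Linux nor apt.
The right way is to install chrome with apt, because apt resolves the dependencies for you automatically. With dpkg you have to sort out the dependencies yourself, which gets painful.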
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
Fixing the zombie processes that show up with Docker + selenium + chromedriver + chrome:
1 18042 18041 18041 ? -1 Z 0 0:00 [chrome_crashpad]
1 18046 1 1 ? -1 Z 0 0:00 [chrome]
1 18047 1 1 ? -1 Z 0 0:00 [chrome]
1 18060 1 1 ? -1 Z 0 0:00 [chrome]
1 18062 1 1 ? -1 Z 0 0:00 [chrome]
1 18095 1 1 ? -1 Z 0 0:02 [chrome]
1 18116 1 1 ? -1 Z 0 0:00 [cat]
1 18117 1 1 ? -1 Z 0 0:00 [cat]
1 18119 18118 18118 ? -1 Z 0 0:00 [chrome_crashpad]
1 18123 1 1 ? -1 Z 0 0:00 [chrome]
1 18124 1 1 ? -1 Z 0 0:00 [chrome]
1 18140 1 1 ? -1 Z 0 0:00 [chrome]
1 18141 1 1 ? -1 Z 0 0:00 [chrome]
1 18171 1 1 ? -1 Z 0 0:02 [chrome]
1 18193 1 1 ? -1 Z 0 0:00 [cat]
1 18194 1 1 ? -1 Z 0 0:00 [cat]
1 18196 18195 18195 ? -1 Z 0 0:00 [chrome_crashpad]
1 18200 1 1 ? -1 Z 0 0:00 [chrome]
1 18201 1 1 ? -1 Z 0 0:00 [chrome]
1 18216 1 1 ? -1 Z 0 0:00 [chrome]
1 18218 1 1 ? -1 Z 0 0:00 [chrome]
1 18248 1 1 ? -1 Z 0 0:02 [chrome]
1 18271 1 1 ? -1 Z 0 0:00 [cat]
1 18272 1 1 ? -1 Z 0 0:00 [cat]
1 18274 18273 18273 ? -1 Z 0 0:00 [chrome_crashpad]
1 18278 1 1 ? -1 Z 0 0:00 [chrome]
1 18279 1 1 ? -1 Z 0 0:00 [chrome]
1 18293 1 1 ? -1 Z 0 0:00 [chrome]
1 18295 1 1 ? -1 Z 0 0:00 [chrome]
1 18328 1 1 ? -1 Z 0 0:02 [chrome]
1 18350 1 1 ? -1 Z 0 0:00 [cat]
1 18351 1 1 ? -1 Z 0 0:00 [cat]
1 18353 18352 18352 ? -1 Z 0 0:00 [chrome_crashpad]
1 18357 1 1 ? -1 Z 0 0:00 [chrome]
1 18358 1 1 ? -1 Z 0 0:00 [chrome]
1 18373 1 1 ? -1 Z 0 0:00 [chrome]
1 18375 1 1 ? -1 Z 0 0:00 [chrome]
1 18406 1 1 ? -1 Z 0 0:01 [chrome]
1 18428 1 1 ? -1 Z 0 0:00 [cat]
1 18429 1 1 ? -1 Z 0 0:00 [cat]
1 18431 18430 18430 ? -1 Z 0 0:00 [chrome_crashpad]
1 18435 1 1 ? -1 Z 0 0:00 [chrome]
1 18436 1 1 ? -1 Z 0 0:00 [chrome]
1 18450 1 1 ? -1 Z 0 0:00 [chrome]
1 18451 1 1 ? -1 Z 0 0:00 [chrome]
1 18483 1 1 ? -1 Z 0 0:03 [chrome]
1 18507 1 1 ? -1 Z 0 0:00 [cat]
1 18508 1 1 ? -1 Z 0 0:00 [cat]
1 18510 18509 18509 ? -1 Z 0 0:00 [chrome_crashpad]
1 18514 1 1 ? -1 Z 0 0:00 [chrome]
1 18515 1 1 ? -1 Z 0 0:00 [chrome]
1 18530 1 1 ? -1 Z 0 0:00 [chrome]
1 18532 1 1 ? -1 Z 0 0:00 [chrome]
1 18562 1 1 ? -1 Z 0 0:02 [chrome]
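What does defunct (state Z in the listing above) mean? It means a zombie process!
A large number of zombie processes eventually exhausts the PID table, which leads to Chrome failed to start: exited abnormally.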
snapshot-consumer | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
snapshot-consumer | (unknown error: DevToolsActivePort file doesn't exist)
snapshot-consumer | (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
snapshot-consumer | Stacktrace:
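To keep an eye on this before Chrome stops starting, the zombies can be counted from `ps` output. This is only a sketch: it assumes the `ps axj` column layout seen in the listing above (PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND), and the function name `count_zombies` is made up for this example.

```python
from collections import Counter


def count_zombies(ps_axj_output: str) -> Counter:
    """Count zombie processes per command name in `ps axj` style output.

    Assumed columns: PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND.
    """
    counts = Counter()
    for line in ps_axj_output.splitlines():
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        stat, command = fields[6], fields[9]
        if stat.startswith("Z"):  # Z marks a zombie (defunct) process
            counts[command.strip()] += 1
    return counts
```

Feeding it the output of `subprocess.run(["ps", "axj"], capture_output=True, text=True).stdout` (assuming `ps` is available in the image) and logging `sum(counts.values())` periodically shows whether the number keeps growing.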
For the fix, see:
If you use docker or docker-compose directly, use the first approach.
If you are on k8s, use the second one!
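Fixing session deleted because of page crash, caused by too little shm (shared memory)
The selenium + chrome + chromedriver combination needs a fair amount of shm, and Docker only gives /dev/shm 64 MB by default.
A single selenium + chrome + chromedriver instance needs roughly 20 MB of shm.
If you ignore this, you get the following error: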
snapshot-consumer | File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
snapshot-consumer | raise exception_class(message, screen, stacktrace)
snapshot-consumer | │ │ │ └ ['#0 0x556b82b0db13 ', '#1 0x556b8291451f ', '#2 0x556b8290193d ', '#3 0x556b82901355 ', ...
snapshot-consumer | │ │ └ None
snapshot-consumer | │ └ 'unknown error: session deleted because of page crash\nfrom tab crashed\n (Session info: headless chrome=103.0.5060.114)'
snapshot-consumer | └
snapshot-consumer |
snapshot-consumer | selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
snapshot-consumer | from tab crashed
snapshot-consumer | (Session info: headless chrome=103.0.5060.114)
snapshot-consumer | Stacktrace:
snapshot-consumer | #0 0x556b82b0db13
snapshot-consumer | #1 0x556b8291451f
snapshot-consumer | #2 0x556b8290193d
How do we fix it?
version: "3"
services:
  snapshot:
    container_name: snapshot
    image: ponponon/snapshot
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "1"
    shm_size: "2048M"
    command: python main.py
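How large should shm_size be? From eyeballing it, usage generally sits around 50 MB, so 512M would already be more than enough (the example above uses 2048M).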
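Instead of eyeballing, the usage can also be read from inside the container. The sketch below uses the standard library's `shutil.disk_usage`; the helper name `shm_usage_mb` is an assumption made for this example, and it only reports the usage at the moment of the call, so it would need to be sampled while screenshots are being taken.

```python
import shutil


def shm_usage_mb(path: str = "/dev/shm") -> dict:
    """Report total, used and free space of a mount point in MiB."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return {
        "total": usage.total / mib,
        "used": usage.used / mib,
        "free": usage.free / mib,
    }
```

Logging `shm_usage_mb()["used"]` under load gives a peak figure to compare against the configured `shm_size`.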
Solution: https://developer.aliyun.com/article/833847
How to set shm-size in docker-compose: see https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container
How to get a jpg screenshot
See: Does JPG vs PNG have anything to do with the in-memory layout, or does the difference only appear when saving to disk?
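WebDriver screenshots come back as PNG data, so getting a jpg means re-encoding them. A minimal sketch, assuming the third-party Pillow package is installed; the function name `png_to_jpeg` and the default quality of 85 are choices made for this example, not something taken from the referenced discussion.

```python
import io


def png_to_jpeg(png_bytes: bytes, quality: int = 85) -> bytes:
    """Re-encode PNG bytes (e.g. from get_screenshot_as_png()) as JPEG."""
    # Pillow is a third-party dependency, imported lazily here.
    from PIL import Image

    with Image.open(io.BytesIO(png_bytes)) as image:
        # JPEG has no alpha channel, so convert to plain RGB first.
        rgb = image.convert("RGB")
    buffer = io.BytesIO()
    rgb.save(buffer, format="JPEG", quality=quality)
    return buffer.getvalue()
```

With a selenium driver at hand, `png_to_jpeg(driver.get_screenshot_as_png())` returns bytes that can be written to a `.jpg` file. Passing a `.jpg` file name to `save_screenshot` does not convert anything; the data written is still PNG.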
The complete Dockerfile is as follows:
FROM python:3.10-buster
# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list)
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*)
WORKDIR /code
RUN mkdir /code/depends
# Download and install chrome. TIP: dpkg does not resolve dependencies, so install the deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
COPY install.py /code/
RUN python install.py
RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/
I also put together an open-source tutorial on github: ponponon/snapshot