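Packaging selenium + chromedriver + chrome with Docker: pitfalls and fixes

I hit a lot of pitfalls while running selenium in Docker, so here is a record of how I worked through them.

The goal: take web page screenshots from Python using selenium + chromedriver + chrome.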
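To make the goal concrete, here is a minimal sketch of a screenshot taken with headless Chrome through Selenium 4. It is not the code shipped in the image: the function names, the flag set, and the window size are illustrative assumptions, and it assumes chromedriver can be found on PATH.

```python
def chrome_arguments(width: int = 1280, height: int = 1024) -> list[str]:
    """Command-line flags commonly passed to Chrome inside a container."""
    return [
        "--headless",               # there is no display server in the container
        "--no-sandbox",             # Chrome refuses to run as root with the sandbox on
        "--disable-dev-shm-usage",  # write shared memory files to /tmp instead of /dev/shm
        f"--window-size={width},{height}",
    ]


def take_screenshot(url: str, path: str) -> bool:
    """Open url in headless Chrome and save a PNG screenshot to path."""
    # Imported here so chrome_arguments() stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.save_screenshot(path)
    finally:
        driver.quit()  # always quit, otherwise Chrome processes are left behind
```

Calling take_screenshot("https://example.com", "example.png") inside the container should write a PNG file, provided Chrome and a matching chromedriver are installed.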

Dockerfile

FROM python:3.10-buster

# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua (TUNA) mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list)
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) 

WORKDIR /code
RUN mkdir /code/depends
# Download and install chrome. TIP: dpkg does not resolve dependencies, so install the deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)


COPY install.py /code/
RUN python install.py

RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/

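Let's go through it line by line.

  • RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list) points apt at the Aliyun Debian mirror (the Tsinghua line serves the same purpose). The reason is, of course, the Great Firewall.
  • RUN (apt update) && (apt upgrade -y) refreshes the apt package index and upgrades the installed packages. You can keep only apt update and drop apt upgrade; the latter is optional.
  • RUN (apt install -y lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) What are these packages for? They install Chinese fonts; why that matters is explained below.
  • RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb) downloads and installs chrome. Remember to install it with apt, not dpkg.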

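Fixing Chinese text rendered as square boxes:

On the Simplified-Chinese internet you will find people teaching you to download ttf files by hand, copy them into place, and then run through a whole pile of extra steps. That leaves me speechless. Do they really not understand Linux at all?

None of that hassle is needed. When you install a Linux desktop, doesn't it ship with Chinese support? Do you have to hunt for font files online yourself?

It is simple: the apt repositories already provide packaged fonts, so a single apt command installs them!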

apt-get install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*
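To confirm from inside the container that the fonts were picked up, one option is to ask fontconfig which installed fonts cover Chinese. The sketch below assumes the fc-list tool (from the fontconfig package) is available in the image; the function names are illustrative.

```python
import subprocess


def parse_font_families(fc_list_output: str) -> list[str]:
    """Turn `fc-list ... family` output into a sorted list of unique family names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font per line; alternative names of the same font are comma-separated.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return sorted(families)


def chinese_font_families() -> list[str]:
    """Ask fontconfig which installed font families cover Chinese."""
    result = subprocess.run(
        ["fc-list", ":lang=zh", "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_font_families(result.stdout)
```

If chinese_font_families() returns an empty list, Chrome has no font to draw Chinese glyphs with, which is when the square boxes appear.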

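How do you install chrome in Docker?

On the Simplified-Chinese internet, people love installing chrome with dpkg, but that is a really poor choice! They may not understand Linux or apt.

The right way is to install chrome with apt, because apt resolves the dependencies for you automatically. With dpkg you have to sort out the dependencies yourself, which gets painful.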

RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)
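Because the deb above always tracks the current stable release, the Chrome version inside the image changes over time, and chromedriver has to match Chrome's major version. The post does not show how install.py handles this, so the following is only a sketch of a sanity check. It assumes both google-chrome and chromedriver are on PATH; the function names are made up for illustration.

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from text such as 'Google Chrome 103.0.5060.114'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def installed_major(binary: str) -> int:
    """Run `<binary> --version` and return the major version it reports."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return major_version(result.stdout)


def versions_match(chrome: str = "google-chrome", driver: str = "chromedriver") -> bool:
    """True when Chrome and chromedriver report the same major version."""
    return installed_major(chrome) == installed_major(driver)
```

Running versions_match() as a build step, or at container start, would surface a mismatch early instead of at the first screenshot.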

Fixing the zombie processes that show up with Docker + selenium + chromedriver + chrome:

      1   18042   18041   18041 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18046       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18047       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18060       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18062       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18095       1       1 ?             -1 Z        0   0:02 [chrome] 
      1   18116       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18117       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18119   18118   18118 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18123       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18124       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18140       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18141       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18171       1       1 ?             -1 Z        0   0:02 [chrome] 
      1   18193       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18194       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18196   18195   18195 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18200       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18201       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18216       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18218       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18248       1       1 ?             -1 Z        0   0:02 [chrome] 
      1   18271       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18272       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18274   18273   18273 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18278       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18279       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18293       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18295       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18328       1       1 ?             -1 Z        0   0:02 [chrome] 
      1   18350       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18351       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18353   18352   18352 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18357       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18358       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18373       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18375       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18406       1       1 ?             -1 Z        0   0:01 [chrome] 
      1   18428       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18429       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18431   18430   18430 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18435       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18436       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18450       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18451       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18483       1       1 ?             -1 Z        0   0:03 [chrome] 
      1   18507       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18508       1       1 ?             -1 Z        0   0:00 [cat] 
      1   18510   18509   18509 ?             -1 Z        0   0:00 [chrome_crashpad] 
      1   18514       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18515       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18530       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18532       1       1 ?             -1 Z        0   0:00 [chrome] 
      1   18562       1       1 ?             -1 Z        0   0:02 [chrome] 
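
What does defunct mean? These are zombie processes (the Z in the state column)!

A huge number of zombie processes will exhaust the PID table, which leads to Chrome failed to start: exited abnormally.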

snapshot-consumer    | selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
snapshot-consumer    |   (unknown error: DevToolsActivePort file doesn't exist)
snapshot-consumer    |   (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
snapshot-consumer    | Stacktrace:

References for the fix:

If you use docker or docker-compose directly, use the first approach.

If you are on k8s, use the second one!
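Independent of the two referenced approaches, the underlying mechanism is that a zombie only disappears once its parent waits on it. In the listing above the zombies have PPID 1, so they were re-parented to the container's PID 1, which is the Python application and never waits on them. A common remedy is to run a small init process as PID 1 (for example docker run --init, or init: true in docker-compose). The sketch below shows the same idea in Python as a non-blocking reaper; it is an illustration, not necessarily what the references describe.

```python
import os


def reap_children() -> list[int]:
    """Collect every child process that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately if none has exited.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped
```

One caveat: this also collects the exit status of children started through subprocess, so it is best called at a point where no such child is still being waited on, for example right after driver.quit().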

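Fixing session deleted because of page crash caused by too little shared memory (/dev/shm)

The selenium + chrome + chromedriver combination needs a fair amount of shm, and Docker allocates only 64 MB to /dev/shm by default.

A single selenium + chrome + chromedriver instance needs roughly 20 MB of shm, so a few instances running at once can use up the default.

If you ignore this, you will run into the error below: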

snapshot-consumer    |   File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
snapshot-consumer    |     raise exception_class(message, screen, stacktrace)
snapshot-consumer    |           │               │        │       └ ['#0 0x556b82b0db13 ', '#1 0x556b8291451f ', '#2 0x556b8290193d ', '#3 0x556b82901355 ', ...
snapshot-consumer    |           │               │        └ None
snapshot-consumer    |           │               └ 'unknown error: session deleted because of page crash\nfrom tab crashed\n  (Session info: headless chrome=103.0.5060.114)'
snapshot-consumer    |           └ 
snapshot-consumer    | 
snapshot-consumer    | selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
snapshot-consumer    | from tab crashed
snapshot-consumer    |   (Session info: headless chrome=103.0.5060.114)
snapshot-consumer    | Stacktrace:
snapshot-consumer    | #0 0x556b82b0db13 
snapshot-consumer    | #1 0x556b8291451f 
snapshot-consumer    | #2 0x556b8290193d 

How do you fix it? Set shm_size in docker-compose:

version: "3"
services:
  snapshot:
    container_name: snapshot
    image: ponponon/snapshot
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "30m"
        max-file: "1"
    shm_size: "2048M"
    command: python main.py
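
How large should shm_size be? Judging by eye from observed usage, it typically sits around 50 MB, so 512M is more than enough; the 2048M in the example above leaves even more headroom.

Reference for the fix: https://developer.aliyun.com/article/833847

How to set shm-size in docker-compose: see https://stackoverflow.com/questions/30210362/how-to-increase-the-size-of-the-dev-shm-in-docker-container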
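To check that the setting took effect, and to measure real usage rather than eyeballing it, the size of /dev/shm can be read from inside the container. This is a small sketch using only the standard library; the function name is illustrative.

```python
import shutil


def shm_usage_mib(path: str = "/dev/shm") -> tuple[float, float]:
    """Return (total, used) for the filesystem mounted at path, in MiB."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total / mib, usage.used / mib
```

Logging shm_usage_mib() while screenshots are being taken shows how close the workload gets to the configured limit.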

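How to get a JPG screenshot

See: "Does JPG vs. PNG have anything to do with the in-memory structure, or does the difference only appear when the image is saved to disk?"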
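Selenium's built-in screenshot methods produce PNG. One way to get JPEG bytes directly, which is not necessarily what the referenced article does, is to ask Chrome through the DevTools protocol command Page.captureScreenshot. The sketch assumes driver is a selenium Chrome driver, which exposes execute_cdp_cmd; the function name is illustrative. Converting the PNG bytes with an imaging library would be an alternative.

```python
import base64


def capture_jpeg(driver, quality: int = 80) -> bytes:
    """Ask Chrome for a JPEG-encoded screenshot through the DevTools protocol."""
    result = driver.execute_cdp_cmd(
        "Page.captureScreenshot",
        {"format": "jpeg", "quality": quality},
    )
    # The protocol returns the image as a base64 string under the "data" key.
    return base64.b64decode(result["data"])
```

The returned bytes can be written to a .jpg file as they are, or handed to whatever consumes the snapshot.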


The complete Dockerfile is as follows:

FROM python:3.10-buster

# To use the Aliyun mirror, use the line below
# RUN (echo "deb http://mirrors.aliyun.com/debian/ buster main non-free contrib" > /etc/apt/sources.list)
# To use the Tsinghua (TUNA) mirror, use the line below
RUN (echo "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free" > /etc/apt/sources.list)
RUN (apt update) && (apt upgrade -y)
RUN (apt install -y  lsb-release wget ttf-wqy-zenhei xfonts-intl-chinese wqy*) 

WORKDIR /code
RUN mkdir /code/depends
# Download and install chrome. TIP: dpkg does not resolve dependencies, so install the deb with apt
RUN (wget -P /code/depends https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb) && ( apt install -y /code/depends/google-chrome-stable_current_amd64.deb)


COPY install.py /code/
RUN python install.py

RUN /usr/local/bin/python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY requirements-prd.txt /code/
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements-prd.txt
COPY config.yaml /code/
COPY . /code/

While I was at it, I put together an open-source tutorial on github: ponponon/snapshot
