Puppeteer 截图及相关问题
Puppeteer 是 Headless Chrome 的 Node.js 封装。通过它可方便地对页面进行截图,或者保存成 PDF。 镜像的设置因为其使用了 Chromium,其源在 Google 域上,最好设置一下 npm 从国内镜像安装,可解决无法安装的问题。 推荐在项目中放置 这是一个整理好的 # .npmrc
$ npx pkgrc
# .yarnc
$ npx pkgrc yarn
截取页面使用 const puppeteer = require("puppeteer");
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto("https://example.com");
await page.screenshot({ path: "screenshot.png" });
await browser.close();
});
实际应用中,你需要加上等待时间,以保证页面已经完全加载,否则截取出来的画面是页面半成品的样子。 通过 const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto('https://example.com');
// 等待一秒钟
+ await page.waitFor(1000);
await page.screenshot({path: 'screenshot.png'});
await browser.close();
});
但这里无论你指定的时长是多少,都是比较主观的值。页面实际加载情况受很多因素影响,机器性能,网络好坏等。即页面加载完成是个无法预期的时长,所以这种方式不靠谱。我们应该使用另一个更加有保障的方式,在调用 const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto('https://example.com’,{
+ waitUtil: 'networkidle2'
});
await page.screenshot({path: 'screenshot.png'});
await browser.close();
});
此时再进行截图,是比较保险的了。 截图时还有个实用的参数 await page.screenshot({ path: "screenshot.png", fullPage: true });
注意,其与 // 抛错!
await page.screenshot({
path: "screenshot.png",
fullPage: true,
clip: {
x: 0,
y: 0,
width: 400,
height: 400
}
});
针对页面中某个元素进行截取如果你使用过 Chrome DevTool 中的截图命令,或许知道,其中有一个针对元素进行截取的命令。 Chrome DevTool 中对元素进行截图的命令 所以,除了对整个页面进行截取,Chrome 还支持对页面某个元素进行截取。通过 这就很实用了,能够满足大部分自定义的需求。大多数情况下,我们只对 body 部分感兴趣,通过只对 body 进行截取,就不用指定长宽而且自动排除掉 body 外多余的留白等。 const puppeteer = require("puppeteer");
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto("https://example.com", {
waitUtil: "networkidle2"
});
const element = await page.$("body");
await element.screenshot({
path: "screenshot.png"
});
});
其参数与 数据的返回生成的图片可直接返回,也可保存成文件后返回文件地址。 其中,截图方法 以 Koa 为例,binary 形式的 buffer 数据直接赋值给 app.use(async ctx => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const buffer = await page.screenshot();
await browser.close();
ctx.response.attachment("screenshot.png");
ctx.body = buffer;
});
字符串形式时,需要注意拿到的并不是标准的图片 base64 格式,它只包含了数据部分,并没有文件类型部分,即 app.use(async ctx => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const base64 = await page.screenshot({ encoding: "base64" });
await browser.close();
ctx.body = `${base64}"/>`;
});
如果你是以异步接口形式返回到前端,只需要将 当然,字符串形式下,仍然是可以返回成文件下载的形式的, app.use(async ctx => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const base64 = await page.screenshot({ encoding: "base64" });
await browser.close();
ctx.response.attachment("screenshot.png");
const image = new Buffer(base64, "base64");
ctx.body = image;
});
PDF 的生成通过 const puppeteer = require("puppeteer");
async function run() {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.emulateMedia("screen");
await page.goto("https://www.google.com/chromebook/");
await page.pdf({
path: "puppeteer.pdf",
format: "A4"
});
await browser.close();
}
run();
一般 PDF 用于打印,所以默认以 需要注意的是,如果页面中使用了背景图片,上面代码截取出来是看不到的。 截图 PDF 时背景图片未显示 需要设置截取时的 await page.pdf({
path: "puppeteer.pdf",
format: "A4",
+ printBackground: true
});
修正后截图的 PDF 背景图片正常显示 一些问题服务器字体文件问题部署到全新的 Linux 环境时,大概率你会看到截来的图片中中文无法显示。 中文字体缺失的情况  $ sudo apt-get install language-pack-zh*
$ sudo apt-get install chinese*
服务器上 Chromium 无法启动的问题在 Puppeteer 的 troubleshoting 文档 中有对应的解决方案。 一般是机器上缺少对应的依赖库,安装补上即可。Puppeteer 自带的 Chromium 是非常纯粹的,它不会安装除了自身作为浏览器外的其他东西。 通过
查看缺少的依赖项
$ ldd node_modules/puppeteer/.local-chromium/linux-641577/chrome-linux/chrome | grep not
libX11.so.6 => not found
libX11-xcb.so.1 => not found
libxcb.so.1 => not found
libXcomposite.so.1 => not found
libXcursor.so.1 => not found
libXdamage.so.1 => not found
libXext.so.6 => not found
libXfixes.so.3 => not found
libXi.so.6 => not found
libXrender.so.1 => not found
libXtst.so.6 => not found
libgobject-2.0.so.0 => not found
libglib-2.0.so.0 => not found
libnss3.so => not found
libnssutil3.so => not found
libsmime3.so => not found
libnspr4.so => not found
libcups.so.2 => not found
libdbus-1.so.3 => not found
libXss.so.1 => not found
libXrandr.so.2 => not found
libgio-2.0.so.0 => not found
libasound.so.2 => not found
libpangocairo-1.0.so.0 => not found
libpango-1.0.so.0 => not found
libcairo.so.2 => not found
libatk-1.0.so.0 => not found
libatk-bridge-2.0.so.0 => not found
libatspi.so.0 => not found
libgtk-3.so.0 => not found
libgdk-3.so.0 => not found
libgdk_pixbuf-2.0.so.0 => not found
那么多,一个个搜索(因为这里例出的名称不一定就是直接可用来安装的名称)安装多麻烦。所以需要用其他方法。 以 Debian 系统为例。 tips: 可通过 $ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
脚本安装通过 troubleshoting 页面 Chrome headless doesn't launch 部分其列出的对应系统所需依赖中,将所有依赖复制出来组装成如下的命令执行: sudo apt-get install gconf-service libasound2 libatk1.0-0 libatk-bridge2.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
通过安装 Chrome 来自动安装直接安装一个非 Chromium 版本的 Chrome,它会把依赖自动安装上。 Chrome 是基于 Chromium 的发行版,包括 还是以 Debian 系统为例: $ apt-get update && apt-get install google-chrome-unstable
如果直接执行上面的安装,会报错: E: Unable to locate package google-chrome-unstable
这是安装程序时的一个安全相关策略 ,需要先设置一下 $ wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
然后设置 Chrome 的仓库: $ sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
再次执行安装便正常进行了。 安装时可以看到会提示所需的依赖项:
安装 Chrome 时的提示信息
$ sudo apt-get install google-chrome-unstable
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
adwaita-icon-theme at-spi2-core dbus dconf-gsettings-backend dconf-service fontconfig fontconfig-config fonts-liberation glib-networking
glib-networking-common glib-networking-services gsettings-desktop-schemas gtk-update-icon-cache hicolor-icon-theme libappindicator3-1
libasound2 libasound2-data libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data libatspi2.0-0 libauthen-sasl-perl libavahi-client3
libavahi-common-data libavahi-common3 libcairo-gobject2 libcairo2 libcolord2 libcroco3 libcups2 libdatrie1 libdbus-1-3 libdbusmenu-glib4
libdbusmenu-gtk3-4 libdconf1 libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libegl1-mesa libencode-locale-perl
libepoxy0 libfile-basedir-perl libfile-desktopentry-perl libfile-listing-perl libfile-mimeinfo-perl libfont-afm-perl libfontconfig1
libfontenc1 libgbm1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa libglib2.0-0 libglib2.0-data
libgraphite2-3 libgtk-3-0 libgtk-3-bin libgtk-3-common libharfbuzz0b libhtml-form-perl libhtml-format-perl libhtml-parser-perl
libhtml-tagset-perl libhtml-tree-perl libhttp-cookies-perl libhttp-daemon-perl libhttp-date-perl libhttp-message-perl
libhttp-negotiate-perl libice6 libindicator3-7 libio-html-perl libio-socket-ssl-perl libipc-system-simple-perl libjbig0 libjpeg62-turbo
libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libllvm3.9 liblwp-mediatypes-perl liblwp-protocol-https-perl libmailtools-perl
libnet-dbus-perl libnet-http-perl libnet-smtp-ssl-perl libnet-ssleay-perl libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0
libpangoft2-1.0-0 libpciaccess0 libpixman-1-0 libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsensors4 libsm6 libsoup-gnome2.4-1
libsoup2.4-1 libthai-data libthai0 libtie-ixhash-perl libtiff5 libtimedate-perl libtxc-dxtn-s2tc libu2f-udev liburi-perl
libwayland-client0 libwayland-cursor0 libwayland-egl1-mesa libwayland-server0 libwww-perl libwww-robotrules-perl libx11-6 libx11-data
libx11-protocol-perl libx11-xcb1 libxau6 libxaw7 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-render0 libxcb-shape0
libxcb-shm0 libxcb-sync1 libxcb-xfixes0 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxft2 libxi6
libxinerama1 libxkbcommon0 libxml-parser-perl libxml-twig-perl libxml-xpathengine-perl libxml2 libxmu6 libxmuu1 libxpm4 libxrandr2
libxrender1 libxshmfence1 libxss1 libxt6 libxtst6 libxv1 libxxf86dga1 libxxf86vm1 perl-openssl-defaults sgml-base shared-mime-info
x11-common x11-utils x11-xserver-utils xdg-user-dirs xdg-utils xkb-data xml-core
按[Y] 确认即可。 有了这些依赖,Puppeteer 中的 Chromium 便可运行了。 $ google-chrome-unstable --version
Google Chrome 75.0.3745.4 dev
$ ldd node_modules/puppeteer/.local-chromium/linux-641577/chrome-linux/chrome | grep not
我们的目的只是安装依赖,所以装完后可移除 Chome。
卸载 Chrome 时的提示信息
$ sudo apt-get remove google-chrome-unstable
...
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
adwaita-icon-theme at-spi2-core dconf-gsettings-backend dconf-service fontconfig fontconfig-config fonts-liberation glib-networking
glib-networking-common glib-networking-services gsettings-desktop-schemas gtk-update-icon-cache hicolor-icon-theme libappindicator3-1
libasound2 libasound2-data libatk-bridge2.0-0 libatk1.0-0 libatk1.0-data libatspi2.0-0 libauthen-sasl-perl libavahi-client3
libavahi-common-data libavahi-common3 libcairo-gobject2 libcairo2 libcolord2 libcroco3 libcups2 libdatrie1 libdbusmenu-glib4
libdbusmenu-gtk3-4 libdconf1 libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libegl1-mesa libencode-locale-perl
libepoxy0 libfile-basedir-perl libfile-desktopentry-perl libfile-listing-perl libfile-mimeinfo-perl libfont-afm-perl libfontconfig1
libfontenc1 libgbm1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-common libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa libglib2.0-0 libglib2.0-data
libgraphite2-3 libgtk-3-0 libgtk-3-bin libgtk-3-common libharfbuzz0b libhtml-form-perl libhtml-format-perl libhtml-parser-perl
libhtml-tagset-perl libhtml-tree-perl libhttp-cookies-perl libhttp-daemon-perl libhttp-date-perl libhttp-message-perl
libhttp-negotiate-perl libice6 libicu57 libindicator3-7 libio-html-perl libio-socket-ssl-perl libipc-system-simple-perl libjbig0
libjpeg62-turbo libjson-glib-1.0-0 libjson-glib-1.0-common liblcms2-2 libllvm3.9 liblwp-mediatypes-perl liblwp-protocol-https-perl
libmailtools-perl libnet-dbus-perl libnet-http-perl libnet-smtp-ssl-perl libnet-ssleay-perl libnspr4 libnss3 libpango-1.0-0
libpangocairo-1.0-0 libpangoft2-1.0-0 libpciaccess0 libpixman-1-0 libproxy1v5 librest-0.7-0 librsvg2-2 librsvg2-common libsensors4 libsm6
libsoup-gnome2.4-1 libsoup2.4-1 libthai-data libthai0 libtie-ixhash-perl libtiff5 libtimedate-perl libtxc-dxtn-s2tc libu2f-udev
liburi-perl libuv1 libwayland-client0 libwayland-cursor0 libwayland-egl1-mesa libwayland-server0 libwww-perl libwww-robotrules-perl
libx11-6 libx11-data libx11-protocol-perl libx11-xcb1 libxau6 libxaw7 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0
libxcb-render0 libxcb-shape0 libxcb-shm0 libxcb-sync1 libxcb-xfixes0 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxdmcp6 libxext6
libxfixes3 libxft2 libxi6 libxinerama1 libxkbcommon0 libxml-parser-perl libxml-twig-perl libxml-xpathengine-perl libxml2 libxmu6 libxmuu1
libxpm4 libxrandr2 libxrender1 libxshmfence1 libxss1 libxt6 libxtst6 libxv1 libxxf86dga1 libxxf86vm1 perl-openssl-defaults sgml-base
shared-mime-info x11-common x11-utils x11-xserver-utils xdg-user-dirs xdg-utils xkb-data xml-core
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
google-chrome-unstable
0 upgraded, 0 newly installed, 1 to remove and 25 not upgraded.
After this operation, 213 MB disk space will be freed.
Do you want to continue? [Y/n]
(Reading database ... 71778 files and directories currently installed.)
Removing google-chrome-unstable (75.0.3745.4-1) ...
Processing triggers for mime-support (3.60) ...
Processing triggers for man-db (2.7.6.1-2) ...
sandbox 的问题Linux 上 Puppeteer 启动 Chromium 时可能会看到如下的错误提示: [0402/152925.182431:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
错误信息已经很明显,所以在启动时加上 const browser = await puppeteer.launch({ args: ["--no-sandbox"] });
但考虑到安全问题,Puppeteer 是强烈不建议在无沙盒环境下运行,除非加载的页面其内容是绝对可信的。 如果需要设置在沙盒中运行,可参考文档中的两种方法。 相关资源
|