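Puppeteer (the name means "puppet") is a headless Chrome tool from the Google Chrome team. It provides a rich API that lets developers drive
the browser programmatically, much as a mouse would. It is a Node library, so first install Node.js and the npm package manager: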
apt-get install nodejs
apt-get install npm
node -v
npm install n -g
sudo n
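Sometimes the version has clearly been switched, yet the terminal still reports the old version number. What then?
Cause 1: node has been taken over by nvm. In that case, the following commands switch it back: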
export NODE_HOME=/usr/local
export PATH=$NODE_HOME/bin:$PATH
export NODE_PATH=$NODE_HOME/lib/node_modules
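To confirm which Node.js binary the shell actually resolves after changing PATH, a small script can print the running version and the path of the executable. This is only a minimal sketch using built-in `process` fields; the function name `describeNode` is made up for illustration.

```javascript
// Report which Node.js binary is actually executing this script.
// describeNode is a hypothetical helper name, not part of any library.
function describeNode() {
  return {
    version: process.version,   // same value that `node -v` prints
    execPath: process.execPath, // absolute path of the running node binary
  };
}

const info = describeNode();
console.log(`node ${info.version} at ${info.execPath}`);
```

If the printed path sits under ~/.nvm rather than /usr/local/bin, the shell is still picking up the nvm-managed binary.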
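Installing puppeteer also downloads a Chromium package, and the Chromium download server is hosted overseas. As a result, many failed puppeteer
installs come down to the Chromium download failing. What are the workarounds?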
npm install puppeteer --ignore-scripts
Then, in your script, use the executablePath option to point at the location of Chromium. Example:
const puppeteer = require('puppeteer');

(async () => {
  // Point Puppeteer at the manually installed Chromium binary.
  const browser = await puppeteer.launch({
    executablePath: '../../chromium/Chromium.app/Contents/MacOS/Chromium'
  });
  const page = await browser.newPage();
  await page.goto('https://y.qq.com');
  await page.screenshot({ path: 'yqq.png' });
  // Wait for the browser to shut down before the script exits.
  await browser.close();
})();
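Because a wrong executablePath only surfaces as a launch failure, it can help to check that the binary exists before calling puppeteer.launch. The sketch below assumes a list of candidate locations; `pickExecutablePath` is a hypothetical helper (not a Puppeteer API), and the second candidate path is just an assumed example.

```javascript
const fs = require('fs');

// Return the first path in `candidates` that exists on disk, or null.
// pickExecutablePath is a hypothetical helper, not a Puppeteer API.
function pickExecutablePath(candidates) {
  for (const candidate of candidates) {
    if (fs.existsSync(candidate)) {
      return candidate;
    }
  }
  return null;
}

// Example candidates; adjust to wherever Chromium was unpacked.
// Relative paths are resolved against the current working directory.
const chromiumPath = pickExecutablePath([
  '../../chromium/Chromium.app/Contents/MacOS/Chromium',
  '/usr/bin/chromium-browser', // assumed location, may differ per system
]);
console.log(chromiumPath || 'no Chromium binary found');
```

The returned value can then be passed as executablePath in the launch options.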
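Download Chromium manually. What if Chromium cannot be downloaded from behind the firewall? The repository source shows that the download URLs look like this: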
const downloadURLs = {
linux: 'https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/%d/chrome-linux.zip',
mac: 'https://storage.googleapis.com/chromium-browser-snapshots/Mac/%d/chrome-mac.zip',
win32: 'https://storage.googleapis.com/chromium-browser-snapshots/Win/%d/chrome-win32.zip',
win64: 'https://storage.googleapis.com/chromium-browser-snapshots/Win_x64/%d/chrome-win32.zip'
}
Pick your platform and replace %d with the actual revision number, which is listed under puppeteer.chromium_revision in puppeteer/package.json.
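The substitution described above can be sketched with Node's built-in util.format, which fills the %d placeholder. The helper name `chromiumDownloadUrl` and the revision 123456 are placeholders for illustration; the real revision should come from puppeteer/package.json.

```javascript
const util = require('util');

// URL templates copied from the list above; %d is the Chromium revision.
const downloadURLs = {
  linux: 'https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/%d/chrome-linux.zip',
  mac: 'https://storage.googleapis.com/chromium-browser-snapshots/Mac/%d/chrome-mac.zip',
  win32: 'https://storage.googleapis.com/chromium-browser-snapshots/Win/%d/chrome-win32.zip',
  win64: 'https://storage.googleapis.com/chromium-browser-snapshots/Win_x64/%d/chrome-win32.zip',
};

// Build the download URL for a platform key and a revision number.
// chromiumDownloadUrl is a hypothetical helper name.
function chromiumDownloadUrl(platform, revision) {
  const template = downloadURLs[platform];
  if (!template) {
    throw new Error(`Unknown platform: ${platform}`);
  }
  return util.format(template, revision);
}

// 123456 is a placeholder revision, not a real one to download.
console.log(chromiumDownloadUrl('linux', 123456));
```

The resulting URL can be opened in a browser or fetched through a proxy, and the unpacked binary referenced via executablePath.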
2. Switch the npm registry to a mirror inside China
Permanent:
npm config set registry https://registry.npm.taobao.org
Temporary:
npm --registry https://registry.npm.taobao.org install express
On a Mac, Chromium works right after installation. On Ubuntu you may run into the following problems:
sudo apt-get install libgtk-3-0
sudo ldconfig
sudo dpkg --add-architecture i386
sudo apt update
sudo apt install libgtk-3-0:i386
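To see which shared libraries are actually missing before installing packages, one option is to run `ldd` against the Chromium binary and look for "not found" entries. The sketch below only parses such output; `findMissingLibs` is a hypothetical helper, and the sample text is illustrative rather than captured from a real machine.

```javascript
// Extract the names of shared libraries that `ldd` reports as missing.
// findMissingLibs is a hypothetical helper; it only parses text.
function findMissingLibs(lddOutput) {
  return lddOutput
    .split('\n')
    .filter((line) => line.includes('=> not found'))
    .map((line) => line.trim().split(/\s+/)[0]);
}

// Sample text in the format ldd prints; real output would come from
// running `ldd <path-to-chrome>` against the downloaded binary.
const sample = [
  '\tlibgtk-3.so.0 => not found',
  '\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)',
].join('\n');
console.log(findMissingLibs(sample)); // [ 'libgtk-3.so.0' ]
```

Each reported name maps to a package to install, for example libgtk-3.so.0 is provided by libgtk-3-0.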
(node:17781) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
[0719/113011.202956:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
at onClose (/home/zhenlinhuo/spider/node_modules/puppeteer/lib/Launcher.js:340:14)
at Interface.helper.addEventListener (/home/zhenlinhuo/spider/node_modules/puppeteer/lib/Launcher.js:329:50)
at Interface.emit (events.js:187:15)
at Interface.close (readline.js:379:8)
at Socket.onend (readline.js:157:10)
at Socket.emit (events.js:187:15)
at endReadableNT (_stream_readable.js:1081:12)
at process._tickCallback (internal/process/next_tick.js:63:19)
(node:17781) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:17781) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
To fix this, pass the flag in the Chromium launch options:
browser = await puppeteer.launch({
  args: ['--no-sandbox'],
  headless: true
});
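Since the sandbox is a security feature, one possible refinement is to disable it only when the script really runs as root, which is the case the error message complains about. This is a minimal sketch under that assumption; `buildLaunchOptions` is a hypothetical helper, not a Puppeteer API.

```javascript
// Build launch options, adding --no-sandbox only when running as root.
// buildLaunchOptions is a hypothetical helper, not a Puppeteer API.
function buildLaunchOptions(uid) {
  const args = [];
  if (uid === 0) {
    args.push('--no-sandbox');
  }
  return { args, headless: true };
}

// process.getuid exists on POSIX platforms only, so guard the call.
const currentUid = typeof process.getuid === 'function' ? process.getuid() : -1;
console.log(buildLaunchOptions(currentUid));
```

The returned object can be passed directly to puppeteer.launch(). Running the script as a non-root user avoids the need for the flag altogether.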