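Disclaimer: all articles on this blog are for learning purposes only and involve no commercial interest. Any material quoted inappropriately will be removed.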

A few words first

Since this is only a test, the steps for setting up the project are not covered.

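Puppeteer is an official Google Node library that controls headless Chrome over the DevTools Protocol. Its API lets you drive Chrome directly and simulate most user actions, either for UI testing or for crawling pages to collect data. The development language is therefore JavaScript.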

GitHub project: puppeteer

puppeteer API: puppeteer API (version 1.7.0 at the time of writing).

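For basic puppeteer usage, see the demos on the official site or the articles that turn up on Baidu; they all include sample code. puppeteer seems to have relatively few users, though, so solid material is scarce and only a handful of articles can be found. Installation is not covered here, since a quick search will turn up instructions. This post focuses on taking a full-page screenshot after puppeteer has loaded a page, without handling special cases that appear on the page, such as CAPTCHAs or login dialogs.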

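The example uses https://www.jd.com. Note that the page has to be scrolled; otherwise it does not load completely and the screenshot comes out incomplete. An optimized version will follow later.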

The code

Version without scrolling. At this point, here is a simple demo-style example:

// Import the package
const puppeteer = require('puppeteer');

(async () => {
  // Launch Chromium
  const browser = await puppeteer.launch({ignoreHTTPSErrors: true, headless: false, args: ['--no-sandbox']});
  // Open a new page
  const page = await browser.newPage();
  // Set the viewport size
  await page.setViewport({width: 1920, height: 1080});
  let request_url = 'https://www.jd.com';
  // Navigate to the page
  await page.goto(request_url, {waitUntil: 'domcontentloaded'}).catch(err => console.log(err));
  await page.waitFor(1000);
  let title = await page.title();
  console.log(title);
  try {
    // Take the screenshot
    await page.screenshot({path: "jd.jpg", fullPage: true}).catch(err => {
      console.log('Screenshot failed');
      console.log(err);
    });
    await page.waitFor(5000);
  } catch (e) {
    console.log('Execution error');
    console.log(e);
  } finally {
    await browser.close();
  }
})();
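The screenshot call above passes only `path` and `fullPage`. According to the puppeteer API docs, the image type is inferred from the file extension, and a `quality` option (0 to 100) applies to JPEG only. The following is a minimal sketch of building those options explicitly; the helper name `buildScreenshotOptions` and the default quality of 80 are assumptions for illustration, not part of the original code.

```javascript
const path = require('path');

// Hypothetical helper: build options for page.screenshot() from a file name.
// JPEG output gets a quality setting; PNG output does not accept one.
function buildScreenshotOptions(filePath, quality = 80) {
  const ext = path.extname(filePath).toLowerCase();
  const isJpeg = ext === '.jpg' || ext === '.jpeg';
  const options = {
    path: filePath,
    fullPage: true,
    type: isJpeg ? 'jpeg' : 'png',
  };
  if (isJpeg) {
    options.quality = quality;
  }
  return options;
}
```

It would be used as `await page.screenshot(buildScreenshotOptions('jd.jpg'))`, which keeps the JPEG-only setting out of PNG calls.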

Screenshot of the result:

Version without scrolling

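As you can see, large parts of the image are missing, and no amount of waiting in the code helps. The page has to be scrolled so that its scroll events fire. Below is the scrolling version. It does scroll, but I am not happy with how the code is written, and it is incompatible with a few sites, which is why it only counts as a first version: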

const puppeteer = require('puppeteer');

(async () => {
  // Launch Chromium
  const browser = await puppeteer.launch({ignoreHTTPSErrors: true, headless: false, args: ['--no-sandbox']});
  // Open a new page
  const page = await browser.newPage();
  // Set the viewport size
  await page.setViewport({width: 1920, height: 1080});
  let request_url = 'https://www.jd.com';
  // Navigate to the page
  await page.goto(request_url, {waitUntil: 'domcontentloaded'}).catch(err => console.log(err));
  await page.waitFor(1000);
  let title = await page.title();
  console.log(title);

  // Maximum page height to load, in pixels
  const max_height_px = 20000;
  // Distance to scroll per step, in pixels (one viewport height)
  let scrollStep = 1080;
  let height_limit = false;
  let mValues = {'scrollEnable': true, 'height_limit': height_limit};

  while (mValues.scrollEnable) {
    mValues = await page.evaluate((scrollStep, max_height_px, height_limit) => {
      // Guard against pages with no scrolling element (e.g. no body):
      // stop the loop instead of returning undefined, which would make
      // the mValues.scrollEnable check above throw
      if (!document.scrollingElement) {
        return {'scrollEnable': false, 'height_limit': height_limit};
      }
      let scrollTop = document.scrollingElement.scrollTop;
      document.scrollingElement.scrollTop = scrollTop + scrollStep;
      // Stop once the page is taller than the cap or the next step would pass it
      if (null != document.body && document.body.clientHeight > max_height_px) {
        height_limit = true;
      } else if (document.scrollingElement.scrollTop + scrollStep > max_height_px) {
        height_limit = true;
      }
      // Keep scrolling while content remains below the previous position
      // (1081 is the viewport height plus one pixel)
      let scrollEnableFlag = false;
      if (null != document.body) {
        scrollEnableFlag = document.body.clientHeight > scrollTop + 1081 && !height_limit;
      } else {
        scrollEnableFlag = document.scrollingElement.scrollTop + scrollStep > scrollTop + 1081 && !height_limit;
      }
      return {
        'scrollEnable': scrollEnableFlag,
        'height_limit': height_limit,
        'document_scrolling_Element_scrollTop': document.scrollingElement.scrollTop
      };
    }, scrollStep, max_height_px, height_limit);
    // Give lazily loaded content time to arrive before the next step
    await sleep(800);
  }

  try {
    // Take the screenshot
    await page.screenshot({path: "jd.jpg", fullPage: true}).catch(err => {
      console.log('Screenshot failed');
      console.log(err);
    });
    await page.waitFor(5000);
  } catch (e) {
    console.log('Execution error');
    console.log(e);
  } finally {
    await browser.close();
  }
})();

// Delay helper: returns a promise that resolves after `delay` milliseconds
function sleep(delay) {
  return new Promise((resolve) => {
    setTimeout(resolve, delay);
  });
}

Screenshot of the result:

The unsatisfying scrolling version

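As you can see, the result is acceptable. The delay between scroll steps needs to be tuned to your network conditions. This version is, of course, only meant for learning.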
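To make the step size, height cap, and delay easier to tune, the scrolling logic can be separated from the browser code. The sketch below is one possible arrangement, not the optimized version promised above. The names `planScrollOffsets` and `scrollThrough` are hypothetical, and it reads `scrollingElement.scrollHeight` instead of the `document.body.clientHeight` used in the original code, which is an assumption about what measures the page height more reliably.

```javascript
// Compute the scrollTop values to visit, one step at a time, staying
// strictly below the page height or the hard cap, whichever is smaller.
function planScrollOffsets(pageHeight, step, maxHeight) {
  const limit = Math.min(pageHeight, maxHeight);
  const offsets = [];
  for (let top = step; top < limit; top += step) {
    offsets.push(top);
  }
  return offsets;
}

// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `page` is expected to be a puppeteer Page; it is passed in so that this
// snippet does not need to require('puppeteer') itself.
async function scrollThrough(page, { step = 1080, maxHeight = 20000, delayMs = 800 } = {}) {
  let top = 0;
  let steps = 0;
  while (true) {
    // Re-read the height on every pass, since lazy loading can grow the page.
    const pageHeight = await page.evaluate(() =>
      document.scrollingElement ? document.scrollingElement.scrollHeight : 0
    );
    const next = planScrollOffsets(pageHeight, step, maxHeight).find((y) => y > top);
    if (next === undefined) {
      break; // nothing left below the current position, or the cap was reached
    }
    await page.evaluate((y) => {
      document.scrollingElement.scrollTop = y;
    }, next);
    top = next;
    steps += 1;
    await sleep(delayMs);
  }
  return steps;
}
```

With this in place, the loop in the script would reduce to `await scrollThrough(page, { delayMs: 800 })` before the screenshot call, and a slower network only requires raising `delayMs`.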
