pyppeteer is similar to Selenium: it lets you drive the Chrome browser from Python.
Docs: https://miyakogi.github.io/pyppeteer/index.html
GitHub: https://github.com/miyakogi/pyppeteer
Requirements:
Python 3.6+
pip install pyppeteer
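# -*- coding: utf-8 -*-
import asyncio

from pyppeteer import launch
from pyquery import PyQuery as pq  # third-party: pip install pyquery

# Point pyppeteer at a locally installed Chrome. If no path is given,
# pyppeteer downloads its own Chromium on first launch, which is slow.
executable_path = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
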
# Example 1: render a JavaScript-driven page
async def crawl_page():
    # Launch the browser
    browser = await launch(executablePath=executable_path)
    # Open a new tab
    page = await browser.newPage()
    # Navigate to the URL
    await page.goto('http://quotes.toscrape.com/js/')
    # Grab the rendered HTML and parse it
    doc = pq(await page.content())
    print('Quotes:', doc('.quote').length)
    # Close the browser
    await browser.close()

# Example 2: take a screenshot, save a PDF, run JavaScript
async def save_pdf():
    browser = await launch(executablePath=executable_path)
    page = await browser.newPage()
    await page.goto('http://quotes.toscrape.com/js/')
    # Save a screenshot of the page
    await page.screenshot(path='example.png')
    # Export the page as a PDF
    await page.pdf(path='example.pdf')
    # Run JavaScript in the page and get the result back as a dict
    dimensions = await page.evaluate('''() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }''')
    print(dimensions)
    await browser.close()

if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(crawl_page())
    # asyncio.get_event_loop().run_until_complete(save_pdf())
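
In both examples, browser.close() is only reached if every earlier await succeeds. Here is a minimal sketch of one way to make the close unconditional, using try/finally. The names with_browser, launcher, task and FakeBrowser are hypothetical and not part of pyppeteer; FakeBrowser only stands in for a real browser so the sketch runs without Chrome.

```python
import asyncio


async def with_browser(launcher, task):
    """Launch a browser, run task(browser), and always close the browser.

    `launcher` is any zero-argument callable returning an awaitable that
    yields an object with an async close() method (hypothetical helper).
    """
    browser = await launcher()
    try:
        return await task(browser)
    finally:
        # Runs whether task() returned normally or raised
        await browser.close()


class FakeBrowser:
    """Stand-in for a real browser object, used only for this demo."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def demo():
    fake = FakeBrowser()

    async def launcher():
        return fake

    async def failing_task(browser):
        raise RuntimeError("page failed to load")

    try:
        await with_browser(launcher, failing_task)
    except RuntimeError:
        pass  # the error still propagates to the caller
    return fake.closed  # True if close() ran despite the error


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With pyppeteer, the launcher could presumably be lambda: launch(executablePath=executable_path), and the task a coroutine function containing the newPage/goto steps from the examples above. Note that asyncio.run needs Python 3.7 or later.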
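This is asynchronous programming: the async and await keywords show up on nearly every line, and it gets dizzying to read.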
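
To make those keywords less opaque, here is a small sketch that uses only the standard library. The function names fetch and main are made up for illustration, and asyncio.sleep stands in for a slow operation such as page.goto.

```python
import asyncio


async def fetch(name, delay):
    # `async def` defines a coroutine function; calling it only creates a
    # coroutine object, nothing runs until it is awaited or scheduled.
    await asyncio.sleep(delay)  # `await` pauses here so other tasks can run
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and returns their results
    # in the order the coroutines were passed in, not in completion order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


if __name__ == "__main__":
    # asyncio.run() (Python 3.7+) creates an event loop, runs main(), closes it
    print(asyncio.run(main()))
```

Read this way, every await in the pyppeteer examples marks a point where the script waits on the browser, and async def marks the functions that are allowed to contain such waits.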
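Reference
"Don't Just Use Selenium: the New Tool Pyppeteer Makes Getting Past Taobao Easier!" (original title: 别只用 Selenium,新神器 Pyppeteer 绕过淘宝更简单!)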