This post simply records the methods I used, and the bugs I ran into, while learning Playwright.
import asyncio
from playwright.async_api import async_playwright


async def main():
    async with async_playwright() as p:
        # Built-in device descriptor (viewport, user agent, touch support, ...)
        iphone_11 = p.devices['iPhone 11 Pro']
        browser = await p.chromium.launch()
        context = await browser.new_context(
            **iphone_11,
            locale='de-DE',
            geolocation={'longitude': 12.492507, 'latitude': 41.889938},
            permissions=['geolocation'],
            color_scheme='dark',
        )
        # Create the page from the context so the emulation settings apply to it
        page = await context.new_page()
        await browser.close()


asyncio.run(main())
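Creating the Playwright object with async with async_playwright() as p has one drawback: p is only valid inside the async with block (in the snippet above, inside main). Once execution leaves that block, the Playwright object can no longer be used, and any browser processes it launched are closed automatically.

The alternative is playwright = await async_playwright().start(). With this form, playwright can be stored as a class member, and the browser processes it created stay usable until the object is released. Because no context manager is involved, call await self.playwright.stop() yourself when you are finished. Example (launching Firefox):

self.playwright = await async_playwright().start()
self.browser = await self.playwright.firefox.launch()
self.context = await self.browser.new_context()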
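To make the lifetime point concrete, here is a minimal sketch of a class that owns the Playwright object. The class name BrowserSession, its methods open and close, and the URL are hypothetical names chosen for illustration; only async_playwright().start(), launch(), new_context(), new_page(), close() and stop() come from Playwright itself.

```python
import asyncio


class BrowserSession:
    """Hypothetical wrapper that keeps Playwright alive across method calls."""

    def __init__(self):
        self.playwright = None
        self.browser = None
        self.context = None

    async def start(self):
        # Imported here so the class can be defined even where Playwright
        # is not installed; a top-level import works just as well.
        from playwright.async_api import async_playwright

        self.playwright = await async_playwright().start()
        self.browser = await self.playwright.firefox.launch()
        self.context = await self.browser.new_context()

    async def open(self, url):
        # The browser is still running here, long after start() has returned.
        page = await self.context.new_page()
        await page.goto(url)
        return page

    async def close(self):
        # start() involves no context manager, so cleanup must be explicit.
        if self.browser is not None:
            await self.browser.close()
        if self.playwright is not None:
            await self.playwright.stop()
        self.playwright = self.browser = self.context = None


async def demo():
    session = BrowserSession()
    await session.start()
    try:
        page = await session.open("https://example.com")  # placeholder URL
        print(await page.title())
    finally:
        await session.close()
```

Running asyncio.run(demo()) exercises the whole lifecycle. The try/finally is a design choice: it guarantees the browser process is shut down even if a page operation raises.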
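Selecting page elements is one of the most common operations in web automation. Other automation libraries such as selenium and pyppeteer rely mainly on CSS selectors and XPath. On pages whose tag attributes (id, class, and so on) are generated by JavaScript, it is hard to pin down the desired element with those two selector types alone.
Playwright provides several selector types.
Official documentation: selector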
await page.locator("text=Log in").click()
await page.locator("[data-test=login-button]").click()
await page.locator("[aria-label='Sign in']").click()
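As a rough way to try these selectors without a real site, the sketch below loads a small HTML snippet with page.set_content and counts how many elements each selector matches. The markup in DEMO_HTML and the helper name count_matches are assumptions made up for this demo; get_by_role is shown as one more locator Playwright offers, matching the button by its accessible name.

```python
import asyncio

# Hypothetical markup used only for this demo.
DEMO_HTML = """
<button data-test="login-button">Log in</button>
<button aria-label="Sign in">Go</button>
"""


async def count_matches(page):
    """Return how many elements each selector style finds on the demo page."""
    await page.set_content(DEMO_HTML)
    return {
        "text": await page.locator("text=Log in").count(),
        "attribute": await page.locator("[data-test=login-button]").count(),
        "aria": await page.locator("[aria-label='Sign in']").count(),
        "role": await page.get_by_role("button", name="Sign in").count(),
    }


async def demo():
    # Imported lazily so count_matches can be reused without launching a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        print(await count_matches(page))
        await browser.close()
```

Call asyncio.run(demo()) to run it. Counting matches before clicking is a cheap check that a selector is specific enough on a page with generated ids and classes.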
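Tip: this approach helps when you do not know how to locate an element, or simply do not want to work out the selector by hand.
await page.pause()
When execution reaches this line the page is paused and the Playwright Inspector opens (the browser must be launched with headless=False). From there you can record the actions you perform on the page, such as clicks and navigations.

Installing playwright_stealth
Install it directly from the command line with pip:
pip install playwright_stealth
GitHub repository for playwright_stealth
Usage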
import asyncio
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async


async def main():
    async with async_playwright() as p:
        for browser_type in [p.chromium, p.firefox, p.webkit]:
            browser = await browser_type.launch()
            page = await browser.new_page()
            # Patch the page before navigating so the site sees the patched values
            await stealth_async(page)
            await page.goto('http://whatsmyuseragent.org/')
            await page.screenshot(path=f'example-{browser_type.name}.png')
            await browser.close()


asyncio.run(main())
Result:
Visit the site below with a browser driven by Playwright.
Effect of bypassing the fingerprint checks:
Test URL
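Besides a screenshot of a test site, one quick local check is to read navigator.webdriver, a flag that automation-detection scripts commonly inspect, on a plain page and on a page patched with stealth_async. This is only a sketch: the helper name read_webdriver_flag and the URL are placeholders introduced here, and the printed values depend on the browser and on the playwright_stealth version, so none are claimed.

```python
import asyncio


async def read_webdriver_flag(page):
    """Return navigator.webdriver as seen by scripts running in the page."""
    return await page.evaluate("() => navigator.webdriver")


async def demo():
    # Imported lazily so the helper above can be reused on its own.
    from playwright.async_api import async_playwright
    from playwright_stealth import stealth_async

    async with async_playwright() as p:
        browser = await p.chromium.launch()

        plain = await browser.new_page()
        await plain.goto("https://example.com")  # placeholder URL
        print("without stealth:", await read_webdriver_flag(plain))

        patched = await browser.new_page()
        # Applied before navigation, in the same order as the usage example.
        await stealth_async(patched)
        await patched.goto("https://example.com")
        print("with stealth:", await read_webdriver_flag(patched))

        await browser.close()
```

Call asyncio.run(demo()) to compare the two pages side by side.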