I. Installation
Pip
pip install playwright
Conda
These steps download the Playwright package and install the browser binaries for Chromium, Firefox, and WebKit.
Command to install the browser binaries:
python -m playwright install
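Before running the scripts that follow, it can help to confirm that the package is visible to the current interpreter. The following is a minimal sketch using only the standard library; the helper name `playwright_version` is hypothetical, and it checks the Python package only, not whether the browser binaries were downloaded.

```python
from importlib import metadata
from typing import Optional


def playwright_version(dist_name: str = "playwright") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Example: print(playwright_version() or "playwright is not installed")
```

A `None` result usually means that pip installed the package into a different interpreter than the one running the script.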
II. Usage
Once installed, you can import Playwright in a Python script and launch any of the three browsers (chromium, firefox, and webkit).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch Chromium, open a page, and print its title
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()
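Since all three engines are exposed as attributes of the same object, the browser can also be chosen by name at run time. A minimal sketch, assuming a hypothetical helper `launch_by_name`; only the attribute names `chromium`, `firefox`, and `webkit` and the `launch()` call come from the text above.

```python
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def launch_by_name(playwright, name: str, **launch_kwargs):
    """Launch the browser type stored under the attribute `name`."""
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(
            f"unknown browser {name!r}; expected one of {SUPPORTED_BROWSERS}"
        )
    browser_type = getattr(playwright, name)
    return browser_type.launch(**launch_kwargs)


# Usage inside a script:
# with sync_playwright() as p:
#     browser = launch_by_name(p, "firefox")
```

Validating the name first turns a typo into a clear `ValueError` instead of an `AttributeError` raised somewhere inside the script.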
Playwright offers two variants of its API: synchronous and asynchronous. If your project uses asyncio (https://docs.python.org/3/library/asyncio.html), use the async API:
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("http://playwright.dev")
        print(await page.title())
        await browser.close()

asyncio.run(main())
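One reason to pick the async API is that several pages can be driven concurrently from a single event loop. The following is a sketch under that assumption; the function name `fetch_titles` is hypothetical, and the import sits inside the function only so that the definition loads even where Playwright is not installed.

```python
import asyncio
from typing import List


async def fetch_titles(urls: List[str]) -> List[str]:
    """Open each URL in its own page and return the titles in input order."""
    # Lazy import: keeps this module importable without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            async def title_of(url: str) -> str:
                page = await browser.new_page()
                await page.goto(url)
                return await page.title()

            # gather() runs the navigations concurrently and preserves order.
            return await asyncio.gather(*(title_of(u) for u in urls))
        finally:
            await browser.close()


# Example call (needs the browser binaries installed):
# titles = asyncio.run(fetch_titles(["http://playwright.dev"]))
```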
III. First script
In our first script, we navigate to whatsmyuseragent.org in WebKit and take a screenshot.
Example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    # Save a screenshot of the page to the current directory
    page.screenshot(path="example.png")
    browser.close()
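By default, Playwright runs browsers in headless mode. To see the browser UI, pass headless=False when launching the browser. You can also use slow_mo to slow down execution. Learn more in the debugging tools section.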
firefox.launch(headless=False, slow_mo=50)
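The two options above are typically toggled together while debugging. A minimal sketch, assuming a hypothetical helper `launch_options` that builds the keyword arguments for `launch()`; the 50 ms default mirrors the value in the line above and is not a recommendation.

```python
def launch_options(debug: bool, slow_mo_ms: float = 50) -> dict:
    """Return launch() keyword arguments: headed and slowed down when debugging."""
    if debug:
        return {"headless": False, "slow_mo": slow_mo_ms}
    return {"headless": True}


# Usage with any browser type, for example:
# browser = p.firefox.launch(**launch_options(debug=True))
```

Keeping the switch in one place makes it easy to go back to headless runs once the script behaves as expected.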
IV. Recording scripts
The command-line tool can record user interactions and generate Python code.
python -m playwright codegen --target python -o open_baidu.py -b chromium https://www.baidu.com
This generates Python code in the current directory and saves it as open_baidu.py.
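The same recorder can be started from a Python script, which is convenient when the URL or output file name is computed. This is a sketch that simply rebuilds the command line shown above; the function names `codegen_command` and `record` are hypothetical.

```python
import subprocess
import sys
from typing import List


def codegen_command(url: str, output: str, browser: str = "chromium") -> List[str]:
    """Build the codegen invocation with the same flags as the command above."""
    return [
        sys.executable, "-m", "playwright", "codegen",
        "--target", "python",
        "-o", output,
        "-b", browser,
        url,
    ]


def record(url: str, output: str) -> None:
    """Start the recorder and return once the codegen process exits."""
    subprocess.run(codegen_command(url, output), check=True)


# Example: record("https://www.baidu.com", "open_baidu.py")
```

Using `sys.executable` keeps the recorder on the same interpreter, and therefore the same Playwright installation, as the calling script.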
V. Interactive mode
>>> from playwright.sync_api import sync_playwright
>>> playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox or playwright.webkit
# Pass headless=False to launch() to see the browser UI
>>> browser = playwright.chromium.launch()
>>> page = browser.new_page()
>>> page.goto("http://whatsmyuseragent.org/")
>>> page.screenshot(path="example.png")
>>> browser.close()
>>> playwright.stop()
VI. PyInstaller
You can use Playwright with PyInstaller to create standalone executables.
# main.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()
If you want to bundle the browsers with the executable:
bash:
PLAYWRIGHT_BROWSERS_PATH=0 playwright install chromium
pyinstaller -F main.py
PowerShell:
$env:PLAYWRIGHT_BROWSERS_PATH="0"
playwright install chromium
pyinstaller -F main.py
Note:
Bundling browsers with the executable produces a larger binary. It is recommended to bundle only the browsers you use.
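The bash and PowerShell steps above differ only in how the environment variable is set, so a single Python build script can cover both platforms. This is a sketch, not an official build recipe; the helper names `bundle_env`, `build_steps`, and `build` are hypothetical, and it assumes that `pyinstaller` is on the PATH.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping


def bundle_env(base: Mapping[str, str]) -> Dict[str, str]:
    """Copy an environment and set PLAYWRIGHT_BROWSERS_PATH=0, as in the shell steps."""
    env = dict(base)
    env["PLAYWRIGHT_BROWSERS_PATH"] = "0"
    return env


def build_steps(script: str = "main.py", browser: str = "chromium") -> List[List[str]]:
    """Return the two commands: install one browser, then package the script."""
    return [
        [sys.executable, "-m", "playwright", "install", browser],
        ["pyinstaller", "-F", script],
    ]


def build(script: str = "main.py") -> None:
    """Run both steps with the bundling variable set for each of them."""
    env = bundle_env(os.environ)
    for cmd in build_steps(script):
        subprocess.run(cmd, env=env, check=True)
```

Only one browser is installed here, in line with the note about keeping the binary small.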
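VII. Known issues
1. time.sleep() leads to outdated state
Use page.wait_for_timeout(5000) instead of time.sleep(5). It is better not to wait for a timeout at all, but it is sometimes useful for debugging. In those cases, use Playwright's wait method rather than the time module, because Playwright relies on asynchronous operations internally, and these are not processed properly while time.sleep(5) blocks.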
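As a sketch of this advice, a small wrapper can keep debugging pauses inside Playwright rather than in the `time` module. The helper name `debug_pause` is hypothetical; `page.wait_for_timeout()` is the method named in the text and takes milliseconds.

```python
def debug_pause(page, seconds: float) -> None:
    """Pause through Playwright's own timer instead of time.sleep()."""
    # wait_for_timeout expects milliseconds, whereas time.sleep takes seconds.
    page.wait_for_timeout(seconds * 1000)


# In a script, debug_pause(page, 5) takes the place of time.sleep(5).
```

Accepting seconds keeps call sites close to the `time.sleep()` calls they replace, while the unit conversion happens in one place.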
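2. Incompatible with asyncio's SelectorEventLoop on Windows
Playwright runs its driver in a subprocess, so on Windows it requires asyncio's ProactorEventLoop, because SelectorEventLoop does not support async subprocesses.
On Windows with Python 3.7, Playwright sets the default event loop to ProactorEventLoop, since that is already the default on Python 3.8+.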
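When a framework or another library has replaced the event loop, it may be worth checking which loop type is actually in use before starting Playwright. This is a diagnostic sketch using only the standard library; the helper names `is_proactor_loop` and `running_loop_is_proactor` are hypothetical, and `asyncio.ProactorEventLoop` exists only on Windows, hence the `getattr` guard.

```python
import asyncio


def is_proactor_loop(loop) -> bool:
    """Return True if `loop` is a ProactorEventLoop (a Windows-only class)."""
    proactor_cls = getattr(asyncio, "ProactorEventLoop", None)
    return proactor_cls is not None and isinstance(loop, proactor_cls)


async def running_loop_is_proactor() -> bool:
    """Check the loop that is currently running this coroutine."""
    return is_proactor_loop(asyncio.get_running_loop())


# Example: print(asyncio.run(running_loop_is_proactor()))
```

On platforms other than Windows the check always returns False, and the incompatibility described above does not apply there.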
3. Threading
Playwright's API is not thread-safe. If you use Playwright in a multi-threaded environment, create one Playwright instance per thread. For more details, see the threading issue: https://github.com/microsoft/playwright-python/issues/623.
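The one-instance-per-thread rule can be sketched as follows. The names `fetch_title` and `run_per_thread` are hypothetical; the important detail is that `sync_playwright()` is entered inside the worker, so no Playwright object is ever shared between threads.

```python
import threading
from typing import Callable, Dict, List


def fetch_title(url: str, results: Dict[str, str]) -> None:
    """Worker: creates and closes its own Playwright instance."""
    # Lazy import: keeps this module importable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:  # one instance per thread
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            results[url] = page.title()
        finally:
            browser.close()


def run_per_thread(worker: Callable, urls: List[str]) -> Dict[str, str]:
    """Start one thread per URL and collect whatever the workers store."""
    results: Dict[str, str] = {}
    threads = [threading.Thread(target=worker, args=(u, results)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Example: run_per_thread(fetch_title, ["http://playwright.dev"])
```

Each thread pays the cost of starting its own driver process, so for many URLs a small fixed number of threads may be a better fit than one thread per URL.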
Original official documentation: https://playwright.dev/python/docs/inspector#stepping-through-the-playwright-script