Installing and Getting Started with Playwright for Python (on Windows)

I. Installation

Pip
pip install playwright

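Conda
conda config --add channels conda-forge
conda config --add channels microsoft
conda install playwright

The commands above download the Playwright package. The browser binaries for Chromium, Firefox, and WebKit are then installed in a separate step.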

To install the browser binaries, run:

python -m playwright install
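
As a quick sanity check after installation, a short sketch like the following could confirm that the package is importable. The helper name `is_installed` is made up for this illustration and is not part of Playwright. It only looks for the Python package and says nothing about whether the browser binaries were downloaded.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether `pip install playwright` (or the conda equivalent) took effect
# in the interpreter that runs this script.
print("playwright importable:", is_installed("playwright"))
```

If this prints False, the most likely cause is that pip installed the package into a different Python environment than the one running the script.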

II. Usage

Once installed, you can import Playwright in a Python script and launch any of the three browsers (chromium, firefox, and webkit).

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()

Playwright offers its API in two variants: synchronous and asynchronous. If your project uses asyncio (https://docs.python.org/3/library/asyncio.html), you should use the async API:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("http://playwright.dev")
        print(await page.title())
        await browser.close()

asyncio.run(main())

III. First script

In our first script, we will use WebKit to navigate to whatsmyuseragent.org and take a screenshot.

Example:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()

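By default, Playwright runs the browser in headless mode. To see the browser UI, pass headless=False when launching the browser. You can also use slow_mo to slow down execution. The debugging tools section of the official documentation covers this in more detail.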

firefox.launch(headless=False, slow_mo=50)
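
To show where these flags go in a complete script, here is a minimal sketch that switches between headless and visible runs. The helper names `launch_options` and `take_screenshot` are hypothetical and exist only for this illustration. The 50 ms value is taken from the line above.

```python
def launch_options(debug: bool) -> dict:
    """Build keyword arguments for launch(); hypothetical helper for illustration."""
    if debug:
        # Show the browser window and slow each operation down by 50 ms.
        return {"headless": False, "slow_mo": 50}
    # Default behaviour: headless, full speed.
    return {"headless": True}


def take_screenshot(debug: bool = False) -> None:
    # Imported here so launch_options() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(**launch_options(debug))
        page = browser.new_page()
        page.goto("http://whatsmyuseragent.org/")
        page.screenshot(path="example.png")
        browser.close()
```

Calling take_screenshot(debug=True) should open a visible Firefox window, while take_screenshot() runs headless as in the earlier example.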

IV. Recording a script

The command-line tool can record user interactions and generate Python code from them.

python -m playwright codegen --target python -o open_baidu.py -b chromium https://www.baidu.com

This generates Python code in the current directory and saves it as open_baidu.py.

V. Interactive mode

>>> from playwright.sync_api import sync_playwright
>>> playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox or playwright.webkit
# Pass headless=False to launch() to see the browser UI
>>> browser = playwright.chromium.launch()
>>> page = browser.new_page()
>>> page.goto("http://whatsmyuseragent.org/")
>>> page.screenshot(path="example.png")
>>> browser.close()
>>> playwright.stop()

VI. PyInstaller

You can use Playwright together with PyInstaller to create standalone executables.

# main.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()

If you want to bundle the browsers with the executable:

bash:

PLAYWRIGHT_BROWSERS_PATH=0 playwright install chromium
pyinstaller -F main.py

PowerShell:

$env:PLAYWRIGHT_BROWSERS_PATH="0"
playwright install chromium
pyinstaller -F main.py

Note:

Bundling the browsers with the executable produces a larger binary. It is recommended to bundle only the browsers you actually use.

VII. Known issues

1. time.sleep() leads to outdated state

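Use page.wait_for_timeout(5000) instead of time.sleep(5). It is better not to wait for a fixed timeout at all, but doing so is sometimes useful for debugging. In those cases, use Playwright's wait methods rather than the time module: Playwright relies on asynchronous operations internally, and these are not processed properly while time.sleep(5) blocks.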
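
A minimal sketch of the substitution described above. The helper names `debug_pause` and `title_after_pause` are hypothetical. The `page` argument is assumed to be a page object from the sync API, as returned by browser.new_page().

```python
def debug_pause(page, milliseconds: float = 5000) -> None:
    """Pause through Playwright's own timer instead of time.sleep()."""
    # wait_for_timeout takes milliseconds, so 5000 corresponds to time.sleep(5).
    page.wait_for_timeout(milliseconds)


def title_after_pause(page, url: str) -> str:
    """Navigate, pause for debugging, then read the page title."""
    page.goto(url)
    debug_pause(page, 5000)
    return page.title()
```

Outside of debugging, waiting for a concrete condition (for example page.wait_for_selector with a selector of your choice) is usually preferable to a fixed pause.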

2. Incompatibility with asyncio's SelectorEventLoop on Windows

Playwright runs its driver in a subprocess, so on Windows it requires asyncio's ProactorEventLoop, because SelectorEventLoop does not support asynchronous subprocesses.

On Windows with Python 3.7, Playwright sets the default event loop to ProactorEventLoop, since that is already the default on Python 3.8+.
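
If you are unsure which loop your program ends up with, a rough check along these lines may help. The function name `loop_supports_subprocesses` is made up for this sketch, and treating every non-Windows loop as fine is a simplifying assumption.

```python
import asyncio
import sys


def loop_supports_subprocesses(loop: asyncio.AbstractEventLoop) -> bool:
    """Rough check for whether this loop can spawn asyncio subprocesses."""
    if sys.platform != "win32":
        # The restriction described above is specific to Windows.
        return True
    # ProactorEventLoop only exists on Windows, hence the getattr.
    proactor = getattr(asyncio, "ProactorEventLoop", None)
    return proactor is not None and isinstance(loop, proactor)


loop = asyncio.new_event_loop()
try:
    # Prints the loop class name and whether it passes the check.
    print(type(loop).__name__, loop_supports_subprocesses(loop))
finally:
    loop.close()
```

A False result on Windows usually means some other code has switched the program to a selector-based loop, which is the situation this known issue describes.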

3. Threading

Playwright's API is not thread-safe. If you use Playwright in a multi-threaded environment, create a separate Playwright instance for each thread. For more details, see the threading issue: https://github.com/microsoft/playwright-python/issues/623.
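
One way to follow the one-instance-per-thread advice is sketched below. Both `run_per_thread` and `fetch_title` are hypothetical names for this illustration. The sketch starts one thread per item and does not propagate exceptions raised inside a worker, so treat it as a starting point rather than a finished pattern.

```python
import threading


def run_per_thread(worker, items):
    """Run worker(item) in one thread per item and collect results by item."""
    results = {}
    lock = threading.Lock()

    def target(item):
        value = worker(item)
        with lock:  # protect the shared dict while storing the result
            results[item] = value

    threads = [threading.Thread(target=target, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fetch_title(url: str) -> str:
    # Each call creates its own Playwright instance, so nothing is shared across threads.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

A call such as run_per_thread(fetch_title, ["http://playwright.dev", "http://whatsmyuseragent.org/"]) would then open one browser per thread and return a dict mapping each URL to its page title.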

Official documentation: https://playwright.dev/python/docs/inspector#stepping-through-the-playwright-script
