Playwright: Introduction and Practice

Introduction

References:
Playwright GitHub repository
Official documentation

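Overview

Microsoft released the first public version of Playwright on January 31, 2020.

playwright-python is a Python-based automation testing tool that can generate test scripts automatically through its recording feature.
Playwright is a powerful Python library that uses a single API to automate the major browser engines, Chromium, Firefox, and WebKit (the engine behind Safari), and it can run in both headless and headed mode.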

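Advantages

Cross-browser: supports Chromium (Chrome), Firefox, and WebKit;
Cross-platform: supports Windows, Mac, and Linux;
Cross-language: supports Python, Java, and JS;
Can be used for mobile web testing through device emulation.

Automatically waits for elements to load before acting on them.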

Usage

Installation

Offline wheel (.whl) files: https://pypi.org/project/playwright/#files

Install the playwright library

pip install playwright

Install the browser binaries

playwright install 

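The two commands above do the following, respectively:

Install the Playwright library and its dependencies (requires Python 3.7+)
Install the browser binaries for Chromium, Firefox, and WebKit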
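A quick way to confirm the prerequisites is to check the interpreter version and whether the package can be found. The sketch below uses only the standard library; the function name `check_environment` is illustrative, and it does not verify that the browser binaries were actually downloaded.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 7)):
    """Report whether this interpreter meets the prerequisites listed above."""
    return {
        # Playwright for Python needs Python 3.7 or newer (per the note above).
        "python_ok": sys.version_info >= min_version,
        # find_spec returns None when the package cannot be imported.
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `playwright_installed` is false, rerun `pip install playwright` in the same environment that runs the script.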

Recording

playwright codegen

or

python -m playwright codegen

Record a search on baidu.com using Chromium, and save the generated script as the Python file my.py:

python -m playwright codegen --target python -o 'my.py' -b chromium https://www.baidu.com
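When the recorder needs to be started from a script, for example to record several sites in a row, the same command can be assembled as an argument list. This is a minimal sketch; the helper names `build_codegen_command` and `run_codegen` are hypothetical, and the flags are the ones used in the command above.

```python
import subprocess
import sys


def build_codegen_command(url, output="my.py", browser="chromium", target="python"):
    """Return the argument list for `python -m playwright codegen`."""
    return [
        sys.executable, "-m", "playwright", "codegen",
        "--target", target,   # language of the generated script
        "-o", output,         # file that receives the generated script
        "-b", browser,        # browser used for the recording session
        url,
    ]


def run_codegen(url, **options):
    """Launch the recorder; needs Playwright installed and a desktop session."""
    return subprocess.run(build_codegen_command(url, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_codegen(...) to actually start recording.
    print(" ".join(build_codegen_command("https://www.baidu.com")))
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names or URLs that contain special characters.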

Coding

Options

Options:
  -V, --version                          Print the version number
  -b, --browser <browserType>            Browser type
  --color-scheme <scheme>                Color scheme, either "light" or "dark"
  --device <deviceName>                  Emulate a device, e.g. "iPhone 11"
  --geolocation <coordinates>            Geolocation coordinates, e.g. "37.819722,-122.478611"
  --lang <language>                      Language / locale, e.g. "en-GB"
  --save-storage <filename>              Save browser state to the given file
  --load-storage <filename>              Load browser state from the given file
  --proxy-server <proxy>                 Proxy server, e.g. "http://myproxy:3128" or "socks5://myproxy:8080"
  --timezone <time zone>                 Time zone, e.g. "Europe/Rome"
  --timeout <timeout>                    Timeout in milliseconds (default: "10000")
  --user-agent <ua string>               User agent string
  --viewport-size <size>                 Viewport size in pixels, e.g. "1280, 720"
Commands:
  open [url]                             Open a URL in the browser given by -b, --browser
  cr [url]                               Open a URL in Chromium
  ff [url]                               Open a URL in Firefox
  wk [url]                               Open a URL in WebKit
  codegen [options] [url]                Open a page and generate code
  screenshot [options] <url> <filename>  Take a screenshot of a page
  pdf [options] <url> <filename>         Save a page as PDF
  install                                Ensure the required browser binaries are installed
  help [command]                         Help
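Several of these CLI options have counterparts among the keyword arguments of `browser.new_context()` in the Python API. The sketch below converts CLI-style strings into those arguments; the helper names are hypothetical, and the mapping shown (`--lang` to `locale`, `--timezone` to `timezone_id`, and so on) covers only a subset of the options.

```python
def parse_geolocation(text):
    """Turn "37.819722,-122.478611" into Playwright's geolocation dict."""
    latitude, longitude = (float(part) for part in text.split(","))
    return {"latitude": latitude, "longitude": longitude}


def parse_viewport(text):
    """Turn "1280, 720" into Playwright's viewport dict."""
    width, height = (int(part) for part in text.split(","))
    return {"width": width, "height": height}


def build_context_options(lang=None, timezone=None, geolocation=None,
                          viewport_size=None, user_agent=None, color_scheme=None):
    """Map CLI-style values to keyword arguments of browser.new_context()."""
    options = {
        "locale": lang,
        "timezone_id": timezone,
        "geolocation": parse_geolocation(geolocation) if geolocation else None,
        "viewport": parse_viewport(viewport_size) if viewport_size else None,
        "user_agent": user_agent,
        "color_scheme": color_scheme,
    }
    # Drop unset values so Playwright falls back to its own defaults.
    return {key: value for key, value in options.items() if value is not None}


def open_with_options(url, **cli_options):
    """Open `url` in Chromium with the given options and return the page title."""
    # Imported here so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        context = browser.new_context(**build_context_options(**cli_options))
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

For example, `open_with_options("https://www.baidu.com", lang="en-GB", viewport_size="1280, 720")` would load the page with an English locale in a 1280x720 viewport.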

Asynchronous

Reference: https://www.cnblogs.com/yoyo1216/p/14228858.html

import asyncio

from playwright.async_api import async_playwright

# Placeholder target URL; replace it with the page you want to load.
index_url = "https://www.baidu.com"


async def main():
    async with async_playwright() as playwright:
        # A persistent context keeps cookies and storage in user_data_dir.
        # Chromium-only switches such as --disable-infobars are not passed,
        # because this example launches Firefox.
        browser = await playwright.firefox.launch_persistent_context(
            user_data_dir='./headless_data',
            headless=False,
        )
        index_page = await browser.new_page()
        # Wait until the network is idle before reading the page source.
        await index_page.goto(index_url, wait_until="networkidle")
        page_text = await index_page.content()
        print(page_text)
        await browser.close()


asyncio.run(main())
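The main benefit of the asynchronous API is that several pages can load at the same time in one browser. The following is a minimal sketch of that idea using `asyncio.gather`; the function names `fetch_title` and `fetch_titles` are hypothetical, and the URLs passed in are up to the caller.

```python
import asyncio


async def fetch_title(context, url):
    """Open `url` in a new page of `context` and return its title."""
    page = await context.new_page()
    try:
        await page.goto(url, wait_until="networkidle")
        return await page.title()
    finally:
        # Close the page even if navigation fails.
        await page.close()


async def fetch_titles(urls):
    """Load all `urls` concurrently in one browser and return their titles."""
    # Imported here so fetch_title can be reused without importing Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.firefox.launch()
        context = await browser.new_context()
        try:
            # gather() runs the coroutines concurrently; results keep input order.
            return await asyncio.gather(
                *(fetch_title(context, url) for url in urls)
            )
        finally:
            await browser.close()

# Usage (needs Playwright and its browsers installed):
# titles = asyncio.run(fetch_titles(["https://www.baidu.com"]))
```

Sharing one browser context keeps startup cost low; pages that must not share cookies would each need their own context instead.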



Synchronous

from playwright.sync_api import sync_playwright

# Placeholder target URL; replace it with the page you want to load.
url = "https://www.baidu.com"


def run(playwright):
    browser = playwright.firefox.launch(headless=False)
    # Open a new page
    page = browser.new_page()
    # Wait until the network is idle before reading the page source.
    page.goto(url, wait_until="networkidle")
    print(page.content())
    page.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)

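Common Errors

Clicking a button opens a new page, and elements on that page cannot be located from the original page object. To get a handle on the new page: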

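# `context` is the browser context that owns list_drug_page;
# detail_bnt_selector selects the button that opens the new page.
async with context.expect_page() as page_info:
    await list_drug_page.click(detail_bnt_selector)
detail_page = await page_info.value
await detail_page.wait_for_load_state("networkidle")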
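The same pattern is available in the synchronous API, where `page_info.value` is read without `await`. The helper below is a sketch of that variant; the name `open_in_new_page` is hypothetical, and `context` and `page` are assumed to be an existing Playwright browser context and one of its pages.

```python
def open_in_new_page(context, page, selector):
    """Click `selector` on `page` and return the page that opens as a result."""
    # expect_page() starts listening before the click, so the new page
    # is captured even if it opens immediately.
    with context.expect_page() as page_info:
        page.click(selector)
    new_page = page_info.value
    # Wait until the new page has finished loading before querying it.
    new_page.wait_for_load_state("networkidle")
    return new_page
```

Subsequent lookups should go through the returned page object, since the original page still points at the tab where the click happened.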
