[Playwright Tutorial (2)] Core Concepts

Playwright's core concepts include:

  • Browser
  • Browser contexts
  • Pages and frames
  • Selectors
  • Auto-waiting
  • Execution contexts: Playwright and Browser
  • Evaluation Argument

1. Browser

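A Browser is an instance of Chromium, Firefox, or WebKit (the three browsers Playwright supports). A Playwright script usually starts by launching a browser instance and ends by closing it. A browser instance can be launched in headless mode (no GUI) or in headed mode. Creating a Browser instance: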

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    browser.close()

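Launching a browser instance is relatively expensive, so Playwright is designed to get the most out of a single browser instance by running multiple BrowserContexts inside it.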

API:

  • Browser

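2. BrowserContext

A BrowserContext is like an independent incognito session: very lightweight, yet fully isolated.

(Translator's note: each browser instance can have multiple BrowserContexts, all fully isolated from one another. For example, you can log in to two different accounts in two BrowserContexts, or use a different proxy in each context.)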

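Creating a context:

# `playwright` is the object yielded by sync_playwright() (named `p` in the example above)
browser = playwright.chromium.launch()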
context = browser.new_context()
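
The isolation between contexts can be checked with a small sketch. It assumes Playwright and a Chromium build are installed; the names make_cookie and demo_context_isolation, the cookie values, and the example.com URL are illustrative choices, not part of the original tutorial.

```python
def make_cookie(name, value, url="https://example.com"):
    """Build a cookie dict in the shape accepted by context.add_cookies()."""
    return {"name": name, "value": value, "url": url}


def demo_context_isolation():
    """Return the cookie names seen by two contexts of the same browser."""
    # Imported lazily so that defining this function does not need a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context_a = browser.new_context()
        context_b = browser.new_context()

        # Store a cookie in the first context only.
        context_a.add_cookies([make_cookie("session", "user-a")])

        names_a = [cookie["name"] for cookie in context_a.cookies()]
        names_b = [cookie["name"] for cookie in context_b.cookies()]

        browser.close()
        return names_a, names_b
```

Calling demo_context_isolation() should return the cookie name only in the first list, because the second context keeps its own separate storage even though both share one browser process.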

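A context can also be used to emulate multi-page scenarios involving mobile devices, permissions, locale, and color scheme. For example, creating a mobile context: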

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    iphone_11 = p.devices['iPhone 11 Pro']
    browser = p.webkit.launch(headless=False)
    context = browser.new_context(
        **iphone_11,
        locale='de-DE',
        geolocation={ 'longitude': 12.492507, 'latitude': 41.889938 },
        permissions=['geolocation']
    )
    browser.close()

API:

  • BrowserContext
  • browser.new_context(**kwargs)

3. Page and Frame

A BrowserContext can have multiple pages. Each page represents a tab or a popup window, and is used to navigate to URLs and interact with the page content.

Creating a page:

page = context.new_page()

# Navigate explicitly, similar to entering a URL in the browser.
page.goto('http://example.com')
# Fill an input.
page.fill('#search', 'query')

# Navigate implicitly by clicking a link.
page.click('#submit')
# Expect a new url.
print(page.url)

# Page can navigate from the script - this will be picked up by Playwright.
# window.location.href = 'https://example.com'

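A page can have multiple Frame objects attached to it, but only one main frame. All page-level interactions (such as click) operate on the main frame. Additional frames are attached to the page with the iframe HTML tag, and they can be accessed in order to interact with the content inside them.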

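import re

# Get a frame by its name attribute
frame = page.frame('frame-login')

# Get a frame by URL; a plain string is treated as a glob, so pass a compiled regex
frame = page.frame(url=re.compile(r'.*domain.*'))

# Get a frame through another selector
frame_element_handle = page.query_selector('.frame-class')
frame = frame_element_handle.content_frame()

# Interact with the frame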
frame.fill('#username-input', 'John')
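
The name-based lookup can be tried without an external site. The sketch below assumes Playwright and Chromium are installed; PAGE_HTML and demo_fill_inside_frame are hypothetical names, and the inline srcdoc iframe is only a stand-in for a real login frame.

```python
PAGE_HTML = """
<h1>Main frame</h1>
<iframe name="frame-login"
        srcdoc="<input id='username-input'>"></iframe>
"""


def demo_fill_inside_frame():
    """Fill an input that lives inside an iframe and return its value."""
    # Imported lazily so that defining this function does not need a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(PAGE_HTML)

        # page.frame() returns None when no frame matches, so check first.
        frame = page.frame(name="frame-login")
        if frame is None:
            browser.close()
            raise RuntimeError("frame 'frame-login' was not found")

        # The selector is resolved inside the iframe, not the main frame.
        frame.fill("#username-input", "John")
        value = frame.input_value("#username-input")

        browser.close()
        return value
```

A page-level page.fill('#username-input', ...) would not find this input, since page-level calls only search the main frame.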

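In recording mode, Playwright automatically detects whether an action takes place inside a frame, so when a frame is hard to locate you can use recording mode to find it.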

API:

  • Page
  • Frame
  • page.frame(**kwargs)

4. Selector

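Playwright can locate elements by CSS selector, XPath selector, HTML attributes (such as id or data-test-id), or text content.

All selector engines except XPath pierce shadow DOM by default. To restrict matching to the regular (light) DOM, use the *:light variant of the selector. This is rarely needed.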

# Using data-test-id= selector engine
page.click('data-test-id=foo')

# CSS and XPath selector engines are automatically detected
page.click('div')
page.click('//html/body/div')

# Find node by text substring
page.click('text=Hello w')

# Explicit CSS and XPath notation
page.click('css=div')
page.click('xpath=//html/body/div')

# Only search light DOM, outside WebComponent shadow DOM:
page.click('css:light=div')

# Different selectors can be combined by chaining them with >>
# Click an element with text 'Sign Up' inside of a #free-month-promo.
page.click('#free-month-promo >> text=Sign Up')

# Capture textContent of a section that contains an element with text 'Selectors'.
section_text = page.eval_on_selector('*css=section >> text=Selectors', 'e => e.textContent')
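
To see how >> narrows a match, the following sketch counts matches with and without a parent selector. It assumes Playwright and Chromium are installed; PROMO_HTML, chain, and demo_chained_selector are hypothetical names introduced for illustration.

```python
PROMO_HTML = """
<div id="free-month-promo"><button>Sign Up</button></div>
<div id="footer"><button>Sign Up</button></div>
"""


def chain(*parts):
    """Join selector parts with the >> chaining operator."""
    return " >> ".join(parts)


def demo_chained_selector():
    """Return (matches for the text alone, matches scoped to the promo)."""
    # Imported lazily so that defining this function does not need a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(PROMO_HTML)

        # The text selector alone is searched across the whole page.
        unscoped = len(page.query_selector_all("text=Sign Up"))
        # Chained: the text selector is resolved relative to #free-month-promo.
        scoped_selector = chain("#free-month-promo", "text=Sign Up")
        scoped = len(page.query_selector_all(scoped_selector))

        browser.close()
        return unscoped, scoped
```

With this markup the scoped count should be smaller than the unscoped one, since only one of the two buttons sits inside #free-month-promo.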

Details:

Element selectors | Playwright Python

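5. Auto-waiting

Before performing an action, Playwright runs a series of actionability checks on the element to make sure the action behaves as expected. It auto-waits for all relevant checks to pass and only then performs the requested action. If the required checks do not pass within the given timeout, the action fails with a TimeoutError.

Actions such as page.click(selector, **kwargs) and page.fill(selector, value, **kwargs) auto-wait for the element to become visible and actionable. For example, click will:

  • wait for the element matching selector to be attached to the DOM
  • wait for it to become visible: it has a non-empty bounding box and no visibility:hidden
  • wait for it to stop moving: for example, wait for a CSS transition to finish
  • scroll the element into view
  • wait for it to receive pointer events at the action point: for example, wait until the element is no longer obscured by other elements
  • retry if the element is detached during any of the above checks
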
# Playwright waits for #search element to be in the DOM
page.fill('#search', 'query')

# Playwright waits for element to stop animating
# and accept clicks.
page.click('#search')

# You can also wait explicitly

# Wait for #search to appear in the DOM.
page.wait_for_selector('#search', state='attached')
# Wait for #promo to become visible, for example with `visibility:visible`.
page.wait_for_selector('#promo')

# Wait for #details to become hidden, for example with `display:none`.
page.wait_for_selector('#details', state='hidden')
# Wait for #promo to be removed from the DOM.
page.wait_for_selector('#promo', state='detached')
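
The wait-then-act behaviour can be pictured as a polling loop. The sketch below is a conceptual model only, written in plain Python: Playwright's real checks run inside the browser and are not implemented this way. The wait_until helper and the fake element state are hypothetical, and the built-in TimeoutError stands in for Playwright's own TimeoutError class.

```python
import time


def wait_until(checks, timeout=1.0, interval=0.05):
    """Poll until every check returns True, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        # All checks must pass in the same polling round.
        if all(check() for check in checks):
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"checks did not pass within {timeout}s")
        time.sleep(interval)


# Hypothetical element state: attached at once, visible after 0.2 seconds.
start = time.monotonic()
checks = [
    lambda: True,                            # attached to the DOM
    lambda: time.monotonic() - start > 0.2,  # visible
]

passed = wait_until(checks, timeout=1.0)
print("checks passed:", passed)
```

A check that never passes, for example wait_until([lambda: False], timeout=0.1), ends in TimeoutError, which mirrors how an action fails when an element never becomes actionable.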

 

API:

  • page.click(selector, **kwargs)
  • page.fill(selector, value, **kwargs)
  • page.wait_for_selector(selector, **kwargs)

6. Execution context

The page.evaluate(expression, **kwargs) API runs a JavaScript function in the context of the web page and returns the result to the Playwright environment. Browser globals such as window and document can be used inside evaluate.

href = page.evaluate('() => document.location.href')

# if the result is a Promise or if the function is asynchronous evaluate will automatically wait until it's resolved

status = page.evaluate("""async () => {
  const response = await fetch(location.href)
  return response.status
}""")
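
The automatic waiting on asynchronous functions can be tried without any network access. This sketch assumes Playwright and Chromium are installed; ASYNC_EXPRESSION and demo_async_evaluate are hypothetical names, and the short timer merely stands in for real asynchronous work such as fetch.

```python
ASYNC_EXPRESSION = """async () => {
  // Resolve after a short delay to stand in for real asynchronous work.
  await new Promise(resolve => setTimeout(resolve, 50))
  return document.title
}"""


def demo_async_evaluate():
    """Evaluate an async function in the page and return its resolved value."""
    # Imported lazily so that defining this function does not need a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content("<title>Evaluate demo</title><p>hello</p>")

        # evaluate() returns only after the Promise has settled.
        title = page.evaluate(ASYNC_EXPRESSION)

        browser.close()
        return title
```

The value that comes back is the resolved result of the Promise, here the page title, rather than a Promise object.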

 

7. Evaluation Argument

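The page.evaluate(expression, **kwargs) method takes a single optional argument. This argument can be a mix of Serializable values and JSHandle or ElementHandle instances. Handles are automatically converted to the values they represent.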

result = page.evaluate("([x, y]) => Promise.resolve(x * y)", [7, 8])
print(result) # prints "56"


print(page.evaluate("1 + 2")) # prints "3"
x = 10
print(page.evaluate(f"1 + {x}")) # prints "11"


body_handle = page.query_selector("body")
html = page.evaluate("([body, suffix]) => body.innerHTML + suffix", [body_handle, "hello"])
body_handle.dispose()


# A primitive value.
page.evaluate('num => num', 42)

# An array.
page.evaluate('array => array.length', [1, 2, 3])

# An object.
page.evaluate('object => object.foo', { 'foo': 'bar' })

# A single handle.
button = page.query_selector('button')
page.evaluate('button => button.textContent', button)

# Alternative notation using elementHandle.evaluate.
button.evaluate('(button, from) => button.textContent.substring(from)', 5)

# Object with multiple handles.
button1 = page.query_selector('.button1')
button2 = page.query_selector('.button2')
page.evaluate("""o => o.button1.textContent + o.button2.textContent""",
    { 'button1': button1, 'button2': button2 })

# Object destructuring works. Note that property names must match
# between the destructured object and the argument.
# Also note the required parenthesis.
page.evaluate("""
    ({ button1, button2 }) => button1.textContent + button2.textContent""",
    { 'button1': button1, 'button2': button2 })

# Array works as well. Arbitrary names can be used for destructuring.
# Note the required parenthesis.
page.evaluate("""
    ([b1, b2]) => b1.textContent + b2.textContent""",
    [button1, button2])

# Any non-cyclic mix of serializables and handles works.
page.evaluate("""
    x => x.button1.textContent + x.list[0].textContent + String(x.foo)""",
    { 'button1': button1, 'list': [button2], 'foo': None })

 

Other references:

Playwright (Python) Microsoft browser automation tutorial (2), weixin_44043378's blog, CSDN
