Tanzhou Classroom, Class 25 (Ph201805201): Web Scraping Basics, Lesson 8: Selenium (Class Notes)

Selenium Notes (1): Installation and Basic Usage


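Introduction

Selenium is a tool for testing web applications.

Selenium tests run directly in the browser, just as if a real user were operating it. Supported browsers include IE (7, 8, 9, 10, 11), Firefox, Safari, Chrome, Opera, and others.

Its main uses are testing browser compatibility (checking whether your application works well across different browsers and operating systems) and testing system functionality (creating regression tests that verify software features and user requirements).

In web scraping, it is used to simulate a normal user visiting a page and to extract the data.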


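Installation

ChromeDriver (browser driver) installation

To drive Chrome with Selenium you need to download chromedriver, and the chromedriver version must match your Chrome version; with a mismatched version, the script fails with an error at runtime.

Chromedriver download: https://chromedriver.storage.googleapis.com/index.html

Chromedriver-to-Chrome version mapping:

chromedriver version  Supported Chrome versions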
v2.37 v64-66
v2.36 v63-65
v2.35 v62-64
v2.34 v61-63
v2.33 v60-62
v2.32 v59-61
v2.31 v58-60
v2.30 v58-60
v2.29 v56-58
v2.28 v55-57
v2.27 v54-56
v2.26 v53-55
v2.25 v53-55
v2.24 v52-54
v2.23 v51-53
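To make the table easier to use, here is a minimal sketch that looks up which chromedriver releases from the table support a given Chrome major version. The dictionary and the function name `drivers_for_chrome` are illustrative, not part of Selenium. The data only covers the rows above (v2.23 to v2.37); later chromedriver releases switched to Chrome's own version numbering, so newer Chrome versions simply return an empty list here.

```python
# Rows copied from the mapping table above:
# chromedriver version -> (oldest, newest) supported Chrome major version
CHROMEDRIVER_SUPPORT = {
    "2.37": (64, 66),
    "2.36": (63, 65),
    "2.35": (62, 64),
    "2.34": (61, 63),
    "2.33": (60, 62),
    "2.32": (59, 61),
    "2.31": (58, 60),
    "2.30": (58, 60),
    "2.29": (56, 58),
    "2.28": (55, 57),
    "2.27": (54, 56),
    "2.26": (53, 55),
    "2.25": (53, 55),
    "2.24": (52, 54),
    "2.23": (51, 53),
}


def drivers_for_chrome(chrome_major):
    """Return chromedriver versions from the table that support this Chrome major version, newest first."""
    matches = [
        version
        for version, (oldest, newest) in CHROMEDRIVER_SUPPORT.items()
        if oldest <= chrome_major <= newest
    ]
    # Sort by the minor version number so the newest driver release comes first
    return sorted(matches, key=lambda v: int(v.split(".")[1]), reverse=True)


print(drivers_for_chrome(65))  # ['2.37', '2.36']
print(drivers_for_chrome(70))  # [] (newer than anything in this table)
```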
Mac/Linux

After downloading and unzipping, move the file into /usr/local/bin and it is ready to use.

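Windows

After downloading and unzipping, move the file into a folder that is on the PATH environment variable, for example your Python installation folder.

Alternatively, you can place the driver file in the same folder as your script.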

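Installing Selenium

Installing Selenium is simple; pip handles it:

pip install selenium

Basic usage

Running Chrome headless

Headless mode is a Chrome feature released in 2017. It requires Chrome 59 or later on Mac/Linux and Chrome 60 or later on Windows.

The code to run Chrome headless with Selenium is as follows: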

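from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Create a launch-options object
chrome_options = Options()
# Run the browser without a UI
chrome_options.add_argument('--headless')
# The docs at the time said this flag would become unnecessary in later versions, but it was still needed then
chrome_options.add_argument('--disable-gpu')
# Fix the window size: different window sizes can give different results when parsing a page
chrome_options.add_argument('--window-size=1366,768')
# Launch the browser (older Selenium 3 releases spelled this keyword chrome_options=)
browser = webdriver.Chrome(options=chrome_options)

Running the code above opens a blank page in a headless Chrome; remove the --headless line to see the browser window.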

A simple Selenium example

This example opens the Baidu home page, types Python into the search box, and clicks the search button.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

# Open a Chrome browser
browser = webdriver.Chrome()
# Request the Baidu home page
browser.get('https://www.baidu.com')
# Locate the search box, waiting up to 10 seconds for it to appear
search_box = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="kw"]'))
)
# Type Python into the search box
search_box.send_keys('Python')
# Locate the search button, waiting until it is clickable
button = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[@id="su"]'))
)
# Click the search button once
button.click()
browser.quit()





# -*- coding: utf-8 -*-
# Binbin's computer
# @Time : 2018/9/6 0006 5:08


# Launch Chrome
from selenium import webdriver
drt = webdriver.Chrome()
drt.get('http://www.baidu.com')

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# Find the search box
search_box = WebDriverWait(drt, 10).until(EC.presence_of_element_located((By.XPATH, '//input[@id="kw"]')))
search_box.send_keys('123')
# Find the "Baidu Search" (百度一下) button
btn = WebDriverWait(drt, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="su"]')))
btn.click()
# Close the browser
# drt.quit()

  




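Selenium Notes (2): Chrome WebDriver launch options

Different WebDrivers in Selenium may offer different methods, and the same operation can produce different results in different drivers. This note covers Chrome().

For other WebDrivers, see the official documentation.

Chrome WebDriver Options

Introduction

This is Chrome's options object. Call its add_argument() method to add launch arguments, then pass the Options object when initializing the WebDriver object, and Chrome starts with those arguments.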

Example

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Create a launch-options object
chrome_options = Options()
# Add a launch argument
chrome_options.add_argument('--window-size=1366,768')
# Pass the options to Chrome; this starts a Chrome window with the given size
browser = webdriver.Chrome(options=chrome_options)

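Common launch arguments

Argument  Effect
--user-agent=""  Set the User-Agent request header
--window-size=1366,768  Set the browser window size
--headless  Run without a UI
--start-maximized  Start maximized
--incognito  Incognito mode
--disable-javascript  Disable JavaScript
--disable-infobars  Hide the "Chrome is being controlled by automated test software" infobar

The full list of launch arguments is available at:

https://peter.sh/experiments/chromium-command-line-switches/

Disabling image loading

Disabling image loading in Chrome takes a little more setup, as shown below: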

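from selenium.webdriver.chrome.options import Options

options = Options()
# 2 = block: do not load images
prefs = {
    'profile.default_content_setting_values': {
        'images': 2
    }
}
options.add_experimental_option('prefs', prefs)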
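Disabling browser pop-ups

Sites often pop up prompts while you browse, such as notification permission requests; the following option blocks notification prompts:

prefs = {
    'profile.default_content_setting_values': {
        # 2 = block notification prompts
        'notifications': 2
    }
}
options.add_experimental_option('prefs', prefs)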
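The two snippets above each pass a dict under the same option name 'prefs'. add_experimental_option() stores options in a dict keyed by name, so calling it twice with 'prefs' keeps only the second value; to block both images and notifications, build one combined dict first. The helper `merge_content_settings` below is hypothetical plain Python and does not need Selenium; the commented line shows where the result would go.

```python
def merge_content_settings(*settings):
    """Combine several content-setting dicts into one Chrome 'prefs' dict.

    Hypothetical helper: each argument maps a content type (e.g. 'images')
    to a value, where 2 means "block" as in the snippets above.
    """
    combined = {}
    for setting in settings:
        combined.update(setting)
    return {'profile.default_content_setting_values': combined}


prefs = merge_content_settings({'images': 2}, {'notifications': 2})
print(prefs)
# With Selenium, pass the combined dict in a single call:
# options.add_experimental_option('prefs', prefs)
```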

Full documentation

class selenium.webdriver.chrome.options.Options

Bases: object

Method
  • __init__()

  • add_argument(argument)

    Adds an argument to the list. Args: Sets the arguments

  • add_encoded_extension(extension)

    Adds Base64 encoded string with extension data to a list that will be used to extract it to the ChromeDriver. Args: extension: Base64 encoded string with extension data

  • add_experimental_option(name, value)

    Adds an experimental option which is passed to chrome. Args: name: The experimental option name. value: The option value.

  • add_extension(extension)

    Adds the path to the extension to a list that will be used to extract it to the ChromeDriver. Args: extension: path to the *.crx file

  • set_headless(headless=True)

    Sets the headless argument. Args: headless: boolean value indicating to set the headless option

  • to_capabilities()

    Creates a capabilities with all the options that have been set and returns a dictionary with everything

Values
  • KEY = 'goog:chromeOptions'

  • arguments

    Returns a list of arguments needed for the browser

  • binary_location

    Returns the location of the binary otherwise an empty string

  • debugger_address

    Returns the address of the remote devtools instance

  • experimental_options

    Returns a dictionary of experimental options for chrome.

  • extensions

    Returns a list of encoded extensions that will be loaded into chrome

  • headless

    Returns whether or not the headless argument is set

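Chrome WebDriver object

Introduction

This class inherits from selenium.webdriver.remote.webdriver.WebDriver, which is covered in the next note; as a subclass, Chrome's WebDriver adds a few methods of its own.

Specifying the location of chromedriver.exe

chromedriver.exe is usually placed in a directory on the PATH, but to make a project easier to deploy or package, we can put chromedriver.exe in the project directory and pass its path when initializing the Chrome WebDriver object.

As shown below: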

from selenium import webdriver
browser = webdriver.Chrome(executable_path='chromedriver.exe')

Full documentation

class selenium.webdriver.chrome.webdriver.WebDriver(executable_path='chromedriver', port=0, options=None,service_args=None, desired_capabilities=None, service_log_path=None, chrome_options=None)

Bases: selenium.webdriver.remote.webdriver.WebDriver

Controls the ChromeDriver and allows you to drive the browser.

You will need to download the ChromeDriver executable from http://chromedriver.storage.googleapis.com/index.html

  • __init__(executable_path='chromedriver', port=0, options=None, service_args=None, desired_capabilities=None,service_log_path=None, chrome_options=None)

    Creates a new instance of the chrome driver.

    Starts the service and then creates new instance of chrome driver.

    Args:

    • executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH

    • port - port you would like the service to run, if left as 0, a free port will be found.

    • desired_capabilities: Dictionary object with non-browser specific capabilities only, such as “proxy” or “loggingPref”.

    • options: this takes an instance of ChromeOptions

  • create_options()

  • get_network_conditions()

    Gets Chrome network emulation settings.

    Returns:A dict. For example:

    {'latency': 4, 'download_throughput': 2, 'upload_throughput': 2, 'offline': False}

  • launch_app(id)

    Launches Chrome app specified by id.

  • quit()

    Closes the browser and shuts down the ChromeDriver executable that is started when starting the ChromeDriver

  • set_network_conditions(**network_conditions)

    Sets Chrome network emulation settings.

    Args:

    • network_conditions: A dict with conditions specification.

    Usage:

    driver.set_network_conditions(
        offline=False,
        latency=5,  # additional latency (ms)
        download_throughput=500 * 1024,  # maximal throughput
        upload_throughput=500 * 1024)  # maximal throughput

Note: ‘throughput’ can be used to set both (for download and upload).

 





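Selenium Notes (3): Remote WebDriver

Introduction

selenium.webdriver.remote.webdriver.WebDriver is the parent class of all the other WebDrivers; Chrome WebDriver and Firefox WebDriver, for example, both inherit from it. It implements the methods common to every WebDriver.

Common operations

  • get(url)

    Visits the given URL in the current browser session.

    Usage:

    driver.get('https://www.baidu.com')

  • close()

    Closes the current browser window.

  • quit()

    Quits the WebDriver and closes all of its windows.

  • refresh()

    Refreshes the current page.

  • title

    Gets the title of the current page.

  • page_source

    Gets the rendered source code of the current page.

  • current_url

    Gets the URL of the current page.

  • window_handles

    Gets the handles of all windows in the current session.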

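Finding elements

The WebDriver object has built-in methods for finding elements, which makes lookups convenient.

Finding a single element

The following methods find a single element:

Method  Purpose
find_element_by_xpath()  Find by XPath
find_element_by_class_name()  Find by class attribute
find_element_by_css_selector()  Find by CSS selector
find_element_by_id()  Find by id
find_element_by_link_text()  Find by link text
find_element_by_name()  Find by name attribute
find_element_by_partial_link_text()  Find by partial link text
find_element_by_tag_name()  Find by tag name

Each returns a WebElement object; if nothing matches, a NoSuchElementException is raised.

Finding multiple elements

The methods above return the first matching element; to get all matching elements, use the find_elements_by_* methods.

Note: adding an s to element in any of the names above gives the corresponding multiple-element method.

These methods return a list of WebElement objects (an empty list if nothing matches).

Finding with the "private" methods

Besides the lookup methods above, two "private" methods, find_element() and find_elements(), are also available:

Example: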

from selenium.webdriver.common.by import By

driver.find_element(By.XPATH, '//button[text()="Some text"]')
driver.find_elements(By.XPATH, '//button')

The By class supplies the locator strategy passed in when finding elements; it has the following attributes:

ID = "id"
XPATH = "xpath"
LINK_TEXT = "link text"
PARTIAL_LINK_TEXT = "partial link text"
NAME = "name"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"
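In Selenium 3, each find_element_by_* helper is a thin wrapper around find_element(by, value); find_element_by_id('kw'), for example, calls find_element(by=By.ID, value='kw'). Later Selenium 4 releases removed the by_* helpers, so the two-argument form is the more portable one. The sketch below is not part of Selenium: `to_locator` is a hypothetical helper that maps a helper name to the equivalent (strategy, value) pair using the By strings listed above, which can help when porting old scripts.

```python
# String values copied from the By class listed above
BY_VALUES = {
    "id": "id",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "name": "name",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def to_locator(method_name, value):
    """Turn e.g. ('find_element_by_id', 'kw') into ('id', 'kw')."""
    for prefix in ("find_elements_by_", "find_element_by_"):
        if method_name.startswith(prefix):
            strategy = method_name[len(prefix):]
            return BY_VALUES[strategy], value
    raise ValueError("not a find_element(s)_by_* name: " + method_name)


print(to_locator("find_element_by_id", "kw"))               # ('id', 'kw')
print(to_locator("find_elements_by_css_selector", "a.nav"))  # ('css selector', 'a.nav')
# With Selenium: driver.find_element(*to_locator("find_element_by_id", "kw"))
```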

Working with cookies

  • add_cookie(cookie_dict)

    Adds a cookie to the current session.

    • cookie_dict: a dict object that must have "name" and "value" keys; the optional keys are "path", "domain", "secure", and "expiry".

    • Usage:

      driver.add_cookie({'name' : 'foo', 'value' : 'bar'})
      driver.add_cookie({'name' : 'foo', 'value' : 'bar', 'path' : '/'})
      driver.add_cookie({'name' : 'foo', 'value' : 'bar', 'path' : '/', 'secure':True})

  • get_cookie(name)

    Gets a single cookie by name; returns None if it does not exist.

  • get_cookies()

    Gets all cookies, returned as a list of dicts.

  • delete_all_cookies()

    Deletes all cookies.

  • delete_cookie(name)

    Deletes the cookie with the given name.
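A common scraping pattern is to log in with Selenium and then reuse the session cookies elsewhere, for example as the cookies= argument of an HTTP client such as requests. Because get_cookies() returns a list of dicts that each carry "name" and "value", flattening them is plain Python. The sample data and both helper names below are made up for illustration; `check_cookie` only verifies the two required keys listed above before a dict is handed to add_cookie().

```python
REQUIRED_KEYS = {"name", "value"}


def check_cookie(cookie):
    """Raise ValueError if the dict lacks the keys add_cookie() requires ("name" and "value")."""
    missing = REQUIRED_KEYS - set(cookie)
    if missing:
        raise ValueError("cookie is missing keys: %s" % sorted(missing))
    return cookie


def cookies_to_dict(cookie_list):
    """Flatten the list returned by driver.get_cookies() into {name: value}."""
    return {c["name"]: c["value"] for c in cookie_list}


# Made-up data shaped like get_cookies() output
sample = [
    {"name": "foo", "value": "bar", "path": "/"},
    {"name": "session", "value": "abc123", "domain": "example.com", "secure": True},
]
print(cookies_to_dict(sample))  # {'foo': 'bar', 'session': 'abc123'}

# With Selenium: for c in saved_cookies: driver.add_cookie(check_cookie(c))
```

Note that add_cookie() only accepts cookies for the domain of the page currently loaded, so call driver.get() on the site before adding them.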

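Screenshots

  • get_screenshot_as_base64()

    Takes a screenshot of the current window and returns it as a base64-encoded string.

  • get_screenshot_as_file(filename)

    Saves a screenshot of the current window as a PNG image. filename is the path to save to and should end in .png. Returns False if an IOError occurs (True otherwise).

    Usage:

    driver.get_screenshot_as_file('/Screenshots/foo.png')

  • get_screenshot_as_png()

    Takes a screenshot of the current window and returns it as PNG binary data.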
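get_screenshot_as_base64() and get_screenshot_as_png() return the same image in two encodings: decoding the base64 string yields the PNG bytes. The sketch below shows that relationship with the standard library only. `fake_png` is a stand-in (the 8-byte PNG signature plus filler), not a real screenshot, and `save_base64_png` is a hypothetical helper, not a Selenium method.

```python
import base64
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes


def save_base64_png(b64_string, filename):
    """Decode a base64 screenshot string and write it to a .png file; return the byte count."""
    data = base64.b64decode(b64_string)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    with open(filename, "wb") as f:
        f.write(data)
    return len(data)


# Stand-in for driver.get_screenshot_as_base64(): signature plus filler, not a full image
fake_png = PNG_SIGNATURE + b"not a real image body"
b64 = base64.b64encode(fake_png).decode("ascii")

path = os.path.join(tempfile.gettempdir(), "demo_screenshot.png")
written = save_base64_png(b64, path)
print(written)  # 29
```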

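Window information

  • get_window_position(windowHandle='current')

    Gets the x, y coordinates of the current window.

  • get_window_rect()

    Gets the x, y coordinates as well as the height and width of the current window.

  • get_window_size(windowHandle='current')

    Gets the height and width of the current window.

Switching

  • switch_to_frame(frame_reference)

    Switches focus to the specified frame (deprecated; use driver.switch_to.frame).

  • switch_to_window(window_name)

    Switches to the specified window (deprecated; use driver.switch_to.window).

Executing JavaScript

  • execute_async_script(script, *args)

    Executes JavaScript asynchronously in the current window/frame.

    script: the JavaScript code to execute.

    *args: arguments passed to your JavaScript code; inside the script they are available as arguments[0], arguments[1], and so on.

    Usage:

    script = "var callback = arguments[arguments.length - 1]; "
    script2 = "window.setTimeout(function(){ callback('timeout') }, 3000);"
    driver.execute_async_script(script + script2)

  • execute_script(script, *args)

    Executes JavaScript synchronously in the current window/frame.

    script: the JavaScript code to execute.

    *args: arguments passed to your JavaScript code; inside the script they are available as arguments[0], arguments[1], and so on.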

Full documentation

class selenium.webdriver.remote.webdriver.WebDriver(command_executor='http://127.0.0.1:4444/wd/hub', desired_capabilities=None, browser_profile=None, proxy=None, keep_alive=False, file_detector=None, options=None)

Bases: object

Controls a browser by sending commands to a remote server. This server is expected to be running the WebDriver wire protocol as defined at

https://github.com/SeleniumHQ/selenium/wiki/JsonWireProtocol

  • Attributes:

    • session_id - String ID of the browser session started and controlled by this WebDriver.

    • capabilities - Dictionary of effective capabilities of this browser session as returned by the remote server. See https://github.com/SeleniumHQ/selenium/wiki/DesiredCapabilities

    • command_executor - remote_connection.RemoteConnection object used to execute commands.

    • error_handler - errorhandler.ErrorHandler object used to handle errors.

  • __init__(command_executor='http://127.0.0.1:4444/wd/hub', desired_capabilities=None, browser_profile=None, proxy=None,keep_alive=False, file_detector=None, options=None)

    Create a new driver that will issue commands using the wire protocol.

    Args:

    • command_executor - Either a string representing URL of the remote server or a custom remote_connection.RemoteConnection object. Defaults to 'http://127.0.0.1:4444/wd/hub'.

    • desired_capabilities - A dictionary of capabilities to request when starting the browser session. Required parameter.

    • browser_profile - A selenium.webdriver.firefox.firefox_profile.FirefoxProfile object. Only used if Firefox is requested. Optional.

    • proxy - A selenium.webdriver.common.proxy.Proxy object. The browser session will be started with given proxy settings, if possible. Optional.

    • keep_alive - Whether to configure remote_connection.RemoteConnection to use HTTP keep-alive. Defaults to False.

    • file_detector - Pass custom file detector object during instantiation. If None, then default LocalFileDetector() will be used.

    • options - instance of a driver options.Options class

  • add_cookie(cookie_dict)

    Adds a cookie to your current session.

    Args:

    • cookie_dict: A dictionary object, with required keys - "name" and "value"; optional keys - "path", "domain", "secure", "expiry"

    Usage:

    driver.add_cookie({'name' : 'foo', 'value' : 'bar'})
    driver.add_cookie({'name' : 'foo', 'value' : 'bar', 'path' : '/'})
    driver.add_cookie({'name' : 'foo', 'value' : 'bar', 'path' : '/', 'secure':True})
    
  • back()

    Goes one step backward in the browser history.

    Usage:

    driver.back()

  • close()

    Closes the current window.

    Usage:

    driver.close()

  • create_web_element(element_id)

    Creates a web element with the specified element_id.

  • delete_all_cookies()

    Delete all cookies in the scope of the session.

    Usage:

    driver.delete_all_cookies()

  • delete_cookie(name)

    Deletes a single cookie with the given name.

    Usage:

    driver.delete_cookie('my_cookie')

  • execute(driver_command, params=None)

    Sends a command to be executed by a command.CommandExecutor.

    Args:

    • driver_command: The name of the command to execute as a string.

    • params: A dictionary of named parameters to send with the command.

    Returns:

    The command’s JSON response loaded into a dictionary object.

  • execute_async_script(script, *args)

    Asynchronously Executes JavaScript in the current window/frame.

    Args:

    • script: The JavaScript to execute.

    • *args: Any applicable arguments for your JavaScript.

    Usage:

    script = "var callback = arguments[arguments.length - 1]; " "window.setTimeout(function(){ callback('timeout') }, 3000);"
    driver.execute_async_script(script)
    
  • execute_script(script, *args)

    Synchronously Executes JavaScript in the current window/frame.

    Args:

    • script: The JavaScript to execute.

    • *args: Any applicable arguments for your JavaScript.

    Usage:

    driver.execute_script('return document.title;')
    
  • file_detector_context(*args, **kwds)

    Overrides the current file detector (if necessary) in limited context. Ensures the original file detector is set afterwards.

    Example:

    with webdriver.file_detector_context(UselessFileDetector):
        someinput.send_keys('/etc/hosts')
    

    Args:

    • file_detector_class - Class of the desired file detector. If the class is different from the current file_detector, then the class is instantiated with args and kwargs and used as a file detector during the duration of the context manager.

    • args - Optional arguments that get passed to the file detector class during instantiation.

    • kwargs - Keyword arguments, passed the same way as args.

  • find_element(by='id', value=None)

    ‘Private’ method used by the find_element_by_* methods.

    Usage:

    Use the corresponding find_element_by_* instead of this.

    Return type:

    WebElement

  • forward()

    Goes one step forward in the browser history.

    Usage:

    driver.forward()

  • fullscreen_window()

    Invokes the window manager-specific ‘full screen’ operation

  • get(url)

    Loads a web page in the current browser session.

  • get_cookie(name)

    Get a single cookie by name. Returns the cookie if found, None if not.

    Usage:

    driver.get_cookie('my_cookie')

  • get_cookies()

    Returns a set of dictionaries, corresponding to cookies visible in the current session.

    Usage:

    driver.get_cookies()

  • get_log(log_type)

    Gets the log for a given log type

    Args:

    • log_type: type of log that which will be returned

    Usage:

    driver.get_log('browser')
    driver.get_log('driver')
    driver.get_log('client')
    driver.get_log('server')

  • get_screenshot_as_base64()

    Gets the screenshot of the current window as a base64 encoded string which is useful in embedded images in HTML.

    Usage:

    driver.get_screenshot_as_base64()

  • get_screenshot_as_file(filename)

    Saves a screenshot of the current window to a PNG image file. Returns False if there is any IOError, else returns True. Use full paths in your filename.

    Args:

    • filename: The full path you wish to save your screenshot to. This should end with a .png extension.

    Usage:

    driver.get_screenshot_as_file('/Screenshots/foo.png')

  • get_screenshot_as_png()

    Gets the screenshot of the current window as a binary data.

    Usage:

    driver.get_screenshot_as_png()

  • get_window_position(windowHandle='current')

    Gets the x,y position of the current window.

    Usage:

    driver.get_window_position()

  • get_window_rect()

    Gets the x, y coordinates of the window as well as height and width of the current window.

    Usage:

    driver.get_window_rect()

  • get_window_size(windowHandle='current')

    Gets the width and height of the current window.

    Usage:

    driver.get_window_size()

  • implicitly_wait(time_to_wait)

    Sets a sticky timeout to implicitly wait for an element to be found, or a command to complete. This method only needs to be called one time per session. To set the timeout for calls to execute_async_script, see set_script_timeout.

    Args:

    • time_to_wait: Amount of time to wait (in seconds)

    Usage:

    driver.implicitly_wait(30)

  • maximize_window()

    Maximizes the current window that webdriver is using

  • minimize_window()

    Invokes the window manager-specific ‘minimize’ operation

  • quit()

    Quits the driver and closes every associated window.

    Usage:

    driver.quit()

  • refresh()

    Refreshes the current page.

    Usage:

    driver.refresh()

  • save_screenshot(filename)

    Saves a screenshot of the current window to a PNG image file. Returns False if there is any IOError, else returns True. Use full paths in your filename.

    Args:

    • filename: The full path you wish to save your screenshot to. This should end with a .png extension.

    Usage:

    driver.save_screenshot('/Screenshots/foo.png')

  • set_page_load_timeout(time_to_wait)

    Set the amount of time to wait for a page load to complete before throwing an error.

    Args:

    • time_to_wait: The amount of time to wait

    Usage:

    driver.set_page_load_timeout(30)

  • set_script_timeout(time_to_wait)

    Set the amount of time that the script should wait during an execute_async_script call before throwing an error.

    Args:

    • time_to_wait: The amount of time to wait (in seconds)

    Usage:

    driver.set_script_timeout(30)

  • set_window_position(x, y, windowHandle='current')

    Sets the x,y position of the current window. (window.moveTo)

    Args:

    • x: the x-coordinate in pixels to set the window position

    • y: the y-coordinate in pixels to set the window position

    Usage:

    driver.set_window_position(0,0)

  • set_window_rect(x=None, y=None, width=None, height=None)

    Sets the x, y coordinates of the window as well as height and width of the current window.

    Usage:

    driver.set_window_rect(x=10, y=10)
    driver.set_window_rect(width=100, height=200)
    driver.set_window_rect(x=10, y=10, width=100, height=200)

  • set_window_size(width, height, windowHandle='current')

    Sets the width and height of the current window. (window.resizeTo)

    Args:

    • width: the width in pixels to set the window to

    • height: the height in pixels to set the window to

    Usage:

    driver.set_window_size(800,600)

  • start_client()

    Called before starting a new session. This method may be overridden to define custom startup behavior.

  • start_session(capabilities, browser_profile=None)

    Creates a new session with the desired capabilities.

    Args:

    • browser_name - The name of the browser to request.

    • version - Which browser version to request.

    • platform - Which platform to request the browser on.

    • javascript_enabled - Whether the new session should support JavaScript.

    • browser_profile - A selenium.webdriver.firefox.firefox_profile.FirefoxProfile object. Only used if Firefox is requested.

  • stop_client()

    Called after executing a quit command. This method may be overridden to define custom shutdown behavior.

  • switch_to_active_element()

    Deprecated use driver.switch_to.active_element

  • switch_to_alert()

    Deprecated use driver.switch_to.alert

  • switch_to_default_content()

    Deprecated use driver.switch_to.default_content

  • switch_to_frame(frame_reference)

    Deprecated use driver.switch_to.frame

  • switch_to_window(window_name)

    Deprecated use driver.switch_to.window

  • application_cache

    Returns a ApplicationCache Object to interact with the browser app cache

  • current_url

    Gets the URL of the current page.

    Usage:

    driver.current_url

  • current_window_handle

    Returns the handle of the current window.

    Usage:

    driver.current_window_handle

  • desired_capabilities

    returns the drivers current desired capabilities being used

  • file_detector

  • log_types

    Gets a list of the available log types

    Usage:

    driver.log_types

  • mobile

  • name

    Returns the name of the underlying browser for this instance.

    Usage:

    name = driver.name

  • orientation

    Gets the current orientation of the device

    Usage:

    orientation = driver.orientation

  • page_source

    Gets the source of the current page.

    Usage:

    driver.page_source

  • switch_to

    Returns:

    • SwitchTo: an object containing all options to switch focus into

    Usage:

    element = driver.switch_to.active_element
    alert = driver.switch_to.alert
    driver.switch_to.default_content()
    driver.switch_to.frame('frame_name')
    driver.switch_to.frame(1)
    driver.switch_to.frame(driver.find_elements_by_tag_name("iframe")[0])
    driver.switch_to.parent_frame()
    driver.switch_to.window('main')

  • title

    Returns the title of the current page.

    Usage:

    title = driver.title

  • window_handles

    Returns the handles of all windows within the current session.

    Usage:

    driver.window_handles








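Selenium Notes (4): WebElement



A WebElement is a page element returned by the find methods. It provides a range of methods for interacting with the element, such as clicking it or clearing it.



Methods

  1. clear() Clear

    If the element contains text (for example, an input box), clears it.

  2. click() Click

    Clicks the element.

  3. get_attribute(name) Get an attribute

    Gets the element's attribute/property.

    It first tries to return the property with the given name; if no such property exists, it returns the attribute with that name; if neither exists, it returns None.

  4. screenshot(filename) Take a screenshot

    Saves a screenshot of the element as a PNG file; an absolute path is recommended. (At the time of writing, this worked in Firefox but not in Chrome.)

  5. send_keys(value) Simulate typing

    Simulates typing into the element.

    The author suspected this WebElement method had a bug in Chrome, although the Baidu example in note (1) does call it on a Chrome element.

  6. submit() Submit a form

    Submits the form.



Page elements also provide find_element(s)_by_* methods, which limit the search to the current element's descendants.



Properties

  1. text

    Gets the element's text content.

  2. tag_name

    Gets the element's tag name.

  3. size

    Gets the element's size.

  4. screenshot_as_png

    Takes a screenshot of the element and returns it as PNG binary data.

  5. screenshot_as_base64

    Takes a screenshot of the element and returns it as a base64-encoded string.

  6. rect

    Gets a dict containing the element's size and position.

  7. parent

    The WebDriver instance this element was found from (not the element's parent node in the DOM).

  8. location

    The element's position.

  9. id

    The element's internal id, used mainly by Selenium itself; it can be used to check whether two element references point to the same element.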

 

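Keys



We often need to simulate keyboard input. For ordinary text, just pass the string to send_keys().



Sometimes, though, we need special keys, and that is what the Keys class is for.



Quick example

from selenium.webdriver.common.keys import Keys

# elem is a WebElement found earlier; this presses Ctrl+C on it
elem.send_keys(Keys.CONTROL, 'c')


Attributes



The Keys class has many attributes, one per key. The full list is: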

ADD = u'\ue025'
ALT = u'\ue00a'
ARROW_DOWN = u'\ue015'
ARROW_LEFT = u'\ue012'
ARROW_RIGHT = u'\ue014'
ARROW_UP = u'\ue013'
BACKSPACE = u'\ue003'
BACK_SPACE = u'\ue003'
CANCEL = u'\ue001'
CLEAR = u'\ue005'
COMMAND = u'\ue03d'
CONTROL = u'\ue009'
DECIMAL = u'\ue028'
DELETE = u'\ue017'
DIVIDE = u'\ue029'
DOWN = u'\ue015'
END = u'\ue010'
ENTER = u'\ue007'
EQUALS = u'\ue019'
ESCAPE = u'\ue00c'
F1 = u'\ue031'
F10 = u'\ue03a'
F11 = u'\ue03b'
F12 = u'\ue03c'
F2 = u'\ue032'
F3 = u'\ue033'
F4 = u'\ue034'
F5 = u'\ue035'
F6 = u'\ue036'
F7 = u'\ue037'
F8 = u'\ue038'
F9 = u'\ue039'
HELP = u'\ue002'
HOME = u'\ue011'
INSERT = u'\ue016'
LEFT = u'\ue012'
LEFT_ALT = u'\ue00a'
LEFT_CONTROL = u'\ue009'
LEFT_SHIFT = u'\ue008'
META = u'\ue03d'
MULTIPLY = u'\ue024'
NULL = u'\ue000'
NUMPAD0 = u'\ue01a'
NUMPAD1 = u'\ue01b'
NUMPAD2 = u'\ue01c'
NUMPAD3 = u'\ue01d'
NUMPAD4 = u'\ue01e'
NUMPAD5 = u'\ue01f'
NUMPAD6 = u'\ue020'
NUMPAD7 = u'\ue021'
NUMPAD8 = u'\ue022'
NUMPAD9 = u'\ue023'
PAGE_DOWN = u'\ue00f'
PAGE_UP = u'\ue00e'
PAUSE = u'\ue00b'
RETURN = u'\ue006'
RIGHT = u'\ue014'
SEMICOLON = u'\ue018'
SEPARATOR = u'\ue026'
SHIFT = u'\ue008'
SPACE = u'\ue00d'
SUBTRACT = u'\ue027'
TAB = u'\ue004'
UP = u'\ue013'
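The values above are single characters in Unicode's Private Use Area, which is why send_keys() can mix them freely with ordinary text: elem.send_keys('Python', Keys.ENTER) essentially transmits the characters of 'Python' followed by U+E007. The plain-Python sketch below copies two values from the table rather than importing Selenium, to show what such a key sequence looks like.

```python
import unicodedata

# Values copied from the Keys table above
ENTER = u"\ue007"
CONTROL = u"\ue009"

typed = "Python" + ENTER  # the text behind send_keys('Python', Keys.ENTER)
print(len(typed))                    # 7: six letters plus one key code
print(hex(ord(typed[-1])))           # 0xe007
print(unicodedata.category(ENTER))   # Co (private use character)
```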








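Selenium Notes (5): Action Chains



Introduction



Usually we interact with a page through WebElement methods such as click(). Sometimes, however, we need more complex actions such as dragging, double-clicking, or long-pressing.



That is what Action Chains are for.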

 

简例

from selenium.webdriver import ActionChains

element = driver.find_element_by_name("source")
target = driver.find_element_by_name("target")

actions = ActionChains(driver)
actions.drag_and_drop(element, target)
actions.perform()
 

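After importing ActionChains, create an action chain object by passing the WebDriver to its constructor, and assign it to a variable, here actions.



Then call the various action methods on actions.



Note: the action methods do not run immediately when called. They are stored, in the order your code calls them, in the ActionChains object's queue, and run one after another only when you call perform().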
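The queue-then-perform behavior described in the note is easy to model. The toy class below is not Selenium's implementation: `ToyActionChains` is a hypothetical stand-in that records callables and runs them in order on perform(). Each method returns self, which is what makes the chained style ActionChains(driver).move_to_element(a).click(b).perform() possible.

```python
class ToyActionChains:
    """Hypothetical stand-in for ActionChains: queue actions, run them on perform()."""

    def __init__(self, log):
        self._log = log    # receives one line per executed action
        self._queue = []   # actions waiting for perform()

    def _add(self, description):
        # Store a callable instead of running it now
        self._queue.append(lambda: self._log.append(description))
        return self  # returning self is what allows a.b().c().perform() chaining

    def move_to_element(self, name):
        return self._add("move to " + name)

    def click(self, name=None):
        return self._add("click " + (name or "current position"))

    def reset_actions(self):
        self._queue.clear()
        return self

    def perform(self):
        # Run every queued action in the order it was added
        for action in self._queue:
            action()


log = []
chain = ToyActionChains(log).move_to_element("menu").click("submenu")
print(log)  # [] (nothing has run yet)
chain.perform()
print(log)  # ['move to menu', 'click submenu']
```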

 

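Common action methods

  • click(on_element=None)

    Left-clicks the given element; if no element is passed, clicks at the current mouse position.

  • context_click(on_element=None)

    Right-clicks.

  • double_click(on_element=None)

    Double-clicks.

  • click_and_hold(on_element=None)

    Presses and holds the left mouse button.

  • drag_and_drop(source, target)

    Presses and holds on the source element, moves to the target element, and releases.

  • drag_and_drop_by_offset(source, xoffset, yoffset)

    Presses and holds on the source element, moves to the point offset by xoffset and yoffset relative to it, and releases.

  • send_keys(*keys_to_send)

    Sends keys to the currently focused element.

  • send_keys_to_element(element, *keys_to_send)

    Sends keys to the specified element.

  • reset_actions()

    Clears the stored actions.



Full documentation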

 

class selenium.webdriver.common.action_chains.ActionChains(driver)

 

Bases: object

 

ActionChains are a way to automate low level interactions such as mouse movements, mouse button actions, key press, and context menu interactions. This is useful for doing more complex actions like hover over and drag and drop.

 

Generate user actions.

 

When you call methods for actions on the ActionChains object, the actions are stored in a queue in the ActionChains object. When you call perform(), the events are fired in the order they are queued up.

 

ActionChains can be used in a chain pattern:

menu = driver.find_element_by_css_selector(".nav")
hidden_submenu = driver.find_element_by_css_selector(".nav #submenu1")

ActionChains(driver).move_to_element(menu).click(hidden_submenu).perform()
 

Or actions can be queued up one by one, then performed.:

menu = driver.find_element_by_css_selector(".nav")
hidden_submenu = driver.find_element_by_css_selector(".nav #submenu1")

actions = ActionChains(driver)
actions.move_to_element(menu)
actions.click(hidden_submenu)
actions.perform()
 

Either way, the actions are performed in the order they are called, one after another.

  • __init__(driver)

    Creates a new ActionChains.

    Args:

    • driver: The WebDriver instance which performs user actions.

  • click(on_element=None)

    Clicks an element.

    Args:

    • on_element: The element to click. If None, clicks on current mouse position.

  • click_and_hold(on_element=None)

    Holds down the left mouse button on an element.

    Args:

    • on_element: The element to mouse down. If None, clicks on current mouse position.

  • context_click(on_element=None)

    Performs a context-click (right click) on an element.

    Args:

    • on_element: The element to context-click. If None, clicks on current mouse position.

  • double_click(on_element=None)

    Double-clicks an element.

    Args:

    • on_element: The element to double-click. If None, clicks on current mouse position.

  • drag_and_drop(source, target)

    Holds down the left mouse button on the source element, then moves to the target element and releases the mouse button.

    Args:

    • source: The element to mouse down.

    • target: The element to mouse up.

  • drag_and_drop_by_offset(source, xoffset, yoffset)

    Holds down the left mouse button on the source element, then moves to the target offset and releases the mouse button.

    Args:

    • source: The element to mouse down.

    • xoffset: X offset to move to.

    • yoffset: Y offset to move to.

  • key_down(value, element=None)

    Sends a key press only, without releasing it. Should only be used with modifier keys (Control, Alt and Shift).

    Args:

    • value: The modifier key to send. Values are defined in Keys class.

    • element: The element to send keys. If None, sends a key to current focused element.

    Example, pressing ctrl+c:

    ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform() 
    
  • key_up(value, element=None)

    Releases a modifier key.

    Args:

    • value: The modifier key to send. Values are defined in Keys class.

    • element: The element to send keys. If None, sends a key to current focused element.

    Example, pressing ctrl+c:

    ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
    
  • move_by_offset(xoffset, yoffset)

    Moving the mouse to an offset from current mouse position.

    Args:

    • xoffset: X offset to move to, as a positive or negative integer.

    • yoffset: Y offset to move to, as a positive or negative integer.

  • move_to_element(to_element)

    Moving the mouse to the middle of an element.

    Args:

    • to_element: The WebElement to move to.

  • move_to_element_with_offset(to_element, xoffset, yoffset)

    Move the mouse by an offset of the specified element. Offsets are relative to the top-left corner of the element.

    Args:

    • to_element: The WebElement to move to.

    • xoffset: X offset to move to.

    • yoffset: Y offset to move to.

  • pause(seconds)

    Pause all inputs for the specified duration in seconds

  • perform()

    Performs all stored actions.

  • release(on_element=None)

    Releasing a held mouse button on an element.

    Args:

    • on_element: The element to mouse up. If None, releases on current mouse position.

  • reset_actions()

    Clears actions that are already stored on the remote end.

  • send_keys(*keys_to_send)

    Sends keys to current focused element.

    Args:

    • keys_to_send: The keys to send. Modifier keys constants can be found in the ‘Keys’ class.

  • send_keys_to_element(element, *keys_to_send)

    Sends keys to an element.

    Args:

    • element: The element to send keys.

    • keys_to_send: The keys to send. Modifier keys constants can be found in the ‘Keys’ class.












Selenium笔记(6)等待

 

简介

 

在selenium操作浏览器的过程中,每一次请求url,selenium都会等待页面加载完毕以后,才会将操作权限再次交给我们的程序。

 

但是,由于ajax和各种JS代码的异步加载问题,所以我们在使用selenium的时候常常会遇到操作的元素还没有加载出来,就会引发报错。为了解决这个问题,Selenium提供了几种等待的方法,让我们可以等待元素加载完毕后,再进行操作。

 

显式等待

 

例子

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC driver = webdriver.Chrome() driver.get("http://somedomain/url_that_delays_loading") try: element = WebDriverWait(driver, 10).until( EC.presence_of_element_located((By.ID, "myDynamicElement")) ) finally: driver.quit() 
 

In this example, we no longer look up the element with find_element_by_*; instead we use WebDriverWait.

 

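The code in the try block means: wait at most 10 seconds before giving up with a timeout exception. During those 10 seconds, WebDriverWait by default runs the condition passed to until() every 500 ms. EC.presence_of_element_located checks whether the element has been loaded, and the element is located with a locator such as By.ID.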

 

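In other words, for up to 10 seconds the element's presence is checked every 0.5 seconds by default; as soon as it exists, it is assigned to the variable element. If it still does not exist after 10 seconds, a TimeoutException is raised.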
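To make the polling behaviour concrete, here is a simplified re-implementation of the loop that WebDriverWait.until() runs. It is a sketch, not Selenium's source: poll_until is a hypothetical name, the condition takes no arguments (Selenium passes it the driver), and it raises Python's built-in TimeoutError where Selenium raises selenium.common.exceptions.TimeoutException.

```python
import time


def poll_until(condition, timeout=10, poll_frequency=0.5, ignored_exceptions=()):
    """Call condition() until it returns a truthy value (sketch of WebDriverWait.until).

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    Exceptions listed in ignored_exceptions are treated as "not ready yet".
    """
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored_exceptions:
            pass  # e.g. the element is not in the DOM yet: keep polling
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: a fake condition that only succeeds on its third check
checks = {"count": 0}


def element_loaded():
    checks["count"] += 1
    return "element" if checks["count"] >= 3 else False


print(poll_until(element_loaded, timeout=2, poll_frequency=0.01))  # element
```

The real class exposes the same knobs as constructor arguments: WebDriverWait(driver, timeout, poll_frequency=0.5, ignored_exceptions=None).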

 

Expected Conditions

 

The expected_conditions module (usually imported as EC) provides many common conditions that we can use:

  • title_is

  • title_contains

  • presence_of_element_located

  • visibility_of_element_located

  • visibility_of

  • presence_of_all_elements_located

  • text_to_be_present_in_element

  • text_to_be_present_in_element_value

  • frame_to_be_available_and_switch_to_it

  • invisibility_of_element_located

  • element_to_be_clickable

  • staleness_of

  • element_to_be_selected

  • element_located_to_be_selected

  • element_selection_state_to_be

  • element_located_selection_state_to_be

  • alert_is_present

 

Example:

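from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # driver: a WebDriver that is already running
# Wait until the element can be clicked
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))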
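Besides the built-in conditions, until() accepts any callable that takes the driver and returns a truthy value when the wait should stop. The class below is a hypothetical custom condition (text_is_not_empty and the "price" id are invented for illustration) that waits until an element's text is non-empty.

```python
class text_is_not_empty:
    """Hypothetical custom wait condition: the element's text is non-empty.

    WebDriverWait.until() calls it repeatedly with the driver. It returns the
    stripped text once there is some, or False so that the wait continues.
    """

    def __init__(self, locator):
        self.locator = locator  # a (by, value) tuple, e.g. (By.ID, "price")

    def __call__(self, driver):
        element = driver.find_element(*self.locator)
        text = element.text.strip()
        return text if text else False


# Usage with a running driver (By and WebDriverWait imported as above):
# price = WebDriverWait(driver, 10).until(text_is_not_empty((By.ID, "price")))
```

If find_element raises NoSuchElementException inside the condition, WebDriverWait ignores it by default and simply checks again on the next poll, so the condition does not need its own try/except.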
 

Implicit waits

 

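An implicit wait means that when WebDriver performs a lookup such as find_element and cannot find the element, it keeps polling for up to a set amount of time before giving up.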

 

This value defaults to 0 and can be set as follows:

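from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)  # in seconds
driver.get("http://somedomain/url_that_delays_loading")
# find_element_by_id() was deprecated in Selenium 4 and later removed
myDynamicElement = driver.find_element(By.ID, "myDynamicElement")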
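Selenium's own documentation advises against mixing implicit and explicit waits, because the two timeouts can combine into unpredictable wait times. One way to respect that is to switch the implicit wait off around explicit waits. The context manager below is a hypothetical helper (not a Selenium API); it assumes you know the value your script normally passes to implicitly_wait(), since it does not read the current setting back from the driver.

```python
from contextlib import contextmanager


@contextmanager
def implicit_wait_disabled(driver, restore_seconds):
    """Temporarily set the implicit wait to 0 while running explicit waits.

    Hypothetical helper, not a Selenium API. `restore_seconds` must be the
    value your script normally passes to implicitly_wait().
    """
    driver.implicitly_wait(0)
    try:
        yield driver
    finally:
        driver.implicitly_wait(restore_seconds)  # restore even on errors


# Usage with a running driver:
# with implicit_wait_disabled(driver, restore_seconds=10):
#     WebDriverWait(driver, 5).until(
#         EC.presence_of_element_located((By.ID, "myDynamicElement")))
```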






Selenium Notes (7): Exceptions

 

Full documentation

 

Exceptions that may happen in all the webdriver code.

    • exception selenium.common.exceptions.ElementClickInterceptedException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      The Element Click command could not be completed because the element receiving the events is obscuring the element that was requested to be clicked.

    • exception selenium.common.exceptions.ElementNotInteractableException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.InvalidElementStateException
      Thrown when an element is present in the DOM but interactions with that element will hit another element due to paint order.

    • exception selenium.common.exceptions.ElementNotSelectableException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.InvalidElementStateException
      Thrown when trying to select an unselectable element. For example, selecting a ‘script’ element.

    • exception selenium.common.exceptions.ElementNotVisibleException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.InvalidElementStateException
      Thrown when an element is present on the DOM, but it is not visible, and so is not able to be interacted with. Most commonly encountered when trying to click or read text of an element that is hidden from view.

    • exception selenium.common.exceptions.ErrorInResponseException(response, msg)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when an error has occurred on the server side. This may happen when communicating with the Firefox extension or the remote driver server.
      __init__(response, msg)

    • exception selenium.common.exceptions.ImeActivationFailedException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when activating an IME engine has failed.

    • exception selenium.common.exceptions.ImeNotAvailableException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when IME support is not available. This exception is thrown for every IME-related method call if IME support is not available on the machine.

    • exception selenium.common.exceptions.InsecureCertificateException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Navigation caused the user agent to hit a certificate warning, which is usually the result of an expired or invalid TLS certificate.

    • exception selenium.common.exceptions.InvalidArgumentException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      The arguments passed to a command are either invalid or malformed.

    • exception selenium.common.exceptions.InvalidCookieDomainException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when attempting to add a cookie under a different domain than the current URL.

    • exception selenium.common.exceptions.InvalidCoordinatesException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      The coordinates provided to an interactions operation are invalid.

    • exception selenium.common.exceptions.InvalidElementStateException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException

    • exception selenium.common.exceptions.InvalidSelectorException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.NoSuchElementException
      Thrown when the selector which is used to find an element does not return a WebElement. Currently this only happens when the selector is an XPath expression and it is either syntactically invalid (i.e. it is not an XPath expression) or the expression does not select WebElements (e.g. “count(//input)”).

    • exception selenium.common.exceptions.InvalidSessionIdException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Occurs if the given session id is not in the list of active sessions, meaning the session either does not exist or that it’s not active.

    • exception selenium.common.exceptions.InvalidSwitchToTargetException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when the frame or window target to be switched to doesn’t exist.

    • exception selenium.common.exceptions.JavascriptException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      An error occurred while executing JavaScript supplied by the user.

    • exception selenium.common.exceptions.MoveTargetOutOfBoundsException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when the target provided to the ActionChains move() method is invalid, i.e. out of document.

    • exception selenium.common.exceptions.NoAlertPresentException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when switching to no presented alert. This can be caused by calling an operation on the Alert() class when an alert is not yet on the screen.

    • exception selenium.common.exceptions.NoSuchAttributeException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when the attribute of an element could not be found. You may want to check if the attribute exists in the particular browser you are testing against. Some browsers may have different property names for the same property (IE8’s .innerText vs. Firefox .textContent).

    • exception selenium.common.exceptions.NoSuchCookieException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      No cookie matching the given path name was found amongst the associated cookies of the current browsing context’s active document.

    • exception selenium.common.exceptions.NoSuchElementException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when an element could not be found. If you encounter this exception, you may want to check the following: check your selector used in your find_by… The element may not yet be on the screen at the time of the find operation (the webpage is still loading); see selenium.webdriver.support.wait.WebDriverWait() for how to write a wait wrapper to wait for an element to appear.

    • exception selenium.common.exceptions.NoSuchFrameException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.InvalidSwitchToTargetException
      Thrown when the frame target to be switched to doesn’t exist.

    • exception selenium.common.exceptions.NoSuchWindowException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.InvalidSwitchToTargetException
      Thrown when the window target to be switched to doesn’t exist. To find the current set of active window handles, you can get a list of them in the following way: print(driver.window_handles)

    • exception selenium.common.exceptions.RemoteDriverServerException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException

    • exception selenium.common.exceptions.ScreenshotException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      A screen capture was made impossible.

    • exception selenium.common.exceptions.SessionNotCreatedException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      A new session could not be created.

    • exception selenium.common.exceptions.StaleElementReferenceException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when a reference to an element is now “stale”. Stale means the element no longer appears on the DOM of the page. Possible causes of StaleElementReferenceException include, but are not limited to: you are no longer on the same page, or the page may have refreshed since the element was located; the element may have been removed and re-added to the screen since it was located, such as an element being relocated (this typically happens with a JavaScript framework when values are updated and the node is rebuilt); the element may have been inside an iframe or another context which was refreshed.

    • exception selenium.common.exceptions.TimeoutException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when a command does not complete in enough time.

    • exception selenium.common.exceptions.UnableToSetCookieException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when a driver fails to set a cookie.

    • exception selenium.common.exceptions.UnexpectedAlertPresentException(msg=None, screen=None, stacktrace=None, alert_text=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when an unexpected alert has appeared. Usually raised when an unexpected modal is blocking WebDriver from executing any more commands.
      __init__(msg=None, screen=None, stacktrace=None, alert_text=None)

    • exception selenium.common.exceptions.UnexpectedTagNameException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      Thrown when a support class did not get an expected web element.

    • exception selenium.common.exceptions.UnknownMethodException(msg=None, screen=None, stacktrace=None)

      Bases: selenium.common.exceptions.WebDriverException
      The requested command matched a known URL but did not match a method for that URL.

    • exception selenium.common.exceptions.WebDriverException(msg=None, screen=None, stacktrace=None)

      Bases: exceptions.Exception
      Base webdriver exception.
      __init__(msg=None, screen=None, stacktrace=None)
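Since every class above derives from WebDriverException, `except WebDriverException` catches all of them, but catching a specific class is usually more useful. As one example, a StaleElementReferenceException can often be recovered from by simply locating the element again. The helper below is a hypothetical sketch of that idea (retry_on is an invented name); it takes the exception types as a parameter so it runs without Selenium installed.

```python
def retry_on(exception_types, action, attempts=3):
    """Run action(), retrying when it raises one of exception_types.

    Hypothetical helper. Useful for StaleElementReferenceException when
    `action` re-locates the element on every call, so a stale reference
    is replaced by a fresh lookup. The last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exception_types:
            if attempt == attempts:
                raise  # out of attempts: surface the original exception


# Usage with a running driver:
# from selenium.common.exceptions import StaleElementReferenceException
# from selenium.webdriver.common.by import By
# text = retry_on(StaleElementReferenceException,
#                 lambda: driver.find_element(By.ID, "price").text)
```

The important design choice is that the lambda performs the lookup itself; retrying an action on an already-stale WebElement object would just fail again.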








Selenium Notes (8): Common Pitfalls

 

Node attributes cannot be fetched directly with XPath

 

Normally, when we use XPath in a parser such as lxml, we can fetch a node's attribute directly with @class, as shown below:

page.xpath('//div/a/@class')
 

Selenium does not support this usage. We have to locate the node first and then read the attribute with get_attribute(name):

from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '//div/a').get_attribute('class')
 

Likewise, Selenium does not support XPath expressions that select text or values, such as string() or text(); a locator can only return element nodes.
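The Selenium counterparts of lxml's `//div/a/@class` and `//div/a/text()` therefore select the elements and read from each one. A minimal sketch, with hypothetical helper names; it passes "xpath", the string value of Selenium's By.XPATH, so it needs no Selenium import.

```python
def attributes_by_xpath(driver, xpath, name):
    """Selenium-side equivalent of lxml's '//div/a/@class' (sketch).

    The XPath must select elements; the attribute is then read from each
    WebElement with get_attribute().
    """
    return [el.get_attribute(name) for el in driver.find_elements("xpath", xpath)]


def texts_by_xpath(driver, xpath):
    """Selenium-side stand-in for '//div/a/text()': read .text from each element."""
    return [el.text for el in driver.find_elements("xpath", xpath)]


# Usage with a running driver:
# classes = attributes_by_xpath(driver, '//div/a', 'class')
# labels = texts_by_xpath(driver, '//div/a')
```

Note that .text returns the element's rendered, visible text, which can differ from the raw text() nodes lxml would return (for example for hidden elements).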

 

Still cannot find an element after using WebDriverWait

 

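Quite often a simple element still cannot be found even though an explicit wait has been added. If careful review shows nothing wrong with the code, the cause is usually one of the following: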

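  1. Because of the window resolution, the element is currently not visible.

  2. Some pages only load certain elements after the page is scrolled down.

  3. The element is briefly covered by another element, so it cannot be located.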

 

1. Window resolution

 

In this case, set an appropriate window size so that the element is displayed on the page.
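Besides the --window-size launch argument used for headless Chrome, a running driver can be resized with set_window_size(), and get_window_size() reports the result as a dict with "width" and "height". A minimal sketch, assuming 1366x768 is the layout you want (ensure_window_size is a hypothetical helper name):

```python
def ensure_window_size(driver, width=1366, height=768):
    """Resize the browser window so pages render at a known layout (sketch).

    Returns the (width, height) the driver reports afterwards, because some
    platforms cap the window at the screen size.
    """
    driver.set_window_size(width, height)
    size = driver.get_window_size()
    return size["width"], size["height"]


# Usage with a running driver:
# assert ensure_window_size(driver) == (1366, 768)
```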

 

2. The page needs to be scrolled

 

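For performance reasons, some pages do not load elements that are below the current screen; more content is loaded only when the page is scrolled down.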

 

At the time these notes were written, Selenium had no dedicated method for scrolling down, so we scroll the page with JavaScript:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
 

Some scrolling approaches found online do not work in Chrome, but this one does.
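Pages that keep loading as you scroll (infinite feeds) usually need more than one scroll. The sketch below repeats the window.scrollTo call above until document.body.scrollHeight stops growing. scroll_to_end, the 1-second pause and the 20-round cap are hypothetical choices to tune per site.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=20):
    """Scroll down repeatedly until the page height stops growing (sketch).

    `pause` gives lazily loaded content time to arrive; `max_rounds` stops
    endless feeds. Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded: we reached the bottom
        last_height = new_height
    return max_rounds
```

A fixed sleep is the simplest choice; on a page with a recognisable "last item" it is more robust to wait for that element with WebDriverWait instead.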

 

3. Covered by another element

 

Sometimes, because of pop-up elements, we cannot get a usable element if we keep using EC.presence_of_element_located(). In that case we should change the condition we wait for:

element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, ''))
)
 

EC.visibility_of_element_located() waits until the element is visible before returning it.
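The difference between the two conditions can be sketched as a custom condition. This is a simplified, hypothetical version of what EC.visibility_of_element_located checks, not Selenium's source: presence only needs the node in the DOM, while visibility additionally requires element.is_displayed() to be true.

```python
class visible_element:
    """Simplified sketch of a 'visibility of element located' condition."""

    def __init__(self, locator):
        self.locator = locator  # a (by, value) tuple, e.g. (By.XPATH, "//div")

    def __call__(self, driver):
        element = driver.find_element(*self.locator)  # presence: node is in the DOM
        # visibility: the node must also be displayed; False keeps the wait polling
        return element if element.is_displayed() else False
```

A displayed element can still sit underneath an overlay, in which case a click may raise ElementClickInterceptedException; waiting for the overlay to go away (for example with EC.invisibility_of_element_located) usually helps there.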

 

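When we cannot find an element, or cannot interact with it, we should choose the explicit-wait condition flexibly according to the actual situation.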

 




Reposted from: https://www.cnblogs.com/gdwz922/p/9596008.html
