selenium

1. Selecting an option from a select dropdown

http://www.cnblogs.com/fnng/p/5361443.html

2. Locating elements with CSS

// relative locating (not always accurate)
http://blog.sina.com.cn/s/blog_7424a02601014lnn.html
// complete CSS selector syntax
http://www.w3school.com.cn/cssref/css_selectors.asp

3. innerHTML and outerHTML

The difference between the two

 
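Given markup such as:

<div id="test"><span>test1</span> test2</div>

test.innerHTML: everything inside the test element (child tags and their content)
Output: <span>test1</span> test2
test.outerHTML: everything from the start of the test element to its end (tags and their content)
 // In addition to everything in innerHTML, it also includes the element's own tag.
Output: <div id="test"><span>test1</span> test2</div>
P.S. There are also the corresponding innerText and outerText, but those are JavaScript properties; I have not found direct equivalents in selenium so far.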

Usage in selenium

webelement.get_attribute('innerHTML')
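
The call above can be wrapped so both values are read in one step. This is a minimal sketch: the helper name `inner_and_outer_html` is made up for illustration, and it only assumes that the argument is an already-located WebElement exposing `get_attribute`.

```python
def inner_and_outer_html(element):
    """Return (innerHTML, outerHTML) for an already-located element.

    `element` is assumed to be a selenium WebElement; only its
    get_attribute method is used.
    """
    inner = element.get_attribute("innerHTML")   # markup inside the element
    outer = element.get_attribute("outerHTML")   # the element's own tag plus its content
    return inner, outer
```

With a live driver this would be called as `inner_and_outer_html(driver.find_element(By.ID, "test"))`, where the id "test" is only an example.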

4. Explicit and implicit waits

http://www.jb51.net/article/92684.htm

5. Error collection

1. Calling code

action = ActionChains(self.driver)
action.drag_and_drop_by_offset(dragger, x_offset, y_offset).perform()

Error message

selenium.common.exceptions.WebDriverException:
Message: POST /session/32bb8e24-dac2-de46-bde3-e6b15c023b01/moveto did not match a known command

Solution:

Firefox versions above 46.0 have compatibility problems
https://www.zhihu.com/question/53635288
http://stackoverflow.com/questions/40360223/webdriverexception-moveto-did-not-match-a-known-command

So, replace the driver.

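6. frame/iframe

When an element cannot be located even though the CSS selector or XPath is written correctly, check whether the element sits inside a frame or iframe.
A frame (iframe) is effectively another web page nested inside the page, and the driver can only operate on one document at a time, so it cannot locate elements inside a frame until it switches into it.
In that case, use selenium's switch_to.frame method (switch_to_frame in older versions).
If several frames are nested, call switch_to.frame once per level.

driver.switch_to.frame('frame3')

To return to the parent frame:

driver.switch_to.parent_frame()
// has no effect if the current context is already the main document

To switch back to the main document: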

driver.switch_to.default_content()
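
To make sure the driver always ends up back in the main document, the switching can be wrapped in a context manager. This is only a sketch: `inside_frames` is a hypothetical helper name, and the frame references passed to it (names, ids, indexes or WebElements) depend on your page.

```python
from contextlib import contextmanager


@contextmanager
def inside_frames(driver, *frame_refs):
    """Switch into nested frames in order, then return to the main document.

    Each entry in frame_refs is passed to driver.switch_to.frame, one call
    per nesting level, outermost frame first.
    """
    try:
        for ref in frame_refs:
            driver.switch_to.frame(ref)
        yield driver
    finally:
        # Runs even if a switch or the body raised an exception.
        driver.switch_to.default_content()
```

For example, `with inside_frames(driver, 'outer', 'frame3'): ...` would locate elements inside 'frame3' nested in 'outer' (both names are placeholders) and switch back to the main document when the block ends.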

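7. Handling alert/confirm/prompt

Handling the alert, confirm and prompt dialogs generated by JavaScript is straightforward in webdriver. The approach: use switch_to.alert (a property, so no parentheses) to get hold of the alert/confirm/prompt, then use text/accept/dismiss/send_keys as needed.

 text        returns the message text of the alert/confirm/prompt.
 accept      clicks the OK button.
 dismiss     clicks the Cancel button, if there is one.
 send_keys   types a value; only a prompt has an input box, so calling it on an alert or confirm raises an error.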

# Get the dialog currently shown on the page
alert = driver.switch_to.alert
# Dismiss the dialog (if it has a cancel button)
alert.dismiss()

# Type a value (if the dialog has an input box, i.e. a prompt)
alert = driver.switch_to.alert
alert.send_keys("xxx")
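
The four operations can be combined into one small helper. This is a sketch under assumptions: `handle_dialog` and its parameters are names invented here, and it expects a dialog to be open already when it is called.

```python
def handle_dialog(driver, accept=True, text=None):
    """Read a JavaScript dialog's message, optionally type into it, then close it.

    accept=True clicks OK, accept=False clicks Cancel.
    text should only be given for a prompt; alert and confirm have no input box.
    """
    dialog = driver.switch_to.alert   # property access, not a method call
    message = dialog.text             # read the message before closing the dialog
    if text is not None:
        dialog.send_keys(text)
    if accept:
        dialog.accept()
    else:
        dialog.dismiss()
    return message
```

Reading `text` before calling `accept` or `dismiss` matters, because the dialog no longer exists once it has been closed.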

Uploading files

After locating the file input element, simply call send_keys on it with the file path.
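
A minimal sketch of that step follows. The helper name `upload_file` is hypothetical, and it assumes `file_input` is the already-located `<input type="file">` element. Converting to an absolute path and checking that the file exists are additions of this sketch, not something selenium requires you to do by hand.

```python
import os


def upload_file(file_input, path):
    """Send a local file's path to an already-located file input element."""
    absolute = os.path.abspath(path)
    if not os.path.isfile(absolute):
        # Fail early with a clear error instead of a browser-side one.
        raise FileNotFoundError(absolute)
    file_input.send_keys(absolute)
    return absolute
```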

Downloading files

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

options = Options()
options.set_preference("browser.download.folderList", 2)  # 2 = use the directory set in browser.download.dir
options.set_preference("browser.download.manager.showWhenStarting", False)
options.set_preference("browser.download.dir", os.getcwd())
options.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")
browser = webdriver.Firefox(options=options)
browser.get("http://pypi.python.org/pypi/selenium")
browser.find_element(By.PARTIAL_LINK_TEXT, "selenium-2").click()

Handling cookies

get_cookies() returns all cookies

get_cookie(name) returns the cookie with the given name

add_cookie(cookie_dict) adds a cookie; the dict must contain name and value
delete_cookie(name) deletes the cookie with the given name
delete_all_cookies() deletes all cookies
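
One common use of these methods is carrying a login session from one run to the next. The sketch below is an illustration only: `save_cookies` and `load_cookies` are made-up helper names and the JSON file format is my own choice. Note that the browser must already be on a page of the cookie's domain before add_cookie is called.

```python
import json


def save_cookies(driver, path):
    """Write all cookies of the current session to a JSON file."""
    cookies = driver.get_cookies()
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)
    return len(cookies)


def load_cookies(driver, path):
    """Add every cookie from a JSON file written by save_cookies."""
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)   # each dict must contain name and value
    return len(cookies)
```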

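8. Setting a page-load timeout (or disabling JS loading)

When writing a crawler, waiting on pages wastes a lot of time. It is especially annoying when the page's elements and layout have clearly finished loading but the browser still reports that it is loading some JS (for example js.tongji.linezing.com: the Linezing statistics service shut down long ago, but some sites never updated their pages). At that point you will want to kill that JS load.
Here is how:

8.1 Setting a page-load timeout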

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.service import Service

# Selenium 4 takes the driver path through a Service object
service = Service(executable_path='/Users/Documents/python/driver/geckodriver')
driver = webdriver.Firefox(service=service)
# Set selenium's page-load timeout (in seconds)
driver.set_page_load_timeout(15)

try:
    driver.get('http://www.chinanyjs.com/channel/20925543')
except TimeoutException:
    # Stop whatever the page is still loading
    driver.execute_script('window.stop()')

# Parse the page elements
...

set_page_load_timeout sets the maximum page-load time; if loading takes longer than that, the driver raises a TimeoutException. You can then stop the page load in the except block that catches it.

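8.2 Disabling JS loading (use with caution)

Set this up the same way as in 8.1. Some pages rely on JS to load their data, so use this approach carefully (8.1 is usually the one to use).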

9. Switching between multiple windows

        # requires: import time; from selenium.webdriver.common.by import By
        driver = self.driver
        driver.get(self.url)
        now_handle = driver.current_window_handle  # handle of the current window
        print(now_handle)  # print the current window handle
        driver.find_element(By.ID, "kw1").send_keys("selenium")
        driver.find_element(By.ID, "su1").click()
        driver.find_element(By.XPATH, "//*[@id='1']/h3/a[1]").click()
        time.sleep(2)
        all_handles = driver.window_handles  # handles of all open windows

        for handle in all_handles:
            if handle != now_handle:
                print(handle)  # handle of the window to switch to
                driver.switch_to.window(handle)
                driver.find_element(By.XPATH, "//*[@id='menu_projects']/a").click()
                time.sleep(5)
                driver.close()  # close the current window
        time.sleep(3)
        print(now_handle)  # handle of the main window
        driver.switch_to.window(now_handle)  # return to the main window

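Switching between windows mostly comes down to looping over driver.window_handles. In other words, when several windows are open you need to keep track of which handle in window_handles belongs to which window yourself, so that you can switch between them easily (normally there will not be many windows open at once).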
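
The bookkeeping described above can be reduced to remembering which handles you already know and switching to the first one that is new. This is a sketch: `switch_to_new_window` is a hypothetical helper, and it assumes the new window has already opened by the time it is called.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window whose handle is not in known_handles.

    Returns the new handle, or None if no new window was found
    (in which case the driver stays on the current window).
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None
```

A typical call would record `known = set(driver.window_handles)` before clicking the link that opens a new window, then call `switch_to_new_window(driver, known)` afterwards.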