Opening a new browser tab when scraping data with Selenium

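How can you open multiple tabs and links with WebDriver?
In my testing, the widely circulated approach of sending a Ctrl+T key event does not work in Chrome. The following approach did open a new tab in Chrome: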

browser.execute_script("window.open('" + url + "');")
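
The one-liner above builds the JavaScript by string concatenation, so a URL containing a quote character would break the script. A minimal sketch of a safer variant follows. It passes the URL through Selenium's `arguments[0]` mechanism and finds the new tab by comparing window handles before and after the call. `open_in_new_tab` is a hypothetical helper name, not part of Selenium, and the sketch assumes no other window opens at the same moment.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab, switch to it, return (original, new) handles.

    Hypothetical helper for illustration; not part of Selenium.
    """
    original = driver.current_window_handle
    before = set(driver.window_handles)
    # The URL is passed as a script argument instead of being spliced
    # into the JavaScript source, so quotes in it are harmless.
    driver.execute_script("window.open(arguments[0]);", url)
    # Any handle that was not present before the call is the new tab.
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab appeared (was the popup blocked?)")
    driver.switch_to.window(new_handles[0])
    return original, new_handles[0]
```

Called as `original, new = open_in_new_tab(browser, url)`, it leaves the driver focused on the new tab. Use `browser.switch_to.window(original)` to go back.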

Complete example:

import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
'''
Scrape prospectuses (招股说明书) from cninfo (巨潮网)
Author: 西兰
Date: 2020-6-12
'''
driver_path = r"D:\chromedriver.exe"
options = webdriver.ChromeOptions()
browser = webdriver.Chrome(executable_path=driver_path)
browser.implicitly_wait(1)

url = 'http://www.cninfo.com.cn/new/fulltextSearch?notautosubmit=&keyWord=招股说明书'
browser.get(url)
browser.maximize_window()
wait = WebDriverWait(browser, 3)
for j in range(3, 856):
    if j > 2:
        for _ in range(j):
            browser.find_element_by_xpath(
                '//*[@id="fulltext-search"]/div/div/div[2]/div[4]/div[2]/div/button[2]').click()
    # Scroll the page down 100px
    js = "var q=document.documentElement.scrollTop=100"
    browser.execute_script(js)
    tr_list = browser.find_elements_by_css_selector(
        'div.tab-content > div > div > div:nth-child(3) > table > tbody > tr')
    print(len(tr_list))
    tr_len = len(tr_list)
    for i in range(tr_len):
        window1 = browser.current_window_handle
        td_list = tr_list[i].find_elements_by_css_selector('td')
        a_ = td_list[1].find_element_by_css_selector('div > a')
        data_url = a_.get_attribute('href')
        print(data_url)
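        # Open the detail page in a new tab
        browser.execute_script("window.open('" + data_url + "');")
        # Switch to the tab that is not the original window
        all_handles = browser.window_handles
        for handle in all_handles:
            if handle != window1:
                browser.switch_to.window(handle)
        # Click the download link in the new tab
        browser.find_element_by_css_selector('div.sub-line > a.sub-download').click()
        print("Downloading page {}, file {}....".format(j, i + 1))
        time.sleep(3)
        # Close the new tab and return to the results list
        browser.close()
        browser.switch_to.window(window1)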
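
The inner loop above opens a tab, works in it, closes it, and switches back. If the download click raises an exception, the extra tab stays open and the driver stays focused on it. One way to make the cleanup unconditional is a context manager, sketched below. `temporary_tab` is a hypothetical name introduced for illustration, and the sketch assumes only one new window appears per call.

```python
from contextlib import contextmanager


@contextmanager
def temporary_tab(driver, url):
    """Open url in a new tab for the duration of a with-block.

    Hypothetical helper for illustration; not part of Selenium.
    """
    original = driver.current_window_handle
    before = set(driver.window_handles)
    driver.execute_script("window.open(arguments[0]);", url)
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new tab appeared (was the popup blocked?)")
    driver.switch_to.window(new_handles[0])
    try:
        yield new_handles[0]
    finally:
        # Runs even if the body raises: close the tab, restore focus.
        driver.close()
        driver.switch_to.window(original)
```

With this helper, the body of the inner loop could shrink to a `with temporary_tab(browser, data_url):` block containing the download click and the `time.sleep(3)`.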
