Advanced Python Crawling: Using Headless Browsers

1. PhantomJS + Selenium

Example code:

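from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


def phantomjs_url_test(url='http://gaia.imilive.cn/share.html?uid=0&videoid=116682377418697098&cc=TG45624'):
    # Copy the default PhantomJS capabilities and override the user agent.
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"
    )
    # dcap["phantomjs.page.settings.loadImages"] = False
    # webdriver.PhantomJS exists only in Selenium 3.x; Selenium 4 removed it.
    driver = webdriver.PhantomJS(desired_capabilities=dcap, executable_path='/Users/tv365/phantomjs-2.1.1-macosx/bin/phantomjs')
    try:
        driver.get(url)
        # The XPath must select the element itself; read src before quitting.
        video_url = driver.find_element_by_xpath("//video").get_attribute("src")
    finally:
        driver.quit()
    return video_url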

PhantomJS download page for macOS (the same page also offers the Linux build for servers):

http://phantomjs.org/download.html

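After unpacking the archive, set executable_path to the location of the phantomjs binary, as in the example code above (/Users/tv365/phantomjs-2.1.1-macosx/bin/phantomjs).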

 

2. Headless Chrome + Selenium

ChromeDriver download (Linux and macOS):

http://chromedriver.storage.googleapis.com/index.html

Browser version: Chrome 70.0.3538.77. Driver version: ChromeDriver 2.43 on both Linux and macOS.

Installing Chrome on a server:

https://segmentfault.com/a/1190000007705458

Example code:

from selenium import webdriver


def google_driver(url):
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    # executable_path is the path to the chromedriver binary (Selenium 3.x API).
    client = webdriver.Chrome(chrome_options=chrome_options, executable_path='/soft/chromedriver')
    try:
        client.get(url)
        content = client.page_source
        print(content)
    finally:
        # Always release the browser process, even if the page load fails.
        client.quit()
    return content


google_driver('https://www.taobao.com/')

 

3. Headless Firefox + Selenium
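
A minimal sketch of the Firefox counterpart to the headless Chrome setup above, assuming Selenium 3.x with geckodriver installed. The path /soft/geckodriver and the names firefox_arguments and firefox_driver are hypothetical, chosen to mirror the Chrome example; adjust them to your environment.

```python
def firefox_arguments(headless=True):
    """Command-line switches to pass to Firefox (hypothetical helper)."""
    return ["-headless"] if headless else []


def firefox_driver(url, geckodriver_path="/soft/geckodriver"):
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for arg in firefox_arguments():
        options.add_argument(arg)
    # geckodriver_path is an assumed location; executable_path is Selenium 3.x API.
    client = webdriver.Firefox(options=options, executable_path=geckodriver_path)
    try:
        client.get(url)
        return client.page_source
    finally:
        # Always release the browser process, even if the page load fails.
        client.quit()
```

Calling firefox_driver('https://www.taobao.com/') would then return the rendered page source, in the same way as the Chrome example.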

 

4. Scrolling and similar operations in Selenium (in practice, these execute JavaScript directly)

https://blog.csdn.net/agent_x/article/details/78662860
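
As a rough illustration of scrolling through JavaScript, the sketch below calls the WebDriver method execute_script to scroll to the bottom of the page repeatedly until the page height stops growing. The constant names, the function name scroll_to_bottom, and the pause and max_rounds parameters are assumptions made for this example and are not taken from the linked article.

```python
import time

# JavaScript snippets executed inside the page.
SCROLL_TO_BOTTOM_JS = "window.scrollTo(0, document.body.scrollHeight);"
PAGE_HEIGHT_JS = "return document.body.scrollHeight;"


def scroll_to_bottom(driver, pause=1.0, max_rounds=10):
    """Scroll down until the page height stops changing or max_rounds is hit.

    driver is any object with an execute_script method, such as a Selenium
    WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script(PAGE_HEIGHT_JS)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script(SCROLL_TO_BOTTOM_JS)
        rounds += 1
        # Give lazily loaded content time to arrive before measuring again.
        time.sleep(pause)
        new_height = driver.execute_script(PAGE_HEIGHT_JS)
        if new_height == last_height:
            break
        last_height = new_height
    return rounds
```

With the client from the headless Chrome example, scroll_to_bottom(client) could be called after client.get(url) and before reading client.page_source, so that content loaded on scroll is included.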

 

 

 

 

 

 

 

 

 
