Simple data scraping with selenium + chromedriver + BeautifulSoup

Install the matching versions

selenium==2.48.0
beautifulsoup4==4.7.1

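Install with pip

pip3 install selenium==2.48.0
pip3 install beautifulsoup4==4.7.1
pip3 install lxml    # or, as an alternative parser: pip3 install html5lib
pip3 install PyExecJS

sqlite3 ships with the Python standard library, so it does not need to be installed with pip.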
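
After installing, it can help to confirm that every package is importable, because some import names differ from the pip names. The snippet below is a minimal sketch: missing_modules is a hypothetical helper name, and the list of import names (bs4 for beautifulsoup4, execjs for PyExecJS) is an assumption about which of the packages above you actually installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip names: beautifulsoup4 -> bs4, PyExecJS -> execjs.
REQUIRED = ["selenium", "bs4", "lxml", "html5lib", "execjs"]

if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED))
```

An empty list means everything was found. If you chose only one of lxml and html5lib, drop the other from REQUIRED.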

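Getting chromedriver

Method 1: install on Mac

brew install chromedriver

Method 2:
Download from https://npm.taobao.org/mirrors/chromedriver/

Find the chromedriver package that matches your Chrome version, download and unzip it, then move the chromedriver binary into /usr/local/bin

Make it executable:
sudo chmod u+x,o+x /usr/local/bin/chromedriver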
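
To check that the driver really is on PATH and executable before starting selenium, something like the following can be used. It is only a sketch: describe_driver is a hypothetical helper name, and it relies on the standard library's shutil.which, which looks for an executable file on PATH.

```python
import shutil


def describe_driver(name="chromedriver"):
    """Report whether an executable called `name` can be found on PATH."""
    path = shutil.which(name)  # None if missing or not executable
    if path is None:
        return "{} not found on PATH or not executable".format(name)
    return "{} found at {}".format(name, path)


if __name__ == "__main__":
    print(describe_driver("chromedriver"))
```

The same call works for phantomjs by passing "phantomjs" as the name.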

Installing PhantomJS (use either this or chromedriver)

Method 1

Download from the official site: http://phantomjs.org/download.html

Method 2

sudo npm install -g phantomjs-prebuilt

Method 3

brew update && brew install phantomjs

Using selenium

Eight singular forms (each returns the first matching element)

1. Locate by id: find_element_by_id(self, id_)

2. Locate by name: find_element_by_name(self, name)

3. Locate by class: find_element_by_class_name(self, name)

4. Locate by tag: find_element_by_tag_name(self, name)

5. Locate by link text: find_element_by_link_text(self, link_text)

6. Locate by partial link text: find_element_by_partial_link_text(self, link_text)

7. Locate by xpath: find_element_by_xpath(self, xpath)

8. Locate by css selector: find_element_by_css_selector(self, css_selector)
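
As an illustration of the singular forms, here is a minimal sketch that fills in a search box and clicks a button. The function name search and the element ids "kw" and "su" are assumptions about the target page, not something selenium provides. The find_element_by_* helpers exist in the selenium 2.48.0 pinned above; later Selenium 4 releases removed them in favour of find_element(By.ID, ...).

```python
def search(driver, keyword):
    """Type `keyword` into a search box and click the submit button.

    `driver` is a selenium WebDriver, e.g. webdriver.Chrome().
    The ids "kw" and "su" are assumed, adjust them to the real page.
    """
    box = driver.find_element_by_id("kw")  # first element with id="kw"
    box.clear()
    box.send_keys(keyword)
    button = driver.find_element_by_css_selector("#su")  # same idea, via CSS
    button.click()


# Typical use (needs a browser driver installed):
#   from selenium import webdriver
#   browser = webdriver.Chrome()
#   browser.get("http://www.baidu.com")
#   search(browser, "selenium")
```

If no element matches, the singular forms raise NoSuchElementException, so wrap the call in try/except when the element may be absent.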

Eight plural forms (each returns a list of all matching elements)

9. Locate all by id: find_elements_by_id(self, id_)

10. Locate all by name: find_elements_by_name(self, name)

11. Locate all by class: find_elements_by_class_name(self, name)

12. Locate all by tag: find_elements_by_tag_name(self, name)

13. Locate all by link text: find_elements_by_link_text(self, text)

14. Locate all by partial link text: find_elements_by_partial_link_text(self, link_text)

15. Locate all by xpath: find_elements_by_xpath(self, xpath)

16. Locate all by css selector: find_elements_by_css_selector(self, css_selector)
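
The plural forms return a list, which is simply empty when nothing matches, so they are convenient for "collect everything" or "maybe present" lookups. The two helpers below are a sketch of that usage; link_texts and first_or_none are hypothetical names chosen for this example.

```python
def link_texts(driver):
    """Return the visible text of every <a> element on the current page."""
    return [a.text for a in driver.find_elements_by_tag_name("a")]


def first_or_none(driver, css_selector):
    """Return the first element matching `css_selector`, or None if absent."""
    matches = driver.find_elements_by_css_selector(css_selector)
    return matches[0] if matches else None
```

first_or_none avoids the exception that find_element_by_css_selector would raise for a missing element.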

Combined example

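from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

link = 'http://www.baidu.com'
browser = webdriver.PhantomJS()  # use PhantomJS
# browser = webdriver.Chrome()  # use chromedriver instead
browser.get(link)
# page_source is already a decoded string, so no encoding needs to be set
html_doc = browser.page_source
soup = BeautifulSoup(html_doc, 'lxml')
# Every element whose style attribute is exactly "text-decoration:none;"
soupArr = soup.select('[style="text-decoration:none;"]')
# find_all returns a list; check it is non-empty before taking the first item
contson_divs = soup.find_all("div", "contson")
yuanwen_list = contson_divs[0] if contson_divs else None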
......
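
The BeautifulSoup part of the example can be tried without a browser by feeding it a static HTML string. The sketch below does that. The sample HTML and the function name extract are made up for illustration, and it uses the built-in "html.parser" instead of lxml so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Invented sample page, standing in for browser.page_source.
HTML = """
<div class="contson">first poem</div>
<div class="contson">second poem</div>
<a style="text-decoration:none;" href="/a">link A</a>
"""


def extract(html_doc):
    """Return (texts of the styled links, text of the first div.contson or None)."""
    soup = BeautifulSoup(html_doc, "html.parser")
    links = [a.get_text() for a in soup.select('[style="text-decoration:none;"]')]
    blocks = soup.find_all("div", "contson")  # second argument filters by class
    first = blocks[0].get_text(strip=True) if blocks else None
    return links, first


if __name__ == "__main__":
    print(extract(HTML))
```

Passing browser.page_source instead of HTML gives the same flow as the combined example above.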
