Contents
- Install the selenium package
- Import the selenium package and create a webdriver object
- Open the target URL and wait for the response
- Locate the login form via XPath, fill in the username and password, and simulate a click to log in
- Verify the login: if the current URL has changed, treat the login as successful
- Use the webdriver object's methods to get the session cookies for the current site
- With the cookies, access the site through urllib2 to scrape pages and do similar work
1. Install the selenium package:
- sudo pip install -U selenium
If pip is not installed, install it first (run this from an unpacked pip source tree):
- sudo python setup.py install
2. Import the selenium package and create a webdriver object:
- from selenium import webdriver
-
- sel = webdriver.Chrome()
At this step you may see an error about the Chrome path. Driving the Chrome browser requires the ChromeDriver executable; download it here:
http://chromedriver.storage.googleapis.com/index.html?path=2.7/
Download the version that matches your Chrome and unpack it into a directory on your PATH, so Selenium can find the driver.
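If you would rather not touch PATH, the Chrome driver class also accepts an explicit driver path; a minimal sketch, where the path below is a placeholder for wherever you unpacked the binary:
- # point Selenium at an explicit ChromeDriver binary (example path)
- sel = webdriver.Chrome(executable_path='/path/to/chromedriver')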
3. Open the target URL and wait for the response:
- import time
-
- loginurl = 'http://weibo.com/'
- # open the login page
- sel.get(loginurl)
- time.sleep(10)
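A fixed sleep works but wastes time when the page loads quickly. As a sketch of an alternative, Selenium's explicit waits can poll until an element appears; this assumes the same pl_login_form id used by the XPaths in the next step:
- from selenium.webdriver.support.ui import WebDriverWait
-
- # poll for up to 10 seconds until the login form is present
- WebDriverWait(sel, 10).until(
-     lambda driver: driver.find_element_by_id('pl_login_form'))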
4. Locate the login form via XPath, fill in the username and password, and simulate clicking the login button:
- # enter the username
- try:
-     sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')
-     print 'user success!'
- except Exception:
-     print 'user error!'
- time.sleep(1)
- # enter the password
- try:
-     sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')
-     print 'pw success!'
- except Exception:
-     print 'pw error!'
- time.sleep(1)
- # click the login button
- try:
-     sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
-     print 'click success!'
- except Exception:
-     print 'click error!'
- time.sleep(3)
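If clicking the anchor ever proves flaky, a common workaround (not from the original post) is to submit the form by sending a RETURN keystroke to the password field; a sketch, reusing the password XPath from above:
- from selenium.webdriver.common.keys import Keys
-
- # submit by pressing Enter in the password field instead of clicking
- sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys(Keys.RETURN)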
5. Verify the login: if the current URL has changed, treat the login as successful:
- import sys
-
- curpage_url = sel.current_url
- print curpage_url
- while curpage_url == loginurl:
-     print 'please input the verify code:'
-     # strip the trailing newline so it is not typed into the input box
-     verifycode = sys.stdin.readline().strip()
-     sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)
-     try:
-         sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()
-         print 'click success!'
-     except Exception:
-         print 'click error!'
-     time.sleep(3)
-     curpage_url = sel.current_url
6. Use the webdriver object's methods to get the session cookies for the current site:
- # build a Cookie header value from the browser's session cookies
- cookie = [item['name'] + '=' + item['value'] for item in sel.get_cookies()]
- cookiestr = ';'.join(cookie)
- print cookiestr
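The cookies only need to be harvested once per session, so you may want to persist them for later runs; a sketch that dumps the raw cookie dicts to disk (the filename cookies.json is a placeholder):
- import json
-
- # save the cookie dicts so a later run can rebuild the header offline
- with open('cookies.json', 'w') as fd:
-     json.dump(sel.get_cookies(), fd)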
7. With the cookies, you can access the site through urllib2 and do page scraping and similar work:
- import urllib2
-
- print 'fetching the current page with urllib2'
- homeurl = sel.current_url
- print 'homeurl: %s' % homeurl
- # reuse the browser session by sending its cookies in the request header
- headers = {'cookie': cookiestr}
- req = urllib2.Request(homeurl, headers=headers)
- try:
-     response = urllib2.urlopen(req)
-     text = response.read()
-     fd = open('homepage', 'w')
-     fd.write(text)
-     fd.close()
-     print '###get home page html success!!'
- except Exception:
-     print '### get home page html error!!'
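The same Cookie header works for any other page on the site. A sketch that also sends a browser-like User-Agent, since some sites reject urllib2's default one; the target URL and UA string here are examples, not taken from the original post:
- # fetch another page with the same session (URL is a placeholder)
- otherurl = 'http://weibo.com/'
- headers = {'cookie': cookiestr,
-            'User-Agent': 'Mozilla/5.0'}
- req = urllib2.Request(otherurl, headers=headers)
- html = urllib2.urlopen(req).read()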
References:
http://splinter.readthedocs.org/en/latest/drivers/chrome.html
http://www.testwo.com/blog/6931
http://docs.seleniumhq.org/projects/
http://docs.seleniumhq.org/docs/