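Previously I logged in through a visible browser window. This time I am testing the same login with the browser running in the background, since there is no need to display a browser when the goal is just to scrape data.
Previously, with a visible browser: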
chromedriver = 'D:/Python35/selenium/webdriver/chromedriver.exe'
driver = webdriver.Chrome(executable_path=chromedriver)
Now, running with PhantomJS:
chromedriver = 'D:/Python35/mypy/phantomjs/bin/phantomjs.exe'
driver = webdriver.PhantomJS(chromedriver)
PhantomJS only needs to be downloaded and unzipped; the path to its executable can be passed in directly, for example:
driver = webdriver.PhantomJS('D:/Python35/mypy/phantomjs/bin/phantomjs.exe')
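PhantomJS development has since been suspended, and later Selenium releases deprecated and then removed the PhantomJS driver, so the call above may not work on a current installation. As a hedged alternative, here is a minimal sketch that runs Chrome itself without a window. It assumes a Selenium release whose `webdriver.Chrome` accepts the `options` keyword, a Chrome build with headless support, and a chromedriver that Selenium can locate; the helper names `headless_chrome_args` and `make_headless_chrome` and the window size are illustrative and not part of the original script.

```python
def headless_chrome_args(width=1280, height=800):
    """Return command-line switches for a windowless Chrome (illustrative values)."""
    return ["--headless", "--disable-gpu", "--window-size=%d,%d" % (width, height)]


def make_headless_chrome():
    """Start Chrome without a visible window.

    Assumption: a Selenium release whose webdriver.Chrome accepts `options`,
    and a chromedriver that Selenium can find on its own.
    selenium is imported here so the helper above stays usable without it.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

With this in place, `driver = make_headless_chrome()` would stand in for the `webdriver.PhantomJS(...)` call, and the rest of the login steps would stay the same.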
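However, the username or password may have been entered incorrectly, so it is worth checking for that. By default the message area marked by the red box in the screenshot below is empty; if the login fails, an error message appears there.
After clicking the login button, check whether an error is shown, and only continue with the page redirect and later steps if there is none. As before, locate the element that holds the message and read its text.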
errtext = driver.find_element_by_id('TANGRAM__PSP_3__error').text.strip()
if len(errtext) != 0:
    print("login failed: %s" % errtext)
else:
    print("login succeeded")
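One caveat with reading the message immediately after the click is that the page may not have filled it in yet, in which case an empty string would be mistaken for success. The following is a minimal sketch of a short polling check. The function name `wait_for_error_text`, the `read_text` callable, and the timeout values are assumptions made for illustration.

```python
import time


def wait_for_error_text(read_text, timeout=3.0, interval=0.5):
    """Poll read_text() until it returns non-empty text or the timeout elapses.

    Returns the stripped message, or "" if nothing appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = (read_text() or "").strip()
        if text:
            return text
        if time.monotonic() >= deadline:
            return ""
        time.sleep(interval)
```

It could be called as `errtext = wait_for_error_text(lambda: driver.find_element_by_id('TANGRAM__PSP_3__error').text)`. The trade-off is that a successful login waits out the timeout before continuing, and if the page has already navigated away the element lookup may raise, which the caller would need to handle.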
# wait until the browser has left the login page
while True:
    if loginurl == driver.current_url:
        time.sleep(1)
        continue
    else:
        break
# save a screenshot of the current page
driver.get_screenshot_as_file(screenImg)
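The polling loop above never ends if the login page does not redirect, for example when an error appears after the check or the site asks for a CAPTCHA. Below is a minimal sketch of the same wait with an upper bound. The name `wait_for_url_change`, the `get_url` callable, and the default timeout are hypothetical choices, not part of the original script.

```python
import time


def wait_for_url_change(get_url, old_url, timeout=30.0, interval=1.0):
    """Return the new URL once get_url() differs from old_url.

    Raises TimeoutError if the URL is still old_url after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_url()
        if current != old_url:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("still on %s after %.0f s" % (old_url, timeout))
        time.sleep(interval)
```

In the script this might be used as `wait_for_url_change(lambda: driver.current_url, loginurl)`. Selenium's own `WebDriverWait(driver, 30).until(lambda d: d.current_url != loginurl)` expresses the same idea with the library's built-in wait.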
#-*- coding: utf-8 -*-
# python 3.5.0
import time
from selenium import webdriver

username = "kk"
password = "kk"
#chromedriver = 'D:/Python35/selenium/webdriver/chromedriver.exe'
chromedriver = 'D:/Python35/mypy/phantomjs/bin/phantomjs.exe'
loginurl = 'https://passport.baidu.com/v2/?login'
screenImg = "D:/Python35/selenium/webdriver/screenImg.png"

# open a visible browser window
#driver = webdriver.Chrome(executable_path=chromedriver)
# run in the background (PhantomJS download: http://phantomjs.org/download.html)
print(">> getting chromedriver ……")
driver = webdriver.PhantomJS(chromedriver)
#options = webdriver.ChromeOptions()
print(">> initialize page ……")
driver.set_page_load_timeout(20)
driver.get(loginurl)
assert "登录百度帐号" in driver.title  # page title: "Log in to Baidu account"

# enter the username and password, then submit
print(">> setting username & password")
driver.find_element_by_id("TANGRAM__PSP_3__userName").send_keys(username)
driver.find_element_by_id("TANGRAM__PSP_3__password").send_keys(password)
driver.find_element_by_id("TANGRAM__PSP_3__submit").click()
# check whether the login form reports an error
# (the first failed attempts only show a message; no CAPTCHA is required yet)
print(">> login ……")
errtext = driver.find_element_by_id('TANGRAM__PSP_3__error').text.strip()
if len(errtext) != 0:
    driver.quit()
    print("login failed: %s" % errtext)
    print(">> has quit!")
else:
    # the new page must finish loading after login before continuing
    print(">> loading new page……")
    while True:
        if loginurl == driver.current_url:
            time.sleep(1)
            continue
        else:
            break
    # print the URL of the new page
    print(driver.current_url)
    # get the cookies
    #print(driver.get_cookies())
    #cookieList = driver.get_cookies()
    #for dicts in cookieList:
    #    for key in dicts:
    #        print("'%s' = '%s'" % (key,dicts[key]))
    # take a screenshot of the current page
    driver.get_screenshot_as_file(screenImg)
    print(">> quit after 5 seconds ……")
    time.sleep(5)
    driver.quit()
    print(">> has quit!")
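The commented-out cookie code in the script prints every field of every cookie. If the cookies are meant to be reused for scraping, a name-to-value mapping is often easier to work with. The following is a small sketch under that assumption; `cookies_to_dict` is a hypothetical helper, and the sample data is made up for illustration.

```python
def cookies_to_dict(cookie_list):
    """Map each cookie's name to its value, ignoring the other fields."""
    return {cookie["name"]: cookie["value"] for cookie in cookie_list}


# Made-up sample in the shape returned by driver.get_cookies():
# a list of dicts, each with at least "name" and "value" keys.
sample = [
    {"name": "SESSION", "value": "abc", "domain": ".example.com", "path": "/"},
    {"name": "LANG", "value": "zh", "domain": ".example.com", "path": "/"},
]
print(cookies_to_dict(sample))
```

In the script it would be called as `cookies_to_dict(driver.get_cookies())` before `driver.quit()`.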