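Recently, while scraping data with selenium, I needed both a proxy and JS rendering. PhantomJS could not render some of the data in a form I could parse, so I switched to Chrome. Every example I could find for configuring ChromeDriver with a password-protected proxy was written in Python. After several attempts yesterday I finally got a Java version working, so I am recording it here:
1. Write background.js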
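var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "your proxy IP or domain name",
            port: 8080 // example value; replace with your own proxy port (an integer)
        },
        // hosts that should bypass the proxy, as comma-separated array entries
        bypassList: ["hosts that do not need the proxy"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
// Answer the proxy's authentication challenge with the credentials below
function callbackFn(details) {
    return {
        authCredentials: {
            username: "proxy username, e.g. user1",
            password: "proxy password, e.g. pwd1"
        }
    };
}
chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
);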
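If the proxy host, port, or credentials change often, editing background.js by hand gets tedious. Below is a minimal sketch of generating the same script from Java. The class name ProxyBackgroundScript, the render method, and the "localhost" bypass entry are my own assumptions for illustration and are not part of the original setup; only the shape of the generated script follows the file above.

```java
public class ProxyBackgroundScript {

    // Wraps a value in double quotes, escaping backslashes and double quotes
    // so it is safe inside a JavaScript string literal.
    static String jsString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders the text of background.js for one HTTP proxy with username/password auth.
    public static String render(String host, int port, String username, String password) {
        return String.join("\n",
            "var config = {",
            "    mode: \"fixed_servers\",",
            "    rules: {",
            "        singleProxy: {",
            "            scheme: \"http\",",
            "            host: " + jsString(host) + ",",
            "            port: " + port,
            "        },",
            "        bypassList: [\"localhost\"]", // assumption: only localhost bypasses the proxy
            "    }",
            "};",
            "chrome.proxy.settings.set({value: config, scope: \"regular\"}, function() {});",
            "function callbackFn(details) {",
            "    return {",
            "        authCredentials: {",
            "            username: " + jsString(username) + ",",
            "            password: " + jsString(password),
            "        }",
            "    };",
            "}",
            "chrome.webRequest.onAuthRequired.addListener(",
            "    callbackFn,",
            "    {urls: [\"<all_urls>\"]},",
            "    ['blocking']",
            ");",
            "");
    }

    public static void main(String[] args) {
        // Hypothetical values; substitute your own proxy settings.
        System.out.println(render("proxy.example.com", 8080, "user1", "pwd1"));
    }
}
```

The returned string can be written to background.js before the extension is packaged, which keeps the credentials out of a hand-edited file.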
2. Write manifest.json
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version": "22.0.0"
}
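3. Compress background.js and manifest.json into proxy.zip. Note that both files must sit at the root of proxy.zip and must not be nested inside any directory, as shown below:
[screenshot: contents of proxy.zip]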
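The archive can also be built from Java, which rules out accidentally zipping the parent folder. The following is only a sketch using the JDK's java.util.zip classes. The class name ProxyZipBuilder and the method zipAtRoot are hypothetical names of mine, and the f:/tmp/proxy directory is simply the one used later in this post.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ProxyZipBuilder {

    // Stores each file under its bare file name, so every entry
    // ends up at the root of the archive with no directory prefix.
    public static void zipAtRoot(Path zipFile, Path... files) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // stream the file's bytes into the current entry
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("f:/tmp/proxy");
        zipAtRoot(dir.resolve("proxy.zip"),
                  dir.resolve("background.js"),
                  dir.resolve("manifest.json"));
    }
}
```

Because the entry name is taken from getFileName() rather than the full path, the resulting proxy.zip contains exactly background.js and manifest.json at the top level.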
4. Add the proxy extension to ChromeOptions and set the path to chromedriver:
ChromeOptions co = new ChromeOptions();
co.addExtensions(new File("f:/tmp/proxy/proxy.zip")); // add the proxy extension to ChromeOptions
System.setProperty("webdriver.chrome.driver","drivers/chromedriver.exe"); // path to chromedriver.exe
5. Using Baidu as an example, create a webdriver and wait until the search input box has loaded:
RemoteWebDriver webdriver = new ChromeDriver(co);
webdriver.get("https://www.baidu.com/");
WebDriverWait wait = new WebDriverWait(webdriver, 10);
wait.until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("input#kw")));
The complete code is as follows:
import java.io.File;
import org.openqa.selenium.By;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
public class ProxyedChromeDriver {
    public static void main(String[] args) {
        ChromeOptions co = new ChromeOptions();
        co.addExtensions(new File("f:/tmp/proxy/proxy.zip")); // add the proxy extension to ChromeOptions
        System.setProperty("webdriver.chrome.driver","drivers/chromedriver.exe"); // path to chromedriver.exe
        RemoteWebDriver webdriver = new ChromeDriver(co);
        webdriver.get("https://www.baidu.com/");
        // Selenium 3 signature (timeout in seconds); Selenium 4 expects Duration.ofSeconds(10) instead
        WebDriverWait wait = new WebDriverWait(webdriver, 10);
        // Block until the Baidu search box is visible
        wait.until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("input#kw")));
        webdriver.quit();
    }
}
The result is as follows:
[screenshot: result of running the program]