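The data a crawler targets falls into two main categories:
Static data: static data is easy to crawl. An HTTP client library such as HttpClient is enough, because the response to the request already contains the data.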
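Dynamic data: dynamic data has to be loaded into a browser, and the page's JavaScript has to run before the first-step result we want appears. I say "first-step result" because the data a crawler is after rarely lives on a single page. Typically you start on the main page, click a button to get a list of records, and then click one entry in that list to reach its details. A crawler therefore usually loads the home page first and then works forward automatically from keywords or buttons on that page. The rest of this post shows how to automate that kind of crawl in Java.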
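Before moving on to the dynamic case, the static case can be sketched in a few lines. This is a minimal example that assumes Java 11+ and uses the JDK's built-in java.net.http.HttpClient rather than whichever HttpClient library the original project used; the class name StaticPageFetcher and the timeout values are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: fetches a static page with a plain HTTP GET. */
final class StaticPageFetcher {

    // One shared client; it follows ordinary redirects and gives up connecting after 10 s.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Returns the response body as a string, or throws if the status is not 200. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

No JavaScript runs here, which is exactly why this approach stops working once the data is rendered in the browser.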
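I had previously used other crawling libraries such as Jsoup and HtmlUnit. They handle simple sites without trouble, but on more complex sites they fail to retrieve the data: Jsoup does not support dynamic pages, and HtmlUnit could not get anything from pages that perform internal redirects. Further research led me to Selenium. For my first tests I used Selenium WebDriver, which retrieves data through a real browser: every call opens a browser window to load the requested address.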
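The libraries used are:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.4.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.codeborne</groupId>
    <artifactId>phantomjsdriver</artifactId>
    <version>1.2.1</version>
</dependency>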
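You also need to download the matching PhantomJS executable by hand. I am on macOS, so I downloaded phantomjs-2.1.1-macosx. Pick the build for your own system from the PhantomJS download page.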
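The driver has to be told where the downloaded executable lives. BaseService.java further down reads the location from the system property named by PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, whose value is "phantomjs.binary.path". The sketch below sets that property after checking that the file exists; the class name PhantomJsLocator and the example path are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: points the PhantomJS driver at a downloaded binary. */
final class PhantomJsLocator {

    // Same value as PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY.
    static final String BINARY_PROPERTY = "phantomjs.binary.path";

    /** Validates the path, stores it in the system property, and returns the absolute path. */
    static String configure(String binaryPath) {
        Path path = Paths.get(binaryPath).toAbsolutePath();
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("PhantomJS binary not found: " + path);
        }
        System.setProperty(BINARY_PROPERTY, path.toString());
        return path.toString();
    }

    public static void main(String[] args) {
        // Example location only; adjust to wherever the archive was unpacked.
        String defaultPath = "/opt/phantomjs-2.1.1-macosx/bin/phantomjs";
        System.out.println(configure(args.length > 0 ? args[0] : defaultPath));
    }
}
```

The same property can also be passed on the command line with -Dphantomjs.binary.path=... instead of being set in code.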
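The site we will crawl is www.hapag-lloyd.cn, a logistics information site. Next we crawl its tracking data.
The site looks like this:
Workflow:
The program should enter the container number automatically, then click the search button to submit the tracking query, as shown below:
Start by analysing the page's HTML. What we need to do is fill the container number into the text box from code and then trigger the click event of the tracking button, much like automated web testing. The HTML structure is as follows: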
The core code is as follows:
try {
    // Level 2: the result table that lists every container on the booking.
    WebElement inputElement = driver.findElement(By.id("tracing_by_booking_f:hl21"));
    WebElement tableElement = inputElement.findElement(By.id("tracing_by_booking_f:hl27"));
    WebElement tbodyElement = tableElement.findElement(By.tagName("tbody"));
    List<WebElement> listElement = tbodyElement.findElements(By.tagName("tr"));
    JSONArray arrayJson = new JSONArray();
    for (WebElement trElement : listElement) {
        WebElement radioElement = trElement.findElement(By.tagName("input"));
        List<WebElement> tdElements = trElement.findElements(By.tagName("td"));
        JSONObject infoJsonObject = new JSONObject();
        arrayJson.add(infoJsonObject);
        infoJsonObject.put("箱型", tdElements.get(1).getText());     // container type
        infoJsonObject.put("箱号", tdElements.get(2).getText());     // container number
        infoJsonObject.put("操作状态", tdElements.get(3).getText()); // status
        infoJsonObject.put("操作日期", tdElements.get(4).getText()); // date
        infoJsonObject.put("操作地点", tdElements.get(5).getText()); // location

        // The container number reads "TRLU 7359129"; the detail URL expects "TRLU++7359129".
        String snNumber = tdElements.get(2).getText();
        String[] nums = snNumber.split("\\s+");
        // String type = radioElement.getAttribute("type");
        String url;
        if (nums.length > 1) {
            url = "https://www.hapag-lloyd.cn/zh/online-business/tracing/tracing-by-booking.html?view=S8510&container=" + nums[0] + "++" + nums[1];
        } else {
            url = "https://www.hapag-lloyd.cn/zh/online-business/tracing/tracing-by-booking.html?view=S8510";
        }

        // Level 3: open the container detail page in a separate driver.
        PhantomJSDriver newDriver = getDriver();
        newDriver.get(url);
        try {
            Thread.sleep(1000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        JSONObject newinfoobject = new JSONObject();
        infoJsonObject.put("detail", newinfoobject);
        System.out.println(newDriver.getPageSource());

        // Keep polling until the detail table is present (the site redirects internally).
        loadsuccess = false;
        while (!loadsuccess) {
            if (initOk(newDriver, checkTimes, By.id("tracing_by_booking_f:hl29"))) {
                loadsuccess = true;
            }
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        // Container master data: each matching cell holds a nested label/value pair.
        WebElement newinfoTableElement = newDriver.findElement(By.id("tracing_by_booking_f:hl29"));
        WebElement newinfotrElmenet = newinfoTableElement.findElement(By.className("boxContent")).findElement(By.tagName("tr"));
        List<WebElement> newinfotdsElmenet = newinfotrElmenet.findElements(By.xpath("td[contains(@style,'white-space:nowrap;')]"));
        System.out.println(newinfoTableElement.getText());
        newinfoobject.put("箱型", newinfotdsElmenet.get(0).findElements(By.tagName("td")).get(1).getText()); // container type
        newinfoobject.put("描述", newinfotdsElmenet.get(1).findElements(By.tagName("td")).get(1).getText()); // description
        newinfoobject.put("尺寸", newinfotdsElmenet.get(2).findElements(By.tagName("td")).get(1).getText()); // dimensions
        newinfoobject.put("Tare(kg)", newinfotdsElmenet.get(3).findElements(By.tagName("td")).get(1).getText());
        newinfoobject.put("Max.Payload(kg)", newinfotdsElmenet.get(4).findElements(By.tagName("td")).get(1).getText());

        // Movement history: one JSON object per table row.
        WebElement newltableElement = newDriver.findElement(By.id("tracing_by_booking_f:hl62"));
        WebElement newTableElement = newltableElement.findElement(By.id("tracing_by_booking_f:hl66"));
        WebElement newtbodyElement = newTableElement.findElement(By.tagName("tbody"));
        List<WebElement> newtrsElement = newtbodyElement.findElements(By.tagName("tr"));
        JSONArray newtrarray = new JSONArray();
        infoJsonObject.put("infolist", newtrarray);
        for (WebElement newtrElement : newtrsElement) {
            List<WebElement> newtdsElement = newtrElement.findElements(By.tagName("td"));
            JSONObject newtdjson = new JSONObject();
            newtdjson.put("operate", newtdsElement.get(0).getText());
            newtdjson.put("operateaddress", newtdsElement.get(1).getText());
            newtdjson.put("date", newtdsElement.get(2).getText());
            newtdjson.put("time", newtdsElement.get(3).getText());
            newtdjson.put("transport", newtdsElement.get(4).getText());
            newtdjson.put("hc", newtdsElement.get(5).getText());
            newtrarray.add(newtdjson);
        }
        // quit() closes every window and ends this PhantomJS session.
        newDriver.quit();
    }
    jedis.sadd("so:hlc:" + bkNo + ":reslut", arrayJson.toString());
    System.out.println(arrayJson.toString());
} catch (Exception e) {
    e.printStackTrace();
}
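The step that turns a container number into a detail-page URL is easy to get wrong, so it may help to see it in isolation. The sketch below reproduces the same rule as the core code (split on whitespace, join the two parts with "++"); the class name DetailUrlBuilder is hypothetical, and the extra trim() is an addition that the original code does not have.

```java
/** Hypothetical helper: builds the container detail URL used for the third-level page. */
final class DetailUrlBuilder {

    private static final String BASE =
            "https://www.hapag-lloyd.cn/zh/online-business/tracing/tracing-by-booking.html?view=S8510";

    /**
     * "TRLU 7359129" becomes BASE + "&container=TRLU++7359129".
     * Anything that does not split into at least two parts falls back to BASE.
     */
    static String build(String containerNumber) {
        String[] parts = containerNumber == null
                ? new String[0]
                : containerNumber.trim().split("\\s+");
        if (parts.length > 1) {
            return BASE + "&container=" + parts[0] + "++" + parts[1];
        }
        return BASE;
    }

    public static void main(String[] args) {
        System.out.println(build("TRLU 7359129"));
    }
}
```

Keeping this logic in one small method also makes it testable without starting a browser.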
The crawl result is as follows:
[
{
"操作状态": "Gate in empty",
"操作日期": "2018-02-15",
"箱型": "45GP",
"箱号": "TRLU 7359129",
"infolist": [
{
"date": "2018-04-26",
"operate": "Gate out empty",
"time": "03:57",
"transport": "Truck",
"hc": "",
"operateaddress": "NINGBO"
},
{
"date": "2018-04-26",
"operate": "Arrival in",
"time": "13:42",
"transport": "Truck",
"hc": "",
"operateaddress": "NINGBO"
},
{
"date": "2018-05-04",
"operate": "Loaded",
"time": "06:29",
"transport": "LINAH",
"hc": "005W",
"operateaddress": "NINGBO"
},
{
"date": "2018-05-04",
"operate": "Vessel departed",
"time": "07:00",
"transport": "LINAH",
"hc": "005W",
"operateaddress": "NINGBO"
},
{
"date": "2018-06-04",
"operate": "Vessel arrived",
"time": "10:48",
"transport": "LINAH",
"hc": "005W",
"operateaddress": "HAMBURG"
},
{
"date": "2018-06-05",
"operate": "Discharged",
"time": "11:21",
"transport": "LINAH",
"hc": "005W",
"operateaddress": "HAMBURG"
}
],
"detail": {
"箱型": "45GP",
"尺寸": "40' X 8' X 9'6\"",
"Max.Payload(kg)": "28620",
"描述": "HIGH CUBE CONT.",
"Tare(kg)": "3880"
},
"操作地点": "SAVANNAH, GA"
},
{
"操作状态": "Gate in empty",
"操作日期": "2018-02-15",
"箱型": "45GP",
"箱号": "HLXU 8065397",
"infolist": [
{
"date": "2018-04-06",
"operate": "Gate out empty",
"time": "19:11",
"transport": "Truck",
"hc": "",
"operateaddress": "QINGDAO"
},
{
"date": "2018-04-07",
"operate": "Arrival in",
"time": "21:52",
"transport": "Truck",
"hc": "",
"operateaddress": "QINGDAO"
},
{
"date": "2018-04-12",
"operate": "Loaded",
"time": "19:45",
"transport": "ALEXANDRA",
"hc": "817E",
"operateaddress": "QINGDAO"
},
{
"date": "2018-04-13",
"operate": "Vessel departed",
"time": "00:45",
"transport": "ALEXANDRA",
"hc": "817E",
"operateaddress": "QINGDAO"
},
{
"date": "2018-04-15",
"operate": "Vessel arrived",
"time": "11:30",
"transport": "ALEXANDRA",
"hc": "817E",
"operateaddress": "BUSAN"
},
{
"date": "2018-04-15",
"operate": "Discharged",
"time": "15:03",
"transport": "ALEXANDRA",
"hc": "817E",
"operateaddress": "BUSAN"
},
{
"date": "2018-04-29",
"operate": "Loaded",
"time": "20:00",
"transport": "COPIAPO",
"hc": "816E",
"operateaddress": "BUSAN"
},
{
"date": "2018-04-30",
"operate": "Vessel departed",
"time": "12:24",
"transport": "COPIAPO",
"hc": "816E",
"operateaddress": "BUSAN"
},
{
"date": "2018-05-27",
"operate": "Vessel arrived",
"time": "15:39",
"transport": "COPIAPO",
"hc": "816E",
"operateaddress": "IQUIQUE"
},
{
"date": "2018-05-27",
"operate": "Discharged",
"time": "17:21",
"transport": "COPIAPO",
"hc": "816E",
"operateaddress": "IQUIQUE"
},
{
"date": "2018-05-31",
"operate": "Departure from",
"time": "08:43",
"transport": "Truck",
"hc": "",
"operateaddress": "IQUIQUE"
},
{
"date": "2018-05-31",
"operate": "Gate in empty",
"time": "10:15",
"transport": "Truck",
"hc": "",
"operateaddress": "IQUIQUE"
}
],
"detail": {
"箱型": "45GP",
"尺寸": "40' X 8' X 9'6\"",
"Max.Payload(kg)": "28550",
"描述": "HIGH CUBE CONT.",
"Tare(kg)": "3950"
},
"操作地点": "SAVANNAH, GA"
},
{
"操作状态": "Gate in empty",
"操作日期": "2018-02-15",
"箱型": "45GP",
"箱号": "CPSU 6435050",
"infolist": [
{
"date": "2018-03-21",
"operate": "Arrival in",
"time": "13:02",
"transport": "Truck",
"hc": "",
"operateaddress": "BELFAST, NORTHERN IRELAND"
},
{
"date": "2018-03-23",
"operate": "Loaded",
"time": "23:35",
"transport": "PHOENIX J",
"hc": "181243",
"operateaddress": "BELFAST, NORTHERN IRELAND"
},
{
"date": "2018-03-24",
"operate": "Vessel departed",
"time": "00:01",
"transport": "PHOENIX J",
"hc": "181243",
"operateaddress": "BELFAST, NORTHERN IRELAND"
},
{
"date": "2018-03-24",
"operate": "Vessel arrived",
"time": "07:00",
"transport": "PHOENIX J",
"hc": "181243",
"operateaddress": "LIVERPOOL"
},
{
"date": "2018-03-24",
"operate": "Discharged",
"time": "17:29",
"transport": "PHOENIX J",
"hc": "181243",
"operateaddress": "LIVERPOOL"
},
{
"date": "2018-03-31",
"operate": "Loaded",
"time": "11:36",
"transport": "MSC ALYSSA",
"hc": "127W12",
"operateaddress": "LIVERPOOL"
},
{
"date": "2018-03-31",
"operate": "Vessel departed",
"time": "23:30",
"transport": "MSC ALYSSA",
"hc": "127W12",
"operateaddress": "LIVERPOOL"
},
{
"date": "2018-04-08",
"operate": "Vessel arrived",
"time": "10:54",
"transport": "MSC ALYSSA",
"hc": "127W12",
"operateaddress": "MONTREAL, QC"
},
{
"date": "2018-04-10",
"operate": "Discharged",
"time": "05:17",
"transport": "MSC ALYSSA",
"hc": "127W12",
"operateaddress": "MONTREAL, QC"
},
{
"date": "2018-04-11",
"operate": "Departure from",
"time": "06:11",
"transport": "Rail",
"hc": "",
"operateaddress": "MONTREAL, QC"
},
{
"date": "2018-04-13",
"operate": "Arrival in",
"time": "20:10",
"transport": "Rail",
"hc": "",
"operateaddress": "CHICAGO, IL"
}
],
"detail": {
"箱型": "45GP",
"尺寸": "40' X 8' X 9'6\"",
"Max.Payload(kg)": "28560",
"描述": "HIGH CUBE CONT.",
"Tare(kg)": "3940"
},
"操作地点": "SAVANNAH, GA"
},
{
"操作状态": "Gate in empty",
"操作日期": "2018-02-15",
"箱型": "45GP",
"箱号": "GLDU 7588155",
"infolist": [
{
"date": "2018-04-21",
"operate": "Gate out empty",
"time": "16:14",
"transport": "Truck",
"hc": "",
"operateaddress": "QINGDAO"
},
{
"date": "2018-04-23",
"operate": "Arrival in",
"time": "10:37",
"transport": "Truck",
"hc": "",
"operateaddress": "QINGDAO"
},
{
"date": "2018-05-02",
"operate": "Loaded",
"time": "22:38",
"transport": "MANHATTAN BRIDGE",
"hc": "014W",
"operateaddress": "QINGDAO"
},
{
"date": "2018-05-03",
"operate": "Vessel departed",
"time": "01:30",
"transport": "MANHATTAN BRIDGE",
"hc": "014W",
"operateaddress": "QINGDAO"
},
{
"date": "2018-06-02",
"operate": "Vessel arrived",
"time": "22:24",
"transport": "MANHATTAN BRIDGE",
"hc": "014W",
"operateaddress": "PIRAEUS"
},
{
"date": "2018-06-03",
"operate": "Discharged",
"time": "08:52",
"transport": "MANHATTAN BRIDGE",
"hc": "014W",
"operateaddress": "PIRAEUS"
},
{
"date": "2018-06-05",
"operate": "Loaded",
"time": "06:01",
"transport": "FRITZ REUTER",
"hc": "1822N",
"operateaddress": "PIRAEUS"
},
{
"date": "2018-06-05",
"operate": "Vessel departure",
"time": "23:00",
"transport": "FRITZ REUTER",
"hc": "1822N",
"operateaddress": "PIRAEUS"
},
{
"date": "2018-06-13",
"operate": "Vessel arrival",
"time": "08:00",
"transport": "FRITZ REUTER",
"hc": "1822N",
"operateaddress": "CONSTANTA"
}
],
"detail": {
"箱型": "45GP",
"尺寸": "40' X 8' X 9'6\"",
"Max.Payload(kg)": "28720",
"描述": "HIGH CUBE CONT.",
"Tare(kg)": "3780"
},
"操作地点": "SAVANNAH, GA"
},
{
"操作状态": "Gate in empty",
"操作日期": "2018-02-16",
"箱型": "45GP",
"箱号": "HLXU 8105036",
"infolist": [
{
"date": "2018-05-30",
"operate": "Gate out empty",
"time": "16:36",
"transport": "Truck",
"hc": "",
"operateaddress": "MIAMI, FL"
},
{
"date": "2018-06-04",
"operate": "Arrival in",
"time": "08:48",
"transport": "Truck",
"hc": "",
"operateaddress": "PORT EVERGLADES, FL"
},
{
"date": "2018-06-07",
"operate": "Vessel departure",
"time": "15:00",
"transport": "LIMARI",
"hc": "0170S",
"operateaddress": "PORT EVERGLADES, FL"
},
{
"date": "2018-06-21",
"operate": "Vessel arrival",
"time": "09:00",
"transport": "LIMARI",
"hc": "0170S",
"operateaddress": "SAN ANTONIO"
}
],
"detail": {
"箱型": "45GP",
"尺寸": "40' X 8' X 9'6\"",
"Max.Payload(kg)": "28550",
"描述": "HIGH CUBE CONT.",
"Tare(kg)": "3950"
},
"操作地点": "SAVANNAH, GA"
}
]
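Each object in an infolist array above comes from one row of the movement-history table, with the cells read in a fixed column order. A minimal sketch of that mapping, independent of Selenium, could look like the following. The class name TrackingRowMapper is hypothetical. Unlike the original code, it fills missing cells with an empty string instead of throwing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: maps the cell texts of one history row to the JSON keys shown above. */
final class TrackingRowMapper {

    // Column order as read by the crawler: td 0 .. td 5.
    private static final String[] KEYS =
            {"operate", "operateaddress", "date", "time", "transport", "hc"};

    static Map<String, String> toRecord(List<String> cells) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < KEYS.length; i++) {
            // Short rows get empty values rather than an IndexOutOfBoundsException.
            record.put(KEYS[i], i < cells.size() ? cells.get(i).trim() : "");
        }
        return record;
    }
}
```

With Selenium, the cells list would be built by calling getText() on each td element of the row.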
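In total this crawl walks through three levels of pages, triggering each page request and redirect by simulating button clicks.
The complete crawler code follows: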
HLCServiceThread.java
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import redis.clients.jedis.Jedis;

import java.util.List;

/**
 * Created by pengweikang on 2018/3/6.
 */
public class HLCServiceThread extends BaseService {

    private String indexUrl;
    private String bkNo;
    private int checkTimes = 10;

    public HLCServiceThread(String url, int checkTimes) {
        JSONObject json = UrlParamsUtils.getParams(url);
        this.indexUrl = json.getString("indexUrl");
        this.bkNo = json.getString("bkNo");
        this.checkTimes = checkTimes;
    }

    public HLCServiceThread() {
    }

    @Override
    public void run() {
        PhantomJSDriver driver = getDriver();
        Jedis jedis = RedisUtils.getJedis();
        // Level 1: the home page with the tracking input box.
        driver.get("https://www.hapag-lloyd.cn/zh/home.html");

        /*
         * pengweikang: this loop works around the site's internal redirects,
         * which otherwise make the crawl fail. It keeps polling until the element appears.
         */
        boolean loadsuccess = false;
        while (!loadsuccess) {
            if (initOk(driver, checkTimes, By.id("tracingvalue"))) {
                loadsuccess = true;
            }
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        // Fill in the number (hard-coded for this demo) and click the tracking button.
        WebElement valueElement = driver.findElement(By.id("tracingvalue"));
        valueElement.sendKeys("80596513");
        WebElement buttonElement = driver.findElements(By.className("hal-toggle-input-content")).get(0).findElement(By.tagName("button"));
        String text = buttonElement.getText();
        System.out.println(text);
        buttonElement.click();

        loadsuccess = false;
        while (!loadsuccess) {
            if (initOk(driver, checkTimes, By.id("tracing_by_booking_f:hl21"))) {
                loadsuccess = true;
            }
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        try {
            // Level 2: the result table that lists every container on the booking.
            WebElement inputElement = driver.findElement(By.id("tracing_by_booking_f:hl21"));
            WebElement tableElement = inputElement.findElement(By.id("tracing_by_booking_f:hl27"));
            WebElement tbodyElement = tableElement.findElement(By.tagName("tbody"));
            List<WebElement> listElement = tbodyElement.findElements(By.tagName("tr"));
            JSONArray arrayJson = new JSONArray();
            for (WebElement trElement : listElement) {
                WebElement radioElement = trElement.findElement(By.tagName("input"));
                List<WebElement> tdElements = trElement.findElements(By.tagName("td"));
                JSONObject infoJsonObject = new JSONObject();
                arrayJson.add(infoJsonObject);
                infoJsonObject.put("箱型", tdElements.get(1).getText());     // container type
                infoJsonObject.put("箱号", tdElements.get(2).getText());     // container number
                infoJsonObject.put("操作状态", tdElements.get(3).getText()); // status
                infoJsonObject.put("操作日期", tdElements.get(4).getText()); // date
                infoJsonObject.put("操作地点", tdElements.get(5).getText()); // location

                // The container number reads "TRLU 7359129"; the detail URL expects "TRLU++7359129".
                String snNumber = tdElements.get(2).getText();
                String[] nums = snNumber.split("\\s+");
                // String type = radioElement.getAttribute("type");
                String url;
                if (nums.length > 1) {
                    url = "https://www.hapag-lloyd.cn/zh/online-business/tracing/tracing-by-booking.html?view=S8510&container=" + nums[0] + "++" + nums[1];
                } else {
                    url = "https://www.hapag-lloyd.cn/zh/online-business/tracing/tracing-by-booking.html?view=S8510";
                }

                // Level 3: open the container detail page in a separate driver.
                PhantomJSDriver newDriver = getDriver();
                newDriver.get(url);
                try {
                    Thread.sleep(1000L);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                JSONObject newinfoobject = new JSONObject();
                infoJsonObject.put("detail", newinfoobject);
                System.out.println(newDriver.getPageSource());

                loadsuccess = false;
                while (!loadsuccess) {
                    if (initOk(newDriver, checkTimes, By.id("tracing_by_booking_f:hl29"))) {
                        loadsuccess = true;
                    }
                    try {
                        Thread.sleep(1000L);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }

                // Container master data: each matching cell holds a nested label/value pair.
                WebElement newinfoTableElement = newDriver.findElement(By.id("tracing_by_booking_f:hl29"));
                WebElement newinfotrElmenet = newinfoTableElement.findElement(By.className("boxContent")).findElement(By.tagName("tr"));
                List<WebElement> newinfotdsElmenet = newinfotrElmenet.findElements(By.xpath("td[contains(@style,'white-space:nowrap;')]"));
                System.out.println(newinfoTableElement.getText());
                newinfoobject.put("箱型", newinfotdsElmenet.get(0).findElements(By.tagName("td")).get(1).getText()); // container type
                newinfoobject.put("描述", newinfotdsElmenet.get(1).findElements(By.tagName("td")).get(1).getText()); // description
                newinfoobject.put("尺寸", newinfotdsElmenet.get(2).findElements(By.tagName("td")).get(1).getText()); // dimensions
                newinfoobject.put("Tare(kg)", newinfotdsElmenet.get(3).findElements(By.tagName("td")).get(1).getText());
                newinfoobject.put("Max.Payload(kg)", newinfotdsElmenet.get(4).findElements(By.tagName("td")).get(1).getText());

                // Movement history: one JSON object per table row.
                WebElement newltableElement = newDriver.findElement(By.id("tracing_by_booking_f:hl62"));
                WebElement newTableElement = newltableElement.findElement(By.id("tracing_by_booking_f:hl66"));
                WebElement newtbodyElement = newTableElement.findElement(By.tagName("tbody"));
                List<WebElement> newtrsElement = newtbodyElement.findElements(By.tagName("tr"));
                JSONArray newtrarray = new JSONArray();
                infoJsonObject.put("infolist", newtrarray);
                for (WebElement newtrElement : newtrsElement) {
                    List<WebElement> newtdsElement = newtrElement.findElements(By.tagName("td"));
                    JSONObject newtdjson = new JSONObject();
                    newtdjson.put("operate", newtdsElement.get(0).getText());
                    newtdjson.put("operateaddress", newtdsElement.get(1).getText());
                    newtdjson.put("date", newtdsElement.get(2).getText());
                    newtdjson.put("time", newtdsElement.get(3).getText());
                    newtdjson.put("transport", newtdsElement.get(4).getText());
                    newtdjson.put("hc", newtdsElement.get(5).getText());
                    newtrarray.add(newtdjson);
                }
                // quit() closes every window and ends this PhantomJS session.
                newDriver.quit();
            }
            // Note: bkNo is null when the no-argument constructor is used, as in main() below.
            jedis.sadd("so:hlc:" + bkNo + ":reslut", arrayJson.toString());
            System.out.println(arrayJson.toString());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release the browser process and the pooled Redis connection.
            driver.quit();
            RedisUtils.returnResource(jedis);
        }
    }

    public static void main(String[] args) {
        HLCServiceThread thread = new HLCServiceThread();
        thread.run();
    }
}
BaseService.java
import org.openqa.selenium.By;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.CapabilityType;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.springframework.stereotype.Component;

import java.util.concurrent.TimeUnit;

/**
 * Created by pengweikang on 2018/1/24.
 */
@Component
public abstract class BaseService implements Runnable {

    public PhantomJSDriver getDriver() {
        DesiredCapabilities dcaps = new DesiredCapabilities();
        // Accept SSL certificates
        dcaps.setCapability("acceptSslCerts", true);
        // Screenshot support
        dcaps.setCapability("takesScreenshot", true);
        // CSS selector support
        dcaps.setCapability("cssSelectorsEnabled", true);
        dcaps.setCapability("phantomjs.page.settings.XSSAuditingEnabled", true);
        dcaps.setCapability("phantomjs.page.settings.webSecurityEnabled", false);
        dcaps.setCapability("phantomjs.page.settings.localToRemoteUrlAccessEnabled", true);
        dcaps.setCapability("phantomjs.page.settings.loadImages", false);
        // JavaScript support
        dcaps.setJavascriptEnabled(true);
        // Location of the PhantomJS executable, taken from the system property of the same name
        dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                System.getProperty(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY));
        //dcaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,systemProps.getPhantomjsPath());
        //dcaps.setCapability("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0");
        //dcaps.setCapability("phantomjs.page.customHeaders.User-Agent", "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0");
        dcaps.setCapability("ignoreProtectedModeSettings", true);
        // org.openqa.selenium.Proxy proxy = new org.openqa.selenium.Proxy();
        // proxy.setProxyType(org.openqa.selenium.Proxy.ProxyType.MANUAL);
        // proxy.setHttpProxy("http://180.155.128.87:47593/");
        // dcaps.setCapability(CapabilityType.PROXY, proxy);

        // Create the headless browser
        PhantomJSDriver driver = new PhantomJSDriver(dcaps);
        driver.manage().timeouts().pageLoadTimeout(120, TimeUnit.SECONDS);
        driver.manage().timeouts().setScriptTimeout(120, TimeUnit.SECONDS);
        driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);
        driver.manage().deleteAllCookies();
        driver.manage().window().setSize(new Dimension(1920, 1080));
        return driver;
    }

    /**
     * Polls the page up to checkTimes times, 5 seconds apart, and returns true
     * as soon as the element located by "by" is present.
     */
    public boolean initOk(WebDriver driver, int checkTimes, By by) {
        int index = checkTimes;
        while (index > 0) {
            try {
                System.out.println(driver.getPageSource()); // debug output
                WebElement element = driver.findElement(by);
                if (element != null) {
                    return true;
                }
            } catch (Exception e) {
                // Element not there yet; fall through and retry.
            }
            try {
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            index--;
        }
        return false;
    }

    public void setProxy(PhantomJSDriver driver) {
    }
}
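The initOk method above is an instance of a bounded polling loop: check a condition, sleep, and give up after a fixed number of attempts. The idea can be sketched without any Selenium types, as below. The class name Poller and its parameters are hypothetical, and unlike initOk it does not sleep after the final failed attempt.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: retries a condition a bounded number of times. */
final class Poller {

    /**
     * Evaluates the condition up to maxAttempts times, sleeping sleepMillis between attempts.
     * A RuntimeException thrown by the condition counts as "not ready yet".
     */
    static boolean waitUntil(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (condition.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Treat lookup failures as "not ready" and retry.
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }
}
```

With a driver in hand, the condition could be something like () -> !driver.findElements(by).isEmpty(). Selenium also ships its own WebDriverWait and ExpectedConditions classes for this purpose, which may be a better fit than a hand-written loop.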
RedisUtils.java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.io.*;

/**
 * Created by pengweikang on 2018/1/22.
 */
public class RedisUtils {
    // Redis server IP
    private static String ADDR = "127.0.0.1";
    // Redis port
    private static int PORT = 6379;
    // Password
    private static String AUTH = "admin";
    // Maximum number of connections in the pool (the default is 8).
    // -1 means unlimited; once this many instances are handed out, the pool is exhausted.
    private static int MAX_ACTIVE = 1024;
    // Maximum number of idle Jedis instances kept in the pool (the default is also 8).
    private static int MAX_IDLE = 200;
    // Maximum time to wait for a free connection, in milliseconds. The default of -1 never times out;
    // when the wait is exceeded a JedisConnectionException is thrown.
    private static int MAX_WAIT = 10000;
    private static int TIMEOUT = 10000;
    // Validate an instance when it is borrowed, so every instance handed out is usable.
    private static boolean TEST_ON_BORROW = true;
    private static JedisPool jedisPool = null;

    /**
     * Initialise the Redis connection pool.
     */
    static {
        try {
            JedisPoolConfig config = new JedisPoolConfig();
            config.setMaxWaitMillis(MAX_WAIT);
            config.setMaxTotal(MAX_ACTIVE);
            config.setMaxIdle(MAX_IDLE);
            config.setTestOnBorrow(TEST_ON_BORROW);
            jedisPool = new JedisPool(config, ADDR, PORT, TIMEOUT, AUTH);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Get a Jedis instance from the pool, or null if none is available.
     */
    public synchronized static Jedis getJedis() {
        try {
            if (jedisPool != null) {
                return jedisPool.getResource();
            }
            return null;
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    /**
     * Release a Jedis instance. For a pooled instance, close() hands it back to the pool.
     */
    public static void returnResource(final Jedis jedis) {
        if (jedis != null) {
            jedis.close();
        }
    }

    // Serialize
    public static byte[] serialize(Object obj) {
        try (ByteArrayOutputStream bai = new ByteArrayOutputStream();
             ObjectOutputStream obi = new ObjectOutputStream(bai)) {
            obi.writeObject(obj);
            obi.flush();
            return bai.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    // // Deserialize
    // public static Object unserizlize(byte[] byt){
    //     ObjectInputStream oii=null;
    //     ByteArrayInputStream bis=null;
    //     bis=new ByteArrayInputStream(byt);
    //     try {
    //         oii=new ObjectInputStream(bis);
    //         Object obj=oii.readObject();
    //         return obj;
    //     } catch (Exception e) {
    //         e.printStackTrace();
    //     }
    //     return null;
    // }

    // Deserialize into the given type
    public static <T> T unserizlizeToObject(byte[] byt, Class<T> clazz) {
        try (ObjectInputStream oii = new ObjectInputStream(new ByteArrayInputStream(byt))) {
            return clazz.cast(oii.readObject());
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
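The serialize and unserizlizeToObject helpers above are plain Java object serialization, and a round trip can be checked without a Redis server. The sketch below shows that round trip; the class name SerializationSketch is hypothetical, and unlike the helpers above it lets exceptions propagate instead of returning null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical helper: Java object serialization round trip. */
final class SerializationSketch {

    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static <T> T deserialize(byte[] bytes, Class<T> type)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            // cast() throws ClassCastException if the stored object has another type.
            return type.cast(in.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize("TRLU 7359129");
        System.out.println(deserialize(bytes, String.class));
    }
}
```

Note that the crawler itself stores the result with jedis.sadd as a JSON string, so these binary helpers are only needed when storing arbitrary Serializable objects.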
UrlParamsUtils.java
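import com.alibaba.fastjson.JSONObject;

/**
 * Created by pengweikang on 2018/1/24.
 */
public class UrlParamsUtils {

    /**
     * Splits a URL into "indexUrl" (everything before the '?') and its query parameters.
     */
    public static JSONObject getParams(String url) {
        JSONObject jsonObject = new JSONObject();
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            // No query string: the whole URL is the index URL.
            jsonObject.put("indexUrl", url);
            return jsonObject;
        }
        jsonObject.put("indexUrl", url.substring(0, queryStart));
        String paramUrl = url.substring(queryStart + 1);
        for (String obj : paramUrl.split("&")) {
            if (obj.isEmpty()) {
                continue;
            }
            // Limit of 2 keeps any '=' inside the value; a key without a value maps to "".
            String[] keyvalue = obj.split("=", 2);
            jsonObject.put(keyvalue[0], keyvalue.length > 1 ? keyvalue[1] : "");
        }
        return jsonObject;
    }
}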
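For comparison, the same query-string parsing can be written with only the JDK, returning a Map and URL-decoding keys and values. This is a sketch that assumes Java 10+ for the URLDecoder.decode(String, Charset) overload. The class name QueryParams is hypothetical. UrlParamsUtils does not decode values, so the two are not drop-in replacements for each other.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses the query string of a URL into an ordered map. */
final class QueryParams {

    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return params; // no query string at all
        }
        for (String pair : url.substring(queryStart + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            // Limit of 2 keeps any '=' that appears inside the value.
            String[] keyValue = pair.split("=", 2);
            String key = URLDecoder.decode(keyValue[0], StandardCharsets.UTF_8);
            String value = keyValue.length > 1
                    ? URLDecoder.decode(keyValue[1], StandardCharsets.UTF_8)
                    : "";
            params.put(key, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // example.com is a placeholder host.
        System.out.println(parse("http://example.com/track?bkNo=80596513&lang=zh"));
    }
}
```

URLDecoder treats "+" as a space, which matters for values such as the "++" separator in the container detail URL.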