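Some websites cannot be accessed when you browse them, for a variety of reasons.
Requesting a site too frequently can also get your IP blocked, which happens often when crawling data.
In a browser such as IE or Firefox you would work around this by configuring an HTTP proxy,
and HttpClient offers the same kind of setting.
Proxies come in several forms: you can find a single working proxy IP, run GoAgent, or buy access to a proxy IP pool.
So how do you use a proxy in HttpClient? We will take GoAgent as the example.
Once GoAgent is running, you only need to send requests through 127.0.0.1 on port 8087.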
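Before pointing HttpClient at a proxy, it can help to confirm that something is actually listening on the proxy address. The following is only a minimal sketch using the plain JDK rather than HttpClient; the class name `ProxyCheck`, the method `isReachable`, and the one-second timeout are assumptions made for illustration. A successful TCP connection shows only that the port is open, not that the proxy forwards requests correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of HttpClient: checks whether a TCP port accepts connections.
class ProxyCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:8087 is the local GoAgent address mentioned in the text.
        System.out.println("GoAgent reachable: " + isReachable("127.0.0.1", 8087, 1000));
    }
}
```

If the check returns false, starting GoAgent (or picking another proxy) before making requests avoids a confusing connection error later on.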
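HttpClient is used mainly through two request types, POST and GET. For details see:
HttpGet and HttpPost in the HttpClient module
To route requests through the proxy, just add the following code: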
    HttpHost proxy = new HttpHost("127.0.0.1", 8087, null);
    httpClient.getParams().setParameter(ConnRouteParams.DEFAULT_PROXY, proxy);
If the proxy requires a user name and password for authentication, also register the credentials:
    httpClient.getCredentialsProvider().setCredentials(
            new AuthScope(proxyHost, proxyPort),
            new UsernamePasswordCredentials(userName, password));
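To make the effect of registering credentials more concrete, here is a rough sketch of the `Proxy-Authorization` header value that results when a proxy uses the Basic scheme. This is an assumption: the proxy may instead negotiate Digest or NTLM, in which case the header looks different. The class `ProxyAuthHeader` and the sample credentials `user` / `pass` are made up for illustration, and the code uses only the JDK, not HttpClient.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of a Basic Proxy-Authorization header.
class ProxyAuthHeader {

    // Basic credentials are "user:password", Base64-encoded and prefixed with "Basic ".
    static String basic(String userName, String password) {
        String pair = userName + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Sample credentials only; prints "Proxy-Authorization: Basic dXNlcjpwYXNz"
        System.out.println("Proxy-Authorization: " + basic("user", "pass"));
    }
}
```

Base64 is an encoding, not encryption, so Basic credentials sent to a proxy over plain HTTP can be read by anyone on the path.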
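Complete example:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpHost;
    import org.apache.http.HttpResponse;
    import org.apache.http.auth.AuthScope;
    import org.apache.http.auth.UsernamePasswordCredentials;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.conn.params.ConnRouteParams;
    import org.apache.http.impl.client.DefaultHttpClient;

    // The class name is arbitrary; the original snippet omitted the enclosing class.
    public class ProxyExample {

        public static void main(String[] args) {
            StringBuffer sb = new StringBuffer();
            // Create the HttpClient instance (already configured with the proxy)
            HttpClient client = getHttpClient();
            // Create the GET request
            HttpGet httpGet = new HttpGet("http://www.csdn.net");
            // Execute the request and read the response body line by line
            try {
                HttpResponse response = client.execute(httpGet);
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    InputStreamReader is = new InputStreamReader(entity.getContent());
                    BufferedReader br = new BufferedReader(is);
                    String str = null;
                    while ((str = br.readLine()) != null) {
                        sb.append(str.trim());
                    }
                    br.close();
                }
            } catch (ClientProtocolException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            System.out.println(sb.toString());
        }

        // Configure the proxy and its credentials
        public static HttpClient getHttpClient() {
            DefaultHttpClient httpClient = new DefaultHttpClient();
            String proxyHost = "proxycn2.huawei.com";
            int proxyPort = 8080;
            String userName = "china\\******";
            String password = "*******";
            httpClient.getCredentialsProvider().setCredentials(
                    new AuthScope(proxyHost, proxyPort),
                    new UsernamePasswordCredentials(userName, password));
            HttpHost proxy = new HttpHost(proxyHost, proxyPort);
            httpClient.getParams().setParameter(ConnRouteParams.DEFAULT_PROXY, proxy);
            return httpClient;
        }
    }

Required JARs: commons-logging-1.1.jar, httpclient-4.0-beta2.jar, httpcore-4.1-alpha1.jar, and commons-codec-1.4.jar.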
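The introduction mentions proxy IP pools as an alternative to a single proxy. One possible way to spread requests over such a pool is simple round-robin rotation, sketched below with the plain JDK. The class `ProxyPool`, its nested `Endpoint` type, the `"host:port"` input format, and the sample addresses are all assumptions for illustration. Each returned host and port could then be passed to `new HttpHost(host, port)` as in the proxy setup code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin pool of proxy endpoints given as "host:port" strings.
class ProxyPool {

    // Simple immutable holder for one proxy address.
    static final class Endpoint {
        final String host;
        final int port;

        Endpoint(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    private final List<Endpoint> endpoints = new ArrayList<Endpoint>();
    private final AtomicInteger counter = new AtomicInteger();

    ProxyPool(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("proxy list must not be empty");
        }
        for (String hostPort : hostPorts) {
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected host:port but got " + hostPort);
            }
            String host = hostPort.substring(0, colon).trim();
            int port = Integer.parseInt(hostPort.substring(colon + 1).trim());
            endpoints.add(new Endpoint(host, port));
        }
    }

    // Returns the next endpoint, wrapping around at the end of the list.
    Endpoint next() {
        int index = Math.floorMod(counter.getAndIncrement(), endpoints.size());
        return endpoints.get(index);
    }

    public static void main(String[] args) {
        // Placeholder addresses; replace with proxies you are allowed to use.
        ProxyPool pool = new ProxyPool(Arrays.asList("127.0.0.1:8087", "10.0.0.2:8080"));
        for (int i = 0; i < 3; i++) {
            Endpoint e = pool.next();
            System.out.println(e.host + ":" + e.port);
        }
    }
}
```

Round-robin is only one choice here; a real crawler would probably also drop proxies that stop responding.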