Scraping data from login-protected web pages in Java (HttpClient)

  1. First, find the URL that the login page sends its request to when you log in.
  2. Send the request to that URL to fetch the HTML content, then parse it to extract the data you need.
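The listing that follows only prints the fetched HTML, so the parsing half of step 2 is not shown. As a minimal sketch of that step, assuming the data of interest is the set of link targets on the page, the hypothetical helper `LinkExtractor` below pulls `href` values out of an HTML string with a regular expression from the standard library. The class name, method name, and sample markup are illustrative assumptions, not part of the original article. A regular expression is a crude tool for HTML, and a real HTML parser is usually more robust for anything beyond simple pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original listing.
final class LinkExtractor {

    // Matches href="..." or href='...' inside an <a ...> tag, ignoring case.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the href values of all <a> tags, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample markup is made up; in practice this would be the HTML string
        // obtained from the logged-in GET request.
        String html = "<html><body>"
                + "<a href=\"/orders/1\">Order 1</a>"
                + "<A class='x' HREF='/orders/2'>Order 2</A>"
                + "</body></html>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

The same pattern-and-loop shape works for other simple targets (for example, table cells), but the pattern has to be adapted to the actual markup of the page being scraped.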

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.EntityUtils;


public class Test {
	public static void main(String[] args) throws ClientProtocolException, IOException {
		// Initialize an HttpClient
		HttpClient client = new DefaultHttpClient();
		// Login URL (the URL found in step 1)
		String url = "xxxx";
		HttpPost httpost = new HttpPost(url);
		// Form fields submitted with the login request
		List<NameValuePair> nvp = new ArrayList<NameValuePair>();
		nvp.add(new BasicNameValuePair("emp_DomainName", "whzhangt"));
		nvp.add(new BasicNameValuePair("emp_Password", "****"));
		httpost.setEntity(new UrlEncodedFormEntity(nvp, HTTP.UTF_8));
		HttpResponse response1 = client.execute(httpost);
		// Release the POST request; if it is left open, the GET request below fails
		httpost.abort();
		// The login POST is answered with a 302 redirect. The response contains
		// the redirect address (yyy), which must be requested with GET.
		if (response1.getStatusLine().getStatusCode() == 302) {
			HttpGet httpget = new HttpGet("yyy");
			HttpResponse response = client.execute(httpget);
			String entity = EntityUtils.toString(response.getEntity(), "utf-8");
			System.out.println(entity); // prints the HTML content of the page
		} else {
			System.out.println("Failed");
		}
	}
}
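In the listing above the redirect target is written as the placeholder "yyy". One way to obtain it at run time is to read the `Location` header of the 302 response, for example with `response1.getFirstHeader("Location")`, which returns null when the header is absent. Servers often send a relative path in that header, so it may need to be resolved against the login URL before it is passed to `HttpGet`. The sketch below shows only that resolution step, using the standard library; the class `RedirectResolver` and the example.com URLs are hypothetical and not taken from the original article.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the original listing.
final class RedirectResolver {

    // Resolves a Location header value against the URL that was requested.
    // An absolute Location is returned unchanged; a relative one is resolved
    // against the request URL. URI.create throws IllegalArgumentException if
    // either string is not a valid URI.
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        // Placeholder URLs; substitute the real login URL and header value.
        String loginUrl = "http://example.com/account/login";
        System.out.println(resolve(loginUrl, "/home/index"));
        System.out.println(resolve(loginUrl, "welcome"));
        System.out.println(resolve(loginUrl, "http://example.com/portal"));
    }
}
```

With a helper like this, the GET request could be built from `resolve(url, locationValue)` instead of a hard-coded address, assuming the server actually sends a `Location` header with its 302 response.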

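Dependency in pom.xml:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>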

Reference: https://blog.csdn.net/qy20115549/article/details/52203722
