Simulating a Zhihu Login with HttpClient 4.4.1

1. What form data the login POST requires

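You can find this by capturing the login request with Wireshark. The POST needs the following fields:

"_xsrf" = xxxx (this value changes; fetch the source of the Zhihu home page first to obtain it)
"email" = your email address
"password" = your password
"rememberme" = "true"
"captcha" = the captcha text (Zhihu has two kinds of captcha, which you can check for yourself; I use the kind made of digits and letters)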
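To make the request body concrete, here is a minimal sketch of how these five fields could be joined into an application/x-www-form-urlencoded string, using only the JDK. The class name LoginFormSketch, the helper encodeForm, and every field value are made-up placeholders for illustration; in the login code itself, HttpClient's UrlEncodedFormEntity does this work.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class LoginFormSketch {

    // Percent-encodes one name or value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Joins name/value pairs into a form-urlencoded body, keeping insertion order.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Placeholder values only, not real credentials or a real token.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("_xsrf", "abc123");
        fields.put("email", "user@example.com");
        fields.put("password", "p@ss word");
        fields.put("rememberme", "true");
        fields.put("captcha", "x7k2");
        System.out.println(encodeForm(fields));
    }
}
```

Special characters such as "@" and spaces in the values are escaped, which is why the form should not be built by plain string concatenation.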

  • Getting the _xsrf value:
String xsrfValue = responseHtml.split("hidden\" name= \"_xsrf\" value=\"")[1].split("\"/>")[0];

responseHtml is the source of the home page; the _xsrf value is cut out of it based on how the page markup is laid out.
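The split-based extraction throws an ArrayIndexOutOfBoundsException if the markup changes even slightly. As a possible alternative, here is a sketch that uses a regular expression and returns null when the field is missing. The class XsrfExtractor and the HTML fragment are hypothetical; the real home page markup may differ, so the pattern may need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XsrfExtractor {

    // Matches name="_xsrf" value="..." and tolerates whitespace around '='.
    private static final Pattern XSRF_INPUT =
            Pattern.compile("name\\s*=\\s*\"_xsrf\"\\s+value\\s*=\\s*\"([^\"]*)\"");

    // Returns the token, or null when the page contains no such field.
    static String extractXsrf(String html) {
        Matcher m = XSRF_INPUT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical fragment standing in for the home page source.
        String html = "<form><input type=\"hidden\" name= \"_xsrf\" value=\"abc123\"/></form>";
        System.out.println(extractXsrf(html));
        System.out.println(extractXsrf("<html></html>"));
    }
}
```

Checking for null before building the form gives a clearer failure than an exception thrown from inside split.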

2. My login code

        // Needs imports from org.apache.http.* (HttpClient 4.4.1), java.io.* and java.util.*
        RequestConfig requestConfig = RequestConfig.custom().setCookieSpec(CookieSpecs.STANDARD_STRICT).build();
        CloseableHttpClient httpClient = HttpClients.custom().setDefaultRequestConfig(requestConfig).build();

        HttpGet getHomePage = new HttpGet("http://www.zhihu.com/");
        try {
            // Fill in the basic parameters of the login request
            CloseableHttpResponse response = httpClient.execute(getHomePage);
            String responseHtml = EntityUtils.toString(response.getEntity());
            String xsrfValue = responseHtml.split("hidden\" name= \"_xsrf\" value=\"")[1].split("\"/>")[0];
            System.out.println("_xsrf:" + xsrfValue);
            response.close();
            List<NameValuePair> valuePairs = new LinkedList<>();
            valuePairs.add(new BasicNameValuePair("_xsrf" , xsrfValue));
            valuePairs.add(new BasicNameValuePair("email", username)); // username: your account email
            valuePairs.add(new BasicNameValuePair("password", password)); // password: your account password
            valuePairs.add(new BasicNameValuePair("rememberme", "true"));

            // Fetch the captcha image
            HttpGet getCaptcha = new HttpGet("http://www.zhihu.com/captcha.gif?r=" + System.currentTimeMillis() + "&type=login");
            CloseableHttpResponse imageResponse = httpClient.execute(getCaptcha);
            // Obtain the stream once; try-with-resources closes both streams
            try (InputStream in = imageResponse.getEntity().getContent();
                 FileOutputStream out = new FileOutputStream("/tmp/zhihu.gif")) {
                byte[] bytes = new byte[8192];
                int len;
                while ((len = in.read(bytes)) != -1) {
                    out.write(bytes, 0, len);
                }
            }
            imageResponse.close();
            Runtime.getRuntime().exec("eog /tmp/zhihu.gif"); // eog is the image viewer command on Ubuntu

            // Ask the user to type in the captcha
            System.out.print("Enter the captcha: ");
            Scanner scanner = new Scanner(System.in);
            String captcha = scanner.next();
            valuePairs.add(new BasicNameValuePair("captcha", captcha));

            // Finish building the login request
            UrlEncodedFormEntity entity = new UrlEncodedFormEntity(valuePairs, Consts.UTF_8);
            HttpPost post = new HttpPost("http://www.zhihu.com/login/email");
            post.setEntity(entity);
            httpClient.execute(post).close(); // log in; close the response to release the connection

            HttpGet g = new HttpGet("http://www.zhihu.com/question/following"); // fetch the "questions I follow" page
            CloseableHttpResponse r = httpClient.execute(g);
            System.out.println(EntityUtils.toString(r.getEntity()));
            r.close();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                httpClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
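The captcha download in the code above copies bytes by hand. As a shorter variant, the JDK's Files.copy can write an InputStream straight to a file. The sketch below assumes a hypothetical helper class CaptchaSaver and uses an in-memory stream in place of the HTTP response body; in the login code, the stream would come from imageResponse.getEntity().getContent().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CaptchaSaver {

    // Copies the whole stream to target, replacing any existing file,
    // closes the stream, and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; a real captcha would be the full GIF payload.
        byte[] fakeImage = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        Path target = Files.createTempFile("zhihu", ".gif");
        long written = save(new ByteArrayInputStream(fakeImage), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

Closing the stream inside the helper means the caller cannot forget it, which also lets HttpClient release the underlying connection.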

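Pay attention to the RequestConfig at the top. At first I did not configure anything cookie-related, and the client kept reporting a cookie error. I then checked the HttpClient manual, which describes how to choose a cookie policy. The following sets one cookie policy globally:

RequestConfig requestConfig = RequestConfig.custom().setCookieSpec(CookieSpecs.STANDARD_STRICT).build(); // standard cookie policy
CloseableHttpClient httpClient = HttpClients.custom().setDefaultRequestConfig(requestConfig).build(); // apply it to the client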
