Android Web Crawler

Crawling a Static Page

Goal: fetch the title of my blog page, "yhao的博客- 博客频道 - CSDN.NET".

First, request the page with OkHttp using a GET request:

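    // mOkHttpClient is assumed to be an existing OkHttpClient field.
    final String url = "http://blog.csdn.net/yhaolpz?viewmode=contents";
    Request request = new Request.Builder().url(url).build();
    // enqueue() sends the request asynchronously; the callbacks run on a
    // background thread, so do not touch views directly from them.
    mOkHttpClient.newCall(request).enqueue(new Callback() {
        @Override
        public void onFailure(Call call, IOException e) {
            // Log the exception too, so the cause of the failure is visible.
            Log.e(TAG, "onFailure", e);
        }

        @Override
        public void onResponse(Call call, Response response) throws IOException {
            // try-with-resources closes the body even for non-200 responses.
            try (ResponseBody body = response.body()) {
                if (response.code() == 200 && body != null) {
                    String html = body.string();
                    Log.d(TAG, "onResponse: " + html);
                }
            }
        }
    });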

The page data received in onResponse looks like this:

 
 <html xmlns="http://www.w3.org/1999/xhtml">                                                                  
 <head>  
 <script type="text/javascript" src="http://c.csdnimg.cn/pubfooter/js/tracking.js" charset="utf-8"></script>
 <script type="text/javascript">
    var protocol = window.location.protocol;
    document.write('
                    
                    

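The logged output above is cut off before the title element, so the extraction step itself is not shown. As a rough sketch of how the title could be pulled out of the `html` string, the following uses only `java.util.regex` from the standard library. The class name `TitleExtractor` and the method `extractTitle` are hypothetical names chosen for illustration, and the regular expression assumes a single, well-formed `<title>` element. This is not necessarily the approach the original post takes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OkHttp or the Android SDK.
public class TitleExtractor {
    // Matches <title ...>text</title>, case-insensitively and across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed title text, or null if no title element is found. */
    public static String extractTitle(String html) {
        if (html == null) {
            return null;
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Inside onResponse, this could be called as `String title = TitleExtractor.extractTitle(html);`. A regular expression is fine for a single tag like this, but for anything beyond that a real HTML parser such as Jsoup is usually the more robust choice.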