Android News Reader (Data Scraping)

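This is my first technical blog post, so please bear with any rough edges. Thanks (^_^)
Some of my junior classmates have recently started learning Android, so I wrote this post to share with them and with everyone else.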

The post is divided into two parts.

Part one builds a news list interface (a ListView).
Part two scrapes the news data (using regular expressions).

Technologies involved: Java regular expressions and Java network programming (IO streams).
IDE: Android Studio

The structure of the whole demo project is shown below.
[Figure 1: project structure of the demo]

1. Part one: building a news list interface

The code of MainActivity.java is as follows

package per.edward.androidnewsreader;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ListView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.List;

import per.edward.androidnewsreader.adapter.NewsAdapter;
import per.edward.androidnewsreader.bean.NewsItemModel;


public class MainActivity extends Activity {
    private ListView mListView;
    private List list;
    private NewsAdapter adapter;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        // Initialize the views
        initView();
        // Initialize the data
        initData();
    }

    public void initView() {
        list = new ArrayList();
        mListView = (ListView) findViewById(R.id.list_view);
    }

    public void initData() {

        for (int i = 0; i < 15; i++) {
            list.add(i+"");
        }

        // Adapter for the news list
        adapter = new NewsAdapter(this, list, R.layout.adapter_news_item);
        mListView.setAdapter(adapter);
        // Set the item click listener
        mListView.setOnItemClickListener(new ItemClickListener());
    }

    /**
     * Click listener for the news list
     */
    public class ItemClickListener implements AdapterView.OnItemClickListener{
        @Override
        public void onItemClick(AdapterView<?> adapterView, View view, int i, long l) {
            Toast.makeText(getApplicationContext(),""+i,Toast.LENGTH_SHORT ).show();
        }
    }
}

The activity_main.xml file is shown below

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ListView
        android:id="@+id/list_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent"/>

</RelativeLayout>

The adapter_news_item.xml file is shown below


<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:padding="10dp">

    <ImageView
        android:id="@+id/image_view"
        android:layout_width="80dp"
        android:layout_height="wrap_content"
        android:scaleType="centerCrop"
        android:layout_centerVertical="true"
        android:background="@mipmap/ic_launcher" />

    <LinearLayout
        android:layout_marginLeft="10dp"
        android:id="@+id/line"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_toRightOf="@+id/image_view"
        android:orientation="vertical">

        <TextView
            android:id="@+id/txt_title"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Edward"
            android:textSize="16dp" />

        <TextView
            android:layout_marginTop="5dp"
            android:id="@+id/txt_summary"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="岭南学院"
            android:textSize="12dp" />
    </LinearLayout>

</RelativeLayout>

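The resulting interface is shown in the figure below. It is very simple.

[Figure 2: the news list interface]

That completes the news list interface. Next, I'll share how to scrape the data from a news website.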

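2. Part two: analyzing the data to scrape

Target URL: http://news.qq.com/china_index.shtml
Let's look at what this page contains: each entry has a picture on the left and a news title and summary on the right.
The goal is then clear: pull all of this data down and display it in the interface built in part one.

[Figure 3: the news list on the target page]

Now look at the source code of this page. In the original screenshot I outlined the source of three news items with a red border; the source of one item looks like this: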

<a target="_blank" class="pic" href="/a/20150909/036168.htm"><img class="picto" src="http://img1.gtimg.com/news/pics/hv1/51/43/1920/124859016.jpg"></a><em class="f14 l24"><a target="_blank" class="linkto" href="/a/20150909/036168.htm">英航客机美国拉斯维加斯起火 14人轻伤送医治疗</a></em><p class="l22">美国联邦航空管理局发布声明说,飞机左引擎起火,机组中断起飞,指挥乘客紧急疏散。</p>

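Notice that only four things change from one item to the next: the image URL, the title, the summary, and the detail page URL. The surrounding tags stay the same. Based on this rule, a regular expression is enough to match the data we want.
The core regular expression code is as follows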

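Pattern pattern = Pattern
                .compile("<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\"><img class=\"picto\" src=\"([^\"]*)\"></a><em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em><p class=\"l22\">([^<]*)</p>");

As you can see, the string passed to compile is almost identical to the source of a single news item. The only unusual parts are the groups ([^\"]*) and ([^<]*). Each is a negated character class. ([^\"]*) captures any run of characters up to the next double quote, which is where an attribute value ends. ([^<]*) works the same way but stops at the next <, which is where the title or summary text ends. Note that a character class lists individual characters rather than a string, so [^<] is all that is needed to stop at the closing tag.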
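To make the behavior of the capture groups concrete, here is a minimal sketch that runs on a plain JVM, without Android. The class name NewsRegexDemo, the extract helper and the sample HTML values are assumptions made up for illustration; only the tag layout is taken from the page source shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the project in this post.
class NewsRegexDemo {
    // One capture group per field: detail URL, image URL, title, summary.
    static final Pattern NEWS_ITEM = Pattern.compile(
            "<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\">"
            + "<img class=\"picto\" src=\"([^\"]*)\"></a>"
            + "<em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em>"
            + "<p class=\"l22\">([^<]*)</p>");

    // Returns one String[4] per news item found in the HTML.
    static List<String[]> extract(String html) {
        List<String[]> items = new ArrayList<>();
        Matcher m = NEWS_ITEM.matcher(html);
        while (m.find()) {
            items.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4)});
        }
        return items;
    }

    public static void main(String[] args) {
        // Made-up sample that follows the same tag layout as the real page.
        String html = "<a target=\"_blank\" class=\"pic\" href=\"/a/1.htm\">"
                + "<img class=\"picto\" src=\"http://example.com/1.jpg\"></a>"
                + "<em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"/a/1.htm\">Sample title</a></em>"
                + "<p class=\"l22\">Sample summary.</p>";
        for (String[] item : extract(html)) {
            System.out.println(String.join(" | ", item));
        }
    }
}
```

Because the pattern spells out the tags literally, it only matches while the site keeps exactly this markup. If the page layout changes, the pattern has to be updated as well.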

[Figure 4]

The code of Function.java

package per.edward.androidnewsreader.function;

import android.util.Log;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import per.edward.androidnewsreader.bean.NewsItemModel;

/**
 * description: parses the news data
 * author: Edward
 * 2015/9/9
 */
public class Function {
    public static List<NewsItemModel> parseHtmlData(String result) {
        List<NewsItemModel> list = new ArrayList<>();
        if (result == null) {
            // The request failed, so there is nothing to parse
            return list;
        }
        Pattern pattern = Pattern
                .compile("<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\"><img class=\"picto\" src=\"([^\"]*)\"></a><em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em><p class=\"l22\">([^<]*)</p>");
        Matcher matcher = pattern.matcher(result);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            NewsItemModel model = new NewsItemModel();
            model.setNewsDetailUrl(matcher.group(1).trim());
            model.setUrlImgAddress(matcher.group(2).trim());
            model.setNewsTitle(matcher.group(3).trim());
            model.setNewsSummary(matcher.group(4).trim());
            sb.append("Detail page URL: " + matcher.group(1).trim() + "\n");
            sb.append("Image URL: " + matcher.group(2).trim() + "\n");
            sb.append("Title: " + matcher.group(3).trim() + "\n");
            sb.append("Summary: " + matcher.group(4).trim() + "\n\n");
            list.add(model);
        }
        Log.e("----------------->", sb.toString());
        return list;
    }
}
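One detail worth noting: the detail page address captured by group 1 is site-relative (for example /a/20150909/036168.htm in the sample above), so it would have to be resolved against the page URL before it can be opened. The demo in this post does not do this. The following is only a sketch of one possible way, using java.net.URI; the class name DetailUrlDemo and the helper toAbsolute are hypothetical.

```java
import java.net.URI;

// Hypothetical helper, not part of the project in this post.
class DetailUrlDemo {
    // Resolves an href captured from the page against the address of the page itself.
    // An href that is already absolute is returned unchanged.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("http://news.qq.com/china_index.shtml",
                "/a/20150909/036168.htm"));
    }
}
```

With a helper like this, the value passed to setNewsDetailUrl could be stored in absolute form, assuming every href on the page is a well-formed URI reference.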

NewsItemModel.java

package per.edward.androidnewsreader.bean;

import android.graphics.Bitmap;

/**
 * description: news model
 * author: Edward
 * 2015/9/9
 */
public class NewsItemModel {
    // Holds the image once it has finished loading
    private Bitmap newsBitmap;
    // URL of the news detail page
    private String newsDetailUrl;
    // URL of the news image
    private String urlImgAddress;
    // News title
    private String newsTitle;
    // News summary
    private String newsSummary;

    public Bitmap getNewsBitmap() {
        return newsBitmap;
    }

    public void setNewsBitmap(Bitmap newsBitmap) {
        this.newsBitmap = newsBitmap;
    }

    public String getUrlImgAddress() {
        return urlImgAddress;
    }

    public void setUrlImgAddress(String urlImgAddress) {
        this.urlImgAddress = urlImgAddress;
    }

    public String getNewsDetailUrl() {
        return newsDetailUrl;
    }

    public void setNewsDetailUrl(String newsDetailUrl) {
        this.newsDetailUrl = newsDetailUrl;
    }

    public String getNewsTitle() {
        return newsTitle;
    }

    public void setNewsTitle(String newsTitle) {
        this.newsTitle = newsTitle;
    }

    public String getNewsSummary() {
        return newsSummary;
    }

    public void setNewsSummary(String newsSummary) {
        this.newsSummary = newsSummary;
    }
}

The code of CommonTool.java

package per.edward.androidnewsreader.tool;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class CommonTool {
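    /**
     * GET request (fetches the data at the given address)
     *
     * @param urlString  address to request
     * @param codingType name of the charset used to decode the response, e.g. "gbk"
     * @return the response body, or null if the request fails
     */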
    public static String getRequest(String urlString, String codingType) {
        BufferedInputStream bis = null;
        ByteArrayOutputStream bos = null;
        InputStream is = null;
        try {
            URL url = new URL(urlString);

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Accept any response content type
            conn.setRequestProperty("Accept", "*/*");

            conn.connect();

            int responseCode = conn.getResponseCode();

            if (responseCode == 200) {
                is = conn.getInputStream();

                bis = new BufferedInputStream(is);
                bos = new ByteArrayOutputStream();

                int length = 0;
                byte[] by = new byte[1024];
                while ((length = bis.read(by)) != -1) {
                    bos.write(by, 0, length);
                }
                bos.flush();

                String result = new String(bos.toByteArray(), codingType);

                // System.out.println(result);
                return result;

            }

        } catch (MalformedURLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } finally {
            try {
                if (bos != null) {
                    bos.close();
                }

                if (bis != null) {
                    bis.close();
                }

                if (is != null) {
                    is.close();
                }
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
                System.out.println("Failed to close the streams!");
            }

        }
        return null;
    }

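    /**
     * Opens a stream for downloading an image over the network
     *
     * @param urlString address of the image
     * @return the image input stream, or null if the request fails
     */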
    public static InputStream getImgInputStream(String urlString) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");   // use the GET method
            connection.setReadTimeout(10 * 1000);    // read timeout of 10 seconds
            connection.connect();
            if (connection.getResponseCode() == 200) {
                return connection.getInputStream();
            } else {
                return null;
            }
        } catch (Exception e) {
            return null;
        }
    }


}
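The read loop inside getRequest can be tried out on its own, without a network connection. Below is a minimal sketch of that loop as a standalone helper; the class name StreamDemo and the method readAll are hypothetical. It also shows why the charset argument matters: the same bytes decoded with the wrong charset come out garbled. The demo uses UTF-8 bytes only so that it runs on any JVM, while the post passes "gbk" to getRequest for the target page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the project in this post.
class StreamDemo {
    // Reads the stream to its end and decodes the bytes with the named charset.
    static String readAll(InputStream in, String charsetName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int length;
        while ((length = in.read(buffer)) != -1) {
            out.write(buffer, 0, length);
        }
        return out.toString(charsetName);
    }

    public static void main(String[] args) throws IOException {
        // "\u65b0\u95fb" is the word for "news", written with escapes to stay encoding-neutral.
        byte[] body = "\u65b0\u95fb".getBytes("UTF-8");
        // Decoded with the charset the bytes were written in: the text survives.
        System.out.println(readAll(new ByteArrayInputStream(body), "UTF-8"));
        // Decoded with a different charset: the text comes out garbled.
        System.out.println(readAll(new ByteArrayInputStream(body), "ISO-8859-1"));
    }
}
```

This is why the charset name given to getRequest has to match the encoding the page is actually served in.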

The NewsAdapter.java file

package per.edward.androidnewsreader.adapter;

import android.content.Context;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.BaseAdapter;
import android.widget.ImageView;
import android.widget.TextView;

import java.util.List;

import per.edward.androidnewsreader.R;
import per.edward.androidnewsreader.bean.NewsItemModel;

/**
 * description: adapter for the news list
 * author: Edward
 * 2015/9/9
 */
public class NewsAdapter extends BaseAdapter {
    private Context mContext;
    private List<NewsItemModel> list;
    private int layoutId;

    public NewsAdapter(Context mContext, List<NewsItemModel> list, int layoutId) {
        this.mContext = mContext;
        this.list = list;
        this.layoutId = layoutId;
    }

    @Override
    public int getCount() {
        return list.size();
    }

    @Override
    public Object getItem(int i) {
        return list.get(i);
    }

    @Override
    public long getItemId(int i) {
        return i;
    }

    @Override
    public View getView(final int position, View view, ViewGroup viewGroup) {
        ViewHolder viewHolder;
        if (view == null) {
            viewHolder = new ViewHolder();
            view = LayoutInflater.from(mContext).inflate(layoutId, viewGroup, false);
            viewHolder.imageView = (ImageView) view.findViewById(R.id.image_view);
            viewHolder.txtTitle = (TextView) view.findViewById(R.id.txt_title);
            viewHolder.txtSummary = (TextView) view.findViewById(R.id.txt_summary);
            view.setTag(viewHolder);
        } else {
            viewHolder = (ViewHolder) view.getTag();
        }
        NewsItemModel item = list.get(position);
        if (item.getNewsBitmap() != null) {
            // Rows are recycled, so make the ImageView visible again before reuse
            viewHolder.imageView.setVisibility(View.VISIBLE);
            viewHolder.imageView.setImageBitmap(item.getNewsBitmap());
        } else {
            // No image: hide the ImageView
            viewHolder.imageView.setVisibility(View.GONE);
        }
        viewHolder.txtTitle.setText(item.getNewsTitle());
        viewHolder.txtSummary.setText(item.getNewsSummary());
        return view;
    }

    public class ViewHolder {
        ImageView imageView;
        TextView txtTitle, txtSummary;
    }
}

Finally, since the app performs network operations, don't forget to declare the network permission in AndroidManifest.xml.


<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="per.edward.androidnewsreader">
    
    <uses-permission android:name="android.permission.INTERNET" />
    
    <uses-sdk
        android:maxSdkVersion="22"
        android:minSdkVersion="9" />


    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme">
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

As the last step, modify the MainActivity.java code from part one as follows.

package per.edward.androidnewsreader;

import android.app.Activity;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.util.Log;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ListView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.List;

import per.edward.androidnewsreader.adapter.NewsAdapter;
import per.edward.androidnewsreader.bean.NewsItemModel;
import per.edward.androidnewsreader.function.Function;
import per.edward.androidnewsreader.tool.CommonTool;


public class MainActivity extends Activity {
    private ListView mListView;
    private List<NewsItemModel> list;
    private NewsAdapter adapter;
    // Message code: data fetched successfully
    private final static int GET_DATA_SUCCEED = 1;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        // Initialize the views
        initView();
        // Initialize the data
        initData();
    }

    public void initView() {
        list = new ArrayList<>();
        mListView = (ListView) findViewById(R.id.list_view);
    }


    public void initData() {
        // Run the time-consuming work on a background thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                // Fetch the page over the network
                String result = CommonTool.getRequest("http://news.qq.com/china_index.shtml", "gbk");
                if (result == null) {
                    // The request failed, so there is nothing to parse
                    Log.e("MainActivity", "request failed");
                    return;
                }
                Log.e("result------------->", result);
                // Parse the news data
                List<NewsItemModel> list = Function.parseHtmlData(result);

                for (int i = 0; i < list.size(); i++) {
                    NewsItemModel model = list.get(i);
                    // Download the news image
                    Bitmap bitmap = BitmapFactory.decodeStream(CommonTool.getImgInputStream(model.getUrlImgAddress()));

                    model.setNewsBitmap(bitmap);
                }
                mHandler.sendMessage(mHandler.obtainMessage(GET_DATA_SUCCEED, list));
            }
        }).start();
    }


    public Handler mHandler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case GET_DATA_SUCCEED:
                    List<NewsItemModel> list = (List<NewsItemModel>) msg.obj;
                    // Adapter for the news list
                    adapter = new NewsAdapter(MainActivity.this, list, R.layout.adapter_news_item);
                    mListView.setAdapter(adapter);
                    // Set the item click listener
                    mListView.setOnItemClickListener(new ItemClickListener());
                    Toast.makeText(getApplicationContext(), String.valueOf(list.size()), Toast.LENGTH_LONG).show();
                    break;
            }
        }
    };

    /**
     * Click listener for the news list
     */
    public class ItemClickListener implements AdapterView.OnItemClickListener {
        @Override
        public void onItemClick(AdapterView<?> adapterView, View view, int i, long l) {
            NewsItemModel temp =(NewsItemModel) adapter.getItem(i);
            Toast.makeText(getApplicationContext(), temp.getNewsTitle(), Toast.LENGTH_SHORT).show();
        }
    }
}

The final result of the demo
[Figure 5: the finished demo showing the scraped news list]

The source code of the program is available here.
