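The screenshots are too large to embed, so I put them on GitHub. To see the result, click the links below:
Screenshot 1
Screenshot 2
First, jsoup can only scrape data from HTML; it cannot reach anything generated by JS, so for now I only scrape the HTML. To be clear, I am doing this purely to learn, not for commercial use and not to steal data maliciously. Copyright issues these days are scary enough.
Their data
The first step is to add the dependency: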
compile 'org.jsoup:jsoup:1.10.1'
Next, somewhat painfully, here is the page's HTML source. I formatted it in Sublime and pasted it here; after previewing the post the snippet was too long, so I trimmed it:
<div class="wrap">
<div class="w clear">
<div class="space_left">
<div class="ui_newlist_1 get_num" id="J_list">
<ul>
<li data-id="285511">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-285511.html" title="孔雀开屏鱼">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/17/20161217148196906319013.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-285511.html">孔雀开屏鱼</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-9541848.html">零下一度0511</a></p>
<p class="subcontent">原料:武昌鱼、姜、豆豉、葱、青红椒、盐、胡椒粉、蒸鱼豉油、花生油、料酒。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304148">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304148.html" title="花开富贵">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/12/20161212148152950212413.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304148.html">花开富贵</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-9014474.html">小厨妞1688</a></p>
<p class="subcontent">原料:大白菜菜叶、辣椒、猪肉、盐。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304224">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304224.html" title="千层葱花饼">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/12/20161212148155127748313.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304224.html">千层葱花饼</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-2261565.html">香儿厨房</a></p>
<p class="subcontent">原料:馄饨皮、葱花、盐、蛋液、油。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304301">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304301.html" title="猪蹄冻----高逼格新年宴客菜">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/13/20161213148161404254213.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304301.html">猪蹄冻----高逼格新年宴客菜</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-7482619.html">允儿小妞的厨房</a></p>
<p class="subcontent">原料:猪蹄儿、料酒、生姜、大蒜、桂皮、香叶、八角、丁香、大葱、小米辣。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="302700">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-302700.html" title="辣酱粉丝">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/13/20161213148159716214013.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-302700.html">辣酱粉丝</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-9764821.html">天国的女儿</a></p>
<p class="subcontent">原料:五花肉、海米、粉丝、郫县豆瓣酱、蚝油、蒜蓉辣酱、蒜、葱、白糖、香菜末</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304481">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304481.html" title="【桃李厨艺】正宗祖传灌汤小笼包的做法,鲜美多汁!吃货赶紧来试试!">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/14/20161214148170810381213.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304481.html">【桃李厨艺】正宗祖传灌汤小笼包的做法,鲜美多汁!吃货赶紧来试试!</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-8893689.html">桃李烹饪</a></p>
<p class="subcontent">原料:猪肉、盐、细砂糖、色拉油、料酒、鸡精、老抽、葱姜水。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304135">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304135.html" title="千层肉饼">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/12/20161212148152421589213.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304135.html">千层肉饼</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-9118742.html">满宝妈妈</a></p>
<p class="subcontent">原料:猪肉馅、清水、食用油、酱油、花椒油、盐、面粉、葱花、料酒、蚝油、香油、鸡精。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304363">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304363.html" title="秘制红烧肉">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/13/20161213148163593838413.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304363.html">秘制红烧肉</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-7194731.html">多幸福多快乐</a></p>
<p class="subcontent">原料:猪五花肉、大蒜、葱段、姜片、八角、、盐、食用油、清水、冰糖、黄酒、生抽。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304458">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304458.html" title="麻婆豆腐">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/14/20161214148170177631113.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304458.html">麻婆豆腐</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-6591561.html">梦~桃缘</a></p>
<p class="subcontent">原料:肉末、豆腐、郫县豆瓣酱、花椒粉、淀粉、葱、水、辣椒粉、酱油</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
<li data-id="304129">
<div class="pic">
<a target="_blank" href="http://home.meishichina.com/recipe-304129.html" title="香辣土豆片">
<img width="180" height="180" src="http://static.meishichina.com/v6/img/blank.gif" data-src="http://i3.meishichina.com/attachment/recipe/2016/12/12/20161212148153522532813.jpg@!c320" class="imgLoad"></a>
</div>
<div class="detail">
<h2>
<a target="_blank" href="http://home.meishichina.com/recipe-304129.html">香辣土豆片</a></h2>
<p class="subline">
<a target="_blank" href="http://home.meishichina.com/space-1478694.html">斯佳丽WH</a></p>
<p class="subcontent">原料:土豆、青蒜、辣椒粉、干辣椒、蒜、油盐。</p>
<div class="substatus clear">
<div class="left"></div>
</div>
</div>
</li>
</ul>
</div>
This list is the data being scraped. Now for the jsoup code that does the actual work:
Document document = Jsoup.connect("http://home.meishichina.com/show-top-type-recipe-page-" + page + ".html").get();
Log.d("jsoup:", "http://home.meishichina.com/show-top-type-recipe-page-" + page + ".html");
Elements elements = document.select("div.top-bar");
// Log.d("jsoup:", elements.select("a").attr("title"));
Elements titleAndPic = document.select("div.pic");
// Log.d("jsoup", "count:" + titleAndPic.size());
// Log.d("jsoup", "title:" + titleAndPic.get(1).select("a").attr("title") + "pic:" + titleAndPic.get(1).select("a").select("img").attr("data-src"));
Elements url = document.select("div.detail").select("h2").select("a");
// Log.d("jsoup", "url:" + url.get(1).attr("href"));
Elements burden = document.select("p.subcontent");
// Log.d("jsoup", "burden:" + burden.get(1).text());
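Put simply, select picks out the nodes that match a CSS selector, and attr reads the value of an attribute on them. Scraping is not just a matter of knowing the site's address and going at it; you have to understand how the page's data is structured first. That is true for jsoup at least; I have not used other tools, so I will not go into them.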
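To make select and attr concrete, here is a minimal sketch that parses a small inline fragment instead of hitting the network. The fragment mirrors the structure of one list item from the page source above, but the title and ingredient text are made-up English placeholders, and the class name SelectAttrDemo and the method firstItem are hypothetical names used only for this illustration. It assumes the jsoup dependency declared earlier is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectAttrDemo {
    // Trimmed stand-in for one <li> of the recipe list (placeholder text, real structure).
    static final String HTML =
        "<ul><li data-id=\"285511\">"
        + "<div class=\"pic\">"
        + "<a href=\"http://home.meishichina.com/recipe-285511.html\" title=\"Peacock fish\">"
        + "<img src=\"blank.gif\" data-src=\"http://i3.meishichina.com/attachment/recipe/2016/12/17/20161217148196906319013.jpg@!c320\">"
        + "</a></div>"
        + "<div class=\"detail\">"
        + "<h2><a href=\"http://home.meishichina.com/recipe-285511.html\">Peacock fish</a></h2>"
        + "<p class=\"subcontent\">fish, ginger</p>"
        + "</div></li></ul>";

    /** Returns {title, image URL, detail URL, ingredients} for the first list item. */
    static String[] firstItem(String html) {
        Document document = Jsoup.parse(html);

        // select(...) takes a CSS selector and returns every matching element.
        Element pic = document.select("div.pic").first();

        // attr(...) reads an attribute from the first matched element that has it.
        String title = pic.select("a").attr("title");
        String image = pic.select("img").attr("data-src");

        // Descendant selectors can be combined in a single query.
        String url = document.select("div.detail h2 a").attr("href");

        // text() returns the visible text rather than an attribute.
        String burden = document.select("p.subcontent").text();

        return new String[] {title, image, url, burden};
    }

    public static void main(String[] args) {
        for (String field : firstItem(HTML)) {
            System.out.println(field);
        }
    }
}
```

Note that the image URL comes from data-src rather than src: in the page source, src only holds a blank placeholder GIF.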
Now the Android code:
Fragment:
package com.fanyafeng.recreation.fragment;
import android.content.Context;
import android.net.Uri;
import android.os.AsyncTask;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.support.annotation.Nullable;
import android.support.v4.app.Fragment;
import android.support.v7.widget.GridLayoutManager;
import android.support.v7.widget.RecyclerView;
import android.support.v7.widget.StaggeredGridLayoutManager;
import android.support.v7.widget.Toolbar;
import android.util.Log;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.Toast;
import com.fanyafeng.recreation.R;
import com.fanyafeng.recreation.adapter.MenuAdapter;
import com.fanyafeng.recreation.bean.MainItemBean;
import com.fanyafeng.recreation.bean.MenuBean;
import com.fanyafeng.recreation.network.NetUtil;
import com.fanyafeng.recreation.network.Urls;
import com.fanyafeng.recreation.refreshview.XRefreshView;
import com.fanyafeng.recreation.refreshview.XRefreshViewFooter;
import com.fanyafeng.recreation.util.StringUtil;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
public class TwoFragment extends BaseFragment {
private static final String ARG_PARAM1 = "param1";
private static final String ARG_PARAM2 = "param2";
private static final int XREFRESH_GET_DATA = 0;
private static final int XREFRESH_FRESH = 1;
private static final int XREFRESH_LOAD_MORE = 2;
private String mParam1;
private String mParam2;
private Toolbar toolbar_two;
private XRefreshView refreshTwo;
private RecyclerView rvTwo;
private List<MenuBean> menuBeanList = new ArrayList<>();
private MenuAdapter menuAdapter;
private int page = 1;
public TwoFragment() {
// Required empty public constructor
}
public static TwoFragment newInstance(String param1, String param2) {
TwoFragment fragment = new TwoFragment();
Bundle args = new Bundle();
args.putString(ARG_PARAM1, param1);
args.putString(ARG_PARAM2, param2);
fragment.setArguments(args);
return fragment;
}
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
if (getArguments() != null) {
mParam1 = getArguments().getString(ARG_PARAM1);
mParam2 = getArguments().getString(ARG_PARAM2);
}
}
@Override
public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
// Inflate the layout for this fragment
return inflater.inflate(R.layout.fragment_two, container, false);
}
@Override
public void onActivityCreated(@Nullable Bundle savedInstanceState) {
super.onActivityCreated(savedInstanceState);
initView();
Thread thread = new Thread(new LoadThread(XREFRESH_GET_DATA));
thread.start();
}
private void initView() {
toolbar_two = (Toolbar) getActivity().findViewById(R.id.toolbar_two);
toolbar_two.setLogo(R.drawable.simle_logo_02);
toolbar_two.setTitle("美食");
refreshTwo = (XRefreshView) getActivity().findViewById(R.id.refreshTwo);
refreshTwo.setAutoLoadMore(true);
refreshTwo.setPullLoadEnable(true);
rvTwo = (RecyclerView) getActivity().findViewById(R.id.rvTwo);
rvTwo.setLayoutManager(new StaggeredGridLayoutManager(2, StaggeredGridLayoutManager.VERTICAL));
menuAdapter = new MenuAdapter(getActivity(), menuBeanList);
menuAdapter.setCustomLoadMoreView(new XRefreshViewFooter(getActivity()));
rvTwo.setAdapter(menuAdapter);
refreshTwo.setXRefreshViewListener(new XRefreshView.SimpleXRefreshListener() {
@Override
public void onRefresh() {
super.onRefresh();
// getData(1, XREFRESH_FRESH);
new Handler().postDelayed(new Runnable() {
@Override
public void run() {
Thread thread = new Thread(new LoadThread(XREFRESH_FRESH));
thread.start();
}
}, 1000);
}
@Override
public void onLoadMore(boolean isSilence) {
super.onLoadMore(isSilence);
new Handler().postDelayed(new Runnable() {
@Override
public void run() {
Thread thread = new Thread(new LoadThread(XREFRESH_LOAD_MORE));
thread.start();
}
}, 1000);
}
});
}
private void getData(int page, int refreshState) {
try {
// Document document = Jsoup.connect("http://home.meishichina.com/show-top-type-recipe.html").get();
// http://home.meishichina.com/show-top-type-recipe-page-2.html
Document document = Jsoup.connect("http://home.meishichina.com/show-top-type-recipe-page-" + page + ".html").get();
Log.d("jsoup:", "http://home.meishichina.com/show-top-type-recipe-page-" + page + ".html");
Elements elements = document.select("div.top-bar");
// Log.d("jsoup:", elements.select("a").attr("title"));
Elements titleAndPic = document.select("div.pic");
// Log.d("jsoup", "count:" + titleAndPic.size());
// Log.d("jsoup", "title:" + titleAndPic.get(1).select("a").attr("title") + "pic:" + titleAndPic.get(1).select("a").select("img").attr("data-src"));
Elements url = document.select("div.detail").select("h2").select("a");
// Log.d("jsoup", "url:" + url.get(1).attr("href"));
Elements burden = document.select("p.subcontent");
// Log.d("jsoup", "burden:" + burden.get(1).text());
for (int i = 0; i < titleAndPic.size(); i++) {
// Log.d("jsoup", "title:" + titleAndPic.get(i).select("a").attr("title") + "pic:" + titleAndPic.get(i).select("a").select("img").attr("data-src"));
// Log.d("jsoup", "url:" + url.get(i).attr("href"));
// Log.d("jsoup", "burden:" + burden.get(i).text());
int imgLength = titleAndPic.get(i).select("a").select("img").attr("data-src").length();
String img = titleAndPic.get(i).select("a").select("img").attr("data-src");
// Log.d("jsoup", img.substring(0, imgLength - 3) + "640");
String title = titleAndPic.get(i).select("a").attr("title");
String pic = img.substring(0, imgLength - 3) + "640";
String itemUrl = url.get(i).attr("href");
String itemBurden = burden.get(i).text();
MenuBean menuBean = new MenuBean();
menuBean.setTitle(title);
menuBean.setPic(pic);
menuBean.setUrl(itemUrl);
menuBean.setBurden(itemBurden);
menuBeanList.add(menuBean);
}
// Notify the UI once per page, after the loop, instead of once per item.
Message message = Message.obtain();
message.what = refreshState;
handler.sendMessage(message);
} catch (Exception e) {
e.printStackTrace();
}
}
class LoadThread implements Runnable {
int refreshState;
public LoadThread(int refreshState) {
this.refreshState = refreshState;
}
@Override
public void run() {
if (refreshState == XREFRESH_GET_DATA) {
menuBeanList.clear();
page = 1;
} else if (refreshState == XREFRESH_FRESH) {
menuBeanList.clear();
page = 1;
} else if (refreshState == XREFRESH_LOAD_MORE) {
page++;
}
getData(page, refreshState);
}
}
Handler handler = new Handler() {
@Override
public void handleMessage(Message msg) {
super.handleMessage(msg);
switch (msg.what) {
case XREFRESH_FRESH:
refreshTwo.stopRefresh();
break;
case XREFRESH_GET_DATA:
break;
case XREFRESH_LOAD_MORE:
refreshTwo.stopLoadMore();
break;
}
menuAdapter.notifyDataSetChanged();
}
};
}
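One fragile spot in getData is the line that builds the larger image URL by chopping the last three characters off data-src and appending "640". If data-src is empty or does not end in a three-digit size, substring throws and the whole page is lost to the catch block. Below is a rough sketch of a more defensive helper. The class name ImageUrls and the method withSize are hypothetical, and the "@!c" followed by a size suffix format is only inferred from the sample URLs in the page source above, not from any documentation of the site.

```java
public class ImageUrls {
    // Marker that precedes the size in the sample URLs, e.g. "...jpg@!c320" (inferred, not documented).
    private static final String SIZE_MARKER = "@!c";

    /**
     * Replaces whatever follows the last "@!c" with the requested size.
     * URLs without the marker are returned unchanged; null becomes an empty string.
     */
    static String withSize(String url, int size) {
        if (url == null) {
            return "";
        }
        int marker = url.lastIndexOf(SIZE_MARKER);
        if (marker < 0) {
            return url; // nothing to rewrite
        }
        return url.substring(0, marker + SIZE_MARKER.length()) + size;
    }

    public static void main(String[] args) {
        String thumb = "http://i3.meishichina.com/attachment/recipe/2016/12/17/20161217148196906319013.jpg@!c320";
        System.out.println(withSize(thumb, 640));
    }
}
```

In the fragment, the pic value could then be built with ImageUrls.withSize(img, 640) in place of the substring call.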
xml:
<!-- The root element name was lost when the post was extracted; FrameLayout is an assumption. -->
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:background="@android:color/white"
tools:context="com.fanyafeng.recreation.fragment.TwoFragment">
<android.support.design.widget.CoordinatorLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:app="http://schemas.android.com/apk/res-auto"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:fitsSystemWindows="false"
tools:context="com.fanyafeng.recreation.activity.MainActivity">
<android.support.design.widget.AppBarLayout
android:layout_width="match_parent"
android:layout_height="wrap_content"
android:theme="@style/AppTheme.AppBarOverlay">
<android.support.v7.widget.Toolbar
android:id="@+id/toolbar_two"
android:layout_width="match_parent"
android:layout_height="?attr/actionBarSize"
android:background="?attr/colorPrimary"
app:layout_scrollFlags="scroll|exitUntilCollapsed"
app:popupTheme="@style/AppTheme.PopupOverlay">
<!-- Element name lost in extraction; TextView is assumed from the text attributes. -->
<TextView
android:id="@+id/toolbar_center_title"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:textColor="@android:color/white"
android:textSize="@dimen/activity_horizontal_margin" />
</android.support.v7.widget.Toolbar>
</android.support.design.widget.AppBarLayout>
<!-- Element name lost in extraction; LinearLayout is an assumption. -->
<LinearLayout
android:layout_width="match_parent"
android:layout_height="wrap_content"
app:layout_behavior="@string/appbar_scrolling_view_behavior">
<com.fanyafeng.recreation.refreshview.XRefreshView
android:id="@+id/refreshTwo"
android:layout_width="match_parent"
android:layout_height="match_parent">
<android.support.v7.widget.RecyclerView
android:id="@+id/rvTwo"
android:layout_width="match_parent"
android:layout_height="match_parent" />
</com.fanyafeng.recreation.refreshview.XRefreshView>
</LinearLayout>
</android.support.design.widget.CoordinatorLayout>
</FrameLayout>