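This is an original blog post, provided for technical study only. Do not copy it and upload it to platforms such as Baidu Wenku without permission. If you repost it, please cite the URL of this post.
If you need the source code or the jar packages, please contact: [email protected]
This project crawls the plate (sector) data from Eastmoney (东方财富网).
Link: http://quote.eastmoney.com/center/BKList.html#trade_0_0?sortRule=0
For details on capturing the network requests, see my earlier post.
Link: http://blog.csdn.net/qq_22499377/article/details/78114734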
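The project structure used in this post is shown in the figure below:
db: holds the database files, MyDataSource and MYSQLControl.
model: wraps the data being handled, encapsulating the attributes of each object.
parse: parses the content fetched by util, usually with Jsoup.
main: the program entry point; it fetches the data, runs the database statements, and stores the data.
job: the job task to be executed.
jobmain: the controller that sets when the job runs. The stock market closes at 3 p.m. on every trading day, so the job is scheduled for some time after 3 p.m. to crawl that day's data.
The model wraps the data I want to crawl, such as the current date, the plate id, the plate name, the plate price, and so on, as in the following code: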
package model;

// The object holding the crawled data; it contains the following fields
public class ExtMarketPlateModel {
    private String date;
    private String plate_rank;
    private String plate_id;
    private String plate_name;
    private float plate_price;
    private float plate_change;
    private float plate_range;
    private String plate_market_value;
    private float plate_turnover_rate;
    private String craw_time;

    public String getDate() {
        return date;
    }
    public void setDate(String date) {
        this.date = date;
    }
    public String getPlate_rank() {
        return plate_rank;
    }
    public void setPlate_rank(String plate_rank) {
        this.plate_rank = plate_rank;
    }
    public String getPlate_id() {
        return plate_id;
    }
    public void setPlate_id(String plate_id) {
        this.plate_id = plate_id;
    }
    public String getPlate_name() {
        return plate_name;
    }
    public void setPlate_name(String plate_name) {
        this.plate_name = plate_name;
    }
    public float getPlate_price() {
        return plate_price;
    }
    public void setPlate_price(float plate_price) {
        this.plate_price = plate_price;
    }
    public float getPlate_change() {
        return plate_change;
    }
    public void setPlate_change(float plate_change) {
        this.plate_change = plate_change;
    }
    public float getPlate_range() {
        return plate_range;
    }
    public void setPlate_range(float plate_range) {
        this.plate_range = plate_range;
    }
    public String getPlate_market_value() {
        return plate_market_value;
    }
    public void setPlate_market_value(String plate_market_value) {
        this.plate_market_value = plate_market_value;
    }
    public float getPlate_turnover_rate() {
        return plate_turnover_rate;
    }
    public void setPlate_turnover_rate(float plate_turnover_rate) {
        this.plate_turnover_rate = plate_turnover_rate;
    }
    public String getCraw_time() {
        return craw_time;
    }
    public void setCraw_time(String craw_time) {
        this.craw_time = craw_time;
    }
}
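Before writing the program, create the table based on the model's attributes. When creating the table, be sure to document the real meaning of every column so that whoever takes over later can do so easily.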
CREATE TABLE `ext_market_plate` (
  `date` date NOT NULL COMMENT 'date of the crawl',
  `plate_rank` char(20) NOT NULL COMMENT 'plate rank',
  `plate_id` char(20) NOT NULL COMMENT 'plate code',
  `plate_name` char(50) DEFAULT NULL COMMENT 'plate name',
  `plate_price` float(10,2) DEFAULT NULL COMMENT 'latest plate price',
  `plate_change` float(10,2) DEFAULT NULL COMMENT 'price change',
  `plate_range` float(10,4) DEFAULT NULL COMMENT 'change percent',
  `plate_market_value` char(50) DEFAULT NULL COMMENT 'total market value',
  `plate_turnover_rate` float(10,4) DEFAULT NULL COMMENT 'turnover rate',
  `craw_time` datetime DEFAULT NULL COMMENT 'crawl time',
  `update_time` datetime DEFAULT NULL COMMENT 'update time',
  `extract_time` datetime DEFAULT NULL COMMENT 'extract time',
  PRIMARY KEY (`date`,`plate_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The main program is as follows:
package navi.main;

import java.util.List;

import db.MYSQLControl;
import model.ExtMarketPlateModel;
import parse.ExtMarketPlateParse;

// Collects the stock plate (sector) data
public class ExtMarketPlateMain {
    public static void main(String[] args) throws Exception {
        // URL of the plate data interface
        String url = "http://nufm.dfcfw.com/EM_Finance2014NumericApplication/JS.aspx?type=CT&cmd=C._BKHY&sty=FPGBKI&st=c&sr=-1&p=1&ps=5000&cb=&js=var%20BKCache=[(x)]&token=7bc05d0d4c3c22ef9fca8c2a912d779c&v=0.3196612374630905";
        List<ExtMarketPlateModel> plate = ExtMarketPlateParse.parseurl(url);
        MYSQLControl.insertoilStocks(plate);
    }
}
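The request URL is long, and a single missing `?` or `=` silently breaks the request, so it can help to assemble it from its parts. Below is a minimal sketch of that idea. `PlateUrlBuilder` and its methods are hypothetical names of mine, and the parameter values are simply copied from the URL used in this post (I make no claim about what each parameter means).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original project
class PlateUrlBuilder {
    static final String BASE = "http://nufm.dfcfw.com/EM_Finance2014NumericApplication/JS.aspx";

    // Joins base and parameters as base?k1=v1&k2=v2, keeping insertion order.
    // Values are appended as-is, so they must already be URL-encoded.
    static String build(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    // Parameter values copied verbatim from the URL used in this post
    static Map<String, String> plateParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("type", "CT");
        p.put("cmd", "C._BKHY");
        p.put("sty", "FPGBKI");
        p.put("st", "c");
        p.put("sr", "-1");
        p.put("p", "1");
        p.put("ps", "5000");
        p.put("cb", "");
        p.put("js", "var%20BKCache=[(x)]");
        p.put("token", "7bc05d0d4c3c22ef9fca8c2a912d779c");
        p.put("v", "0.3196612374630905");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build(BASE, plateParams()));
    }
}
```

The resulting string can be passed to the parser in place of the hard-coded literal.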
The util package contains three files: HTTPUtils, TimeUtils, and UumericalUtil.
HTTPUtils is as follows:
package util;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Uses URLConnection to fetch the HTML or JSON response
public abstract class HTTPUtils {
    public static String getRawHtml(String personalUrl) throws InterruptedException, IOException {
        URL url = new URL(personalUrl);
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(3000);
        InputStream in = null;
        try {
            in = conn.getInputStream();
        } catch (IOException e) {
            // Report the failure instead of swallowing it; an empty string is returned below
            e.printStackTrace();
        }
        // Convert the response to a String
        return convertStreamToString(in);
    }

    // Converts an InputStream to a String (line breaks are dropped)
    public static String convertStreamToString(InputStream is) throws IOException {
        if (is == null)
            return "";
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, "utf-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return sb.toString();
    }
}
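To see what the stream-to-string step does without touching the network, the same reading loop can be run on an in-memory stream. This is only a small sketch: `StreamDemo` and the sample text are made up for illustration. The point to notice is that `readLine` drops line terminators, so a multi-line response comes back joined into a single line, which is what the string-splitting parser later relies on.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original project
class StreamDemo {
    // Reads the whole stream as UTF-8 text, concatenating lines without separators
    static String readAll(InputStream is) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up three-line body
        byte[] body = "var BKCache=[\n\"a\",\n\"b\"]".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(body)));
    }
}
```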
TimeUtils mostly handles date conversions, such as turning a String into a Date or getting the current time. It can be reused as-is wherever else it is needed.
package util;

import java.text.DateFormat;
import java.text.DecimalFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

public class TimeUtils {
    public static void main(String[] args) throws ParseException {
        String time = getMonth("2002-1-08 14:50:38");
        System.out.println(time);
        System.out.println(getDay("2002-1-08 14:50:38"));
        System.out.println(TimeUtils.parseTime("2016-05-19 19:17", "yyyy-MM-dd HH:mm"));
    }

    // Returns the current time in the given format
    public static String GetNowDate(String formate) {
        SimpleDateFormat sdf = new SimpleDateFormat(formate);
        return sdf.format(new Date());
    }

    // Extracts "yyyy-MM" from a time string; returns null if it cannot be parsed
    public static String getMonth(String time) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM");
        try {
            return sdf.format(sdf.parse(time));
        } catch (ParseException e) {
            e.printStackTrace();
            return null;
        }
    }

    // Extracts "yyyy-MM-dd" from a time string; returns null if it cannot be parsed
    public static String getDay(String time) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        try {
            return sdf.format(sdf.parse(time));
        } catch (ParseException e) {
            e.printStackTrace();
            return null;
        }
    }

    public static Date parseTime(String inputTime) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return sdf.parse(inputTime);
    }

    public static String dateToString(Date date, String type) {
        DateFormat df = new SimpleDateFormat(type);
        return df.format(date);
    }

    public static Date parseTime(String inputTime, String timeFormat) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat(timeFormat);
        return sdf.parse(inputTime);
    }

    public static Calendar parseTimeToCal(String inputTime, String timeFormat) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat(timeFormat);
        Date date = sdf.parse(inputTime);
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        return calendar;
    }

    // Whole days between two calendars (cal2 - cal1)
    public static int getDaysBetweenCals(Calendar cal1, Calendar cal2) throws ParseException {
        return (int) ((cal2.getTimeInMillis() - cal1.getTimeInMillis()) / (1000 * 24 * 3600));
    }

    // Converts epoch milliseconds to a Date
    public static Date parseTime(long inputTime) {
        return new Date(inputTime);
    }

    // Converts epoch milliseconds to "yyyy-MM-dd HH:mm:ss"
    public static String parseTimeString(long inputTime) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        return sdf.format(new Date(inputTime));
    }

    // Converts "yyyyMMddHHmmss" to "yyyy-MM-dd HH:mm:ss"; returns null if it cannot be parsed
    public static String parseStringTime(String inputTime) {
        String date = null;
        try {
            Date date1 = new SimpleDateFormat("yyyyMMddHHmmss").parse(inputTime);
            date = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(date1);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        return date;
    }

    // Returns the twelve months of a year as "yyyyMM"
    public static List<String> YearMonth(int year) {
        List<String> yearmouthlist = new ArrayList<String>();
        DecimalFormat dfInt = new DecimalFormat("00");
        for (int i = 1; i < 13; i++) {
            yearmouthlist.add(year + dfInt.format(i));
        }
        return yearmouthlist;
    }

    // Returns every month from startyear to finistyear (inclusive) as "yyyy-MM"
    public static List<String> YearMonth(int startyear, int finistyear) {
        List<String> yearmouthlist = new ArrayList<String>();
        DecimalFormat dfInt = new DecimalFormat("00");
        for (int i = startyear; i < finistyear + 1; i++) {
            for (int j = 1; j < 13; j++) {
                yearmouthlist.add(i + "-" + dfInt.format(j));
            }
        }
        return yearmouthlist;
    }

    // Returns every day of the given year as "yyyy-MM-dd"
    public static List<String> TOAllDay(int year) {
        List<String> daylist = new ArrayList<String>();
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        for (int month = 1; month < 13; month++) {
            Calendar cal = Calendar.getInstance();
            cal.clear();
            cal.set(Calendar.YEAR, year);
            cal.set(Calendar.MONTH, month - 1); // Calendar months start at 0
            cal.set(Calendar.DAY_OF_MONTH, 1);  // first day of the month
            int count = cal.getActualMaximum(Calendar.DAY_OF_MONTH);
            for (int day = 1; day <= count; day++) {
                // Record the current day before advancing, so the 1st is not skipped
                daylist.add(sdf.format(cal.getTime()));
                cal.add(Calendar.DAY_OF_MONTH, 1);
            }
        }
        return daylist;
    }

    // Returns yesterday's date as "yyyy-MM-dd"
    public static String getyesterday() {
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);
        return new SimpleDateFormat("yyyy-MM-dd").format(cal.getTime());
    }
}
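For readers on Java 8 or later, a few of these helpers can also be written with `java.time`, which avoids the mutable `Calendar` and the non-thread-safe `SimpleDateFormat`. This is a different technique from the one used above, shown only as a sketch; `JavaTimeSketch` and its method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Hypothetical java.time counterparts of a few TimeUtils helpers
class JavaTimeSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // The day before the given date, formatted as yyyy-MM-dd
    static String yesterday(LocalDate today) {
        return today.minusDays(1).format(DAY);
    }

    // Whole days from one date to another (negative if 'to' is earlier)
    static long daysBetween(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    // Every day of the given year, formatted as yyyy-MM-dd
    static List<String> allDays(int year) {
        List<String> days = new ArrayList<>();
        for (LocalDate d = LocalDate.of(year, 1, 1); d.getYear() == year; d = d.plusDays(1)) {
            days.add(d.format(DAY));
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println(yesterday(LocalDate.now()));
        System.out.println(allDays(2016).size());
    }
}
```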
UumericalUtil: stock prices need to be kept to a fixed number of decimal places, and this class does that rounding.
package util;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.DecimalFormat;

public class UumericalUtil {
    // Rounds f half-up to the given number of decimal places
    public static float FloatTO(float f, int number) {
        // Float.toString gives the shortest decimal form, avoiding binary noise such as 1.00499999...
        BigDecimal b = new BigDecimal(Float.toString(f));
        return b.setScale(number, RoundingMode.HALF_UP).floatValue();
    }

    // Formats an integer with at least two digits, e.g. 7 becomes "07"
    public static String NumberTO(int number) {
        DecimalFormat dfInt = new DecimalFormat("00");
        String sInt = dfInt.format(number);
        System.out.println(sInt);
        return sInt;
    }
}
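How the float is handed to `BigDecimal` matters right at the rounding boundary. The sketch below compares the two common ways of constructing the `BigDecimal`; `RoundingSketch` and its method names are hypothetical and exist only for this comparison.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical comparison of two ways to round a float half-up
class RoundingSketch {
    // Uses the exact binary value of the float (widened to double)
    static BigDecimal roundExact(float f, int scale) {
        return new BigDecimal(f).setScale(scale, RoundingMode.HALF_UP);
    }

    // Uses the shortest decimal text that identifies the float
    static BigDecimal roundText(float f, int scale) {
        return new BigDecimal(Float.toString(f)).setScale(scale, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // 1.005f is stored as 1.00499999523..., slightly below 1.005
        System.out.println(roundExact(1.005f, 2)); // prints 1.00
        System.out.println(roundText(1.005f, 2));  // prints 1.01
    }
}
```

Which result is wanted depends on whether the value should be treated as the decimal text that was parsed or as the binary number that was stored.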
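The parse package uses Jsoup or other tools to parse the fetched HTML. The parsed data is collected into a List and returned, layer by layer, to the main method. Here the simplest approach, plain string splitting, is used.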
package parse;

import java.util.ArrayList;
import java.util.List;

import model.ExtMarketPlateModel;
import util.HTTPUtils;
import util.TimeUtils;
import util.UumericalUtil;

public class ExtMarketPlateParse {
    public static List<ExtMarketPlateModel> parseurl(String url) throws Exception {
        List<ExtMarketPlateModel> list = new ArrayList<ExtMarketPlateModel>();
        // Fetch the response for the URL
        String html = HTTPUtils.getRawHtml(url);
        // Keep the array that follows "BKCache="
        String[] parts = html.split("BKCache=");
        if (parts.length < 2) {
            // Nothing to parse, e.g. the request failed
            return list;
        }
        // Each quoted element is one plate; strip the brackets and quotes
        String[] plates = parts[1].split("\",");
        List<String> platelist = new ArrayList<String>();
        for (int i = 0; i < plates.length; i++) {
            platelist.add(plates[i].replace("[\"", "").replace("\"", "").replace("]", ""));
        }
        for (int i = 0; i < platelist.size(); i++) {
            // Split each record once and reuse the fields
            String[] fields = platelist.get(i).split(",");
            String date = TimeUtils.GetNowDate("yyyy-MM-dd");
            String plate_rank = Integer.toString(i + 1);
            String plate_id = fields[1];
            String plate_name = fields[2];
            float plate_price = 0;
            float plate_change = 0;
            float plate_range = 0;
            String plate_market_value = null;
            float plate_turnover_rate = 0;
            // "-" in the change-percent field means there are no figures for this plate
            if (!fields[3].equals("-")) {
                // price
                plate_price = Float.parseFloat(fields[18]);
                // change amount
                plate_change = Float.parseFloat(fields[19]);
                // change percent, converted from a percentage to a fraction
                plate_range = UumericalUtil.FloatTO((float) (Float.parseFloat(fields[3].replace("%", "")) * 0.01), 4);
                plate_market_value = fields[4];
                System.out.println(plate_market_value);
                // turnover rate, converted the same way
                plate_turnover_rate = UumericalUtil.FloatTO((float) (Float.parseFloat(fields[5].replace("%", "")) * 0.01), 4);
            }
            String craw_time = TimeUtils.GetNowDate("yyyy-MM-dd HH:mm:ss");
            ExtMarketPlateModel model = new ExtMarketPlateModel();
            model.setDate(date);
            model.setPlate_rank(plate_rank);
            model.setPlate_id(plate_id);
            model.setPlate_name(plate_name);
            model.setPlate_price(plate_price);
            model.setPlate_change(plate_change);
            model.setPlate_range(plate_range);
            model.setPlate_market_value(plate_market_value);
            model.setPlate_turnover_rate(plate_turnover_rate);
            model.setCraw_time(craw_time);
            list.add(model);
        }
        // Return the collection
        return list;
    }
}
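The splitting steps are easier to follow on a concrete string. Below is a minimal sketch that runs the same split-and-strip logic on a made-up body. The sample records are NOT real Eastmoney output: I only built them so that the positions the parser reads (1 = id, 2 = name, 3 = change percent, 4 = market value, 5 = turnover rate, 18 = price, 19 = change amount) are filled in, with `x` as filler elsewhere. `PlateParseSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, network-free illustration of the string-splitting approach
class PlateParseSketch {
    // Made-up response body with two records of 20 comma-separated fields each
    static String sample() {
        String filler = "x,x,x,x,x,x,x,x,x,x,x,x"; // positions 6..17
        String r1 = "1,BK0001,PlateA,1.50%,1000,0.80%," + filler + ",12.34,0.18";
        String r2 = "2,BK0002,PlateB,-,-,-," + filler + ",-,-";
        return "var BKCache=[\"" + r1 + "\",\"" + r2 + "\"]";
    }

    // Splits the body into one String[] of fields per plate
    static List<String[]> records(String body) {
        List<String[]> out = new ArrayList<>();
        String[] parts = body.split("BKCache=");
        if (parts.length < 2) {
            return out; // marker not found, nothing to parse
        }
        for (String raw : parts[1].split("\",")) {
            String cleaned = raw.replace("[\"", "").replace("\"", "").replace("]", "");
            out.add(cleaned.split(","));
        }
        return out;
    }

    // Converts a percentage string such as "1.50%" to a fraction
    static float percentToFraction(String pct) {
        return Float.parseFloat(pct.replace("%", "")) / 100f;
    }

    public static void main(String[] args) {
        for (String[] f : records(sample())) {
            System.out.println(f[1] + " " + f[2] + " " + f[3]);
        }
    }
}
```

A record whose change-percent field is `-` carries no figures, which is why the parser checks that field before calling `Float.parseFloat`.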
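The db package contains two Java files: MyDataSource and MYSQLControl. MyDataSource registers the JDBC driver and holds the user name and password for the database connection.
MYSQLControl connects to the database and performs inserts, updates, and table creation.
MyDataSource is as follows: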
package db;

import javax.sql.DataSource;

import org.apache.commons.dbcp2.BasicDataSource;

public class MyDataSource {
    public static DataSource getDataSource(String connectURI) {
        BasicDataSource ds = new BasicDataSource();
        // MySQL JDBC driver
        ds.setDriverClassName("com.mysql.jdbc.Driver");
        ds.setUsername("root");   // MySQL user name
        ds.setPassword("123456"); // MySQL login password
        ds.setUrl(connectURI);
        return ds;
    }
}
MYSQLControl is as follows:
package db;

import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import javax.sql.DataSource;

import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.ResultSetHandler;
import org.apache.commons.dbutils.handlers.BeanListHandler;
import org.apache.commons.dbutils.handlers.ColumnListHandler;
import org.apache.commons.dbutils.handlers.ScalarHandler;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import model.ExtMarketPlateModel;

public class MYSQLControl {
    static final Log logger = LogFactory.getLog(MYSQLControl.class);
    // Database address and the database to use
    static DataSource ds = MyDataSource.getDataSource("jdbc:mysql://127.0.0.1:3306/gupiaobankuai?useUnicode=true&characterEncoding=UTF-8");
    static QueryRunner qr = new QueryRunner(ds);

    // First kind of method: run an arbitrary update statement
    public static void executeUpdate(String sql) {
        try {
            qr.update(sql);
        } catch (SQLException e) {
            logger.error(e);
        }
    }

    // Query helper called by ExtMarketPlateJob. It is not part of the listing as posted,
    // so this is a minimal version assumed from the BeanListHandler import above.
    public static <T> List<T> getListInfoBySQL(String sql, Class<T> type) {
        try {
            return qr.query(sql, new BeanListHandler<T>(type));
        } catch (SQLException e) {
            logger.error(e);
            return new ArrayList<T>();
        }
    }

    // Batch-inserts the plates and returns the number of statements executed.
    // This way of writing to the database still needs optimizing.
    public static int insertoilStocks(List<ExtMarketPlateModel> plate) {
        Object[][] params = new Object[plate.size()][12];
        int c = 0; // number of statements executed successfully
        for (int i = 0; i < plate.size(); i++) {
            ExtMarketPlateModel m = plate.get(i);
            params[i][0] = m.getDate();
            params[i][1] = m.getPlate_rank();
            params[i][2] = m.getPlate_id();
            params[i][3] = m.getPlate_name();
            params[i][4] = m.getPlate_price();
            params[i][5] = m.getPlate_change();
            params[i][6] = m.getPlate_range();
            params[i][7] = m.getPlate_market_value();
            params[i][8] = m.getPlate_turnover_rate();
            params[i][9] = m.getCraw_time();
            params[i][10] = null; // update_time
            params[i][11] = null; // extract_time
        }
        try {
            int[] sum = qr.batch("INSERT INTO `gupiaobankuai`.`ext_market_plate` VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", params);
            c = sum.length;
        } catch (SQLException e) {
            System.out.println(e);
        }
        System.out.println("Plate data has been written to the database");
        return c;
    }
}
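The batch insert relies on `VALUES (?,?,...)` lining up with the column order of the table. One possible tidy-up, shown only as a sketch and not as what the project does, is to generate the statement from an explicit column list so that the SQL and the parameter array cannot drift apart. `InsertSqlSketch` and its method are hypothetical; the column names are taken from the CREATE TABLE statement in this post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that builds a parameterized INSERT from a column list
class InsertSqlSketch {
    // Column names in the order used by the CREATE TABLE statement
    static final List<String> COLUMNS = Arrays.asList(
            "date", "plate_rank", "plate_id", "plate_name", "plate_price", "plate_change",
            "plate_range", "plate_market_value", "plate_turnover_rate", "craw_time",
            "update_time", "extract_time");

    // Produces: INSERT INTO `table` (`c1`,`c2`) VALUES (?,?)
    static String insertSql(String table, List<String> columns) {
        String cols = "`" + String.join("`,`", columns) + "`";
        String marks = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO `" + table + "` (" + cols + ") VALUES (" + marks + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertSql("ext_market_plate", COLUMNS));
    }
}
```

The generated string has one placeholder per listed column, so it could be passed to a batch call together with a parameter array of the same width.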
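Stock data is a little special: it only needs to be crawled Monday through Friday, so a scheduled task crawls it automatically. The other key point is that the market is also closed on some weekdays. On those days the page still shows data, but it carries no date label, so the page alone does not tell you which trading day the numbers belong to. I anticipated this when creating the table, which is why the crawl date and the plate id together form the composite primary key. The job also samples three plates from yesterday's rows and skips the insert when all three prices are unchanged.
The job program is as follows: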
package job;

import java.util.ArrayList;
import java.util.List;

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

import db.MYSQLControl;
import model.ExtMarketPlateModel;
import parse.ExtMarketPlateParse;
import util.TimeUtils;

public class ExtMarketPlateJob implements Job {
    @Override
    public void execute(JobExecutionContext arg0) throws JobExecutionException {
        // Holiday check: sample three random plates from yesterday's rows
        String yesterday = TimeUtils.getyesterday();
        List<ExtMarketPlateModel> randomlist = MYSQLControl.getListInfoBySQL("select plate_id,plate_price,plate_change from ext_market_plate where date='" + yesterday + "' ORDER BY rand() LIMIT 3", ExtMarketPlateModel.class);
        if (randomlist == null) {
            randomlist = new ArrayList<ExtMarketPlateModel>();
        }
        List<ExtMarketPlateModel> plate = new ArrayList<ExtMarketPlateModel>();
        String url = "http://nufm.dfcfw.com/EM_Finance2014NumericApplication/JS.aspx?type=CT&cmd=C._BKHY&sty=FPGBKI&st=c&sr=-1&p=1&ps=5000&cb=&js=var%20BKCache=[(x)]&token=7bc05d0d4c3c22ef9fca8c2a912d779c&v=0.3196612374630905";
        int judge = 0; // how many sampled plates still show yesterday's price
        try {
            plate = ExtMarketPlateParse.parseurl(url);
        } catch (Exception e) {
            e.printStackTrace();
        }
        for (ExtMarketPlateModel today : plate) {
            // Looping over the sample also covers days with fewer than three stored rows,
            // such as Mondays, when "yesterday" has no data
            for (ExtMarketPlateModel old : randomlist) {
                if (today.getPlate_id().equals(old.getPlate_id())
                        && today.getPlate_price() == old.getPlate_price()) {
                    judge++;
                }
            }
        }
        // All three unchanged means the market was closed today, so skip the insert
        if (judge != 3) {
            MYSQLControl.insertoilStocks(plate);
        }
    }
}
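The check in the job can be tried out without a database or a network call. The sketch below restates the idea with plain maps: if every sampled plate shows exactly the price stored for the previous day, today's page is treated as stale. `HolidayCheckSketch`, its method, and the ids and prices used in `main` are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, in-memory version of the "did the market open today?" check
class HolidayCheckSketch {
    // True when every sampled plate still has yesterday's price in today's crawl
    static boolean looksLikeClosedDay(Map<String, Float> yesterdaySample, Map<String, Float> todayPrices) {
        if (yesterdaySample.isEmpty()) {
            return false; // nothing stored for yesterday, so there is nothing to compare against
        }
        for (Map.Entry<String, Float> e : yesterdaySample.entrySet()) {
            Float today = todayPrices.get(e.getKey());
            if (today == null || Float.compare(today, e.getValue()) != 0) {
                return false; // at least one plate moved (or is missing), so treat the data as new
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Float> sample = new HashMap<>();
        sample.put("BK0001", 12.34f);
        sample.put("BK0002", 56.78f);
        sample.put("BK0003", 9.10f);

        Map<String, Float> today = new HashMap<>(sample);
        System.out.println(looksLikeClosedDay(sample, today)); // prints true

        today.put("BK0002", 57.00f);
        System.out.println(looksLikeClosedDay(sample, today)); // prints false
    }
}
```

Three identical prices on a real trading day are unlikely but not impossible, so this remains a heuristic.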
The jobmain program is as follows:
package jobmain;

import static org.quartz.CronScheduleBuilder.cronSchedule;
import static org.quartz.JobBuilder.newJob;
import static org.quartz.TriggerBuilder.newTrigger;

import java.text.SimpleDateFormat;
import java.util.Date;

import org.quartz.CronTrigger;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.SchedulerFactory;
import org.quartz.impl.StdSchedulerFactory;

import job.ExtMarketPlateJob;

// Scheduled task: crawl the stock data at 15:30 every Monday to Friday
public class ExtMarketPlateJobMain {
    public void go() throws Exception {
        // First, obtain a reference to a Scheduler
        SchedulerFactory sf = new StdSchedulerFactory();
        Scheduler sched = sf.getScheduler();
        // Jobs can be scheduled before sched.start() is called
        JobDetail job = newJob(ExtMarketPlateJob.class).withIdentity("platejob", "plategroup").build();
        // Fire at 15:30 every Monday to Friday
        CronTrigger trigger = newTrigger().withIdentity("platetrigger", "plategroup").withSchedule(cronSchedule("0 30 15 ? * MON-FRI")).build();
        Date ft = sched.scheduleJob(job, trigger);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss SSS");
        System.out.println(job.getKey() + " has been scheduled to run at: " + sdf.format(ft) + " and will repeat according to the expression: " + trigger.getCronExpression());
        sched.start();
    }

    public static void main(String[] args) throws Exception {
        ExtMarketPlateJobMain maingo = new ExtMarketPlateJobMain();
        maingo.go();
    }
}
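The cron expression `0 30 15 ? * MON-FRI` reads as second 0, minute 30, hour 15, any day of the month, every month, Monday to Friday. To make the resulting schedule concrete, here is a small sketch that computes the next such moment with `java.time`. It does not use Quartz and knows nothing about public holidays; `WeekdayScheduleSketch` and its methods are hypothetical names.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical illustration of "15:30 on Monday to Friday", independent of Quartz
class WeekdayScheduleSketch {
    static final LocalTime RUN_AT = LocalTime.of(15, 30);

    static boolean isWeekend(DayOfWeek d) {
        return d == DayOfWeek.SATURDAY || d == DayOfWeek.SUNDAY;
    }

    // First Monday-to-Friday 15:30 strictly after the given moment
    static LocalDateTime nextRun(LocalDateTime after) {
        LocalDateTime candidate = after.toLocalDate().atTime(RUN_AT);
        while (!candidate.isAfter(after) || isWeekend(candidate.getDayOfWeek())) {
            candidate = candidate.plusDays(1);
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Friday 2017-09-29 16:00 is past that day's run, so the next one is Monday 2017-10-02 15:30
        System.out.println(nextRun(LocalDateTime.of(2017, 9, 29, 16, 0)));
    }
}
```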
A friendly reminder: this project needs the corresponding jar packages to be imported, or a pom.xml that downloads them automatically.