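Spring Batch is a subproject of the Spring Framework. It is reportedly able to cope with data volumes in the tens of millions.
What is Spring Batch a good fit for?
1. Processing large data sets
2. Automated jobs that need no human intervention
3. Jobs with high reliability requirements
4. Jobs with high performance requirements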
[Figure: Spring Batch sequence diagram]
Here is a simple hands-on Spring Batch example. The use case is a bit of data cleaning.
Versions used: Spring 3.1, Spring Batch 2.1.8, HSQLDB 2.2.9
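SQL:
CREATE TABLE SYS_APPSTORE (
    APP_ID    VARCHAR(20)  NOT NULL,
    PARENT_ID VARCHAR(20),
    APP_DESC  VARCHAR(100) NOT NULL,
    APP_URL   VARCHAR(200),
    FOLDER    BOOLEAN,
    -- Assumption: SEQ was missing from the original DDL, but the paging query
    -- in the Spring configuration selects and sorts by it. The type is a guess.
    SEQ       INTEGER,
    PRIMARY KEY (APP_ID)
);
Java bean: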
package net.dbatch.entity;

import java.io.Serializable;

public class SysAppStore implements Serializable {

    private static final long serialVersionUID = 19890414L;

    private String appId = null;
    private String parentId = null;
    private String appDesc = null;
    private String appURL = null;
    private Boolean folder = null;

    // ...getters and setters...
}
Spring JDBC Mapper:
package net.dbatch.mapper;

import java.sql.ResultSet;
import java.sql.SQLException;

import net.dbatch.entity.SysAppStore;
import org.springframework.jdbc.core.RowMapper;

public class SysAppStoreMapper implements RowMapper<SysAppStore> {

    // Maps one row of SYS_APPSTORE to a SysAppStore bean.
    @Override
    public SysAppStore mapRow(ResultSet resultSet, int i) throws SQLException {
        SysAppStore sysAppStore = new SysAppStore();
        sysAppStore.setAppId(resultSet.getString("APP_ID"));
        sysAppStore.setParentId(resultSet.getString("PARENT_ID"));
        sysAppStore.setAppDesc(resultSet.getString("APP_DESC"));
        sysAppStore.setAppURL(resultSet.getString("APP_URL"));
        sysAppStore.setFolder(resultSet.getBoolean("FOLDER"));
        return sysAppStore;
    }
}
Spring Batch Processor:
package net.dbatch.process;

import net.dbatch.entity.SysAppStore;
import org.springframework.batch.item.ItemProcessor;

public class SysAppStoreProcessor implements ItemProcessor<SysAppStore, SysAppStore> {

    @Override
    public SysAppStore process(SysAppStore item) throws Exception {
        // No real processing here; just print the description.
        System.out.println(item.getAppDesc());
        return item;
    }
}
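The processor above only prints each item. Since the stated goal is data cleaning, a cleaning step might look roughly like the sketch below. It is plain Java with no Spring dependency: the `Processor` interface and the `App` class are hypothetical stand-ins for `ItemProcessor` and `SysAppStore`, and the rules (trim whitespace, drop items whose description is blank) are assumptions chosen purely for illustration. The sketch relies on one documented Spring Batch convention: an `ItemProcessor` that returns null tells the framework to filter that item out, so it never reaches the writer.

```java
import java.util.ArrayList;
import java.util.List;

final class CleaningSketch {

    /** Hypothetical stand-in with the same shape as Spring Batch's ItemProcessor. */
    interface Processor<I, O> {
        O process(I item);
    }

    /** Cut-down, hypothetical version of SysAppStore with only the fields cleaned here. */
    static final class App {
        final String appDesc;
        final String appUrl;

        App(String appDesc, String appUrl) {
            this.appDesc = appDesc;
            this.appUrl = appUrl;
        }
    }

    /** Trims both fields; returns null for a blank description to mark the item as filtered. */
    static final class TrimmingProcessor implements Processor<App, App> {
        @Override
        public App process(App item) {
            String desc = item.appDesc == null ? "" : item.appDesc.trim();
            if (desc.isEmpty()) {
                return null; // assumed rule: an item without a description is dropped
            }
            String url = item.appUrl == null ? null : item.appUrl.trim();
            return new App(desc, url);
        }
    }

    /** Mimics what the framework does with the result: null results are not passed on. */
    static List<App> clean(List<App> input) {
        Processor<App, App> processor = new TrimmingProcessor();
        List<App> output = new ArrayList<>();
        for (App item : input) {
            App cleaned = processor.process(item);
            if (cleaned != null) {
                output.add(cleaned);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<App> input = new ArrayList<>();
        input.add(new App("  ITeye ", " http://www.iteye.com/ "));
        input.add(new App("   ", null));
        for (App app : clean(input)) {
            System.out.println(app.appDesc + " -> " + app.appUrl);
        }
    }
}
```

In the real job the same logic would sit inside `SysAppStoreProcessor.process`, and dropped items would be counted in the step's `filterCount`.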
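Spring Batch Writer:
package net.dbatch.writer;

import java.util.List;

import net.dbatch.entity.SysAppStore;
import org.springframework.batch.item.ItemWriter;

public class SysAppStoreWriter implements ItemWriter<SysAppStore> {

    @Override
    public void write(List<? extends SysAppStore> items) throws Exception {
        for (SysAppStore item : items) {
            // Nothing is persisted here either; just print each item.
            System.out.println(item);
        }
    }
}
Spring configuration (jdbcorm_job.xml):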
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:batch="http://www.springframework.org/schema/batch"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/batch
           http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context-3.0.xsd">

    <context:property-placeholder location="classpath:jdbc.properties" />

    <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="${jdbc.driverClass}" />
        <property name="url" value="${jdbc.url}" />
        <property name="username" value="${jdbc.username}" />
        <property name="password" value="${jdbc.password}" />
    </bean>

    <bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource" />
    </bean>

    <!-- In-memory job repository: job metadata is not persisted to the database -->
    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean" />

    <bean id="sysAppStoreMapper" class="net.dbatch.mapper.SysAppStoreMapper" />

    <bean id="dbReader" class="org.springframework.batch.item.database.JdbcPagingItemReader">
        <property name="dataSource" ref="dataSource" />
        <property name="rowMapper" ref="sysAppStoreMapper" />
        <property name="queryProvider" ref="appQueryProvider" />
    </bean>

    <bean id="appQueryProvider" class="org.springframework.batch.item.database.support.HsqlPagingQueryProvider">
        <property name="selectClause" value="a.APP_ID, a.PARENT_ID, a.APP_DESC, a.APP_URL, a.FOLDER, a.SEQ" />
        <property name="fromClause" value="sys_appstore a" />
        <property name="sortKey" value="SEQ" />
    </bean>

    <bean id="sysAppStoreProcessor" class="net.dbatch.process.SysAppStoreProcessor" />

    <bean id="sysAppStoreWriter" class="net.dbatch.writer.SysAppStoreWriter" />

    <bean id="itemSqlParameterSourceProvider"
          class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />

    <batch:job id="testJdbcBatch">
        <batch:step id="firstCleanStep">
            <batch:tasklet>
                <!-- The empty chunk-completion-policy attribute was dropped; commit-interval alone sets the chunk size -->
                <batch:chunk reader="dbReader" processor="sysAppStoreProcessor"
                             writer="sysAppStoreWriter" commit-interval="5" />
            </batch:tasklet>
        </batch:step>
    </batch:job>
</beans>
Test class:
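import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.core.task.SyncTaskExecutor;

public class JdbcORMJobMain {

    public static void main(String[] args) {
        ApplicationContext context = new ClassPathXmlApplicationContext("jdbcorm_job.xml");

        // Run the job synchronously on the calling thread.
        SimpleJobLauncher launcher = new SimpleJobLauncher();
        launcher.setJobRepository((JobRepository) context.getBean("jobRepository"));
        launcher.setTaskExecutor(new SyncTaskExecutor());

        try {
            JobExecution je = launcher.run(context.getBean("testJdbcBatch", Job.class),
                    new JobParametersBuilder().toJobParameters());
            System.out.println("======================================================================");
            System.out.println(je);
            System.out.println(je.getJobInstance());
            System.out.println(je.getStepExecutions());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Output: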
10-20 09:20:35 INFO [config.PropertyPlaceholderConfigurer] - <Loading properties file from class path resource [jdbc.properties]>
10-20 09:20:35 INFO [support.DefaultListableBeanFactory] - <Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@2dea1ba6: defining beans [org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0,dataSource,transactionManager,jobRepository,sysAppStoreMapper,dbReader,appQueryProvider,sysAppStoreProcessor,sysAppStoreWriter,itemSqlParameterSourceProvider,org.springframework.batch.core.scope.internalStepScope,org.springframework.beans.factory.config.CustomEditorConfigurer,org.springframework.batch.core.configuration.xml.CoreNamespacePostProcessor,firstCleanStep,testJdbcBatch]; root of factory hierarchy>
10-20 09:20:35 INFO [datasource.DriverManagerDataSource] - <Loaded JDBC driver: org.hsqldb.jdbcDriver>
10-20 09:20:35 INFO [support.SimpleJobLauncher] - <Job: [FlowJob: [name=testJdbcBatch]] launched with the following parameters: [{}]>
10-20 09:20:35 INFO [job.SimpleStepHandler] - <Executing step: [firstCleanStep]>
SourceForge
树节点查看
网易163
WEBQQ
ITeye
net.dbatch.entity.SysAppStore@6944da12[appId=11102880045318725,parentId=11102880044233464,appDesc=SourceForge,appURL=http://sourceforge.net/,folder=false]
net.dbatch.entity.SysAppStore@2c1e29ca[appId=11102881323428897,parentId=11102881323057218,appDesc=树节点查看,appURL=index.jsp,folder=false]
net.dbatch.entity.SysAppStore@7049a366[appId=11102880050094388,parentId=11102880049448584,appDesc=网易163,appURL=http://mail.163.com/,folder=false]
net.dbatch.entity.SysAppStore@7286b721[appId=11102880048511704,parentId=11102880047038128,appDesc=WEBQQ,appURL=http://web.qq.com/,folder=false]
net.dbatch.entity.SysAppStore@6a611244[appId=11102880047497417,parentId=11102880047240743,appDesc=ITeye,appURL=http://www.iteye.com/,folder=false]
社区
Intel
IBM
微软
软件公司
net.dbatch.entity.SysAppStore@30f224d9[appId=11102880047038128,parentId=11102880016088125,appDesc=社区,appURL=,folder=true]
net.dbatch.entity.SysAppStore@69513ba9[appId=11102880041502775,parentId=11102880041300615,appDesc=Intel,appURL=http://www.intel.com/,folder=false]
net.dbatch.entity.SysAppStore@54240a43[appId=11102880041149608,parentId=11102880039316139,appDesc=IBM,appURL=http://www.ibm.com/,folder=false]
net.dbatch.entity.SysAppStore@a1ddfdd[appId=11102880040025640,parentId=11102880039316139,appDesc=微软,appURL=http://www.microsoft.com/,folder=false]
net.dbatch.entity.SysAppStore@2f542b5b[appId=11102880039316139,parentId=11102880038314190,appDesc=软件公司,appURL=,folder=true]
国内
分页显示程序
网易126
新浪微博
CSDN
net.dbatch.entity.SysAppStore@e316834[appId=11102880016088125,parentId=Root,appDesc=国内,appURL=,folder=true]
net.dbatch.entity.SysAppStore@4db03533[appId=11102881324298312,parentId=11102881323057218,appDesc=分页显示程序,appURL=powerasapp.jsp,folder=false]
net.dbatch.entity.SysAppStore@6b74cf1d[appId=11102880050404071,parentId=11102880049448584,appDesc=网易126,appURL=http://mail.126.com/,folder=false]
net.dbatch.entity.SysAppStore@41c9b008[appId=11102880049211044,parentId=11102880047038128,appDesc=新浪微博,appURL=http://weibo.com/,folder=false]
net.dbatch.entity.SysAppStore@2043fef6[appId=11102880048200884,parentId=11102880047240743,appDesc=CSDN,appURL=http://www.csdn.net/,folder=false]
开源社区
AMD
硬件公司
Apache
Google
net.dbatch.entity.SysAppStore@100917f0[appId=11102880044233464,parentId=11102880016418917,appDesc=开源社区,appURL=,folder=true]
net.dbatch.entity.SysAppStore@450295c9[appId=11102880042470026,parentId=11102880041300615,appDesc=AMD,appURL=http://www.amd.com/,folder=false]
net.dbatch.entity.SysAppStore@2cb7e284[appId=11102880041300615,parentId=11102880038314190,appDesc=硬件公司,appURL=,folder=true]
net.dbatch.entity.SysAppStore@5c785f0b[appId=11102880045542267,parentId=11102880044233464,appDesc=Apache,appURL=http://www.apache.org/,folder=false]
net.dbatch.entity.SysAppStore@62a7fa9a[appId=11102880040236939,parentId=11102880039316139,appDesc=Google,appURL=http://www.google.com/,folder=false]
腾讯
苹果
苹果
Eclipse
IT学习
net.dbatch.entity.SysAppStore@70630657[appId=11102880035183022,parentId=11102880031124887,appDesc=腾讯,appURL=http://www.qq.com/,folder=false]
net.dbatch.entity.SysAppStore@75357365[appId=11102880040488906,parentId=11102880039316139,appDesc=苹果,appURL=http://www.apple.com/,folder=false]
net.dbatch.entity.SysAppStore@82b2801[appId=11102880043182136,parentId=11102880041300615,appDesc=苹果,appURL=http://www.apple.com/,folder=false]
net.dbatch.entity.SysAppStore@494f5dd7[appId=11102880046118737,parentId=11102880044233464,appDesc=Eclipse,appURL=http://eclipse.org/,folder=false]
net.dbatch.entity.SysAppStore@7999f3da[appId=11102880047240743,parentId=11102880016088125,appDesc=IT学习,appURL=,folder=true]
新浪邮箱
测试连接
授权程序
搜狐
摩托罗拉
net.dbatch.entity.SysAppStore@1d984f10[appId=11102880051055401,parentId=11102880049448584,appDesc=新浪邮箱,appURL=http://mail.sina.com.cn/,folder=false]
net.dbatch.entity.SysAppStore@7a6eb29d[appId=11102881323057218,parentId=Root,appDesc=测试连接,appURL=,folder=true]
net.dbatch.entity.SysAppStore@7990a036[appId=11102881325080465,parentId=11102881323057218,appDesc=授权程序,appURL=powerasapptree.jsp,folder=false]
net.dbatch.entity.SysAppStore@6067794[appId=11102880035434221,parentId=11102880031124887,appDesc=搜狐,appURL=http://www.souhu.com/,folder=false]
net.dbatch.entity.SysAppStore@129498a3[appId=11102880044032342,parentId=11102880041300615,appDesc=摩托罗拉,appURL=http://www.motorala.com/,folder=false]
阿里巴巴
Oracle[甲骨文]
邮箱
挂接程序
我的博客
net.dbatch.entity.SysAppStore@6819f939[appId=11102880036079524,parentId=11102880031124887,appDesc=阿里巴巴,appURL=http://www.alibaba.com/,folder=false]
net.dbatch.entity.SysAppStore@1394294[appId=11102880044595761,parentId=11102880039316139,appDesc=Oracle[甲骨文],appURL=http://www.oracle.com/,folder=false]
net.dbatch.entity.SysAppStore@5642032c[appId=11102880049448584,parentId=11102880016088125,appDesc=邮箱,appURL=,folder=true]
net.dbatch.entity.SysAppStore@7de69f2[appId=11102881326070340,parentId=11102881323057218,appDesc=挂接程序,appURL=sysapptree.jsp,folder=false]
net.dbatch.entity.SysAppStore@1afd92e7[appId=11102880052400411,parentId=Root,appDesc=我的博客,appURL=http://zhzhenqin.iteye.com/,folder=false]
10-20 09:20:37 INFO [support.SimpleJobLauncher] - <Job: [FlowJob: [name=testJdbcBatch]] completed with the following parameters: [{}] and the following status: [COMPLETED]>
======================================================================
JobExecution: id=0, version=2, startTime=Sat Oct 20 09:20:35 CST 2012, endTime=Sat Oct 20 09:20:37 CST 2012, lastUpdated=Sat Oct 20 09:20:37 CST 2012, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=0, version=0, JobParameters=[{}], Job=[testJdbcBatch]]
JobInstance: id=0, version=0, JobParameters=[{}], Job=[testJdbcBatch]
[StepExecution: id=1, version=10, name=firstCleanStep, status=COMPLETED, exitStatus=COMPLETED, readCount=35, filterCount=0, writeCount=35 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=8, rollbackCount=0, exitDescription=]
As the output shows, the job reads items and runs them through the Processor one at a time, then hands them to the Writer together as a List of 5 beans in a single call, after which a commit takes place. The chunk size of 5 comes from this setting:
commit-interval="5"
Even this small example shows that Spring Batch is well architected. I will follow up with more analysis of Spring Batch in later posts.
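To make the chunk behaviour seen in the output concrete, here is a simplified plain-Java model of a chunk loop. It is a sketch under stated assumptions, not Spring Batch's actual implementation: the class `ChunkSimulation`, its `Result` counters, and the `trim()` call standing in for the processor are all hypothetical. Items are read one at a time up to the commit interval, each is processed individually, the whole chunk is written as one list, and one commit is counted per chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ChunkSimulation {

    /** Counters named after the fields in the StepExecution summary. */
    static final class Result {
        int readCount;
        int writeCount;
        int commitCount;
    }

    static Result run(List<String> source, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        Result result = new Result();
        Iterator<String> reader = source.iterator();
        boolean exhausted = false;
        while (!exhausted) {
            // Read phase: pull items one at a time until the chunk is full or input ends.
            List<String> inputs = new ArrayList<>();
            while (inputs.size() < commitInterval) {
                if (!reader.hasNext()) {
                    exhausted = true;
                    break;
                }
                inputs.add(reader.next());
                result.readCount++;
            }
            // Process phase: each item goes through the processor individually.
            List<String> outputs = new ArrayList<>();
            for (String item : inputs) {
                outputs.add(item.trim()); // hypothetical stand-in for the processing step
            }
            // Write phase: the whole chunk is handed over as one list.
            if (!outputs.isEmpty()) {
                System.out.println("write " + outputs);
                result.writeCount += outputs.size();
            }
            // One commit per chunk, including a final short or empty one.
            result.commitCount++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 1; i <= 35; i++) {
            items.add("item-" + i);
        }
        Result r = run(items, 5);
        System.out.println("readCount=" + r.readCount
                + ", writeCount=" + r.writeCount
                + ", commitCount=" + r.commitCount);
    }
}
```

With 35 items and an interval of 5, this model counts 7 full chunks plus one final empty chunk in which the reader signals the end of the data, which matches the `readCount=35, writeCount=35, commitCount=8` reported in the step summary.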