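I. Batch Processing
In projects such as bank, insurance, and retail store systems, operations like reconciliation, checkout settlement, accounting, and end-of-day closing frequently involve "batch" jobs.
These batch jobs usually process large volumes of data. A batch of inserts, deletes, updates, and queries runs together, often spanning several tables: reading data here, writing data there, and calling stored procedures.
Batch processing tends to be time- and resource-intensive and is often designed with multithreading. When handled poorly, it can cause memory leaks, out-of-memory errors, CPU usage as high as 99%, and severely blocked servers.
I once went through three rounds of optimization on a single batch job. Following the principle of a database connection pool, I implemented a thread pool so that the number of threads could be set dynamically and idle threads could be returned to the pool. Along the way I added caching and the like, and in the end I even considered load balancing.
Although the final result was fairly satisfactory, the constant optimization of the program and its algorithms took a great deal of time, especially once load balancing came into the picture. As you know, when a web or app container reaches its limit, you can raise overall capacity by clustering and adding nodes. A batch job, however, is usually a standalone application, and turning an application into a cluster that scales by adding nodes is no simple task, is it?
I'm not saying it can't be done. Hand-coding socket communication with an MQ mechanism plus a database queue is a fairly common design that gives an application cluster-like capabilities when processing large volumes of data and large transactions.
However, large commercial clients, especially in banking, insurance, large-scale retail, and telecom, already use packaged product suites. A cluster-capable batch system of your own design is certainly satisfying, but in terms of stability, availability, and maintainability, surely a mature commercial product of the same kind will beat a batch program you wrote yourself?
Because my path as an architect follows the enterprise-architect track, I often have to talk about commercial solutions. I will cover designing a batch system with cluster scalability myself in a later post; today's topic is using a commercial product to integrate your batch processing.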
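II. Commercial Solutions
Consider the requirements for a batch job:
1. It can perform batch processing.
2. Its batch capacity can be increased by adding nodes, effectively clustering.
3. It can rerun jobs after errors.
4. It supports checkpoint/restart. For example, if a 5,000-batch job fails after 2,000 batches, then after I make some adjustments the remaining 3,000 batches continue from where the first 2,000 left off.
5. It offers comprehensive logging, monitoring, pausing (suspending), and scheduled runs (heh, that last one is a bit of a stretch).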
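Requirement 4 (checkpoint/restart) boils down to persisting progress after each unit of work and reading it back on a rerun. The following is a minimal Java sketch, assuming an in-memory map as a stand-in for the database table a real system would use; the class and method names are hypothetical and not part of any IBM API.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal checkpoint/restart sketch; all names are hypothetical. */
class CheckpointRestartSketch {

    /** Stand-in for a persistent checkpoint table: jobId -> number of completed batches. */
    static final Map<String, Integer> checkpointStore = new HashMap<>();

    /**
     * Processes batches [checkpoint, totalBatches) and records progress after each batch.
     * If failAtBatch >= 0, throws when that batch is reached, simulating a crash.
     * Returns the number of batches processed in this run.
     */
    static int run(String jobId, int totalBatches, int failAtBatch) {
        int start = checkpointStore.getOrDefault(jobId, 0); // resume point from the last run
        int processed = 0;
        for (int batch = start; batch < totalBatches; batch++) {
            if (batch == failAtBatch) {
                throw new IllegalStateException("simulated failure at batch " + batch);
            }
            // ... the real work for this batch would go here ...
            processed++;
            checkpointStore.put(jobId, batch + 1); // record progress only after the batch succeeds
        }
        return processed;
    }

    public static void main(String[] args) {
        try {
            run("daily-close", 5000, 2000); // fails after 2,000 batches have completed
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("checkpoint = " + checkpointStore.get("daily-close")); // 2000
        int rest = run("daily-close", 5000, -1); // the rerun resumes at batch 2000
        System.out.println("processed on restart = " + rest); // 3000
    }
}
```

In a real system the batch's work and its checkpoint update should commit in the same transaction; otherwise a crash between the two could make a rerun repeat or skip a batch.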
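Let's first think about these requirements the traditional way, designing the batch system by hand. At a minimum we would need the following:
1. A thread pool, written as I described above on the principle of a JDBC connection pool. Even a fast developer would need at least two weeks for that, right?
2. JMS or an MQ mechanism, so that the program can communicate between nodes and thereby balance load.
3. Persistence of task records to the database, which enables error logging, checkpoints, and reruns.
4. A Quartz-like component to implement scheduled and periodic batch runs.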
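For item 1, note that the JDK's java.util.concurrent package already offers a pool whose size can change at runtime, which may save much of the hand-written effort. This is a minimal sketch, assuming that summing record ids stands in for real per-record work; ResizablePoolSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizablePoolSketch {

    /** Changes the worker count of a running pool; max must be raised before core when growing. */
    static void resize(ThreadPoolExecutor pool, int threads) {
        if (threads > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(threads);
            pool.setCorePoolSize(threads);
        } else {
            pool.setCorePoolSize(threads);
            pool.setMaximumPoolSize(threads);
        }
    }

    /** Sums record ids chunk by chunk; the sum stands in for real per-record work. */
    static long processChunks(int chunks, int chunkSize, int initialThreads, int resizedThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(initialThreads, initialThreads,
                30, TimeUnit.SECONDS, new LinkedBlockingQueue<>()); // keep-alive for threads above core
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int c = 0; c < chunks; c++) {
                final long first = (long) c * chunkSize;
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (long id = first; id < first + chunkSize; id++) {
                        sum += id;
                    }
                    return sum;
                }));
            }
            // Growing the core size starts extra workers for chunks still waiting in the queue.
            resize(pool, resizedThreads);
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // workers exit once the queue is drained
        }
    }

    public static void main(String[] args) {
        System.out.println(processChunks(100, 1000, 2, 8)); // 4999950000
    }
}
```

Threads in a ThreadPoolExecutor are reused across tasks, which covers the "return idle threads to the pool" behavior the hand-written pool was built for.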
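Well, these requirements alone are enough for an entire project, aren't they?
And let me tell you, these are only the most basic requirements for a commercial-grade batch job. The complete requirements from my past projects would fill more than ten pages (the requirements section alone).
Do you know how large clients such as banks and insurers think about this?
First, if you tell them you need about a month for this batch job (including testing, which is already very fast), they will tell you they can give you only one or two weeks (at most).
The client thinks batch processing is nothing more than input -> process -> output; how complicated can the processing be? Heh, clients generally consider their requirements simple and believe vendors like to ask for extra time so they can bill more labor.
But note, readers: the client isn't wrong. Broadly speaking, batch processing really is read -> handle -> write. Whether the read involves a database, a file, and a socket, to the client it is just one read; only we programmers know the pain involved.
The good news is that mature commercial suites now implement all of the requirements above and let programmers focus only on: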
- how to read
- how to handle(your own logic)
- how to write
That's all; the suite takes care of everything else.
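The read/handle/write split can be sketched as three small interfaces plus a driver loop that owns everything else. These interfaces are hypothetical illustrations, not Compute Grid's actual reader, processor, or writer contracts (the real ones appear in the EchoReader, Echo, and EchoWriter sources later in this post).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Hypothetical read/handle/write contracts with a generic driver loop. */
class ReadHandleWrite {

    interface RecordReader<T> { T next(); }                // returns null when the input is exhausted
    interface RecordHandler<T, R> { R handle(T record); }  // your own business logic
    interface RecordWriter<R> { void write(R record); }

    /** The "everything else" part: drives the loop and counts the records processed. */
    static <T, R> int run(RecordReader<T> reader, RecordHandler<T, R> handler, RecordWriter<R> writer) {
        int count = 0;
        for (T record = reader.next(); record != null; record = reader.next()) {
            writer.write(handler.handle(record));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Iterator<String> input = Arrays.asList("a", "b", "c").iterator();
        List<String> output = new ArrayList<>();
        RecordReader<String> reader = () -> input.hasNext() ? input.next() : null;
        RecordHandler<String, String> handler = s -> s.toUpperCase(Locale.ROOT);
        RecordWriter<String> writer = output::add;
        int n = run(reader, handler, writer);
        System.out.println(n + " " + output); // 3 [A, B, C]
    }
}
```

A commercial suite adds checkpointing, restart, scheduling, and distribution around such a loop, which is why only the three callbacks are left to the developer.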
On the open-source side there is Spring Batch, although Spring Batch is going the cloud-computing route; among commercial products, the best known is IBM WebSphere Compute Grid.
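III. Introduction to IBM WebSphere Compute Grid
3.1 Basic Concepts
Note that the name is Compute, not Computer; don't mix them up.
IBM WebSphere Compute Grid, abbreviated WAS XD CG, is an add-on in the IBM WebSphere suite that implements batch jobs on top of WAS ND (WebSphere Application Server Network Deployment).
It builds on WAS ND's capabilities. WAS ND supports grid topologies: it can federate not only individual nodes with one another but also groups of federated nodes, called cells, and cells can in turn be federated with one another.
IBM Compute Grid is built on this foundation, so increasing batch capacity amounts to deploying one more WAS ND node or WAS ND cell.
So one concept to keep in mind here: it is based on WAS ND.
The IBM Compute Grid installation must also match the corresponding WAS ND version exactly; even a difference in the minor version number will prevent installation. Below is the version correspondence between IBM Compute Grid and WAS products that I arrived at after testing each WAS release with IBM Compute Grid:
Of course, if you are on a tight budget and have only WAS rather than WAS ND, you can also install IBM Compute Grid on WAS, but be sure to download the IBM Compute Grid version that my correspondence table specifies.
3.2 How IBM Compute Grid Works
The diagram is fairly complex but not hard to understand; focus on the key points.
- The first key point is a customizable job flow. Once a batch job is written, it has a sequence of steps, say a, b, c, and d. Today the execution order is a->b->c->d, but next time it might be a->c->d->b, right?
This step sequence, or flow, can be customized. Don't tell me that because the step order changed, you go and modify your code and your if-else branches; that is the traditional mindset.
This is a commercial product, after all, and what it emphasizes is:
rapid delivery of business functions, business rules defined directly by business staff, business logic turned directly into IT assets and brought to market quickly, and vertical business-solution capabilities.
Firmly remember the list above: it is the goal you should strive toward as a future enterprise architect. Just grinding out code gets you nowhere; the philosophy matters just as much.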
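- Parallel job processing: as I said, you can increase your job throughput simply by adding a node.
- A job flow is defined much like a web service, with a descriptive document. Upload this descriptive document (usually an XML file), i.e. the job description, to the IBM Compute Grid job center, and you get scheduling, reruns, checkpoint restarts, and much more.
- Batch jobs also become part of the "enterprise IT assets" (IT ASSET). In WAS, a typical IT asset is an .ear deployment package; that is, from now on our batch job is an .ear package deployed into WAS.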
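The idea of a customizable job flow can be illustrated by keeping the step order in a descriptor string instead of in if-else logic. This minimal sketch assumes a comma-separated order such as "a,c,d,b"; in Compute Grid the order lives in the xJCL, and the names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: the step order comes from configuration, not from if/else in code. */
class ConfigurableSteps {

    /** Runs the named steps in the order given by a descriptor such as "a,c,d,b". */
    static List<String> run(Map<String, Runnable> steps, String order) {
        List<String> executed = new ArrayList<>();
        for (String name : order.split(",")) {
            String key = name.trim();
            Runnable step = steps.get(key);
            if (step == null) {
                throw new IllegalArgumentException("unknown step: " + key);
            }
            step.run();
            executed.add(key);
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, Runnable> steps = new LinkedHashMap<>();
        for (String name : new String[] {"a", "b", "c", "d"}) {
            steps.put(name, () -> System.out.println("running step " + name));
        }
        run(steps, "a,b,c,d"); // today's order
        run(steps, "a,c,d,b"); // next run: only the descriptor changes, not the code
    }
}
```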
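Here is the component diagram of IBM Compute Grid:
xJCL here is an XML job description file; the batch job it describes is deployed into WAS as EJBs packaged in an .ear.
Isn't this similar to a web service? A web service's entry point is called an endpoint, right? Compute Grid has an endpoint too, called the grid execution endpoint (GEE).
The GEE coordinates work and implements clustering through the PJM (Parallel Job Manager), and all of this is deployed on WAS.
Here is the job logic flow diagram of IBM Compute Grid: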
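IV. Installing, Developing, and Deploying with IBM Compute Grid
4.1 Installing IBM Compute Grid
In this example we use WAS 7.x + IBM Compute Grid 6.1.x, both available as trial versions from IBM's website; you can also test with the WAS ND 8.0 trial and IBM Compute Grid v8.0.
We need to install three things:
- IBM WAS 7.0
- IBM Compute Grid v6114
- The IBM Compute Grid development framework. Using Ant, it walks you step by step (ANT STEP1, ANT STEP2, ANT STEP3) through generating your xJCL file, building the EAR, and deploying it automatically. This simplifies development so that developers only have to focus on writing their reader, handler, and writer.
4.1.1 Installing WAS
After installing WAS, don't forget to create a profile.
Next, patch WAS. Before patching, remember to install IBM WAS 7.0.0.23 Updater first.
Once it is installed, choose this option from the Start menu to apply the latest WAS 7.x fix pack (download it from IBM's website):
After patching, you can install IBM Compute Grid.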
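4.1.2 Installing IBM Compute Grid
Download this package from IBM's website:
During installation it automatically detects the profile you created in WAS. A WAS profile usually contains an AppServer, right? See, it is detected automatically:
In the following step, first change the automatically checked AppServer of the detected WAS profile to unchecked; that is, we do not want the Compute Grid installer to augment (extend) our node automatically:
Click Next repeatedly until the installation starts.
When the installation finishes, do not check the "start my first step to validation" option; with it cleared, you can click the "Finish" button directly.
After installation, upgrade IBM Compute Grid v6114 to 6115, again with IBM WAS Updater. First download this fix pack from IBM's website:
Click Next.
Then select the IBM Compute Grid 6115 fix pack you just downloaded (a .pak file) in the Update Installer and apply it.
When all of the steps above are complete, open the IBM WAS menu from the Start menu and select this item:
The following window appears:
Click the "launch profile management tool" button.
Select the AppServer in the profile you want to augment, click the "Augment" button, and then click Next.
Click Next.
Finally, click "Augment" to start installing the IBM Compute Grid components into our IBM WAS profile.
At this final screen, follow my settings: leave every option unchecked and click "Finish" to complete the installation.
You will now see that, besides the profile you originally created, WAS shows one more entry:
From it you can open the IBM Compute Grid job management console.
This means IBM Compute Grid has been installed correctly in our WAS.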
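4.2 Developing Our First Batch Program on IBM Compute Grid
The program is very simple:
- Read a text file of several hundred thousand lines.
- Apply a simple transformation: map the columns of each line, following the structure of the database table, into a Java bean whose fields correspond one-to-one to the table's columns.
- Write those several hundred thousand lines into our database.
Below is the flow of our batch job; simple enough, right?
4.2.1 Downloading the IBM Compute Grid Batch Development Framework v.6.1.1.1.1
You can download it from IBM's website at the following address:
I will also upload it to the resources section of my blog later.
This is an Eclipse project that can be imported directly into an Eclipse workspace.
The text file we will read is input.txt in the data directory, with the following content: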
Date,Open,High,Low,Close,Volume,Adj Close
2008-02-19,106.94,107.62,104.64,105.00,7376400,105.00
2008-02-15,105.27,106.25,105.00,106.16,6235600,106.16
2008-02-14,107.94,108.50,105.50,106.13,7340600,106.13
2008-02-13,107.13,108.93,106.80,108.42,6608200,108.42
2008-02-12,105.16,107.33,104.70,106.53,7650200,106.53
2008-02-11,103.05,105.55,102.87,105.14,6098300,105.14
2008-02-08,102.19,103.67,102.07,103.27,6085600,103.27
2008-02-07,102.89,104.00,100.60,102.34,11255800,102.34
2008-02-06,105.05,106.49,103.58,103.59,8265700,103.59
2008-02-05,107.06,108.05,104.68,105.02,9048900,104.62
2008-02-04,108.67,109.00,107.23,107.93,5985500,107.52
2008-02-01,107.16,109.40,105.86,109.08,8047100,108.66
2008-01-31,104.21,107.97,103.70,107.11,9054600,106.70
2008-01-30,105.85,107.65,104.86,105.65,7431100,105.25
2008-01-29,105.50,106.80,104.60,106.10,6616200,105.70
2008-01-28,104.44,105.77,103.83,104.98,7858500,104.58
In all, I created roughly 600,000 lines.
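The reader and writer code in this post refers to an EchoDataBean class and a StringUtil.convertStrToDate helper whose sources the article does not show. The following is a hypothetical reconstruction, assuming the bean simply splits one CSV line of input.txt into the seven columns that the writer's getters expect; toSqlDate is a stand-in name for convertStrToDate.

```java
import java.sql.Date;

/** Hypothetical reconstruction of EchoDataBean: one parsed line of input.txt. */
class EchoDataBean {
    private final String myDate;
    private final double open;
    private final double high;
    private final double low;
    private final double close;
    private final double volume;
    private final double adjClose;

    /** Parses a line such as "2008-02-19,106.94,107.62,104.64,105.00,7376400,105.00". */
    EchoDataBean(String line) {
        String[] f = line.trim().split(",");
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 columns, got " + f.length + ": " + line);
        }
        myDate = f[0];
        open = Double.parseDouble(f[1]);
        high = Double.parseDouble(f[2]);
        low = Double.parseDouble(f[3]);
        close = Double.parseDouble(f[4]);
        volume = Double.parseDouble(f[5]);
        adjClose = Double.parseDouble(f[6]);
    }

    /** The first line of input.txt is a column header and must be skipped, not parsed. */
    static boolean isHeader(String line) {
        return line.startsWith("Date,");
    }

    /** Stand-in for StringUtil.convertStrToDate: converts "yyyy-mm-dd" to java.sql.Date. */
    static Date toSqlDate(String yyyyMmDd) {
        return Date.valueOf(yyyyMmDd);
    }

    public String getMyDate() { return myDate; }
    public double getOpen() { return open; }
    public double getHigh() { return high; }
    public double getLow() { return low; }
    public double getClose() { return close; }
    public double getVolume() { return volume; }
    public double getAdjClose() { return adjClose; }

    public static void main(String[] args) {
        EchoDataBean bean = new EchoDataBean("2008-02-19,106.94,107.62,104.64,105.00,7376400,105.00");
        // prints: 2008-02-19 open=106.94 volume=7376400.0
        System.out.println(bean.getMyDate() + " open=" + bean.getOpen() + " volume=" + bean.getVolume());
    }
}
```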
I am using a MySQL database here, so I put the MySQL JDBC driver jar into the project's lib directory.
Next, edit the Echo.props file in the props.simulator folder:
# (C) Copyright IBM Corp. 2008 - All Rights Reserved.
# DISCLAIMER:
# The following source code is sample code created by IBM Corporation.
# This sample code is provided to you solely for the purpose of assisting you
# in the use of the product. The code is provided 'AS IS', without warranty or
# condition of any kind. IBM shall not be liable for any damages arising out of your
# use of the sample code, even if IBM has been advised of the possibility of
# such damages.
job-name=Echo
application-name=Echo
#The following property references the WebSphere XD Compute Grid provided batch controller EJB
#when run in the batch simulator, this actually specifies a pojo wrapper class to the batch step.
#When you deploy this to a batch container running within an application server, this JNDI name
#has to be updated to reference the controller EJB for this step (which is generated for you by
#the batch packager).
controller-jndi-name=ejb/com/ibm/ws/batch/EchoBatchController
##################################################################
# The utilityjars property specifies libraries required by
# this job.
#
# NOTE: this property is used only by the WSBatchPackager utility,
# which is used to create an ear file for deploying this
# batch application.
#
utilityjars=../lib/batchframework.jar;../lib/ibmjzos-1.4.jar
checkpoint-algorithm=com.ibm.wsspi.batch.checkpointalgorithms.RecordbasedBase
checkpoint-algorithm-prop.recordcount=1000
#Input Stream declarations
bds.inputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteReader
bds-prop.inputStream.PATTERN_IMPL_CLASS=com.ibm.batch.streams.inputstreams.EchoReader
bds-prop.inputStream.FILENAME=${echo.data}/input.txt
bds-prop.inputStream.debug=false
bds-prop.inputStream.EnablePerformanceMeasurement=false
bds-prop.inputStream.EnableDetailedPerformanceMeasurement=false
#data transformation declarations
batch_bean-name=IVTStep1
batch-bean-jndi-name=ejb/GenericXDBatchStep
batch-step-class=com.ibm.websphere.batch.devframework.steps.technologyadapters.GenericXDBatchStep
#batch-bean-jndi-name=ejb/com.ibm.websphere.batch.devframework.steps.technologyadapters.GenericXDBatchStep
prop.BATCHRECORDPROCESSOR=com.ibm.batch.steps.Echo
prop.debug=false
prop.EnablePerformanceMeasurement=false
prop.EnableDetailedPerformanceMeasurement=false
#Output stream declarations
#bds.outputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteWriter
bds.outputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.LocalJDBCWriter
bds-prop.outputStream.PATTERN_IMPL_CLASS=com.ibm.batch.streams.outputstreams.EchoWriter
#oracle
#bds-prop.outputStream.jdbc_url=jdbc:oracle:thin:@localhost:1521:ymkorcl
#bds-prop.outputStream.jdbc_driver=oracle.jdbc.OracleDriver
#bds-prop.outputStream.userid=ymk
#bds-prop.outputStream.pswd=password_1
bds-prop.outputStream.jdbc_url=jdbc:mysql://localhost:3306/eltdb?useUnicode=true&characterEncoding=utf8
bds-prop.outputStream.jdbc_driver=com.mysql.jdbc.Driver
bds-prop.outputStream.userid=elt
bds-prop.outputStream.pswd=password_1
bds-prop.outputStream.tablename=t_grid_output_test
bds-prop.outputStream.FILENAME=${echo.data}/output.txt
bds-prop.outputStream.AppendJobIdToFileName=false
bds-prop.outputStream.EnablePerformanceMeasurement=false
bds-prop.outputStream.EnableDetailedPerformanceMeasurement=false
bds-prop.outputStream.debug=false
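Judging by the generated xJCL and the comment in EchoReader.initialize, each bds-prop.<stream>.<key> entry in Echo.props ends up as a plain <key> in the Properties object that the stream's initialize(Properties props) method receives (EchoReader and EchoWriter read tablename this way). The following sketch mimics that grouping and the ${echo.data} substitution; it is an assumption about the framework's behavior, not its actual code, and the /opt/batch/data path is a made-up value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of splitting a flat descriptor into per-stream Properties. */
class StreamPropsSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)}");

    /** Collects "bds-prop.<stream>.<key>" entries as "<key>", with ${var} references substituted. */
    static Properties streamProps(Properties all, String stream, Map<String, String> vars) {
        String prefix = "bds-prop." + stream + ".";
        Properties result = new Properties();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                result.setProperty(name.substring(prefix.length()),
                        substitute(all.getProperty(name), vars));
            }
        }
        return result;
    }

    /** Replaces ${name} with its value from vars; unknown variables are left unchanged. */
    static String substitute(String value, Map<String, String> vars) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = vars.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties all = new Properties();
        all.load(new StringReader(
                "bds-prop.inputStream.FILENAME=${echo.data}/input.txt\n"
                + "bds-prop.inputStream.debug=false\n"
                + "bds-prop.outputStream.tablename=t_grid_output_test\n"));
        Properties input = streamProps(all, "inputStream",
                Collections.singletonMap("echo.data", "/opt/batch/data"));
        System.out.println(input.getProperty("FILENAME")); // /opt/batch/data/input.txt
    }
}
```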
Note these lines:
#Input Stream declarations
bds.inputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteReader
bds-prop.inputStream.PATTERN_IMPL_CLASS=com.ibm.batch.streams.inputstreams.EchoReader
bds-prop.inputStream.FILENAME=${echo.data}/input.txt
bds.outputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.LocalJDBCWriter
bds-prop.outputStream.PATTERN_IMPL_CLASS=com.ibm.batch.streams.outputstreams.EchoWriter
bds-prop.outputStream.jdbc_url=jdbc:mysql://localhost:3306/eltdb?useUnicode=true&characterEncoding=utf8
bds-prop.outputStream.jdbc_driver=com.mysql.jdbc.Driver
bds-prop.outputStream.userid=elt
bds-prop.outputStream.pswd=password_1
From this properties file we can see that our program consists mainly of three classes:
- com.ibm.batch.streams.inputstreams.EchoReader
- com.ibm.batch.streams.outputstreams.EchoWriter
- com.ibm.batch.steps.Echo
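These three classes are invoked independently of one another; they are tied together only by the descriptions in Echo.props:
Step 1: read the data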
#Input Stream declarations
bds.inputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteReader
bds-prop.inputStream.PATTERN_IMPL_CLASS=com.ibm.batch.streams.inputstreams.EchoReader
bds-prop.inputStream.FILENAME=${echo.data}/input.txt
bds-prop.inputStream.debug=false
bds-prop.inputStream.EnablePerformanceMeasurement=false
bds-prop.inputStream.EnableDetailedPerformanceMeasurement=false
#data transformation declarations
batch_bean-name=IVTStep1
batch-bean-jndi-name=ejb/GenericXDBatchStep
batch-step-class=com.ibm.websphere.batch.devframework.steps.technologyadapters.GenericXDBatchStep
#batch-bean-jndi-name=ejb/com.ibm.websphere.batch.devframework.steps.technologyadapters.GenericXDBatchStep
Step 2: process the data
prop.BATCHRECORDPROCESSOR=com.ibm.batch.steps.Echo
prop.debug=false
prop.EnablePerformanceMeasurement=false
prop.EnableDetailedPerformanceMeasurement=false
Step 3: write the data
#Output stream declarations
#bds.outputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteWriter
bds.outputStream=com.ibm.websphere.batch.devframework.datastreams.patterns.LocalJDBCWriter
bds-prop.outputStream.PATTERN_IMPL_CLASS=com.ibm.batch.streams.outputstreams.EchoWriter
That is how the steps are defined. Following this layout, you can define your own steps and freely adjust their order in the properties file.
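Because the descriptor names classes such as com.ibm.batch.steps.Echo by their fully qualified names, a driver can instantiate them by reflection, which is what lets the three classes stay decoupled. The following sketch assumes a hypothetical RecordProcessor interface in place of the framework's BatchRecordProcessor; it shows the general technique, not how Compute Grid itself loads classes.

```java
import java.util.Locale;

/** Sketch: instantiate the step class named in configuration, so it can be swapped without recompiling. */
class ReflectiveWiring {

    /** Hypothetical stand-in for the framework's record-processor contract. */
    public interface RecordProcessor { Object processRecord(Object record); }

    /** Pass-through step, in the spirit of the article's Echo class. */
    public static class EchoProcessor implements RecordProcessor {
        public Object processRecord(Object record) { return record; }
    }

    /** An alternative step that could be selected purely through configuration. */
    public static class UpperCaseProcessor implements RecordProcessor {
        public Object processRecord(Object record) { return record.toString().toUpperCase(Locale.ROOT); }
    }

    /** Loads and instantiates the named class, checking that it implements RecordProcessor. */
    static RecordProcessor load(String className) {
        try {
            return Class.forName(className).asSubclass(RecordProcessor.class)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // A real descriptor would hold a literal fully qualified name read from the props file.
        String configured = UpperCaseProcessor.class.getName();
        RecordProcessor p = load(configured);
        System.out.println(p.processRecord("echo")); // ECHO
    }
}
```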
Below is the source code of the three classes com.ibm.batch.streams.inputstreams.EchoReader, com.ibm.batch.streams.outputstreams.EchoWriter, and com.ibm.batch.steps.Echo:
com.ibm.batch.streams.inputstreams.EchoReader
public class EchoReader implements FileReaderPattern, JDBCReaderPattern,
ByteReaderPattern, RecordOrientedDatasetReaderPattern {
protected BDSFWLogger logger;
protected EchoDataHolder echoDataHolder;
// Properties for reading from a JDBC input source
protected String tableNameKey = "tablename";
protected String tableName = "ALG.TIVPWXD0";
protected String echoQuery = "select * from ";
// Properties for reading bytes from a Byte or Dataset input source
protected String RecordLengthKey = "EchoReader_record_length";
protected String defaultRecordLength = "80";
protected int recordLength;
protected byte[] buf;
/**
* Initialize method is driven upon object creation. The properties object
* passed in contains the properties specified for the input stream in the
* xJCL.
*/
public void initialize(Properties props) {
logger = new BDSFWLogger(props);
if (logger.isDebugEnabled())
logger.debug("entering EchoReader.initialize()");
// initialize any JDBC properties that have been defined
tableName = props.getProperty(this.tableNameKey, tableName);
echoQuery += tableName;
// -----------------------------------------------------
// initialize any byte reader properties that have been defined.
recordLength = Integer.valueOf(
props.getProperty(RecordLengthKey, defaultRecordLength))
.intValue();
buf = new byte[recordLength];
// -----------------------------------------------------
if (logger.isDebugEnabled())
logger.debug("exiting EchoReader.initialize()");
}
// The fetchHeader method exposes any header data that has been parsed for
// this input stream.
public Object fetchHeader() {
// no header data to expose.
return null;
}
// File read methods - FileReaderPattern implementation
// the processHeader method's task is to parse the data in the buffered
// reader and extract any header information that could be used for this
// stream.
// the java io methods of mark() and reset() should be used here to ensure
// that the BufferedReader, upon completion of parsing the header data, is
// positioned
// at the start of the first data record to be processed.
public void processHeader(BufferedReader arg0) throws IOException {
// n/a for this example.
}
// the task of the fetchRecord() method is to parse the raw data and map it
// to the domain object. The java io methods of mark() and reset() should be
// used
// to ensure the buffered reader, upon completion of obtaining the record,
// is at the starting position of the next data record to be processed.
public Object fetchRecord(BufferedReader reader) throws IOException {
String line = reader.readLine();
if (line != null) {
if (logger.isDebugEnabled())
logger.debug("EchoReader.fetchRecord(bufferedReader)- line= "
+ line);
return new EchoDataHolder(line);
} else {
if (logger.isDebugEnabled())
logger.debug("EchoReader.fetchRecord(bufferedReader)- returning null");
return null;
}
}
// -----------------------------------------------------
// JDBC read methods - JDBCReaderPattern implementation
// the task for getInitialLookupQuery is to return the SQL query to be
// executed the first time this job is every invoked. Note, a seperate
// method is used
// for obtaining the query that should be executed if the job is restarted.
// For example: select * from table 1;
public String getInitialLookupQuery() {
return echoQuery;
}
// the task for the getRestartQuery method is to create the SQL string that
// should be executed if the job is restarted. The String parameter to the
// method
// contains the data returned in getRestartTokens(). This data was persisted
// by the Batch container on behalf of this stream during a checkpoint.
// Upon restart, that data should be used to determine where to reposition
// the stream to (for example, as arguments for a WHERE clause in your SQL
// query).
// for example, select * from table1 where recordName is between B and Z
public String getRestartQuery(String arg0) {
// TODO Auto-generated method stub
return null;
}
// the task for the getRestartTokens() method is to return any data that
// should be stored during a checkpoint for this stream. The data stored
// would be used
// to reposition the tream upon restart of the job. Think of this as
// returning data that you would need to populate the WHERE clause of an SQL
// query.
// For example, select * from table1 where recordName is between B and Z
public String getRestartTokens() {
// restart logic for echo JDBC reader is not implemented.
return null;
}
// This method maps the columns of a database row to a hashmap.
// The hashmap can get be queried by the record processor for fields, where
// key = column name, value = column value.
public Object fetchRecord(ResultSet resultSet) {
if (logger.isDebugEnabled())
logger.debug("entering EchoReader.fetchRecord(resultSet)");
try {
ResultSetMetaData rsmd = resultSet.getMetaData();
int columnCount = rsmd.getColumnCount();
HashMap<String, Object> dbMap = new HashMap<String, Object>();
for (int i = 1; i <= columnCount; i++) {
String columnName = rsmd.getColumnName(i);
Object columnValue = resultSet.getObject(i);
dbMap.put(columnName, columnValue);
}
if (logger.isDebugEnabled()) {
logger.debug("EchoReader.fetchRecord(resultSet)- dbmap = "
+ dbMap);
logger.debug("exiting EchoReader.fetchRecord(resultSet)");
}
if (dbMap.size() == 0)
return null;
else
return new EchoDataHolder(dbMap);
} catch (Throwable t) {
throw new RuntimeException(t);
}
}
// -----------------------------------------------------
// Byte read methods - ByteReaderPattern implementation
// the processHeader method's task is to parse the data in the
// bufferedInputStream reader and extract any header information that could
// be used for this stream.
// the java io methods of mark() and reset() should be used here to ensure
// that the bufferedInputStream, upon completion of parsing the header data,
// is positioned
// at the start of the first data record to be processed.
public void processHeader(BufferedInputStream arg0) throws IOException {
// n/a for this example
}
// the task of the fetchRecord() method is to parse the raw data and map it
// to the domain object. The java io methods of mark() and reset() should be
// used
// to ensure the buffered reader, upon completion of obtaining the record,
// is at the starting position of the next data record to be processed.
public Object fetchRecord(BufferedInputStream reader) throws IOException {
buf = new byte[recordLength];
int nread = reader.read(buf);
if (logger.isDebugEnabled())
logger.debug("fetchRecord(bufferedInputStream)\nbuf: " + buf
+ "\nnread=" + nread);
if (nread > 0) {
String inputStr = (new String(buf, "utf-8")).trim();
// return new EchoDataHolder(buf, nread);
return new EchoDataBean(inputStr);
} else {
return null;
}
}
// -----------------------------------------------------
// MVS Dataset read methods - RecordOrientedDatasetReaderPattern
// implementation
// the processHeader method's task is to parse the data in the
// bufferedInputStream reader and extract any header information that could
// be used for this stream.
// the java io methods of mark() and reset() should be used here to ensure
// that the bufferedInputStream, upon completion of parsing the header data,
// is positioned
// at the start of the first data record to be processed.
public void processHeader(ZFile arg0) throws IOException {
// n/a for this example
}
// the task of the fetchRecord() method is to parse the raw data and map it
// to the domain object. The io methods of mark() and reset() should be used
// to ensure the buffered reader, upon completion of obtaining the record,
// is at the starting position of the next data record to be processed.
public Object fetchRecord(ZFile reader) throws IOException {
System.out.println("========>read record ZFile reader");
byte[] buf = new byte[reader.getLrecl()];
int nread = reader.read(buf);
if (logger.isDebugEnabled())
logger.debug("fetchRecord(zfile)\nbuf: " + buf + "\nnread=" + nread);
if (nread > 0) {
if (logger.isDebugEnabled())
logger.debug("nread is > 0, returning object");
return new EchoDataHolder(buf, nread);
} else {
if (logger.isDebugEnabled())
logger.debug("nread is < 0, returning null");
return null;
}
}
// -----------------------------------------------------
}
com.ibm.batch.streams.outputstreams.EchoWriter
public class EchoWriter implements FileWriterPattern, ByteWriterPattern,
JDBCWriterPattern, RecordOrientedDatasetWriterPattern {
protected BDSFWLogger logger;
protected EchoDataHolder echoDataHolder;
protected String jobid;
protected String jobIdKey = "JobStepId";
protected int counter = 0;
// Properties for writing to a JDBC output source
protected String tableNameKey = "tablename";
protected String tableName = "t_grid_output_test";
protected String sqlQueryPreTablename = "insert into ";
/* oracle insert sql*/
//protected String tableValues = "(pk_id, my_date, open,high,low,close,volume,adj_close)";
//protected String sqlQueryPostTablename = " values (seq_test_output_id.nextval, ?, ?, ?, ?, ?, ?,?)";
/* mysql insert sql*/
protected String tableValues = "(my_date, open,high,low,close,volume,adj_close)";
protected String sqlQueryPostTablename = " values (?, ?, ?, ?, ?, ?,?)";
public void initialize(Properties props) {
logger = new BDSFWLogger(props);
jobid = props.get(this.jobIdKey).toString();
if (logger.isDebugEnabled())
logger.debug("EchoWriter.initialize()");
// initialize any JDBC properties that have been defined
tableName = props.getProperty(this.tableNameKey, tableName);
}
// -----------------------------------------------
// File write methods - FileWriterPattern implementation
// the task for the writeHeader(bufferedWriter) method is to write any
// header data to the output stream prior to writing the output data
// records.
// This method is only called once, which is upon initialization of the
// first execution of this job. This means the header will not be written
// again
// if the job is restarted.
public void writeHeader(BufferedWriter arg0) throws IOException {
// n/a, no header data to write.
}
// the task for the writeHeader(bufferedWriter) method is to write the
// header object passed to this stream
// to the output stream prior to writing the output data records.
// This method is only called once, which is upon initialization of the
// first execution of this job. This means the header will not be written
// again
// if the job is restarted.
public void writeHeader(BufferedWriter arg0, Object arg1)
throws IOException {
}
// the task for the writeRecord method is to write the processed domain
// object to the output stream.
public void writeRecord(BufferedWriter out, Object record)
throws IOException {
if (counter != 0) {
out.newLine();
}
counter++;
if (logger.isDebugEnabled())
logger.debug("EchoWriter.writeRecord(BufferedWriter)- record= "
+ record);
out.write(record.toString());
}
// ------------------------------------------------------
// byte writing methods - ByteWriterPattern implementation
// The task for the writeHeader(bufferedOutputStream) method is to write any
// header data to the output stream prior to writing the output data bytes.
// This method is only called once, which is upon initialization of the
// first execution of this job. This means the header will not be written
// again
// if the job is restarted.
public void writeHeader(BufferedOutputStream arg0) throws IOException {
// n/a, no header data to write.
}
// the task for the writeRecord method is to write the processed domain
// object to the output stream.
public void writeHeader(BufferedOutputStream arg0, Object arg1)
throws IOException {
}
// the task for the writeRecord method is to write the processed domain
// object to the output stream.
public void writeRecord(BufferedOutputStream out, Object record)
throws IOException {
EchoDataHolder holder = ((EchoDataHolder) record);
if (logger.isDebugEnabled())
logger.debug("writeRecord(bufferedOutputStream)\nbuf: " + holder
+ "\nnread=" + holder.nread);
out.write(holder.getByteData(), 0, holder.nread);
}
// ------------------------------------------------------
// JDBC writing methods - JDBCWriterPattern implementation
// the task for the getSQLQuery method is to return an SQL string that is
// will be used to store this domain object in the database.
public String getSQLQuery() {
String sqlQuery = sqlQueryPreTablename + tableName + tableValues
+ sqlQueryPostTablename;
// System.out.println("sqlQuery====" + sqlQuery);
return sqlQuery;
}
// the task for thie writeRecord(pstmt, record) method is to map the domain
// object to the prepared statement. The bds framework then manages
// executing that
// that prepared statement for you (because then we can do things like JDBC
// batching).
public PreparedStatement writeRecord(PreparedStatement pstmt, Object record) {
try {
if (logger.isDebugEnabled()) {
logger.debug("EchoWriter.writeRecord(PreparedStatement)- record= "
+ record);
}
EchoDataBean echoData = (EchoDataBean) record;
// System.out.println("mydate===" + echoData.getMyDate() + " open==="
// + echoData.getOpen());
pstmt.setDate(1, StringUtil.convertStrToDate(echoData.getMyDate()));
pstmt.setDouble(2, echoData.getOpen());
pstmt.setDouble(3, echoData.getHigh());
pstmt.setDouble(4, echoData.getLow());
pstmt.setDouble(5, echoData.getClose());
pstmt.setDouble(6, echoData.getVolume());
pstmt.setDouble(7, echoData.getAdjClose());
counter++;
} catch (Throwable t) {
throw new RuntimeException(t);
}
return pstmt;
}
// -------------------------------------------------------
// writing bytes to a fixed-block MVS dataset -
// RecordOrientedDatasetWriterPattern implementation
// The task for the writeHeader(ZFile) method is to write any header data to
// the output stream prior to writing the output data bytes.
// This method is only called once, which is upon initialization of the
// first execution of this job. This means the header will not be written
// again
// if the job is restarted.
public void writeHeader(ZFile arg0) throws IOException {
// n/a, no header data to write.
}
// the task for the writeRecord method is to write the processed domain
// object to the output stream.
public void writeHeader(ZFile arg0, Object arg1) {
// no header data to write.
}
// the task for the writeRecord method is to write the processed domain
// object to the output stream.
public void writeRecord(ZFile out, Object record) throws IOException {
EchoDataHolder holder = ((EchoDataHolder) record);
if (logger.isDebugEnabled())
logger.debug("writeRecord(zFile)\nbuf: " + holder + "\nnread="
+ holder.nread);
out.write(holder.getByteData(), 0, holder.nread);
}
}
com.ibm.batch.steps.Echo
public class Echo implements BatchRecordProcessor {
protected BDSFWLogger logger;
Integer time= 0;
Integer count= 0;
// this method is called once by GenericXDBatchStep.initializeJobStep()
public void initialize(Properties arg0) {
logger = new BDSFWLogger(arg0);
if (logger.isDebugEnabled())
logger.debug("initialize.");
}
// this method is called repeatedly by GenericXDBatchStep.processJobStep()
public Object processRecord(Object domainObject) throws Exception {
if (logger.isDebugEnabled())
logger.debug("processing record: " + domainObject);
// Since this is an echo step, just return the domain object that was passed in.
return domainObject;
}
// this method is called once by GenericXDBatchStep.destroyJobStep()
public int completeProcessing() {
// nothing to clean up for an echo step
if (logger.isDebugEnabled())
logger.debug("completed processing.");
return 0;
}
}
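4.3 Deploying the Batch Job to IBM WAS Compute Grid
4.3.1 Debugging Before Deployment
Usually we locate our "handler", right-click it in Eclipse, choose Run As, and enter com.ibm.websphere.batch.BatchSimulator as the main class to test it.
4.3.2 Generating the Deployment Package
Once everything has been debugged in Eclipse, we can start generating our deployment package.
Step 1: WAS settings
Open WASConfig in the script.ant.config folder and set the connection parameters of the WAS you are deploying to: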
<project name="WASConfig" default="checkIfHomeSet">
<property name="WAS_profile_home_default" value="D:/IBM/WebSphere/AppServer/profiles/AppSrv01" />
<property name="WAS_appserver_host_default" value="localhost" />
<property name="WAS_appserver_port_default" value="9082" />
<property name="WAS_adminserver_host_default" value="${WAS_appserver_host_default}" />
<property name="WAS_adminserver_port_default" value="unset" />
<property name="deploymentTarget_default" value=""/>
<property name="wsadminScript_default" value="${basedir}/../script.wsadmin/installApp.py" />
<target name="setDefaults">
<condition property="WAS_profile_home" value="${WAS_profile_home_default}">
<not>
<isset property="${WAS_profile_home}" />
</not>
</condition>
<condition property="WAS_appserver_host" value="${WAS_appserver_host_default}">
<not>
<isset property="${WAS_appserver_host}" />
</not>
</condition>
<condition property="WAS_appserver_port" value="${WAS_appserver_port_default}">
<not>
<isset property="${WAS_appserver_port}" />
</not>
</condition>
<condition property="WAS_adminserver_host" value="${WAS_adminserver_host_default}">
<not>
<isset property="${WAS_adminserver_host}" />
</not>
</condition>
<condition property="WAS_adminserver_port" value="${WAS_adminserver_port_default}">
<not>
<isset property="${WAS_adminserver_port}" />
</not>
</condition>
<condition property="deploymentTarget" value="${deploymentTarget_default}">
<not>
<isset property="${deploymentTarget}" />
</not>
</condition>
<condition property="wsadminScript" value="${wsadminScript_default}">
<not>
<isset property="${wsadminScript}" />
</not>
</condition>
<echo message="WASConfig.xml set WAS_profile_home=${WAS_profile_home}" />
<echo message="WASConfig.xml set WAS_appserver_host=${WAS_appserver_host}" />
<echo message="WASConfig.xml set WAS_appserver_port=${WAS_appserver_port}" />
<echo message="WASConfig.xml set WAS_adminserver_host=${WAS_adminserver_host}" />
<echo message="WASConfig.xml set WAS_adminserver_port=${WAS_adminserver_port}" />
<echo message="WASConfig.xml set deploymentTarget=${deploymentTarget}" />
<echo message="WASConfig.xml set wsadminScript=${wsadminScript}" />
</target>
<target name="checkIfHomeSet" depends="setDefaults">
<echo message="Checking if default WAS home set ..."/>
<condition property="homeNotSet">
<equals arg1="${WAS_profile_home}" arg2="unset"/>
</condition>
<fail message="Must set property WAS_profile_home_default in script.ant/config/WASConfig.xml" if="homeNotSet"/>
<echo message="WAS profile home set to ${WAS_profile_home}"/>
</target>
</project>
Step 2: start WAS
Step 3:
In Eclipse, navigate to the following folder. See the pile of XML files?
Open each XML file in the following order and run it with Ant:
- clean.xml
- generatePackagingProps.Echo.xml
- packageApp.Echo.xml
- installApp.Echo.xml
- generatexJCL.Echo.xml
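Step 4: restart WAS
In the WAS Enterprise Applications menu you can see that an EAR application has been deployed; after the WAS restart in step 4 it should be in the started state:
How do we run our batch job?
Here's how:
See this Echo.xml file? It is the batch script, i.e. the xJCL file, generated automatically from Echo.props when you run the Ant script generatexJCL.Echo.xml.
Echo.xml (xJCL)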
<?xml version="1.0" encoding="UTF-8" ?>
<!--
#
# WebSphere Batch xJCL
#
# This file generated on 2012.08.06 at 14:12:13 CST by:
# BatchSimulator Version: WXD611 [cf20947.48960]
#
-->
<job name="Echo" default-application-name="Echo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<jndi-name>ejb/com/ibm/ws/batch/EchoBatchController</jndi-name>
<step-scheduling-criteria>
<scheduling-mode>sequential</scheduling-mode>
</step-scheduling-criteria>
<checkpoint-algorithm name="chkpt">
<classname>com.ibm.wsspi.batch.checkpointalgorithms.RecordbasedBase</classname>
<props>
<prop name="recordcount" value="1000"/>
</props>
</checkpoint-algorithm>
<results-algorithms>
<results-algorithm name="jobsum">
<classname>com.ibm.wsspi.batch.resultsalgorithms.jobsum</classname>
</results-algorithm>
</results-algorithms>
<job-step name="Step1">
<jndi-name>ejb/GenericXDBatchStep</jndi-name>
<checkpoint-algorithm-ref name="chkpt"/>
<results-ref name="jobsum"/>
<batch-data-streams>
<bds>
<logical-name>outputStream</logical-name>
<props>
<prop name="pswd" value="password_1"/>
<prop name="AppendJobIdToFileName" value="false"/>
<prop name="debug" value="false"/>
<prop name="EnableDetailedPerformanceMeasurement" value="false"/>
<prop name="jdbc_driver" value="com.mysql.jdbc.Driver"/>
<prop name="EnablePerformanceMeasurement" value="false"/>
<prop name="userid" value="elt"/>
<prop name="FILENAME" value="${echo.data}/output.txt"/>
<prop name="tablename" value="t_grid_output_test"/>
<prop name="jdbc_url" value="jdbc:mysql://localhost:3306/eltdb"/>
<prop name="PATTERN_IMPL_CLASS" value="com.ibm.batch.streams.outputstreams.EchoWriter"/>
</props>
<impl-class>com.ibm.websphere.batch.devframework.datastreams.patterns.LocalJDBCWriter</impl-class>
</bds>
<bds>
<logical-name>inputStream</logical-name>
<props>
<prop name="FILENAME" value="${echo.data}/input.txt"/>
<prop name="EnablePerformanceMeasurement" value="false"/>
<prop name="EnableDetailedPerformanceMeasurement" value="false"/>
<prop name="debug" value="false"/>
<prop name="PATTERN_IMPL_CLASS" value="com.ibm.batch.streams.inputstreams.EchoReader"/>
</props>
<impl-class>com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteReader</impl-class>
</bds>
</batch-data-streams>
<props>
<prop name="EnablePerformanceMeasurement" value="false"/>
<prop name="EnableDetailedPerformanceMeasurement" value="false"/>
<prop name="debug" value="false"/>
<prop name="BATCHRECORDPROCESSOR" value="com.ibm.batch.steps.Echo"/>
</props>
</job-step>
</job>
In this file we can freely change the steps of our batch job and adjust their execution order without modifying the source code or repackaging.
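Since the xJCL is ordinary XML, a few lines of the JDK's DOM API are enough to list which stream implementation each step wires in, which can help when reviewing an edited xJCL before submitting it. This is a minimal sketch; XjclInspector is a hypothetical helper that reads only the job-step, bds, logical-name, and impl-class elements shown in Echo.xml above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Lists "step: logical-name -> impl-class" entries from an xJCL document, in document order. */
class XjclInspector {

    static List<String> describe(String xjcl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xjcl.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList steps = doc.getElementsByTagName("job-step");
            for (int i = 0; i < steps.getLength(); i++) {
                Element step = (Element) steps.item(i);
                NodeList streams = step.getElementsByTagName("bds");
                for (int j = 0; j < streams.getLength(); j++) {
                    Element bds = (Element) streams.item(j);
                    out.add(step.getAttribute("name") + ": "
                            + text(bds, "logical-name") + " -> " + text(bds, "impl-class"));
                }
            }
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse xJCL", e);
        }
    }

    /** Text of the first descendant element with the given tag, or "" if there is none. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xjcl = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>"
                + "<job name=\"Echo\" default-application-name=\"Echo\">"
                + "<job-step name=\"Step1\"><batch-data-streams>"
                + "<bds><logical-name>outputStream</logical-name><impl-class>"
                + "com.ibm.websphere.batch.devframework.datastreams.patterns.LocalJDBCWriter</impl-class></bds>"
                + "<bds><logical-name>inputStream</logical-name><impl-class>"
                + "com.ibm.websphere.batch.devframework.datastreams.patterns.FileByteReader</impl-class></bds>"
                + "</batch-data-streams></job-step></job>";
        describe(xjcl).forEach(System.out::println);
    }
}
```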
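See? Enter the path of this XML file in the "Specify path to xJCL" field and click the "Submit" button to run the batch job. With this console in mind, think back to the five requirements I listed under "Commercial Solutions" above: can't it handle every one of them? Heh, take a look at the throughput on a single node.
V. Batch Grid Computing with WAS ND
This is left as an exercise for readers.
You don't need to change any program code; just:
- Deploy WAS ND
- Install IBM Compute Grid, and when augmenting the profile, choose the WAS ND deployment manager
- IBM Compute Grid is then installed automatically to every node under the specified WAS ND deployment manager
- Set the parameters of the WAS ND environment you are deploying to
- Run our series of Ant XML files in order
- Restart WAS ND
- Open the IBM Compute Grid console as before at http://localhost:9080/jmc
- Submit the xJCL
At this point the batch job will use every node in WAS ND that has IBM Compute Grid deployed to perform "grid computing".
Note:
Grid computing here still differs from traditional clustering and load balancing. It is more like this: I originally had one CPU and 4 GB of memory to process 1 million records; after adding a node through WAS ND, I have 2 CPUs and 2 × 4 GB of memory to run the same 1 million records; add yet another node to WAS ND and the batch capacity grows again.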
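To make the arithmetic above concrete: one way to give each node its share of the 1 million records is to split the range into contiguous, near-equal pieces. The following sketch assumes simple range partitioning; RangePartitioner is a hypothetical name, and this is not a description of how Compute Grid's Parallel Job Manager actually divides work.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split [0, totalRecords) into one contiguous, near-equal range per node. */
class RangePartitioner {

    /** Returns [start, end) pairs; the first (total % nodes) ranges get one extra record. */
    static List<long[]> partition(long totalRecords, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = totalRecords / nodes;
        long remainder = totalRecords % nodes;
        long start = 0;
        for (int i = 0; i < nodes; i++) {
            long size = base + (i < remainder ? 1 : 0);
            ranges.add(new long[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int nodes = 1; nodes <= 3; nodes++) {
            StringBuilder line = new StringBuilder(nodes + " node(s):");
            for (long[] r : partition(1_000_000, nodes)) {
                line.append(" [").append(r[0]).append(", ").append(r[1]).append(")");
            }
            System.out.println(line);
        }
        // 3 node(s): [0, 333334) [333334, 666667) [666667, 1000000)
    }
}
```

Each added node shrinks every range, so the same 1 million records are spread over more CPUs and memory, which is the effect described in the note above.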