Spring Batch 提供了两种不同方式实现job: tasklet 和 chunk。本文通过实例实践两种方法。
给定输入csv文件内容如下:
Mae Hodges,10/22/1972
Gary Potter,02/22/1953
Betty Wise,02/17/1968
Wayne Rose,04/06/1977
Adam Caldwell,09/27/1995
Lucille Phillips,05/14/1992
每行的第一列表示名称、第二列表示出生日期。我们的示例是需要生成结果文档包括名称和年龄:
Mae Hodges,45
Gary Potter,64
Betty Wise,49
Wayne Rose,40
Adam Caldwell,22
Lucille Phillips,25
现在目标清楚了,让我们开始使用两种方法实现。首先我们使用tasklet。
takslet意味着在step中执行单个任务,job有多个step按一定顺序组成,每个步骤应该执行一个具体任务。
我们的job有三个步骤:
框架搭好了,我们开始实现具体每个步骤。
LinesReader负责从输入文件读数据:
public class LinesReader implements Tasklet {
// ...
}
LinesProcessor负责计算每个人年龄:
public class LinesProcessor implements Tasklet {
// ...
}
最后,LinesWriter负责名称和年龄至输出文件:
public class LinesWriter implements Tasklet {
// ...
}
同时,所有步骤都实现Tasklet接口,所以必须实现其方法:
@Override
public RepeatStatus execute(StepContribution stepContribution,
ChunkContext chunkContext) throws Exception {
// ...
}
该方法需实现每个步骤的业务逻辑。开始写详细代码之前,我们先配置job。
### 批处理配置
我们需要在spring上下文中增加一些配置。把前面创建的类增bean声明配置,请看job定义:
@Configuration
@EnableBatchProcessing
public class TaskletsConfig {
@Autowired private JobBuilderFactory jobs;
@Autowired private StepBuilderFactory steps;
@Bean
public JobLauncherTestUtils jobLauncherTestUtils() {
return new JobLauncherTestUtils();
}
@Bean
public JobRepository jobRepository() throws Exception {
MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean();
factory.setTransactionManager(transactionManager());
return (JobRepository) factory.getObject();
}
@Bean
public PlatformTransactionManager transactionManager() {
return new ResourcelessTransactionManager();
}
@Bean
public JobLauncher jobLauncher() throws Exception {
SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
jobLauncher.setJobRepository(jobRepository());
return jobLauncher;
}
@Bean
public LinesReader linesReader() {
return new LinesReader();
}
@Bean
public LinesProcessor linesProcessor() {
return new LinesProcessor();
}
@Bean
public LinesWriter linesWriter() {
return new LinesWriter();
}
@Bean
protected Step readLines() {
return steps
.get("readLines")
.tasklet(linesReader())
.build();
}
@Bean
protected Step processLines() {
return steps
.get("processLines")
.tasklet(linesProcessor())
.build();
}
@Bean
protected Step writeLines() {
return steps
.get("writeLines")
.tasklet(linesWriter())
.build();
}
@Bean
public Job job() {
return jobs
.get("taskletsJob")
.start(readLines())
.next(processLines())
.next(writeLines())
.build();
}
}
我们的taskletsJob由三个步骤组成。第一个(readLines)执行lineReader bean的任务,然后到下一个步骤processLines。ProcessLines执行lineProcessor bean的任务,最后执行writeLines步骤。
到目前为止,我们的job流程定义好了,下面准备增加逻辑。
因为需要处理csv文件中的每一行,我们定义Line类:
@Setter
@Getter
@ToString
@AllArgsConstructor
@NoArgsConstructor
public class Line implements Serializable {
private String name;
private LocalDate dob;
private Long age;
}
需要注意的是Line类实现Serializable接口,那是因为Line作为DTO在不同步骤之间传输数据。根据Spring Batch规范,在步骤之间传输的对象必须是可序列化的。另外我们需要考虑如何读写csv文件,这里我们使用OpenCSV工具库:
implementation 'com.opencsv:openscs:4.5'
加入OpenCSV依赖后,我们实现FileUtils类,提供CSV文件读写方法:
ublic class FileUtils {
private final Logger logger = LoggerFactory.getLogger(FileUtils.class);
private String fileName;
private CSVReader CSVReader;
private CSVWriter CSVWriter;
private FileReader fileReader;
private FileWriter fileWriter;
private File file;
public FileUtils(String fileName) {
this.fileName = fileName;
}
public Line readLine() {
try {
if (CSVReader == null) initReader();
String[] line = CSVReader.readNext();
if (line == null) return null;
return new Line(line[0], LocalDate.parse(line[1], DateTimeFormatter.ofPattern("MM/dd/yyyy")));
} catch (Exception e) {
logger.error("Error while reading line in file: " + this.fileName);
return null;
}
}
public void writeLine(Line line) {
try {
if (CSVWriter == null) initWriter();
String[] lineStr = new String[2];
lineStr[0] = line.getName();
lineStr[1] = line
.getAge()
.toString();
CSVWriter.writeNext(lineStr);
} catch (Exception e) {
logger.error("Error while writing line in file: " + this.fileName);
}
}
private void initReader() throws Exception {
ClassLoader classLoader = this
.getClass()
.getClassLoader();
if (file == null) file = new File(classLoader
.getResource(fileName)
.getFile());
if (fileReader == null) fileReader = new FileReader(file);
if (CSVReader == null) CSVReader = new CSVReader(fileReader);
}
private void initWriter() throws Exception {
if (file == null) {
file = new File(fileName);
file.createNewFile();
}
if (fileWriter == null) fileWriter = new FileWriter(file, true);
if (CSVWriter == null) CSVWriter = new CSVWriter(fileWriter);
}
public void closeWriter() {
try {
CSVWriter.close();
fileWriter.close();
} catch (IOException e) {
logger.error("Error while closing writer.");
}
}
public void closeReader() {
try {
CSVReader.close();
fileReader.close();
} catch (IOException e) {
logger.error("Error while closing reader.");
}
}
}
readLine实际上是OpenCSV的readNext方法的包装器负责返回Line对象。同理,writeLine封装了OpenCSV的writeNext方法,接收Line并写入文件。下面实现每个步骤的业务逻辑。
实现LinesReader 类的完整逻辑:
public class LinesReader implements Tasklet, StepExecutionListener {
private final Logger logger = LoggerFactory
.getLogger(LinesReader.class);
private List lines;
private FileUtils fu;
@Override
public void beforeStep(StepExecution stepExecution) {
lines = new ArrayList<>();
fu = new FileUtils(
"taskletsvschunks/input/tasklets-vs-chunks.csv");
logger.debug("Lines Reader initialized.");
}
@Override
public RepeatStatus execute(StepContribution stepContribution,
ChunkContext chunkContext) throws Exception {
Line line = fu.readLine();
while (line != null) {
lines.add(line);
logger.debug("Read line: " + line.toString());
line = fu.readLine();
}
return RepeatStatus.FINISHED;
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
fu.closeReader();
stepExecution
.getJobExecution()
.getExecutionContext()
.put("lines", this.lines);
logger.debug("Lines Reader ended.");
return ExitStatus.COMPLETED;
}
}
LinesReader的execute方法针对输入文件path创建CSVFileUtils实例,然后增加Line对象至list,直至没有数据可读。同时也实现StepExecutionListener接口,并实现其两个方法:beforeStep 和 afterStep,主要实现初始化或关闭资源。另外,afterStep方法中list(存储读取的数据:lines)被放入job上下文中,使其在下一个步骤中可用:
stepExecution
.getJobExecution()
.getExecutionContext()
.put("lines", this.lines);
至此,第一步已经完成其职责:加载csv文件至内存(list中),下面继续第二步处理数据。
### LinesProcessor
LinesProcessor 当然也实现StepExecutionListener和Tasklet接口,因此需要实现 beforeStep、 execute 、afterStep三个方法:
public class LinesProcessor implements Tasklet, StepExecutionListener {
private Logger logger = LoggerFactory.getLogger(
LinesProcessor.class);
private List lines;
@Override
public void beforeStep(StepExecution stepExecution) {
ExecutionContext executionContext = stepExecution
.getJobExecution()
.getExecutionContext();
this.lines = (List) executionContext.get("lines");
logger.debug("Lines Processor initialized.");
}
@Override
public RepeatStatus execute(StepContribution stepContribution,
ChunkContext chunkContext) throws Exception {
for (Line line : lines) {
long age = ChronoUnit.YEARS.between(
line.getDob(),
LocalDate.now());
logger.debug("Calculated age " + age + " for line " + line.toString());
line.setAge(age);
}
return RepeatStatus.FINISHED;
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
logger.debug("Lines Processor ended.");
return ExitStatus.COMPLETED;
}
}
很容易理解,其实现从job上下文中加载数据并计算每人的年龄。不需要再次把结果存入job上下文,因为其已经存在,我们仅修改了数据(增加了年龄)。下面开始最后一步。
LinesWriter任务就是遍历list中的每项数据并写入姓名和年龄至文件:
public class LinesWriter implements Tasklet, StepExecutionListener {
private final Logger logger = LoggerFactory
.getLogger(LinesWriter.class);
private List lines;
private FileUtils fu;
@Override
public void beforeStep(StepExecution stepExecution) {
ExecutionContext executionContext = stepExecution
.getJobExecution()
.getExecutionContext();
this.lines = (List) executionContext.get("lines");
fu = new FileUtils("output.csv");
logger.debug("Lines Writer initialized.");
}
@Override
public RepeatStatus execute(StepContribution stepContribution,
ChunkContext chunkContext) throws Exception {
for (Line line : lines) {
fu.writeLine(line);
logger.debug("Wrote line " + line.toString());
}
return RepeatStatus.FINISHED;
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
fu.closeWriter();
logger.debug("Lines Writer ended.");
return ExitStatus.COMPLETED;
}
}
至此已经实现了所有功能,下面跑测试验证结果。
定义测试配置类:
@Configuration
public class TaskletsTestConfig {
@Bean
public JobLauncherTestUtils jobLauncherTestUtils() {
return new JobLauncherTestUtils();
}
}
通过测试类运行:
@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {TaskletsConfig.class, TaskletsTestConfig.class})
public class TaskletApplicationTests {
@Autowired
private JobLauncherTestUtils jobLauncherTestUtils;
@Test
public void givenTaskletsJob_whenJobEnds_thenStatusCompleted() throws Exception {
JobExecution jobExecution = jobLauncherTestUtils.launchJob();
assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
}
}
ContextConfiguration注解指定spring上下文配置及JobLauncherTestUtils测试类。
所有都好了,开始运行测试。运行完成之后,output.csv有了期望的内容,日志显示如下:
[main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader initialized.
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader ended.
[main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor initialized.
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor ended.
[main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer initialized.
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
[main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer ended.
见名思议,该方法基于数据块(一部分数据)执行。也就是说,其不是一次读、处理和写所有行,而是一次仅读、处理、写固定数量记录。然后重复循环执行直到读不到数据为止。
因此,此流程与上面有些差异:
while 有数据:
do X 行数据
读一行
处理一行
写 X 行数据
end while
所以,我们需要针对该方法创建三个bean:
public class LineReader {
// ...
}
public class LineProcessor {
// ...
}
public class LinesWriter {
// ...
}
在具体实现业务之前,我们首先配置job:
@Configuration
@EnableBatchProcessing
public class ChunksConfig {
private final JobBuilderFactory jobs;
private final StepBuilderFactory steps;
public ChunksConfig(JobBuilderFactory jobs, StepBuilderFactory steps) {
this.jobs = jobs;
this.steps = steps;
}
@Bean
public ItemReader itemReader() {
return new LineReader();
}
@Bean
public ItemProcessor itemProcessor() {
return new LineProcessor();
}
@Bean
public ItemWriter itemWriter() {
return new LinesWriter();
}
@Bean
protected Step processLines(ItemReader reader,
ItemProcessor processor, ItemWriter writer) {
return steps.get("processLines"). chunk(2)
.reader(reader)
.processor(processor)
.writer(writer)
.build();
}
@Bean
public Job job() {
return jobs
.get("chunksJob")
.start(processLines(itemReader(), itemProcessor(), itemWriter()))
.build();
}
}
这里我们仅一个步骤执行一个任务。但是任务中定义了基于数据库进行读、处理以及写环节。
需要提醒的是,提交量表示在一个数据块中处理数据的数量。因此我们会一次读、处理、写两行。
好了,下面该实现每个环节的业务逻辑。
LineReader负责读一条记录,然后返回Line实例。为了实现读环节,需实现ItemReader接口:
public class LineReader implements
ItemReader, StepExecutionListener {
private final Logger logger = LoggerFactory
.getLogger(LineReader.class);
private FileUtils fu;
@Override
public void beforeStep(StepExecution stepExecution) {
fu = new FileUtils("taskletsvschunks/input/tasklets-vs-chunks.csv");
logger.debug("Line Reader initialized.");
}
@Override
public Line read() throws Exception {
Line line = fu.readLine();
if (line != null) logger.debug("Read line: " + line.toString());
return line;
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
fu.closeReader();
logger.debug("Line Reader ended.");
return ExitStatus.COMPLETED;
}
}
read()方法很直接,读一行数据并返回Line实例。同时我们也实现了StepExecutionListener接口,其中beforeStep 和 afterStep方法在整个步骤之前和之后运行,负责初始化资源和关闭资源。
LineProcessor遵循的逻辑与LineReader几乎相同,需要实现process方法,其接收一个输入line,处理并返回输出line对象。
public class LineProcessor implements
ItemProcessor, StepExecutionListener {
private Logger logger = LoggerFactory.getLogger(LineProcessor.class);
@Override
public void beforeStep(StepExecution stepExecution) {
logger.debug("Line Processor initialized.");
}
@Override
public Line process(Line line) throws Exception {
long age = ChronoUnit.YEARS
.between(line.getDob(), LocalDate.now());
logger.debug(
"Calculated age " + age + " for line " + line.toString());
line.setAge(age);
return line;
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
logger.debug("Line Processor ended.");
return ExitStatus.COMPLETED;
}
}
同时也实现StepExecutionListener接口,这里仅用于输出日志,方便理解执行机制。
### LinesWriter
与reader 和 processor 不同,LinesWriter写数据块中所有行,因此write方法接收List参数。
public class LinesWriter implements
ItemWriter, StepExecutionListener {
private final Logger logger = LoggerFactory
.getLogger(LinesWriter.class);
private FileUtils fu;
@Override
public void beforeStep(StepExecution stepExecution) {
fu = new FileUtils("output.csv");
logger.debug("Line Writer initialized.");
}
@Override
public void write(List extends Line> lines) throws Exception {
for (Line line : lines) {
fu.writeLine(line);
logger.debug("Wrote line " + line.toString());
}
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
fu.closeWriter();
logger.debug("Line Writer ended.");
return ExitStatus.COMPLETED;
}
}
其他代码几乎是自描述的,下面开始测试job。
我们创建一个新的测试,与之前测试tasklet方法一致:
@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {ChunksConfig.class, TaskletsTestConfig.class})
public class ChunksTest {
@Autowired
private JobLauncherTestUtils jobLauncherTestUtils;
@Test
public void givenChunksJob_whenJobEnds_thenStatusCompleted()
throws Exception {
JobExecution jobExecution = jobLauncherTestUtils.launchJob();
assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
}
}
配置了ChunksConfig之后,开始运行测试。job正常运行后,生成了output.csv文件,包含期望结果,且日志输出如下:
[main] DEBUG o.b.t.chunks.LineReader - Line Reader initialized.
[main] DEBUG o.b.t.chunks.LinesWriter - Line Writer initialized.
[main] DEBUG o.b.t.chunks.LineProcessor - Line Processor initialized.
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
[main] DEBUG o.b.t.chunks.LineProcessor - Line Processor ended.
[main] DEBUG o.b.t.chunks.LinesWriter - Line Writer ended.
[main] DEBUG o.b.t.chunks.LineReader - Line Reader ended.
两个方法结果相同,但流程不同,日志结果显示两个方法的执行流程的差异。
两者差异显示了各自适用场景。tasklet更适合一个步骤到另一个步骤场景。chunk提供简单解决方案:实现处理分页读,或我们不想在内存中保留大量数据场景。