介绍Spring Batch 中Tasklet 和 Chunks

介绍Spring Batch 中Tasklet 和 Chunks

Spring Batch 提供了两种不同方式实现job: tasklet 和 chunk。本文通过实例实践两种方法。

示例需求说明

给定输入csv文件内容如下:

Mae Hodges,10/22/1972
Gary Potter,02/22/1953
Betty Wise,02/17/1968
Wayne Rose,04/06/1977
Adam Caldwell,09/27/1995
Lucille Phillips,05/14/1992

每行的第一列表示名称、第二列表示出生日期。我们的示例是需要生成结果文档包括名称和年龄:

Mae Hodges,45
Gary Potter,64
Betty Wise,49
Wayne Rose,40
Adam Caldwell,22
Lucille Phillips,25

现在目标清楚了,让我们开始使用两种方法实现。首先我们使用tasklet。

Tasklet 方式实现

分析与设计

takslet意味着在step中执行单个任务,job有多个step按一定顺序组成,每个步骤应该执行一个具体任务。
我们的job有三个步骤:

  1. 从输入csv文件读
  2. 对每个输入行数据计算年龄
  3. 写姓名和年龄至输出csv文件

框架搭好了,我们开始实现具体每个步骤。

LinesReader负责从输入文件读数据:

public class LinesReader implements Tasklet {
    // ...
}

LinesProcessor负责计算每个人年龄:

public class LinesProcessor implements Tasklet {
    // ...
}

最后,LinesWriter负责名称和年龄至输出文件:

public class LinesWriter implements Tasklet {
    // ...
}

同时,所有步骤都实现Tasklet接口,所以必须实现其方法:

@Override
public RepeatStatus execute(StepContribution stepContribution, 
  ChunkContext chunkContext) throws Exception {
    // ...
}

该方法需实现每个步骤的业务逻辑。开始写详细代码之前,我们先配置job。

### 批处理配置

我们需要在spring上下文中增加一些配置。把前面创建的类增bean声明配置,请看job定义:

@Configuration
@EnableBatchProcessing
public class TaskletsConfig {

    @Autowired private JobBuilderFactory jobs;
    @Autowired private StepBuilderFactory steps;

    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() {
        return new JobLauncherTestUtils();
    }

    @Bean
    public JobRepository jobRepository() throws Exception {
        MapJobRepositoryFactoryBean factory = new MapJobRepositoryFactoryBean();
        factory.setTransactionManager(transactionManager());
        return (JobRepository) factory.getObject();
    }

    @Bean
    public PlatformTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobLauncher jobLauncher() throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        jobLauncher.setJobRepository(jobRepository());
        return jobLauncher;
    }

    @Bean
    public LinesReader linesReader() {
        return new LinesReader();
    }

    @Bean
    public LinesProcessor linesProcessor() {
        return new LinesProcessor();
    }

    @Bean
    public LinesWriter linesWriter() {
        return new LinesWriter();
    }

    @Bean
    protected Step readLines() {
        return steps
          .get("readLines")
          .tasklet(linesReader())
          .build();
    }

    @Bean
    protected Step processLines() {
        return steps
          .get("processLines")
          .tasklet(linesProcessor())
          .build();
    }

    @Bean
    protected Step writeLines() {
        return steps
          .get("writeLines")
          .tasklet(linesWriter())
          .build();
    }

    @Bean
    public Job job() {
        return jobs
          .get("taskletsJob")
          .start(readLines())
          .next(processLines())
          .next(writeLines())
          .build();
    }
}

我们的taskletsJob由三个步骤组成。第一个(readLines)执行lineReader bean的任务,然后到下一个步骤processLines。ProcessLines执行lineProcessor bean的任务,最后执行writeLines步骤。
到目前为止,我们的job流程定义好了,下面准备增加逻辑。

模型对象和工具类

因为需要处理csv文件中的每一行,我们定义Line类:

@Setter
@Getter
@ToString
@AllArgsConstructor
@NoArgsConstructor
public class Line implements Serializable {
    private String name;
    private LocalDate dob;
    private Long age;
}

需要注意的是Line类实现Serializable接口,那是因为Line作为DTO在不同步骤之间传输数据。根据Spring Batch规范,在步骤之间传输的对象必须是可序列化的。另外我们需要考虑如何读写csv文件,这里我们使用OpenCSV工具库:

    implementation 'com.opencsv:openscs:4.5'

加入OpenCSV依赖后,我们实现FileUtils类,提供CSV文件读写方法:

ublic class FileUtils {

    private final Logger logger = LoggerFactory.getLogger(FileUtils.class);

    private String fileName;
    private CSVReader CSVReader;
    private CSVWriter CSVWriter;
    private FileReader fileReader;
    private FileWriter fileWriter;
    private File file;

    public FileUtils(String fileName) {
        this.fileName = fileName;
    }

    public Line readLine() {
        try {
            if (CSVReader == null) initReader();
            String[] line = CSVReader.readNext();
            if (line == null) return null;
            return new Line(line[0], LocalDate.parse(line[1], DateTimeFormatter.ofPattern("MM/dd/yyyy")));
        } catch (Exception e) {
            logger.error("Error while reading line in file: " + this.fileName);
            return null;
        }
    }

    public void writeLine(Line line) {
        try {
            if (CSVWriter == null) initWriter();
            String[] lineStr = new String[2];
            lineStr[0] = line.getName();
            lineStr[1] = line
              .getAge()
              .toString();
            CSVWriter.writeNext(lineStr);
        } catch (Exception e) {
            logger.error("Error while writing line in file: " + this.fileName);
        }
    }

    private void initReader() throws Exception {
        ClassLoader classLoader = this
          .getClass()
          .getClassLoader();
        if (file == null) file = new File(classLoader
          .getResource(fileName)
          .getFile());
        if (fileReader == null) fileReader = new FileReader(file);
        if (CSVReader == null) CSVReader = new CSVReader(fileReader);
    }

    private void initWriter() throws Exception {
        if (file == null) {
            file = new File(fileName);
            file.createNewFile();
        }
        if (fileWriter == null) fileWriter = new FileWriter(file, true);
        if (CSVWriter == null) CSVWriter = new CSVWriter(fileWriter);
    }

    public void closeWriter() {
        try {
            CSVWriter.close();
            fileWriter.close();
        } catch (IOException e) {
            logger.error("Error while closing writer.");
        }
    }

    public void closeReader() {
        try {
            CSVReader.close();
            fileReader.close();
        } catch (IOException e) {
            logger.error("Error while closing reader.");
        }
    }

}

readLine实际上是OpenCSV的readNext方法的包装器负责返回Line对象。同理,writeLine封装了OpenCSV的writeNext方法,接收Line并写入文件。下面实现每个步骤的业务逻辑。

LinesReader

实现LinesReader 类的完整逻辑:

public class LinesReader implements Tasklet, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LinesReader.class);
 
    private List lines;
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        lines = new ArrayList<>();
        fu = new FileUtils(
          "taskletsvschunks/input/tasklets-vs-chunks.csv");
        logger.debug("Lines Reader initialized.");
    }
 
    @Override
    public RepeatStatus execute(StepContribution stepContribution, 
      ChunkContext chunkContext) throws Exception {
        Line line = fu.readLine();
        while (line != null) {
            lines.add(line);
            logger.debug("Read line: " + line.toString());
            line = fu.readLine();
        }
        return RepeatStatus.FINISHED;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeReader();
        stepExecution
          .getJobExecution()
          .getExecutionContext()
          .put("lines", this.lines);
        logger.debug("Lines Reader ended.");
        return ExitStatus.COMPLETED;
    }
}

LinesReader的execute方法针对输入文件path创建CSVFileUtils实例,然后增加Line对象至list,直至没有数据可读。同时也实现StepExecutionListener接口,并实现其两个方法:beforeStep 和 afterStep,主要实现初始化或关闭资源。另外,afterStep方法中list(存储读取的数据:lines)被放入job上下文中,使其在下一个步骤中可用:

stepExecution
  .getJobExecution()
  .getExecutionContext()
  .put("lines", this.lines);

至此,第一步已经完成其职责:加载csv文件至内存(list中),下面继续第二步处理数据。

### LinesProcessor

LinesProcessor 当然也实现StepExecutionListener和Tasklet接口,因此需要实现 beforeStep、 execute 、afterStep三个方法:

public class LinesProcessor implements Tasklet, StepExecutionListener {
 
    private Logger logger = LoggerFactory.getLogger(
      LinesProcessor.class);
 
    private List lines;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext executionContext = stepExecution
          .getJobExecution()
          .getExecutionContext();
        this.lines = (List) executionContext.get("lines");
        logger.debug("Lines Processor initialized.");
    }
 
    @Override
    public RepeatStatus execute(StepContribution stepContribution, 
      ChunkContext chunkContext) throws Exception {
        for (Line line : lines) {
            long age = ChronoUnit.YEARS.between(
              line.getDob(), 
              LocalDate.now());
            logger.debug("Calculated age " + age + " for line " + line.toString());
            line.setAge(age);
        }
        return RepeatStatus.FINISHED;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        logger.debug("Lines Processor ended.");
        return ExitStatus.COMPLETED;
    }
}

很容易理解,其实现从job上下文中加载数据并计算每人的年龄。不需要再次把结果存入job上下文,因为其已经存在,我们仅修改了数据(增加了年龄)。下面开始最后一步。

LinesWriter

LinesWriter任务就是遍历list中的每项数据并写入姓名和年龄至文件:

public class LinesWriter implements Tasklet, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LinesWriter.class);
 
    private List lines;
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        ExecutionContext executionContext = stepExecution
          .getJobExecution()
          .getExecutionContext();
        this.lines = (List) executionContext.get("lines");
        fu = new FileUtils("output.csv");
        logger.debug("Lines Writer initialized.");
    }
 
    @Override
    public RepeatStatus execute(StepContribution stepContribution, 
      ChunkContext chunkContext) throws Exception {
        for (Line line : lines) {
            fu.writeLine(line);
            logger.debug("Wrote line " + line.toString());
        }
        return RepeatStatus.FINISHED;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeWriter();
        logger.debug("Lines Writer ended.");
        return ExitStatus.COMPLETED;
    }
}

至此已经实现了所有功能,下面跑测试验证结果。

运行job

定义测试配置类:

@Configuration
public class TaskletsTestConfig {
    @Bean
    public JobLauncherTestUtils jobLauncherTestUtils() {
        return new JobLauncherTestUtils();
    }
}

通过测试类运行:

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {TaskletsConfig.class, TaskletsTestConfig.class})
public class TaskletApplicationTests {
    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    public void givenTaskletsJob_whenJobEnds_thenStatusCompleted() throws Exception {
        JobExecution jobExecution = jobLauncherTestUtils.launchJob();
        assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
    }
}

ContextConfiguration注解指定spring上下文配置及JobLauncherTestUtils测试类。

所有都好了,开始运行测试。运行完成之后,output.csv有了期望的内容,日志显示如下:

[main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader initialized.
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.tasklets.LinesReader - Read line: [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.tasklets.LinesReader - Lines Reader ended.
[main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor initialized.
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.tasklets.LinesProcessor - Lines Processor ended.
[main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer initialized.
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
[main] DEBUG o.b.t.tasklets.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
[main] DEBUG o.b.t.tasklets.LinesWriter - Lines Writer ended.

Chunk方法

分析与设计

见名思议,该方法基于数据块(一部分数据)执行。也就是说,其不是一次读、处理和写所有行,而是一次仅读、处理、写固定数量记录。然后重复循环执行直到读不到数据为止。
因此,此流程与上面有些差异:
while 有数据:
do X 行数据
读一行
处理一行
写 X 行数据
end while

所以,我们需要针对该方法创建三个bean:

public class LineReader {
     // ...
}
public class LineProcessor {
    // ...
}
public class LinesWriter {
    // ...
}

在具体实现业务之前,我们首先配置job:

@Configuration
@EnableBatchProcessing
public class ChunksConfig {
    private final JobBuilderFactory jobs;
    private final StepBuilderFactory steps;

    public ChunksConfig(JobBuilderFactory jobs, StepBuilderFactory steps) {
        this.jobs = jobs;
        this.steps = steps;
    }

    @Bean
    public ItemReader itemReader() {
        return new LineReader();
    }

    @Bean
    public ItemProcessor itemProcessor() {
        return new LineProcessor();
    }

    @Bean
    public ItemWriter itemWriter() {
        return new LinesWriter();
    }

    @Bean
    protected Step processLines(ItemReader reader,
                                ItemProcessor processor, ItemWriter writer) {
        return steps.get("processLines"). chunk(2)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
    }

    @Bean
    public Job job() {
        return jobs
            .get("chunksJob")
            .start(processLines(itemReader(), itemProcessor(), itemWriter()))
            .build();
    }
}

这里我们仅一个步骤执行一个任务。但是任务中定义了基于数据库进行读、处理以及写环节。
需要提醒的是,提交量表示在一个数据块中处理数据的数量。因此我们会一次读、处理、写两行。
好了,下面该实现每个环节的业务逻辑。

LineReader

LineReader负责读一条记录,然后返回Line实例。为了实现读环节,需实现ItemReader接口:

public class LineReader implements
  ItemReader, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LineReader.class);
  
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        fu = new FileUtils("taskletsvschunks/input/tasklets-vs-chunks.csv");
        logger.debug("Line Reader initialized.");
    }
 
    @Override
    public Line read() throws Exception {
        Line line = fu.readLine();
        if (line != null) logger.debug("Read line: " + line.toString());
        return line;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeReader();
        logger.debug("Line Reader ended.");
        return ExitStatus.COMPLETED;
    }
}

read()方法很直接,读一行数据并返回Line实例。同时我们也实现了StepExecutionListener接口,其中beforeStep 和 afterStep方法在整个步骤之前和之后运行,负责初始化资源和关闭资源。

LineProcessor

LineProcessor遵循的逻辑与LineReader几乎相同,需要实现process方法,其接收一个输入line,处理并返回输出line对象。

public class LineProcessor implements
  ItemProcessor, StepExecutionListener {
 
    private Logger logger = LoggerFactory.getLogger(LineProcessor.class);
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        logger.debug("Line Processor initialized.");
    }
     
    @Override
    public Line process(Line line) throws Exception {
        long age = ChronoUnit.YEARS
          .between(line.getDob(), LocalDate.now());
        logger.debug(
          "Calculated age " + age + " for line " + line.toString());
        line.setAge(age);
        return line;
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        logger.debug("Line Processor ended.");
        return ExitStatus.COMPLETED;
    }
}

同时也实现StepExecutionListener接口,这里仅用于输出日志,方便理解执行机制。

### LinesWriter

与reader 和 processor 不同,LinesWriter写数据块中所有行,因此write方法接收List参数。

public class LinesWriter implements
  ItemWriter, StepExecutionListener {
 
    private final Logger logger = LoggerFactory
      .getLogger(LinesWriter.class);
  
    private FileUtils fu;
 
    @Override
    public void beforeStep(StepExecution stepExecution) {
        fu = new FileUtils("output.csv");
        logger.debug("Line Writer initialized.");
    }
 
    @Override
    public void write(List lines) throws Exception {
        for (Line line : lines) {
            fu.writeLine(line);
            logger.debug("Wrote line " + line.toString());
        }
    }
 
    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        fu.closeWriter();
        logger.debug("Line Writer ended.");
        return ExitStatus.COMPLETED;
    }
}

其他代码几乎是自描述的,下面开始测试job。

运行并测试job

我们创建一个新的测试,与之前测试tasklet方法一致:

@RunWith(SpringRunner.class)
@ContextConfiguration(classes = {ChunksConfig.class, TaskletsTestConfig.class})
public class ChunksTest {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Test
    public void givenChunksJob_whenJobEnds_thenStatusCompleted()
    throws Exception {

        JobExecution jobExecution = jobLauncherTestUtils.launchJob();

        assertEquals(ExitStatus.COMPLETED, jobExecution.getExitStatus());
    }
}

配置了ChunksConfig之后,开始运行测试。job正常运行后,生成了output.csv文件,包含期望结果,且日志输出如下:

[main] DEBUG o.b.t.chunks.LineReader - Line Reader initialized.
[main] DEBUG o.b.t.chunks.LinesWriter - Line Writer initialized.
[main] DEBUG o.b.t.chunks.LineProcessor - Line Processor initialized.
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 45 for line [Mae Hodges,10/22/1972]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 64 for line [Gary Potter,02/22/1953]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Mae Hodges,10/22/1972,45]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Gary Potter,02/22/1953,64]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 49 for line [Betty Wise,02/17/1968]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 40 for line [Wayne Rose,04/06/1977]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Betty Wise,02/17/1968,49]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Wayne Rose,04/06/1977,40]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.chunks.LineReader - Read line: [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 22 for line [Adam Caldwell,09/27/1995]
[main] DEBUG o.b.t.chunks.LineProcessor - Calculated age 25 for line [Lucille Phillips,05/14/1992]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Adam Caldwell,09/27/1995,22]
[main] DEBUG o.b.t.chunks.LinesWriter - Wrote line [Lucille Phillips,05/14/1992,25]
[main] DEBUG o.b.t.chunks.LineProcessor - Line Processor ended.
[main] DEBUG o.b.t.chunks.LinesWriter - Line Writer ended.
[main] DEBUG o.b.t.chunks.LineReader - Line Reader ended.

两个方法结果相同,但流程不同,日志结果显示两个方法的执行流程的差异。

总结

两者差异显示了各自适用场景。tasklet更适合一个步骤到另一个步骤场景。chunk提供简单解决方案:实现处理分页读,或我们不想在内存中保留大量数据场景。

你可能感兴趣的:(spring,batch)