Spring Batch: a custom LineMapper for handling special text


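Spring Batch is the batch-processing framework of the Java Spring ecosystem, and it ships with simple support for processing text files.

The example below implements a flow that downloads a text file and then processes it.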

 

	
    	
    		
    			
    				
    				
    				
    				
    				
    			
    		
    	
    	
    		
	    	
	    		
	    		
    		
    	
    
    
    
    
    	
    	
    	
    	    
    		    	
    
    
    		
    
     
    

 

 

    	
    	
    		
    		
    	
    
    
    	
   		
   			
   				
   				
   			
   		
   	
    
     	
    	
	   		
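(XML job configuration. The markup did not survive; only the column names assigned to the two line tokenizers remain.)

Column names of the first tokenizer:
tradeTime, pubAccountId, merchantId, subMerchantId, deviceId, wxOrderId, merchantOrderId, userTag, tradeType, tradeStatus, payerBank, capitalType, totalAmount, enterpriseRedAmount, wxRefundId, merchantRefundId, refundAmount, enterpriseRedRefundAmount, refundType, refundStatus, goodsName, merchantData, fee, feeRate

Column names of the second tokenizer:
totalCount, totalAmount, refundAmount, enterpriseRedRefundAmount, fee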
    	
    

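As the configuration shows, DefaultLineMapper provides the simple text handling: it hands the splitting of each line to wxMultiTokenizer and the processing of the resulting fields to WxFileSetMapper. Both are injected as beans. The source of Spring's DefaultLineMapper class is as follows: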

 

package org.springframework.batch.item.file.mapping;
 
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.util.Assert;
 
/**
 * Two-phase {@link LineMapper} implementation consisting of tokenization of the line into {@link FieldSet} followed by
 * mapping to item. If finer grained control of exceptions is needed, the {@link LineMapper} interface should be
 * implemented directly.
 * 
 * @author Robert Kasanicky
 * @author Lucas Ward
 * 
 * @param <T> type of the item
 */
public class DefaultLineMapper<T> implements LineMapper<T>, InitializingBean {
 
    private LineTokenizer tokenizer;
 
    private FieldSetMapper<T> fieldSetMapper;
 
    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));
    }
 
    public void setLineTokenizer(LineTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }
 
    public void setFieldSetMapper(FieldSetMapper<T> fieldSetMapper) {
        this.fieldSetMapper = fieldSetMapper;
    }
 
    @Override
    public void afterPropertiesSet() {
        Assert.notNull(tokenizer, "The LineTokenizer must be set");
        Assert.notNull(fieldSetMapper, "The FieldSetMapper must be set");
    }
}
 

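The core method, mapLine, is a single statement: return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));

The FieldSet produced by the tokenizer is handed straight to fieldSetMapper.mapFieldSet, so the user only has to implement mapFieldSet to define how the fields are processed. How to tokenize and how to map are outside the scope of this post; I will cover them separately later.

There is one problem. If the file contains empty lines, the tokenizer does not return null for them. The empty string goes through the same tokenize-and-map chain as any other line, and in my case that ended with fieldSetMapper.mapFieldSet throwing an exception. I have not yet found a setting on FlatFileItemReader that skips empty lines, so I made a small change and implemented my own LineMapper: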

package com.secondgame.demo_service.demo.batch.task;

import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.util.Assert;
import org.springframework.util.StringUtils;

/**
 * Same as DefaultLineMapper, except that empty lines are not tokenized.
 *
 * @param <T> type of the item
 */
public class MyLineMapper<T> implements LineMapper<T>, InitializingBean {
 
    private LineTokenizer tokenizer;
 
    private FieldSetMapper<T> fieldSetMapper;
 
    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        // Only tokenize and map lines that contain at least one character.
        if (StringUtils.hasLength(line)) {
            return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));
        }
        // Empty line: skip tokenizing and mapping.
        return null;
    }
 
    public void setLineTokenizer(LineTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }
 
    public void setFieldSetMapper(FieldSetMapper<T> fieldSetMapper) {
        this.fieldSetMapper = fieldSetMapper;
    }
 
    @Override
    public void afterPropertiesSet() {
        Assert.notNull(tokenizer, "The LineTokenizer must be set");
        Assert.notNull(fieldSetMapper, "The FieldSetMapper must be set");
    }
}

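The code is otherwise identical to DefaultLineMapper; only mapLine treats the empty string specially. One caveat: the value returned by mapLine is what FlatFileItemReader returns from read(), and under the ItemReader contract a null return signals the end of input. The null is therefore not silently skipped. The approach is safe when the empty lines sit at the end of the file, but an empty line in the middle of the file would stop reading early.

Of course, one line of the XML still has to change so that it references the custom mapper: the class of the line mapper bean becomes com.secondgame.demo_service.demo.batch.task.MyLineMapper instead of DefaultLineMapper.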
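How a null returned from mapLine is treated depends on the position of the empty line, and a small model makes that easier to see. The sketch below is a simplified stand-in, not Spring Batch code: the LineMapper interface is declared locally with the same method signature as Spring's, and NullEndsInputDemo with its readItems loop is a hypothetical name that only imitates a consumer reading until it receives null, which is what the ItemReader contract describes.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same method signature as Spring Batch's LineMapper,
// declared here only so the sketch compiles without the library.
interface LineMapper<T> {
    T mapLine(String line, int lineNumber) throws Exception;
}

// Hypothetical demo class; not part of Spring Batch.
class NullEndsInputDemo {

    // Behaves like MyLineMapper: an empty line maps to null.
    static final LineMapper<String[]> MAPPER =
            (line, lineNumber) -> line.isEmpty() ? null : line.split(",");

    // Simplified consumer loop: keeps reading until an item is null.
    static List<String[]> readItems(List<String> lines) {
        List<String[]> items = new ArrayList<>();
        int lineNumber = 0;
        try {
            for (String line : lines) {
                String[] item = MAPPER.mapLine(line, ++lineNumber);
                if (item == null) {
                    break; // null is read as "no more input"
                }
                items.add(item);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return items;
    }
}
```

With the lines "a,1", "b,2", "" both records are read. With "a,1", "", "b,2" only the first record is read, because the empty line in the middle ends the loop.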

 
   

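 After all that writing, it boils down to a few points:

 1. Write a mapLine modeled on DefaultLineMapper's, adapted to handle empty lines.

 2. Use the custom MyLineMapper class in the configuration file.

 3. More complex requirements can be handled the same way, for example filtering out lines that contain certain special characters (not only at the start of the line) or replacing particular text (A with B). All of these can be done here, at line level.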
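The third point, replacing text before a line is parsed, can be sketched as a small decorator around another mapper. This is only an illustration under stated assumptions: PreprocessingLineMapper and PreprocessingDemo are hypothetical names, and the LineMapper interface is declared locally with the same method signature as Spring's so the sketch runs without the library. In a real project you would drop that declaration, import org.springframework.batch.item.file.LineMapper, and pass DefaultLineMapper or MyLineMapper as the delegate.

```java
import java.util.function.UnaryOperator;

// Local stand-in with the same method signature as Spring Batch's LineMapper,
// declared here only so the sketch compiles without the library.
interface LineMapper<T> {
    T mapLine(String line, int lineNumber) throws Exception;
}

// Hypothetical decorator: rewrites each raw line, then delegates the mapping.
class PreprocessingLineMapper<T> implements LineMapper<T> {

    private final UnaryOperator<String> preprocessor;
    private final LineMapper<T> delegate;

    PreprocessingLineMapper(UnaryOperator<String> preprocessor, LineMapper<T> delegate) {
        this.preprocessor = preprocessor;
        this.delegate = delegate;
    }

    @Override
    public T mapLine(String line, int lineNumber) throws Exception {
        // Line-level processing happens before tokenizing and mapping.
        return delegate.mapLine(preprocessor.apply(line), lineNumber);
    }
}

// Hypothetical usage: replace every "A" with "B" before the delegate sees the line.
class PreprocessingDemo {
    static String run(String line) {
        LineMapper<String> mapper = new PreprocessingLineMapper<String>(
                l -> l.replace("A", "B"),
                (l, n) -> n + ":" + l); // trivial delegate that echoes the line
        try {
            return mapper.mapLine(line, 1);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the rewrite in a wrapper leaves the tokenizer and the FieldSetMapper untouched, so the same replacement rule can be reused in front of any mapper.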

 

 

 

 
