Using Java Code for Data Cleaning in Kettle 7

1. First, download Kettle 7.1: https://sourceforge.net/projects/pentaho/files/Data%20Integration/7.1/pdi-ce-7.1.0.0-12.zip/download

2. Unzip the archive and run this file: Spoon.bat

[Image 1]

3. Suppose the transformation consists of just these three steps:

[Image 2]

4. The "Java Code" step (shown as "User Defined Java Class" in the English UI) contains the following:


public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws Exception {
    // Fetch the next input row; null means there are no more rows.
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // Resize the row so that it matches the output row layout.
    r = createOutputRow(r, data.outputRowMeta.size());

    // Read the three fields to be cleaned.
    String HANDSET_NUM_ = get(Fields.In, "HANDSET_NUM_").getString(r);
    String TO_ACCOUNT_ = get(Fields.In, "TO_ACCOUNT_").getString(r);
    String FROM_ACCOUNT_ = get(Fields.In, "FROM_ACCOUNT_").getString(r);

    logBasic("---> HANDSET_NUM_: " + HANDSET_NUM_);
    logBasic("TO_ACCOUNT_: " + TO_ACCOUNT_);
    logBasic("FROM_ACCOUNT_: " + FROM_ACCOUNT_);

    // For each non-null field: strip the "+86" and "+85" prefixes, then any
    // remaining "-" and "+" characters, trim whitespace, and write it back.
    if (HANDSET_NUM_ != null) {
        String cleaned = HANDSET_NUM_.replace("+86", "").replace("+85", "").replaceAll("-", "").replaceAll("\\+", "").trim();
        get(Fields.In, "HANDSET_NUM_").setValue(r, cleaned);
        logBasic("format1: " + cleaned);
    }
    if (TO_ACCOUNT_ != null) {
        String cleaned = TO_ACCOUNT_.replace("+86", "").replace("+85", "").replaceAll("-", "").replaceAll("\\+", "").trim();
        get(Fields.In, "TO_ACCOUNT_").setValue(r, cleaned);
        logBasic("format2: " + cleaned);
    }
    if (FROM_ACCOUNT_ != null) {
        String cleaned = FROM_ACCOUNT_.replace("+86", "").replace("+85", "").replaceAll("-", "").replaceAll("\\+", "").trim();
        get(Fields.In, "FROM_ACCOUNT_").setValue(r, cleaned);
        logBasic("format3: " + cleaned);
    }

    // Pass the row on to the next step exactly once, after all fields are cleaned.
    putRow(data.outputRowMeta, r);
    return true;
}
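
To try the cleaning rules without starting Kettle, the replacement chain used in the step can be copied into a small standalone program. The following is only a sketch: the class name PhoneCleaner, the method clean, and the sample values are made up for illustration and are not part of Kettle or of the demo transformation.

```java
// Standalone sketch: PhoneCleaner and clean() are hypothetical names,
// not part of Kettle. Only java.lang.String methods are used.
final class PhoneCleaner {

    // Applies the same chain of replacements as the Kettle step.
    static String clean(String value) {
        if (value == null) {
            return null; // the step leaves null fields untouched
        }
        return value
                .replace("+86", "")      // literal "+86"
                .replace("+85", "")      // literal "+85"
                .replaceAll("-", "")     // every remaining hyphen
                .replaceAll("\\+", "")   // every remaining plus sign (regex-escaped)
                .trim();                 // leading/trailing whitespace
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        String[] samples = {"+86 138-0000-0000", "+85212345678", " 6222-0000 "};
        for (String s : samples) {
            System.out.println("[" + s + "] -> [" + clean(s) + "]");
        }
    }
}
```

With these rules, "+86 138-0000-0000" becomes "13800000000", while "+85212345678" becomes "212345678": only the literal "+85" is removed, so the trailing "2" of a "+852" prefix stays in the value. If such numbers occur in the data, the rules may need to be adjusted.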

5. Download link for the demo transformation: https://download.csdn.net/download/bastriver/11072483

  Just open it directly in Kettle.

 
