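I have recently been using Pentaho's Spoon for the ETL stage of a data-mining project. While using the "If field value is null" step, I found that the check has no effect when a field in my CSV file is empty or null. A Google search led me to a post on the Pentaho forum where someone asks about the same problem. It reads roughly as follows: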
=========================================================================
PDI 3.2 has this new step "If field value is null".
I am trying to use it in a transformation:
Access Input -> Add constants -> If field value is null -> Table output Mapping -> Table output (which is a MySQL table)
For those fields in the MySQL table that accept NULL I'd like to change to UNDEFINED so this step seems ideal.
So in the step I select "Select fields" and specify a handful of fields under the Fields section and specify the "Replace by value" value. I have specified and not specified a "Conversion mask (Date)". Is the Conversion mask necessary? In all cases I still have NULL written into fields in the MySQL table. All the MySQL fields that accept NULL are varchars.
There is not much on this step in the wiki:
http://wiki.pentaho.com/display/EAI/...+value+is+null
A previous post about this step:
http://forums.pentaho.org/showthread.php?t=70745
doesn't conclude with any advice pertaining to this particular step.
Am I doing something wrong?
Thanks in advance.
Cheers
=================================================================================
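A reply in that thread says this is a bug that will be fixed in a release after 3.2 or in 4.0, but at the moment only version 3.2.0 can be downloaded from the official site.
So what can we do? Fortunately Spoon is open source, so we can download the source code and find the class behind the "If field value is null" step:
org.pentaho.di.trans.steps.ifnull.IfNull.java. Setting a breakpoint in the replaceNull method and stepping through it leads to line 213: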
public void replaceNull(Object[] row, int i) throws Exception
{
    if(row[i]==null)
    {
        // DO CONVERSION OF THE DEFAULT VALUE ...
        // Entered by user
        ValueMetaInterface targetValueMeta = data.outputRowMeta.getValueMeta(i);
        ValueMetaInterface sourceValueMeta = data.convertRowMeta.getValueMeta(i);
        if(!Const.isEmpty(data.realconversionMask)) sourceValueMeta.setConversionMask(data.realconversionMask);
        row[i] = targetValueMeta.convertData(sourceValueMeta, data.realReplaceByValue);
    }
}
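When a value in the CSV file is empty or an empty string, the corresponding object in the Object[] row array is not null, even though the value it holds is effectively null.
So the check has to go through ValueMetaInterface.isNull(). The method is restructured as follows: the two ValueMetaInterface lookups move above the if, and the condition additionally calls sourceValueMeta.isNull(row[i]):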
public void replaceNull(Object[] row, int i) throws Exception
{
    ValueMetaInterface targetValueMeta = data.outputRowMeta.getValueMeta(i);
    ValueMetaInterface sourceValueMeta = data.convertRowMeta.getValueMeta(i);
    if(row[i]==null||sourceValueMeta.isNull(row[i]))
    {
        // DO CONVERSION OF THE DEFAULT VALUE ...
        // Entered by user
        if(!Const.isEmpty(data.realconversionMask)) sourceValueMeta.setConversionMask(data.realconversionMask);
        row[i] = targetValueMeta.convertData(sourceValueMeta, data.realReplaceByValue);
    }
}
After recompiling and running the transformation again, everything works as expected.
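The difference between the two checks can be seen in a small stand-alone sketch. It does not use the Kettle classes: isEffectivelyNull is a hypothetical stand-in for ValueMetaInterface.isNull(), assumed here to treat an empty string like null, and the row contents are made up for illustration.

```java
import java.util.Arrays;

// Stand-alone sketch; none of these names come from the Kettle source.
class NullCheckSketch {

    // Hypothetical stand-in for ValueMetaInterface.isNull(Object):
    // a null reference and an empty string both count as null.
    static boolean isEffectivelyNull(Object value) {
        if (value == null) return true;
        return (value instanceof String) && ((String) value).length() == 0;
    }

    // Same shape as the patched replaceNull: slot i is replaced
    // when it is null or holds an empty value.
    static void replaceNull(Object[] row, int i, Object replacement) {
        if (isEffectivelyNull(row[i])) {
            row[i] = replacement;
        }
    }

    public static void main(String[] args) {
        // Made-up row: a normal value, an empty CSV field, a real null.
        Object[] row = { "42", "", null };

        // The original check (row[i] == null) misses the empty field.
        System.out.println(row[1] == null); // prints false

        for (int i = 0; i < row.length; i++) {
            replaceNull(row, i, "UNDEFINED");
        }
        System.out.println(Arrays.toString(row)); // prints [42, UNDEFINED, UNDEFINED]
    }
}
```

With only the row[i]==null test, the second slot would have kept its empty string, which is the behaviour described above.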
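One more thing: please also modify org.pentaho.di.core.row.ValueMeta.java, at line 2825 of
public boolean isNull(Object data) throws KettleValueException
Replace the original
    if (((String)value).length()==0) return true;
with
    String str=String.valueOf(value); if (str.length()==0) return true;
because in some cases the cast to String throws an exception (a ClassCastException when the value is not actually a String). String.valueOf() is the safer choice.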
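As a rough illustration of why the cast is fragile, the sketch below compares the two styles on plain Java objects. The class and method names are hypothetical and are not taken from ValueMeta.java; only String.valueOf(Object) and the cast itself are standard Java.

```java
// Stand-alone sketch; the helper names are invented for illustration.
class StringValueOfSketch {

    // Original style: throws ClassCastException if value is not a String.
    static boolean isEmptyByCast(Object value) {
        return ((String) value).length() == 0;
    }

    // Patched style: String.valueOf(Object) falls back to toString(),
    // so a non-String value does not cause an exception.
    static boolean isEmptyByValueOf(Object value) {
        String str = String.valueOf(value);
        return str.length() == 0;
    }

    public static void main(String[] args) {
        Object text = "";
        Object number = Long.valueOf(7);

        System.out.println(isEmptyByValueOf(text));   // prints true
        System.out.println(isEmptyByValueOf(number)); // prints false

        try {
            isEmptyByCast(number);
        } catch (ClassCastException e) {
            System.out.println("cast failed"); // prints cast failed
        }

        // A null reference becomes the four-character string "null".
        System.out.println(isEmptyByValueOf(null));   // prints false
    }
}
```

Note the last case: String.valueOf turns a null reference into the text "null", so it only helps if the null check still runs before it.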