1 Symptom:
When Sqoop imports Oracle table data into HDFS, Oracle NULL values are by default written to HDFS as the literal string 'null'.
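To confirm the symptom on data that has already been imported, a small shell check can count the fields that hold the literal string null. This is only a sketch: the function name count_literal_nulls is made up for illustration, and it assumes plain comma-delimited text with no quoted or escaped delimiters inside fields.

```shell
#!/bin/sh
# count_literal_nulls [DELIM]
# Reads delimited text on stdin and prints how many fields are exactly
# the string "null". DELIM defaults to a comma (an assumption about the
# import's field delimiter; pass another one if yours differs).
count_literal_nulls() {
  awk -F"${1:-,}" '
    { for (i = 1; i <= NF; i++) if ($i == "null") n++ }
    END { print n + 0 }
  '
}

# Possible usage, with a hypothetical import directory:
#   hdfs dfs -cat /data/trademark/part-m-* | count_literal_nulls
```

A non-zero count suggests that NULLs were written as the string null rather than as a marker Hive treats as NULL.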
As a result, a Hive script containing an expression like this one:
case
  when a.begindate is not null and a.enddate >= '%current_date%' and instrfun(a.status,'无效')=0 then 'R商标'
  when a.begindate is not null and instrfun(a.status,'无效')>0 or instrfun(a.status,'注销')>0 then '过期商标'
  when a.enddate < '%current_date%' then '过期商标'
  when a.begindate is not null and (instrfun(a.status,'无效')>0 or instrfun(b.xiangmu_new,'无效')>0 or instrfun(b.xiangmu_new,'驳回')>0) then '无效(被否)商标'
  when a.begindate is null then 'TM商标'
  else '未知'
end MARKTYPE_NEW
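does not assign MARKTYPE_NEW as intended, because begindate already holds the string 'null' rather than a real NULL. For those rows the `is not null` checks always pass and the `is null` branch ('TM商标') never matches.
Ways to handle this:
1 Before the Sqoop import, preprocess the JavaBean fields, for example by setting them to ''
2 Or change the script above to: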
case
  when a.begindate <> 'null' and a.enddate >= '%current_date%' and instrfun(a.status,'无效')=0 then 'R商标'
  when a.begindate <> 'null' and instrfun(a.status,'无效')>0 or instrfun(a.status,'注销')>0 then '过期商标'
  when a.enddate < '%current_date%' then '过期商标'
  when a.begindate <> 'null' and (instrfun(a.status,'无效')>0 or instrfun(b.xiangmu_new,'无效')>0 or instrfun(b.xiangmu_new,'驳回')>0) then '无效(被否)商标'
  when a.begindate = 'null' then 'TM商标'
  else '未知'
end MARKTYPE_NEW
3 The proper fix is to have Sqoop write Oracle NULLs to HDFS in a form that Hive also reads as NULL. This is done at import time as follows:
Sqoop will by default import NULL values as string null. Hive is however using string \N to denote NULL values and therefore predicates dealing with NULL (like IS NULL) will not work correctly. You should append parameters --null-string and --null-non-string in case of import job or --input-null-string and --input-null-non-string in case of an export job if you wish to properly preserve NULL values. Because sqoop is using those parameters in generated code, you need to properly escape value \N to \\N:
$ sqoop import ... --null-string '\\N' --null-non-string '\\N'
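The import in option 3 could be wrapped roughly as below. This is a sketch under stated assumptions, not the original author's job: the connection URL, user name, table name, target directory, the sqoop_import function, and the DRY_RUN switch are all hypothetical. Only the sqoop flags themselves (--connect, --username, -P, --table, --target-dir, --null-string, --null-non-string) are standard Sqoop import options.

```shell
#!/bin/sh
# Hypothetical connection settings; replace with real values.
ORACLE_URL='jdbc:oracle:thin:@//dbhost:1521/ORCL'
ORACLE_USER='etl_user'
SOURCE_TABLE='TRADEMARK'
TARGET_DIR='/data/trademark'

# Single quotes keep both backslashes, so Sqoop receives \\N on its
# command line, which is the escaped form of \N that the text above asks for.
NULL_MARKER='\\N'

# sqoop_import: runs the import, or only prints the command line when
# DRY_RUN is set, so the quoting can be checked before touching the cluster.
sqoop_import() {
  set -- sqoop import \
    --connect "$ORACLE_URL" \
    --username "$ORACLE_USER" \
    -P \
    --table "$SOURCE_TABLE" \
    --target-dir "$TARGET_DIR" \
    --null-string "$NULL_MARKER" \
    --null-non-string "$NULL_MARKER"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before calling sqoop_import prints the assembled command. Calling it without DRY_RUN runs the import, and -P prompts for the password. With NULLs stored this way, the original `is null` / `is not null` conditions in the Hive script should work without the string comparison from option 2.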
Reference:
http://bbs.csdn.net/topics/390459433?page=1