More Pitfall-Dodging (Part 1: Unified Data Warehouse Rules)

When building a data warehouse, the most critical, most pressing job is importing the data!

And when importing the data, the most critical, most pressing job is unifying the rules the data must follow!

1. Null data
2. Empty values hidden inside strings
3. Dirty data
4. Unified field names
5. Unified field types
... (a quick sketch of these rules in action follows)
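To make the first three rules concrete, here is a minimal sketch of how they can be applied on the MySQL side, inside the sqoop --query itself; the table and column names (user_info, id, name, remark, status) are hypothetical:

# placeholders: ip, db, user_info and its columns
sqoop import \
  --connect "jdbc:mysql://ip:3306/db" \
  --query "select id, IFNULL(name,'') as name, TRIM(remark) as remark from user_info where status <> 'deleted' and \$CONDITIONS" \
  --split-by id \
  --target-dir /tmp/user_info

IFNULL handles rule 1, TRIM the simple cases of rule 2, the where clause filters out rule 3's dirty rows, and column aliases plus CAST take care of rules 4 and 5.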

This post covers two of these pits:

One: the first pit is fairly common and comes straight out of rule 5 (unified field types). 1) When a MySQL column is tinyint(1), the JDBC connection sqoop uses must set tinyInt1isBit=false, otherwise Connector/J maps the column to a boolean. 2) When a MySQL datetime column carries the default value 0000-00-00 00:00:00, the connection must set zeroDateTimeBehavior=convertToNull, otherwise the driver chokes on the zero dates. 3) And of course there are others, such as the character encoding (utf-8) and so on.

The final sqoop connection string ends up looking like this: --connect jdbc:mysql://ip:3306/db?zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false
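One easy-to-miss detail: the & in that URL is shell syntax, so the whole --connect value must be quoted on the command line, or the shell cuts the command off at the &. A minimal sketch, with placeholder host, database, credentials and table name:

# quoting the URL keeps the shell from interpreting the &; -P prompts for the password
sqoop import \
  --connect "jdbc:mysql://ip:3306/db?zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false" \
  --username user \
  -P \
  --table some_table \
  --hive-import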

Two: the second pit is a real trap, and it comes out of rule 2 (empty values hidden inside strings). Some MySQL columns are named after reserved keywords, such as type, order or desc; in a sqoop --query they have to be escaped with backticks, i.e. --query "select \`type\`,\`desc\` from table where \$CONDITIONS". That part is not the trap, though.

The trap is that the data in these keyword-named string columns also contained '\t' tab characters, which on the surface just look like empty values. Sqoop's --hive-drop-import-delims could not remove them (no surprise in hindsight: that switch only strips \n, \r and \01, not tabs), so they had to be stripped with MySQL syntax instead, i.e. REPLACE(col,CHAR(9),''). So I naturally wrote: --query "select REPLACE(\`type\`,CHAR(9),''),REPLACE(\`desc\`,CHAR(9),'') from table where \$CONDITIONS". But it kept failing: the MapReduce job ran through perfectly every time, and then the load data step threw this bug:

19/07/12 15:40:01 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/jars/hive-common-1.1.0-cdh5.13.3.jar!/hive-log4j.properties
NoViableAltException(86@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:39101)
	at org.apache.hadoop.hive.ql.parse.HiveParser.colType(HiveParser.java:38866)
	at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:38566)
	at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:36789)
	at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:5057)
	at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2557)
	at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1589)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1065)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1356)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1473)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1275)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:226)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:175)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:389)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:324)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:422)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:438)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:732)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:331)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:530)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
FAILED: ParseException line 1:140 cannot recognize input near 'desc' ',CHAR(9),'')' 'STRING' in column type

Agony! I could not find a solution anywhere. And at the very end of it all, I finally discovered the true definition of the word 'pit': the best fix is often not on the web, not in the ocean of knowledge, but in a casual, unplanned attempt!!

So I tried writing it like this: --query "select REPLACE(\`type\`,CHAR(9),'') as \`type\`, REPLACE(\`desc\`,CHAR(9),'') as \`desc\` from table where \$CONDITIONS"

Perfect! It loaded into Hive. In hindsight the error makes sense: without aliases, sqoop derives the Hive column names from the raw SELECT expressions, so REPLACE(\`desc\`,CHAR(9),'') landed verbatim in the generated CREATE TABLE statement and Hive's parser choked on it (that is exactly what the ParseException above is complaining about); the as \`type\` / as \`desc\` aliases give it clean column names.
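For reference, stitching both pits together, the whole import would look roughly like this; the host, database, credentials, staging directory and Hive table name are placeholders, while the query is the one from above:

# -m 1 avoids needing a split column for this two-column query;
# --hive-drop-import-delims still guards against \n, \r and \01 in the data
sqoop import \
  --connect "jdbc:mysql://ip:3306/db?zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false" \
  --username user \
  -P \
  --query "select REPLACE(\`type\`,CHAR(9),'') as \`type\`, REPLACE(\`desc\`,CHAR(9),'') as \`desc\` from table where \$CONDITIONS" \
  -m 1 \
  --target-dir /tmp/table_staging \
  --hive-import \
  --hive-table my_hive_table \
  --hive-drop-import-delims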
