Advanced Hive: Importing Data into Hive

Importing data with the LOAD statement

- Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

For example:

[Screenshot: an example LOAD DATA statement and its result]
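To make the optional clauses of the grammar concrete, here is a small Python sketch that only assembles the statement text; it does not talk to Hive. The helper name build_load_statement, the table student and the paths are hypothetical.

```python
def build_load_statement(filepath, table, local=False, overwrite=False, partition=None):
    """Assemble a LOAD DATA statement following the grammar above.

    partition: optional dict mapping partition column -> value.
    (Hypothetical helper, for illustration only.)
    """
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")               # read from the local file system
    parts.append(f"INPATH '{filepath}'")
    if overwrite:
        parts.append("OVERWRITE")           # replace existing table/partition data
    parts.append(f"INTO TABLE {table}")
    if partition:
        # Values are written as string literals for simplicity.
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_load_statement("/home/user/data/student.txt", "student", local=True))
    print(build_load_statement("/input/student", "student", overwrite=True,
                               partition={"gender": "M"}))
```

Each bracketed part of the grammar maps to one optional keyword argument, so leaving an argument at its default simply omits that clause.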


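Note: if no field delimiter is specified when the table is created, Hive uses its default field delimiter, the control character \001 (Ctrl-A), not a tab. If the data file is comma-separated, the lines cannot be split into columns and the values come out as NULL, as shown below: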

[Screenshot: query result showing NULL values after loading comma-separated data]
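Why the columns turn into NULL can be sketched with a simplified Python model of Hive's delimited-text parsing. This is an approximation, not Hive's actual SerDe code: each line is split on the table's field delimiter, which defaults to \001 (Ctrl-A); a field that is missing, or that cannot be converted to the column's type, becomes NULL (None here). The three-column layout id int, name string, age int is hypothetical. In real Hive the fix is to declare the delimiter when creating the table, e.g. ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.

```python
DEFAULT_FIELD_DELIM = "\x01"  # Hive's default field delimiter, '\001' (Ctrl-A)


def parse_row(line, column_types, delim=DEFAULT_FIELD_DELIM):
    """Simplified model of mapping one text line onto typed columns.

    column_types: list of Python types (int or str) standing in for Hive types.
    Missing fields and values that fail type conversion become None (NULL).
    Extra fields beyond the declared columns are ignored.
    """
    fields = line.split(delim)
    row = []
    for i, col_type in enumerate(column_types):
        if i >= len(fields):
            row.append(None)            # fewer fields than columns -> NULL
            continue
        raw = fields[i]
        if col_type is int:
            try:
                row.append(int(raw))
            except ValueError:
                row.append(None)        # e.g. "1,Tom,20" is not an int -> NULL
        else:
            row.append(raw)
    return row


if __name__ == "__main__":
    types = [int, str, int]
    print(parse_row("1,Tom,20", types))             # default delimiter: all NULL
    print(parse_row("1,Tom,20", types, delim=","))  # matching delimiter
```

With the default delimiter the whole line "1,Tom,20" is treated as a single field, so the int column fails to parse and the remaining columns have no field at all. If the first column were a string, it would instead hold the entire line.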

Importing all files in a directory

[Screenshot: loading all files under a directory with LOAD DATA]

Note: without LOCAL, the path refers to HDFS, and the files are moved (not copied) into the table's directory.
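The difference between LOCAL and non-LOCAL loads, and loading a whole directory, can be modelled on a single local file system in Python. This is a toy under stated assumptions: copying stands in for LOAD DATA LOCAL, moving stands in for a load from HDFS, and what Hive does when a file with the same name already exists in the table directory is not modelled. The function name load_data is hypothetical.

```python
import os
import shutil
import tempfile


def load_data(src, table_dir, local=False, overwrite=False):
    """Toy model of LOAD DATA file handling on one file system.

    local=True  -> files are copied (source stays), like LOAD DATA LOCAL.
    local=False -> files are moved (source disappears), like loading from HDFS.
    src may be a single file or a directory; for a directory, every file
    directly inside it is loaded. Returns the table directory's contents.
    """
    if overwrite:
        # OVERWRITE: clear the existing table data first.
        for name in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, name))
    if os.path.isdir(src):
        files = [os.path.join(src, n) for n in sorted(os.listdir(src))]
    else:
        files = [src]
    for path in files:
        dest = os.path.join(table_dir, os.path.basename(path))
        if local:
            shutil.copy(path, dest)
        else:
            shutil.move(path, dest)
    return sorted(os.listdir(table_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src, table = os.path.join(root, "input"), os.path.join(root, "table")
        os.mkdir(src)
        os.mkdir(table)
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(src, name), "w") as f:
                f.write("1,Tom,20\n")
        print(load_data(src, table, local=False))  # both files moved
        print(os.listdir(src))                      # source directory is now empty
```

The practical consequence of the move semantics: after a non-LOCAL load, the original HDFS path no longer contains the files.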

Loading data into a partition

[Screenshot: loading data into a partition with a PARTITION clause]
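Each partition is stored as a separate sub-directory of the table directory, named partcol=value. A minimal Python sketch of the resulting path, assuming the default warehouse location /user/hive/warehouse (the one that appears in the Sqoop log below) and a table in the default database. The table names and partition columns are hypothetical, the dict is assumed to list partition columns in the order they are declared in PARTITIONED BY, and Hive's escaping of special characters in partition values is not modelled.

```python
import posixpath


def partition_dir(warehouse, table, partition):
    """Return the HDFS directory holding one static partition's data.

    partition: dict of partition column -> value, in PARTITIONED BY order.
    Each column adds one nested 'col=value' directory level.
    """
    path = posixpath.join(warehouse, table)
    for col, val in partition.items():
        path = posixpath.join(path, f"{col}={val}")
    return path


if __name__ == "__main__":
    print(partition_dir("/user/hive/warehouse", "student", {"gender": "M"}))
    print(partition_dir("/user/hive/warehouse", "logs",
                        {"dt": "2017-06-27", "country": "CN"}))
```

This is why loading into a partition never touches other partitions' files: each one lives in its own directory.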


Importing data from a relational database with Sqoop

Download:
http://sqoop.apache.org/
For installation, see the separate Sqoop installation post.

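Importing MySQL data into HDFS and Hive
Note that sqoop is run from the command line (the operating-system shell), not inside the Hive CLI. I kept running it inside hive and kept getting this error: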
hive> sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by',';
NoViableAltException(26@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:999)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'sqoop' 'import' ''
Run from the shell instead, it works:

zj-db0236deMacBook-Pro:sbin zj-db0236$ sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table trade_detail --hive-import --hive-overwrite -m 1 --hive-table trade_detail --fields-terminated-by ','
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/27 15:25:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/27 15:25:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/27 15:25:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/27 15:25:35 INFO tool.CodeGenTool: Beginning code generation
17/06/27 15:25:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:25:35 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:25:35 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/zj-db0236/Downloads/hadoop-2.7.2
Note: /tmp/sqoop-zj-db0236/compile/da5649c40aae421516a4a7b09474d590/trade_detail.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/27 15:25:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zj-db0236/compile/da5649c40aae421516a4a7b09474d590/trade_detail.jar
17/06/27 15:25:36 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/06/27 15:25:36 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/06/27 15:25:36 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/06/27 15:25:36 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/06/27 15:25:36 INFO mapreduce.ImportJobBase: Beginning import of trade_detail
17/06/27 15:26:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/27 15:26:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/27 15:26:08 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 15:26:08 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/27 15:26:10 INFO db.DBInputFormat: Using read commited transaction isolation
17/06/27 15:26:10 INFO mapreduce.JobSubmitter: number of splits:1
17/06/27 15:26:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498547617140_0002
17/06/27 15:26:10 INFO impl.YarnClientImpl: Submitted application application_1498547617140_0002
17/06/27 15:26:10 INFO mapreduce.Job: The url to track the job: http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498547617140_0002/
17/06/27 15:26:10 INFO mapreduce.Job: Running job: job_1498547617140_0002
17/06/27 15:26:48 INFO mapreduce.Job: Job job_1498547617140_0002 running in uber mode : false
17/06/27 15:26:49 INFO mapreduce.Job:  map 0% reduce 0%
17/06/27 15:27:24 INFO mapreduce.Job:  map 100% reduce 0%
17/06/27 15:27:24 INFO mapreduce.Job: Job job_1498547617140_0002 completed successfully
17/06/27 15:27:24 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=137758
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=119
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=33155
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=33155
		Total vcore-milliseconds taken by all map tasks=33155
		Total megabyte-milliseconds taken by all map tasks=33950720
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=41
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=149422080
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=119
17/06/27 15:27:24 INFO mapreduce.ImportJobBase: Transferred 119 bytes in 76.2361 seconds (1.5609 bytes/sec)
17/06/27 15:27:24 INFO mapreduce.ImportJobBase: Retrieved 5 records.
17/06/27 15:27:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `trade_detail` AS t LIMIT 1
17/06/27 15:27:24 INFO hive.HiveImport: Loading uploaded data into Hive
17/06/27 15:27:26 INFO hive.HiveImport: 
17/06/27 15:27:26 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/Users/zj-db0236/Downloads/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/06/27 15:28:00 INFO hive.HiveImport: OK
17/06/27 15:28:00 INFO hive.HiveImport: Time taken: 0.679 seconds
17/06/27 15:28:00 INFO hive.HiveImport: Loading data to table default.trade_detail
17/06/27 15:28:01 INFO hive.HiveImport: rmr: DEPRECATED: Please use 'rm -r' instead.
17/06/27 15:28:01 INFO hive.HiveImport: Deleted hdfs://localhost:9000/user/hive/warehouse/trade_detail
17/06/27 15:28:01 INFO hive.HiveImport: Table default.trade_detail stats: [numFiles=2, numRows=0, totalSize=119, rawDataSize=0]
17/06/27 15:28:01 INFO hive.HiveImport: OK
17/06/27 15:28:01 INFO hive.HiveImport: Time taken: 0.456 seconds
17/06/27 15:28:01 INFO hive.HiveImport: Hive import complete.

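Note: -m 1 runs the import with a single map task. Without it, Sqoop uses its default of 4 map tasks: it splits the table on its primary key and each map task writes its own output file, so even a small table ends up as several small files. If the table has no primary key, you must either pass -m 1 or name a split column with --split-by.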
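How several mappers share one table can be sketched in Python. This is a simplified model of Sqoop's integer splitting, and the exact boundaries Sqoop computes may differ: the minimum and maximum of the split column (normally the primary key) are divided into -m ranges, and each map task imports only the rows whose key falls in its range, writing its own output file. The split column id and the key range 1 to 5 (an assumption matching the 5 records in the log above) are illustrative.

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the integer range [min_val, max_val] into at most num_mappers
    contiguous half-open ranges (lo, hi). Assumes min_val <= max_val."""
    span = max_val - min_val + 1
    n = max(1, min(num_mappers, span))   # never more ranges than distinct keys
    size, rem = divmod(span, n)
    ranges = []
    lo = min_val
    for i in range(n):
        hi = lo + size + (1 if i < rem else 0)  # exclusive upper bound
        ranges.append((lo, hi))
        lo = hi
    return ranges


def mapper_conditions(split_col, min_val, max_val, num_mappers):
    """Build one WHERE condition per map task; the last range is inclusive."""
    ranges = split_ranges(min_val, max_val, num_mappers)
    clauses = []
    for i, (lo, hi) in enumerate(ranges):
        if i == len(ranges) - 1:
            clauses.append(f"{split_col} >= {lo} AND {split_col} <= {hi - 1}")
        else:
            clauses.append(f"{split_col} >= {lo} AND {split_col} < {hi}")
    return clauses


if __name__ == "__main__":
    for clause in mapper_conditions("id", 1, 5, 4):   # default: 4 mappers
        print(clause)
    print(mapper_conditions("id", 1, 5, 1))           # -m 1: a single range
```

With -m 1 there is one condition covering the whole key range, hence one map task and one output file; with the default of 4 the same 5 rows are spread over four files.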
Sqoop import options:

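--append Append data to a dataset that already exists in HDFS. Sqoop first imports into a temporary directory and then renames the files into the target directory so that they do not collide with existing file names.
--as-avrodatafile Import data as Avro data files
--as-sequencefile Import data as SequenceFiles
--as-textfile Import data as plain text files (the default); the resulting files can be queried with SQL from Hive.
--boundary-query Query used to compute the split boundaries, i.e. the minimum and maximum of the split column, instead of Sqoop's default SELECT MIN(col), MAX(col). It should return one row with two values, e.g. --boundary-query 'select min(id), max(id) from t'; rows whose key falls outside that range are effectively not imported. Avoid string-typed columns in this query; they led to errors.
--columns Columns to import, e.g. --columns id,username
--direct Direct mode, which uses the database's own bulk export/import tools (for MySQL, mysqldump). The official documentation says this can be faster.
--direct-split-size In direct mode, split the imported stream every n bytes; for example, when importing from PostgreSQL in direct mode, output that reaches this size is split into separate files.
--inline-lob-limit Maximum size of a large object (LOB) stored inline; larger LOBs are written to separate files.
-m,--num-mappers Use n map tasks to import in parallel (default 4). Avoid setting it higher than the number of nodes in the cluster.
--query,-e Import the result of a free-form SQL query. --target-dir is required (and --hive-table when importing into Hive). The query must have a WHERE clause containing the token $CONDITIONS, which Sqoop replaces with each mapper's range condition; with more than one mapper, --split-by is also required. Inside single quotes write $CONDITIONS as is; inside double quotes escape it as \$CONDITIONS so the shell does not expand it. Example: --query 'select * from t where $CONDITIONS' --target-dir /tmp/t --hive-table t
--split-by Column used to split the work among mappers, usually the primary key id
--table Name of the source table in the relational database
--delete-target-dir Delete the target directory (if it exists) before importing
--target-dir HDFS destination directory
--warehouse-dir HDFS parent directory for the import (a sub-directory named after the table is created); cannot be used together with --target-dir. Suitable for plain HDFS imports, not for importing into the Hive warehouse directory.
--where Filter condition applied when reading from the database, e.g. --where "id = 2"
-z,--compress Enable compression. Data is not compressed by default; with this option it is compressed with gzip. Works with SequenceFiles, text files and Avro files.
--compression-codec Hadoop compression codec to use (default gzip)
--null-string String written for NULL values in string columns; if omitted, the string null is used.
--null-non-string String written for NULL values in non-string columns; if omitted, the string null is used. Since Hive writes NULL as \N in text files, both options are commonly set to '\\N' when the data is meant for Hive.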
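For --query, Sqoop fills the $CONDITIONS token with each mapper's range condition, which is why the token must appear in the WHERE clause. A Python sketch of that substitution, assuming the per-mapper conditions are already known; the function name expand_conditions, the query and the condition strings are hypothetical, and the extra metadata query Sqoop issues is not modelled.

```python
def expand_conditions(query, conditions):
    """Return one concrete query per mapper by filling in $CONDITIONS.

    Raises ValueError if the token is missing, since a free-form query
    without $CONDITIONS cannot be divided among mappers.
    """
    token = "$CONDITIONS"
    if token not in query:
        raise ValueError("query must contain $CONDITIONS in its WHERE clause")
    # Parenthesise each condition so it combines safely with other
    # predicates, e.g. "... WHERE status = 1 AND $CONDITIONS".
    return [query.replace(token, f"({cond})") for cond in conditions]


if __name__ == "__main__":
    for q in expand_conditions("select * from t where $CONDITIONS",
                               ["id >= 1 AND id < 3", "id >= 3 AND id <= 5"]):
        print(q)
```

In this model a backslash that reaches the tool unprocessed (for example '\$CONDITIONS' inside single quotes, where the shell keeps the backslash) survives the substitution and leaves invalid SQL, which is why the escaping depends on the quote style.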
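Exporting Hive data to MySQL

The target table (hiveToMysql here) must already exist in MySQL; sqoop export does not create it.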

sqoop export --connect "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --table hiveToMysql --password 123456 --export-dir /user/hive/warehouse/trade_detail/ --fields-terminated-by ','


Output:

Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /Users/zj-db0236/Downloads/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/27 17:17:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/27 17:17:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/27 17:17:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/06/27 17:17:07 INFO tool.CodeGenTool: Beginning code generation
17/06/27 17:17:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hiveToMysql` AS t LIMIT 1
17/06/27 17:17:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hiveToMysql` AS t LIMIT 1
17/06/27 17:17:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /Users/zj-db0236/Downloads/hadoop-2.7.2
Note: /tmp/sqoop-zj-db0236/compile/2f26ed69134261e462cebf51c09deff7/hiveToMysql.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/27 17:17:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-zj-db0236/compile/2f26ed69134261e462cebf51c09deff7/hiveToMysql.jar
17/06/27 17:17:10 INFO mapreduce.ExportJobBase: Beginning export of hiveToMysql
17/06/27 17:17:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/06/27 17:17:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/27 17:17:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/06/27 17:17:43 INFO input.FileInputFormat: Total input paths to process : 1
17/06/27 17:17:43 INFO input.FileInputFormat: Total input paths to process : 1
17/06/27 17:17:43 INFO mapreduce.JobSubmitter: number of splits:4
17/06/27 17:17:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
17/06/27 17:17:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498547617140_0003
17/06/27 17:17:44 INFO impl.YarnClientImpl: Submitted application application_1498547617140_0003
17/06/27 17:17:44 INFO mapreduce.Job: The url to track the job: http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498547617140_0003/
17/06/27 17:17:44 INFO mapreduce.Job: Running job: job_1498547617140_0003
17/06/27 17:18:22 INFO mapreduce.Job: Job job_1498547617140_0003 running in uber mode : false
17/06/27 17:18:22 INFO mapreduce.Job:  map 0% reduce 0%
17/06/27 17:19:03 INFO mapreduce.Job:  map 75% reduce 0%
17/06/27 17:19:04 INFO mapreduce.Job:  map 100% reduce 0%
17/06/27 17:19:04 INFO mapreduce.Job: Job job_1498547617140_0003 completed successfully
17/06/27 17:19:04 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=549964
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1009
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=19
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=152967
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=152967
		Total vcore-milliseconds taken by all map tasks=152967
		Total megabyte-milliseconds taken by all map tasks=156638208
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=676
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=206
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=577241088
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=0
17/06/27 17:19:04 INFO mapreduce.ExportJobBase: Transferred 1,009 bytes in 82.6365 seconds (12.2101 bytes/sec)
17/06/27 17:19:04 INFO mapreduce.ExportJobBase: Exported 5 records.
[Screenshot: the exported rows in the MySQL table hiveToMysql]
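What sqoop export does with the files under --export-dir can be sketched in Python, using an in-memory SQLite database as a stand-in for MySQL: each line is split on the field delimiter and inserted as one row into a table that must already exist. This is a toy model, not Sqoop's implementation (the real tool runs the inserts from several map tasks, four in the log above); the function name export_text_rows and the column layout of hiveToMysql used here are hypothetical.

```python
import sqlite3


def export_text_rows(lines, conn, table, delim=","):
    """Toy model of sqoop export: parse delimited lines and INSERT them.

    lines: text lines as found in the Hive table's files.
    table: name of an existing table (trusted input; it is formatted
           directly into the SQL, while values use ? placeholders).
    Returns the number of rows inserted.
    """
    cur = conn.cursor()
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                      # skip blank lines
        fields = line.split(delim)
        placeholders = ", ".join("?" for _ in fields)
        cur.execute(f"INSERT INTO {table} VALUES ({placeholders})", fields)
        count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # The target table must exist before exporting, as with sqoop export.
    conn.execute("CREATE TABLE hiveToMysql (id INTEGER, account TEXT, income REAL)")
    n = export_text_rows(["1,a@x.com,100\n", "2,b@x.com,200\n"], conn, "hiveToMysql")
    print(n, conn.execute("SELECT * FROM hiveToMysql").fetchall())
```

The delimiter passed here plays the role of --fields-terminated-by ',' in the export command: it has to match the delimiter the Hive table's files were written with.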