Hello everyone, we meet again! Today let's play around with Hive data export.
There are several ways to export:
1) hadoop commands
get
text
2) the insert … directory statement
insert overwrite [local] directory '/tmp/ca_employees'
[row format delimited fields terminated by '\t']
select name, salary, address
from employees;
(sample syntax)
3) a shell command plus a pipe: hive -f/-e | sed/grep/awk > file
4) third-party tools, such as Sqoop
OK, let's start the experiments!
1) hadoop commands
hive> select * from testtext;
OK
wer 46
wer 89
weree 78
rr 89
Time taken: 0.212 seconds
hive>
[root@hadoop1 host]# hadoop fs -get /user/hive/warehouse/testtext /usr/host/data2/
[root@hadoop1 host]# cd data2
[root@hadoop1 data2]# ll
total 4
drwxr-xr-x. 2 root root 4096 Jun 2 02:23 testtext
[root@hadoop1 data2]#
[root@hadoop1 data2]# hadoop fs -text /user/hive/warehouse/testtext/*
wer 46
wer 89
weree 78
rr 89
[root@hadoop1 data2]#
Note: my Hive data lives on HDFS under /user/hive/warehouse; that path was specified in hive-site.xml.
Of course you can also redirect the output:
[root@hadoop1 data2]# hadoop fs -text /user/hive/warehouse/testtext/* > newdata2
[root@hadoop1 data2]# ll
total 8
-rw-r--r--. 1 root root 29 Jun 2 02:26 newdata2
drwxr-xr-x. 2 root root 4096 Jun 2 02:23 testtext
[root@hadoop1 data2]# cat newdata2
wer 46
wer 89
weree 78
rr 89
[root@hadoop1 data2]#
(Two > characters append to the file; a single > overwrites it.)
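The same redirection rules apply to any command, so they are easy to try locally; a minimal sketch using echo in place of hadoop fs -text (the file name demo.txt is made up for illustration):

```shell
# '>' truncates the target file before writing; '>>' appends to the end.
echo "wer 46" > demo.txt    # demo.txt now contains exactly one line
echo "wer 89" >> demo.txt   # appended: two lines
echo "rr 89" > demo.txt     # overwritten: back to one line
cat demo.txt                # shows only "rr 89"
rm demo.txt
```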
2) the insert … directory statement
hive> insert overwrite local directory '/usr/host/data3'
> select name,addr
> from testtext;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 1; number of reducers: 0
2016-06-02 02:32:44,084 null map = 0%, reduce = 0%
2016-06-02 02:32:56,469 null map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2016-06-02 02:32:57,543 null map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
2016-06-02 02:32:58,658 null map = 100%, reduce = 0%, Cumulative CPU 0.83 sec
MapReduce Total cumulative CPU time: 830 msec
Ended Job = job_1464828076391_0014
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Copying data to local directory /usr/host/data3
Copying data to local directory /usr/host/data3
OK
Time taken: 29.525 seconds
hive>
Check the data3 directory:
[root@hadoop1 host]# cd data3
[root@hadoop1 data3]# ll
total 4
-rw-r--r--. 1 root root 29 Jun 2 02:32 000000_0
[root@hadoop1 data3]# cat 000000_0
wer46
wer89
weree78
rr89
[root@hadoop1 data3]#
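Notice that name and addr run together (wer46): since no row format clause was given, Hive writes its default field delimiter \001 (Ctrl-A), which cat renders as nothing. A quick sketch of making such a file readable, using a locally fabricated sample in place of the real 000000_0:

```shell
# Fabricate a \001-delimited file like Hive's default export format
printf '%s\001%s\n' wer 46 wer 89 weree 78 rr 89 > sample_000000_0
# Translate the invisible Ctrl-A delimiters into tabs for display
tr '\001' '\t' < sample_000000_0
rm sample_000000_0
```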
To export to HDFS instead, drop the local keyword; the row format … clause is not needed here.
hive> insert overwrite directory '/data3'
> select name,addr
> from testtext;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 1; number of reducers: 0
2016-06-02 02:37:27,137 null map = 0%, reduce = 0%
2016-06-02 02:37:34,847 null map = 100%, reduce = 0%
Ended Job = job_1464828076391_0015
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Moving data to: /data3
OK
Time taken: 27.902 seconds
hive>
Works like a charm, every single time!
3) a shell command plus a pipe:
hive -f/-e | sed/grep/awk > file
[root@hadoop1 data3]# hive -e "select * from testtext"
OK
wer 46
wer 89
weree 78
rr 89
Time taken: 5.879 seconds
[root@hadoop1 data3]# hive -S -e "select * from testtext" | grep wer
wer 89
weree 78
[root@hadoop1 data3]#
The benefit of -S (silent mode) is that much less chatter appears on the console.
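grep is just one filter; awk can slice the tab-separated columns that hive -S -e prints. A sketch that simulates the query output with printf, so it runs without a Hive installation:

```shell
# Simulate: hive -S -e "select * from testtext"  (tab-separated rows)
printf 'wer\t46\nwer\t89\nweree\t78\nrr\t89\n' |
awk -F'\t' '$2 > 80 { print $1 }'   # names whose second column exceeds 80
```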
I'm a bit tired, time for a break. If you've read this far and want to learn more or get in touch, follow my WeChat public account: 五十年后.