Ways to load data into Hive
Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
* [LOCAL]
Indicates whether the data file is on the local filesystem or on HDFS.
When loading a file that is already on HDFS, Hive does not copy it: the file is moved into the corresponding table directory.
* [OVERWRITE]
Indicates whether to overwrite the data already in the table.
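Both forms side by side (paths are illustrative; full transcripts follow in the steps below):
load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;   -- copies a local file into the table directory
load data inpath '/emp.txt' overwrite into table emp ;               -- moves an HDFS file and replaces existing rows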
1. Preparation:
use db_hive_0927 ;
drop table if exists emp ;
create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
2. Load a local file into Hive
load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;
hive> load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;
Copying data from file:/opt/hive-0.13.1/emp.txt
Copying file: file:/opt/hive-0.13.1/emp.txt
Loading data to table db_hive_0927.emp
Table db_hive_0927.emp stats: [numFiles=1, numRows=0, totalSize=656, rawDataSize=0]
OK
Time taken: 4.471 seconds
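To confirm the load, list the table's warehouse directory from the Hive shell and sample a few rows (the warehouse path below is taken from the CTAS transcript in step 5 and may differ per installation):
dfs -ls /user/hive/warehouse/db_hive_0927.db/emp ;
select * from emp limit 5 ;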
3. Load an HDFS file into Hive
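The file must already exist on HDFS before this load; a minimal upload sketch (the target path /emp.txt matches the transcript below, the source path is reused from step 2):
hdfs dfs -put /opt/hive-0.13.1/emp.txt /emp.txt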
load data inpath '/emp.txt' into table emp ;
hive> load data inpath '/emp.txt' into table emp ;
Loading data to table db_hive_0927.emp
Table db_hive_0927.emp stats: [numFiles=2, numRows=0, totalSize=1312, rawDataSize=0]
OK
Time taken: 11.055 seconds
4. Overwrite the existing data in the table
load data inpath '/emp.txt' overwrite into table emp ;
hive> load data inpath '/emp.txt' overwrite into table emp ;
FAILED: SemanticException Line 1:17 Invalid path ''/emp.txt'': No files matching path hdfs://cluster/emp.txt
This fails because loading an HDFS file into a table moves the file into the table's directory: /emp.txt was consumed by the load in step 3, so there is no longer any file at that path.
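A sketch of the fix, assuming re-uploading the file is acceptable: put emp.txt back on HDFS, then retry the overwrite load.
hdfs dfs -put /opt/hive-0.13.1/emp.txt /emp.txt
load data inpath '/emp.txt' overwrite into table emp ;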
5. Load data via SELECT while creating the table (CTAS)
create table db_hive_0927.emp_load as select empno, ename, deptno from db_hive_0927.emp ;
hive> create table db_hive_0927.emp_load as select empno, ename, deptno from db_hive_0927.emp ;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0006, Tracking URL = http://cloud1:8088/proxy/application_1445142802171_0006/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0006
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-10-19 08:49:57,599 Stage-1 map = 0%, reduce = 0%
2015-10-19 08:50:58,584 Stage-1 map = 0%, reduce = 0%
2015-10-19 08:51:28,175 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 5.75 sec
2015-10-19 08:52:28,757 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.66 sec
2015-10-19 08:52:59,494 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 12.5 sec
MapReduce Total cumulative CPU time: 12 seconds 500 msec
Ended Job = job_1445142802171_0006
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0007, Tracking URL = http://cloud1:8088/proxy/application_1445142802171_0007/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0007
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-10-19 08:54:11,395 Stage-3 map = 0%, reduce = 0%
2015-10-19 08:54:26,512 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.43 sec
MapReduce Total cumulative CPU time: 2 seconds 430 msec
Ended Job = job_1445142802171_0007
Moving data to: hdfs://cluster/user/hive/warehouse/db_hive_0927.db/emp_load
Table db_hive_0927.emp_load stats: [numFiles=1, numRows=0, totalSize=392, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 2 Cumulative CPU: 13.36 sec HDFS Read: 1751 HDFS Write: 574 SUCCESS
Job 1: Map: 1 Cumulative CPU: 2.43 sec HDFS Read: 761 HDFS Write: 392 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 790 msec
OK
Time taken: 378.773 seconds
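Since CTAS derives the new table's schema from the SELECT list, a quick sanity check is worthwhile (output omitted here):
desc db_hive_0927.emp_load ;
select * from db_hive_0927.emp_load limit 5 ;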
6. Create an empty table with LIKE, then load data with INSERT
hive> create table db_hive_0927.emp_ins like db_hive_0927.emp ;
OK
Time taken: 0.505 seconds
hive> desc db_hive_0927.emp_ins;
OK
empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int
Time taken: 0.587 seconds, Fetched: 8 row(s)
use db_hive_0927 ;
Append data:
insert into table emp_ins select * from emp ;
hive> insert into table emp_ins select * from emp ;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0008, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0008
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-10-19 09:04:33,042 Stage-1 map = 0%, reduce = 0%
2015-10-19 09:05:18,854 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.72 sec
MapReduce Total cumulative CPU time: 9 seconds 720 msec
Ended Job = job_1445142802171_0008
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0009, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0009
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-10-19 09:08:57,711 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:09:58,092 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:11:08,156 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:12:09,288 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:13:12,190 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:14:49,177 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:15:53,783 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.28 sec
MapReduce Total cumulative CPU time: 4 seconds 280 msec
Ended Job = job_1445142802171_0009
Loading data to table db_hive_0927.emp_ins
Table db_hive_0927.emp_ins stats: [numFiles=1, numRows=28, totalSize=1322, rawDataSize=1294]
MapReduce Jobs Launched:
Job 0: Map: 2 Cumulative CPU: 10.46 sec HDFS Read: 1751 HDFS Write: 1476 SUCCESS
Job 1: Map: 1 Cumulative CPU: 7.96 sec HDFS Read: 1691 HDFS Write: 1322 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 420 msec
OK
Time taken: 1180.09 seconds
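A quick consistency check between source and target (a sketch, run in the same database); both counts should match, since the INSERT copied every row of emp:
select count(1) from emp ;
select count(1) from emp_ins ;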
Overwrite the data in the table (two equivalent forms):
insert overwrite table emp_ins select * from emp ;
from emp insert overwrite table emp_ins select * ;
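The FROM-first form of the second statement pays off when a single scan of emp feeds several inserts. A hypothetical multi-insert sketch (emp_ins2 is an assumed second table with the same schema as emp):
from emp
insert overwrite table emp_ins select *
insert into table emp_ins2 select * where deptno = 10 ;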
7. Specify the data storage location with LOCATION when creating the table
create external table emp_ext_loc like emp
location '/test/emp.txt' ;
This statement fails as written. The likely cause: the LOCATION clause expects an HDFS directory, but '/test/emp.txt' points at a file.
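A sketch of the likely fix, assuming /test/ is made a directory on HDFS that holds emp.txt:
hdfs dfs -mkdir -p /test
hdfs dfs -put /opt/hive-0.13.1/emp.txt /test/
create external table emp_ext_loc like emp
location '/test/' ;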