Four Ways to Import Data into Hive

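Common ways to import data into Hive:
1. Load data from the local file system into a Hive table
2. Load data from HDFS into a Hive table
3. Insert data from another table into a Hive table
4. Populate a Hive table from another table while creating it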

I. The first two methods
Official LOAD DATA syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
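The file handling behind this syntax can be illustrated with a small Python simulation. It is a sketch on plain local directories, not Hive's implementation: `load_data` and `_free_name` are hypothetical helpers, the table directory stands in for the table's warehouse directory, and the `_copy_N` renaming on a name clash is an assumption about how Hive avoids overwriting an existing file. With LOCAL the file is copied; without it the file is moved; OVERWRITE empties the table first.

```python
import shutil
import tempfile
from pathlib import Path


def _free_name(table_dir: Path, name: str) -> Path:
    """Pick a file name in table_dir that does not clash with existing files."""
    dest = table_dir / name
    n = 1
    while dest.exists():
        stem, suffix = Path(name).stem, Path(name).suffix
        dest = table_dir / f"{stem}_copy_{n}{suffix}"
        n += 1
    return dest


def load_data(src: Path, table_dir: Path, local: bool, overwrite: bool) -> Path:
    """Simulate the file handling of LOAD DATA [LOCAL] INPATH ... [OVERWRITE].

    table_dir is a local directory standing in for the table's warehouse
    directory (assumption: real Hive works on HDFS and updates the metastore).
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # OVERWRITE: the table's existing files are deleted first
        for f in list(table_dir.iterdir()):
            f.unlink()
    dest = _free_name(table_dir, src.name)
    if local:
        # LOCAL: the file is copied, the source stays in place
        shutil.copy(str(src), str(dest))
    else:
        # no LOCAL: the file is moved, the source disappears
        shutil.move(str(src), str(dest))
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "emp.txt"
        src.write_text("7369\tSMITH\tCLERK\n")
        table = Path(tmp) / "warehouse" / "emp"
        load_data(src, table, local=True, overwrite=True)
        print(src.exists(), [p.name for p in table.iterdir()])  # True ['emp.txt']
```

Without OVERWRITE, a second load of the same file adds a second file to the table directory instead of replacing the first one, which is why repeated loads duplicate rows.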

1. Loading data from the local file system into a Hive table

First, create an empty emp table:
hive (default)> create table emp(
              > empno int,
              > ename string,
              > job string,
              > mgr int,
              > hiredate string,
              > sal double,
              > comm double,
              > deptno int
              > )
              > COMMENT 'The table belong to hive database'
              > row format delimited fields terminated by '\t';
OK
Time taken: 0.412 seconds
hive (default)> select * from emp;
OK
emp.empno       emp.ename       emp.job emp.mgr emp.hiredate    emp.sal emp.comm        emp.deptno
Time taken: 0.21 seconds

Next, load /root/data/emp.txt into the emp table. The contents of emp.txt (fields separated by tabs, matching the table definition):

[root@hadoop001 data]# cat emp.txt
7369    SMITH   CLERK   7902    1980-12-17      800.00          20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.00 300.00  30
7521    WARD    SALESMAN        7698    1981-2-22       1250.00 500.00  30
7566    JONES   MANAGER 7839    1981-4-2        2975.00         20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.00 1400.00 30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.00         30
7782    CLARK   MANAGER 7839    1981-6-9        2450.00         10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.00         20
7839    KING    PRESIDENT               1981-11-17      5000.00         10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.00 0.00    30
7876    ADAMS   CLERK   7788    1987-5-23       1100.00         20
7900    JAMES   CLERK   7698    1981-12-3       950.00          30
7902    FORD    ANALYST 7566    1981-12-3       3000.00         20
7934    MILLER  CLERK   7782    1982-1-23       1300.00         10

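Load the file. LOCAL means the path is on the local file system (here, the machine running the Hive CLI) and the file is copied into the table's directory; OVERWRITE first deletes any data already in the table: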

hive (default)> load data local inpath  '/root/data/emp.txt' overwrite into table emp;
Loading data to table default.emp
OK
Time taken: 1.987 seconds

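Verify that the data was loaded. Empty fields in the int and double columns (for example KING's mgr and most comm values) are read as NULL. The query returns 15 rows, one more than the listing above: the extra row 8888 HIVE presumably comes from an additional line in the emp.txt used when this output was captured.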

hive (default)> select * from emp;
OK
emp.empno       emp.ename       emp.job emp.mgr emp.hiredate    emp.sal emp.comm        emp.deptno
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
8888    HIVE    PROGRAM 7839    1988-1-23       10300.0 NULL    NULL
Time taken: 1.915 seconds, Fetched: 15 row(s)
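In the output above, empty fields in numeric columns come back as NULL, and 800.00 is displayed as 800.0 because sal is a double. The sketch below approximates that conversion for the emp schema. It is not Hive's SerDe code: `parse_row` and `convert` are hypothetical names, `\N` is treated as the NULL marker (Hive's default for text tables), an empty or malformed number becomes NULL, and missing trailing fields are filled with NULL.

```python
from typing import List, Tuple, Union

Value = Union[int, float, str, None]

# Column names and types of the emp table, copied from the CREATE TABLE above
EMP_SCHEMA: List[Tuple[str, str]] = [
    ("empno", "int"), ("ename", "string"), ("job", "string"), ("mgr", "int"),
    ("hiredate", "string"), ("sal", "double"), ("comm", "double"), ("deptno", "int"),
]

NULL_MARKER = "\\N"  # default NULL representation in Hive text files


def convert(field: str, hive_type: str) -> Value:
    """Convert one text field to a Python value; None stands for NULL."""
    if field == NULL_MARKER:
        return None
    if hive_type == "string":
        return field  # empty strings stay empty strings, not NULL
    try:
        return int(field) if hive_type == "int" else float(field)
    except ValueError:
        return None  # empty or malformed numbers become NULL


def parse_row(line: str, schema: List[Tuple[str, str]] = EMP_SCHEMA) -> List[Value]:
    """Split a tab-delimited line and convert each field by its column type."""
    fields = line.rstrip("\n").split("\t")
    # missing trailing fields are NULL; surplus fields are dropped by zip()
    fields += [NULL_MARKER] * (len(schema) - len(fields))
    return [convert(f, t) for f, (_, t) in zip(fields, schema)]


if __name__ == "__main__":
    print(parse_row("7839\tKING\tPRESIDENT\t\t1981-11-17\t5000.00\t\t10"))
    # [7839, 'KING', 'PRESIDENT', None, '1981-11-17', 5000.0, None, 10]
```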

2. Loading data from HDFS

First, create an empty emp2 table:
hive (default)> create table emp2(
               empno int,
               ename string,
               job string,
               mgr int,
               hiredate string,
               sal double,
               comm double,
               deptno int )
               COMMENT 'The table belong to hive database'
               row format delimited fields terminated by '\t';

Put emp.txt into the HDFS root directory:

hadoop fs -put /root/data/emp.txt /

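Load emp.txt from HDFS into the emp2 table. Without LOCAL the path refers to HDFS, and Hive moves the file into the table's directory instead of copying it, so /emp.txt no longer exists in the HDFS root afterwards: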

hive (default)> load data inpath 'hdfs://mycluster/emp.txt' overwrite into table emp2;
Loading data to table default.emp2
OK
Time taken: 0.526 seconds
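LOAD DATA INPATH accepts a full URI like the one above, an absolute path, or a relative path. Per the Hive documentation, a path without scheme and authority is qualified with the default file system (fs.defaultFS, formerly fs.default.name), and a relative path is interpreted relative to /user/<username>. Below is a sketch of that rule as a hypothetical `qualify` helper, assuming hdfs://mycluster is this cluster's default file system; it only handles these three cases.

```python
import posixpath
from urllib.parse import urlparse


def qualify(filepath: str, default_fs: str, user: str) -> str:
    """Resolve a LOAD DATA INPATH path the way the Hive docs describe.

    - full URI with a scheme: used as-is
    - absolute path: prefixed with the default file system
    - relative path: placed under /user/<user>, then prefixed
    """
    if urlparse(filepath).scheme:
        return filepath
    if not filepath.startswith("/"):
        filepath = posixpath.join("/user", user, filepath)
    return default_fs.rstrip("/") + filepath


if __name__ == "__main__":
    # assumption: fs.defaultFS is hdfs://mycluster and the user is root
    print(qualify("/emp.txt", "hdfs://mycluster", "root"))      # hdfs://mycluster/emp.txt
    print(qualify("data/emp.txt", "hdfs://mycluster", "root"))  # hdfs://mycluster/user/root/data/emp.txt
```

Under that assumption, `load data inpath '/emp.txt' ...` would load the same file as the full URI used above.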

Check the result:

hive (default)> select * from emp2;
OK
emp2.empno      emp2.ename      emp2.job        emp2.mgr        emp2.hiredate   emp2.sal        emp2.comm       emp2.deptno
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
8888    HIVE    PROGRAM 7839    1988-1-23       10300.0 NULL    NULL
Time taken: 0.352 seconds, Fetched: 15 row(s)

II. Methods 3 and 4
1. Inserting data from another table into a Hive table

Official syntax (standard form):
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

Create an empty emp3 table:

hive (default)> create table emp3(
               empno int,
               ename string,
               job string,
               mgr int,
               hiredate string,
               sal double,
               comm double,
               deptno int )
               COMMENT 'The table belong to hive database'
               row format delimited fields terminated by '\t';

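Insert the data of emp2 into emp3. OVERWRITE replaces whatever emp3 already contains (INSERT INTO TABLE appends instead), and the SELECT's columns are mapped to emp3's columns by position: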

hive (default)> insert overwrite table emp3 select * from emp2;
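INSERT OVERWRITE replaces the target table's existing rows, while INSERT INTO TABLE appends to them; in both cases the SELECT's values are matched to the target columns by position, not by name. The in-memory sketch below models only those two rules. `insert` is a hypothetical helper, the two-column tables are made-up examples, and Hive's type checks and conversions are left out.

```python
from typing import Any, List

Row = List[Any]


def insert(target: List[Row], target_ncols: int, select_rows: List[Row], overwrite: bool) -> List[Row]:
    """Simulate INSERT OVERWRITE TABLE / INSERT INTO TABLE ... SELECT in memory.

    Values are matched to target columns by position, so every SELECT row
    must have exactly target_ncols values.
    """
    for row in select_rows:
        if len(row) != target_ncols:
            raise ValueError(f"expected {target_ncols} columns, got {len(row)}")
    kept = [] if overwrite else [list(r) for r in target]  # OVERWRITE drops old rows
    return kept + [list(r) for r in select_rows]


if __name__ == "__main__":
    emp2 = [[7369, "SMITH"], [7499, "ALLEN"]]
    emp3 = [[1, "OLD"]]
    print(insert(emp3, 2, emp2, overwrite=True))   # [[7369, 'SMITH'], [7499, 'ALLEN']]
    print(insert(emp3, 2, emp2, overwrite=False))  # [[1, 'OLD'], [7369, 'SMITH'], [7499, 'ALLEN']]
```

Because matching is positional, `select *` works here only because emp2 and emp3 were created with the same columns in the same order.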

Check the result:

hive (default)> select * from emp3;
OK
emp3.empno      emp3.ename      emp3.job        emp3.mgr        emp3.hiredate   emp3.sal        emp3.comm       emp3.deptno
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
8888    HIVE    PROGRAM 7839    1988-1-23       10300.0 NULL    NULL
Time taken: 0.219 seconds, Fetched: 15 row(s)

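2. Creating a table and populating it from another table in one step (CTAS)
Create emp4 and fill it with the data from emp2. The column names and types of emp4 are taken from the SELECT, so no column list is given: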

hive (default)> create table emp4
               COMMENT 'The table belong to hive database'
               row format delimited fields terminated by '\t'
               AS select * from emp2;
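CTAS combines two steps: the new table's schema is derived from the query's output columns, and the query's rows become its data. The sketch below models that for a SELECT of plain column names only (expressions, joins and the ROW FORMAT clause are out of scope). `ctas` is a hypothetical helper and the three-column schema is a made-up subset of emp.

```python
from typing import Any, List, Tuple

Schema = List[Tuple[str, str]]


def ctas(src_schema: Schema, src_rows: List[List[Any]], columns: List[str]) -> Tuple[Schema, List[List[Any]]]:
    """Model CREATE TABLE new AS SELECT <columns> FROM src.

    The new table's column names and types come from the selected columns,
    which is why CTAS needs no explicit column list.
    """
    position = {name: i for i, (name, _) in enumerate(src_schema)}
    picked = [position[c] for c in columns]  # KeyError for an unknown column
    new_schema = [src_schema[i] for i in picked]
    new_rows = [[row[i] for i in picked] for row in src_rows]
    return new_schema, new_rows


if __name__ == "__main__":
    schema = [("empno", "int"), ("ename", "string"), ("sal", "double")]
    rows = [[7369, "SMITH", 800.0]]
    # SELECT * corresponds to selecting every column in order
    print(ctas(schema, rows, [name for name, _ in schema]))
    # ([('empno', 'int'), ('ename', 'string'), ('sal', 'double')], [[7369, 'SMITH', 800.0]])
```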

Check the result:

hive (default)> select * from emp4;
OK
emp4.empno      emp4.ename      emp4.job        emp4.mgr        emp4.hiredate   emp4.sal        emp4.comm       emp4.deptno
7369    SMITH   CLERK   7902    1980-12-17      800.0   NULL    20
7499    ALLEN   SALESMAN        7698    1981-2-20       1600.0  300.0   30
7521    WARD    SALESMAN        7698    1981-2-22       1250.0  500.0   30
7566    JONES   MANAGER 7839    1981-4-2        2975.0  NULL    20
7654    MARTIN  SALESMAN        7698    1981-9-28       1250.0  1400.0  30
7698    BLAKE   MANAGER 7839    1981-5-1        2850.0  NULL    30
7782    CLARK   MANAGER 7839    1981-6-9        2450.0  NULL    10
7788    SCOTT   ANALYST 7566    1987-4-19       3000.0  NULL    20
7839    KING    PRESIDENT       NULL    1981-11-17      5000.0  NULL    10
7844    TURNER  SALESMAN        7698    1981-9-8        1500.0  0.0     30
7876    ADAMS   CLERK   7788    1987-5-23       1100.0  NULL    20
7900    JAMES   CLERK   7698    1981-12-3       950.0   NULL    30
7902    FORD    ANALYST 7566    1981-12-3       3000.0  NULL    20
7934    MILLER  CLERK   7782    1982-1-23       1300.0  NULL    10
8888    HIVE    PROGRAM 7839    1988-1-23       10300.0 NULL    NULL
Time taken: 0.245 seconds, Fetched: 15 row(s)
