Yi Jin Jing Hive: Creating Databases and Tables and Inserting Data in Hive

Please credit the source when reposting: http://blog.csdn.net/dongdong9223/article/details/86438951
This article is from my blog, 我是干勾鱼的博客.

Ingredients:

  • Java: Java SE Development Kit 8u162 (Oracle Java Archive); see "Installing the JDK and configuring environment variables on Linux"

  • Hadoop: hadoop-2.9.1.tar.gz (Apache Hadoop Releases Downloads; all previous releases of Hadoop are available from the Apache release archive site)

  • Hive: hive-2.3.4 (mirrors.tuna.tsinghua.edu.cn, mirror site for Hive)

In an earlier post, Yi Jin Jing Hive: Hive Installation and Basic Usage, I covered setting up the Hive environment and basic usage. This post walks through creating a database and a table in Hive and loading data into them.

The details of Hive data operations can be found in the official User Documentation.

1 Start the Hive Server

Run the hiveserver2 command:

hiveserver2
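
hiveserver2 keeps the current terminal occupied. If you prefer to run it in the background and check that it is listening on the default port 10000, something like the following works (a sketch; the log file path is arbitrary):

nohup hiveserver2 > /tmp/hiveserver2.log 2>&1 &
netstat -anp | grep 10000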

2 Create the Database and Table

Save the following HQL script in a file named UserInfo.hql:

create database if not exists mydata;

use mydata;

drop table if exists UserInfo;

create table UserInfo(
    id      int,                    -- id
    name    string,                 -- name
    hobby   array<string>,          -- hobbies
    address map<string,string>      -- address information
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':';
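
Given these delimiters (',' between fields, '-' between collection items, ':' between map keys and values), each line of UserInfo.csv is expected to look roughly like the rows below, reconstructed here from the query results shown at the end of this post:

1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin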

Execute the script with the hive command to create the database and table:

# hive -f UserInfo.hql 
which: no hbase in (/opt/hive/apache-hive-2.3.4-bin/bin:/opt/java/jdk1.8.0_162/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive/apache-hive-2.3.4-bin/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
OK
Time taken: 5.497 seconds
OK
Time taken: 0.026 seconds
OK
Time taken: 0.118 seconds
OK
Time taken: 0.669 seconds

Enter Hive to check the result:
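
For example, the following statements in the hive CLI (or in beeline) confirm that the database and table exist:

show databases;
use mydata;
show tables;
describe UserInfo;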

3 Upload the Data File to HDFS

Upload the data file to HDFS first; otherwise you may run into permission problems when loading the data.

Create a directory on HDFS:

/opt/hadoop/hadoop-2.9.1/bin/hadoop fs -mkdir -p /tmp/myfile/

You can upload a local csv file with the Hue file browser; suppose the path is /tmp/hive and the file name is UserInfo.csv. Alternatively, upload the file directly with a command:

/opt/hadoop/hadoop-2.9.1/bin/hadoop fs -put /opt/git/WebCrawler/UserInfo.csv /tmp/myfile/

Grant permissions on the directory:

/opt/hadoop/hadoop-2.9.1/bin/hadoop fs -chmod 755 /tmp/myfile

Check:

# /opt/hadoop/hadoop-2.9.1/bin/hadoop fs -ls /tmp/myfile/
Found 1 items
-rw-r--r--   1 root supergroup        146 2019-01-13 22:48 /tmp/myfile/UserInfo.csv

The csv file has now been saved to HDFS.

4 Insert Data

4.1 About the beeline Command

View the usage of the beeline command:

beeline -help

The usage is as follows:

Usage: java org.apache.hive.cli.beeline.BeeLine 
   -u <database url>               the JDBC URL to connect to
   -r                              reconnect to last saved connect url (in conjunction with !save)
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   -w (or) --password-file <password file>  the password file to read password from
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --property-file=<property-file> the file to read connection properties (url, driver, user, password) from
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showDbInPrompt=[true/false]   display the current database name in the prompt
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --incremental=[true/false]      Defaults to false. When set to false, the entire result set
                                   is fetched and buffered before being displayed, yielding optimal
                                   display column sizing. When set to true, result rows are displayed
                                   immediately as they are fetched, yielding lower latency and
                                   memory usage at the price of extra display column padding.
                                   Setting --incremental=true is recommended if you encounter an OutOfMemory
                                   on the client side (due to the fetched result set size being large).
                                   Only applicable if --outputformat=table.
   --incrementalBufferRows=NUMROWS the number of rows to buffer when printing rows on stdout,
                                   defaults to 1000; only applicable if --incremental=true
                                   and --outputformat=table
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --maxHistoryRows=MAXHISTORYROWS The maximum number of rows to store beeline history.
   --help                          display this message
 
   Example:
    1. Connect using simple authentication to HiveServer2 on localhost:10000
    $ beeline -u jdbc:hive2://localhost:10000 username password

    2. Connect using simple authentication to HiveServer2 on hs.local:10000 using -n for username and -p for password
    $ beeline -n username -p password -u jdbc:hive2://hs2.local:10012

    3. Connect using Kerberos authentication with hive/[email protected] as HiveServer2 principal
    $ beeline -u "jdbc:hive2://hs2.local:10013/default;principal=hive/[email protected]"

    4. Connect using SSL connection to HiveServer2 on localhost at 10000
    $ beeline "jdbc:hive2://localhost:10000/default;ssl=true;sslTrustStore=/usr/local/truststore;trustStorePassword=mytruststorepassword"

    5. Connect using LDAP authentication
    $ beeline -u jdbc:hive2://hs2.local:10013/default <ldap-username> <ldap-password>

From this we can see:

  • -u: specifies the connection URL
  • -n: specifies the username
  • -p: specifies the password

Note that the username here is a user (for example root) of the database (for example MySQL) that holds the metadata. If you do not specify a username and password when connecting, you are logged in as the anonymous user by default, and your permissions will be restricted.

If you insist on logging in as the anonymous user and still do not want to be limited by permissions, you can add the following to Hadoop's hdfs-site.xml:

<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

4.2 Connect to the Server

On the client, connect to the Hive server by running:

beeline -u jdbc:hive2://localhost:10000 -nroot -p123456

For example:

[root@shizhi002 ~]# beeline -u jdbc:hive2://localhost:10000 -nroot -p123456
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive

4.3 Insert Data

The syntax of the LOAD DATA statement:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

  • LOCAL: whether to load from the local file system. Note that if you load a file that is already on HDFS into a table, the file is actually moved into the directory of the corresponding Hive table.

  • OVERWRITE: whether to overwrite the existing data in the table.

4.3.1 Using Data on HDFS

0: jdbc:hive2://localhost:10000> load data inpath '/user/root/myfile/UserInfo.csv' overwrite into table mydata.UserInfo;
No rows affected (2.411 seconds)

**Note!** This command actually moves the file on HDFS into the directory of the corresponding Hive table, so the original csv file is no longer at its old location on HDFS.
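
Assuming the default warehouse location (hive.metastore.warehouse.dir, which is /user/hive/warehouse unless you changed it), you can verify that the file now sits under the table's directory:

/opt/hadoop/hadoop-2.9.1/bin/hadoop fs -ls /user/hive/warehouse/mydata.db/userinfo/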

4.3.2 Using Local Data

0: jdbc:hive2://localhost:10000> load data local inpath '/tmp/UserInfo.csv' overwrite into table mydata.UserInfo;
No rows affected (0.59 seconds)

4.4 View the Data

0: jdbc:hive2://localhost:10000> select * from userinfo;
+--------------+----------------+-----------------------+---------------------------------------------+
| userinfo.id  | userinfo.name  |    userinfo.hobby     |              userinfo.address               |
+--------------+----------------+-----------------------+---------------------------------------------+
| 1            | xiaoming       | ["book","TV","code"]  | {"beijing":"chaoyang","shagnhai":"pudong"}  |
| 2            | lilei          | ["book","code"]       | {"nanjing":"jiangning","taiwan":"taibei"}   |
| 3            | lihua          | ["music","book"]      | {"heilongjiang":"haerbin"}                  |
+--------------+----------------+-----------------------+---------------------------------------------+
3 rows selected (1.411 seconds)
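
Because hobby is an array and address is a map, individual elements can also be selected; a small example (address['beijing'] returns NULL for rows whose map has no 'beijing' key):

select name, hobby[0], address['beijing'] from mydata.userinfo;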

References

Yi Jin Jing Hive: Hive Installation and Basic Usage

Hive: Hive CREATE TABLE Statements Explained

Importing csv File Data into Hive

Creating Hive Tables and Importing/Exporting Data

Permission denied: user=administrator, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
