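Hive Data Types and Creating and Dropping Databases
Data Types in Hive
Data types are a very important element of the Hive query language and of data modeling. To define the column types of a table, we need to understand the data types and how they are used.
The following is a brief overview of the data types in Hive, which fall into these groups:
- Numeric types
- String types
- Date/Time types
- Complex types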
Numeric Types:
Type | Memory allocation | Example |
---|---|---|
TINYINT | 1-byte signed integer (-128 to 127) | 10Y |
SMALLINT | 2-byte signed integer (-32,768 to 32,767) | 10S |
INT | 4-byte signed integer (-2,147,483,648 to 2,147,483,647) | 10 |
BIGINT | 8-byte signed integer | 100L |
FLOAT | 4-byte single-precision floating-point number | 1.2345679 |
DOUBLE | 8-byte double-precision floating-point number | 1.2345678901234567 |
DECIMAL | Precision and scale can be defined for this type | |
String Types:
Type | Length | Example |
---|---|---|
CHAR | Up to 255 | 'US' or "US" |
VARCHAR | 1 to 65535 | |
STRING | No declared length (effectively unlimited) | "Books" or 'Books' |
BINARY | Can only be cast to and from STRING | 1011 |
BOOLEAN | TRUE or FALSE | TRUE |
Date/Time Types:
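Type | Usage | Example |
---|---|---|
TIMESTAMP | Traditional Unix timestamp with optional nanosecond precision | 2019-01-01 12:00:01.345 |
DATE | Uses the YYYY-MM-DD format. The supported range of values is 0000-01-01 to 9999-12-31, depending on support by the underlying Java Date type | 2019-01-01 |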
Complex Types:
Type | Usage | Example |
---|---|---|
ARRAY | ARRAY | ["apple","orange","mango"] |
MAP | MAP | {1: "apple", 2: "orange"} |
STRUCT | STRUCT | {1, "apple"} |
Named STRUCT | STRUCT | {"apple":"gala","weightkg":1} |
UNION | UNIONTYPE | {2:["apple","orange"]} |
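To make the tables above concrete, here is a minimal HiveQL sketch that declares one column of each kind and builds complex values with Hive's built-in constructor functions. The table name `type_demo` and all of its column names are hypothetical, and the SELECT statements without a FROM clause assume a reasonably recent Hive release (0.13 or later).

```sql
-- Hypothetical table with one column per type family from the tables above
-- DECIMAL(10,2) means precision 10 and scale 2
CREATE TABLE IF NOT EXISTS type_demo (
  tiny_col    TINYINT,
  small_col   SMALLINT,
  int_col     INT,
  big_col     BIGINT,
  float_col   FLOAT,
  double_col  DOUBLE,
  price       DECIMAL(10,2),
  country     CHAR(2),
  title       VARCHAR(100),
  note        STRING,
  flag        BOOLEAN,
  created_at  TIMESTAMP,
  birth_date  DATE,
  fruits      ARRAY<STRING>,
  fruit_ids   MAP<INT,STRING>,
  fruit_info  STRUCT<name:STRING,weightkg:INT>
);

-- Integer literals use the Y (TINYINT), S (SMALLINT) and L (BIGINT) suffixes
SELECT 10Y, 10S, 10, 100L;

-- Complex values can be built with array(), map(), struct() and named_struct()
SELECT
  array('apple', 'orange', 'mango')            AS fruits,
  map(1, 'apple', 2, 'orange')                 AS fruit_ids,
  struct(1, 'apple')                           AS anonymous_struct,
  named_struct('name', 'gala', 'weightkg', 1)  AS fruit_info;
```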
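Creating and Dropping Databases in Hive:
Creating a database:
To create a database from the Hive shell, use a command with the following syntax:
Syntax:
CREATE DATABASE <database_name>
Example: create database guru99
DROP works the same way, and the syntax here is very similar to MySQL.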
hive> show databases;
OK
default
Time taken: 2.723 seconds, Fetched: 1 row(s)
hive> create database guru99
> ;
OK
Time taken: 0.656 seconds
hive> show databases;
OK
default
guru99
Time taken: 0.068 seconds, Fetched: 2 row(s)
hive> drop database guru99;
OK
Time taken: 0.848 seconds
hive> show databases;
OK
default
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive>
Data Type Demonstration Example
employee.txt
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer^DLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
Hive's default delimiters are as follows:
Field Delimiter: Ctrl + A or ^A (use \001 when creating the table)
Collection Item Delimiter: Ctrl + B or ^B (\002)
Map Key Delimiter: Ctrl + C or ^C (\003)
If a delimiter is overridden during table creation, the override only applies to the flat structure. This is still a limitation in Hive, described in Apache Jira HIVE-365 ( https://issues.apache.org/jira/browse/HIVE-365 ). For nested types, the level of nesting determines the delimiter. Taking an ARRAY of ARRAY as an example, the delimiter for the outer ARRAY is, as expected, Ctrl + B, but the delimiter for the inner ARRAY becomes Ctrl + C, the next delimiter in the list. In the example here, the depart_title column is a MAP of ARRAY: by default its MAP key delimiter is Ctrl + C, and its ARRAY delimiter is Ctrl + D.
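As an illustration, the sketch below spells out Hive's default delimiters explicitly in a table definition. The table name `employee_default_delims` is hypothetical, and the column types are assumed from the query output shown later in this article.

```sql
-- \001 is Ctrl + A, \002 is Ctrl + B, \003 is Ctrl + C
CREATE TABLE IF NOT EXISTS employee_default_delims (
  name         STRING,
  work_place   ARRAY<STRING>,
  gender_age   STRUCT<gender:STRING,age:INT>,
  skills_score MAP<STRING,INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

Because these are the defaults, leaving out the whole ROW FORMAT clause should give the same layout; writing it out mainly documents which character plays which role.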
Run:
> CREATE TABLE employee (name STRING, work_place ARRAY<STRING>, gender_age STRUCT<gender:STRING,age:INT>, skills_score MAP<STRING,INT>, depart_title MAP<STRING,ARRAY<STRING>>)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' STORED AS TEXTFILE;
OK
> LOAD DATA INPATH '/user/hduser/employee.txt' OVERWRITE INTO TABLE employee;
Loading data to table default.employee
OK
Time taken: 2.602 seconds
hive> SELECT work_place FROM employee;
OK
["Montreal","Toronto"]
["Montreal"]
["New York"]
["Vancouver"]
Time taken: 5.808 seconds, Fetched: 4 row(s)
hive> SELECT work_place[0] as col_1, work_place[1] as col_2, work_place[2] as col_3 FROM employee;
OK
Montreal Toronto NULL
Montreal NULL NULL
New York NULL NULL
Vancouver NULL NULL
Time taken: 1.376 seconds, Fetched: 4 row(s)
hive>
> SELECT gender_age FROM employee;
OK
{"gender":"Male","age":30}
{"gender":"Male","age":35}
{"gender":"Female","age":27}
{"gender":"Female","age":57}
Time taken: 0.399 seconds, Fetched: 4 row(s)
hive>
> SELECT gender_age.gender, gender_age.age FROM employee;
OK
Male 30
Male 35
Female 27
Female 57
Time taken: 0.369 seconds, Fetched: 4 row(s)
hive>
> SELECT skills_score FROM employee;
OK
{"DB":80}
{"Perl":85}
{"Python":80}
{"Sales":89,"HR":94}
Time taken: 0.347 seconds, Fetched: 4 row(s)
hive>
> SELECT name, skills_score['DB'] as DB, skills_score['Perl'] as Perl, skills_score['Python'] as Python, skills_score['Sales'] as Sales, skills_score['HR'] as HR FROM employee;
OK
Michael 80 NULL NULL NULL NULL
Will NULL 85 NULL NULL NULL
Shelley NULL NULL 80 NULL NULL
Lucy NULL NULL NULL 89 94
Time taken: 0.447 seconds, Fetched: 4 row(s)
hive>
>
> SELECT depart_title FROM employee;
OK
{"Product":["Developer","Lead"]}
{"Product":["Lead"],"Test":["Lead"]}
{"Test":["Lead"],"COE":["Architect"]}
{"Sales":["Lead"]}
Time taken: 0.329 seconds, Fetched: 4 row(s)
hive>
>
> SELECT name, depart_title['Product'] as Product, depart_title['Test'] as Test, depart_title['COE'] as COE, depart_title['Sales'] as Sales FROM employee;
OK
Michael ["Developer","Lead"] NULL NULL NULL
Will ["Lead"] ["Lead"] NULL NULL
Shelley NULL ["Lead"] ["Architect"] NULL
Lucy NULL NULL NULL ["Lead"]
Time taken: 0.322 seconds, Fetched: 4 row(s)
hive> SELECT name, depart_title['Product'][0] as product_col0, depart_title['Test'][0] as test_col0 FROM employee;
OK
Michael Developer NULL
Will Lead Lead
Shelley NULL Lead
Lucy NULL NULL
Time taken: 0.335 seconds, Fetched: 4 row(s)
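Beyond index and key lookups, Hive ships built-in functions for working with complex columns. The sketch below runs against the employee table created above and uses size(), array_contains(), map_keys(), map_values(), and LATERAL VIEW explode(). The column aliases and the `wp` table alias are illustrative names only.

```sql
-- Count array elements and test membership
SELECT name,
       size(work_place)                      AS place_count,
       array_contains(work_place, 'Toronto') AS works_in_toronto
FROM employee;

-- List the keys and values of a MAP column as arrays
SELECT name,
       map_keys(skills_score)   AS skills,
       map_values(skills_score) AS scores
FROM employee;

-- Turn each array element into its own row
SELECT name, place
FROM employee
LATERAL VIEW explode(work_place) wp AS place;
```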
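Type Conversion
A conversion of a primitive type from a narrower type to a wider type is called an implicit conversion. The reverse conversion, however, is not allowed. All integral numeric types, FLOAT, and STRING can be implicitly converted to DOUBLE, and TINYINT, SMALLINT, and INT can all be converted to FLOAT. The BOOLEAN type cannot be converted to any other type. For more details, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types.
Explicit type conversion uses the CAST function with the syntax CAST(value AS TYPE). For example, CAST('100' AS INT) converts the string '100' to the integer value 100. If the cast fails, as in CAST('INT' AS INT), the function returns NULL.
In addition, the BINARY type can only be cast to STRING first, and then from STRING to other types as needed.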
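The conversion rules above can be tried directly in the Hive shell. This is a small sketch that assumes a Hive version supporting SELECT without a FROM clause. The result comments only restate what the text above says.

```sql
-- Explicit conversion of a numeric string returns the integer 100
SELECT CAST('100' AS INT);

-- A failed cast returns NULL instead of raising an error
SELECT CAST('INT' AS INT);

-- BINARY goes through STRING before reaching another type
SELECT CAST(CAST(CAST('100' AS BINARY) AS STRING) AS INT);

-- Implicit widening, the TINYINT operand is promoted to DOUBLE
SELECT CAST(1 AS TINYINT) + CAST(2.5 AS DOUBLE);
```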
More Database Operations
hive>
>
> CREATE DATABASE myhivebook;
OK
Time taken: 0.323 seconds
hive> CREATE SCHEMA IF NOT EXISTS myhivebook;
OK
Time taken: 0.07 seconds
hive> CREATE DATABASE IF NOT EXISTS myhivebook COMMENT 'hive database demo' LOCATION '/hdfs/directory' WITH DBPROPERTIES ('creator'='dayongd','date'='2018-05-01');
OK
Time taken: 0.05 seconds
hive> SHOW CREATE DATABASE default;
OK
CREATE DATABASE `default`
COMMENT
'Default Hive database'
LOCATION
'hdfs://localhost:54310/user/hive/warehouse'
Time taken: 0.07 seconds, Fetched: 5 row(s)
hive> SHOW DATABASES;
OK
default
myhivebook
Time taken: 0.055 seconds, Fetched: 2 row(s)
hive> SHOW DATABASES LIKE 'my.*';
OK
myhivebook
Time taken: 0.165 seconds, Fetched: 1 row(s)
hive> USE myhivebook;
OK
Time taken: 0.059 seconds
hive> SELECT current_database();
OK
myhivebook
Time taken: 0.244 seconds, Fetched: 1 row(s)
hive> DROP DATABASE IF EXISTS myhivebook;
OK
Time taken: 0.492 seconds
hive> DROP DATABASE IF EXISTS myhivebook CASCADE;
OK
Time taken: 0.042 seconds
hive> ALTER DATABASE default SET DBPROPERTIES ('edited-by'='Dayong');
hive> ALTER DATABASE default SET OWNER user dayongd;
hive> ALTER DATABASE default SET LOCATION '/tmp/data/default';
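Starting with Hive v2.2.1, the ALTER DATABASE ... SET LOCATION statement can be used to change a database's location, but it does not move the existing tables/partitions in the current database directory to the newly specified location. It only changes the location of tables added after the database is altered. This behavior is analogous to how changing a table's directory does not move its existing partitions to a different location.
The SHOW and DESC (or DESCRIBE) statements in Hive are used to display the definition of most objects, such as tables and partitions. The SHOW statement supports a wide range of Hive objects, such as tables, table properties, table DDL, indexes, partitions, columns, functions, locks, roles, configurations, transactions, and compactions. The DESC statement supports a smaller set of Hive objects, such as databases, tables, views, columns, and partitions.
However, the DESC statement can provide more detailed information when combined with the EXTENDED or FORMATTED keyword.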
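A few representative SHOW and DESC statements are sketched below. They assume the employee table from the earlier example exists in the current database; the output depends on your environment, so none is shown.

```sql
-- SHOW covers many object kinds
SHOW TABLES;
SHOW FUNCTIONS;
SHOW COLUMNS IN employee;
SHOW TBLPROPERTIES employee;
SHOW CREATE TABLE employee;

-- DESC covers fewer objects but can add detail with EXTENDED or FORMATTED
DESC DATABASE default;
DESC employee;
DESC EXTENDED employee;
DESC FORMATTED employee;
```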
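Data Definition Language
Hive's Data Definition Language (DDL) is the subset of HQL statements that describe Hive data structures by creating, dropping, or altering schema objects such as databases, tables, views, partitions, and buckets. Most DDL statements start with the CREATE, DROP, or ALTER keyword. The syntax of HQL DDL is very similar to SQL DDL.
Database
A database in Hive describes a collection of tables that are used for a similar purpose or belong to the same group. If no database is specified, the default database is used, with /user/hive/warehouse in HDFS as its root directory. This path is configurable through the hive.metastore.warehouse.dir property in hive-site.xml. Whenever a new database is created, Hive creates a new directory for it under /user/hive/warehouse. For example, the myhivebook database is located at /user/hive/warehouse/myhivebook.db. In addition, DATABASE has an alias, SCHEMA, meaning the two are interchangeable in HQL.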
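To check where a database actually lives, the following sketch prints the configured warehouse directory and then the location of a database. It reuses the myhivebook database name from earlier; the reported location depends on your configuration.

```sql
-- Print the current value of the warehouse directory property
SET hive.metastore.warehouse.dir;

-- SCHEMA and DATABASE are interchangeable keywords
CREATE SCHEMA IF NOT EXISTS myhivebook;

-- Show the comment, HDFS location and owner of the database
DESCRIBE DATABASE myhivebook;

-- EXTENDED additionally lists the DBPROPERTIES
DESCRIBE DATABASE EXTENDED myhivebook;
```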