HIVE快速入门教程4Hive数据类型和创建,删除数据库

Hive数据类型和创建,删除数据库

Hive中的数据类型

数据类型是Hive查询语言和数据建模中非常重要的元素。 要定义表列类型,我们必须了解数据类型及其用法。

以下简要概述了Hive中的一些数据类型:

这些是

  • 数值类型:Numeric
  • 字符串类型:String
  • 日期/时间类型:Date/Time
  • 复杂类型:Complex
Type Memory allocation 例子
TINY INT 它的1字节有符号整数(-128到127) 10Y
SMALL INT 2字节有符号整数(-32768到32767) 10S
INT 4字节有符号整数(-2,147,484,648到2,147,484,647) 10
BIG INT 8字节有符号整数 100L
FLOAT 4字节单精度浮点数 1.2345679
DOUBLE 8字节双精度浮点数 1.2345678901234567
DECIMAL We can define precision and scale in this Type

String Types:

Type Length 例子
CHAR 255 'US' or "US"
VARCHAR 1 to 65355
STRING 我们可以在这里定义长度(无限制) "Books" or 'Books'
BINARY 只能和STRING互相CAST 1011
BOOLEAN TRUE or FALSE TRUE

Date/Time Types:

Type Usage 例子
Timestamp 支持具有可选纳秒精度的传统Unix时间戳 2019-01-01
Date 它采用YYYY-MM-DD格式。Date类型支持的值范围是0000-01-01到9999-12-31,具体取决于原始Java Date类型的支持 2019-01-01 12:00:01.345

Complex Types:

Type Usage 例子
Arrays ARRAY不允许使用负值和非常量表达式 [ "apple","orange","mango" ]
Maps MAP 不允许使用负值和非常量表达式 {1: "apple",2: "orange"}
Struct STRUCT {1, "apple"}
NAMED Struct STRUCT {"apple":"gala","weightkg":1}
Union UNIONTYPE 不常用 {2:["apple","orange"]}

参考资料

  • python测试开发项目实战-目录
  • python工具书籍下载-持续更新
  • python 3.7极速入门教程 - 目录
  • 讨论qq群630011153 144081101
  • 原文地址
  • 本文涉及的python测试开发库 谢谢点赞!
  • [本文相关海量书籍下载](https://github.com/china-testing/python-api-tesing/blob/master/books.md

在Hive中创建和删除数据库:

创建数据库:

要在Hive shell中创建数据库,我们必须使用如下语法所示的命令: -

句法:

Create database

示例: create database guru99

类似的有drop,此处的语法和mysql及其相似。

hive> show databases;
OK
default
Time taken: 2.723 seconds, Fetched: 1 row(s)
hive> create database guru99
    > ;
OK
Time taken: 0.656 seconds
hive> show databases;
OK
default
guru99
Time taken: 0.068 seconds, Fetched: 2 row(s)
hive> drop database guru99;
OK
Time taken: 0.848 seconds
hive> show databases;
OK
default
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> 

数据类型演示实例

employee.txt

Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer�Lead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead

Row Delimiter: This can be used with Ctrl + A or ^ A (use \001 when creating the table)
Collection Item Delimiter: This can be used with Ctrl + B or ^ B (\002)
Map Key Delimiter: This can be used with Ctrl + C or ^ C (\003)

If the delimiter is overridden during the table creation, it only works when used in the flat structure. This is still a limitation in Hive described in Apache Jira Hive-365 ( https://issues.apach e.org/jira/browse/HIVE-365 ). For nested types, the level of nesting determines the delimiter. Using ARRAY of ARRAY as an example, the delimiters for the outer ARRAY , as expected, are Ctrl + B characters, but the inner ARRAY delimiter becomes Ctrl + C characters, which is the next delimiter in the list. In the preceding example, the depart_title column, which is a MAP of ARRAY , the MAP key delimiter is Ctrl + C, and the ARRAY delimiter is Ctrl + D.

执行

    > CREATE TABLE employee (name STRING, work_place ARRAY, gender_age STRUCT, skills_score MAP, depart_title MAP>)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' STORED AS TEXTFILE;
OK
> LOAD DATA INPATH '/user/hduser/employee.txt' OVERWRITE INTO TABLE employee;
Loading data to table default.employee
OK
Time taken: 2.602 seconds
hive> SELECT work_place FROM employee;
OK
["Montreal","Toronto"]
["Montreal"]
["New York"]
["Vancouver"]
Time taken: 5.808 seconds, Fetched: 4 row(s)
hive> SELECT work_place[0] as col_1, work_place[1] as col_2, work_place[2] as col_3 FROM employee;
OK
Montreal    Toronto NULL
Montreal    NULL    NULL
New York    NULL    NULL
Vancouver   NULL    NULL
Time taken: 1.376 seconds, Fetched: 4 row(s)
hive> 
    > SELECT gender_age FROM employee;
OK
{"gender":"Male","age":30}
{"gender":"Male","age":35}
{"gender":"Female","age":27}
{"gender":"Female","age":57}
Time taken: 0.399 seconds, Fetched: 4 row(s)
hive> 
    > SELECT gender_age.gender, gender_age.age FROM employee;
OK
Male    30
Male    35
Female  27
Female  57
Time taken: 0.369 seconds, Fetched: 4 row(s)
hive> 
    > SELECT skills_score FROM employee;
OK
{"DB":80}
{"Perl":85}
{"Python":80}
{"Sales":89,"HR":94}
Time taken: 0.347 seconds, Fetched: 4 row(s)
hive> 
    > SELECT skills_score FROM employee;
OK
{"DB":80}
{"Perl":85}
{"Python":80}
{"Sales":89,"HR":94}
Time taken: 0.382 seconds, Fetched: 4 row(s)
hive> 
    > SELECT name, skills_score['DB'] as DB, skills_score['Perl'] as Perl, skills_score['Python'] as Python, skills_score['Sales'] as Sales, skills_score['HR'] as HR FROM employee;
OK
Michael 80  NULL    NULL    NULL    NULL
Will    NULL    85  NULL    NULL    NULL
Shelley NULL    NULL    80  NULL    NULL
Lucy    NULL    NULL    NULL    89  94
Time taken: 0.447 seconds, Fetched: 4 row(s)
hive> 
    > 
    > SELECT depart_title FROM employee;
OK
{"Product":["Developer","Lead"]}
{"Product":["Lead"],"Test":["Lead"]}
{"Test":["Lead"],"COE":["Architect"]}
{"Sales":["Lead"]}
Time taken: 0.329 seconds, Fetched: 4 row(s)
hive> 
    > 
    > SELECT name, depart_title['Product'] as Product, depart_title['Test'] as Test, depart_title['COE'] as COE, depart_title['Sales'] as Sales FROM employee;
OK
Michael ["Developer","Lead"]    NULL    NULL    NULL
Will    ["Lead"]    ["Lead"]    NULL    NULL
Shelley NULL    ["Lead"]    ["Architect"]   NULL
Lucy    NULL    NULL    NULL    ["Lead"]
Time taken: 0.322 seconds, Fetched: 4 row(s)
hive> SELECT name, depart_title['Product'][0] as product_col0, depart_title['Test'][0] as test_col0 FROM employee;
OK
Michael Developer   NULL
Will    Lead    Lead
Shelley NULL    Lead
Lucy    NULL    NULL
Time taken: 0.335 seconds, Fetched: 4 row(s)

类型转换

从窄类型到更宽类型的原始类型转换称为隐式转换。但是,不允许进行反向转换。所有整数数字类型FLOAT和STRING都可以隐式转换DOUBLE,TINYINT,SMALLINT和INT都可以转换为FLOAT。 BOOLEAN类型无法转换为任何其他类型。更多参考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types。

显式类型转换使用CAST函数和CAST(值为TYPE)语法。例如,CAST('100' as INT)将100字符串转换为100整数值。如果强制转换失败,例如CAST('INT' as
INT),则该函数返回NULL。

此外,BINARY类型只能首先转换为STRING,然后根据需要从STRING转换为其他类型。

更多数据库操作

hive> 
    > 
    > CREATE DATABASE myhivebook;
OK
Time taken: 0.323 seconds
hive> CREATE SCHEMA IF NOT EXISTS myhivebook;
OK
Time taken: 0.07 seconds
hive> CREATE DATABASE IF NOT EXISTS myhivebook COMMENT 'hive database demo' LOCATION '/hdfs/directory' WITH DBPROPERTIES ('creator'='dayongd','date'='2018-05-01');
OK
Time taken: 0.05 seconds
hive> SHOW CREATE DATABASE default;
OK
CREATE DATABASE `default`
COMMENT
  'Default Hive database'
LOCATION
  'hdfs://localhost:54310/user/hive/warehouse'
Time taken: 0.07 seconds, Fetched: 5 row(s)
hive> SHOW DATABASES;
OK
default
myhivebook
Time taken: 0.055 seconds, Fetched: 2 row(s)
hive> SHOW DATABASES LIKE 'my.*';
OK
myhivebook
Time taken: 0.165 seconds, Fetched: 1 row(s)
hive> USE myhivebook;
OK
Time taken: 0.059 seconds
hive> SELECT current_database();
OK
myhivebook
Time taken: 0.244 seconds, Fetched: 1 row(s)
hive> DROP DATABASE IF EXISTS myhivebook;
OK
Time taken: 0.492 seconds
hive> DROP DATABASE IF EXISTS myhivebook CASCADE;
OK
Time taken: 0.042 seconds
hive> ALTER DATABASE default SET DBPROPERTIES ('edited-by'='Dayong');

hive> ALTER DATABASE default SET OWNER user dayongd;

hive> ALTER DATABASE default SET LOCATION '/tmp/data/default';

从Hive v2.2.1开始,ALTER DATABASE ... SET LOCATION语句可用于修改数据库的位置,但它不会将当前数据库目录中的所有现有表/分区移动到新指定的位置。它只会在更改数据库后更改新添加的表的位置。此行为类似于更改表目录不会将现有分区移动到其他位置的方式。
Hive中的SHOW和DESC(或DESCRIBE)语句用于显示大多数对象的定义,例如表和分区。 SHOW语句支持各种Hiveobject,例如表,表的属性,表DDL,索引,分区,列,函数,锁,角色,配置,事务和压缩。 DESC语句支持少量Hive对象,例如数据库,表,视图,列和分区。
但是,DESC声明能够提供与EXTENDED或FORMATTED关键字相结合的更详细信息。

数据定义语言

Hive的数据定义语言(DDL)是通过创建,删除,描述Hive数据结构的HQL语句的子集。或者改变模式对象,例如数据库,表,视图,分区和存储桶。大多数DDL语句以CREATE,DROP或ALTER关键字开头。 HQL DDL的语法与SQL DDL非常相似。

Database

Hive中的数据库描述了用于类似目的或属于相同组的表的集合。如果未指定数据库,则使用默认数据库,并使用HDFS中的/user/hive/warehouse作为其根目录。此路径可由hive-site.xml中的hive.metastore.warehouse.dir属性配置。无论何时创建新数据库,Hive都会在/user/hive/warehouse下为每个数据库创建一个新目录。例如,myhivebook数据库位于/user/hive/datawarehouse/myhivebook.db。此外,DATABASE有一个名称别名,SCHEMA,这意味着它们在HQL中是相同的。

你可能感兴趣的:(HIVE快速入门教程4Hive数据类型和创建,删除数据库)