Hive Outline - Part II (Architecture, SerDe)

Hive Architecture

1. Metastore service: provides the metadata service. Its backing store can be Derby, MySQL, or another database.

2. HiveServer: a Thrift service.

    HiveServer1 is deprecated. HiveServer2 provides the following improvements:

  • HiveServer2 Thrift API spec

  • JDBC/ODBC HiveServer2 drivers

  • Concurrent Thrift clients with memory leak fixes and session/config info

  • Kerberos authentication

  • Authorization improvements for GRANT/ROLE, and fixes for code injection vectors

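3. Driver, also called the Query Engine: the core service, which includes the HQL Compiler, Optimizer, Executor, and other core functions.


Everything else is a client application, such as the CLI, Beeline, hive jar, and HWI.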


Figure 1

Hive Configuration

Precedence, from highest to lowest:

1. The SET command inside a Hive session

2. The -hiveconf command-line option

3. hive-site.xml

4. hive-default.xml

5. hadoop-site.xml

6. hadoop-default.xml
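
As a small illustration of the top two levels, the sketch below sets a property inside a session, which overrides whatever -hiveconf or the XML files supplied. The property hive.cli.print.header is chosen only as an example; any configuration property behaves the same way.

```sql
-- Launch-time override (level 2), shown as a comment because it is a shell command:
--   hive -hiveconf hive.cli.print.header=false

-- Session-level override (level 1) takes precedence over the launch-time value.
SET hive.cli.print.header=true;

-- Print the value currently in effect for this session.
SET hive.cli.print.header;
```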


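Hive Datatype:

Primitive types and their sizes in bytes: TINYINT (1), SMALLINT (2), INT (4), BIGINT (8), FLOAT (4), DOUBLE (8). BOOLEAN is true/false, and STRING is a character sequence.


Complex types: ARRAY, MAP, STRUCT; see the reference.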
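
A minimal sketch of how these types appear in a table definition and how complex values are accessed. The table and column names are hypothetical.

```sql
-- Hypothetical table mixing primitive and complex types.
CREATE TABLE employees (
  id       BIGINT,
  name     STRING,
  age      TINYINT,
  salary   DOUBLE,
  active   BOOLEAN,
  skills   ARRAY<STRING>,
  phones   MAP<STRING, STRING>,
  address  STRUCT<city:STRING, zip:STRING>
);

-- Element access: array by index, map by key, struct by field name.
SELECT name, skills[0], phones['home'], address.city
FROM employees;
```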


Hive Function

See the online help. Hive has many built-in functions, such as CAST().
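
For instance, the built-in help and CAST() might be used as follows. The raw_orders table is a hypothetical staging table, introduced only so the query has something to read.

```sql
-- List the built-in functions and inspect one of them.
SHOW FUNCTIONS;
DESCRIBE FUNCTION EXTENDED concat;

-- Hypothetical staging table whose amount column arrived as text.
CREATE TABLE raw_orders (order_id STRING, amount STRING);

-- CAST converts the text to a numeric type; values that cannot be parsed become NULL.
SELECT order_id, CAST(amount AS DOUBLE) AS amount_num
FROM raw_orders;
```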


Hive Commands

Commonly used:

1. SHOW TABLES;

2. SHOW FUNCTIONS;

3. DESCRIBE EXTENDED <tablename>;

4. DESCRIBE FORMATTED <tablename>;

5. SHOW DATABASES;

6. SET; or SET <var>; (shows current configuration values)


Table and Partition:

Managed Table and External Table

Partition
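
A sketch of the difference between the two table kinds and of partitioning. All table names and HDFS paths here are hypothetical.

```sql
-- Managed table: Hive owns the data; DROP TABLE deletes the files too.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING);

-- External table: Hive tracks only metadata; DROP TABLE leaves the files in place.
CREATE EXTERNAL TABLE page_views_ext (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/data/page_views';   -- hypothetical HDFS path

-- Register a partition whose directory already exists.
ALTER TABLE page_views_ext ADD PARTITION (dt = '2014-01-01')
LOCATION '/data/page_views/dt=2014-01-01';

-- Filtering on the partition column lets Hive read only that partition's directory.
SELECT COUNT(*) FROM page_views_ext WHERE dt = '2014-01-01';
```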


Storage format:

ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '\001'
   COLLECTION ITEMS TERMINATED BY '\002'
   MAP KEYS TERMINATED BY '\003'
   LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
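
Placed in a complete statement, the clause above might look like the sketch below. The contacts table is hypothetical; the delimiters shown are Hive's defaults for text files, so writing them out is optional.

```sql
-- Hypothetical table; each delimiter level separates one kind of nesting.
CREATE TABLE contacts (
  name    STRING,
  emails  ARRAY<STRING>,          -- items split on '\002'
  phones  MAP<STRING, STRING>     -- entries split on '\002', key/value on '\003'
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```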


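SequenceFile: a Hadoop format that stores records as key-value pairs in sequence. It remains splittable when compressed. Declared with STORED AS SEQUENCEFILE.

RCFile: a columnar file format from Hive. It splits rows into groups and stores each group column by column, which works best when a query reads only a small part of the data. Declared with STORED AS RCFILE.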
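
A minimal sketch of declaring both formats and converting existing text data into them. The logs_* tables are hypothetical.

```sql
-- Hypothetical source table stored as plain text.
CREATE TABLE logs_text (ts STRING, level STRING, msg STRING)
STORED AS TEXTFILE;

CREATE TABLE logs_seq (ts STRING, level STRING, msg STRING)
STORED AS SEQUENCEFILE;

CREATE TABLE logs_rc (ts STRING, level STRING, msg STRING)
STORED AS RCFILE;

-- Rewriting the rows through Hive stores them in each target format.
INSERT OVERWRITE TABLE logs_seq SELECT ts, level, msg FROM logs_text;
INSERT OVERWRITE TABLE logs_rc  SELECT ts, level, msg FROM logs_text;

-- A query that touches a single column is where the columnar layout helps most.
SELECT level, COUNT(*) FROM logs_rc GROUP BY level;
```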


You can also develop your own SerDe, InputFormat, and OutputFormat, for example:

ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

SerDe:

Built-in SerDes (name, package, description):

  • LazySimpleSerDe (org.apache.hadoop.hive.serde2.lazy): the default SerDe for TextFile; fields are accessed lazily.
  • LazyBinarySerDe (org.apache.hadoop.hive.serde2.lazybinary): better performance, lazy access; already used internally by Hive.
  • BinarySortableSerDe (org.apache.hadoop.hive.serde2.binarysortable): optimized for sorting; its efficiency falls between the two above.
  • ColumnarSerDe (org.apache.hadoop.hive.serde2.columnar): the LazySimpleSerDe counterpart for RCFile.
  • RegexSerDe (org.apache.hadoop.hive.contrib.serde2): applies a regular expression to each text line; well suited to log files, with moderate performance.
  • ThriftByteStreamTypedSerDe (org.apache.hadoop.hive.serde2.thrift): reads and writes Thrift-encoded binary data.
  • HBaseSerDe (org.apache.hadoop.hive.hbase): reads and writes HBase data.
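
As one example from this list, RegexSerDe could be applied to a space-separated log file roughly as follows. The jar path, table name, and regular expression are assumptions made for illustration.

```sql
-- The contrib SerDe ships in a separate jar; this path is hypothetical.
ADD JAR /path/to/hive-contrib.jar;

-- With this SerDe every column must be STRING;
-- each capture group in the regex maps to one column, in order.
CREATE TABLE app_log (
  log_date STRING,
  level    STRING,
  message  STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) (.*)"
)
STORED AS TEXTFILE;
```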


Summary

You can create your own SerDe, InputFormat, and OutputFormat, and use them to integrate Hive with the data in your existing systems.
