1. Metastore service: provides the metadata service. The backing store can be Derby, MySQL, or another database.
2. HiveServer: a Thrift service.
HiveServer1 is deprecated. HiveServer2 provides the following updates:
HiveServer2 Thrift API spec
JDBC/ODBC HiveServer2 drivers
Concurrent Thrift clients with memory leak fixes and session/config info
Kerberos authentication
Authorization improvements for GRANT/ROLE and against code injection vectors
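3. Driver, also called the Query Engine: the core service, which contains the HQL Compiler, Optimizer, Executor, and other core functions.
Everything else is a client-side application, such as the CLI, Beeline, hive jar, and HWI.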
Figure 1
Configuration priority, from highest to lowest:
1. Hive SET
2. Hive -hiveconf
3. hive-site.xml
4. hive-default.xml
5. hadoop-site.xml
6. hadoop-default.xml
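
As a rough illustration of the priority list above, a value set with SET inside a session overrides the same property passed with -hiveconf or defined in hive-site.xml. The property used here (hive.exec.dynamic.partition.mode) is only a convenient example, and the -hiveconf invocation in the comment is an assumed way of starting the session.

```sql
-- Assumption: the session was started with something like
--   hive -hiveconf hive.exec.dynamic.partition.mode=strict
-- which already takes priority over hive-site.xml and hive-default.xml.

-- Print the value currently in effect for this session.
SET hive.exec.dynamic.partition.mode;

-- SET has the highest priority, so this replaces the -hiveconf value
-- for the rest of the session.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Confirm the new value.
SET hive.exec.dynamic.partition.mode;
```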
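Primitive types: TINYINT (1 byte), SMALLINT (2 bytes), INT (4 bytes), BIGINT (8 bytes), FLOAT (4 bytes), DOUBLE (8 bytes), BOOLEAN (true/false), STRING.
Complex types: ARRAY, MAP, STRUCT; see the reference.
See the online help for the many built-in functions, such as CAST().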
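
The sketch below shows how these types might appear together in one table and how CAST() and the complex-type accessors are used in a query. The table and column names (employee, skills, scores, address) are hypothetical.

```sql
-- Hypothetical table mixing primitive and complex types.
CREATE TABLE employee (
  id      BIGINT,
  name    STRING,
  salary  DOUBLE,
  active  BOOLEAN,
  skills  ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<city:STRING, zip:STRING>
);

SELECT
  name,
  CAST(salary AS BIGINT) AS salary_as_bigint,  -- explicit type conversion
  skills[0]              AS first_skill,       -- ARRAY: index access, zero-based
  scores['hive']         AS hive_score,        -- MAP: access by key
  address.city           AS city               -- STRUCT: dot notation
FROM employee
WHERE active = true;
```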
Commonly used commands:
1. SHOW TABLES;
2. SHOW FUNCTIONS;
3. DESCRIBE EXTENDED <tablename>;
4. DESCRIBE FORMATTED <tablename>;
5. SHOW DATABASES;
6. SET; SET <var>; // show the current configuration values, or the value of one variable
Managed Table and External Table
Partition
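
A minimal sketch of the three concepts above, assuming hypothetical table names and a hypothetical HDFS path (/data/page_view). A managed table's data is owned by Hive, an external table only has its metadata tracked, and a partition maps to a subdirectory that queries can skip.

```sql
-- Managed table: Hive owns the data, so DROP TABLE removes
-- both the metadata and the files.
CREATE TABLE page_view_managed (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING);

-- External table: Hive tracks only the metadata, so DROP TABLE
-- leaves the files in place.
CREATE EXTERNAL TABLE page_view_external (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING)
LOCATION '/data/page_view';   -- hypothetical HDFS path

-- Register one partition and point it at its directory.
ALTER TABLE page_view_external ADD PARTITION (dt = '2014-01-01')
LOCATION '/data/page_view/dt=2014-01-01';

-- Filtering on the partition column lets Hive read only that partition.
SELECT url FROM page_view_external WHERE dt = '2014-01-01';
```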
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' COLLECTION ITEMS TERMINATED BY '\002' MAP KEYS TERMINATED BY '\003' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
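
For context, the clause above is the tail of a CREATE TABLE statement. The sketch below places it in a complete statement; the table and column names (user_tags, tags, attrs) are hypothetical, and the delimiters shown are Hive's defaults for text files.

```sql
-- Hypothetical table: one scalar column, one ARRAY, one MAP.
CREATE TABLE user_tags (
  user_id BIGINT,
  tags    ARRAY<STRING>,
  attrs   MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'            -- separates columns
  COLLECTION ITEMS TERMINATED BY '\002'  -- separates ARRAY items and MAP entries
  MAP KEYS TERMINATED BY '\003'          -- separates a MAP key from its value
  LINES TERMINATED BY '\n'               -- one row per line
STORED AS TEXTFILE;
```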
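SequenceFile: a Hadoop format for ordered key-value records that supports splittable compression. Declared with STORED AS SEQUENCEFILE.
RCFile: a columnar file format from Hive. It splits the rows into groups and stores each group column by column, which works best when a query reads only a small part of the data.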
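
As a sketch of how the storage format is chosen, the same hypothetical schema (id, payload) is declared three times with different STORED AS clauses, and the text data is then copied into the RCFile table. Table names are assumptions made for illustration.

```sql
-- Plain text, one row per line.
CREATE TABLE events_text (id BIGINT, payload STRING)
STORED AS TEXTFILE;

-- Hadoop SequenceFile.
CREATE TABLE events_seq (id BIGINT, payload STRING)
STORED AS SEQUENCEFILE;

-- Hive RCFile (columnar).
CREATE TABLE events_rc (id BIGINT, payload STRING)
STORED AS RCFILE;

-- Hive rewrites the rows into the target table's format on insert.
INSERT OVERWRITE TABLE events_rc
SELECT id, payload FROM events_text;

-- A query that touches a single column is the case RCFile is designed for.
SELECT id FROM events_rc;
```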
You can develop your own SerDe, InputFormat, and OutputFormat classes, for example:
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde' STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
SerDe | package | description |
LazySimpleSerDe | org.apache.hadoop.hive.serde2.lazy | Default SerDe for TextFile; lazy access. |
LazyBinarySerDe | org.apache.hadoop.hive.serde2.lazybinary | Better performance, lazy access; already used internally by Hive. |
BinarySortableSerDe | org.apache.hadoop.hive.serde2.binarysortable | Optimized for sorting; its size falls between the two above. |
ColumnarSerDe | org.apache.hadoop.hive.serde2.columnar | LazySimpleSerDe variant for RCFile. |
RegexSerDe | org.apache.hadoop.hive.contrib.serde2 | Applies a regular expression to each text line; good for log files; moderate performance. |
ThriftByteStreamTypedSerDe | org.apache.hadoop.hive.serde2.thrift | Reads/writes Thrift-encoded binary data. |
HBaseSerDe | org.apache.hadoop.hive.hbase | Reads/writes HBase data. |
By creating your own SerDe, InputFormat, and OutputFormat, you can integrate Hive with the data in your existing systems.
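
Before writing a custom SerDe, one of the listed ones may already be enough. The sketch below uses RegexSerDe on a simple log format; the table name, columns, and log layout (three space-separated fields) are assumptions, and the hive-contrib jar may first need to be added with ADD JAR, depending on the installation.

```sql
-- Hypothetical log line: "10.0.0.1 /index.html 200"
-- This RegexSerDe maps each capture group to one column, in order,
-- and expects every column to be STRING.
CREATE TABLE access_log (
  host   STRING,
  path   STRING,
  status STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([0-9]*)"
)
STORED AS TEXTFILE;

-- Convert the status to a number at query time.
SELECT host, path
FROM access_log
WHERE CAST(status AS INT) >= 500;
```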