Spark 2.4 + Hive 3.1.1
After running ALTER TABLE ... COMPACT 'major', Spark still cannot read the data in the Hive transactional table.
The detailed steps are as follows.
create table sugon_transaction(id Int,name String) clustered by (name) into 3 buckets stored as orc TBLPROPERTIES ('transactional'='true');
desc formatted sugon_transaction
hive> desc formatted sugon_transaction;
OK
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: default
OwnerType: USER
Owner: root
CreateTime: Tue Aug 20 14:58:03 CST 2019
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://localhost:9000/user/hive/warehouse/sugon_transaction
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"name\":\"true\"}}
bucketing_version 2
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transactional true
transactional_properties default
transient_lastDdlTime 1566284283
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed: No
Num Buckets: 3
Bucket Columns: [name]
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.392 seconds, Fetched: 34 row(s)
insert into table sugon_transaction values(100,'sugon');
hive> select * from default.sugon_transaction;
OK
100	sugon
Time taken: 1.412 seconds, Fetched: 1 row(s)
In Spark, however, the same query returns no rows; spark-shell only prints the following warning:
scala> hiveContext.sql("select * from default.sugon_transaction").show()
WARN hive.HiveMetastoreCatalog: Unable to infer schema for table default.sugon_transaction from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.
+-------+----+
| id|name|
+-------+----+
+-------+----+
ALTER TABLE table_name COMPACT 'major';
First, check the compaction queue:
show compactions;
hive> show compactions;
OK
CompactionId Database Table Partition Type State Worker Start Time Duration(ms) HadoopJobId
1 myhive test_transaction --- MAJOR initiated --- --- --- ---
2 default hello_trancaction --- MAJOR initiated --- --- --- ---
Time taken: 0.183 seconds, Fetched: 3 row(s)
Then run the compaction:
ALTER TABLE sugon_transaction COMPACT 'major';
hive> ALTER TABLE sugon_transaction COMPACT 'major';
Compaction enqueued with id 3
OK
Time taken: 0.501 seconds
Check the compaction list again:
hive> show compactions;
OK
CompactionId Database Table Partition Type State Worker Start Time Duration(ms) HadoopJobId
1 myhive test_transaction --- MAJOR initiated --- --- --- ---
2 default hello_trancaction --- MAJOR initiated --- --- --- ---
3 default sugon_transaction --- MAJOR initiated --- --- --- ---
Time taken: 0.032 seconds, Fetched: 4 row(s)
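One thing worth checking at this point: every request in the list above is still in the "initiated" state, which usually means no compactor worker has ever picked it up. A likely cause (an assumption here, since the metastore configuration is not shown) is that the compactor is not enabled on the Hive Metastore. For reference:

```sql
-- These properties belong in hive-site.xml on the Hive Metastore host
-- (they are not per-session settings); without them, compaction
-- requests stay in the "initiated" state indefinitely:
--   hive.compactor.initiator.on   = true
--   hive.compactor.worker.threads = 1   (or more)
-- After enabling them and restarting the metastore, re-check the queue:
SHOW COMPACTIONS;
-- A healthy request moves from "initiated" to "working" to "succeeded",
-- and a base_... directory replaces the delta_... files on HDFS.
```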
Spark still cannot read the table data, and the same warning is printed:
WARN hive.HiveMetastoreCatalog: Unable to infer schema for table default.sugon_transaction from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.
scala> hiveContext.sql("select * from default.sugon_transaction").show()
+-------+----+
| id|name|
+-------+----+
+-------+----+
https://issues.apache.org/jira/browse/SPARK-15348
Spark does not support any feature of Hive's transactional tables; you cannot use Spark to delete/update a table, and it also has problems reading the aggregated data when no compaction was done. It also seems that compaction is not supported: ALTER TABLE ... PARTITION ... COMPACT 'major'.
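Until Spark gains ACID-read support, one practical workaround (a sketch; the copy-table name sugon_transaction_flat is made up for illustration) is to materialize the transactional table into a plain, non-transactional ORC table that Spark 2.4 can read:

```sql
-- Run in Hive: copy the ACID table's current rows into a
-- non-transactional ORC table readable by Spark.
CREATE TABLE sugon_transaction_flat STORED AS ORC AS
SELECT * FROM default.sugon_transaction;
```

Spark can then query the copy as usual, e.g. hiveContext.sql("select * from default.sugon_transaction_flat").show(). The copy must be refreshed whenever the source table changes; alternatively, reading through Hive's JDBC/Thrift interface avoids the file-level incompatibility entirely.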