Hive tips: reading Hive 3.1 transactional tables from Spark 2.4

Versions:

Spark 2.4, Hive 3.1.1

Problem:

After running alter table * compact 'major', Spark still cannot read the data in the Hive transactional table.

The steps to reproduce are as follows.

1. Create the table

create table sugon_transaction(id Int,name String) clustered by (name) into 3 buckets stored as orc TBLPROPERTIES ('transactional'='true');

2. Inspect the transactional table details

desc formatted sugon_transaction;

hive> desc formatted sugon_transaction;
OK
# col_name              data_type               comment
id                      int
name                    string

# Detailed Table Information
Database:               default
OwnerType:              USER
Owner:                  root
CreateTime:             Tue Aug 20 14:58:03 CST 2019
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs://localhost:9000/user/hive/warehouse/sugon_transaction
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"name\":\"true\"}}
        bucketing_version       2
        numFiles                0
        numRows                 0
        rawDataSize             0
        totalSize               0
        transactional           true
        transactional_properties        default
        transient_lastDdlTime   1566284283

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            3
Bucket Columns:         [name]
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.392 seconds, Fetched: 34 row(s)
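
The line that matters in the output above is "transactional true" under Table Parameters. As a minimal sketch (the object and method names are my own, not part of Hive or Spark), a job could check that parameter before it tries to read the table:

```scala
// Hypothetical helper: decide from a table's parameters whether it is a
// Hive ACID (transactional) table. All names here are illustrative only.
object TxnCheck {
  // The parameter value is compared case-insensitively after trimming.
  def isTransactional(params: Map[String, String]): Boolean =
    params.get("transactional").exists(_.trim.equalsIgnoreCase("true"))
}
```

For the table above, TxnCheck.isTransactional(Map("transactional" -> "true", "bucketing_version" -> "2")) returns true. In spark-shell the map could probably be built from the key/value rows returned by SHOW TBLPROPERTIES, assuming the property is visible to Spark through the metastore.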

 

3. Insert data

insert into table sugon_transaction values(100,'sugon');

4. Query the data

hive> select * from default.sugon_transaction;
OK
100     sugon
Time taken: 1.412 seconds, Fetched: 1 row(s)

5. Query the data with Spark SQL

The query returns no rows and prints this warning:

WARN hive.HiveMetastoreCatalog: Unable to infer schema for table default.sugon_transaction from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.

scala> hiveContext.sql("select * from default.sugon_transaction").show()
WARN hive.HiveMetastoreCatalog: Unable to infer schema for table default.sugon_transaction from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.
+-------+----+
|     id|name|
+-------+----+
+-------+----+
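
One likely explanation for the empty result: a Hive ACID table does not keep its data files directly under the table location. Inserts land in delta_* subdirectories, and a major compaction rewrites them into a base_* directory. Spark 2.4 does not interpret that layout. The sketch below only classifies directory names; the object name and the sample directory names in the usage note are made up for illustration and were not taken from this cluster.

```scala
// Sketch only: classify entries found under a Hive ACID table location.
object AcidLayout {
  sealed trait Kind
  case object Base extends Kind         // typically produced by major compaction
  case object Delta extends Kind        // rows written by inserts
  case object DeleteDelta extends Kind  // delete markers
  case object Other extends Kind        // anything else

  // "delete_delta_" must be tested before "delta_" is irrelevant here because
  // the prefixes differ, but the order keeps the intent explicit.
  def classify(dirName: String): Kind =
    if (dirName.startsWith("base_")) Base
    else if (dirName.startsWith("delete_delta_")) DeleteDelta
    else if (dirName.startsWith("delta_")) Delta
    else Other

  // True when there are delta directories but no compacted base yet.
  def onlyDeltas(dirNames: Seq[String]): Boolean = {
    val kinds = dirNames.map(classify)
    kinds.contains(Delta) && !kinds.contains(Base)
  }
}
```

With a hypothetical listing such as Seq("delta_0000001_0000001_0000"), AcidLayout.onlyDeltas returns true, which is the state the table would be in right after the single insert in step 3.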

6. Compact the table

ALTER TABLE table_name COMPACT 'major';

First, look at the compaction tasks that are already queued

show compactions;

hive> show compactions;
OK
CompactionId    Database        Table   Partition       Type    State   Worker  Start Time      Duration(ms)    HadoopJobId
1       myhive  test_transaction         ---    MAJOR   initiated        ---     ---     ---     ---
2       default hello_trancaction        ---    MAJOR   initiated        ---     ---     ---     ---
Time taken: 0.183 seconds, Fetched: 3 row(s)

Then run

ALTER TABLE sugon_transaction COMPACT 'major';
hive> ALTER TABLE sugon_transaction COMPACT 'major';
Compaction enqueued with id 3
OK
Time taken: 0.501 seconds
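
Note that "Compaction enqueued" only means the request was placed in a queue. As far as I know, the request is executed by compactor worker threads on the metastore, controlled by hive.compactor.worker.threads (default 0), while hive.compactor.initiator.on (default false) starts the initiator and cleaner threads. A rough sketch for spotting missing settings follows; the object name is hypothetical and a plain Map stands in for the real configuration.

```scala
import scala.util.Try

// Sketch only: report compactor settings that look unset or disabled.
// The Map is a stand-in for values read from hive-site.xml.
object CompactorConfig {
  def missingSettings(conf: Map[String, String]): Seq[String] = {
    // Unparseable or absent values are treated as 0 worker threads.
    val workers = conf.get("hive.compactor.worker.threads")
      .flatMap(v => Try(v.trim.toInt).toOption)
      .getOrElse(0)
    val initiatorOn = conf.get("hive.compactor.initiator.on")
      .exists(_.trim.equalsIgnoreCase("true"))

    Seq(
      // Without worker threads a queued request is never picked up.
      if (workers <= 0) Some("hive.compactor.worker.threads") else None,
      // Without the initiator/cleaner threads obsolete files are not removed.
      if (!initiatorOn) Some("hive.compactor.initiator.on") else None
    ).flatten
  }
}
```

An empty result from CompactorConfig.missingSettings suggests the compactor is at least configured to run; it says nothing about whether a given request has finished.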

Check the compaction list again

hive> show compactions;
OK
CompactionId    Database        Table   Partition       Type    State   Worker  Start Time      Duration(ms)    HadoopJobId
1       myhive  test_transaction         ---    MAJOR   initiated        ---     ---     ---     ---
2       default hello_trancaction        ---    MAJOR   initiated        ---     ---     ---     ---
3       default sugon_transaction        ---    MAJOR   initiated        ---     ---     ---     ---
Time taken: 0.032 seconds, Fetched: 4 row(s)
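
In both listings every request, including id 3, is still in state "initiated", so at this point no compaction has actually run on the table. A small sketch of how the State column could be interpreted; the set of state names reflects my understanding of Hive 3 and should be treated as an assumption, and the object name is hypothetical.

```scala
// Sketch only: interpret the State column of SHOW COMPACTIONS.
object CompactionState {
  // Assumed names of states in which the files have already been rewritten;
  // verify against your Hive version.
  private val filesRewritten = Set("ready for cleaning", "succeeded")

  // True once a worker has finished rewriting the files.
  def isCompacted(state: String): Boolean =
    filesRewritten.contains(state.trim.toLowerCase)

  // True while the request is only queued and no worker has picked it up.
  def isQueued(state: String): Boolean =
    state.trim.equalsIgnoreCase("initiated")
}
```

For the rows shown above, CompactionState.isQueued("initiated") is true and CompactionState.isCompacted("initiated") is false.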

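7. Query the transactional table with Spark SQL again

The query still returns no rows.

This time the warning below is no longer printed: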

WARN hive.HiveMetastoreCatalog: Unable to infer schema for table default.sugon_transaction from file format ORC (inference mode: INFER_AND_SAVE). Using metastore schema.

scala> hiveContext.sql("select * from default.sugon_transaction").show()
+-------+----+
|     id|name|
+-------+----+
+-------+----+

8. Finding the cause

https://issues.apache.org/jira/browse/SPARK-15348

Spark does not support any feature of Hive's transactional tables, you cannot use spark to delete/update a table and it also has problems reading the aggregated data when no compaction was done. Also it seems that compaction is not supported - alter table ... partition .... COMPACT 'major'
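
Given that limitation, one possible workaround (not from the JIRA issue, and not tested against this setup) is to copy the data in Hive into a non-transactional ORC table and point Spark at the copy. The sketch below only builds the HiveQL statements to run in the hive CLI; the object name and the target table name sugon_plain are hypothetical.

```scala
// Sketch only: build HiveQL that copies a transactional table into a
// plain (non-transactional) ORC table. Run the statements in Hive, not Spark.
object PlainCopy {
  def statements(src: String, dst: String, columns: Seq[(String, String)]): Seq[String] = {
    // Render "name type" pairs for the column list.
    val cols = columns.map { case (name, tpe) => s"$name $tpe" }.mkString(", ")
    Seq(
      s"CREATE TABLE $dst ($cols) STORED AS ORC TBLPROPERTIES ('transactional'='false')",
      s"INSERT OVERWRITE TABLE $dst SELECT * FROM $src"
    )
  }
}
```

For example, PlainCopy.statements("sugon_transaction", "sugon_plain", Seq("id" -> "int", "name" -> "string")) yields a CREATE TABLE statement followed by an INSERT OVERWRITE. Whether Hive accepts 'transactional'='false' on a managed table depends on the cluster's settings, so treat this as a starting point.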
