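Trafodion: Using BLOB to Store Unstructured/Semi-structured Data

Trafodion currently supports two LOB types, BLOB and CLOB. BLOB (Binary Large Object) is mainly used to store unstructured data such as images and audio. CLOB (Character Large Object) is mainly used for semi-structured data such as large text and long strings. See the LOB guide on the Apache Trafodion site, http://trafodion.apache.org/docs/lob_guide/index.html, for details on LOB features and usage.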

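How LOBs are stored

The LOB data itself is stored in a separate HDFS file under the HDFS directory /user/trafodion/lobs. The corresponding Trafodion table stores a unique identifier (the LOB handle) for each LOB value.
When a table with a LOB column is created, some dependent objects are created along with it to hold the LOB metadata.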

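What is a LOB handle

A LOB handle describes a LOB object. Each row of a Trafodion table with a LOB column contains the corresponding handle.
The actual LOB data is stored in an HDFS file (column store). The LOB handle records the location, offset, and descriptor information of the LOB object, so it can be regarded as the unique identifier of that object.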

Example: creating a BLOB column

1 Create a table with a BLOB column

SQL>cqd traf_blob_as_varchar 'off';

--- SQL operation complete.
SQL>create table t_blob(a int not null, b blob) primary key (a);

--- SQL operation complete.

2 Inspect the table definition and the related descriptor tables

SQL>showddl t_blob;

CREATE TABLE TRAFODION.SEABASE.T_BLOB
  ( 
    A                                INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
      SERIALIZED
  , B                                BLOB DEFAULT NULL NOT SERIALIZED
  , PRIMARY KEY (A ASC)
  )
 ATTRIBUTES ALIGNED FORMAT NAMESPACE 'TRAF_RSRVD_3' 
;

SQL>get tables;

Tables in Schema TRAFODION.SEABASE
==================================
LOBDescChunks__04001609182884681768_0001
LOBDescHandle__04001609182884681768_0001
LOBMD__04001609182884681768
T_BLOB

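The output above shows that creating a table with a LOB column also creates three separate tables: one LOB MD table and two LOB Desc tables. In addition, for each LOB column, Trafodion stores the LOB data separately under the HDFS directory /user/trafodion/lobs.

3 Inspect the contents of these tables and the file path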

SQL>select * from LOBMD__04001609182884681768;

LOBNUM STORAGETYPE LOCATION                                                                                                                         COLUMN_NAME                                                                                                                     
------ ----------- -------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------
     1           2 /user/trafodion/lobs/TRAF_1500000                                                                                                B

SQL>select * from T_BLOB;

--- 0 row(s) selected.         

hadoop fs -ls /user/trafodion/lobs/TRAF_1500000
Found 1 items
-rw-r--r--   3 trafodion trafodion          0 2018-05-14 06:09 /user/trafodion/lobs/TRAF_1500000/LOBP_04001609182884681768_0001                                                                                                                      

The LOBMD__04001609182884681768 table above records where the LOB data is actually stored, the LOB number, and the corresponding column name.
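
To make the relationship between the LOBMD row and the HDFS listing concrete, here is a minimal Python sketch. It assumes the data file name follows the pattern seen in the listing above, LOBP_<table UID>_<LOB number padded to 4 digits>. That pattern is inferred from this one sample, not taken from the Trafodion documentation, and lob_data_file is a hypothetical helper name.

```python
def lob_data_file(location: str, table_uid: str, lob_num: int) -> str:
    """Build the HDFS path where the LOB data file is expected to live.

    Assumption: the file is named LOBP_<table_uid>_<lob_num as 4 digits>,
    as observed in the `hadoop fs -ls` output of this article.
    """
    return f"{location.rstrip('/')}/LOBP_{table_uid}_{lob_num:04d}"


# Values copied from the LOBMD row and the LOBMD table name shown above.
path = lob_data_file("/user/trafodion/lobs/TRAF_1500000", "04001609182884681768", 1)
print(path)
```

With the sample values, the result matches the file reported by hadoop fs -ls.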

Example: creating a CLOB column

1 Create a table with a CLOB column

SQL>cqd traf_clob_as_varchar 'off';

--- SQL operation complete.

SQL>create table t_clob(a int not null, b clob) primary key (a);

--- SQL operation complete.

2 Inspect the table definition and the related descriptor tables

SQL>showddl t_clob;
CREATE TABLE TRAFODION.SEABASE.T_CLOB
  ( 
    A                                INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
      SERIALIZED
  , B                                CLOB DEFAULT NULL NOT SERIALIZED
  , PRIMARY KEY (A ASC)
  )
 ATTRIBUTES ALIGNED FORMAT NAMESPACE 'TRAF_RSRVD_3' 
;

SQL>get tables;

Tables in Schema TRAFODION.SEABASE
==================================
LOBDescChunks__04001609182884681768_0001
LOBDescChunks__08455106264966547072_0001
LOBDescHandle__04001609182884681768_0001
LOBDescHandle__08455106264966547072_0001
LOBMD__04001609182884681768
LOBMD__08455106264966547072
T_BLOB
T_CLOB

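The output above shows that creating T_CLOB, which has one LOB column, adds three more separate tables to the current schema. These tables describe the new LOB column.

3 Inspect the contents of these tables and the file path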
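
The grouping of dependent tables can be checked with a short Python sketch. It takes the table names from the get tables output above and groups the LOB tables by the numeric ID embedded in their names. The naming pattern in the regular expression is an assumption inferred from this listing only, and group_lob_tables is a hypothetical helper.

```python
import re
from collections import defaultdict

# Table names copied from the `get tables` output in this article.
TABLES = [
    "LOBDescChunks__04001609182884681768_0001",
    "LOBDescChunks__08455106264966547072_0001",
    "LOBDescHandle__04001609182884681768_0001",
    "LOBDescHandle__08455106264966547072_0001",
    "LOBMD__04001609182884681768",
    "LOBMD__08455106264966547072",
    "T_BLOB",
    "T_CLOB",
]

# Assumed pattern: <kind>__<numeric id>, optionally followed by _<LOB number>.
LOB_TABLE = re.compile(r"^(LOBMD|LOBDescChunks|LOBDescHandle)__(\d+)(?:_(\d+))?$")


def group_lob_tables(names):
    """Group LOB dependent tables by the numeric ID in their names."""
    groups = defaultdict(list)
    for name in names:
        match = LOB_TABLE.match(name)
        if match:  # user tables such as T_BLOB do not match and are skipped
            groups[match.group(2)].append(name)
    return dict(groups)


groups = group_lob_tables(TABLES)
for uid, names in sorted(groups.items()):
    print(uid, len(names), sorted(names))
```

For this listing, the sketch yields two groups of three tables each, one per user table.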

SQL>select * from LOBMD__08455106264966547072;

LOBNUM STORAGETYPE LOCATION                                                                                                                         COLUMN_NAME                                                                                                                     
------ ----------- -------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------
     1           2 /user/trafodion/lobs/TRAF_1500000                                                                                                B
SQL>select * from t_clob;

--- 0 row(s) selected.

hadoop fs -ls /user/trafodion/lobs/TRAF_1500000
Found 2 items
-rw-r--r--   3 trafodion trafodion          0 2018-05-14 06:09 /user/trafodion/lobs/TRAF_1500000/LOBP_04001609182884681768_0001
-rw-r--r--   3 trafodion trafodion          0 2018-05-14 06:36 /user/trafodion/lobs/TRAF_1500000/LOBP_08455106264966547072_0001

Inserting BLOB data

1 Insert a null value

SQL>insert into t_blob values(1,null);

--- 1 row(s) inserted.
SQL>select * from t_blob;

A           B                                                                                                                               
----------- --------------------------------------------------------------------------------------------------------------------------------
          1 NULL                                                                                                                            

--- 1 row(s) selected.

2 Insert empty_blob(), which returns an empty LOB handle

SQL>insert into t_blob values(2,empty_blob()); 

--- 1 row(s) inserted.

SQL>select * from t_blob;

A           B                                                                                                                               
----------- --------------------------------------------------------------------------------------------------------------------------------
          1 NULL                                                                                                                            
          2 LOBH0000000200010400160918288468176819590156528453273693918212393055256514019021"TRAFODION"."SEABASE"                           

--- 2 row(s) selected.

3 Insert a local image (note: the file must be uploaded to the same path on every database node)

SQL>insert into t_blob values(3, filetolob('/opt/trafodion/a.png'));

--- 1 row(s) inserted.
SQL>select * from t_blob;

A           B                                                                                                                               
----------- --------------------------------------------------------------------------------------------------------------------------------
          1 NULL                                                                                                                            
          2 LOBH0000000200010400160918288468176819590156528453273693918212393055256514019021"TRAFODION"."SEABASE"                           
          3 LOBH0000000200010400160918288468176819400160919305233014918212393056573095236021"TRAFODION"."SEABASE"                           

--- 3 row(s) selected.

4 Insert an image from HDFS

SQL>insert into t_blob values(4, filetolob('hdfs:///tmp/a.png'));

--- 1 row(s) inserted.

SQL>select * from t_blob;

A           B                                                                                                                               
----------- --------------------------------------------------------------------------------------------------------------------------------
          1 NULL                                                                                                                            
          2 LOBH0000000200010400160918288468176819590156528453273693918212393055256514019021"TRAFODION"."SEABASE"                           
          3 LOBH0000000200010400160918288468176819400160919305233014918212393056573095236021"TRAFODION"."SEABASE"                           
          4 LOBH0000000200010400160918288468176819845510627479221136218212393056796983880021"TRAFODION"."SEABASE"                           

--- 4 row(s) selected.
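
The handle strings printed above appear to embed values seen earlier in this article. The Python sketch below only checks that observation against the three sample handles: after the LOBH prefix, digits 8 to 11 equal the LOBNUM value (0001), and the following 20 digits equal the numeric ID in the LOBMD table name. The offsets are an assumption read off these samples, not a documented handle layout, and observed_fields is a hypothetical helper.

```python
# Handle strings copied from the SELECT output above, keyed by column A.
HANDLES = {
    2: 'LOBH0000000200010400160918288468176819590156528453273693918212393055256514019021"TRAFODION"."SEABASE"',
    3: 'LOBH0000000200010400160918288468176819400160919305233014918212393056573095236021"TRAFODION"."SEABASE"',
    4: 'LOBH0000000200010400160918288468176819845510627479221136218212393056796983880021"TRAFODION"."SEABASE"',
}

# Numeric ID taken from the table name LOBMD__04001609182884681768.
TABLE_UID = "04001609182884681768"


def observed_fields(handle: str) -> dict:
    """Slice a handle at offsets observed in the samples (an assumption)."""
    if not handle.startswith("LOBH"):
        raise ValueError("not a LOB handle string")
    digits = handle[4:]
    return {"lob_num": digits[8:12], "table_uid": digits[12:32]}


for key, handle in sorted(HANDLES.items()):
    print(key, observed_fields(handle))
```

All three handles share the same prefix up to the table ID and differ only in the digits that follow, which is consistent with each row pointing at a different LOB value in the same column.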

Inserting CLOB data

1 Insert a string

SQL>insert into t_clob values(1,stringtolob('ABCDEFG'));

--- 1 row(s) inserted.
SQL>select * from t_clob;

A           B                                                                                                                               
----------- --------------------------------------------------------------------------------------------------------------------------------
          1 LOBH0000000200010845510626496654707219174418018914969589618212393365889784172021"TRAFODION"."SEABASE"                           

--- 1 row(s) selected.

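This article covered the basics of using LOBs in Trafodion: the basic concepts, how to create LOB columns, and how to insert data into them. A follow-up will describe how to query and retrieve the data stored in LOB columns.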
