hadoop checknative [-a|-h]: check native hadoop and compression libraries availability
[liujh@hadoop104 hadoop-2.7.2]$ hadoop checknative
17/12/24 20:32:52 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
17/12/24 20:32:52 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/module/hadoop-2.7.2/lib/native/libhadoop.so
zlib: true /lib64/libz.so.1
snappy: false
lz4: true revision:99
bzip2: false
[liujh@hadoop102 software]$ tar -zxvf hadoop-2.7.2.tar.gz
[liujh@hadoop102 native]$ pwd
/opt/software/hadoop-2.7.2/lib/native
[liujh@hadoop102 native]$ ll
-rw-r--r--. 1 liujh liujh 472950 9月 1 10:19 libsnappy.a
-rwxr-xr-x. 1 liujh liujh 955 9月 1 10:19 libsnappy.la
lrwxrwxrwx. 1 liujh liujh 18 12月 24 20:39 libsnappy.so -> libsnappy.so.1.3.0
lrwxrwxrwx. 1 liujh liujh 18 12月 24 20:39 libsnappy.so.1 -> libsnappy.so.1.3.0
-rwxr-xr-x. 1 liujh liujh 228177 9月 1 10:19 libsnappy.so.1.3.0
[liujh@hadoop102 native]$ cp ../native/* /opt/module/hadoop-2.7.2/lib/native/
[liujh@hadoop102 lib]$ xsync native/
[liujh@hadoop102 hadoop-2.7.2]$ hadoop checknative
17/12/24 20:45:02 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
17/12/24 20:45:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/module/hadoop-2.7.2/lib/native/libhadoop.so
zlib: true /lib64/libz.so.1
snappy: true /opt/module/hadoop-2.7.2/lib/native/libsnappy.so.1
lz4: true revision:99
bzip2: false
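To confirm the result without reading the listing by eye, the `checknative` output can be filtered for libraries reported as `false`. The following is a minimal sketch that assumes the output format shown above; `missing_native_libs` is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: print the names of native libraries that
# `hadoop checknative` reports as false.
# Reads checknative-style output on stdin; relevant lines look like "snappy: false".
# Log lines and the "Native library checking:" header are ignored.
missing_native_libs() {
  awk '$1 ~ /^[a-z0-9]+:$/ && $2 == "false" { sub(/:$/, "", $1); print $1 }'
}

# Sample input copied from the first checknative run above.
sample='Native library checking:
hadoop: true /opt/module/hadoop-2.7.2/lib/native/libhadoop.so
zlib: true /lib64/libz.so.1
snappy: false
lz4: true revision:99
bzip2: false'

# Prints "snappy" and "bzip2", one per line.
printf '%s\n' "$sample" | missing_native_libs
```

On a cluster node the same filter could be fed directly, for example `hadoop checknative 2>&1 | missing_native_libs`. After the native libraries are copied and synced, only `bzip2` should remain in the list.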
Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC
Compression settings for the ORC storage format:
Key | Default | Notes |
---|---|---|
orc.compress | ZLIB | high level compression (one of NONE, ZLIB, SNAPPY) |
orc.compress.size | 262,144 | number of bytes in each compression chunk |
orc.stripe.size | 268,435,456 | number of bytes in each stripe |
orc.row.index.stride | 10,000 | number of rows between index entries (must be >= 1000) |
orc.create.index | true | whether to create row indexes |
orc.bloom.filter.columns | "" | comma-separated list of column names for which a bloom filter should be created |
orc.bloom.filter.fpp | 0.05 | false positive probability for the bloom filter (must be > 0.0 and < 1.0) |
Note: all ORCFile parameters are set in the TBLPROPERTIES clause of the HQL statement.
create table log_orc_none(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
row format delimited fields terminated by '\t'
stored as orc tblproperties ("orc.compress"="NONE");
hive (default)> insert into table log_orc_none select * from log_text ;
hive (default)> dfs -du -h /user/hive/warehouse/log_orc_none/ ;
7.7 M /user/hive/warehouse/log_orc_none/000000_0
create table log_orc_snappy(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
row format delimited fields terminated by '\t'
stored as orc tblproperties ("orc.compress"="SNAPPY");
hive (default)> insert into table log_orc_snappy select * from log_text ;
hive (default)> dfs -du -h /user/hive/warehouse/log_orc_snappy/ ;
3.8 M /user/hive/warehouse/log_orc_snappy/000000_0
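The two tables above share one schema and differ only in the table name and the `orc.compress` value. As a rough sketch, the statement can be generated from those two inputs; `orc_ddl` is a hypothetical helper introduced here for illustration, and the accepted codec names are taken from the `orc.compress` row of the table above.

```shell
#!/bin/sh
# Hypothetical helper: print the CREATE TABLE statement for an ORC table
# with the given name and orc.compress codec (NONE, ZLIB, or SNAPPY).
orc_ddl() {
  table=$1
  codec=$2
  # Reject anything not listed for orc.compress in the table above.
  case $codec in
    NONE|ZLIB|SNAPPY) ;;
    *) echo "unsupported orc.compress value: $codec" >&2; return 1 ;;
  esac
  cat <<EOF
create table ${table}(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
)
row format delimited fields terminated by '\t'
stored as orc tblproperties ("orc.compress"="${codec}");
EOF
}

# Prints the same statement used for log_orc_snappy above.
orc_ddl log_orc_snappy SNAPPY
```

The printed statement could then be passed to Hive, for example with `hive -e "$(orc_ddl log_orc_snappy SNAPPY)"`, assuming the `hive` CLI is on the PATH.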
2.8 M /user/hive/warehouse/log_orc/000000_0
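The log_orc table, which uses the default ORC settings, is even smaller than the Snappy-compressed one. The reason is that ORC files use ZLIB compression by default, and ZLIB uses the DEFLATE algorithm, which achieves a higher compression ratio than Snappy.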
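To put numbers on the comparison, each file size can be expressed as a percentage of the uncompressed ORC file (7.7 M with `orc.compress`=NONE). A small sketch using the sizes reported by `dfs -du -h` above; `ratio_pct` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: size as a percentage of a baseline size, one decimal place.
# $1 = compressed size in MB, $2 = baseline (uncompressed ORC) size in MB.
ratio_pct() {
  awk -v size="$1" -v base="$2" 'BEGIN { printf "%.1f\n", 100 * size / base }'
}

ratio_pct 3.8 7.7   # SNAPPY relative to NONE: prints 49.4
ratio_pct 2.8 7.7   # ZLIB (default) relative to NONE: prints 36.4
```

On this data set, Snappy roughly halves the ORC file, while the default ZLIB brings it down to a little over a third of the uncompressed size.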
In real-world projects, Hive tables are usually stored as ORC or Parquet, and the compression codec is usually Snappy or LZO.
Jianshu: https://www.jianshu.com/u/0278602aea1d
CSDN: https://blog.csdn.net/u012387141