Hive Compression Format Configuration

Reposted from:
https://blog.csdn.net/qq_32641659/article/details/89337897

1. File Compression Configuration in Hadoop

First, your Hadoop must be compiled from source (so that the native compression libraries are available); for reference, see this post on compiling Hadoop:
https://blog.csdn.net/greenplum_xiaofan/article/details/95466703
Check which compression formats Hadoop supports:

[hadoop@vm01 hadoop-2.6.0-cdh5.7.0]$ pwd
/home/hadoop/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target/hadoop-2.6.0-cdh5.7.0

[hadoop@vm01 hadoop-2.6.0-cdh5.7.0]$ ./bin/hadoop checknative
19/07/10 23:35:35 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
19/07/10 23:35:35 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/hadoop/source/hadoop-2.6.0-cdh5.7.0/hadoop-dist/target/hadoop-2.6.0-cdh5.7.0/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /lib64/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib64/libbz2.so.1
openssl: true /lib64/libcrypto.so

Although hadoop checknative does not show whether gzip or LZO are supported, that is only because it checks native libraries; as long as the gzip and LZO tools are installed on the local machine, those formats are available.
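If you want to confirm this locally, a quick check of the tools themselves is enough (a hypothetical session; the LZO package name varies by distribution):

[hadoop@vm01 ~]$ which gzip          # gzip ships with virtually every Linux distribution
[hadoop@vm01 ~]$ rpm -qa | grep lzo  # on CentOS/RHEL, LZO support usually comes from the lzo/lzop packages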

2. Modify core-site.xml to Enable Compression

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        org.apache.hadoop.io.compress.SnappyCodec
    </value>
</property>
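After editing core-site.xml, you can verify that the property is visible to the client (a quick check using the standard getconf tool):

[hadoop@vm01 hadoop-2.6.0-cdh5.7.0]$ ./bin/hdfs getconf -confKey io.compression.codecs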

Set the compression format for map and reduce output files:


<property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
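If you would rather not change the cluster-wide defaults, the same properties can be overridden per session from the Hive CLI (a sketch; these take effect only for jobs launched in that session):

hive> SET mapreduce.map.output.compress=true;
hive> SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;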

After configuring, restart Hadoop.
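For a pseudo-distributed setup like the one above, the restart might look like this (assuming the stock sbin scripts; use your cluster manager instead if you have one):

[hadoop@vm01 hadoop-2.6.0-cdh5.7.0]$ ./sbin/stop-yarn.sh && ./sbin/stop-dfs.sh
[hadoop@vm01 hadoop-2.6.0-cdh5.7.0]$ ./sbin/start-dfs.sh && ./sbin/start-yarn.sh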

3. Configuring File Compression in Hive

SET hive.exec.compress.output=true;    # enable compressed output
SET mapreduce.output.fileoutputformat.compress.codec=codec-class;    # one of the codec classes configured in core-site.xml

# Check the current compression setting; false means output is not compressed
hive> SET hive.exec.compress.output;
hive.exec.compress.output=false

# Create the page_views table
create table page_views(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string
) row format delimited fields terminated by '\t';

# Load the data
LOAD DATA LOCAL INPATH '/home/hadoop/data/click/page_views.dat' OVERWRITE INTO TABLE page_views;

# Check the file size on HDFS
[hadoop@hadoop001 ~]$ hdfs dfs -du -s -h /user/hive/warehouse/wsktest.db/page_views/
18.1 M  18.1 M  /user/hive/warehouse/wsktest.db/page_views/page_views.dat

Compressing with bzip2

# Enable compression and set the codec to BZip2Codec (already the default here, since my Hadoop is configured with bzip2)
hive> SET hive.exec.compress.output=true;
hive> SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec;

#加载数据
create table page_views_bzip2
as select * from page_views;

#查看hdfs文件大小
[hadoop@hadoop001 ~]$ hdfs dfs -du -s -h /user/hive/warehouse/wsktest.db/page_views_bzip2/*
3.6 M  3.6 M  /user/hive/warehouse/wsktest.db/page_views_bzip2/000000_0.bz2
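Hive decompresses transparently on read, so the compressed table is still queryable as usual (a quick sanity check; the count should match the source table):

hive> select count(*) from page_views_bzip2;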

Compressing with gzip

# Enable compression and set the codec to GzipCodec
hive> SET hive.exec.compress.output=true;
hive> set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

# Load the data
create table page_views_gzip
as select * from page_views;

# Check the file size on HDFS
[hadoop@hadoop001 ~]$ hdfs dfs -du -s -h /user/hive/warehouse/wsktest.db/page_views_gzip/*
5.3 M  5.3 M  /user/hive/warehouse/wsktest.db/page_views_gzip/000000_0.gz
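Comparing the results: the 18.1 M source file shrinks to 3.6 M with bzip2 but only to 5.3 M with gzip; bzip2 generally achieves a better ratio at the cost of slower compression and decompression. As before, the table remains directly queryable (sanity check):

hive> select count(*) from page_views_gzip;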
