Hadoop 2.x does not support Snappy compression out of the box; we have to compile it ourselves to enable Snappy.
See the BUILDING file in the Hadoop 2.x source:
Snappy build options:
Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.
* Use -Drequire.snappy to fail the build if libsnappy.so is not found.
If this option is not specified and the snappy library is missing,
we silently build a version of libhadoop.so that cannot make use of snappy.
This option is recommended if you plan on making use of snappy and want
to get more repeatable builds.
1. Prepare the environment
- Install JDK 1.6
- Install Maven 3.0.5
- Install gcc, openssl, cmake, and related packages

sudo yum -y install gcc-c++ libstdc++-devel
sudo yum -y install openssl openssl-devel autoconf ncurses-devel libtool cmake zlib-devel
- Install protobuf-2.5.0
Download: http://pan.baidu.com/s/1pJlZubT

tar -zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
sudo make install
# Verify:
protoc --version
- Install FindBugs

wget -c http://prdownloads.sourceforge.net/findbugs/findbugs-2.0.3.tar.gz?download
tar -zxvf findbugs-2.0.3.tar.gz -C /opt/modules
# Set the FINDBUGS environment variables
sudo vi /etc/profile
export FINDBUGS_HOME=/opt/modules/findbugs-2.0.3
export PATH=$PATH:$FINDBUGS_HOME/bin
source /etc/profile
# Verify:
fb -version
- Install the Snappy compression library
Download snappy-1.1.1.tar.gz: http://pan.baidu.com/s/1o6vGQ18

tar -zxvf snappy-1.1.1.tar.gz
cd snappy-1.1.1
./configure
make
sudo make install
2. Build the Hadoop 2.x native libraries for your platform, with Snappy support included
Download the Hadoop 2.x source; I chose hadoop-2.5.0-cdh5.3.3:

wget -c http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.3-src.tar.gz
tar -zxvf hadoop-2.5.0-cdh5.3.3-src.tar.gz -C /opt/compile
cd /opt/compile/hadoop-2.5.0-cdh5.3.3
# The -Drequire.snappy flag is required, otherwise the build silently skips Snappy support
mvn clean package -Pdist,native -Drequire.snappy -DskipTests -Dtar
# Compiled native files: http://pan.baidu.com/s/12QVam
3. Replace the Hadoop lib/native libraries
Replace the files under $HADOOP_HOME/lib/native with the freshly built lib/native files.
Note:
The file modes must match before and after the replacement (all .a files mode 644, all *.so.x.x files mode 755, and a symlink created for each *.so.x.x).
cd $HADOOP_HOME/lib/
mv native backup_native
mkdir native
cp /path/to/your-build/lib/native/* native/
cd native
chmod 644 *.a
rm libhadoop.so libhdfs.so libnativetask.so
chmod 755 libnativetask.so.1.0.0
chmod 755 libhdfs.so.0.0.0
chmod 755 libhadoop.so.1.0.0
ln -s libnativetask.so.1.0.0 libnativetask.so
ln -s libhdfs.so.0.0.0 libhdfs.so
ln -s libhadoop.so.1.0.0 libhadoop.so
# `ll` should then show the files as follows:
-rw-r--r-- 1 ehp ehp 1178622 Jul  5 08:37 libhadoop.a
-rw-r--r-- 1 ehp ehp 1487052 Jul  5 08:37 libhadooppipes.a
lrwxrwxrwx 1 ehp ehp      18 Jul  5 08:38 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x 1 ehp ehp  697211 Jul  5 08:37 libhadoop.so.1.0.0
-rw-r--r-- 1 ehp ehp  582056 Jul  5 08:37 libhadooputils.a
-rw-r--r-- 1 ehp ehp  359794 Jul  5 08:37 libhdfs.a
lrwxrwxrwx 1 ehp ehp      16 Jul  5 08:38 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 ehp ehp  228715 Jul  5 08:37 libhdfs.so.0.0.0
-rw-r--r-- 1 ehp ehp 7684428 Jul  5 08:37 libnativetask.a
lrwxrwxrwx 1 ehp ehp      22 Jul  5 08:39 libnativetask.so -> libnativetask.so.1.0.0
-rwxr-xr-x 1 ehp ehp 3061007 Jul  5 08:37 libnativetask.so.1.0.0
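The chmod/symlink steps above can be sketched as a loop. This is an illustrative script exercised in a scratch directory with empty placeholder files instead of the real libraries (the library names are taken from the listing above; the scratch directory is an assumption):

```shell
#!/bin/sh
# Sketch of the permission-and-symlink steps, run against dummy files
# in a temporary directory rather than $HADOOP_HOME/lib/native.
set -e
dir=$(mktemp -d)
cd "$dir"

# Stand-ins for the real versioned shared objects and static archives
touch libhadoop.so.1.0.0 libhdfs.so.0.0.0 libnativetask.so.1.0.0
touch libhadoop.a libhdfs.a libnativetask.a

chmod 644 *.a                       # static archives: rw-r--r--
for lib in libhadoop.so.1.0.0 libhdfs.so.0.0.0 libnativetask.so.1.0.0; do
  chmod 755 "$lib"                  # shared objects: rwxr-xr-x
  ln -sf "$lib" "${lib%%.so.*}.so"  # e.g. libhadoop.so -> libhadoop.so.1.0.0
done

ls -l
```

The `${lib%%.so.*}.so` expansion strips the version suffix, so each unversioned name ends up as a symlink to its versioned file, matching the `ll` listing above.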
4. Build the hadoop-snappy library
Download: http://code.google.com/p/hadoop-snappy/ (may no longer be accessible)
My cloud-drive mirror: http://pan.baidu.com/s/1pJkj7sB
Build Hadoop Snappy
1. Requirements: gcc c++, autoconf, automake, libtool, Java 6, JAVA_HOME set, Maven 3
2. Build/install Snappy (http://code.google.com/p/snappy/)
3. Build Hadoop Snappy
$ mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
'snappy.prefix' defaults to '/usr/local'. If Snappy is installed in a location other than /usr/local, set 'snappy.prefix' to the right location.
The built tarball is at target/hadoop-snappy-0.0.1-SNAPSHOT.tar.gz. The tarball includes the Snappy native library.
Pre-built files: http://pan.baidu.com/s/1hq5w7cg
Copy them into the corresponding Hadoop 2 directories:
$ tar -zxf hadoop-snappy-0.0.1-SNAPSHOT.tar.gz
$ cd hadoop-snappy-0.0.1-SNAPSHOT
$ cp -r lib/native/Linux-amd64-64/* /opt/modules/hadoop-2.5.0-cdh5.3.3/lib/native
$ cp lib/hadoop-snappy-0.0.1-SNAPSHOT.jar /opt/modules/hadoop-2.5.0-cdh5.3.3/share/hadoop/common/lib/
5. Enable map output compression and set the compression codec
#mapred-site.xml
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
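Some deployments also register the codec explicitly in core-site.xml. This fragment is an optional illustration rather than something the steps above require (CDH builds usually register SnappyCodec already; only add it if your codec list has been overridden):

```xml
<!-- core-site.xml (optional): list the codecs the framework may load.
     Only needed if io.compression.codecs has been overridden. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```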
6. Check that Snappy is configured correctly
hadoop checknative
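`hadoop checknative` prints one status line per native library. The sample below is illustrative output, not captured from this exact install (your paths will differ), and the grep shows a quick scripted check for the snappy line:

```shell
# Parse a captured sample of `hadoop checknative` output. Illustrative only:
# the real command needs a working Hadoop install, and paths vary by machine.
sample='hadoop:  true /opt/modules/hadoop-2.5.0-cdh5.3.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /usr/local/lib/libsnappy.so.1
lz4:     true revision:99
bzip2:   false'

if printf '%s\n' "$sample" | grep -q '^snappy:[[:space:]]*true'; then
  echo "snappy OK"
else
  echo "snappy NOT loaded" >&2
fi
```

If the snappy line reports `false`, re-check step 3: the replaced libhadoop.so must be the one built with -Drequire.snappy.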
7. Test
hadoop fs -rm -R /user/ehp/mapred/wordcount/output
bin/yarn jar \
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.3.jar \
wordcount \
/user/ehp/mapred/wordcount/input \
/user/ehp/mapred/wordcount/output
In uber mode you may run into the following problem:
2015-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
Fix: https://issues.apache.org/jira/browse/MAPREDUCE-5799
#mapred-site.xml (despite the yarn. prefix, this property is read by the MapReduce framework)
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
Alternatively, leave the config file untouched and add the setting at run time:
bin/yarn jar \
  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.3.jar \
  wordcount \
  -Dyarn.app.mapreduce.am.env=LD_LIBRARY_PATH=/opt/modules/hadoop-2.5.0-cdh5.3.3/lib/native \
  /user/ehp/mapred/wordcount/input \
  /user/ehp/mapred/wordcount/output