Compiling hadoop-2.6.0-cdh5.15.1 from source with compression support

Table of Contents

    • 1. Introduction
      • 1.1 Testing compression support in the official build
    • 2. Environment preparation
      • 2.1 System environment
      • 2.2 Software environment
      • 2.3 Install the required dependency libraries
      • 2.4 Extract the JDK
      • 2.5 Extract Maven
      • 2.6 Configure Maven
      • 2.7 Install protobuf
      • 2.8 Add environment variables
      • 2.9 Verify the installations
    • 3. Building
      • 3.1 Extract the source
      • 3.2 Edit the repositories in the hadoop source pom.xml to add the Aliyun and Cloudera repository addresses (optional)
      • 3.3 Build hadoop with Maven with compression support
      • 3.4 Build success
    • 4. Testing compression support in the self-built hadoop
    • 5. Problems encountered
      • 5.1 Remote host closed connection during handshake and SSL peer shut down incorrectly
      • 5.2 [FATAL] Non-resolvable parent POM for org.apache.hadoop:hadoop-main:2.6.0-cdh5.15.1: Could not transfer artifact com.cloudera.cdh:cdh-root:pom:5.15.1 from/to cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos): Remote host closed connection
    • 6. References

The hadoop-2.6.0-cdh5.15.1 binary downloaded directly from the Cloudera website does not support compression, which is usually needed in production, so you have to download the source and build it yourself. This post records my build process.

1. Introduction

1.1 Testing compression support in the official build

First, run hadoop checknative to see which compression codecs the official hadoop-2.6.0-cdh5.15.1 download supports:

[root@suddev bin]# hadoop checknative
19/08/03 02:34:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop:  false
zlib:    false
snappy:  false
lz4:     false
bzip2:   false
openssl: false
19/08/03 02:34:24 INFO util.ExitUtil: Exiting with status 1

As you can see, the official build supports none of them by default, so let's build it ourselves with compression support enabled.

2. Environment preparation

2.1 System environment

1. CentOS 7

2.2 Software environment

1. hadoop-2.6.0-cdh5.15.1 source (official download link)
2. Maven (official download link)
3. protobuf (Baidu Netdisk download link, extraction code: bwp1; GitHub download page)
4. jdk-7u80-linux-x64.tar.gz (Baidu Netdisk download link, extraction code: v1xs)

Notes:
1. According to earlier write-ups, the JDK used for the build must be 1.7; building with JDK 1.8 fails.
2. Once the hadoop-2.6.0-cdh5.15.1 source is downloaded, the build requirements are listed in the BUILDING.txt file in the source root; view it with cat BUILDING.txt:

[root@suddev hadoop-2.6.0-cdh5.15.1]# cat BUILDING.txt 
Build instructions for Hadoop
----------------------------------------------------------------------------------
Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

Besides the software prepared above, a few more packages are required; all of them can be installed with yum.

2.3 Install the required dependency libraries

yum install -y svn ncurses-devel
yum install -y gcc gcc-c++ make cmake
yum install -y openssl openssl-devel svn ncurses-devel zlib-devel libtool
yum install -y snappy snappy-devel bzip2 bzip2-devel lzo lzo-devel lzop autoconf automake cmake 
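After the yum installs, a quick sanity check (my own addition, not part of the original steps) is to confirm the compression-related shared libraries are visible to the dynamic linker:

```shell
# List the compression libraries the linker can find.
# If a library prints nothing, its runtime/-devel package
# did not install correctly.
ldconfig -p | grep -E 'libsnappy|libz\.so|libbz2|liblzo'
```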

2.4 Extract the JDK

tar -zxvf  jdk-7u80-linux-x64.tar.gz

2.5 Extract Maven

tar -zxvf  apache-maven-3.6.1-bin.tar.gz 

2.6 Configure Maven

vim apache-maven-3.6.1/conf/settings.xml
# Add the Aliyun central mirror inside the <mirrors> element:

<mirror>
	<id>aliyunmaven</id>
	<mirrorOf>*,!cloudera</mirrorOf>
	<name>Aliyun public repository</name>
	<url>https://maven.aliyun.com/repository/public</url>
</mirror>

2.7 Install protobuf

First, extract it:

tar -zxvf protobuf-2.5.0.tar.gz

Then configure, build, and install (make install may require root):

cd protobuf-2.5.0
./configure
make
make install

2.8 Add environment variables

Be sure to change the paths to match your own installation locations.

vim ~/.bashrc
# append the following environment variables at the end of the file
export JAVA_HOME=/bd/app/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export MAVEN_HOME=/bd/app/apache-maven-3.6.1
export PROTOBUF_HOME=/bd/software/protobuf-2.5.0
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$PROTOBUF_HOME/bin:$PATH
# save and quit Vim (:wq), then apply the changes
source ~/.bashrc
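A quick way to confirm the PATH changes took effect (a small helper of my own, not from the original post) is to check that every required tool now resolves:

```shell
# Report any required build tool that is still missing from PATH.
for cmd in java mvn protoc cmake; do
  command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
done
```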

2.9 Verify the installations

JDK check:

[suddev@suddev ~]$ java -version
# output like the following indicates a successful install
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Maven check:

[suddev@suddev ~]$ mvn -v
# output like the following indicates a successful install
Apache Maven 3.6.1 (d66c9c0b3152b2e69ee9bac180bb8fcc8e6af555; 2019-04-05T03:00:29+08:00)
Maven home: /bd/app/apache-maven-3.6.1
Java version: 1.7.0_80, vendor: Oracle Corporation, runtime: /bd/app/jdk1.7.0_80/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-957.el7.x86_64", arch: "amd64", family: "unix"

protobuf check:

[suddev@suddev ~]$ protoc --version
# output like the following indicates a successful install
libprotoc 2.5.0

3. Building

3.1 Extract the source

tar -zxvf hadoop-2.6.0-cdh5.15.1-src.tar.gz

3.2 Edit the repositories in the hadoop source pom.xml to add the Aliyun and Cloudera repository addresses (optional)

vim hadoop-2.6.0-cdh5.15.1-src/pom.xml

Replace the repositories section with the following (it is a good idea to back up the original file first):

<repositories>
	<repository>
		<id>aliyunmaven</id>
		<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
		<releases>
			<enabled>true</enabled>
		</releases>
		<snapshots>
			<enabled>true</enabled>
			<checksumPolicy>fail</checksumPolicy>
		</snapshots>
	</repository>
	<repository>
		<id>cloudera</id>
		<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
	</repository>
</repositories>

3.3 Build hadoop with Maven with compression support

Run the following from the source root to build hadoop with native compression support:

cd hadoop-2.6.0-cdh5.15.1-src
mvn clean package -Pdist,native -DskipTests -Dtar

Depending on machine performance and network conditions this can take a very long time; if your network is poor, building on an Alibaba Cloud Hong Kong host may work better.
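Because the build runs for so long, it can help to keep a log of the Maven output; this is a habit of mine, not something the original steps require:

```shell
cd hadoop-2.6.0-cdh5.15.1-src
# Keep a full log so failures deep in the module list can be
# inspected later without re-running the whole build.
mvn clean package -Pdist,native -DskipTests -Dtar 2>&1 | tee build.log
# Show the first few errors recorded in the log, if any.
grep -n '\[ERROR\]' build.log | head
```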

3.4 Build success

After a successful build, the hadoop-2.6.0-cdh5.15.1.tar.gz package under hadoop-2.6.0-cdh5.15.1/hadoop-dist/target/ is the build result.

4. Testing compression support in the self-built hadoop

tar -zxvf hadoop-2.6.0-cdh5.15.1.tar.gz
cd hadoop-2.6.0-cdh5.15.1
./bin/hadoop checknative
# output like the following indicates success!
19/08/05 19:27:00 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
19/08/05 19:27:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /bd/app/hadoop-2.6.0-cdh5.15.1/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /lib64/libsnappy.so.1
lz4:     true revision:10301
bzip2:   true /lib64/libbz2.so.1
openssl: true /lib64/libcrypto.so
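To turn the eyeball check above into a scripted one (a small addition of mine; it assumes you run it from the extracted build directory), fail whenever any entry still reports false:

```shell
# Capture checknative output, then fail if any codec line says false.
./bin/hadoop checknative 2>&1 | tee /tmp/checknative.out
if grep -qE '^(hadoop|zlib|snappy|lz4|bzip2|openssl): *false' /tmp/checknative.out; then
  echo "some native libraries failed to load" >&2
  exit 1
fi
echo "all native compression libraries loaded"
```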

5. Problems encountered

5.1 Remote host closed connection during handshake and SSL peer shut down incorrectly

Posts online attribute this error to a JDK 1.7 bug, and I spent a long time on it without success. Then I reasoned that since it is an SSL error, https must be the cause, so I changed the Cloudera Repository URL in pom.xml from https to http:
http://repository.cloudera.com/artifactory/cloudera-repos/
That solved it.

5.2 [FATAL] Non-resolvable parent POM for org.apache.hadoop:hadoop-main:2.6.0-cdh5.15.1: Could not transfer artifact com.cloudera.cdh:cdh-root:pom:5.15.1 from/to cdh.repo (https://repository.cloudera.com/artifactory/cloudera-repos): Remote host closed connection

This one seems to be a plain network issue; I ran into it many times and it was a real headache, but the workaround is simple: change into the target artifact's directory inside the local Maven repository, fetch the file from repository.cloudera.com with wget, and re-run the build.

wget https://repository.cloudera.com/artifactory/cloudera-repos/com/cloudera/cdh/cdh-root/5.15.1/cdh-root-5.15.1.pom
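For the cdh-root parent POM above, the full sequence looks like this (a sketch assuming the default local repository location ~/.m2/repository; adjust it if you set <localRepository> in settings.xml):

```shell
# Maven's local-repo layout: groupId dots become slashes, then
# artifactId and version: com.cloudera.cdh -> com/cloudera/cdh
repo_dir=$HOME/.m2/repository/com/cloudera/cdh/cdh-root/5.15.1
mkdir -p "$repo_dir" && cd "$repo_dir"
wget https://repository.cloudera.com/artifactory/cloudera-repos/com/cloudera/cdh/cdh-root/5.15.1/cdh-root-5.15.1.pom
# then re-run: mvn clean package -Pdist,native -DskipTests -Dtar
```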

6. References

Compiling hadoop-2.6.0-cdh5.7.0 from source with compression support
https://blog.csdn.net/liweihope/article/details/89605340#hadoop_174
hadoop: compiling hadoop-2.6.0-cdh5.7.0 from source with compression support and pseudo-distributed deployment
https://blog.csdn.net/qq_32641659/article/details/89074365#commentBox
Workaround for Maven repositories that do not serve CDH artifacts
https://blog.csdn.net/eieiei438/article/details/81742833
