Environment: Ubuntu 14.04
JDK 1.8
Maven 3.3.9
Hadoop 2.7.3
protobuf 2.5.0
--------Compiling Hadoop------
1. Install the JDK and configure environment variables
1.1 Extract the archive and rename the directory
$ sudo mkdir /usr/lib/jvm
$ sudo tar zxvf /home/linlin/soft/jdk-7u71-linux-i586.tar.gz -C /usr/lib/jvm
$ cd /usr/lib/jvm
$ sudo mv jdk1.7.0_71 jdk
1.2 Add environment variables
$ sudo vim /etc/profile
Append the following:
export JAVA_HOME=/usr/lib/jvm/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
$ source /etc/profile    (apply the changes)
1.3 Test
$ java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) Server VM (build 24.71-b01, mixed mode)
2. Install Maven and configure environment variables
Extract the archive, add the environment variables, then test:
$ sudo vim /etc/profile
export MAVEN_HOME=/usr/soft/apache-maven-3.3.9
export PATH=.:$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin
$ source /etc/profile
$ mvn -version
The version output confirms the installation succeeded.
3. Install dependency libraries (important)
sudo apt-get install g++ autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
4. Install protobuf
Extract the archive into /usr/soft:
$ tar -zxvf protobuf-2.5.0.tar.gz -C /usr/soft
$ cd /usr/soft/protobuf-2.5.0
$ ./configure --prefix=/usr/soft/protobuf-2.5.0
$ make
$ make check
$ make clean
$ make install
Run protoc --version to verify; the output libprotoc 2.5.0 means the installation succeeded.
http://wenku.baidu.com/link?url=QqArdNWMuOBkC5PtUwP4v9mb-Ig6y-qoU7u3LhrMEbG0C8DA-m4g0foGJ6dvfo1PU0aeoxYu0i3zPsOrgIkaAB902w-PkAxdlWxdicwuJS3
NOTE:
$ make clean
Run this before sudo make install, to avoid the error "cannot install `libprotoc.la' to
a directory not ending in /opt/protoc/lib".
5. Compile Hadoop
5.1 Switch Maven to a mirror (downloads from the default repository are very slow)
5.1.1 Edit /work/maven3.3.9/conf/settings.xml and add a mirror entry:
<mirror>
    <id>alimaven</id>
    <mirrorOf>central</mirrorOf>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
</mirror>
5.1.2 Even with the mirror, the Hadoop build still downloads jars from http://repository.jboss.org, so I also edited the pom.xml in the Hadoop source tree:
/work/hadoop-2.7.3-src/pom.xml
(added)
<repository>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
</repository>
(existing)
<repository>
    <id>${distMgmtSnapshotsId}</id>
    <name>${distMgmtSnapshotsName}</name>
    <url>${distMgmtSnapshotsUrl}</url>
</repository>
(existing)
<repository>
    <id>repository.jboss.org</id>
    <url>http://repository.jboss.org/nexus/content/groups/public/</url>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
5.2 Increase the VM memory
I ran Ubuntu in VirtualBox with 1 GB of RAM and hit an out-of-memory error during the Hadoop build (shown below).
[Fix] Increase the VM's memory in VirtualBox under Settings - System - Memory.
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 95862784 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /work/hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/target/hs_err_pid4640.log
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 3.899 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [ 3.204 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 3.721 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 9.949 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 1.308 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 3.869 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 10.007 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 30.554 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 40.842 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 5.922 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:33 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 11.737 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 16.281 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.156 s]
[INFO] Apache Hadoop HDFS ................................. FAILURE [01:09 min]
[INFO] Apache Hadoop HttpFS ............................... SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SKIPPED
[INFO] Apache Hadoop HDFS-NFS ............................. SKIPPED
[INFO] Apache Hadoop HDFS Project ......................... SKIPPED
[INFO] hadoop-yarn ........................................ SKIPPED
[INFO] hadoop-yarn-api .................................... SKIPPED
[INFO] hadoop-yarn-common ................................. SKIPPED
[INFO] hadoop-yarn-server ................................. SKIPPED
[INFO] hadoop-yarn-server-common .......................... SKIPPED
[INFO] hadoop-yarn-server-nodemanager ..................... SKIPPED
[INFO] hadoop-yarn-server-web-proxy ....................... SKIPPED
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SKIPPED
[INFO] hadoop-yarn-server-resourcemanager ................. SKIPPED
[INFO] hadoop-yarn-server-tests ........................... SKIPPED
[INFO] hadoop-yarn-client ................................. SKIPPED
[INFO] hadoop-yarn-server-sharedcachemanager .............. SKIPPED
[INFO] hadoop-yarn-applications ........................... SKIPPED
[INFO] hadoop-yarn-applications-distributedshell .......... SKIPPED
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SKIPPED
[INFO] hadoop-yarn-site ................................... SKIPPED
[INFO] hadoop-yarn-registry ............................... SKIPPED
[INFO] hadoop-yarn-project ................................ SKIPPED
[INFO] hadoop-mapreduce-client ............................ SKIPPED
[INFO] hadoop-mapreduce-client-core ....................... SKIPPED
[INFO] hadoop-mapreduce-client-common ..................... SKIPPED
[INFO] hadoop-mapreduce-client-shuffle .................... SKIPPED
[INFO] hadoop-mapreduce-client-app ........................ SKIPPED
[INFO] hadoop-mapreduce-client-hs ......................... SKIPPED
[INFO] hadoop-mapreduce-client-jobclient .................. SKIPPED
[INFO] hadoop-mapreduce-client-hs-plugins ................. SKIPPED
[INFO] Apache Hadoop MapReduce Examples ................... SKIPPED
[INFO] hadoop-mapreduce ................................... SKIPPED
[INFO] Apache Hadoop MapReduce Streaming .................. SKIPPED
[INFO] Apache Hadoop Distributed Copy ..................... SKIPPED
[INFO] Apache Hadoop Archives ............................. SKIPPED
[INFO] Apache Hadoop Rumen ................................ SKIPPED
[INFO] Apache Hadoop Gridmix .............................. SKIPPED
[INFO] Apache Hadoop Data Join ............................ SKIPPED
[INFO] Apache Hadoop Ant Tasks ............................ SKIPPED
[INFO] Apache Hadoop Extras ............................... SKIPPED
[INFO] Apache Hadoop Pipes ................................ SKIPPED
[INFO] Apache Hadoop OpenStack support .................... SKIPPED
[INFO] Apache Hadoop Amazon Web Services support .......... SKIPPED
[INFO] Apache Hadoop Azure support ........................ SKIPPED
[INFO] Apache Hadoop Client ............................... SKIPPED
[INFO] Apache Hadoop Mini-Cluster ......................... SKIPPED
[INFO] Apache Hadoop Scheduler Load Simulator ............. SKIPPED
[INFO] Apache Hadoop Tools Dist ........................... SKIPPED
[INFO] Apache Hadoop Tools ................................ SKIPPED
[INFO] Apache Hadoop Distribution ......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:12 min
[INFO] Finished at: 2017-01-04T11:23:41+08:00
[INFO] Final Memory: 74M/241M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.8.1:jar (module-javadocs) on project hadoop-hdfs: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f4613000, 95862784, 0) failed; error='Cannot allocate memory' (errno=12)
[ERROR]
[ERROR] Command line was: /work/jdk1.8/jre/../bin/javadoc -J-Xmx512m @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/work/hadoop-2.7.3-src/hadoop-hdfs-project/hadoop-hdfs/target' dir.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hadoop-hdfs
5.3 Compile
Copy the source onto the Linux machine (I used version 2.7.3) and extract it:
$ tar -zxvf hadoop-2.7.3-src.tar.gz
Enter the source directory and run:
$ cd hadoop-2.7.3-src
$ mvn clean package -Pdist,native,docs -DskipTests -Dtar
If you rebuild without clean, the build resumes where it left off and keeps downloading jars, so if a run fails (usually because some packages did not finish downloading), just run it again.
Build with the native profile, or Hadoop will print warnings at runtime. (I simply reused .so files from someone else's build.)
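Since a failed run can simply be repeated, the retries can be scripted. A sketch, with a `retry` helper that is our own invention (not part of Maven or Hadoop); the demonstration at the end uses `true` as a stand-in command so the snippet can be tried anywhere:

```shell
# Hypothetical helper: run a command until it succeeds, up to 3 attempts.
# Useful because Maven (without clean) resumes dependency downloads on rerun.
retry() {
  n=0
  until "$@"; do
    n=$((n+1))
    if [ "$n" -ge 3 ]; then
      echo "giving up after $n attempts"
      return 1
    fi
    echo "attempt $n failed, retrying..."
  done
}

# On the build host you would call:
#   retry mvn package -Pdist,native,docs -DskipTests -Dtar
retry true && echo "build ok"
```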
Once the build succeeds (output below), the distribution tarball is generated at /work/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3.tar.gz; each module also produces its own jars.
If you modify part of the Hadoop source, you only need to rebuild that module and replace the corresponding jar.
Find the jar in the module's target directory after the partial build, and use it to replace the matching jar under share/hadoop/yarn in the full distribution.
main:
[exec] $ tar cf hadoop-2.7.3.tar hadoop-2.7.3
[exec] $ gzip -f hadoop-2.7.3.tar
[exec]
[exec] Hadoop dist tar available at: /work/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3.tar.gz
[exec]
[INFO] Executed tasks
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /work/hadoop-2.7.3-src/hadoop-dist/target/hadoop-dist-2.7.3-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 2.351 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [ 2.377 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 2.531 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 4.862 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.789 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 2.979 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 6.490 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 9.262 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 10.741 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 5.350 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:38 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 9.575 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 16.082 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.289 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [02:17 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 24.129 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 10.031 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 6.815 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.151 s]
[INFO] hadoop-yarn ........................................ SUCCESS [ 0.142 s]
[INFO] hadoop-yarn-api .................................... SUCCESS [ 51.981 s]
[INFO] hadoop-yarn-common ................................. SUCCESS [ 57.428 s]
[INFO] hadoop-yarn-server ................................. SUCCESS [ 0.194 s]
[INFO] hadoop-yarn-server-common .......................... SUCCESS [ 17.033 s]
[INFO] hadoop-yarn-server-nodemanager ..................... SUCCESS [ 21.225 s]
[INFO] hadoop-yarn-server-web-proxy ....................... SUCCESS [ 6.788 s]
[INFO] hadoop-yarn-server-applicationhistoryservice ....... SUCCESS [ 12.827 s]
[INFO] hadoop-yarn-server-resourcemanager ................. SUCCESS [ 33.397 s]
[INFO] hadoop-yarn-server-tests ........................... SUCCESS [ 8.309 s]
[INFO] hadoop-yarn-client ................................. SUCCESS [ 10.482 s]
[INFO] hadoop-yarn-server-sharedcachemanager .............. SUCCESS [ 6.218 s]
[INFO] hadoop-yarn-applications ........................... SUCCESS [ 0.080 s]
[INFO] hadoop-yarn-applications-distributedshell .......... SUCCESS [ 4.473 s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher ..... SUCCESS [ 3.484 s]
[INFO] hadoop-yarn-site ................................... SUCCESS [ 0.124 s]
[INFO] hadoop-yarn-registry ............................... SUCCESS [ 8.447 s]
[INFO] hadoop-yarn-project ................................ SUCCESS [ 5.315 s]
[INFO] hadoop-mapreduce-client ............................ SUCCESS [ 0.364 s]
[INFO] hadoop-mapreduce-client-core ....................... SUCCESS [ 30.845 s]
[INFO] hadoop-mapreduce-client-common ..................... SUCCESS [ 25.998 s]
[INFO] hadoop-mapreduce-client-shuffle .................... SUCCESS [ 6.805 s]
[INFO] hadoop-mapreduce-client-app ........................ SUCCESS [ 14.619 s]
[INFO] hadoop-mapreduce-client-hs ......................... SUCCESS [ 9.448 s]
[INFO] hadoop-mapreduce-client-jobclient .................. SUCCESS [ 11.655 s]
[INFO] hadoop-mapreduce-client-hs-plugins ................. SUCCESS [ 3.635 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 9.296 s]
[INFO] hadoop-mapreduce ................................... SUCCESS [ 3.235 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 7.634 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 13.898 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 3.627 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 8.766 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 7.593 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 4.466 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 3.146 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 4.964 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 0.092 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 7.954 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 53.124 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 7.363 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 9.754 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 1.902 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 8.209 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 9.538 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.072 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 38.443 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 15:11 min
[INFO] Finished at: 2017-01-04T12:46:30+08:00
[INFO] Final Memory: 166M/454M
[INFO] ------------------------------------------------------------------------
root@daisy-VirtualBox:/work/hadoop-2.7.3-src#
------------Hadoop Cluster Setup-------
I. Environment
•OpenStack VMs (VMware also works)
•CentOS 7 x64
•Hadoop 2.7.3 64-bit binary package: the hadoop-2.7.3.tar.gz compiled above
II. Basic Configuration
For convenience, all operations are done as root.
Set the root password with sudo passwd root,
then log in as root with su root.
§ Change the hostname and map hostnames to IPs (on every node)
1. Temporarily change the hostname (lost after a reboot); master1 is shown here, configure every other node the same way:
# hostname master1
# hostname
master1
2. Permanently change the hostname
Edit the CentOS network config file /etc/sysconfig/network and append HOSTNAME=master1:
# vim /etc/sysconfig/network
NETWORKING=yes
NOZEROCONF=yes
HOSTNAME=master1
~
~
3. Edit /etc/hosts. Every host in the cluster gets its proper hostname, and the hosts file gets the IP-to-hostname mappings, ending up like this:
# vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.16.1.33 master1
172.16.1.35 master2
172.16.1.22 slaver1
172.16.1.23 slaver2
172.16.1.24 slaver3
~
~
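A quick sanity check that every expected hostname appears in the mapping can be scripted. A sketch: here it runs against a sample copy written to /tmp so it can be tried anywhere; on a real node, point HOSTS at /etc/hosts instead:

```shell
# Sample of the mapping above, written to a temp file for illustration only.
HOSTS=/tmp/hosts.sample
cat > "$HOSTS" <<'EOF'
172.16.1.33 master1
172.16.1.35 master2
172.16.1.22 slaver1
172.16.1.23 slaver2
172.16.1.24 slaver3
EOF

# Every cluster node needs an entry, or hostname resolution between nodes fails.
for h in master1 master2 slaver1 slaver2 slaver3; do
  grep -qw "$h" "$HOSTS" && echo "$h: ok" || echo "$h: MISSING"
done
```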
This change is permanent and takes effect after a reboot; combine it with method 1 to make the new hostname effective immediately.
NOTE:
•On Ubuntu you can permanently change the hostname by writing it directly into /etc/hostname.
•In VMs created by OpenStack, the hostname is set to something like test.novalocal (ending in *.novalocal). If you change it with method 2, every reboot resets it to that name; even if you delete the old name from the hostname file and write the one you want, it reverts after a reboot.
•So in this setup we use method 1 to change hostnames temporarily and avoid rebooting; if a machine must be rebooted, re-set its hostname afterwards, or Hadoop may fail to start.
4. Large clock skew between machines can easily cause startup failures, so synchronize the time in advance:
# yum install ntp
# ntpdate ntp.xx
NOTE:
I am on a campus network, so I had to copy in and run a script on each machine to get external connectivity:
[root@master cloud-user]# bash ./auto-login.sh
[root@master cloud-user]# ping www.baidu.com
PING www.a.shifen.com (119.75.218.70) 56(84) bytes of data.
64 bytes from 119.75.218.70: icmp_seq=1 ttl=49 time=24.6 ms
64 bytes from 119.75.218.70: icmp_seq=2 ttl=49 time=23.8 ms
64 bytes from 119.75.218.70: icmp_seq=3 ttl=49 time=24.8 ms
64 bytes from 119.75.218.70: icmp_seq=4 ttl=49 time=24.6 ms
§ Disable the firewall (on every node)
In production you would open the required ports with iptables rules instead; since this is an experiment on a private network, we simply disable the firewall, e.g. on CentOS 7:
# systemctl stop firewalld
# systemctl disable firewalld
§ Create a dedicated user (on every node)
In a real production environment, it is best to create a new user and group dedicated to installing Hadoop after installing CentOS 7.
For convenience, we use root directly here.
§ Configure passwordless SSH login (on every node)
Hadoop distributes many files during operation; passwordless login saves us from typing passwords over and over, and makes it easy to push modified configuration files to every node during deployment.
1. Configure each node
1.1 Edit the sshd config file (/etc/ssh/sshd_config).
Set PermitRootLogin yes (important).
Find the following lines and remove the leading "#":
=========================
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
1.2 Restart sshd:
$ /etc/init.d/sshd restart
or
$ systemctl restart sshd.service
2. Generate the key pair; press Enter at every prompt to accept the defaults (on every node):
# ssh-keygen -t rsa
3. This produces two files, id_rsa (private key) and id_rsa.pub (public key). Append the public key to the authorized_keys file on every machine you want to log in to without a password
(this creates a file named authorized_keys under ~/.ssh/ there).
NOTE:
For A to log in to B, put A's public key into B's authorized_keys file.
For A to log in to itself, A's own public key must also go into its authorized_keys file.
4. Verify: ssh hostname and check that the connection succeeds.
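On a real cluster, the simplest way to do step 3 is `ssh-copy-id root@slaver1` from each node. What it does boils down to the following sketch, shown here with a fake key and a temp directory standing in for ~/.ssh on the target host (both are illustrative assumptions):

```shell
# Simulate what ssh-copy-id does: append the public key to authorized_keys
# and tighten permissions. DEMO_DIR stands in for ~/.ssh on the target host.
DEMO_DIR=/tmp/demo-ssh
mkdir -p "$DEMO_DIR"
chmod 700 "$DEMO_DIR"

# A fake public key, for illustration only.
echo 'ssh-rsa AAAAB3...example... root@master1' > "$DEMO_DIR/id_rsa.pub"

cat "$DEMO_DIR/id_rsa.pub" >> "$DEMO_DIR/authorized_keys"
chmod 600 "$DEMO_DIR/authorized_keys"
```

sshd is strict about permissions: authorized_keys must not be group- or world-writable, or public-key login silently fails.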
III. Install the JDK (on every node)
[Configure one machine, then scp the jdk8 directory and the profile file to the rest.]
This setup uses jdk-8u101-linux-x64.tar.gz, which can be downloaded from the Oracle website. Upload it to every machine and extract it; the path is up to you, here we use /opt/:
# tar -zxvf jdk-8u101-linux-x64.tar.gz -C /opt
// Rename the directory
# mv /opt/jdk1.8.0_101 /opt/jdk8
Configure the environment variables:
// Edit the config file
# sudo vim /etc/profile
// Append at the end
export JAVA_HOME=/opt/jdk8
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib
// Reload the config file
# source /etc/profile
Repeat the same configuration on every machine, or scp the configured jdk8 directory and profile file to each of them.
NOTE:
•Avoid installing the JDK under a normal user's home directory (/home/USER_NAME): the Hadoop configuration later needs the JDK's absolute path, and the /USER_NAME part differs from node to node depending on the user, so you would have to reconfigure JAVA_HOME's absolute path in Hadoop on each node; otherwise startup fails.
IV. Install Hadoop (on every node)
Upload hadoop-2.7.3.tar.gz (built above, or downloaded from the official site) to master1. Extract and configure it on master1 first, then distribute it to the other nodes with scp.
§ Extract Hadoop and configure environment variables
1. Extract the uploaded archive into /opt and rename it to hadoop:
# tar -zxvf hadoop-2.7.3.tar.gz -C /opt
# mv /opt/hadoop-2.7.3 /opt/hadoop
2. Configure environment variables so the Hadoop commands can be used from anywhere:
# vim /etc/profile
Append the following at the end:
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_LOG_DIR=/home/hadoop/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
Apply the changes:
# source /etc/profile
3. Test:
# which hadoop
/opt/hadoop/bin/hadoop
§ Create Hadoop's working directory tree
We create Hadoop's working directories under /home, with the following layout:
/home/hadoop
/home/hadoop/tmp
/home/hadoop/logs
/home/hadoop/hdfs
/home/hadoop/journal
/home/hadoop/hdfs/datanode
/home/hadoop/hdfs/namenode
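The layout above can be created in one go. A sketch: ROOT defaults to a scratch path here so the snippet can be tried safely; on the real nodes set ROOT=/home/hadoop:

```shell
# Create the Hadoop working directory tree under ROOT.
# mkdir -p also creates the intermediate hdfs/ directory as needed.
ROOT=${ROOT:-/tmp/hadoop-work}
for d in tmp logs journal hdfs/datanode hdfs/namenode; do
  mkdir -p "$ROOT/$d"
done
ls -R "$ROOT"
```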
§ Modify the Hadoop configuration files (7 in total)
All configuration files live in the hadoop/etc/hadoop folder under the Hadoop root directory, so cd there first.
1. In hadoop-env.sh, replace export JAVA_HOME=${JAVA_HOME} with our JDK's absolute path:
# The java implementation to use.
export JAVA_HOME=/opt/jdk8
2. In yarn-env.sh, likewise replace export JAVA_HOME=${JAVA_HOME} with the JDK's absolute path:
# some Java parameters
export JAVA_HOME=/opt/jdk8
3. For mapred-site.xml, first copy mapred-site.xml.template in the same folder and rename it to mapred-site.xml:
# cp mapred-site.xml.template mapred-site.xml
4. In the slaves file, list the hostnames of the machines that should become DataNode nodes:
# vim slaves
slaver1
slaver2
slaver3
5. mapred-site.xml, core-site.xml, yarn-site.xml, hdfs-site.xml
See the attachment for the full configuration.
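The attachment itself is not reproduced here. Purely as a rough illustration of the minimum core-site.xml needs: the hostname and the /home/hadoop/tmp path follow this document (port 9000 matches the hdfs:// URLs in the logs further down), but treat the values as assumptions to adapt to your cluster, not as the attachment's actual contents:

```xml
<!-- core-site.xml: minimal illustrative sketch, NOT the full attachment config -->
<configuration>
  <!-- NameNode RPC address (hostname/port assumed from this document) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master1:9000</value>
  </property>
  <!-- Base for temporary files; the working directory created above -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
```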
§ Copy the configured Hadoop to the other hosts
1. Distribute the configured hadoop directory to the other hosts with scp:
# cd /opt
# scp -r ./hadoop master2:/opt
# scp -r ./hadoop slaver1:/opt
# scp -r ./hadoop slaver2:/opt
# scp -r ./hadoop slaver3:/opt
2. Distribute the configured /etc/profile to the other hosts with scp:
# scp -r /etc/profile master2:/etc
# scp -r /etc/profile slaver1:/etc
# scp -r /etc/profile slaver2:/etc
# scp -r /etc/profile slaver3:/etc
3. On every host, reload the environment variables to make them take effect:
# source /etc/profile
V. Start the Hadoop Cluster
1. On the first run, format HDFS (the NameNode) by executing the following on master1; if the Hadoop environment variables are not configured, run ./hdfs namenode -format from the bin directory under the Hadoop root:
# hdfs namenode -format
# hadoop-daemon.sh start namenode
NOTE:
•Formatting a second time can leave the DataNodes unable to start, caused by mismatched NameSpaceIDs. Fix it by finding the inconsistent VERSION files under the hdfs/datanode directory and correcting the NameSpaceID, or try deleting the hdfs/datanode directory and formatting again.
2. On master1, start HDFS (the NameNode); if the environment variables are not configured, run ./start-dfs.sh from the sbin directory under the Hadoop root:
# start-dfs.sh
3. On master1, start YARN (the ResourceManager); if the environment variables are not configured, run ./start-yarn.sh from the sbin directory under the Hadoop root:
# start-yarn.sh
NOTE:
•Replace start with stop in these commands to stop the corresponding processes.
•You can also start everything with start-all.sh.
[root@amaster hadoop]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [amaster]
amaster: starting namenode, logging to /opt/hadoop/logs/hadoop-root-namenode-amaster.out
aslave-4: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-4.out
aslave-2: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-2.out
aslave-1: starting datanode, logging to /opt/hadoop/logs/hadoop-root-datanode-aslave-1.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/logs/yarn-root-resourcemanager-amaster.out
aslave-4: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-aslave-4.out
aslave-2: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-aslave-2.out
aslave-1: starting nodemanager, logging to /opt/hadoop/logs/yarn-root-nodemanager-aslave-1.out
4. When starting the Hadoop cluster, besides start-yarn.sh and start-dfs.sh, also start the history server:
mr-jobhistory-daemon.sh start historyserver
The mr-jobhistory-daemon.sh script lives under ${HADOOP_INSTALL}/sbin/. After it starts, jps shows a JobHistoryServer process.
With the HistoryServer running, the History link behind the Tracking URL becomes available. (http://10.10.89.119:19888/jobhistory/tasks/job_1452505390103_0001/m)
http://jingpin.jikexueyuan.com/article/39350.html
5. Verify that the Hadoop cluster (HDFS and YARN) started correctly by checking the Java processes running on each node with jps:
// processes on master1; the leading number is the pid
31747 Jps
17607 JobHistoryServer
977 ResourceManager
522 NameNode
// processes on slaver1; the leading number is the pid
13520 DataNode
13876 Jps
13690 NodeManager
6. Check the Hadoop web pages for anomalies:
HDFS usage:
http://masterIP:50070
Cluster job status:
http://masterIP:8088/cluster/cluster
Per-task details of a single job:
http://masterIP:19888/jobhistory/tasks/job_1452505390103_0001/m
-------------HDFS Commands-----------
// Report DataNode information; handy for monitoring DFS health
# hadoop dfsadmin -report
// Access HDFS through an explicit address
# hadoop fs -ls hdfs://hcluster:9000/
// List files and directories under the HDFS root
# hadoop fs -ls /
// List a directory recursively
# hadoop fs -lsr /
// Create a test directory
# hadoop fs -mkdir /test
// Upload a file into the test directory
# hadoop fs -put /root/test.txt /test/test.txt
// Show a file's contents
# hadoop fs -cat /test/test.txt
// Show a file's size
# hadoop fs -du /test/test.txt
// Delete a file
# hadoop fs -rm /test/test.txt
// Recursively delete a directory or file
# hadoop fs -rmr /test
------------Benchmark Commands-------
I. TestDFSIO
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar TestDFSIO -write -nrFiles 8 -fileSize 128 -resFile /tmp/TestDFSIOwrite.txt
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar TestDFSIO -read -nrFiles 8 -fileSize 128 -resFile /tmp/TestDFSIOread.txt
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar TestDFSIO -clean
II. Sort
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar randomwriter -D mapreduce.randomwriter.mapsperhost=1 -D mapreduce.randomwriter.totalbytes=1073741800 /user/ubuntu/sort/1G-input
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sort -r 4 /user/ubuntu/sort/1G-input /user/ubuntu/sort/1G-output-task48-1
Change the per-task resource requests:
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sort -D mapreduce.map.memory.mb=3072 -D mapreduce.reduce.memory.mb=3072 -r 4 /user/ubuntu/sort/1G-input /user/ubuntu/sort/1G-output-task48-1
hadoop fs -rm -r /user/ubuntu/sort/1G-output*
---------A Successful Sort Run---------
[root@amaster hadoop]# hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar sort -r 4 /user/ubuntu/sort/1G-input /user/ubuntu/sort/1G-output-task48-10
17/01/07 11:35:26 INFO client.RMProxy: Connecting to ResourceManager at amaster/10.10.89.119:8032
Running on 3 nodes to sort from hdfs://amaster:9000/user/ubuntu/sort/1G-input into hdfs://amaster:9000/user/ubuntu/sort/1G-output-task48-10 with 4 reduces.
Job started: Sat Jan 07 11:35:27 UTC 2017
17/01/07 11:35:27 INFO client.RMProxy: Connecting to ResourceManager at amaster/10.10.89.119:8032
17/01/07 11:35:29 INFO input.FileInputFormat: Total input paths to process : 1
17/01/07 11:35:29 INFO mapreduce.JobSubmitter: number of splits:8
17/01/07 11:35:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1483788624194_0001
17/01/07 11:35:29 INFO impl.YarnClientImpl: Submitted application application_1483788624194_0001
17/01/07 11:35:29 INFO mapreduce.Job: The url to track the job: http://amaster:8088/proxy/application_1483788624194_0001/
17/01/07 11:35:29 INFO mapreduce.Job: Running job: job_1483788624194_0001
17/01/07 11:35:36 INFO mapreduce.Job: Job job_1483788624194_0001 running in uber mode : false
17/01/07 11:35:36 INFO mapreduce.Job: map 0% reduce 0%
17/01/07 11:35:52 INFO mapreduce.Job: map 13% reduce 0%
17/01/07 11:35:53 INFO mapreduce.Job: map 100% reduce 0%
17/01/07 11:36:04 INFO mapreduce.Job: map 100% reduce 28%
17/01/07 11:36:07 INFO mapreduce.Job: map 100% reduce 43%
17/01/07 11:36:10 INFO mapreduce.Job: map 100% reduce 79%
17/01/07 11:36:13 INFO mapreduce.Job: map 100% reduce 89%
17/01/07 11:36:16 INFO mapreduce.Job: map 100% reduce 94%
17/01/07 11:36:18 INFO mapreduce.Job: map 100% reduce 95%
17/01/07 11:36:19 INFO mapreduce.Job: map 100% reduce 100%
17/01/07 11:36:19 INFO mapreduce.Job: Job job_1483788624194_0001 completed successfully
17/01/07 11:36:19 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=2150439596
FILE: Number of bytes written=3226844105
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1077391722
HDFS: Number of bytes written=1077279676
HDFS: Number of read operations=44
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Killed map tasks=1
Killed reduce tasks=1
Launched map tasks=8
Launched reduce tasks=5
Data-local map tasks=8
Total time spent by all maps in occupied slots (ms)=109609
Total time spent by all reduces in occupied slots (ms)=94208
Total time spent by all map tasks (ms)=109609
Total time spent by all reduce tasks (ms)=94208
Total vcore-milliseconds taken by all map tasks=109609
Total vcore-milliseconds taken by all reduce tasks=94208
Total megabyte-milliseconds taken by all map tasks=112239616
Total megabyte-milliseconds taken by all reduce tasks=96468992
Map-Reduce Framework
Map input records=102090
Map output records=102090
Map output bytes=1074564572
Map output materialized bytes=1075138889
Input split bytes=984
Combine input records=0
Combine output records=0
Reduce input groups=102090
Reduce shuffle bytes=1075138889
Reduce input records=102090
Reduce output records=102090
Spilled Records=306270
Shuffled Maps =32
Failed Shuffles=0
Merged Map outputs=32
GC time elapsed (ms)=2715
CPU time spent (ms)=47890
Physical memory (bytes) snapshot=3176710144
Virtual memory (bytes) snapshot=25316593664
Total committed heap usage (bytes)=2129133568
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1077390738
File Output Format Counters
Bytes Written=1077279676
Job ended: Sat Jan 07 11:36:19 UTC 2017
The job took 51 seconds.
-----------Problem: WARN util.NativeCodeLoader---------
[root@amaster opt]# hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar randomwriter -D mapreduce.randomwriter.mapsperhost=1 -D mapreduce.randomwriter.totalbytes=1073741800 /user/ubuntu/sort/1G-input-1
17/01/04 12:22:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/04 12:22:05 INFO client.RMProxy: Connecting to ResourceManager at amaster/10.10.89.119:8032
Running 1 maps.
Job started: Wed Jan 04 12:22:06 UTC 2017
17/01/04 12:22:06 INFO client.RMProxy: Connecting to ResourceManager at amaster/10.10.89.119:8032
17/01/04 12:22:06 INFO mapreduce.JobSubmitter: number of splits:1
17/01/04 12:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1483530318384_0002
17/01/04 12:22:07 INFO impl.YarnClientImpl: Submitted application application_1483530318384_0002
17/01/04 12:22:07 INFO mapreduce.Job: The url to track the job: http://amaster:8088/proxy/application_1483530318384_0002/
17/01/04 12:22:07 INFO mapreduce.Job: Running job: job_1483530318384_0002
17/01/04 12:22:12 INFO mapreduce.Job: Job job_1483530318384_0002 running in uber mode : false
17/01/04 12:22:12 INFO mapreduce.Job: map 0% reduce 0%
17/01/04 12:22:25 INFO mapreduce.Job: Task Id : attempt_1483530318384_0002_m_000000_0, Status : FAILED
https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/NativeLibraries.html
http://blog.csdn.net/lalaguozhe/article/details/10580727
After installing the official Hadoop 2.1.0-beta, every hadoop command prints the following warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"wrong ELFCLASS32" - could the loaded .so be built for the wrong architecture?
Run file libhadoop.so.1.0.0:
libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, not stripped
Sure enough, 80386: a 32-bit build, while my Hadoop environment is a 64-bit OS.
It turns out the pre-built Hadoop releases downloaded from the Apache mirrors all ship 32-bit native libraries; for 64-bit support you have to recompile them yourself, which is rather unfortunate given that almost all production environments run a 64-bit OS.
A passage in the official documentation on the native library confirms this:
"The pre-built 32-bit i386-Linux native hadoop library is available as part of the hadoop distribution and is located in the lib/native directory."
Check out the source code again:
svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta/
Add the native profile, so the build produces a native library matching the current OS architecture:
mvn package -Pdist,native -DskipTests -Dtar
Checking the file types under lib/native again shows they are all 64-bit now; replace the files in production and the WARNING disappears.
[I compiled without the native profile, so my tarball has no lib/native folder. I dropped in native libraries from someone else's build instead, and that seems to work too.]
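You can check the word size of your own native library with the `file` command. A sketch; the default path assumes the /opt/hadoop install from this document, so adjust LIB for your layout:

```shell
# Report the ELF class of libhadoop; a 64-bit OS needs a 64-bit build
# ("ELF 64-bit" in the output), or Hadoop falls back to builtin-java classes.
LIB=${LIB:-/opt/hadoop/lib/native/libhadoop.so.1.0.0}
if [ -e "$LIB" ]; then
  file "$LIB"
else
  echo "no native library at $LIB"
fi
```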
---------Hadoop Debugging------------
I. Debugging via logs
Raise the log level of a specific class by adding an appender in log4j.properties, e.g. to log the FifoScheduler at DEBUG level:
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler=DEBUG,TTOUT
log4j.appender.TTOUT =org.apache.log4j.FileAppender
log4j.appender.TTOUT.File=${hadoop.log.dir}/ResourceManager.log
log4j.appender.TTOUT.layout=org.apache.log4j.PatternLayout
log4j.appender.TTOUT.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
Check a class's log level (this can also be viewed and set through the web page; note that you must use the NodeManager's URL):
[root@amaster /]# hadoop daemonlog -getlevel 10.10.89.120:8042 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
Connecting to http://10.10.89.120:8042/logLevel?log=org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
Submitted Log Name: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler
Log Class: org.apache.commons.logging.impl.Log4JLogger
Effective level: DEBUG
II. Debugging with Eclipse