1 Introduction
To address problems in the Hadoop 1.x framework, such as the single-NameNode single point of failure, the Apache foundation released a new generation of the framework, the Hadoop 2.x series. In this series several HDFS mechanisms were improved, the MapReduce framework was upgraded to the YARN framework (MapReduce 2), and integration with currently popular big-data analysis frameworks such as Spark became possible. The Hadoop 2.x series will be covered in detail later.
2 Installing Hadoop 2.6
The environment required to install Hadoop 2.6 is the same as for Hadoop 1.2.1, so the preparation steps are only summarized briefly here.
(1) Install the sshd service and set up passwordless login between the nodes.
In Hadoop 2.x the task-scheduling and resource-management duties of the JobTracker have been split and may run on different nodes, so both the node running the NameNode service and the node running the ResourceManager service must be able to log in to every node without a password, as sketched below.
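A minimal sketch of setting up passwordless SSH, assuming the hadoop user on hadoop1 (NameNode) and hadoop2 (ResourceManager) needs access to the four hosts named later in this cluster:

# Run once on hadoop1 and once on hadoop2 as the hadoop user
ssh-keygen -t rsa              # accept the defaults, empty passphrase
for node in hadoop1 hadoop2 hadoop3 hadoop4; do
    ssh-copy-id hadoop@$node   # appends the public key to ~/.ssh/authorized_keys on $node
done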
(2) Configure the hosts file
This cluster has only four nodes; the node names and IP addresses are as follows:
192.168.149.129 hadoop1
192.168.149.130 hadoop2
192.168.149.131 hadoop3
192.168.149.132 hadoop4

(3) Install Java 1.7
The installation of Java 1.7 was already described in detail in the Hadoop 1.2.1 installation, so only the resulting Java configuration is shown here:
[hadoop@hadoop1 etc]$ ls /opt/
apache-ant-1.9.5  apache-maven-3.3.3  jdk1.7.0_75  protobuf  protobuf-2.5.0  rh
[hadoop@hadoop1 etc]$ cat /etc/profile
# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
    *:"$1":*)
        ;;
    *)
        if [ "$2" = "after" ] ; then
            PATH=$PATH:$1
        else
            PATH=$1:$PATH
        fi
    esac
}

if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`id -u`
        UID=`id -ru`
    fi
    USER="`id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /sbin
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
    pathmunge /sbin after
fi

HOSTNAME=`/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null 2>&1
        fi
    fi
done

#Java Install
export JAVA_HOME=/opt/jdk1.7.0_75
export CLASSPATH=/opt/jdk1.7.0_75/lib/tools.jar:.:/opt/jdk1.7.0_75/lib/dt.jar
export PATH=$PATH:/opt/jdk1.7.0_75/jre/bin:/opt/jdk1.7.0_75/bin

#hadoop-2.6.0 install
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export PATH=$PATH:/home/hadoop/hadoop-2.6.0/bin:/home/hadoop/hadoop-2.6.0/sbin

#maven install
export MAVEN_HOME=/opt/apache-maven-3.3.3
export PATH=$PATH:/opt/apache-maven-3.3.3/bin

#ant install
export ANT_HOME=/opt/apache-ant-1.9.5
export PATH=$PATH:/opt/apache-ant-1.9.5/bin

#protobuf install
export PATH=$PATH:/opt/protobuf/bin

unset i
unset -f pathmunge
(4) Install Hadoop 2.6
1) Download Hadoop 2.6
Hadoop 2.6 can be downloaded from http://www.apache.org/dyn/closer.cgi/hadoop/common; pick a mirror from that page and download the corresponding Hadoop 2.6 release.
2) Extract the archive as the hadoop user and move it into the hadoop user's home directory:
[hadoop@hadoop1 sources]$ ls
apache-ant-1.9.5-bin.tar.gz    hadoop-2.6.0-src.tar.gz  protobuf-2.5.0.tar.gz
apache-maven-3.3.3-bin.tar.gz  hadoop-2.6.0.tar.gz      hadoop-2.6.0-src
jdk-7u75-linux-x64.tar.gz
[hadoop@hadoop1 sources]$ tar -zxf hadoop-2.6.0.tar.gz
[hadoop@hadoop1 sources]$ ls
apache-ant-1.9.5-bin.tar.gz    hadoop-2.6.0-src         jdk-7u75-linux-x64.tar.gz
apache-maven-3.3.3-bin.tar.gz  hadoop-2.6.0-src.tar.gz  protobuf-2.5.0.tar.gz
hadoop-2.6.0                   hadoop-2.6.0.tar.gz
[hadoop@hadoop1 sources]$ pwd
/home/hadoop/sources
[hadoop@hadoop1 sources]$ mv hadoop-2.6.0 ../
3) Configure Hadoop 2.6
The configuration files are the core of a Hadoop installation; they are all located in the /home/hadoop/hadoop-2.6.0/etc/hadoop directory.
(A) Set the Java environment variable in hadoop-env.sh and yarn-env.sh
hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/opt/jdk1.7.0_75
yarn-env.sh
# some Java parameters
export JAVA_HOME=/opt/jdk1.7.0_75
(B) core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.149.129:9000</value>
</property>
<property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
</property>

The fs.defaultFS property is the equivalent of fs.default.name in Hadoop 1.2.1; it specifies the entry point of HDFS.
The io.file.buffer.size property sets the buffer used when reading files; the larger it is, the faster reads can be, at the cost of more memory. It is usually set to a multiple of the file-system page size (4 KB).
For full details of the core-site.xml options, see: http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/core-default.xml
(C)hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/hadoop-2.6.0/data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.149.129:50090</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/hadoop-2.6.0/data/hdfs/datanode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
The dfs.namenode.name.dir property is the location on the NameNode where file metadata, the file-system image, and the edits log are stored.
The dfs.namenode.secondary.http-address property is the HTTP access address of the SecondaryNameNode.
The dfs.datanode.data.dir property is the location where DataNodes store their data blocks.
The dfs.replication property is the number of replicas kept for each file in the cluster.
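As a small, hedged convenience (Hadoop normally creates these directories itself when the NameNode is formatted and the DataNodes start), the two data directories above can be pre-created so that their ownership and permissions are known in advance:

# On the NameNode (hadoop1)
mkdir -p /home/hadoop/hadoop-2.6.0/data/hdfs/namenode
# On each DataNode (hadoop3, hadoop4)
mkdir -p /home/hadoop/hadoop-2.6.0/data/hdfs/datanode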
For full details of the hdfs-site.xml options, see: http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
(D) mapred-site.xml
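Note that a fresh Hadoop 2.6.0 distribution only ships mapred-site.xml.template in etc/hadoop, so the file is usually created from that template first:

[hadoop@hadoop1 hadoop]$ cp mapred-site.xml.template mapred-site.xml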
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

The mapreduce.framework.name property makes MapReduce jobs run on the YARN framework; its default value is local.
For full details of the mapred-site.xml options, see: http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
(E) yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
The yarn.nodemanager.aux-services property used to be written as mapreduce.shuffle; since Hadoop 2.2 that value is no longer accepted and the cluster will only start when it is changed to mapreduce_shuffle.
For full details of the yarn-site.xml options, see: http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
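Because the ResourceManager in this cluster runs on hadoop2 rather than on the NameNode, the NodeManagers need to know where to find it; the yarn-site.xml shown above does not say so. A hedged sketch of the extra property that would typically be added on every node (assuming the ResourceManager host is hadoop2 / 192.168.149.130):

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop2</value>
</property>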
(5)slaves
[hadoop@hadoop1 hadoop]$ cat slaves
192.168.149.131
192.168.149.132

This file lists the IP addresses of the DataNode nodes.
After the configuration is finished, copy the hadoop-2.6.0 directory to every node:
scp -r hadoop-2.6.0/ hadoop@hadoop2:/home/hadoop/
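The command above copies the directory to hadoop2 only; a small sketch for the remaining nodes, assuming the same hadoop user and home directory on each:

for node in hadoop3 hadoop4; do
    scp -r hadoop-2.6.0/ hadoop@$node:/home/hadoop/
done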
Before formatting, the firewall and SELinux should be turned off on all four nodes. They may already be off by default, but to be safe you can switch to the root user and disable them explicitly, as sketched below.
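A sketch of disabling the firewall and SELinux on CentOS-style systems (run as root on each node; the exact commands may differ on other distributions):

chkconfig iptables off   # do not start iptables at boot
service iptables stop    # stop it for the current session
setenforce 0             # put SELinux in permissive mode now
# to make the SELinux change permanent, set SELINUX=disabled in /etc/selinux/config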
Then go to the hadoop-2.6.0 directory on the NameNode node (hadoop1) and run the following command:

./bin/hadoop namenode -format

(6) Starting the Hadoop 2.6 cluster
To make the Hadoop cluster easier to work with, the Hadoop commands can be added to the PATH environment variable:
[hadoop@hadoop1 ~]$ vim /etc/profile

#hadoop-2.6.0 install
export HADOOP_HOME=/home/hadoop/hadoop-2.6.0
export PATH=$PATH:/home/hadoop/hadoop-2.6.0/bin:/home/hadoop/hadoop-2.6.0/sbin

Now the cluster can be started.
First, log in to the ResourceManager node and start the resource-management daemons:
[hadoop@hadoop2 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-hadoop2.out
192.168.149.132: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0/logs/yarn-hadoop-nodemanager-hadoop4.out
192.168.149.131: starting nodemanager, logging to /home/hadoop/hadoop-2.6.0/logs/yarn-hadoop-nodemanager-hadoop3.out
[hadoop@hadoop2 ~]$ jps
27413 Jps
Then go to the NameNode node and start the remaining processes:
[hadoop@hadoop1 ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/06/17 08:30:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop1]
hadoop1: starting namenode, logging to /home/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-namenode-hadoop1.out
192.168.149.132: starting datanode, logging to /home/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-datanode-hadoop4.out
192.168.149.131: starting datanode, logging to /home/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-datanode-hadoop3.out
Starting secondary namenodes [hadoop1]
hadoop1: starting secondarynamenode, logging to /home/hadoop/hadoop-2.6.0/logs/hadoop-hadoop-secondarynamenode-hadoop1.out
15/06/17 08:31:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.6.0/logs/yarn-hadoop-resourcemanager-hadoop1.out
192.168.149.132: nodemanager running as process 24754. Stop it first.
192.168.149.131: nodemanager running as process 27169. Stop it first.
[hadoop@hadoop1 ~]$ jps
8922 NameNode
9242 ResourceManager
9498 Jps
9080 SecondaryNameNode
Process information on a DataNode node:
[hadoop@hadoop3 ~]$ jps
27460 Jps
27169 NodeManager
27329 DataNode
[hadoop@hadoop3 ~]$
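To confirm that HDFS and YARN are actually working, a quick, hedged check is to ask HDFS for a report and run one of the bundled example jobs (the jar path below is the standard location inside the 2.6.0 distribution directory used here):

[hadoop@hadoop1 ~]$ hdfs dfsadmin -report
[hadoop@hadoop1 ~]$ hadoop jar /home/hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10

The NameNode and ResourceManager web UIs (by default on ports 50070 and 8088) can also be opened in a browser.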
(7) Summary
The Hadoop 2.x series greatly improves on the shortcomings of Hadoop 1.x, with substantial changes to both HDFS and the MapReduce framework, and it integrates with currently popular big-data frameworks such as Spark.
(8) Correction
In Hadoop 2.x the ResourceManager process is meant to run on its own node, so after running start-yarn.sh there, the command to run on the NameNode node should not be start-all.sh, because start-all.sh also starts a ResourceManager process on the NameNode node. Instead, start-dfs.sh should be used; it starts the NameNode, SecondaryNameNode, and DataNode processes without starting a ResourceManager on the NameNode node.
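Putting the correction together, the startup sequence then looks roughly like this (host names as used in this cluster):

# On the ResourceManager node
[hadoop@hadoop2 ~]$ start-yarn.sh

# On the NameNode node
[hadoop@hadoop1 ~]$ start-dfs.sh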
Reference: http://www.bkjia.com/yjs/931164.html