Hadoop Cluster Setup
First, disable SELinux:
vim /etc/selinux/config
Set SELINUX=disabled.
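The change in /etc/selinux/config only takes effect after a reboot; to turn SELinux off immediately for the current session as well, run:
setenforce 0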
Then stop and disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
1. On both the master and the slave machines, set the machine's name in /etc/hostname, then add the following mappings to /etc/hosts:
192.168.1.129 hadoop1
192.168.1.130 hadoop2
192.168.1.132 hadoop3
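For example, on 192.168.1.129 the /etc/hostname file contains just the line hadoop1 (hadoop2 and hadoop3 on the other two nodes, matching the mappings above). A quick sketch, run on each node with its own name:
echo hadoop1 > /etc/hostname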
2. Passwordless SSH login
On the master host (hadoop1), change to /root/.ssh:
ssh-keygen -t rsa
Press Enter at every prompt.
This generates id_rsa and id_rsa.pub.
cat id_rsa.pub >> master
This saves the public key into a file named master; send it to the slave machines:
scp master hadoop2:/root/.ssh/
Log in to each slave (hadoop2, hadoop3)
and append master to authorized_keys:
cat master >> authorized_keys
Do the same on every slave machine.
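To confirm that passwordless login works, this check from hadoop1 should print each remote hostname without prompting for a password:
ssh hadoop2 hostname
ssh hadoop3 hostname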
3. Configuration
Extract hadoop-2.6.0.tar.gz into /usr/lib/:
tar -zxvf hadoop-2.6.0.tar.gz -C /usr/lib/
cd /usr/lib/hadoop-2.6.0/etc/hadoop
This directory holds the configuration files; they are edited in section 5.1 below.
4. Install ZooKeeper
Configure the environment variables (in /etc/profile):
export JAVA_HOME=/usr/lib/jdk1.7.0_79
export MAVEN_HOME=/usr/lib/apache-maven-3.3.3
export LD_LIBRARY_PATH=/usr/lib/protobuf
export ANT_HOME=/usr/lib/apache-ant-1.9.4
export ZOOKEEPER_HOME=/usr/lib/zookeeper-3.4.6
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$LD_LIBRARY_PATH/bin:$ANT_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$ZOOKEEPER_HOME/lib
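Reload the profile so the variables take effect in the current shell:
source /etc/profile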
4.1 In zookeeper/conf/, copy zoo_sample.cfg to zoo.cfg:
cp zoo_sample.cfg zoo.cfg
Modify:
dataDir=/usr/lib/zookeeper-3.4.6/datas
Add:
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888
Create /usr/lib/zookeeper-3.4.6/datas, and inside it create a file named myid containing the number that matches that host's server.N entry (1 on hadoop1, 2 on hadoop2, 3 on hadoop3).
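A minimal sketch for hadoop1 (write 2 and 3 instead on hadoop2 and hadoop3):
mkdir -p /usr/lib/zookeeper-3.4.6/datas
echo 1 > /usr/lib/zookeeper-3.4.6/datas/myid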
Copy zookeeper-3.4.6 to hadoop2 and hadoop3, along with /etc/profile.
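For example, with scp (repeat for hadoop3, then fix each node's myid as described above):
scp -r /usr/lib/zookeeper-3.4.6 hadoop2:/usr/lib/
scp /etc/profile hadoop2:/etc/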
Run
On hadoop1, hadoop2, and hadoop3, execute:
zkServer.sh start
Check the status:
zkServer.sh status
Output containing Mode: leader or Mode: follower indicates the ensemble is running correctly.
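Typical output looks like the following (the Mode line differs per node):
JMX enabled by default
Using config: /usr/lib/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower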
5. Install Hadoop
On the master (hadoop1):
Extract the hadoop-2.6.0.tar.gz built earlier into /usr/lib/.
Configure the environment variables:
export JAVA_HOME=/usr/lib/jdk1.7.0_79
export MAVEN_HOME=/usr/lib/apache-maven-3.3.3
export LD_LIBRARY_PATH=/usr/lib/protobuf
export ANT_HOME=/usr/lib/apache-ant-1.9.4
export ZOOKEEPER_HOME=/usr/lib/zookeeper-3.4.6
export HADOOP_HOME=/usr/lib/hadoop-2.6.0
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$LD_LIBRARY_PATH/bin:$ANT_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
(hadoop2 and hadoop3 do not need Maven, Ant, and so on; those were only required for compiling Hadoop.)
5.1 Edit the configuration files
cd /usr/lib/hadoop-2.6.0/etc/hadoop
Configuration files to edit: hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, slaves. In each *-site.xml file, the <property> elements below go inside the existing <configuration> element.
5.1.1 hadoop-env.sh
export JAVA_HOME=/usr/lib/jdk1.7.0_79
5.1.2 core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cluster1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop-2.6.0/tmp</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>
5.1.3 hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>cluster1</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster1</name>
  <value>hadoop101,hadoop102</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster1.hadoop101</name>
  <value>hadoop1:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster1.hadoop101</name>
  <value>hadoop1:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster1.hadoop102</name>
  <value>hadoop2:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster1.hadoop102</name>
  <value>hadoop2:50070</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled.cluster1</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop2:8485;hadoop3:8485/cluster1</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/usr/lib/hadoop-2.6.0/tmp/journal</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
5.1.4 yarn-site.xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
5.1.5 mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
5.1.6 slaves
hadoop2
hadoop3
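Before the cluster is started, the configured hadoop-2.6.0 directory must also be present on hadoop2 and hadoop3; a minimal sketch with scp, same as for ZooKeeper (repeat for hadoop3):
scp -r /usr/lib/hadoop-2.6.0 hadoop2:/usr/lib/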
6. Starting the cluster:
6.1 Format the ZooKeeper cluster
On hadoop1, execute:
bin/hdfs zkfc -formatZK
6.2 Start the JournalNode cluster. On hadoop2 and hadoop3, execute:
sbin/hadoop-daemon.sh start journalnode
6.3 Format and start the NameNodes
On hadoop1, execute:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
On hadoop2, execute:
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
Start the DataNodes directly from hadoop1:
sbin/hadoop-daemons.sh start datanode
Start ZKFC; this process must run on every node that runs a NameNode.
On hadoop1 and hadoop2, execute:
sbin/hadoop-daemon.sh start zkfc
Start YARN (the ResourceManager and the NodeManagers); on hadoop1, execute:
sbin/start-yarn.sh
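As a sanity check, jps on each node should roughly show: on hadoop1 QuorumPeerMain, NameNode, DFSZKFailoverController, and ResourceManager; on hadoop2 QuorumPeerMain, NameNode, DFSZKFailoverController, JournalNode, DataNode, and NodeManager; on hadoop3 QuorumPeerMain, JournalNode, DataNode, and NodeManager:
jps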
Open in a browser:
http://192.168.1.129:50070
Overview 'hadoop1:9000' (active)
http://192.168.1.130:50070/
Overview 'hadoop2:9000' (standby)
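The active/standby state can also be queried from the command line, using the NameNode IDs defined in hdfs-site.xml:
bin/hdfs haadmin -getServiceState hadoop101
bin/hdfs haadmin -getServiceState hadoop102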
hadoop fs -ls /
lists the contents of the HDFS root directory.