Installing the Hadoop Cluster
Unpack the Hadoop distribution on the master1 node (into /usr/local, so that the paths used below resolve)
tar zxvf ./hadoop-2.6.5.tar.gz
Edit the hadoop-env.sh file
cd /usr/local/hadoop-2.6.5/etc/hadoop
sudo nano hadoop-env.sh
Add JAVA_HOME:
export JAVA_HOME=/usr/local/jdk1.7.0_80
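A quick sanity check that the path is right before moving on (the path below is the one just set in hadoop-env.sh):
/usr/local/jdk1.7.0_80/bin/java -version
# should report java version "1.7.0_80" if the JDK is installed where hadoop-env.sh expects it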
Configure core-site.xml
sudo nano core-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop-2.6.5/data/tmp</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
Configure hdfs-site.xml
sudo nano hdfs-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/usr/local/hadoop-2.6.5/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/usr/local/hadoop-2.6.5/data/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>master1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>master2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://slave1:8485;slave2:8485;slave3:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/usr/local/hadoop-2.6.5/data/journal</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop-sna/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
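The sshfence fencing method only works if the two NameNode hosts can SSH to each other without a password using the key configured above. A quick check from master1 (assuming the hadoop-sna user owns the key listed in dfs.ha.fencing.ssh.private-key-files):
ssh -i /home/hadoop-sna/.ssh/id_rsa hadoop-sna@master2 hostname
# should print master2 without asking for a password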
Configure mapred-site.xml
sudo nano mapred-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master1:19888</value>
</property>
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>9</value>
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value>
</property>
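With uber mode enabled as above, jobs small enough to fit the limits (here at most 9 map tasks and 1 reduce task) run inside the MRAppMaster's own JVM instead of requesting separate containers, which reduces startup overhead for small jobs.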
Configure yarn-site.xml
sudo nano yarn-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.web-proxy.address</name>
  <value>master2:8888</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/logs</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarncluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
  <value>/rmstore</value>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
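One thing worth noting in this file: because yarn.web-proxy.address is set (master2:8888), the YARN web application proxy runs as a standalone daemon instead of inside the ResourceManager, so it has to be started separately once the cluster is up, e.g.:
yarn-daemon.sh start proxyserver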
Configure the slaves file
sudo nano slaves
Add the following:
slave1
slave2
slave3
Create the directories referenced in the configuration files
cd /usr/local/hadoop-2.6.5/
mkdir -p data/tmp
mkdir -p data/journal
mkdir -p data/namenode
mkdir -p data/datanode
Sync the Hadoop working directory to the other cluster nodes
scp -r /usr/local/hadoop-2.6.5/ master2:/usr/local/
scp -r /usr/local/hadoop-2.6.5/ slave1:/usr/local/
scp -r /usr/local/hadoop-2.6.5/ slave2:/usr/local/
scp -r /usr/local/hadoop-2.6.5/ slave3:/usr/local/
Edit the environment variables on every cluster node
sudo nano /etc/profile
Add the following lines:
export HADOOP_HOME=/usr/local/hadoop-2.6.5
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the updated environment variables
source /etc/profile
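To confirm the new variables are in effect, check that the hadoop command resolves from the updated PATH:
which hadoop
hadoop version
# should point into /usr/local/hadoop-2.6.5 and report Hadoop 2.6.5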
For convenience, it is also a good idea to change the ownership of the hadoop-2.6.5 directory (run from /usr/local)
sudo chown -R hadoop-sna hadoop-2.6.5
sudo chgrp -R hadoop-sna hadoop-2.6.5
With that, the Hadoop cluster installation is complete; next comes starting the cluster services.
Initializing the Hadoop Cluster
Start the ZooKeeper ensemble (run on slave1, slave2, and slave3)
zkServer.sh start
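Once all three are up, each node should report its role in the ensemble:
zkServer.sh status
# Mode: leader on one node, Mode: follower on the other two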
Format the ZKFC znode (run on master1)
hdfs zkfc -formatZK
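This creates the znode under which the failover controllers coordinate. One way to confirm it worked, from any ZooKeeper node (assuming the standard zkCli.sh client):
zkCli.sh -server slave1:2181 ls /hadoop-ha
# should list [mycluster]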
Start the JournalNodes (run on slave1, slave2, and slave3)
hadoop-daemon.sh start journalnode
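A quick check with jps on each of the three nodes should now show a JournalNode process (alongside ZooKeeper's QuorumPeerMain):
jps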
Format HDFS (run on master1)
hdfs namenode -format
Copy the NameNode metadata directory produced by the format from master1's Hadoop working directory to master2
scp -r /usr/local/hadoop-2.6.5/data/namenode/* master2:/usr/local/hadoop-2.6.5/data/namenode/
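This gives the standby NameNode (nn2) the same initial fsimage as nn1. As an alternative to the manual scp, Hadoop provides hdfs namenode -bootstrapStandby, run on master2 once nn1 has been started; the copy above achieves the same result without starting nn1 first.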
After initialization, the JournalNodes can be stopped (run on slave1, slave2, and slave3)
hadoop-daemon.sh stop journalnode
Problems encountered:
- connection refused during hdfs namenode -format
Cause: the connection to the datanode machines was refused because the JournalNodes had not started. Check hadoop-hadoop-journalnode-data.out under the logs directory of the Hadoop folder on the datanode; it records why the JournalNode failed to start. In my case, the process lacked write permission.
Solution:
sudo chown -R hadoop-sna hadoop-2.6.5
sudo chgrp -R hadoop-sna hadoop-2.6.5