Hostname | IP | Software | Processes
---|---|---|---
master | 192.168.1.115 | JDK, Hadoop | NameNode, DFSZKFailoverController (zkfc)
slave1 | 192.168.1.116 | JDK, Hadoop | NameNode, DFSZKFailoverController (zkfc)
slave2 | 192.168.1.117 | JDK, Hadoop | ResourceManager
slave3 | 192.168.1.118 | JDK, Hadoop | ResourceManager
slave4 | 192.168.1.119 | JDK, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
slave5 | 192.168.1.120 | JDK, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
slave6 | 192.168.1.121 | JDK, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain
Create the data directories. For convenience, run the following commands on every node (some of the directories will go unused on certain nodes because of their different roles):
mkdir -p /home/qun/data/hadoop-2.8/name
mkdir -p /home/qun/data/hadoop-2.8/data
mkdir -p /home/qun/data/hadoop-2.8/tmp
mkdir -p /home/qun/data/hadoop-2.8/namesecondary
mkdir -p /home/qun/data/hadoop-2.8/journal
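If passwordless SSH is already in place (it is configured later in this article), the same directories can also be created from a single machine. A small sketch, assuming the user qun exists on every host:
# create the data directories on every node over SSH
for host in master slave1 slave2 slave3 slave4 slave5 slave6; do
  for d in name data tmp namesecondary journal; do
    ssh qun@"$host" "mkdir -p /home/qun/data/hadoop-2.8/$d"
  done
done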
On slave4, slave5 and slave6, edit $ZOOKEEPER_HOME/conf/zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
#autopurge.snapRetainCount=3
#autopurge.purgeInterval=1
server.1=slave4:2888:3888
server.2=slave5:2888:3888
server.3=slave6:2888:3888
Set the ZooKeeper server id on slave4, slave5 and slave6 respectively (the id must match the server.N entry for that host in zoo.cfg):
#slave4
echo 1 > /tmp/zookeeper/myid
#slave5
echo 2 > /tmp/zookeeper/myid
#slave6
echo 3 > /tmp/zookeeper/myid
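As a sketch, the three myid files can also be written from one machine over SSH (assuming passwordless SSH and the dataDir configured above):
id=1
for host in slave4 slave5 slave6; do
  # the value written here must equal N in the matching server.N line of zoo.cfg
  ssh qun@"$host" "mkdir -p /tmp/zookeeper && echo $id > /tmp/zookeeper/myid"
  id=$((id+1))
done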
Start ZooKeeper on each of the three nodes and check its status:
./bin/zkServer.sh start conf/zoo.cfg &
[qun@slave5 zookeeper-3.4.6]$ ./bin/zkServer.sh status
JMX enabled by default
Using config: /home/qun/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
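One of the three servers should report Mode: leader and the other two Mode: follower. As an optional sanity check (assuming nc is installed), ZooKeeper's four-letter-word command ruok should be answered with imok by every server:
for host in slave4 slave5 slave6; do
  echo "$host: $(echo ruok | nc "$host" 2181)"   # each server should answer imok
done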
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/qun/data/hadoop-2.8/tmp</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/home/qun/data/hadoop-2.8/namesecondary</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>slave4:2181,slave5:2181,slave6:2181</value>
  </property>
</configuration>
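After editing core-site.xml, a quick way to confirm that Hadoop actually picks up the values is hdfs getconf, run on any node that already has the new file:
hdfs getconf -confKey fs.defaultFS            # should print hdfs://ns1
hdfs getconf -confKey ha.zookeeper.quorum     # should print slave4:2181,slave5:2181,slave6:2181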
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/qun/data/hadoop-2.8/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/qun/data/hadoop-2.8/data</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>slave1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>slave1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave4:8485;slave5:8485;slave6:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/qun/data/hadoop-2.8/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/qun/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
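A note on the fencing settings above: sshfence makes the ZKFC on the surviving NameNode host log in to the failed NameNode host using the private key named in dfs.ha.fencing.ssh.private-key-files, so that key must exist on both NameNode hosts and be readable only by its owner; shell(/bin/true) is a fallback so failover is not blocked when the dead host cannot be reached over SSH. A quick check on master and slave1:
chmod 600 /home/qun/.ssh/id_rsa
ls -l /home/qun/.ssh/id_rsa    # owner-only permissions, e.g. -rw-------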
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>slave2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>slave4:2181,slave5:2181,slave6:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
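Note: the Hadoop 2.x tarball ships only mapred-site.xml.template in etc/hadoop, so this file usually has to be created first (run from the Hadoop installation directory):
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml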
slaves
slave4
slave5
slave6
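Once the files above are edited, the whole etc/hadoop directory must be identical on every node. A sketch of pushing it out from master (the install path /home/qun/hadoop-2.8.3 is an assumption; adjust it to your layout):
for host in slave1 slave2 slave3 slave4 slave5 slave6; do
  rsync -av /home/qun/hadoop-2.8.3/etc/hadoop/ qun@"$host":/home/qun/hadoop-2.8.3/etc/hadoop/
done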
First, the two NameNode hosts must be able to log in to each other over SSH without a password, and both of them also need passwordless SSH to the DataNodes.
On master, generate a key pair; just press Enter through all the prompts:
ssh-keygen -t rsa
Copy the public key to the other nodes, including master itself:
ssh-copy-id -i qun@master
ssh-copy-id -i qun@slave1
ssh-copy-id -i qun@slave2
ssh-copy-id -i qun@slave3
ssh-copy-id -i qun@slave4
ssh-copy-id -i qun@slave5
ssh-copy-id -i qun@slave6
On slave1, generate a key pair in the same way; just press Enter through all the prompts:
ssh-keygen -t rsa
Copy the public key to the other nodes, including slave1 itself:
ssh-copy-id -i qun@slave1
ssh-copy-id -i qun@master
ssh-copy-id -i qun@slave4
ssh-copy-id -i qun@slave5
ssh-copy-id -i qun@slave6
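To verify that the passwordless logins actually work, each command below should print the remote hostname without asking for a password (a sketch; this host list works from both master and slave1, and on master you can also add slave2 and slave3):
for host in master slave1 slave4 slave5 slave6; do
  ssh -o BatchMode=yes qun@"$host" hostname
done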
Format the HA state znode in ZooKeeper (run once, on one of the NameNode hosts, e.g. master):
hdfs zkfc -formatZK
Start the JournalNodes on slave4, slave5 and slave6:
hadoop-daemon.sh start journalnode
Format HDFS on master and start the first NameNode:
hdfs namenode -format
hadoop-daemon.sh start namenode
On slave1, pull the newly formatted metadata from the running NameNode and start the second NameNode:
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
Start all DataNodes by running the following on master (some processes, such as the NameNodes and JournalNodes, are already running from the steps above):
start-dfs.sh
[qun@master hadoop]$ hdfs haadmin -getServiceState nn1
active
[qun@master hadoop]$ hdfs haadmin -getServiceState nn2
standby
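The same information is visible on the NameNode web UIs, whose overview pages show the HA state. A quick reachability check:
curl -sI http://master:50070/ | head -n 1
curl -sI http://slave1:50070/ | head -n 1   # both UIs respond no matter which NameNode is active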
Start YARN on slave2:
start-yarn.sh
Start the standby ResourceManager on slave3:
yarn-daemon.sh start resourcemanager
[qun@slave3 hadoop]$ yarn rmadmin -getServiceState rm1
active
[qun@slave3 hadoop]$ yarn rmadmin -getServiceState rm2
standby
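The ResourceManager web UIs behave similarly; the standby RM's UI redirects to the active one, so both addresses should respond:
curl -sI http://slave2:8088/cluster | head -n 1
curl -sI http://slave3:8088/cluster | head -n 1   # the standby RM replies with a redirect to the active RM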
Test HDFS failover: upload a file, kill the active NameNode process on master, then access HDFS again; the NameNode on slave1 should become active. When the NameNode on master is started again, it comes back in standby state.
Upload a file:
[qun@master hadoop]$ hadoop fs -put core-site.xml /
[qun@master hadoop]$ hadoop fs -ls /
Found 1 items
-rw-r--r-- 2 qun supergroup 1348 2018-08-06 22:00 /core-site.xml
[qun@master hadoop]$ hadoop fs -ls hdfs://ns1/
Found 1 items
-rw-r--r-- 2 qun supergroup 1348 2018-08-06 22:00 hdfs://ns1/core-site.xml
Kill the NameNode on master:
[qun@master hadoop]$ jps
5507 Jps
4663 NameNode
5149 DFSZKFailoverController
[qun@master hadoop]$ kill -9 4663
Access HDFS again:
[qun@master hadoop]$ hadoop fs -ls /
18/08/06 22:06:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/06 22:06:17 WARN ipc.Client: Failed to connect to server: master/192.168.1.115:9000: try once and fail.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
at org.apache.hadoop.ipc.Client.call(Client.java:1345)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1717)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1434)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1434)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
at org.apache.hadoop.fs.Globber.doGlob(Globber.java:282)
at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1686)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:245)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:228)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:378)
Found 1 items
-rw-r--r-- 2 qun supergroup 1348 2018-08-06 22:00 /core-site.xml
As you can see, the client first tries the NameNode on master by default, fails, and only then falls back to the NameNode on slave1. The warning looks a bit alarming, but the access does succeed in the end. I don't know how to avoid that first failed attempt; if you do, please let me know, thanks!
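One possible mitigation (untested here, and only applicable if your Hadoop build ships the class, which appeared around the 2.8/2.9 releases) is to switch the client-side failover proxy provider to RequestHedgingProxyProvider, which probes both NameNodes concurrently instead of always trying nn1 first:
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider</value>
</property>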
Next, run a WordCount job to test YARN:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /input /output2
Since the ResourceManager is also highly available, you can kill one of the two ResourceManagers and rerun the WordCount job to verify that it still completes; I won't go through it step by step here, but a rough sketch follows.
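A minimal sketch of that test (the pid and the output path /output3 are placeholders):
# on slave2: find and kill the active ResourceManager (rm1 in the output above)
jps | grep ResourceManager
kill -9 <RM_PID>                      # replace with the pid printed by jps
# rm2 on slave3 should take over
yarn rmadmin -getServiceState rm2     # expect: active
# the job should still run to completion
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /input /output3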
References: