5、Hadoop cluster installation (with HA for HDFS and the ResourceManager)
Note: this section only lists the Hadoop cluster configuration files. Hadoop environment variables are not covered here; you can set them yourself in /etc/profile.
5.1、core-site.xml configuration file
<configuration>
<property> <name>fs.defaultFS</name> <value>hdfs://hadoopCluster</value> </property>
<property> <name>hadoop.proxyuser.httpfs.hosts</name> <value>*</value> </property>
<property> <name>hadoop.proxyuser.httpfs.groups</name> <value>*</value> </property>
</configuration>
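Because fs.defaultFS points at the logical nameservice rather than a single host, clients address HDFS without naming a specific NameNode, and failover stays transparent to them. A quick check (once the cluster from the later sections is running):

```shell
# Clients talk to the nameservice "hadoopCluster", not to master or
# masterHA directly; the client library resolves the active NameNode.
hadoop fs -ls hdfs://hadoopCluster/

# Equivalent, since fs.defaultFS already names the nameservice:
hadoop fs -ls /
```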
5.2、hadoop-env.sh configuration file
export JAVA_HOME=/home/hadoop/cluster/jdk1.7.0_67
5.3、hdfs-site.xml configuration file
<configuration>
<property> <name>dfs.nameservices</name> <value>hadoopCluster</value> </property>
<property> <name>dfs.ha.namenodes.hadoopCluster</name> <value>master,masterHA</value> </property>
<property> <name>dfs.namenode.rpc-address.hadoopCluster.master</name> <value>master:8020</value> </property>
<property> <name>dfs.namenode.rpc-address.hadoopCluster.masterHA</name> <value>masterHA:8020</value> </property>
<property> <name>dfs.namenode.http-address.hadoopCluster.master</name> <value>master:50070</value> </property>
<property> <name>dfs.namenode.http-address.hadoopCluster.masterHA</name> <value>masterHA:50070</value> </property>
<property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://master:8485;masterHA:8485;node1:8485/hadoopCluster</value> </property>
<property> <name>dfs.journalnode.edits.dir</name> <value>/mnt/hgfs/datadir/</value> </property>
<property> <name>dfs.client.failover.proxy.provider.hadoopCluster</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property>
<property> <name>dfs.ha.fencing.methods</name> <value>sshfence</value> </property>
<property> <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/home/hadoop/.ssh/id_rsa</value> </property>
<property> <name>dfs.ha.automatic-failover.enabled</name> <value>true</value> </property>
<property> <name>ha.zookeeper.quorum</name> <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value> </property>
<property> <name>dfs.namenode.name.dir</name> <value>/home/hadoop/hadoop230chd501/tmp</value> </property>
<property> <name>dfs.datanode.data.dir</name> <value>/home/hadoop/hadoop230chd501/data</value> </property>
<property> <name>dfs.webhdfs.enabled</name> <value>true</value> </property>
<property> <name>dfs.permissions.superusergroup</name> <value>hadoop</value> </property>
</configuration>
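With automatic failover enabled above (dfs.ha.automatic-failover.enabled plus the ZKFC quorum), the HA state can be inspected from the command line once the daemons are up. The service IDs "master" and "masterHA" come from dfs.ha.namenodes.hadoopCluster:

```shell
# Report which NameNode is currently active; prints "active" or "standby".
hdfs haadmin -getServiceState master
hdfs haadmin -getServiceState masterHA

# Health-check a NameNode (exits non-zero if it is unhealthy).
hdfs haadmin -checkHealth master
```

Note that with automatic failover enabled, manual failover via `hdfs haadmin -failover` is refused unless forced; the ZKFCs own the transition.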
5.4、mapred-site.xml configuration file
<configuration>
<property> <name>mapreduce.framework.name</name> <value>yarn</value> </property>
</configuration>
5.5、yarn-site.xml configuration file
<configuration>
<!-- Site specific YARN configuration properties -->
<property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property>
<!-- <property> <name>yarn.resourcemanager.hostname</name> <value>master</value> </property> -->
<property> <name>yarn.resourcemanager.connect.retry-interval.ms</name> <value>2000</value> </property>
<property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property>
<property> <name>yarn.resourcemanager.ha.automatic-failover.enabled</name> <value>true</value> </property>
<property> <name>yarn.resourcemanager.ha.automatic-failover.embedded</name> <value>true</value> </property>
<property> <name>yarn.resourcemanager.cluster-id</name> <value>yarn-cluster</value> </property>
<property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>master,masterHA</value> </property>
<property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property>
<property> <name>yarn.resourcemanager.recovery.enabled</name> <value>true</value> </property>
<property> <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name> <value>5000</value> </property>
<property> <name>yarn.resourcemanager.store.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> </property>
<property> <name>yarn.resourcemanager.zk-address</name> <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value> </property>
<property> <name>yarn.resourcemanager.zk.state-store.address</name> <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value> </property>
<property> <name>yarn.resourcemanager.address.master</name> <value>master:23140</value> </property>
<property> <name>yarn.resourcemanager.address.masterHA</name> <value>masterHA:23140</value> </property>
<property> <name>yarn.resourcemanager.scheduler.address.master</name> <value>master:23130</value> </property>
<property> <name>yarn.resourcemanager.scheduler.address.masterHA</name> <value>masterHA:23130</value> </property>
<property> <name>yarn.resourcemanager.admin.address.master</name> <value>master:23141</value> </property>
<property> <name>yarn.resourcemanager.admin.address.masterHA</name> <value>masterHA:23141</value> </property>
<property> <name>yarn.resourcemanager.resource-tracker.address.master</name> <value>master:23125</value> </property>
<property> <name>yarn.resourcemanager.resource-tracker.address.masterHA</name> <value>masterHA:23125</value> </property>
<property> <name>yarn.resourcemanager.webapp.address.master</name> <value>master:23188</value> </property>
<property> <name>yarn.resourcemanager.webapp.address.masterHA</name> <value>masterHA:23188</value> </property>
<property> <name>yarn.resourcemanager.webapp.https.address.master</name> <value>master:23189</value> </property>
<property> <name>yarn.resourcemanager.webapp.https.address.masterHA</name> <value>masterHA:23189</value> </property>
<property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property>
<property> <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property>
</configuration>
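As with HDFS HA, the ResourceManager HA state can be checked once both RMs are running. The RM IDs "master" and "masterHA" come from yarn.resourcemanager.ha.rm-ids above:

```shell
# Report each ResourceManager's HA state; prints "active" or "standby".
yarn rmadmin -getServiceState master
yarn rmadmin -getServiceState masterHA
```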
5.6、slaves file
node1
node2
node3
5.7、Hadoop startup
Initialization and first start, in order:
1) Format the HA state in ZooKeeper: hdfs zkfc -formatZK
2) Start the JournalNodes (an odd number of them): hadoop-daemon.sh start journalnode
3) On the active master, switch to the hdfs user and format the NameNode: hadoop namenode -format
4) Initialize the shared edits directory: hdfs namenode -initializeSharedEdits
5) Start the NameNode: hadoop-daemon.sh start namenode
6) On the standby master: hdfs namenode -bootstrapStandby, then hadoop-daemon.sh start namenode
7) Start the DataNodes: hadoop-daemon.sh start datanode
8) Start the ZKFC: hadoop-daemon.sh start zkfc (on both masters)
9) Start YARN. On the active master: ./start-yarn.sh; on the standby: ./yarn-daemon.sh start resourcemanager
Subsequent cluster starts: ./start-all.sh, then on the masterHA machine run ./yarn-daemon.sh start resourcemanager
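After the steps above, a quick sanity check of which daemons came up on each host (process names per the configuration in this section; JournalNodes run on master, masterHA, and node1):

```shell
# On master / masterHA, jps should list:
#   NameNode, DFSZKFailoverController, ResourceManager, JournalNode
# On node1: DataNode, NodeManager, JournalNode
# On node2 / node3: DataNode, NodeManager
jps

# Cluster-wide HDFS summary: capacity, live DataNodes, under-replicated blocks.
hdfs dfsadmin -report
```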
5.8、Adding a DataNode
Add the new DataNode's hostname to the slaves file, then run ./hadoop-daemon.sh start datanode on the new node. Finally, run the balancer on the cluster.
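The balancing step can be run as follows; -threshold is the allowed deviation, in percent, of each DataNode's disk utilization from the cluster average (10 is the default):

```shell
# Move blocks onto the new DataNode until every node is within
# 10% of the cluster-average utilization.
start-balancer.sh -threshold 10
```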
5.9、Removing a DataNode
Environment: Hadoop 2.2 with NameNode HA and 3 DataNodes.
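The section stops at the environment description; the standard decommissioning procedure is sketched below. It assumes an exclude file is configured in hdfs-site.xml via dfs.hosts.exclude (the path /home/hadoop/conf/excludes is illustrative, not from this document):

```shell
# Assumed in hdfs-site.xml on both NameNodes (path is hypothetical):
#   <property> <name>dfs.hosts.exclude</name>
#              <value>/home/hadoop/conf/excludes</value> </property>

# 1) List the node to retire in the exclude file.
echo node3 >> /home/hadoop/conf/excludes

# 2) Tell the NameNodes to re-read the host lists; the node enters
#    the "Decommission In Progress" state while its blocks are
#    re-replicated elsewhere.
hdfs dfsadmin -refreshNodes

# 3) Wait until "hdfs dfsadmin -report" shows the node as
#    "Decommissioned", then stop its DataNode process and remove the
#    hostname from both the slaves file and the exclude file.
```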