You need five VMs or five physical servers with full network connectivity between them. They also need /etc/hosts entries, passwordless SSH between all hosts, and the helper scripts used below; for the details, see sections 1-5 of the earlier blog posts, plus the posts on writing those scripts. A minimal sketch of the host prep follows.
Once that is ready, we can begin.
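A rough sketch of the prep on each node, assuming the jerry1-jerry5 hostnames used throughout this post (the IPs are placeholders, substitute your own):
# /etc/hosts on every node
192.168.1.101 jerry1
192.168.1.102 jerry2
192.168.1.103 jerry3
192.168.1.104 jerry4
192.168.1.105 jerry5
# passwordless login from jerry1 to every node, itself included
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for h in jerry1 jerry2 jerry3 jerry4 jerry5; do ssh-copy-id root@$h; done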
cd /opt/install/
[root@jerry1 install]# ls
hadoop-2.6.0-cdh5.14.2.tar.gz kafka_2.11-2.0.0.tgz
jdk-8u111-linux-x64.tar.gz zookeeper-3.4.5-cdh5.14.2.tar.gz
[root@jerry1 install]# tar -zxf jdk-8u111-linux-x64.tar.gz -C /opt/bigdata/
[root@jerry1 install]# cd /opt/bigdata/
[root@jerry1 bigdata]# mv jdk1.8.0_111/ jdk180
xrsync jdk180
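xrsync is the distribution script written in the earlier post: it pushes a file or directory to the same path on every other node. Roughly (a sketch, not the exact script):
#!/bin/bash
# xrsync: replicate the given path to jerry2-jerry5 at the same location
target=$(readlink -f "$1")
for host in jerry2 jerry3 jerry4 jerry5; do
  rsync -av "$target" root@$host:"$(dirname "$target")/"
done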
1. Create the profile script
cd /etc/profile.d
touch env.sh
vi env.sh
2. Add the JDK settings
export JAVA_HOME=/opt/bigdata/jdk180
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
3. Push the env.sh profile script to every other host in the cluster
xrsync env.sh
4. source the file on every host so it takes effect; at that point the JDK is installed cluster-wide.
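A quick check from jerry1 (env.sh is sourced explicitly because a plain ssh command does not run the login profile):
for h in jerry1 jerry2 jerry3 jerry4 jerry5; do
  ssh $h 'source /etc/profile.d/env.sh; java -version' 2>&1 | head -n 1
done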
[root@jerry1 install]# tar -zxf hadoop-2.6.0-cdh5.14.2.tar.gz -C /opt/bigdata/
[root@jerry1 install]# mv /opt/bigdata/hadoop-2.6.0-cdh5.14.2/ /opt/bigdata/hadoop260
Go into Hadoop's etc/hadoop directory and edit the following files:
cd /opt/bigdata/hadoop260/etc/hadoop
vi hadoop-env.sh
vi mapred-env.sh
vi yarn-env.sh
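In all three files the usual (and here assumed) edit is the same: hard-code JAVA_HOME so the daemons do not depend on the caller's environment:
export JAVA_HOME=/opt/bigdata/jdk180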
Configure core-site.xml, which tells clients where the namenode (here, the HA nameservice) lives; add the new content inside the configuration element.
vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/bigdata/hadoop260/hadoopdata</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>jerry1:2181,jerry2:2181,jerry3:2181,jerry4:2181,jerry5:2181</value>
  </property>
</configuration>
Configure hdfs-site.xml, which sets up the two namenodes and failover between them.
vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>jerry1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>jerry2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>jerry1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>jerry2:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jerry3:8485;jerry4:8485;jerry5:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/bigdata/hadoop260/ha/journalnode/edits</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Configure mapred-site.xml, which sets the MapReduce framework and the job-history (monitoring) node.
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>jerry1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>jerry1:19888</value>
  </property>
</configuration>
Configure yarn-site.xml, which sets up the two ResourceManager nodes and failover between them.
vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- how reducers fetch data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- the YARN ResourceManager HA addresses -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>jerry3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>jerry4</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>jerry3:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>jerry4:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>jerry1:2181,jerry2:2181,jerry3:2181,jerry4:2181,jerry5:2181</value>
  </property>
  <!-- enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- keep aggregated logs for 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
Configure slaves, which lists the nodes the datanodes run on:
vi slaves
jerry3
jerry4
jerry5
Append the Hadoop variables to env.sh:
vi /etc/profile.d/env.sh
export HADOOP_HOME=/opt/bigdata/hadoop260
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
xrsync /etc/profile.d/env.sh
source /etc/profile.d/env.sh
Create the data directories referenced in core-site.xml and hdfs-site.xml, then push the whole Hadoop install to the other nodes:
cd /opt/bigdata/hadoop260
mkdir hadoopdata
mkdir -p ha/journalnode/edits
xrsync /opt/bigdata/hadoop260
tar -zxf zookeeper-3.4.5-cdh5.14.2.tar.gz -C /opt/bigdata/
cd /opt/bigdata
mv zookeeper-3.4.5-cdh5.14.2/ zk345
cd zk345
mkdir zkData
cd ./zkData
touch myid
vi myid
1
cd ../conf
cp zoo_sample.cfg zoo.cfg
In zoo.cfg, point dataDir at the zkData directory created above (the sample config points at /tmp, so ZooKeeper would not find myid otherwise) and list all five servers; note the ports are colon-separated:
vi zoo.cfg
dataDir=/opt/bigdata/zk345/zkData
server.1=jerry1:2287:3387
server.2=jerry2:2287:3387
server.3=jerry3:2287:3387
server.4=jerry4:2287:3387
server.5=jerry5:2287:3387
Push ZooKeeper to the other hosts, then give each host its own number in myid:
xrsync /opt/bigdata/zk345
vi myid
jerry2: 2
jerry3: 3
jerry4: 4
jerry5: 5
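Editing myid by hand on four machines works, but a loop from jerry1 does the same thing:
for i in 2 3 4 5; do ssh jerry$i "echo $i > /opt/bigdata/zk345/zkData/myid"; done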
vi /etc/profile.d/env.sh
export ZOOKEEPER_HOME=/opt/bigdata/zk345
export PATH=$PATH:$ZOOKEEPER_HOME/bin
xrsync /etc/profile.d/env.sh
source /etc/profile.d/env.sh
Start ZooKeeper everywhere with the script written in the earlier blog post:
zkop.sh start
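zkop.sh comes from that earlier post; roughly, it just runs zkServer.sh on every node over ssh. A sketch, assuming the paths from this install:
#!/bin/bash
# zkop.sh: run zkServer.sh <start|stop|status> on all five nodes
for host in jerry1 jerry2 jerry3 jerry4 jerry5; do
  ssh $host "source /etc/profile.d/env.sh; zkServer.sh $1"
done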
Start the journalnodes on jerry3-jerry5 (they must be up before the namenode format), then format the ZKFC state in ZooKeeper and the namenode itself on jerry1:
jerry3: hadoop-daemon.sh start journalnode
jerry4: hadoop-daemon.sh start journalnode
jerry5: hadoop-daemon.sh start journalnode
hdfs zkfc -formatZK
hadoop namenode -format
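Before the full start, the standby namenode needs a copy of the freshly formatted metadata. The stock way to do this in an HA setup (standard Hadoop commands, assumed here) is to start the namenode on jerry1 and bootstrap jerry2 from it:
jerry1: hadoop-daemon.sh start namenode
jerry2: hdfs namenode -bootstrapStandby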
jerry1: start-all.sh
Since jerry3 and jerry4 host the ResourceManagers, start those daemons on each:
jerry3: yarn-daemon.sh start resourcemanager
jerry4: yarn-daemon.sh start resourcemanager
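A jps sweep now shows whether everything came up; expect roughly NameNode and DFSZKFailoverController on jerry1/jerry2, ResourceManager on jerry3/jerry4, JournalNode, DataNode, and NodeManager on jerry3-jerry5, and QuorumPeerMain everywhere:
for h in jerry1 jerry2 jerry3 jerry4 jerry5; do echo "== $h =="; ssh $h jps; done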
tar -zxf /opt/install/kafka_2.11-2.0.0.tgz -C /opt/bigdata/
mv /opt/bigdata/kafka_2.11-2.0.0/ /opt/bigdata/kafka211
cd /opt/bigdata/kafka211
mkdir logs
In config/server.properties, set the broker id, allow topic deletion, and point the log directory at the folder just created:
vi config/server.properties
broker.id=1
delete.topic.enable=true
log.dirs=/opt/bigdata/kafka211/logs
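The broker also has to know where ZooKeeper lives; assuming the same quorum as everywhere above, add to server.properties:
zookeeper.connect=jerry1:2181,jerry2:2181,jerry3:2181,jerry4:2181,jerry5:2181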
Add Kafka to env.sh (the exact export lines are assumed here, following the same pattern as the earlier ones), push it out, and distribute Kafka itself:
vi /etc/profile.d/env.sh
export KAFKA_HOME=/opt/bigdata/kafka211
export PATH=$PATH:$KAFKA_HOME/bin
xrsync /etc/profile.d/env.sh
xrsync /opt/bigdata/kafka211
Because the sync copies broker.id=1 everywhere, set broker.id to a unique value (2 through 5) in server.properties on jerry2-jerry5, then source env.sh on all five hosts:
source /etc/profile.d/env.sh   # run on all 5 hosts
Finally, start Kafka everywhere with the kfkop.sh script from the earlier post:
kfkop.sh start
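A quick smoke test once the brokers are up (the topic name is arbitrary; in Kafka 2.0 the topic tool still goes through ZooKeeper):
kafka-topics.sh --zookeeper jerry1:2181 --create --topic test --partitions 3 --replication-factor 2
kafka-topics.sh --zookeeper jerry1:2181 --list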