Chapter 1: Preparation
1. SSH mutual-trust configuration
2. ZooKeeper configuration
3. Hadoop configuration
4. Flink configuration
Chapter 2: Procedure
I. Preparation
1. Define the hosts master, slave1, and slave2. Edit /etc/hosts on every machine and add the following entries:
192.168.100.141 master
192.168.100.142 slave1
192.168.100.143 slave2
2. Set up passwordless SSH trust from master to slave1 and slave2:
2.1 On master, run: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
2.2 On master, run each of the following:
ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
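The trust setup in step 2.2 is the same command once per node, so it can be driven by a host list. A minimal sketch, assuming the key pair from step 2.1 already exists; the commands are written to a review file rather than executed here, since each target must be reachable and will prompt for its password once:

```shell
# Build the list of ssh-copy-id invocations for step 2.2.
# sshcopy.txt is a scratch file for review; run its lines on master.
HOSTS="master slave1 slave2"
: > sshcopy.txt
for h in $HOSTS; do
  echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $h" >> sshcopy.txt
done
cat sshcopy.txt
```

After running the real commands, `ssh slave1` from master should log in without a password prompt.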
II. ZooKeeper setup
1. On each of the three machines, create a data folder under zookeeper-3.4.0.
2. In the data folder from step 1, create a file named myid containing 0, 1, or 2 (one distinct id per machine).
3. Configure zoo.cfg:
3.1 dataDir=/perfma/zookeeper-3.4.0/data
3.2 Append the following lines at the bottom of the file; the x in server.x must match the value in that machine's data/myid:
server.0=192.168.100.141:2888:3888
server.1=192.168.100.142:2888:3888
server.2=192.168.100.143:2888:3888
4. Copy the ZooKeeper folder to the other two servers, remembering to change the content of myid on each.
5. On each machine, run zkServer.sh start under zookeeper/bin/ to start that node's ZooKeeper process.
6. Check with zkServer.sh status; a healthy ensemble reports one node as leader and the others as follower.
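The myid bookkeeping in steps 2 and 4 can be sketched as follows. This is a local illustration that lays out all three data directories under a scratch folder; on a real cluster each machine holds only its own data/myid, with the id matching its server.x line in zoo.cfg:

```shell
# Scratch layout: one data/myid per node, ids 0..2 in host order.
BASE=./zkdemo
i=0
for h in master slave1 slave2; do
  mkdir -p "$BASE/$h/data"
  echo "$i" > "$BASE/$h/data/myid"
  i=$((i+1))
done
cat "$BASE/slave2/data/myid"   # → 2
```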
III. Hadoop setup
1. Edit ~/.bash_profile, add the following lines, then run source ~/.bash_profile:
export HADOOP_HOME=${your Hadoop install path}
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
2. Configure on the master machine (under hadoop-2.9.2/etc/hadoop):
2.1 At the bottom of hadoop-env.sh, add:
export JAVA_HOME=${your JAVA_HOME path}
2.2 Edit core-site.xml, add the configuration below, and create the directory pointed to by hadoop.tmp.dir:
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/home/admin/perfma/hadoop-2.9.2/hadoop/tmp/dir</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
2.3 Edit hdfs-site.xml, add the configuration below, create the directories for dfs.namenode.name.dir and dfs.datanode.data.dir, and set dfs.block.size to 1048576:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/admin/perfma/hadoop-2.9.2/dfs/namenode/name/dir</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/admin/perfma/hadoop-2.9.2/dfs/datanode/name/dir</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master:9001</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.block.size</name>
  <value>1048576</value>
</property>
2.4 Edit yarn-site.xml and add the following configuration (in Hadoop 2.x the shuffle service name is mapreduce_shuffle, with an underscore, and yarn.nodemanager.aux-services must declare it for the class property to take effect):
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:18040</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:18030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:18088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:18025</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:18141</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
2.5 Copy mapred-site.xml.template to mapred-site.xml and add the following configuration:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
2.6 Edit the slaves file and add slave1 and slave2, as follows:
slave1
slave2
3. Format the NameNode.
3.1 On the master node, run ./hdfs namenode -format under hadoop-2.9.2/bin/.
3.2 Check the output; it should contain a line like: common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted.
4. Copy the hadoop folder from master to the same path on slave1 and slave2.
5. Start the cluster: run sh start-all.sh under hadoop-2.9.2/sbin/.
6. Check: open master:50070 in a browser; the overview page should show two live DataNodes.
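The step-3.2 check can be scripted by grepping the format output for its success marker instead of eyeballing it. A minimal sketch using a canned log line (the path is the illustrative one from step 3.2; in practice, redirect the real `hdfs namenode -format` output into format.log):

```shell
# Simulate a captured NameNode format log, then grep for the success marker.
echo "INFO common.Storage: Storage directory /opt/hadoop/hdfs/name has been successfully formatted." > format.log
if grep -q 'successfully formatted' format.log; then
  echo "namenode format OK"
else
  echo "namenode format FAILED"
fi
```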
IV. Flink setup
1. Preparation: in HDFS, create the directories /flink, /flink/check-point, /flink/save-point, and /flink/ha, e.g. ./hdfs dfs -mkdir -p /flink/ha (and likewise for the others; -p also creates the parent /flink).
2. Configuration: in the flink-1.8.0/conf folder, set the following in flink-conf.yaml:
jobmanager.rpc.address: master
high-availability: zookeeper
high-availability.storageDir: hdfs://master:9000/flink/ha/
high-availability.zookeeper.quorum: master:2181,slave1:2181,slave2:2181
state.checkpoints.dir: hdfs://master:9000/flink/check-point
state.savepoints.dir: hdfs://master:9000/flink/save-point
3. Create the folder configured for io.tmp.dirs: /home/admin/perfma/flink/tmp.
4. Configure the slaves file:
master
slave1
slave2
5. Configure the masters file:
master:8084
slave1:8084
6. Copy the Flink installation folder to slave1 and slave2.
7. Start: run ./start-cluster.sh under flink-1.8.0/bin.
8. Check failover: submit a job with run-job.sh, then kill the Flink JobManager process on master and verify that the node on slave1 takes over the job.
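The HDFS layout from step 1 can be illustrated locally. This sketch mimics the directory tree with plain mkdir -p under a scratch root; on the real cluster the same paths are created with `hdfs dfs -mkdir -p` against hdfs://master:9000, matching the state.checkpoints.dir, state.savepoints.dir, and high-availability.storageDir entries in flink-conf.yaml:

```shell
# Scratch stand-in for the HDFS root; -p creates /flink implicitly.
ROOT=./hdfs-sim
mkdir -p "$ROOT/flink/check-point" "$ROOT/flink/save-point" "$ROOT/flink/ha"
ls "$ROOT/flink"   # → check-point  ha  save-point
```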