Hostname | IP |
---|---|
bd-centos01 | 192.168.159.101 |
bd-centos02 | 192.168.159.102 |
bd-centos03 | 192.168.159.103 |
# vi /etc/sysconfig/network-scripts/ifcfg-ens33
Edit it as follows:
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPADDR=192.168.159.101
PREFIX=24
GATEWAY=192.168.159.2
DNS1=192.168.159.2
NAME=ens33
DEVICE=ens33
ONBOOT=yes
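After editing, restart the network service so the static address takes effect; a minimal sketch for CentOS 7 (repeat on each node with its own IPADDR from the table above):

```bash
# Restart the network service so the static IP takes effect (CentOS 7)
systemctl restart network
# Verify the address and the default route
ip addr show ens33
ip route
```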
# vi /etc/hostname
bd-centos01
# vi /etc/hosts
Append the following entries:
192.168.159.101 bd-centos01
192.168.159.102 bd-centos02
192.168.159.103 bd-centos03
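As a quick sanity check (assuming all three machines are already up), each hostname should now resolve and answer:

```bash
# Each hostname should resolve via /etc/hosts and reply to one ping
for h in bd-centos01 bd-centos02 bd-centos03; do
    ping -c 1 "$h"
done
```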
# useradd hadoop
# passwd hadoop
$ ssh-keygen -t rsa
Press Enter three times to accept the defaults (key location and empty passphrase).
Copy the public key into the authorized_keys list on every node:
$ ssh-copy-id bd-centos01
$ ssh-copy-id bd-centos02
$ ssh-copy-id bd-centos03
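To confirm passwordless login works, each command below should print the remote hostname without asking for a password; a small sketch:

```bash
# Should print each remote hostname without a password prompt
for h in bd-centos01 bd-centos02 bd-centos03; do
    ssh "$h" hostname
done
```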
Based on rsync, write a simple synchronization tool that pushes files from one server to the other nodes. As the hadoop user, create a file named xsync under ~/bin/ (which is automatically added to the user's PATH) and add the following script:
#!/bin/bash
# Cluster nodes
nodes=(bd-centos01 bd-centos02 bd-centos03)
# Paths to be synchronized
paths="$@"
if [ $# -lt 1 ]; then
    echo "Usage: xsync <path>..."
    exit 1
fi
# Iterate over the nodes
for host in "${nodes[@]}"; do
    echo "================ $host ==================="
    # Iterate over the given paths
    for path in $paths; do
        if [ -e "$path" ]; then
            pdir=$(cd -P "$(dirname "$path")"; pwd)
            fname=$(basename "$path")
            # Create the parent directory on the remote node, then sync
            ssh "$host" "mkdir -p $pdir"
            rsync -av "$pdir/$fname" "$host:$pdir"
        else
            echo "$path does not exist!"
        fi
    done
done
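Make the script executable; a simple smoke test is to sync ~/bin itself, so xsync also ends up on the other nodes:

```bash
chmod +x ~/bin/xsync
# Smoke test: distribute ~/bin (including xsync itself) to the other nodes
xsync ~/bin
```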
Create the /opt/modules directory, used for installing the big-data software, on all nodes:
# mkdir /opt/modules
Change its ownership to the hadoop user:
# chown hadoop:hadoop /opt/modules
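Since the directory must exist on every node, one possible shortcut is to run both commands remotely from node 1, assuming root can SSH to the other nodes (otherwise just run them by hand on each machine):

```bash
# Run as root on bd-centos01; assumes root SSH access to the other nodes
for h in bd-centos02 bd-centos03; do
    ssh root@"$h" "mkdir -p /opt/modules && chown hadoop:hadoop /opt/modules"
done
```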
[hadoop@bd-centos01 ~]$ tar -zxf software/jdk-8u212-linux-x64.tar.gz -C /opt/modules/
[hadoop@bd-centos01 ~]$ xsync /opt/modules/jdk1.8.0_212/
[root@bd-centos01 ~]# vi /etc/profile.d/path.sh
Add the following:
export JAVA_HOME=/opt/modules/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
Apply the configuration:
[hadoop@bd-centos01 ~]$ source /etc/profile.d/path.sh
Verify the installation:
java
javac
java -version
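Note that /etc/profile.d/path.sh so far exists only on node 1; the JDK was distributed with xsync, but xsync runs as hadoop and cannot write to /etc. One way to copy the file as root (a sketch):

```bash
# Run as root on bd-centos01; copies the environment file to the other nodes
for h in bd-centos02 bd-centos03; do
    scp /etc/profile.d/path.sh root@"$h":/etc/profile.d/
done
```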
Skipped. Since the cluster is simulated with virtual machines, extras such as time synchronization are not configured.
Stop and disable the firewall:
# systemctl stop firewalld
# systemctl disable firewalld.service
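The firewall has to be off on every node; a quick verification after running the two commands above:

```bash
systemctl is-active firewalld    # expect: inactive
systemctl is-enabled firewalld   # expect: disabled
```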
Component | bd-centos01 | bd-centos02 | bd-centos03 |
---|---|---|---|
HDFS | NameNode, DataNode, JobHistoryServer | DataNode | SecondaryNameNode, DataNode |
YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
[hadoop@bd-centos01 ~]$ tar -zxf software/hadoop-3.3.4.tar.gz -C /opt/modules/
The configuration files are under etc/hadoop/ in the Hadoop installation directory (/opt/modules/hadoop-3.3.4). Add the following content to each file.
core-site.xml:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://bd-centos01:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-3.3.4/data</value>
</property>
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
</property>
hdfs-site.xml:
<property>
    <name>dfs.namenode.http-address</name>
    <value>bd-centos01:9870</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>bd-centos03:9868</value>
</property>
yarn-site.xml:
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bd-centos02</value>
</property>
<property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://bd-centos01:19888/jobhistory/logs</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
mapred-site.xml:
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>bd-centos01:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bd-centos01:19888</value>
</property>
In the workers file (same directory), list all the nodes:
bd-centos01
bd-centos02
bd-centos03
Distribute the configuration to all nodes:
xsync /opt/modules/hadoop-3.3.4/etc/hadoop
Format the NameNode (first start only, on node 1):
[hadoop@bd-centos01 hadoop-3.3.4]$ bin/hdfs namenode -format
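Format only once. If a re-format ever becomes necessary (for example after a failed first start), first delete the data and logs directories on every node, or the DataNodes will refuse to join due to a stale clusterID; a cautionary sketch:

```bash
# DANGER: wipes all HDFS data; only for a re-format
for h in bd-centos01 bd-centos02 bd-centos03; do
    ssh "$h" "rm -rf /opt/modules/hadoop-3.3.4/data /opt/modules/hadoop-3.3.4/logs"
done
bin/hdfs namenode -format
```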
On node 1:
[hadoop@bd-centos01 hadoop-3.3.4]$ sbin/start-dfs.sh
On node 2:
[hadoop@bd-centos02 hadoop-3.3.4]$ sbin/start-yarn.sh
On node 1:
[hadoop@bd-centos01 hadoop-3.3.4]$ bin/mapred --daemon start historyserver
The processes on each node should match the plan above:
$ jps
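For example, on bd-centos01 the output should look roughly like this (PIDs are illustrative only):

```
$ jps
11201 NameNode
11345 DataNode
11788 NodeManager
12056 JobHistoryServer
12211 Jps
```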
Run a test job (the pi example: 100 map tasks, 1000 samples per map):
[hadoop@bd-centos01 hadoop-3.3.4]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 100 1000
View HDFS cluster information:
http://bd-centos01:9870/
View YARN cluster information and running jobs:
http://bd-centos02:8088/
View the JobHistory server:
http://bd-centos01:19888/
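To stop the cluster, reverse the start order; a sketch:

```bash
# On bd-centos01: stop the history server first
bin/mapred --daemon stop historyserver
# On bd-centos02: stop YARN
sbin/stop-yarn.sh
# On bd-centos01: stop HDFS last
sbin/stop-dfs.sh
```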