VM name: hadoop01
Hostname: hadoop01
User name: zzk
Password: hadoop123456
Grant the zzk user sudo privileges:
sudo su    # switch to the root user
vim /etc/sudoers
Add the following line:
zzk ALL=(ALL) NOPASSWD: ALL
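A broken /etc/sudoers can lock every user out of sudo, so it may be worth confirming the entry before relying on it. The sketch below only greps a file for the rule; the function name has_nopasswd_rule and the scratch file are illustrative assumptions, not part of the original notes. On a real system, visudo -c additionally checks the file's syntax.

```shell
#!/bin/bash
# has_nopasswd_rule FILE USER
# Succeeds when FILE holds an uncommented "USER ALL=(ALL) NOPASSWD: ALL" rule.
has_nopasswd_rule() {
    local file=$1 user=$2
    grep -Eq "^[[:space:]]*${user}[[:space:]]+ALL=\(ALL\)[[:space:]]+NOPASSWD:[[:space:]]*ALL[[:space:]]*$" "$file"
}

# Demo on a scratch file rather than the real /etc/sudoers
demo=$(mktemp)
printf '%s\n' 'root ALL=(ALL) ALL' 'zzk ALL=(ALL) NOPASSWD: ALL' > "$demo"
if has_nopasswd_rule "$demo" zzk; then
    echo "zzk can run sudo without a password"
fi
```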
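①After cloning, name the new VMs hadoop02 and hadoop03
②Under network settings -> Advanced, generate a new MAC address so each clone gets its own
③Change the hostname:
# Set the hostname permanently; the change survives a reboot.
hostnamectl set-hostname xxx
# Clear the hostname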
hostnamectl set-hostname ""
hostnamectl set-hostname "" --static
hostnamectl set-hostname "" --pretty
vim /etc/sysconfig/network-scripts/ifcfg-ens32
BOOTPROTO="static"
# each host needs a different IP address
IPADDR="192.168.12.129"
NETMASK="255.255.255.0"
GATEWAY="192.168.12.2"
DNS1="8.8.8.8"
ONBOOT="yes"
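The three nodes differ only in IPADDR, so the static settings can be generated instead of retyped. This is a minimal sketch: render_ifcfg is a hypothetical helper, and the netmask, gateway, and DNS values are simply the ones from the notes above.

```shell
#!/bin/bash
# render_ifcfg IPADDR
# Prints the static-IP settings from the notes with the given address filled in.
render_ifcfg() {
    local ip=$1
    cat <<EOF
BOOTPROTO="static"
IPADDR="$ip"
NETMASK="255.255.255.0"
GATEWAY="192.168.12.2"
DNS1="8.8.8.8"
ONBOOT="yes"
EOF
}

# Example: hadoop01 uses the address given in the notes
render_ifcfg 192.168.12.129
```

The output is meant to be merged by hand into /etc/sysconfig/network-scripts/ifcfg-ens32 on each node; the sketch does not write to that file itself.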
①Check the firewall status
firewall-cmd --state
②Stop the firewall
systemctl stop firewalld.service
③Disable it at boot
systemctl disable firewalld.service
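①Install ssh
yum install -y openssh-clients
②Map hostnames to IP addresses so the VMs can be reached by hostname; add one such line per node
vim /etc/hosts
192.168.12.129 hadoop01
③Configure passwordless login
1. cd to the home directory and create the directory
mkdir .ssh
ssh-keygen -t rsa
Press Enter at every prompt to generate the key pair
2. Enter the .ssh directory and run
cp id_rsa.pub authorized_keys
This copies the public key into the authorized keys file
3. Set permissions on .ssh
cd back to the home directory, then set permissions on .ssh and the files in it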
chmod 700 .ssh
chmod 600 .ssh/*
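For the hostname mapping in step ② above, every node needs an entry for every other node. A sketch of an idempotent helper follows; add_host_entry is a hypothetical name, and the addresses for hadoop02 and hadoop03 are assumptions, since the notes only give the one for hadoop01.

```shell
#!/bin/bash
# add_host_entry HOSTS_FILE IP NAME
# Appends "IP NAME" unless NAME already has an entry, so reruns add no duplicates.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    if ! grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Demo against a scratch file; on a real node the target would be /etc/hosts
hosts=$(mktemp)
add_host_entry "$hosts" 192.168.12.129 hadoop01
add_host_entry "$hosts" 192.168.12.130 hadoop02   # assumed address
add_host_entry "$hosts" 192.168.12.131 hadoop03   # assumed address
add_host_entry "$hosts" 192.168.12.129 hadoop01   # second call changes nothing
cat "$hosts"
```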
①As root, overwrite the local time zone file with cp
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
yum install -y ntp    # may already be installed
ntpdate pool.ntp.org    # synchronize the clock
①On hadoop02, switch to the zzk user
cat ~/.ssh/id_rsa.pub | ssh zzk@hadoop01 'cat >>~/.ssh/authorized_keys'
Do the same on hadoop03
②Use scp to copy authorized_keys from hadoop01 to hadoop02 and hadoop03
scp -r authorized_keys zzk@hadoop02:~/.ssh/
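The key exchange above amounts to collecting every node's public key in one authorized_keys file and copying that file back out. The merge can be sketched as below; merge_keys is a hypothetical helper and the key strings are fake placeholders, not real keys.

```shell
#!/bin/bash
# merge_keys OUT_FILE PUBKEY_FILE...
# Adds the given public keys to OUT_FILE, drops duplicate lines, and limits
# the file to its owner so that sshd's permission checks accept it.
merge_keys() {
    local out=$1 tmp
    shift
    tmp=$(mktemp)
    # Keep the entries that are already in OUT_FILE
    if [ -f "$out" ]; then
        cat "$out" >> "$tmp"
    fi
    cat "$@" >> "$tmp"
    sort -u "$tmp" > "$out"
    rm -f "$tmp"
    chmod 600 "$out"
}

# Demo in a scratch directory with placeholder keys
work=$(mktemp -d)
echo "ssh-rsa AAAA-placeholder-1 zzk@hadoop01" > "$work/h1.pub"
echo "ssh-rsa AAAA-placeholder-2 zzk@hadoop02" > "$work/h2.pub"
merge_keys "$work/authorized_keys" "$work/h1.pub" "$work/h2.pub" "$work/h1.pub"
cat "$work/authorized_keys"
```

Sorting changes the order of the keys, which does not matter to sshd.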
①Create the scripts directory (/home/zzk/tools)
②In the tools directory, write the config file and the distribution scripts:
deploy.conf
deploy.sh
runRemoteCmd.sh
deploy.conf:
# Cluster role plan (every line ends with a comma so that the last tag also matches ",tag,")
hadoop01,master,all,zookeeper,namenode,datanode,
hadoop02,slave,all,zookeeper,namenode,datanode,rs,
hadoop03,slave,all,zookeeper,datanode,rs,
deploy.sh:
#!/bin/bash
# Copy a file or directory to every host in deploy.conf that carries MachineTag.
if [ $# -lt 3 ]
then
    echo "Usage: ./deploy.sh srcFile(or Dir) destFile(or Dir) MachineTag"
    echo "Usage: ./deploy.sh srcFile(or Dir) destFile(or Dir) MachineTag confFile"
    exit 1
fi
src=$1
dest=$2
tag=$3
# Use the default config file unless a fourth argument names another one
if [ -z "$4" ]
then
    confFile=/home/zzk/tools/deploy.conf
else
    confFile=$4
fi
if [ -f "$confFile" ]
then
    if [ -f "$src" ]
    then
        for server in $(grep -v '^#' "$confFile" | grep ",$tag," | awk -F',' '{print $1}')
        do
            scp "$src" "$server:$dest"
        done
    elif [ -d "$src" ]
    then
        for server in $(grep -v '^#' "$confFile" | grep ",$tag," | awk -F',' '{print $1}')
        do
            scp -r "$src" "$server:$dest"
        done
    else
        echo "Error: source file or directory does not exist."
    fi
else
    echo "Error: config file not found; pass it as the fourth argument or create /home/zzk/tools/deploy.conf"
fi
runRemoteCmd.sh:
#!/bin/bash
# Run a command over ssh on every host in deploy.conf that carries MachineTag.
if [ $# -lt 2 ]
then
    echo "Usage: ./runRemoteCmd.sh Command MachineTag"
    echo "Usage: ./runRemoteCmd.sh Command MachineTag confFile"
    exit 1
fi
cmd=$1
tag=$2
# Use the default config file unless a third argument names another one
if [ -z "$3" ]
then
    confFile=/home/zzk/tools/deploy.conf
else
    confFile=$3
fi
if [ -f "$confFile" ]
then
    for server in $(grep -v '^#' "$confFile" | grep ",$tag," | awk -F',' '{print $1}')
    do
        echo "*************$server*****************"
        # Source ~/.bashrc first so PATH and JAVA_HOME are set in the remote shell
        ssh "$server" "source ~/.bashrc; $cmd"
    done
else
    echo "Error: config file not found; pass it as the third argument or create /home/zzk/tools/deploy.conf"
fi
③Make the scripts executable:
chmod u+x deploy.sh
chmod u+x runRemoteCmd.sh
④Add the scripts directory to PATH: vim ~/.bashrc
PATH=/home/zzk/tools:$PATH
export PATH
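Both scripts pick their target hosts with the same grep/awk pipeline, and a tag only matches when it has a comma on both sides, so the last tag on a line needs a trailing comma. The sketch below isolates that lookup so it can be tried without touching any remote host; hosts_for_tag is a hypothetical name and the config is a scratch copy.

```shell
#!/bin/bash
# hosts_for_tag CONF TAG
# Prints the first field of every non-comment line in CONF that contains ",TAG,".
# This mirrors the pipeline used inside deploy.sh and runRemoteCmd.sh.
hosts_for_tag() {
    local conf=$1 tag=$2
    grep -v '^#' "$conf" | grep ",$tag," | awk -F',' '{print $1}'
}

# Scratch copy of the role plan, every line ending in a comma
conf=$(mktemp)
cat > "$conf" <<'EOF'
#cluster role plan
hadoop01,master,all,zookeeper,namenode,datanode,
hadoop02,slave,all,zookeeper,namenode,datanode,rs,
hadoop03,slave,all,zookeeper,datanode,rs,
EOF

hosts_for_tag "$conf" slave      # lists hadoop02 and hadoop03
hosts_for_tag "$conf" namenode   # lists hadoop01 and hadoop02
```

Running the lookup first is a cheap way to see which machines a deploy.sh or runRemoteCmd.sh call would touch.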
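①Create the data directory
runRemoteCmd.sh "mkdir /home/zzk/data" all
Extract the JDK archive under /home/zzk/app, the directory that JAVA_HOME below points to
tar -zxvf jdk-8u271-linux-x64.tar.gz
Create a symlink: ln -s jdk1.8.0_271/ jdk
②Configure environment variables
1. Edit /etc/profile: the variables apply to all users, which may create system security risks
2. Edit ~/.bashrc: the variables apply to one user only
These notes set the variables in ~/.bashrc: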
JAVA_HOME=/home/zzk/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:/home/zzk/tools:$PATH
export JAVA_HOME CLASSPATH PATH
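③Distribute to the other two nodes
deploy.sh jdk1.8.0_271/ /home/zzk/app/ slave
On the other two nodes, create the same jdk symlink and configure the same environment variables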
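A wrong JAVA_HOME usually shows up only later, when a Hadoop daemon refuses to start, so a quick check after setting the variables can save time. This is a sketch under the layout used above (a jdk symlink next to the extracted jdk1.8.0_271 directory); check_java_home is a hypothetical helper and the demo builds a fake JDK tree in a scratch directory.

```shell
#!/bin/bash
# check_java_home DIR
# Succeeds when DIR looks like a JDK root, i.e. DIR/bin/java is an executable file.
check_java_home() {
    local dir=$1
    [ -x "$dir/bin/java" ] && [ ! -d "$dir/bin/java" ]
}

# Demo: fake JDK tree plus the same kind of symlink as in the notes
app=$(mktemp -d)
mkdir -p "$app/jdk1.8.0_271/bin"
printf '#!/bin/sh\necho fake java\n' > "$app/jdk1.8.0_271/bin/java"
chmod +x "$app/jdk1.8.0_271/bin/java"
ln -s jdk1.8.0_271 "$app/jdk"

if check_java_home "$app/jdk"; then
    echo "JAVA_HOME=$app/jdk looks valid"
fi
```

On a real node the call would be check_java_home "$JAVA_HOME" after sourcing ~/.bashrc.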
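Official download: https://archive.apache.org/dist/zookeeper/
These notes use apache-zookeeper-3.6.3-bin.tar.gz
Upload it to /home/zzk/app and extract it
Create a zookeeper symlink: ln -s apache-zookeeper-3.6.3-bin/ zookeeper
1. Edit zoo.cfg
cd zookeeper
cd conf/
cp zoo_sample.cfg zoo.cfg
Set the following in zoo.cfg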
dataDir=/home/zzk/data/zookeeper/zkdata
dataLogDir=/home/zzk/data/zookeeper/zkdatalog
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
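2. Distribute apache-zookeeper-3.6.3-bin/ to the other nodes with the script
①From the app directory: deploy.sh apache-zookeeper-3.6.3-bin/ /home/zzk/app/ slave
(note that the path uses zzk, not hadoop; create the zookeeper symlink on those nodes as well)
②Create the data directory: runRemoteCmd.sh "mkdir -p /home/zzk/data/zookeeper/zkdata" all
3. Set the server ID on each node
On each node, go to /home/zzk/data/zookeeper/zkdata and create a myid file
containing 1, 2, and 3 respectively, matching server.1 to server.3 in zoo.cfg
4. Test run
runRemoteCmd.sh "/home/zzk/app/zookeeper/bin/zkServer.sh start" all
Check the status: runRemoteCmd.sh "/home/zzk/app/zookeeper/bin/zkServer.sh status" all
Open the ZooKeeper client shell: bin/zkCli.sh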
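The number in each myid file has to equal the N of that host's server.N line in zoo.cfg, and typing it by hand on three machines is easy to get wrong. A sketch that derives it from the config follows; myid_for_host is a hypothetical helper and the demo reads a scratch copy of the settings above.

```shell
#!/bin/bash
# myid_for_host ZOO_CFG HOST
# Prints N for the line "server.N=HOST:..." in ZOO_CFG; fails if HOST is not listed.
myid_for_host() {
    local cfg=$1 host=$2 id
    id=$(sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg")
    [ -n "$id" ] && echo "$id"
}

# Scratch copy of the relevant zoo.cfg lines
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
dataDir=/home/zzk/data/zookeeper/zkdata
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
EOF

myid_for_host "$cfg" hadoop02   # prints 2
```

On a real node the result could be redirected into /home/zzk/data/zookeeper/zkdata/myid, with "$(hostname)" as the host argument.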
Official download: https://archive.apache.org/dist/hadoop/common/
Mirror download: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
ps: Tsinghua University open source mirror site
https://mirrors.tuna.tsinghua.edu.cn/
After downloading, upload to /home/zzk/app and extract
①Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/zzk/data/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
</configuration>
②Edit hdfs-site.xml
<configuration>
<!-- Replication factor -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Disable permission checking by setting it to false -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- Logical name of the fully distributed cluster (nameservice) -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- NameNodes that belong to the cluster -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- NameNode data directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/zzk/data/hadoop/name</value>
</property>
<!-- DataNode data directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/zzk/data/hadoop/data</value>
</property>
<!-- NameNode RPC addresses -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop02:8020</value>
</property>
<!-- NameNode HTTP addresses -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop01:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop02:9870</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Where the NameNode edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
</property>
<!-- Failover proxy provider: the class clients use to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- JournalNode data directory -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/zzk/data/journaldata/jn</value>
</property>
<!-- Fencing method: ensures only one NameNode serves clients at any moment -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- sshfence needs SSH key-based login -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/zzk/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>10000</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
</configuration>
③Configure hadoop-env.sh
export JAVA_HOME=/home/zzk/app/jdk
export HADOOP_HOME=/home/zzk/app/hadoop
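④Configure workers
Hadoop 3.x reads the workers file, not slaves! It took me a long time to find out this was why the DataNodes did not start. The workers file is in /home/zzk/app/hadoop/etc/hadoop;
edit it with vim, otherwise some DataNodes will not start: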
hadoop01
hadoop02
hadoop03
⑤Distribute to the other nodes with the script: deploy.sh hadoop-3.3.2/ /home/zzk/app/ slave
After distributing, create the symlink on each node: ln -s hadoop-3.3.2/ hadoop
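After distributing the configuration, it helps to confirm that each node really carries the values set above. The sketch below reads one property out of a *-site.xml file with awk; get_prop is a hypothetical helper that assumes the one-tag-per-line layout used in these notes and is not a general XML parser. On a node with Hadoop installed, bin/hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.

```shell
#!/bin/bash
# get_prop FILE NAME
# Prints the <value> that follows <name>NAME</name> in a Hadoop *-site.xml file.
get_prop() {
    local file=$1 name=$2
    awk -v want="<name>$name</name>" '
        index($0, want) { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}

# Scratch copy of part of core-site.xml
xml=$(mktemp)
cat > "$xml" <<'EOF'
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
</configuration>
EOF

get_prop "$xml" fs.defaultFS   # prints hdfs://mycluster
```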
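①Confirm ZooKeeper is running: runRemoteCmd.sh "/home/zzk/app/zookeeper/bin/zkServer.sh status" all
②Start the JournalNodes
runRemoteCmd.sh "/home/zzk/app/hadoop/sbin/hadoop-daemon.sh start journalnode" all
③Format the NameNode on nn1 (hadoop01), from the app/hadoop directory
bin/hdfs namenode -format
Make sure the firewall is off on every node! (My first format failed because the firewall on hadoop03 was still running. Re-formatting requires deleting all log files and everything under the data directory first.)
④Format ZKFC on nn1
bin/hdfs zkfc -formatZK
⑤Start the NameNode on nn1 (it runs in the foreground)
bin/hdfs namenode
⑥Sync nn1's metadata to nn2: on hadoop02, go to app/hadoop and run
bin/hdfs namenode -bootstrapStandby
After the sync completes, stop the NameNode on nn1: press Ctrl+C in its terminal, or run
sbin/hadoop-daemon.sh stop namenode
⑦Stop the JournalNodes
runRemoteCmd.sh "/home/zzk/app/hadoop/sbin/hadoop-daemon.sh stop journalnode" all
⑧Start HDFS with one command
sbin/start-dfs.sh
Check the state: bin/hdfs haadmin -getServiceState nn1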
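With automatic failover enabled, a healthy pair reports one active and one standby NameNode. The state check above can be wrapped in a small test; exactly_one_active is a hypothetical helper, and the demo passes literal states so it runs without a cluster.

```shell
#!/bin/bash
# exactly_one_active STATE...
# Succeeds when exactly one of the given states is the word "active".
exactly_one_active() {
    local count=0 s
    for s in "$@"; do
        if [ "$s" = "active" ]; then
            count=$((count + 1))
        fi
    done
    [ "$count" -eq 1 ]
}

# On the cluster the arguments would come from haadmin, for example:
#   exactly_one_active "$(bin/hdfs haadmin -getServiceState nn1)" \
#                      "$(bin/hdfs haadmin -getServiceState nn2)"
if exactly_one_active active standby; then
    echo "HA state looks healthy"
fi
```

Two standby NameNodes often point at ZKFC not running, and two active ones at a fencing problem, so either result is worth investigating.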
①Edit yarn-site.xml on hadoop01
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.connect.retry-interval.ms</name>
<value>2000</value>
</property>
<!-- Enable ResourceManager HA (default: false) -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Enable automatic ResourceManager failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Use the embedded leader elector for the ResourceManager -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Cluster ID; ensures an RM does not become active for a different cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-rm-cluster</value>
</property>
<!-- Logical IDs of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Host of the first ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop01</value>
</property>
<!-- Host of the second ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop02</value>
</property>
<!-- Enable ResourceManager state recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- ZooKeeper addresses used for state storage -->
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
<!-- Service addresses of the first ResourceManager -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>hadoop01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>hadoop01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hadoop01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>hadoop01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>hadoop01:8033</value>
</property>
<!-- Service addresses of the second ResourceManager -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>hadoop02:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>hadoop02:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>hadoop02:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>hadoop02:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hadoop02:8088</value>
</property>
<!-- Comma-separated list of auxiliary services; names may contain only a-zA-Z0-9_ and must not start with a digit -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- NodeManager local storage directory -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/zzk/data/yarn/local</value>
</property>
</configuration>
and mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/home/zzk/app/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/home/zzk/app/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/home/zzk/app/hadoop</value>
</property>
</configuration>
Then distribute both files to the other two nodes:
deploy.sh yarn-site.xml /home/zzk/app/hadoop/etc/hadoop/ slave
deploy.sh mapred-site.xml /home/zzk/app/hadoop/etc/hadoop/ slave
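②Start YARN with one command: sbin/start-yarn.sh
If the ResourceManager did not start on one of the nodes, start it manually:
sbin/yarn-daemon.sh start resourcemanager
③Check the state: bin/yarn rmadmin -getServiceState rm1
④Open the web UI (the standby node's page cannot be viewed directly; it redirects to the active node)
⑤Test
Create a testData folder under /home/zzk/data and create a wd.txt file in it:
hadoop hadoop yarn yarn yarn
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -put ~/data/testData/wd.txt /test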
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.2.jar wordcount /test/wd.txt /test/out
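To know what the job should produce before reading /test/out, the same count can be computed locally. This is a sketch with standard text tools, not the MapReduce implementation; local_wordcount is a hypothetical helper that prints word and count separated by a tab and sorted by word, the same shape as the job's part-r-00000 file.

```shell
#!/bin/bash
# local_wordcount FILE
# Splits FILE on whitespace and prints "word<TAB>count" lines sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Same content as wd.txt in the notes
wd=$(mktemp)
echo "hadoop hadoop yarn yarn yarn" > "$wd"
local_wordcount "$wd"
# hadoop	2
# yarn	3
```

The job's result can be compared against this with bin/hdfs dfs -cat /test/out/part-r-00000.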
Steps: Window -> Preferences -> Maven -> Add
Then locate the settings file in Maven's conf folder
D:\IdeaProjects\maven_repository
Then point to it under Window -> Preferences -> Maven -> User Settings
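①Steps: File -> New -> Other -> Maven -> Maven Project -> Next
②Choose an archetype, then Finish
③Switch the JRE
Right-click the project -> Build Path ->
Edit, then select JDK 1.8
④Write the WordCount program
Copy the WordCount source code from the official site
⑤Add dependencies: https://mvnrepository.com/
hadoop common
Apache Hadoop Client Aggregator -> hadoop client
⑥Add log4j (skipped)
Project -> New -> Source Folder
⑦Run a local test (running on Windows hit a bug, so this was not run)
Right-click -> Run As -> select the main class WordCount -> Arguments
Create wc.txt in D:\all_kind_projects\test\hadoopTest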
①Right-click the project -> Export -> Java -> JAR file
②Upload the jar to the Hadoop cluster
/home/zzk/data/jarPackage
Run: bin/hadoop jar ~/data/jarPackage/wc.jar com.hadooptest.hadoop_test.WordCount /test/wd.txt /test/out2
Configure the Hadoop environment variables:
HADOOP_HOME=/home/zzk/app/hadoop
JAVA_HOME=/home/zzk/app/jdk
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:/home/zzk/tools:$PATH
export HADOOP_HOME JAVA_HOME CLASSPATH PATH
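Because ~/.bashrc is read again by every nested shell, and runRemoteCmd.sh sources it explicitly, a plain PATH=...:$PATH line can add the same directories more than once. One possible guard is sketched below; path_prepend is a hypothetical helper, not something the original notes use.

```shell
#!/bin/bash
# path_prepend DIR
# Puts DIR at the front of PATH unless PATH already contains it.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend /home/zzk/tools
path_prepend /home/zzk/app/hadoop/bin
path_prepend /home/zzk/tools         # already present, PATH is unchanged
export PATH
echo "$PATH"
```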