VM environment preparation
- 3 VMs (each with a user that has root privileges; firewall disabled, static IP, hostname configured)
- Edit the /etc/hosts file
- Create the user and the commonly used directories
- Install and configure the JDK
- Add the xsync.sh and xcall.sh scripts (a sketch of xsync.sh follows this list)
- Configure passwordless SSH login (see the commands after this list)
- Cluster time synchronization
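The xsync.sh and xcall.sh helper scripts are used throughout this guide. As a reference, here is a minimal sketch of what xsync.sh might look like, assuming the peer hosts are hadoop113 and hadoop114 and that rsync is installed (the actual script may differ):
#!/bin/bash
# xsync.sh (sketch): rsync each given path to the other nodes,
# into the same parent directory it occupies locally
if [ $# -lt 1 ]; then
    echo "usage: xsync.sh <path>..."
    exit 1
fi
for host in hadoop113 hadoop114; do
    echo "================= syncing to $host ==============="
    for path in "$@"; do
        # resolve the absolute parent directory of the path
        dir=$(dirname "$(readlink -f "$path")")
        rsync -av "$path" "$host:$dir/"
    done
done
xcall.sh is analogous: it loops over all hosts and runs the given command on each via ssh.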
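Passwordless SSH is a prerequisite for xsync.sh and for the start/stop scripts. The standard setup, run as kevin on hadoop112 (press Enter through the ssh-keygen prompts; repeat the whole procedure on hadoop113, which later launches YARN):
[kevin@hadoop112 ~]$ ssh-keygen -t rsa
[kevin@hadoop112 ~]$ ssh-copy-id hadoop112
[kevin@hadoop112 ~]$ ssh-copy-id hadoop113
[kevin@hadoop112 ~]$ ssh-copy-id hadoop114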
Hadoop installation
Copy the pre-built tarball (extraction code: j9oe) to the /opt/software directory (the official release only provides a 32-bit build, so we compile it into 64-bit ourselves).
Extract hadoop into /opt/module:
[kevin@hadoop112 software]$ tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
- Configure the hadoop environment variables
First get the hadoop path:
[kevin@hadoop112 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
Open the /etc/profile.d/myPath.sh file:
[kevin@hadoop112 hadoop-2.7.2]$ vim /etc/profile.d/myPath.sh
Append the hadoop paths at the end of myPath.sh:
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Make the change take effect:
[kevin@hadoop112 hadoop-2.7.2]$ source /etc/profile
Reboot (not needed if hadoop version already works):
[kevin@hadoop112 hadoop-2.7.2]$ sync
[kevin@hadoop112 hadoop-2.7.2]$ reboot
Verify that hadoop installed successfully:
[kevin@hadoop112 hadoop-2.7.2]$ hadoop version
Hadoop 2.7.2
Cluster configuration
- Cluster deployment plan
| | hadoop112 | hadoop113 | hadoop114 |
|---|---|---|---|
| HDFS | NameNode; DataNode | DataNode | SecondaryNameNode; DataNode |
| YARN | NodeManager | ResourceManager; NodeManager | NodeManager |
- Configure the cluster
Configure core-site.xml:
<!-- NameNode address -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop112:9000</value>
</property>
<!-- Base directory for Hadoop runtime files -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
Configure hadoop-env.sh:
# set the JAVA_HOME path:
export JAVA_HOME=/opt/module/jdk1.8.0_241
Configure hdfs-site.xml:
<!-- Number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- SecondaryNameNode HTTP address -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop114:50090</value>
</property>
Configure yarn-env.sh:
# set the JAVA_HOME path:
export JAVA_HOME=/opt/module/jdk1.8.0_241
Configure yarn-site.xml:
<!-- How reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager host -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop113</value>
</property>
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- Retain aggregated logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
<!-- Disable physical/virtual memory checks so containers are not killed on small VMs -->
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
Configure mapred-env.sh:
# set the JAVA_HOME path:
export JAVA_HOME=/opt/module/jdk1.8.0_241
Configure mapred-site.xml:
[kevin@hadoop112 hadoop]$ mv mapred-site.xml.template mapred-site.xml
[kevin@hadoop112 hadoop]$ vim mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
Configure the history server
Add to mapred-site.xml:
<!-- JobHistory server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop112:10020</value>
</property>
<!-- JobHistory web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop112:19888</value>
</property>
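Note that start-dfs.sh/start-yarn.sh do not start the history server; in Hadoop 2.x it is started on its own, which the hadoop-cluster.sh script at the end of this section also does:
[kevin@hadoop112 hadoop-2.7.2]$ sbin/mr-jobhistory-daemon.sh start historyserver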
Configure log aggregation
Log aggregation concept: after an application finishes, its run logs are uploaded to HDFS.
Benefit: run details can be inspected conveniently, which helps development and debugging.
Configure yarn-site.xml (already added above):
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
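With aggregation enabled, the logs of a finished job can also be fetched from the command line; the application ID below is a placeholder (the real one is printed when a job is submitted and shown in the YARN web UI):
[kevin@hadoop112 hadoop-2.7.2]$ yarn logs -applicationId application_1234567890123_0001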
Distribute the configured Hadoop directory to the cluster:
[kevin@hadoop112 hadoop]$ xsync.sh /opt/module/hadoop-2.7.2/
Check that the files were distributed:
[kevin@hadoop113 hadoop]$ cat /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml
If this is the first time the cluster is started, format the NameNode (delete the data and logs directories first if they exist):
[kevin@hadoop112 hadoop-2.7.2]$ hadoop namenode -format
Later, to reset HDFS, delete the data and logs directories and then reformat the NameNode.
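A sketch of that reset, using the xcall.sh script from the preparation step (stop all daemons first; this wipes everything stored in HDFS):
[kevin@hadoop112 hadoop-2.7.2]$ xcall.sh rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs
[kevin@hadoop112 hadoop-2.7.2]$ hadoop namenode -format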
Starting daemons one at a time (later we use group start)
- Start the NameNode on hadoop112
[kevin@hadoop112 hadoop-2.7.2]$ hadoop-daemon.sh start namenode
[kevin@hadoop112 hadoop-2.7.2]$ jps
3461 NameNode
3608 Jps
- Start a DataNode on each of hadoop112, hadoop113, and hadoop114
[kevin@hadoop112 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[kevin@hadoop112 hadoop-2.7.2]$ jps
3461 NameNode
3608 Jps
3561 DataNode
[kevin@hadoop113 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[kevin@hadoop113 hadoop-2.7.2]$ jps
3190 DataNode
3279 Jps
[kevin@hadoop114 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[kevin@hadoop114 hadoop-2.7.2]$ jps
3237 Jps
3163 DataNode
Group-start configuration
- Configure slaves
[kevin@hadoop112 ~]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/
[kevin@hadoop112 hadoop]$ vim slaves
# clear the file, then add:
hadoop112
hadoop113
hadoop114
Note: entries in this file must have no trailing whitespace, and the file must contain no blank lines.
Sync the config file to all nodes:
[kevin@hadoop112 hadoop]$ xsync.sh slaves
Start HDFS (to stop, change start to stop):
[kevin@hadoop112 hadoop-2.7.2]$ sbin/start-dfs.sh
Then check jps on each machine and verify that the HDFS daemons match the plan above (a mismatch indicates a configuration problem).
Start YARN (read the note below before doing this; to stop, change start to stop):
[kevin@hadoop113 hadoop-2.7.2]$ sbin/start-yarn.sh
Then check jps on each machine and verify that the YARN daemons match the plan above (a mismatch indicates a configuration problem).
Note: if the NameNode and ResourceManager are on different machines, do not start YARN on the NameNode; start it on the machine that hosts the ResourceManager. Here that is hadoop113.
Common web pages:
HDFS web UI
http://hadoop112:50070/explorer.html#/
YARN web UI
http://hadoop113:8088/cluster
SecondaryNameNode web UI
http://hadoop114:50090/status.html
JobHistory web UI
http://hadoop112:19888/jobhistory
NodeManager web UI
http://hadoop112:8042/node
Check the cluster state (verify it matches the plan):
[kevin@hadoop112 ~]$ xcall.sh jps
================= jps on hadoop112 ===============
1920 NodeManager
1639 DataNode
1530 NameNode
1995 JobHistoryServer
2124 Jps
================= jps on hadoop113 ===============
1448 ResourceManager
1561 NodeManager
1356 DataNode
1900 Jps
================= jps on hadoop114 ===============
1680 Jps
1363 DataNode
1430 SecondaryNameNode
1516 NodeManager
[kevin@hadoop112 ~]$
Basic cluster test
Prepare a file test.txt:
[kevin@hadoop112 hadoop-2.7.2]$ touch test.txt
[kevin@hadoop112 hadoop-2.7.2]$ vim test.txt
kevin kevin
Spark Hive Hello
Upload the file to the cluster:
[kevin@hadoop112 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/kevin/input
[kevin@hadoop112 hadoop-2.7.2]$ hdfs dfs -put test.txt /user/kevin/input
Check it in the HDFS web UI:
http://hadoop112:50070/explorer.html#/user/kevin/input
Run a simple MapReduce job (note: the /user/kevin/output directory must not already exist in HDFS; see the cleanup command below):
[kevin@hadoop112 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/kevin/input/ /user/kevin/output
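To re-run the job, remove the previous output directory first:
[kevin@hadoop112 hadoop-2.7.2]$ bin/hdfs dfs -rm -r /user/kevin/output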
View the output:
[kevin@hadoop112 hadoop-2.7.2]$ bin/hdfs dfs -cat /user/kevin/output/*
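For the sample test.txt above, the output should contain counts like the following (keys are sorted in byte order, so capitalized words come first):
Hello	1
Hive	1
Spark	1
kevin	2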
Scripts
hadoop-cluster.sh
[kevin@hadoop101 ~]$ cd ~/bin
[kevin@hadoop101 bin]$ touch hadoop-cluster.sh
[kevin@hadoop101 bin]$ chmod 775 hadoop-cluster.sh
编辑 hadoop-cluster.sh
#!/bin/bash
case $1 in
"start"){
echo "================= hadoop102正在启动HDFS ==============="
ssh kevin@hadoop102 "/opt/module/hadoop-2.7.2/sbin/start-dfs.sh"
echo "================= hadoop103正在启动YARN ==============="
ssh kevin@hadoop103 "/opt/module/hadoop-2.7.2/sbin/start-yarn.sh"
echo "================= hadoop102正在启动历史服务器 ==============="
ssh kevin@hadoop102 "/opt/module/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh start historyserver"
};;
"stop"){
echo "================= hadoop102正在关闭历史服务器 ==============="
ssh kevin@hadoop102 "/opt/module/hadoop-2.7.2/sbin/mr-jobhistory-daemon.sh stop historyserver"
echo "================= hadoop103 正在关闭YARN ==============="
ssh kevin@hadoop103 "/opt/module/hadoop-2.7.2/sbin/stop-yarn.sh"
echo "================= hadoop102 正在关闭HDFS ==============="
ssh kevin@hadoop102 "/opt/module/hadoop-2.7.2/sbin/stop-dfs.sh"
};;
esac
Usage:
# start the cluster
[kevin@hadoop112 ~]$ hadoop-cluster.sh start
# stop the cluster
[kevin@hadoop112 ~]$ hadoop-cluster.sh stop
Recorded output:
# start run
[kevin@hadoop112 ~]$ hadoop-cluster.sh start
================= starting HDFS on hadoop112 ===============
Starting namenodes on [hadoop112]
hadoop112: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-namenode-hadoop112.out
hadoop112: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-datanode-hadoop112.out
hadoop113: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-datanode-hadoop113.out
hadoop114: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-datanode-hadoop114.out
Starting secondary namenodes [hadoop114]
hadoop114: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-kevin-secondarynamenode-hadoop114.out
================= starting YARN on hadoop113 ===============
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-resourcemanager-hadoop113.out
hadoop113: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-nodemanager-hadoop113.out
hadoop114: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-nodemanager-hadoop114.out
hadoop112: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-kevin-nodemanager-hadoop112.out
================= starting the history server on hadoop112 ===============
starting historyserver, logging to /opt/module/hadoop-2.7.2/logs/mapred-kevin-historyserver-hadoop112.out
[kevin@hadoop112 ~]$
# stop run
[kevin@hadoop112 ~]$ hadoop-cluster.sh stop
================= stopping the history server on hadoop112 ===============
stopping historyserver
================= stopping YARN on hadoop113 ===============
stopping yarn daemons
stopping resourcemanager
hadoop114: stopping nodemanager
hadoop112: stopping nodemanager
hadoop113: stopping nodemanager
no proxyserver to stop
================= stopping HDFS on hadoop112 ===============
Stopping namenodes on [hadoop112]
hadoop112: stopping namenode
hadoop114: stopping datanode
hadoop112: stopping datanode
hadoop113: stopping datanode
Stopping secondary namenodes [hadoop114]
hadoop114: stopping secondarynamenode
[kevin@hadoop112 ~]$