1. Overview
The Hadoop 3.1 setup guides I found online were riddled with errors, so I decided to write my own, for use as a learning and test environment.
2. Software versions
Software | Version | Download |
---|---|---|
CentOS | 7.2.1511 x86_64 | |
Hadoop | 3.1.0 | hadoop-3.1.0.tar.gz (binary distribution) |
JDK | 8u172 | jdk-8u172-linux-x64.rpm |
All systems run in virtual machines.
3. Server plan
Hostname | IP address | Role | Processes |
---|---|---|---|
node1 | 10.211.55.4 | node1 (master) | NameNode, ResourceManager, SecondaryNameNode |
node2 | 10.211.55.5 | node2 (worker) | DataNode, NodeManager |
node3 | 10.211.55.6 | node3 (worker) | DataNode, NodeManager |
4. Server environment preparation
The configuration in this chapter must be done on every node, logged in to the server as root.
4.1 Basic environment
# Add the host entries: append the following to the end of /etc/hosts
[root@node1 ~]# vi /etc/hosts
10.211.55.4 node1
10.211.55.5 node2
10.211.55.6 node3
# Stop and disable the firewall
[root@node1 ~]# systemctl stop firewalld && systemctl disable firewalld
# Turn off SELinux for the current session
[root@node1 ~]# setenforce 0
# Set SELINUX to disabled so it stays off after reboot
[root@node1 ~]# vi /etc/selinux/config
SELINUX=disabled
# Set the hostname to match the entry in /etc/hosts; after the reboot the shell prompt will show node1
[root@node1 ~]# vi /etc/hostname
node1
# Switch the login shell to bash. Hadoop's scripts require bash, so if zsh is in use (e.g. via oh-my-zsh), change it back
[root@node1 ~]# chsh -s /bin/bash
# Reboot the server
[root@node1 ~]# reboot
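After the reboot, the basic settings can be spot-checked with a few standard CentOS 7 commands (an optional sanity check; expected results are noted in the comments):
[root@node1 ~]# hostname                        # should print node1 (node2/node3 on the other machines)
[root@node1 ~]# getent hosts node2              # should print 10.211.55.5 node2
[root@node1 ~]# systemctl is-enabled firewalld  # should print disabled
[root@node1 ~]# getenforce                      # should print Disabled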
4.2 Passwordless SSH login
Log in to the servers as root.
# Run the following on node1
# Generate a key pair; just press Enter at every prompt. The keys are written to the ~/.ssh directory
[root@node1 ~]# ssh-keygen -t rsa
[root@node1 ~]# scp ~/.ssh/id_rsa.pub root@node2:~
[root@node1 ~]# scp ~/.ssh/id_rsa.pub root@node3:~
# Run the following on node2 and node3
[root@node2 ~]# mkdir -p .ssh
[root@node2 ~]# cd .ssh/
[root@node2 .ssh]# cat ~/id_rsa.pub >> authorized_keys
# Run the following on all three nodes to fix the permissions, otherwise remote startup will fail
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
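A quick way to confirm passwordless login works, run from node1 (the first connection may still ask to confirm the host key, but no password prompt should appear):
[root@node1 ~]# ssh node2 hostname   # should print node2
[root@node1 ~]# ssh node3 hostname   # should print node3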
4.3 Install the JDK
Download the JDK installer and upload it to every node.
# Install the JDK. By default it installs to /usr/java/jdk1.8.0_172-amd64 and also creates the symlinks /usr/java/default and /usr/java/latest, either of which can be used as JAVA_HOME
[root@node1 ~]# rpm -ivh jdk-8u172-linux-x64.rpm
# Configure environment variables
[root@node1 ~]# vi ~/.bash_profile
# Append at the end of the file
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# Reload the profile
source ~/.bash_profile
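To verify the JDK setup (the exact build number may differ):
[root@node1 ~]# java -version        # should report java version "1.8.0_172"
[root@node1 ~]# echo $JAVA_HOME      # should print /usr/java/default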
5. Install Hadoop
Install and configure Hadoop on node1, then copy it to the other nodes. Log in to node1 as root and perform the following steps in order.
5.1 Install Hadoop
# Create the installation directory
[root@node1 opt]# cd /opt/ && mkdir hadoop && cd hadoop
# Extract hadoop-3.1.0.tar.gz
[root@node1 hadoop]# tar xvf hadoop-3.1.0.tar.gz
# Update the environment variables
[root@node1 hadoop]# vi ~/.bash_profile
# Append at the end of the file
export HADOOP_HOME=/opt/hadoop/hadoop-3.1.0
export PATH=$PATH:$HADOOP_HOME/bin
# Reload the profile
[root@node1 hadoop]# source ~/.bash_profile
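A quick check that the Hadoop binaries are now on the PATH (the banner also confirms the version):
[root@node1 hadoop]# hadoop version   # the first line should read Hadoop 3.1.0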
5.2 Edit the configuration files
All of these files are located in /opt/hadoop/hadoop-3.1.0/etc/hadoop.
hadoop-env.sh
#The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=
export JAVA_HOME=/usr/java/default
core-site.xml
fs.defaultFS
hdfs://node1:9000
hadoop.tmp.dir
/opt/hadoop/data/tmp
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/data/datanode</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>
      /opt/hadoop/hadoop-3.1.0/etc/hadoop,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/common/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/common/lib/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/hdfs/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/hdfs/lib/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/mapreduce/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/mapreduce/lib/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/yarn/*,
      /opt/hadoop/hadoop-3.1.0/share/hadoop/yarn/lib/*
    </value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:8040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:8030</value>
  </property>
</configuration>
At the top of the sbin/start-dfs.sh and sbin/stop-dfs.sh scripts, add the following environment variables to specify the accounts the HDFS daemons run as.
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
At the top of the sbin/start-yarn.sh and sbin/stop-yarn.sh scripts, add the following environment variables to specify the accounts the YARN daemons run as.
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
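These edits follow the widely used pattern of patching the sbin scripts. In Hadoop 3 the same *_USER variables are also honored when exported from etc/hadoop/hadoop-env.sh, which the launcher scripts source; that keeps the stock scripts untouched. A minimal sketch with the same values as above:
# appended to /opt/hadoop/hadoop-3.1.0/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root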
workers
[root@node1 hadoop]# touch /opt/hadoop/hadoop-3.1.0/etc/hadoop/workers
[root@node1 hadoop]# vim /opt/hadoop/hadoop-3.1.0/etc/hadoop/workers
# Add the worker hostnames
node2
node3
Note: in Hadoop 3.1.0 the file listing the worker nodes is named workers, not slaves!
Create the data directories
[root@node1 hadoop]# mkdir -p /opt/hadoop/data/tmp
[root@node1 hadoop]# mkdir -p /opt/hadoop/data/name
[root@node1 hadoop]# mkdir -p /opt/hadoop/data/datanode
Copy the installation to the other nodes
[root@node1 opt]# scp -r /opt/hadoop node2:/opt/
[root@node1 opt]# scp -r /opt/hadoop node3:/opt/
6. Startup
Log in to node1 as root. The first startup requires formatting the NameNode.
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/bin/hdfs namenode -format
Start
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/start-all.sh
Stop
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/stop-all.sh
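start-all.sh is convenient for a test cluster, but HDFS and YARN can also be started and stopped separately with the scripts already mentioned in section 5.2, which makes it easier to see which layer is failing:
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/start-dfs.sh
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/start-yarn.sh
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/stop-yarn.sh
[root@node1 opt]# /opt/hadoop/hadoop-3.1.0/sbin/stop-dfs.sh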
7. Verification
Run the jps command on each node to check the Java processes.
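If startup succeeded, the process list should match the plan in section 3, roughly as follows (the PIDs are illustrative and will differ):
[root@node1 ~]# jps
2305 NameNode
2514 SecondaryNameNode
2770 ResourceManager
3052 Jps
[root@node2 ~]# jps
1875 DataNode
1990 NodeManager
2120 Jps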
Open the YARN web UI and check Active Nodes; there should be 2, and their status can be inspected.
http://node1:8088/
Open the NameNode web UI and check Datanodes; 2 DataNodes should be listed, and their status can be inspected.
http://node1:9870/
Open the SecondaryNameNode web UI; a mostly empty Hadoop page appears.
http://node1:9868/
8. Test
Run a test with the bundled examples; log in to node1 as root.
- Create the HDFS directories. Once /user/root exists, HDFS resolves relative paths against the home directory of the current OS user (here /user/root), so the commands below do not need to spell out /user/root
cd /opt/hadoop/hadoop-3.1.0
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/root
hdfs dfs -mkdir input
- Copy the input files into the distributed file system
hdfs dfs -put etc/hadoop/*.xml input
- Run the bundled example jar to execute a MapReduce job
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep input output 'dfs[a-z.]+'
- Check the output files (see the note on expected output after the commands)
Copy the output from the distributed file system to the local file system and view it:
hdfs dfs -get output output
cat output/*
Or view it directly on HDFS:
hdfs dfs -cat output/*
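For reference, the job writes its results to the output directory in HDFS; a _SUCCESS marker plus a part-r-* file indicates a completed run, and each line of the part file has the form <count><TAB><matched string> (the exact matches and counts depend on the XML files copied into input; the listing below is only illustrative):
hdfs dfs -ls output
# Expect something like:
# Found 2 items
# -rw-r--r--   2 root supergroup  <size> <date> output/_SUCCESS
# -rw-r--r--   2 root supergroup  <size> <date> output/part-r-00000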
9. Common problems
9.1 Formatting HDFS fails with java.net.UnknownHostException: node1: node1: Name or service not known
The error occurs when formatting HDFS with the hdfs namenode -format command.
Cause:
When formatting HDFS, Hadoop looks up the machine's own hostname (as returned by the hostname command, e.g. a default such as localhost.localdomain); the lookup fails because that name has no matching entry in /etc/hosts.
Solution:
- Edit the hostname in /etc/hostname so that it matches the entry in /etc/hosts: node1 (see the sketch after this list)
- Reboot the operating system
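On CentOS 7 the same fix can also be applied with hostnamectl, which rewrites /etc/hostname and takes effect immediately (the reboot above still does no harm); a minimal sketch:
[root@node1 ~]# hostnamectl set-hostname node1
[root@node1 ~]# hostname               # should now print node1
[root@node1 ~]# grep node1 /etc/hosts  # should show 10.211.55.4 node1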
Reference: Hadoop格式化报错java.net.UnknownHostException
Error log:
2018-07-03 02:47:48,309 WARN net.DNS: Unable to determine local hostname -falling back to 'localhost'
java.net.UnknownHostException: node1: node1: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:283)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:61)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:1014)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:608)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1190)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1631)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Caused by: java.net.UnknownHostException: node1: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
... 8 more
2018-07-03 02:47:48,321 WARN net.DNS: Unable to determine address of the host -falling back to 'localhost' address
java.net.UnknownHostException: node1: node1: Name or service not known
at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
at org.apache.hadoop.net.DNS.resolveLocalHostIPAddress(DNS.java:306)
at org.apache.hadoop.net.DNS.<clinit>(DNS.java:62)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newBlockPoolID(NNStorage.java:1014)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.newNamespaceInfo(NNStorage.java:608)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:169)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1190)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1631)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1741)
Caused by: java.net.UnknownHostException: node1: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
... 8 more
2018-07-03 02:47:48,328 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1508911316-127.0.0.1-1530600468322
2018-07-03 02:47:48,328 INFO common.Storage: Will remove files: [/opt/hadoop/data/name/current/fsimage_0000000000000000000, /opt/hadoop/data/name/current/seen_txid, /opt/hadoop/data/name/current/fsimage_0000000000000000000.md5, /opt/hadoop/data/name/current/VERSION]
2018-07-03 02:47:48,336 INFO common.Storage: Storage directory /opt/hadoop/data/name has been successfully formatted.
2018-07-03 02:47:48,346 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop/data/name/current/fsimage.ckpt_0000000000000000000 using no compression
2018-07-03 02:47:48,420 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop/data/name/current/fsimage.ckpt_0000000000000000000 of size 389 bytes saved in 0 seconds .
2018-07-03 02:47:48,428 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2018-07-03 02:47:48,433 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: node1: node1: Name or service not known
************************************************************/
9.2 Cluster startup fails with bash v3.2+ is required. Sorry.
The error appears when starting the Hadoop cluster with sbin/start-all.sh.
Cause:
The system's shell is zsh or another non-bash shell, while Hadoop's scripts are written for bash.
Solution:
Change the shell to bash on every node: chsh -s /bin/bash
Error message:
sbin/start-all.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [node1]
Last login: Tue Jul 3 02:53:34 EDT 2018 from 10.211.55.2 on pts/0
bash v3.2+ is required. Sorry.
Starting datanodes
Last login: Tue Jul 3 03:06:41 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
Starting secondary namenodes [node2]
Last login: Tue Jul 3 03:06:41 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Starting resourcemanager
Last login: Tue Jul 3 03:06:42 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
Starting nodemanagers
Last login: Tue Jul 3 03:06:44 EDT 2018 on pts/0
bash v3.2+ is required. Sorry.
9.3 Cluster startup fails with Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
This is an SSH failure when logging in to the local machine. Fix it as follows:
- The node that runs start-all.sh must also add its own public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Run the following on every node to set the permissions:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
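Afterwards, a quick check from the node that runs start-all.sh; both commands should print a hostname without asking for a password:
[root@node1 ~]# ssh localhost hostname
[root@node1 ~]# ssh node1 hostname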
9.4 Cluster startup fails with ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
The fix is described in section 5.2. At the top of the sbin/start-dfs.sh and sbin/stop-dfs.sh scripts, add the following environment variables to specify the accounts the HDFS daemons run as.
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
At the top of the sbin/start-yarn.sh and sbin/stop-yarn.sh scripts, add the following environment variables to specify the accounts the YARN daemons run as.
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
[root@node1 hadoop-3.1.0]# sbin/start-all.sh
Starting namenodes on [node1]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [node2]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
10. References
Official documentation
core-default.xml
hdfs-default.xml
mapred-default.xml
yarn-default.xml
Hadoop: Setting up a Single Node Cluster
Online articles
hadoop3.0 分布式搭建/安装
Hadoop3.0集群环境搭建
hadoop3.1.0集群搭建