A Roundup of Common Errors When Configuring a Hadoop Cluster
1. hdfs namenode -format fails because JAVA_HOME cannot be found
Fix this by exporting JAVA_HOME in the shell you are working in. Note that the path must point at the JDK installed in your environment; in my case it is /usr/local/java:
[hadoop@hadoop0 var]$ export JAVA_HOME=/usr/local/java
Since the export has to be repeated in every new shell, it is easier to add the following lines to /XXX/hadoop-xxx/etc/hadoop/hadoop-env.sh once and be done with it:
export JAVA_HOME=/usr/local/java
export HADOOP_CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
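A quick sanity check that the change took effect (a sketch; the install path /home/hadoop/hadoop-3.2.1 is assumed from later examples in this article):
# source the env file and confirm the JVM it points at is usable
. /home/hadoop/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
echo "$JAVA_HOME"                # expect /usr/local/java
"$JAVA_HOME/bin/java" -version   # should print the JDK version without errors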
2. hdfs namenode -format fails with "Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/yarn/server/nodemanager/NodeManager : Unsupported major.minor version 52.0"
The JDK is too old: class file version 52.0 is Java 8, and Hadoop 3.x requires JDK 1.8.
Fix: in the directory where the JDKs are unpacked, remove the old symlink and point a new one at JDK 1.8:
[root@hadoop2 ~]# cd /usr/local/
[root@hadoop2 local]# rm -f java
[root@hadoop2 local]# ln -sv jdk1.8.0_231 java
'java' -> 'jdk1.8.0_231'
[root@hadoop2 local]# java -version
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)
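Every node runs its own JVM, so it is worth checking the version cluster-wide rather than only on the node you are logged into (a sketch; the hostnames hadoop0-hadoop3 are the ones used throughout this article):
# java -version prints to stderr, hence the 2>&1
for h in hadoop0 hadoop1 hadoop2 hadoop3; do
  echo "== $h =="; ssh "$h" 'java -version' 2>&1 | head -1
done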
3. Running the start-all.sh script produces a pile of errors like the following
/home/hadoop/hadoop-3.2.1/sbin/../libexec/hadoop-functions.sh: line 398: syntax error near unexpected token `<'
/home/hadoop/hadoop-3.2.1/sbin/../libexec/hadoop-functions.sh: line 398: ` done < <(for text in "${input[@]}"; do'
This can happen either because the users allowed to run each daemon are not configured, or because the script is being executed with the wrong shell.
The following two steps should resolve it:
1) Add the executing user (hadoop here; the account must already exist with passwordless SSH set up) to /XXX/hadoop-3.2.1/etc/hadoop/hadoop-env.sh:
[hadoop@hadoop0 hadoop]$ tail hadoop-env.sh
# to only allow certain users to execute certain subcommands.
# It uses the format of (command)_(subcommand)_USER.
# For example, to limit who can execute the namenode command,
export HDFS_NAMENODE_USER="hadoop"
export HDFS_DATANODE_USER="hadoop"
export HDFS_SECONDARYNAMENODE_USER="hadoop"
export YARN_RESOURCEMANAGER_USER="hadoop"
export YARN_NODEMANAGER_USER="hadoop"
2) The script may be running under the wrong shell. hadoop-functions.sh uses bash-only constructs such as process substitution (the `done < <(...)` seen in the error above), so do not launch it with sh; source it with . (or run it with bash) instead:
[hadoop@hadoop0 sbin]$ . start-all.sh
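On systems where /bin/sh is not a full bash (dash, or an older bash running in POSIX mode), the failure is easy to reproduce, since process substitution is a bash extension (a sketch):
sh -c 'cat < <(echo hi)'     # typically fails: syntax error near unexpected token `<'
bash -c 'cat < <(echo hi)'   # prints "hi"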
4. Starting the cluster fails with "Cannot set priority of datanode process 1620"; this needs a look at the specific logs
[hadoop@hadoop0 sbin]$ . start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [hadoop0]
Starting datanodes
WARNING: 'slaves' file has been deprecated. Please use 'workers' file instead.
hadooo0: ssh: Could not resolve hostname hadooo0: Name or service not known
hadoop3: ERROR: Cannot set priority of datanode process 1620
hadoop2: ERROR: Cannot set priority of datanode process 1614
hadoop1: ERROR: Cannot set priority of datanode process 1606
Starting secondary namenodes [hadoop0]
Starting resourcemanager
Starting nodemanagers
WARNING: 'slaves' file has been deprecated. Please use 'workers' file instead.
hadooo0: ssh: Could not resolve hostname hadooo0: Name or service not known
hadoop3: ERROR: Cannot set priority of nodemanager process 1695
hadoop1: ERROR: Cannot set priority of nodemanager process 1682
hadoop2: ERROR: Cannot set priority of nodemanager process 1689
[hadoop@hadoop0 sbin]$
After startup the datanodes would not come up; the logs showed java.lang.UnsupportedClassVersionError, which again means an unsupported JDK version, so upgrading to JDK 8 fixes it:
[hadoop@hadoop1 logs]$ cat hadoop-hadoop-nodemanager-hadoop1.out
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/yarn/server/nodemanager/NodeManager : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:482)
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 3837
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 3837
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[hadoop@hadoop1 logs]$
Fix: as in problem 2, remove the old symlink in the JDK directory and create a new one pointing at JDK 1.8:
[root@hadoop2 ~]# cd /usr/local/
[root@hadoop2 local]# rm -f java
[root@hadoop2 local]# ln -sv jdk1.8.0_231 java
'java' -> 'jdk1.8.0_231'
[root@hadoop2 local]# java -version
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)
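"Cannot set priority of ... process" is only the symptom start-all.sh sees when a daemon dies right after launch; the real cause is in that daemon's logs on the failing node. A sketch of where to look (log directory assumed to be $HADOOP_HOME/logs, matching the paths used in this article):
cd /home/hadoop/hadoop-3.2.1/logs
ls -lt | head                                      # newest files first
tail -n 50 hadoop-hadoop-datanode-$(hostname).log
tail -n 50 hadoop-hadoop-datanode-$(hostname).out  # launcher output, as shown above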
5. Starting the cluster warns: 'slaves' file has been deprecated. Please use 'workers' file instead.
[hadoop@hadoop0 sbin]$ . start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [hadoop0]
Starting datanodes
WARNING: 'slaves' file has been deprecated. Please use 'workers' file instead.
Starting secondary namenodes [hadoop0]
Starting resourcemanager
Starting nodemanagers
WARNING: 'slaves' file has been deprecated. Please use 'workers' file instead.
[hadoop@hadoop0 sbin]$
In Hadoop 3.x the slaves file has been replaced by workers, so copying the old host list over is enough: cat slaves > workers.
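A minimal fix, run in the configuration directory (paths as in this article; renaming the old file afterwards is my own assumption, but it should stop the warning, which fires only when a slaves file is present):
cd /home/hadoop/hadoop-3.2.1/etc/hadoop
cat slaves > workers   # workers now carries the worker host list
mv slaves slaves.bak   # keep a copy but get the deprecated file out of the way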
6. The browser cannot reach ports 50070 and 50030; curl returns Connection refused
[hadoop@hadoop1 hadoop]$ curl hadoop0:50070
curl: (7) Failed connect to hadoop0:50070; Connection refused
[hadoop@hadoop1 hadoop]$ curl hadoop0:50030
curl: (7) Failed connect to hadoop0:50030; Connection refused
[hadoop@hadoop1 hadoop]$
This is because many default ports changed in 3.x; the cluster monitoring UI is now reached on 8088 (the YARN ResourceManager web UI). The ports changed in Hadoop 3.0 map as follows:
Namenode ports:
50470 --> 9871
50070 --> 9870
8020 --> 9820
Secondary NN ports:
50091 --> 9869
50090 --> 9868
Datanode ports:
50020 --> 9867
50010 --> 9866
50475 --> 9865
50075 --> 9864
Of these, the port most of us know best is probably 8020, the HDFS (NameNode RPC) access port.
[hadoop@hadoop1 hadoop]$ curl http://hadoop0:8088
[hadoop@hadoop1 hadoop]$ curl http://hadoop0:9070
curl: (7) Failed connect to hadoop0:9070; Connection refused
[hadoop@hadoop1 hadoop]$ curl http://hadoop0:9870
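Rather than memorizing the new numbers, the actual addresses can be read from the running configuration with the stock HDFS CLI, and the listening sockets confirmed with ss (a sketch):
hdfs getconf -confKey dfs.namenode.http-address             # NameNode web UI, default 0.0.0.0:9870
hdfs getconf -confKey dfs.namenode.secondary.http-address   # Secondary NN web UI, default 0.0.0.0:9868
ss -lntp | grep -E '9870|8088'                              # confirm the daemons are really listening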
**7. jps shows the expected processes on every node, yet hdfs dfsadmin -report lists only one live datanode.** On the master node, jps correctly shows the NameNode, ResourceManager, SecondaryNameNode, and JobHistoryServer processes, and the data nodes show NodeManager and DataNode; but hdfs dfsadmin -report counts a single live node:
[hadoop@hadoop0 sbin]$ jps
6048 NodeManager
5923 ResourceManager
6435 Jps
5333 NameNode
5466 DataNode
5678 SecondaryNameNode
[hadoop@hadoop2 local]$ jps
1920 DataNode
2028 NodeManager
2125 Jps
[hadoop@hadoop2 local]$
[hadoop@hadoop1 hadoop]$ ps -ef|grep hadoop
root 1795 1187 0 17:59 pts/0 00:00:00 su hadoop
hadoop 1796 1795 0 17:59 pts/0 00:00:00 bash
hadoop 1886 1 47 18:02 00:00:17 /usr/local/java/bin/java -Dproc_datanode -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/hadoop/hadoop-3.2.1/logs -Dyarn.log.file=hadoop-hadoop-datanode-hadoop1.log -Dyarn.home.dir=/home/hadoop/hadoop-3.2.1 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-3.2.1/lib/native -Dhadoop.log.dir=/home/hadoop/hadoop-3.2.1/logs -Dhadoop.log.file=hadoop-hadoop-datanode-hadoop1.log -Dhadoop.home.dir=/home/hadoop/hadoop-3.2.1 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
hadoop 1994 1 99 18:02 00:00:28 /usr/local/java/bin/java -Dproc_nodemanager -Djava.net.preferIPv4Stack=true -Dyarn.log.dir=/home/hadoop/hadoop-3.2.1/logs -Dyarn.log.file=hadoop-hadoop-nodemanager-hadoop1.log -Dyarn.home.dir=/home/hadoop/hadoop-3.2.1 -Dyarn.root.logger=INFO,console -Djava.library.path=/home/hadoop/hadoop-3.2.1/lib/native -Dhadoop.log.dir=/home/hadoop/hadoop-3.2.1/logs -Dhadoop.log.file=hadoop-hadoop-nodemanager-hadoop1.log -Dhadoop.home.dir=/home/hadoop/hadoop-3.2.1 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager
[hadoop@hadoop0 sbin]$ hdfs dfsadmin -report
Configured Capacity: 50432839680 (46.97 GB)
Present Capacity: 48499961856 (45.17 GB)
DFS Remaining: 48499957760 (45.17 GB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:9866 (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 50432839680 (46.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 1932877824 (1.80 GB)
DFS Remaining: 48499957760 (45.17 GB)
DFS Used%: 0.00%
DFS Remaining%: 96.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 30 18:06:22 CST 2020
Last Block Report: Mon Mar 30 18:02:31 CST 2020
Num of Blocks: 0
[hadoop@hadoop0 sbin]$
This one was puzzling for a long time; the logs finally revealed the problem:
[hadoop@hadoop1 logs]$ tail -f hadoop-hadoop-datanode-hadoop1.log
2020-03-30 18:10:07,322 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hadoop0/192.168.2.130:49000
2020-03-30 18:10:13,325 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:14,327 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:15,329 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:16,331 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:17,333 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:18,335 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:19,337 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:20,339 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-03-30 18:10:21,341 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop0/192.168.2.130:49000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
^C
[hadoop@hadoop1 logs]$
The log shows the datanode itself starts fine, but when it talks to the master it reports "Problem connecting to server: hadoop0/192.168.2.130:49000" and then retries indefinitely.
So port 49000 on hadoop0 is unreachable; the next step is to check what that port is listening on from hadoop0 itself:
[root@hadoop0 ~]# netstat -anl |grep 49000
tcp 0 0 127.0.0.1:49000 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:49000 127.0.0.1:44680 ESTABLISHED
tcp 0 0 127.0.0.1:44714 127.0.0.1:49000 TIME_WAIT
tcp 0 0 127.0.0.1:44680 127.0.0.1:49000 ESTABLISHED
[root@hadoop0 ~]#
The port is bound only to 127.0.0.1:49000, which points at a bad /etc/hosts entry:
[root@hadoop0 ~]# cat /etc/hosts|grep '127.0.0.1'
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 hadoop0
[root@hadoop0 ~]#
Sure enough, hadoop0 had been appended to the 127.0.0.1 line during setup, so the NameNode resolved its own hostname to loopback and bound there. Remove it and retry:
[root@hadoop0 ~]# cat /etc/hosts|grep '127.0.0.1'
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
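Once the entry is removed and the cluster restarted (below), the NameNode RPC port should bind to the real interface instead of loopback; a quick way to confirm (the expected output here is illustrative, not captured from the cluster):
netstat -anl | grep 49000
# expect something like: tcp  0  0 192.168.2.130:49000  0.0.0.0:*  LISTEN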
[hadoop@hadoop0 sbin]$ . start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [hadoop0]
Starting datanodes
Starting secondary namenodes [hadoop0]
Starting resourcemanager
Starting nodemanagers
[hadoop@hadoop0 sbin]$
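Before re-running the report, a quick loop over the workers confirms the daemons actually stayed up this time (a sketch; hostnames as in this article):
for h in hadoop1 hadoop2 hadoop3; do echo "== $h =="; ssh "$h" jps; done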
[hadoop@hadoop0 sbin]$ hdfs dfsadmin -report
Configured Capacity: 201731358720 (187.88 GB)
Present Capacity: 196838965248 (183.32 GB)
DFS Remaining: 196838948864 (183.32 GB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (4):
Name: 192.168.2.130:9866 (hadoop0)
Hostname: hadoop0
Decommission Status : Normal
Configured Capacity: 50432839680 (46.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 1933074432 (1.80 GB)
DFS Remaining: 48499761152 (45.17 GB)
DFS Used%: 0.00%
DFS Remaining%: 96.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 30 18:21:22 CST 2020
Last Block Report: Mon Mar 30 18:20:34 CST 2020
Num of Blocks: 0
Name: 192.168.2.131:9866 (hadoop1)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 50432839680 (46.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 986411008 (940.71 MB)
DFS Remaining: 49446424576 (46.05 GB)
DFS Used%: 0.00%
DFS Remaining%: 98.04%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 30 18:21:20 CST 2020
Last Block Report: Mon Mar 30 18:20:40 CST 2020
Num of Blocks: 0
Name: 192.168.2.132:9866 (hadoop2)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 50432839680 (46.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 986542080 (940.84 MB)
DFS Remaining: 49446293504 (46.05 GB)
DFS Used%: 0.00%
DFS Remaining%: 98.04%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 30 18:21:20 CST 2020
Last Block Report: Mon Mar 30 18:20:41 CST 2020
Num of Blocks: 0
Name: 192.168.2.133:9866 (hadoop3)
Hostname: hadoop3
Decommission Status : Normal
Configured Capacity: 50432839680 (46.97 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 986365952 (940.67 MB)
DFS Remaining: 49446469632 (46.05 GB)
DFS Used%: 0.00%
DFS Remaining%: 98.04%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Mar 30 18:21:20 CST 2020
Last Block Report: Mon Mar 30 18:20:39 CST 2020
Num of Blocks: 0
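One leftover detail worth noting: hadoop1 and hadoop2 still report "Hostname: localhost" above, which suggests their own /etc/hosts files map 127.0.0.1 to their hostnames in the same way; the same check and fix applies on each of them (a sketch):
for h in hadoop1 hadoop2; do echo "== $h =="; ssh "$h" "grep '127.0.0.1' /etc/hosts"; done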