Common Hadoop Development Exceptions and Their Solutions

Contents

  • 1. MapReduce map-side join fails with 'winutils.exe symlink xxx/position.txt \tmp\xxx\position.txt' failed 1 with: CreateSymbolicLink error (1314)
  • 2. no resourcemanager to stop / no nodemanager to stop when stopping YARN with the scripts
  • 3. Hadoop source build fails with [ERROR] xxx.java:864: 警告: 没有 @return
  • 4. Hive custom UDF build fails with Failure to find org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde
  • 5. Hive embedded-mode metastore init fails with Error: FUNCTION 'NUCLEUS_ASCII' already exists
  • 6. Hue build fails with /usr/bin/ld: cannot find -lcrypto and /usr/bin/ld: cannot find -lssl
  • 7. Hue build fails with EnvironmentError: mysql_config not found
  • 8. Starting Impala fails with Unit not found
  • 9. Starting HDFS after installing Impala fails with java.io.IOException

1. MapReduce map-side join fails with 'winutils.exe symlink xxx/position.txt \tmp\xxx\position.txt' failed 1 with: CreateSymbolicLink error (1314)

When developing MapReduce jobs on Hadoop, a map-side join is a common scenario, and it usually requires adding a cache file in the Driver.
At runtime, however, the job may fail like this:

INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1986965861_0001
INFO [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Creating symlink: \tmp\xxx\position.txt <- xxx/position.txt
WARN [org.apache.hadoop.fs.FileUtil] - Command 'xxx\winutils.exe symlink xxx\position.txt \tmp\xxx\position.txt' failed 1 with: CreateSymbolicLink error (1314): ???????????

WARN [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Failed to create symlink: \tmp\xxx\position.txt <- xxx/position.txt
INFO [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Localized file:/xxx/position.txt as file:/xxx/position.txt
INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local1986965861_0001
java.lang.Exception: java.io.FileNotFoundException: position.txt (系统找不到指定的文件。)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551)
Caused by: java.io.FileNotFoundException: position.txt (系统找不到指定的文件。)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
...

The immediate problem is that winutils.exe fails while creating the symbolic link, so the expected temporary file never appears on the local Windows machine; the job then cannot find its cache file and cannot run.

Cause:
MapReduce is touching a restricted path, and the Windows account does not hold the privilege required to create symbolic links; in other words, it is a permissions problem.
The fix:
(1) Press Win+R to open the Run dialog, type gpedit.msc to open the Local Group Policy Editor, then navigate to Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment, open the Create symbolic links policy, and add your Windows account to it.
Note that the machine must be restarted after this change.
(2) If typing gpedit.msc in the Run dialog produces Windows cannot find 'gpedit.msc'. Make sure you typed the name correctly, and then try again., the Group Policy Editor is missing and needs to be reinstalled; see https://blog.csdn.net/qq_41731507/article/details/115875247 for a walkthrough, then reopen the editor, make the change, and restart.

2. no resourcemanager to stop / no nodemanager to stop when stopping YARN with the scripts

Stopping the YARN cluster with stop-yarn.sh sometimes produces output like:

[root@node03 ~]$ stop-yarn.sh 
stopping yarn daemons
no resourcemanager to stop
node01: no nodemanager to stop
node02: no nodemanager to stop
node03: no nodemanager to stop
no proxyserver to stop

Yet jps on each node still shows ResourceManager and NodeManager running, so the stop did not actually succeed.

Cause:
yarn-daemon.sh records the pids of the ResourceManager and NodeManager services in pid files, whose default location is /tmp. The system periodically cleans that directory, so the pid files can disappear; once they are gone, the stop script cannot find them and prints the errors above.

Fix:
To solve this once and for all, change the pid file path in yarn-daemon.sh, which lives under sbin in the Hadoop installation directory. Start on one node and edit the block around line 88:

if [ "$YARN_PID_DIR" = "" ]; then
  # YARN_PID_DIR=/tmp
  YARN_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi

As shown, the original value of YARN_PID_DIR was /tmp; here it is changed to /opt/software/hadoop-2.9.2/data/pids, but any path you prefer will do.
Create the directory by hand as well: mkdir -p /opt/software/hadoop-2.9.2/data/pids.
After editing the script and creating the directory, distribute yarn-daemon.sh and the pids directory to the other nodes with your sync script, or repeat the steps manually on each node.
Then stop the still-running ResourceManager and NodeManager processes on each node with kill -9 <pid>, and the next start-yarn.sh will create the pid files under the chosen directory (/opt/software/hadoop-2.9.2/data/pids).
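Since the pid files are gone, the orphaned daemons have to be located by process name before they can be killed. A hedged sketch of parsing jps-style output (canned sample output is used here; on a real node you would capture `jps_out=$(jps)` instead):

```shell
# jps prints "<pid> <MainClass>"; pick out the YARN daemon pids to kill.
# Canned sample output below stands in for a live cluster node.
jps_out='12345 ResourceManager
23456 NodeManager
34567 Jps'
for daemon in ResourceManager NodeManager; do
  pid=$(printf '%s\n' "$jps_out" | awk -v d="$daemon" '$2 == d {print $1}')
  echo "$daemon pid: $pid"
  # [ -n "$pid" ] && kill -9 "$pid"   # uncomment when run on the real node
done
```
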

Extension:
Stopping HDFS or the HistoryServer can hit the same problem, e.g. no namenode to stop or no historyserver to stop. The cause is the same as for YARN, and the corresponding pid paths need the same change:
For Hadoop, edit hadoop-daemon.sh around line 114:

if [ "$HADOOP_PID_DIR" = "" ]; then
  # HADOOP_PID_DIR=/tmp
  HADOOP_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi

For the HistoryServer, edit mr-jobhistory-daemon.sh around line 87:

if [ "$HADOOP_MAPRED_PID_DIR" = "" ]; then
  # HADOOP_MAPRED_PID_DIR=/tmp
  HADOOP_MAPRED_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi

Then distribute the scripts to the other nodes, stop any running services by hand first, and restarting them through the scripts will likewise create the pid files under the chosen directory.

3. Hadoop source build fails with [ERROR] xxx.java:864: 警告: 没有 @return

Hadoop sources sometimes need custom modifications, after which they must be compiled and packaged with Maven. The packaging step can fail like this:

...
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/generated-sources/avro/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinished.java:864: 警告: 没有 @return
[ERROR]     public org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinished.Builder clearPhysMemKbytes() {
     
[ERROR]                                                                              ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:190: 警告: 没有 @return
[ERROR]   public TaskID getTaskId() {
     
[ERROR]                 ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:194: 警告: 没有 @return
[ERROR]   public TaskAttemptID getAttemptId() {
     
[ERROR]                        ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:199: 警告: 没有 @return
[ERROR]   public TaskType getTaskType() {
     
[ERROR]                   ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:203: 警告: 没有 @return
[ERROR]   public String getTaskStatus() {
      return taskStatus.toString(); }
[ERROR]                 ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:205: 警告: 没有 @return
[ERROR]   public long getMapFinishTime() {
      return mapFinishTime; }
...
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:63: 警告: @param 没有说明
[ERROR]    * @param startTime
[ERROR]      ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:64: 警告: @param 没有说明
[ERROR]    * @param finishTime
[ERROR]      ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:65: 警告: @param 没有说明
[ERROR]    * @param counters
[ERROR]      ^
[ERROR] 
[ERROR] Command line was: /opt/software/java/jdk1.8.0_231/jre/../bin/javadoc @options @packages
[ERROR] 
[ERROR] Refer to the generated Javadoc files in '/opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target' dir.
[ERROR] 
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hadoop-mapreduce-client-core

The errors all look like xxx.java:864: 警告: 没有 @return (javadoc running under a Chinese locale; the message means warning: no @return) or [ERROR] * @param counters.
A quick look shows these are Javadoc complaints about malformed @param and @return documentation in the Java sources. The simplest way out is to skip Javadoc generation by appending -D maven.javadoc.skip=true to the build command; the full command is mvn package -P dist,native -D skipTests -D tar -D maven.javadoc.skip=true, and the build then completes without error.
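If the build is run often, the same switch can be set once in the project's pom instead of on the command line. A sketch, assuming the standard maven.javadoc.skip user property read by the maven-javadoc-plugin:

```xml
<!-- Skips Javadoc generation for every module that inherits this pom. -->
<properties>
    <maven.javadoc.skip>true</maven.javadoc.skip>
</properties>
```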

4. Hive custom UDF build fails with Failure to find org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde

When using Hive you sometimes need a user-defined function (UDF). After writing the class it must be packaged, and the build may fail with an error like the following:

Failure to find org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde in http://maven.aliyun.com/nexus/content/repositories/central/ was cached in the local repository, resolution will not be reattempted until the update interval of alimaven has elapsed or updates are forced

[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] 读取XXX\.m2\repository\org\pentaho\pentaho-aggdesigner-algorithm\5.1.5-jhyde\pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar时出错; zip END header not found
[ERROR] XXX/HiveUDFDemo/src/main/java/com/bigdata/hive/nvl.java:[1,1] 无法访问com.bigdata.hive
  zip END header not found
[INFO] 2 errors 

Clearly the pentaho-aggdesigner-algorithm artifact is the problem. Fix it by excluding it from the hive-exec dependency in the project's pom.xml, adding the following:

<exclusions>
    <exclusion>
        <groupId>org.pentaho</groupId>
        <artifactId>pentaho-aggdesigner-algorithm</artifactId>
    </exclusion>
</exclusions>

The complete pom.xml:


<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>HiveUDFDemo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.7</version>
            <exclusions>
                <exclusion>
                    <groupId>org.pentaho</groupId>
                    <artifactId>pentaho-aggdesigner-algorithm</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

</project>

Restart IDEA (or reimport the Maven project), and the packaging succeeds.
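The zip END header not found message means the cached artifact under the local repository (the `.m2\repository\org\pentaho\...` path in the error above) is truncated. Another common remedy is to delete the corrupt cached artifact so Maven re-downloads it on the next build. A sketch on a scratch directory standing in for the local repository:

```shell
# Simulate deleting a corrupt artifact from a scratch "local repository"
# (on a real machine the path sits under ~/.m2/repository).
repo=$(mktemp -d)
mkdir -p "$repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde"
touch "$repo/org/pentaho/pentaho-aggdesigner-algorithm/5.1.5-jhyde/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar"
# Remove the whole artifact directory; Maven re-resolves it on the next build.
rm -rf "$repo/org/pentaho/pentaho-aggdesigner-algorithm"
ls "$repo/org/pentaho"   # empty: the stale artifact is gone
```
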

5. Hive embedded-mode metastore init fails with Error: FUNCTION 'NUCLEUS_ASCII' already exists

Hive's metastore can be configured in three modes: embedded, local, and remote. In embedded mode, initializing the metastore database (the step whose log reads schema initialization to 2.3.0 and which generates the metastore_db directory) may fail with Error: FUNCTION 'NUCLEUS_ASCII' already exists:

Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.

This happens because a metastore_db directory already exists under the Hive installation path, so the initialization command conflicts with it when trying to create the directory.
Fix:
Rename or simply delete the metastore_db directory under the installation path.
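The fix can be sketched as follows, using a scratch directory to stand in for the Hive installation directory:

```shell
# Stand-in for the Hive install dir; on a real node cd into it instead.
hive_home=$(mktemp -d)
mkdir "$hive_home/metastore_db"                              # stale dir from a previous init
mv "$hive_home/metastore_db" "$hive_home/metastore_db.bak"   # rename (or rm -rf) it
ls "$hive_home"                                              # only metastore_db.bak remains
# Then re-run the metastore initialization, e.g.: schematool -dbType derby -initSchema
```
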

6. Hue build fails with /usr/bin/ld: cannot find -lcrypto and /usr/bin/ld: cannot find -lssl

Hue is a web UI for the Hadoop ecosystem and is installed by compiling it from source with make install. The build may fail like this:

/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
collect2: 错误:ld 返回 1
error: command 'gcc' failed with exit status 1
make[2]: *** [/opt/software/hue-release-4.3.0/desktop/core/build/python-ldap-2.3.13/egg.stamp] 错误 1
make[2]: 离开目录“/opt/software/hue-release-4.3.0/desktop/core”
make[1]: *** [.recursive-install-bdist/core] 错误 2
make[1]: 离开目录“/opt/software/hue-release-4.3.0/desktop”
make: *** [install-desktop] 错误 2

Analysis:
The linker cannot find libssl and libcrypto. yum info openssl shows that OpenSSL is installed, so look further:

[root@node02 lib64]$ ll /usr/lib64/libssl*
-rwxr-xr-x. 1 root root 340976 Sep 27 2018 /usr/lib64/libssl3.so
lrwxrwxrwx. 1 root root     16 Aug 19 03:23 /usr/lib64/libssl.so.10 -> libssl.so.1.0.2k
-rwxr-xr-x. 1 root root 470360 Oct 31 2018 /usr/lib64/libssl.so.1.0.2k

So the root cause is that while the versioned libssl shared objects exist, there is no file named exactly libssl.so, and the linker cannot find one.

Fix:
Add symlinks so the shared objects are reachable under the names ld looks for:

[root@node02 lib64]$ ln -s /usr/lib64/libssl.so.1.0.2k /usr/lib64/libssl.so
[root@node02 lib64]$ ln -s /usr/lib64/libcrypto.so.1.0.2k /usr/lib64/libcrypto.so
[root@node02 lib64]$ ll /usr/lib64/libssl*
-rwxr-xr-x. 1 root root 340976 Sep 27 2018 /usr/lib64/libssl3.so
lrwxrwxrwx  1 root root     27 Oct  3 13:28 /usr/lib64/libssl.so -> /usr/lib64/libssl.so.1.0.2k
lrwxrwxrwx. 1 root root     16 Aug 19 03:23 /usr/lib64/libssl.so.10 -> libssl.so.1.0.2k
-rwxr-xr-x. 1 root root 470360 Oct 31 2018 /usr/lib64/libssl.so.1.0.2k
[root@node02 lib64]$ ll /usr/lib64/libcrypto*
lrwxrwxrwx 1 root root      19 Oct  3 13:32 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.2k
lrwxrwxrwx 1 root root      19 Oct  3 13:32 /usr/lib64/libcrypto.so.10 -> libcrypto.so.1.0.2k
-rwxr-xr-x 1 root root 2520768 Dec 17 2020 /usr/lib64/libcrypto.so.1.0.2k

The symlinks are now in place.
Recompiling now succeeds.
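The underlying rule is that ld resolves -lssl by searching for a file literally named libssl.so; a versioned runtime file alone is invisible to it. A minimal illustration, using a made-up library name in a scratch directory:

```shell
# ld turns "-lfoo" into a search for libfoo.so (then libfoo.a); the versioned
# file libfoo.so.1.0.2k alone does not satisfy it without the dev symlink.
d=$(mktemp -d)
touch "$d/libfoo.so.1.0.2k"                  # versioned library file (stand-in)
ln -s "$d/libfoo.so.1.0.2k" "$d/libfoo.so"   # unversioned name the linker wants
readlink "$d/libfoo.so"
```
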

7. Hue build fails with EnvironmentError: mysql_config not found

Building Hue requires the MySQL development files, so the build may fail like this:

EnvironmentError: mysql_config not found
make[2]: *** [/opt/software/hue-release-4.3.0/desktop/core/build/MySQL-python-1.2.5/egg.stamp] 错误 1
make[2]: 离开目录“/opt/software/hue-release-4.3.0/desktop/core”
make[1]: *** [.recursive-install-bdist/core] 错误 2
make[1]: 离开目录“/opt/software/hue-release-4.3.0/desktop”
make: *** [install-desktop] 错误 2

Install mysql-devel by running yum -y install mysql-devel.
The build then succeeds.

8. Starting Impala fails with Unit not found

An Impala deployment consists of three roles:

  • impala-server
  • impala-statestored
  • impala-catalogd

After installation and configuration, each role is started separately; starting impala-state-store and impala-catalog may fail:

[root@node03 ~]$ service impala-state-store start
Redirecting to /bin/systemctl start impala-state-store.service
Failed to start impala-state-store.service: Unit not found.
[root@node03 ~]$ service impala-catalog start
Redirecting to /bin/systemctl start impala-catalog.service
Failed to start impala-catalog.service: Unit not found.

The Unit not found. error means systemd cannot find the installed services. Check what is installed:

[root@node03 ~]$ yum list | grep impala
impala.x86_64                               2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-catalog.x86_64                       2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-server.x86_64                        2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-shell.x86_64                         2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-state-store.x86_64                   2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
hue-impala.x86_64                           3.9.0+cdh5.7.6+1881-1.cdh5.7.6.p0.7.el7
impala-debuginfo.x86_64                     2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-udf-devel.x86_64                     2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7

Both packages are clearly installed, so something likely went wrong during installation and the services were never registered with systemd. The list of registered services can be viewed with systemctl list-unit-files --type=service.
Fix:
Remove the two services with yum remove impala-state-store.x86_64 -y and yum remove impala-catalog.x86_64 -y, reinstall them with yum -y install impala-state-store and yum -y install impala-catalog, and then starting them succeeds.

9. Starting HDFS after installing Impala fails with java.io.IOException

Installing Impala involves a number of Hadoop-side configuration changes, and problems can surface afterwards; for example, the DataNodes may fail to start with HDFS. The logs show:

2021-10-10 20:44:40,037 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: The path component: '/var/lib/hadoop-hdfs' in '/var/lib/hadoop-hdfs/dn_socket' has permissions 0755 uid 993 and gid 1003. It is not protected because it is owned by a user who is not root and not the effective user: '0'. This might help: 'chown root /var/lib/hadoop-hdfs' or 'chown 0 /var/lib/hadoop-hdfs'. For more information: https://wiki.apache.org/hadoop/SocketPathSecurity
        at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
        at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:193)
        at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:1171)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:1137)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1369)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:495)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2695)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2598)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2645)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2789)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2813)
2021-10-10 20:44:40,046 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: The path component: '/var/lib/hadoop-hdfs' in '/var/lib/hadoop-hdfs/dn_socket' has permissions 0755 uid 993 and gid 1003. It is not protected because it is owned by a user who is not root and not the effective user: '0'. This might help: 'chown root /var/lib/hadoop-hdfs' or 'chown 0 /var/lib/hadoop-hdfs'. For more information: https://wiki.apache.org/hadoop/SocketPathSecurity
2021-10-10 20:44:40,052 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at node01/192.168.31.155
************************************************************/

Clearly, the root user lacks ownership of the /var/lib/hadoop-hdfs directory.
The Impala setup can enable HDFS short-circuit reads, which requires creating the domain-socket directory /var/lib/hadoop-hdfs; if root is not given ownership of it, startup can fail as above.
Fix:
Run chown root /var/lib/hadoop-hdfs on every node.
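The check the DataNode performs can be mimicked in shell: each component of the socket path must be owned by root (uid 0) or by the effective user. A sketch using a scratch directory in place of /var/lib/hadoop-hdfs:

```shell
# Mimic the DomainSocket path check: the directory's owner must be
# root or the effective user, otherwise the DataNode refuses to bind.
dir=$(mktemp -d)                 # stand-in for /var/lib/hadoop-hdfs
owner=$(stat -c '%u' "$dir")
if [ "$owner" -eq 0 ] || [ "$owner" -eq "$(id -u)" ]; then
  echo "path OK"
else
  echo "fix with: chown root $dir"
fi
```
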
