When developing MapReduce jobs in Hadoop, there are scenarios that call for a map-side join, which generally requires adding a cache file in the Driver.
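For reference, a minimal sketch of this pattern looks roughly like the following. Only the file name position.txt is taken from the log further below; the class name, record layout, and local path are illustrative assumptions rather than the original code:
// Sketch of a map-side join using a distributed cache file (assumed names and layout).
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PositionJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private final Map<String, String> positionMap = new HashMap<>();

    // In the Driver, the small table is registered as a cache file, e.g.:
    //   job.addCacheFile(new URI("file:///<local path>/position.txt"));

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The task expects the cache file to be reachable by its bare file name through a
        // symlink in the working directory; if symlink creation fails, this open throws
        // FileNotFoundException.
        URI[] cacheFiles = context.getCacheFiles();
        String fileName = new Path(cacheFiles[0].getPath()).getName();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t");   // assumed tab-separated: id \t position name
                positionMap.put(fields[0], fields[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Join each big-table record against positionMap here; pass-through shown for brevity.
        context.write(value, NullWritable.get());
    }
}
The setup() method opens the cache file by its plain file name, which relies on the symlink the job runner creates in the task working directory; that symlink is exactly what fails to be created in the error below.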
But at run time, the job may fail with an error like this:
INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local1986965861_0001
INFO [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Creating symlink: \tmp\xxx\position.txt <- xxx/position.txt
WARN [org.apache.hadoop.fs.FileUtil] - Command 'xxx\winutils.exe symlink xxx\position.txt \tmp\xxx\position.txt' failed 1 with: CreateSymbolicLink error (1314): ???????????
WARN [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Failed to create symlink: \tmp\xxx\position.txt <- xxx/position.txt
INFO [org.apache.hadoop.mapred.LocalDistributedCacheManager] - Localized file:/xxx/position.txt as file:/xxx/position.txt
INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local1986965861_0001
java.lang.Exception: java.io.FileNotFoundException: position.txt (系统找不到指定的文件。)
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551)
Caused by: java.io.FileNotFoundException: position.txt (系统找不到指定的文件。)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
...
The core of the exception is that winutils.exe fails to create the symbolic link, so the expected local temporary file never appears on Windows; the program then cannot find the cache file and cannot run.
The reason is:
MapReduce is accessing a restricted path/location, and the Windows account running the job does not hold the privilege needed to create symbolic links (CreateSymbolicLink error 1314 corresponds to "a required privilege is not held by the client"), i.e., this is a permissions problem.
The solution is as follows:
(1) Press Win+R to open the Run dialog, type gpedit.msc, and open the Local Group Policy Editor. Then, as shown in the figure, grant the current account the right to create symbolic links (Computer Configuration > Windows Settings > Security Settings > Local Policies > User Rights Assignment > Create symbolic links).
Note that the computer must be restarted after making the change.
(2) If typing gpedit.msc in the Run dialog produces the prompt "Windows cannot find 'gpedit.msc'. Make sure you typed the name correctly, and then try again.", the Group Policy Editor is missing and needs to be installed; see https://blog.csdn.net/qq_41731507/article/details/115875247 for a fix, then reopen the Group Policy Editor, make the change above, and restart.
When stopping the YARN cluster with the stop-yarn.sh script, it sometimes reports errors such as:
[root@node03 ~]$ stop-yarn.sh
stopping yarn daemons
no resourcemanager to stop
node01: no nodemanager to stop
node02: no nodemanager to stop
node03: no nodemanager to stop
no proxyserver to stop
However, running jps on each node shows that the ResourceManager and NodeManager processes are still alive, i.e., they were not actually stopped.
Cause:
yarn-daemon.sh writes the pids of the ResourceManager and NodeManager services to pid files, and the default location for those files is /tmp. The system periodically cleans that directory, so the pid files may be gone; when the stop script cannot find them, it prints the errors above.
Solution:
To fix this once and for all, start on one node and change the pid file path in yarn-daemon.sh, which lives in the sbin directory of the Hadoop installation. Edit the file around line 88 so that it reads:
if [ "$YARN_PID_DIR" = "" ]; then
# YARN_PID_DIR=/tmp
YARN_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi
As shown, YARN_PID_DIR originally pointed to /tmp; here it is changed to /opt/software/hadoop-2.9.2/data/pids, but any path of your choosing will do.
Also create the directory by hand: mkdir /opt/software/hadoop-2.9.2/data/pids.
After editing the script and creating the directory, distribute yarn-daemon.sh and the pids directory to the other nodes with your distribution script, or repeat the same steps manually on each node.
Then stop the YARN ResourceManager and NodeManager on each node with kill -9 <pid>, and run start-yarn.sh again; the corresponding pid files will now be created under the configured directory (/opt/software/hadoop-2.9.2/data/pids).
Extension:
Similar problems can occur when stopping Hadoop and the HistoryServer, e.g. no namenode to stop and no historyserver to stop. The cause is the same as for YARN, and the corresponding pid file paths need to be changed as well:
For Hadoop, edit hadoop-daemon.sh around line 114:
if [ "$HADOOP_PID_DIR" = "" ]; then
# HADOOP_PID_DIR=/tmp
HADOOP_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi
For the HistoryServer, edit mr-jobhistory-daemon.sh around line 87:
if [ "$HADOOP_MAPRED_PID_DIR" = "" ]; then
# HADOOP_MAPRED_PID_DIR=/tmp
HADOOP_MAPRED_PID_DIR=/opt/software/hadoop-2.9.2/data/pids
fi
Then distribute the scripts to the other nodes, stop the running services manually first, and start them again with the scripts; the pid files will be generated under the configured directory.
Sometimes the Hadoop source code needs secondary development; once the changes are done, it must be compiled and packaged with Maven before it can be used, and the packaging can fail with output like the following:
...
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/generated-sources/avro/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinished.java:864: 警告: 没有 @return
[ERROR] public org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinished.Builder clearPhysMemKbytes() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:190: 警告: 没有 @return
[ERROR] public TaskID getTaskId() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:194: 警告: 没有 @return
[ERROR] public TaskAttemptID getAttemptId() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:199: 警告: 没有 @return
[ERROR] public TaskType getTaskType() {
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:203: 警告: 没有 @return
[ERROR] public String getTaskStatus() {
return taskStatus.toString(); }
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/MapAttemptFinishedEvent.java:205: 警告: 没有 @return
[ERROR] public long getMapFinishTime() {
return mapFinishTime; }
...
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:63: 警告: @param 没有说明
[ERROR] * @param startTime
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:64: 警告: @param 没有说明
[ERROR] * @param finishTime
[ERROR] ^
[ERROR] /opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/TaskReport.java:65: 警告: @param 没有说明
[ERROR] * @param counters
[ERROR] ^
[ERROR]
[ERROR] Command line was: /opt/software/java/jdk1.8.0_231/jre/../bin/javadoc @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/opt/packages/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target' dir.
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :hadoop-mapreduce-client-core
That is, the errors are of the form xxx.java:864: 警告: 没有 @return ("warning: no @return") and [ERROR] * @param counters ("@param has no description").
A quick look shows they come from the Javadoc build: some Javadoc comments in the Java sources are missing @return and @param descriptions. The simplest workaround is to skip building the Javadoc by appending -D maven.javadoc.skip=true to the build command; the full command is mvn package -P dist,native -D skipTests -D tar -D maven.javadoc.skip=true, and the build then completes without this error.
When using Hive, you sometimes need a user-defined function (UDF). After defining the class, you package it with Maven, and the packaging may fail with errors like the following:
Failure to find org.pentaho:pentaho-aggdesigner-algorithm:pom:5.1.5-jhyde in http://maven.aliyun.com/nexus/content/repositories/central/ was cached in the local repository, resolution will not be reattempted until the update interval of alimaven has elapsed or updates are forced
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] 读取XXX\.m2\repository\org\pentaho\pentaho-aggdesigner-algorithm\5.1.5-jhyde\pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar时出错; zip END header not found
[ERROR] XXX/HiveUDFDemo/src/main/java/com/bigdata/hive/nvl.java:[1,1] 无法访问com.bigdata.hive
zip END header not found
[INFO] 2 errors
Clearly, the pentaho-aggdesigner-algorithm artifact is the problem (the cached jar is corrupt). Fix it by adjusting the hive-exec dependency in the project's pom.xml and adding the following exclusion:
<exclusions>
    <exclusion>
        <groupId>org.pentaho</groupId>
        <artifactId>pentaho-aggdesigner-algorithm</artifactId>
    </exclusion>
</exclusions>
The complete pom.xml looks like this:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>HiveUDFDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.7</version>
            <exclusions>
                <exclusion>
                    <groupId>org.pentaho</groupId>
                    <artifactId>pentaho-aggdesigner-algorithm</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</project>
Restart IDEA and the packaging will then succeed.
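For context, the UDF being packaged here (nvl.java in package com.bigdata.hive, as seen in the error output) might look something like the following minimal sketch; only the class and package names come from the log, and the implementation is an assumption:
package com.bigdata.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative nvl-style UDF written against hive-exec 2.3.7 (the classic UDF/evaluate API):
// returns the first argument if it is non-null, otherwise the second.
public class nvl extends UDF {
    public Text evaluate(Text value, Text defaultValue) {
        return value == null ? defaultValue : value;
    }
}
Once the jar builds, it can be added in Hive and the function registered with CREATE TEMPORARY FUNCTION as usual; the exclusion above only affects the build, not the UDF logic.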
Hive's metadata can be configured in three modes: embedded, local, and remote. In embedded mode, running the schema initialization command (whose output reports schema initialization to 2.3.0) to create the metastore directory metastore_db may fail with Error: FUNCTION 'NUCLEUS_ASCII' already exists, for example:
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
This happens because a metastore_db directory already exists under the Hive installation path, so creating it again conflicts with the existing one.
Solution:
Rename the existing metastore_db directory under the installation directory, or simply delete it, and the problem goes away.
Hue is a visualization framework for Hadoop. It is installed by compiling from source with make install, and the compilation may fail like this:
/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
collect2: 错误:ld 返回 1
error: command 'gcc' failed with exit status 1
make[2]: *** [/opt/software/hue-release-4.3.0/desktop/core/build/python-ldap-2.3.13/egg.stamp] 错误 1
make[2]: 离开目录“/opt/software/hue-release-4.3.0/desktop/core”
make[1]: *** [.recursive-install-bdist/core] 错误 2
make[1]: 离开目录“/opt/software/hue-release-4.3.0/desktop”
make: *** [install-desktop] 错误 2
Analysis:
The linker cannot find libssl and libcrypto. yum info openssl shows that openssl is already installed, and a further check gives:
[root@node02 lib64]$ ll /usr/lib64/libssl*
-rwxr-xr-x. 1 root root 340976 9月 27 2018 /usr/lib64/libssl3.so
lrwxrwxrwx. 1 root root 16 8月 19 03:23 /usr/lib64/libssl.so.10 -> libssl.so.1.0.2k
-rwxr-xr-x. 1 root root 470360 10月 31 2018 /usr/lib64/libssl.so.1.0.2k
So the root cause is that although the libssl shared library files exist, there is no file named exactly libssl.so, so the linker cannot find it (and the same goes for libcrypto.so).
Solution:
Add symbolic links pointing the expected names at the shared library files so that ld can find them:
[root@node02 lib64]$ ln -s /usr/lib64/libssl.so.1.0.2k /usr/lib64/libssl.so
[root@node02 lib64]$ ln -s /usr/lib64/libcrypto.so.1.0.2k /usr/lib64/libcrypto.so
[root@node02 lib64]$ ll /usr/lib64/libssl*
-rwxr-xr-x. 1 root root 340976 9月 27 2018 /usr/lib64/libssl3.so
lrwxrwxrwx 1 root root 27 10月 3 13:28 /usr/lib64/libssl.so -> /usr/lib64/libssl.so.1.0.2k
lrwxrwxrwx. 1 root root 16 8月 19 03:23 /usr/lib64/libssl.so.10 -> libssl.so.1.0.2k
-rwxr-xr-x. 1 root root 470360 10月 31 2018 /usr/lib64/libssl.so.1.0.2k
[root@node02 lib64]$ ll /usr/lib64/libcrypto*
lrwxrwxrwx 1 root root 19 10月 3 13:32 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.2k
lrwxrwxrwx 1 root root 19 10月 3 13:32 /usr/lib64/libcrypto.so.10 -> libcrypto.so.1.0.2k
-rwxr-xr-x 1 root root 2520768 12月 17 2020 /usr/lib64/libcrypto.so.1.0.2k
The symbolic links are now in place, and re-running the compilation succeeds.
Compiling Hue also requires MySQL, so the build may fail with:
EnvironmentError: mysql_config not found
make[2]: *** [/opt/software/hue-release-4.3.0/desktop/core/build/MySQL-python-1.2.5/egg.stamp] 错误 1
make[2]: 离开目录“/opt/software/hue-release-4.3.0/desktop/core”
make[1]: *** [.recursive-install-bdist/core] 错误 2
make[1]: 离开目录“/opt/software/hue-release-4.3.0/desktop”
make: *** [install-desktop] 错误 2
This means mysql-devel is missing; install it with yum -y install mysql-devel, and the compilation then succeeds.
Impala consists of three roles: impala-server, impala-state-store, and impala-catalog. After installation and configuration they need to be started separately, and starting impala-state-store and impala-catalog may fail:
[root@node03 ~]$ service impala-state-store start
Redirecting to /bin/systemctl start impala-state-store.service
Failed to start impala-state-store.service: Unit not found.
[root@node03 ~]$ service impala-catalog start
Redirecting to /bin/systemctl start impala-catalog.service
Failed to start impala-catalog.service: Unit not found.
That is, both report Unit not found., meaning systemd cannot find the installed services. Check what is actually installed:
[root@node03 ~]$ yum list | grep impala
impala.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-catalog.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-server.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-shell.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-state-store.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
hue-impala.x86_64 3.9.0+cdh5.7.6+1881-1.cdh5.7.6.p0.7.el7
impala-debuginfo.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
impala-udf-devel.x86_64 2.5.0+cdh5.7.6+0-1.cdh5.7.6.p0.7.el7
Both services are clearly installed, so something likely went wrong during installation and they were never registered in systemd's service list; you can inspect that list with systemctl list-unit-files --type=service.
Solution:
Remove the two services first with yum remove impala-state-store.x86_64 -y and yum remove impala-catalog.x86_64 -y, then reinstall them with yum -y install impala-state-store and yum -y install impala-catalog; after that they start successfully.
Installing Impala also requires some Hadoop-side configuration, and after making those changes problems can appear; for example, the DataNode may fail to start when HDFS is brought up. The DataNode log shows:
2021-10-10 20:44:40,037 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.io.IOException: The path component: '/var/lib/hadoop-hdfs' in '/var/lib/hadoop-hdfs/dn_socket' has permissions 0755 uid 993 and gid 1003. It is not protected because it is owned by a user who is not root and not the effective user: '0'. This might help: 'chown root /var/lib/hadoop-hdfs' or 'chown 0 /var/lib/hadoop-hdfs'. For more information: https://wiki.apache.org/hadoop/SocketPathSecurity
at org.apache.hadoop.net.unix.DomainSocket.validateSocketPathSecurity0(Native Method)
at org.apache.hadoop.net.unix.DomainSocket.bindAndListen(DomainSocket.java:193)
at org.apache.hadoop.hdfs.net.DomainPeerServer.<init>(DomainPeerServer.java:40)
at org.apache.hadoop.hdfs.server.datanode.DataNode.getDomainPeerServer(DataNode.java:1171)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:1137)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1369)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:495)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2695)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2598)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2645)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2789)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2813)
2021-10-10 20:44:40,046 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: The path component: '/var/lib/hadoop-hdfs' in '/var/lib/hadoop-hdfs/dn_socket' has permissions 0755 uid 993 and gid 1003. It is not protected because it is owned by a user who is not root and not the effective user: '0'. This might help: 'chown root /var/lib/hadoop-hdfs' or 'chown 0 /var/lib/hadoop-hdfs'. For more information: https://wiki.apache.org/hadoop/SocketPathSecurity
2021-10-10 20:44:40,052 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at node01/192.168.31.155
************************************************************/
Clearly, this is because the /var/lib/hadoop-hdfs directory is not owned by root.
When configuring short-circuit reads for Impala, a directory is created as the short-circuit read relay point (the domain socket directory), namely /var/lib/hadoop-hdfs; if root is not given ownership of it, startup can fail.
解决办法:
此时只需要在所有节点都执行chown root /var/lib/hadoop-hdfs
设置用户即可。