hadoop distcp: pitfalls and fixes

DistCp (distributed copy) is a tool for copying data within a large cluster or between clusters. It uses MapReduce to carry out the file distribution, error handling and recovery, and reporting. It takes a list of files and directories as the input to its map tasks, and each map task copies a portion of the files in the source list.
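As a point of reference, how many pieces the source file list is split into, i.e. the number of map tasks, can be controlled with distcp's -m option. The paths below are placeholders rather than anything from this cluster:

# copy a directory using at most 20 map tasks (parallel copies)
hadoop distcp -m 20 hdfs://source-nn1:9000/user/src_dir hdfs://dest-nn1:9000/dst_dir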


1. Executing on nn1

hadoop distcp hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn1:9000/

It failed with the following error:

19/10/19 17:34:17 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://source-nn1:9000/user/xxx.txt], targetPath=hdfs://dest-nn1:9000/, targetPathExists=true, preserveRawXattrs=false}
19/10/19 17:34:18 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.reflect.InvocationTargetException
19/10/19 17:34:18 ERROR tools.DistCp: Exception encountered 
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
	at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
	at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
	at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:379)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

Some searching suggested that the hadoop-mapreduce-client-common jar was missing. However, /home/work/hadoopcluster/hadoop/share/hadoop/mapreduce does contain hadoop-mapreduce-client-common-2.6.0.jar.
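A quick sanity check in this situation is to confirm that the mapreduce jars are actually on the client classpath, not just present on disk; hadoop classpath prints what the client JVM will see:

hadoop classpath | tr ':' '\n' | grep mapreduce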


2. Executing on nn2

hadoop distcp hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn2:9000/

It failed with the following error:

19/10/21 11:06:05 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://source-nn1:9000/user/xxx.txt], targetPath=hdfs://dest-nn2/, targetPathExists=true, preserveRawXattrs=false}
19/10/21 11:07:37 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create over /192.168.4.168:9000. Not retrying because retries (11) exceeded maximum allowed (10)
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/tmp/hadoop-yarn/staging/work/.staging/_distcp1370007999/fileList.seq. Name node is in safe mode.
The reported blocks 0 needs additional 11 blocks to reach the threshold 0.9990 of total blocks 11.
The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1368)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2630)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2519)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:566)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:394)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/tmp/hadoop-yarn/staging/work/.staging/_distcp1370007999/fileList.seq. Name node is in safe mode.

"Name node is in safe mode" says it plainly: the NameNode is in safe mode, so write operations against HDFS are rejected and the filesystem is effectively read-only.

When HDFS starts up it first enters safe mode. While the file system is in safe mode its contents cannot be modified or deleted, until safe mode ends. Safe mode exists mainly so that, at startup, the NameNode can check the validity of the data blocks on each DataNode and, according to policy, replicate or delete blocks as needed. Safe mode can also be entered at runtime with a command. In practice, trying to modify or delete files right after the system starts will hit this same "safe mode" error; usually you only need to wait a short while.

The NameNode enters safe mode when it starts; if the fraction of blocks missing from the DataNodes exceeds a certain ratio (1 - dfs.safemode.threshold.pct), the system stays in safe mode, i.e. read-only.

dfs.safemode.threshold.pct (default 0.999f) means that, at startup, HDFS may only leave safe mode once the number of blocks reported by the DataNodes reaches 0.999 of the block count recorded in the metadata; until then it remains read-only. If the value is set to 1, the NameNode will not leave safe mode until every single block has been reported, so a single missing block keeps HDFS read-only indefinitely.
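To see what the threshold is set to on a given cluster you can query the configuration directly. On Hadoop 2.x the property is named dfs.namenode.safemode.threshold-pct; the old name dfs.safemode.threshold.pct is kept as a deprecated alias:

hdfs getconf -confKey dfs.namenode.safemode.threshold-pct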


Solution:

Leave safe mode: hadoop dfsadmin -safemode leave


Safe mode can be controlled with dfsadmin -safemode <value>, where value is one of:

enter - enter safe mode

leave - force the NameNode to leave safe mode

get - report whether safe mode is on

wait - block until safe mode ends
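If forcing the NameNode out of safe mode feels too heavy-handed, a gentler pattern is to wait for it to exit on its own and only then start the copy, for example:

hadoop dfsadmin -safemode wait && hadoop distcp hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn2:9000/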


After leaving safe mode and running the command again, the copy succeeded.


Summary:

The same distcp command was run on both NameNode hosts: it failed on nn1 but succeeded on nn2. At the time nn1 was in the standby state and nn2 was active, so my guess is that distcp can only be submitted from the host whose NameNode is currently active.
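To check which NameNode is currently active before submitting, hdfs haadmin can be used; nn1 and nn2 below stand for whatever service IDs dfs.ha.namenodes.<nameservice> defines on your cluster:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2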


3. Queue configuration

After queues were configured on the Hadoop cluster, running the earlier distcp command failed again with the following error:

19/10/25 17:35:11 ERROR tools.DistCp: Exception encountered 
java.io.IOException: Failed to run job : Application application_1571828871124_0002 submitted by user work to unknown queue: default,bigdata
	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:162)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

Solution:

Clearly no concrete queue was specified for the job. A queue can be passed with -Dmapreduce.job.queuename=default, i.e.:

hadoop distcp -Dmapreduce.job.queuename=default hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn2:9000/


Running that command still failed:

19/10/25 17:37:10 ERROR tools.DistCp: Exception encountered 
java.io.IOException: Failed to run job : Application application_1571828871124_0003 submitted by user work to non-leaf queue: default
	at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:162)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)

Solution:

Then I remembered that the default and bigdata queues are each split into three sub-queues, so the job has to be submitted to a concrete leaf queue, for example:

hadoop distcp -Dmapreduce.job.queuename=hql hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn2:9000/

Run it again and it succeeds.
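If you are not sure which leaf queues exist, the queue hierarchy can be listed from the client first (hql above is simply the sub-queue name on this particular cluster):

mapred queue -list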


4. distcp caveats

(1) Every node in the destination cluster needs hosts entries for all machines in the source cluster;

(2) The other cluster can be addressed either by ip:port or by hostname:port;

(3) The distcp job should be run on the active NameNode host of the destination cluster, and nothing about the source cluster needs to be added to the Hadoop configuration files;

(4) On a cluster with queues configured, distcp must be pointed at a concrete leaf queue via -Dmapreduce.job.queuename (see the combined example below).
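Putting these points together, a typical invocation on a queue-enabled cluster, run from the destination cluster's active NameNode host, looks roughly like the sketch below; the queue name, map count and paths are placeholders to adapt to your environment:

# -Dmapreduce.job.queuename : leaf queue to submit the copy job to
# -m 20                     : use at most 20 map tasks (parallel copies)
# -update                   : only copy files that are missing or changed on the target
# -pbugp                    : preserve block size, user, group and permissions
hadoop distcp -Dmapreduce.job.queuename=hql -m 20 -update -pbugp \
  hdfs://source-nn1:9000/user/xxx.txt hdfs://dest-nn2:9000/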

