Setting HDFS dfs.replication

While executing a Hive statement, I ran into this:


2014-12-12 10:21:48,709 INFO  [pool-4-thread-491]: exec.Task (SessionState.java:printInfo(538)) - 2014-12-12 10:21:48,708 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2181.62 sec
2014-12-12 10:21:49,931 INFO  [pool-4-thread-491]: mapred.ClientServiceDelegate (ClientServiceDelegate.java:getProxy(273)) - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server
2014-12-12 10:21:50,088 ERROR [pool-4-thread-491]: exec.Task (SessionState.java:printError(547)) - Ended Job = job_1417594562342_0536 with exception 'java.io.IOException(Could not find status of job:job_1417594562342_0536)'
java.io.IOException: Could not find status of job:job_1417594562342_0536
        at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:294)
        at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
        at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
        at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:144)
        at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:68)
        at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:199)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:500)
        at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

2014-12-12 10:21:50,097 WARN  [pool-4-thread-491]: exec.Utilities (Utilities.java:clearWork(251)) - Failed to clean-up tmp directories.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hive-hive/hive_2014-12-12_10-18-08_920_5780140361268508076-74/-mr-10004/3d645a6c-b66b-4b98-8e8b-4e34b7764c99/map.xml. Name node is in safe mode.
Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE:  If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1207)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3308)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3292)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:733)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:547)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)


I was thoroughly puzzled, until I noticed:

Safe Mode Status: In safe mode
HDFS had entered safe mode.
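The same thing can be confirmed from the shell; dfsadmin reports the NameNode's current safe mode state ("Safe mode is ON" in this situation, "Safe mode is OFF" on a healthy cluster):

hdfs dfsadmin -safemode get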


Why would executing Hive SQL send HDFS into safe mode?

My first guess was a problem with the SQL, or a bug in Hive... But then again, Hive is quite mature by now; a bug this obvious would have been found long ago.


Then I reconsidered: if HDFS went into safe mode, could the root cause be the HDFS file system itself?

I ran the following command:

hdfs fsck / -blocks

And bang, there it was:

/user/oozie/share/lib/sharelib.properties:  Under replicated BP-1568389693-10.10.11.231-1404794792018:blk_1073742198_1374. Target Replicas is 3 but found 2 replica(s).
.
/user/oozie/share/lib/sqoop/commons-io-2.1.jar:  Under replicated BP-1568389693-10.10.11.231-1404794792018:blk_1073742199_1375. Target Replicas is 3 but found 2 replica(s).
.
/user/oozie/share/lib/sqoop/hsqldb-1.8.0.10.jar:  Under replicated BP-1568389693-10.10.11.231-1404794792018:blk_1073742200_1376. Target Replicas is 3 but found 2 replica(s).
.
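On a large namespace the per-file lines and progress dots pile up; the summary block at the end of the fsck report also totals the problem blocks, and you can grab just that line (a quick sketch, assuming a standard grep on the client machine):

hdfs fsck / | grep -i 'under-replicated'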

So it really was an HDFS problem:

I had already set dfs.replication to 2, so why hadn't it taken effect automatically? The catch is that dfs.replication is only a default applied at file creation time; files written before the change keep the replication factor they were created with, which is why these old files still expected 3 replicas.
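As a sanity check on which default the client is actually resolving from hdfs-site.xml, the Hadoop 2.x hdfs getconf tool prints the effective value of a configuration key:

hdfs getconf -confKey dfs.replication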

At this point the problem was clear.

The remaining step was to change the replication factor of the files that already existed:

hdfs dfs -setrep -R -w 2 /

After this command finished, every file in the file system was at a replication factor of 2. Problem solved.
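Note that -w makes the command block until the replication changes have been applied, which can take a while when run against the whole of /. Afterwards, the earlier checks should come back clean; and if the NameNode were still stuck in safe mode, the error message above already spells out the manual escape hatch (hdfs dfsadmin -safemode leave):

hdfs fsck / -blocks
hdfs dfsadmin -safemode get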
