hdfs raid从facebook移植过来很久了,包括hadoop0.20.203和hadoop2.4.0版本,但是最近才准备上线hadoop2.4.0版本的hdfs raid,上线前准备在好好测试测试,确保上线顺利,hdfs raid代码分成两部分,一部分是hdfs下面的代码,这部分代码上一次和其他patch一起已经上线,另外一部分是raid自己比较独立的代码,按照之前的计划,后一部分代码准备只部署在RaidNode节点和gateway上面,这样对集群的影响是最小的,不需要重启hadoop进程,在测试RaidNode的时候一切都很正常,但是昨天在gateway上面测试hdfs raid配置对提交job的影响时出现了问题,其实在gateway上面hdfs raid增加的配置比较简单,增加的配置是fs.hdfs.impl,值为DistributedRaidFileSystem,该类大部分功能主要由DistributedFileSystem来完成的,由配置项fs.raid.underlyingfs.impl来指定,DistributedRaidFileSystem自己有一个很大的功能就是数据恢复功能,当用户在gateway上面get数据时,如果要获取的数据出现数据块损坏,这个时候就由DistributedRaidFileSystem来完成数据恢复功能,同时gateway上面提交的job要处理的数据刚好数据块丢失时,也可以使用DistributedRaidFileSystem来恢复,hdfs-site.xml配置如下:
<!-- raid begin --> <property> <name>fs.hdfs.impl</name> <value>org.apache.hadoop.hdfs.DistributedRaidFileSystem</value> <description>The FileSystem for hdfs: uris.</description> </property> <property> <name>fs.raid.underlyingfs.impl</name> <value>org.apache.hadoop.hdfs.DistributedFileSystem</value> <description>The raid FileSystem underlying impl.</description> </property>
在gateway上面提交mapreduce 作业时,MRAppMaster启动失败,jobhistory上面显示报找不着DistributedRaidFileSystem类错误。
2014-12-24 23:04:14,518 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedRaidFileSystem not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedRaidFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2379)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getFileSystem(MRAppMaster.java:496)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:284)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1459)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedRaidFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 16 more
这个错误很好理解,在提交job的时候,JobConf(继承类Configuration)会读取gateway上面的配置文件,包括core-site.xml、hdfs-site.xml、yarn-site.xml和mapred-site.xml文件,然后序列化成job.xml,并上传到hdfs,然后再NodeManager启动container运行MRAppMaster时,首先会将job.xml和其他文件,比如job.jar和分布式缓存等会被下载到本地,然后启动MRAppMaster,在MRAppMaster启动并初始化服务的时候有一个判断hdfs上面配置项yarn.app.mapreduce.am.staging-dir对应的目录是否存在,由于在NodeManager上面DistributedRaidFileSystem类对应的jar包没有部署,出现这个错误也能够给理解。
接下来就解决这个问题,我们的hdfs raid在tools下面,我手动将tools下面raid的jar包设置到NodeManager启动classpath里面去,这样NodeManager重启时就能拿到raid下面的class了。然后重启NodeManager,在gateway上面提交job,但是发现还是报相同的错误,既找不着类,我看了一下NodeManager启动的classpath:
-classpath /usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/etc/hadoop:/usr/local/hadoop-2.4.0/share/hadoop/tools:/usr/local/hadoop-2.4.0/share/hadoop/tools/*:/usr/local/hadoop-2.4.0/share/hadoop/tools/lib/hadoop-raid-2.4.0.jar:/usr/local/hadoop-2.4.0/share/hadoop/common/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/common/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/hdfs/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-2.4.0/share/hadoop/mapreduce/*:/usr/local/hadoop-2.4.0/contrib/capacity-scheduler/*.jar:/usr/local/hadoop-2.4.0/contrib/capacity-scheduler/*.jar:/usr/local/hadoop-2.4.0/share/hadoop/yarn/*:/usr/local/hadoop-2.4.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-2.4.0/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
在classpath中hadoop-raid-2.4.0.jar是存在的,那为啥还说找不着类呢,难不成MRAppMaster启动的时候不是继承父进程的classpath,没办法,那我就打印出container启动时给MRAppMaster设置的环境变量:
2014-12-24 13:11:46,547 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: liangjun: HADOOP_YARN_HOME=/usr/local/hadoop-2.4.0 2014-12-24 13:11:46,547 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: liangjun: CLASSPATH= $PWD:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/share/hadoop/common/*:$HADOOP_COMMON_HOME/share/hadoop/common/lib/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/*:$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*:$HADOOP_YARN_HOME/share/hadoop/yarn/*:$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/* 2014-12-24 13:11:46,547 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: heliangjun: HADOOP_TOKEN_FILE_LOCATION=/data0/hadoop/local/usercache/hadoop/appcache/application_1418709224786_2697/container_1418709224786_2697_01_000001/container_tokens
MRApps.setClasspath(environment,conf)方法进行环境变量设置,代码如下:
// Setup the CLASSPATH in environment // i.e. add { Hadoop jars, job jar, CWD } to classpath. Map<String, String> environment = new HashMap<String, String>(); MRApps.setClasspath(environment, conf); // Setup the environment variables for Admin first MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV), conf); // Setup the environment variables (LD_LIBRARY_PATH, etc) MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV), conf);进一步查看 MRApps.setClasspath( environment , conf )方法细节,发现该方法调用了MRApps.setMRFrameworkClasspath( environment, conf)方法来设置MR框架的classpath,代码如下:
@SuppressWarnings("deprecation") public static void setClasspath(Map<String, String> environment, Configuration conf) throws IOException { boolean userClassesTakesPrecedence = conf.getBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, false); String classpathEnvVar = conf.getBoolean(MRJobConfig.MAPREDUCE_JOB_CLASSLOADER, false) ? Environment.APP_CLASSPATH.name() : Environment.CLASSPATH.name(); MRApps.addToEnvironment(environment, classpathEnvVar, crossPlatformifyMREnv(conf, Environment.PWD), conf); if (!userClassesTakesPrecedence) { MRApps.setMRFrameworkClasspath(environment, conf); } MRApps.addToEnvironment( environment, classpathEnvVar, MRJobConfig.JOB_JAR + Path.SEPARATOR + MRJobConfig.JOB_JAR, conf); MRApps.addToEnvironment( environment, classpathEnvVar, MRJobConfig.JOB_JAR + Path.SEPARATOR + "classes" + Path.SEPARATOR, conf); MRApps.addToEnvironment( environment, classpathEnvVar, MRJobConfig.JOB_JAR + Path.SEPARATOR + "lib" + Path.SEPARATOR + "*", conf); MRApps.addToEnvironment( environment, classpathEnvVar, crossPlatformifyMREnv(conf, Environment.PWD) + Path.SEPARATOR + "*", conf); // a * in the classpath will only find a .jar, so we need to filter out // all .jars and add everything else addToClasspathIfNotJar(DistributedCache.getFileClassPaths(conf), DistributedCache.getCacheFiles(conf), conf, environment, classpathEnvVar); addToClasspathIfNotJar(DistributedCache.getArchiveClassPaths(conf), DistributedCache.getCacheArchives(conf), conf, environment, classpathEnvVar); if (userClassesTakesPrecedence) { MRApps.setMRFrameworkClasspath(environment, conf); } }
回归正题,MRApps.setMRFrameworkClasspath(environment, conf)方法的代码:
private static void setMRFrameworkClasspath( Map<String, String> environment, Configurationconf) throws IOException { // Propagate the system classpath when using the mini cluster if (conf.getBoolean(YarnConfiguration.IS_MINI_YARN_CLUSTER,false)) { MRApps.addToEnvironment(environment, Environment.CLASSPATH.name(), System.getProperty("java.class.path"),conf); } booleancrossPlatform = conf.getBoolean(MRConfig.MAPREDUCE_APP_SUBMISSION_CROSS_PLATFORM, MRConfig.DEFAULT_MAPREDUCE_APP_SUBMISSION_CROSS_PLATFORM); // if the framework is specified then only use the MRclasspath String frameworkName = getMRFrameworkName(conf); if (frameworkName ==null) { // Add standard Hadoop classes for (String c :conf.getStrings(YarnConfiguration.YARN_APPLICATION_CLASSPATH, crossPlatform ? YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH : YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) { MRApps.addToEnvironment(environment, Environment.CLASSPATH.name(), c.trim(), conf); } } booleanfoundFrameworkInClasspath = (frameworkName ==null); for (String c :conf.getStrings(MRJobConfig.MAPREDUCE_APPLICATION_CLASSPATH, crossPlatform ? StringUtils.getStrings(MRJobConfig.DEFAULT_MAPREDUCE_CROSS_PLATFORM_APPLICATION_CLASSPATH) : StringUtils.getStrings(MRJobConfig.DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH))) { MRApps.addToEnvironment(environment, Environment.CLASSPATH.name(), c.trim(), conf); if (!foundFrameworkInClasspath) { foundFrameworkInClasspath =c.contains(frameworkName); } } if (!foundFrameworkInClasspath) { throw new IllegalArgumentException( "Could not locate MapReduce framework name '" +frameworkName + "' in " + MRJobConfig.MAPREDUCE_APPLICATION_CLASSPATH); } // TODO: Remove duplicates. }
该方法设置了两个常量值到MRAppMaster的classpath里面去:
一个是yarn相关的,YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH:
/** * <p> * Default platform-specific CLASSPATH for YARN applications. A * comma-separated list of CLASSPATH entries constructed based on the client * OS environment expansion syntax. * </p> * <p> * Note: Use {@link DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH} for * cross-platform practice i.e. submit an application from a Windows client to * a Linux/Unix server or vice versa. * </p> */ public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = { ApplicationConstants.Environment.HADOOP_CONF_DIR.$(), ApplicationConstants.Environment.HADOOP_COMMON_HOME.$() + "/share/hadoop/common/*", ApplicationConstants.Environment.HADOOP_COMMON_HOME.$() + "/share/hadoop/common/lib/*", ApplicationConstants.Environment.HADOOP_HDFS_HOME.$() + "/share/hadoop/hdfs/*", ApplicationConstants.Environment.HADOOP_HDFS_HOME.$() + "/share/hadoop/hdfs/lib/*", ApplicationConstants.Environment.HADOOP_YARN_HOME.$() + "/share/hadoop/yarn/*", ApplicationConstants.Environment.HADOOP_YARN_HOME.$() + "/share/hadoop/yarn/lib/*" };
** * Default platform-specific CLASSPATH for all YARN MapReduce applications * constructed based on client OS syntax. * <p> * Note: Use {@link DEFAULT_MAPREDUCE_CROSS_PLATFORM_APPLICATION_CLASSPATH} * for cross-platform practice i.e. submit an application from a Windows * client to a Linux/Unix server or vice versa. * </p> */ public final String DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH = Shell.WINDOWS ? "%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*," + "%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\lib\\*" : "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*," + "$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*";