The MapReduce Startup Process in YARN/MRv2: The Client Side
Hadoop version: 0.23.1
Shell side
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar wordcount input output
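Client side
1. The bin/hadoop script
(This script parses the arguments of the hadoop command and hands them to the matching Java class. The parts relevant to running WordCount are shown below.)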
# Store the first argument (here the string "jar") in COMMAND
COMMAND=$1
# ...
# Check COMMAND: if it is "jar", set CLASS to org.apache.hadoop.util.RunJar
elif [ "$COMMAND" = "jar" ] ; then
  CLASS=org.apache.hadoop.util.RunJar
# ...
# Run java. For this example the call is roughly equivalent to:
#   $JAVA_HOME/bin/java org.apache.hadoop.util.RunJar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar wordcount input output
exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
2. RunJar.java
(This class loads the jar passed in as an argument and runs it. The relevant code is shown below.)
int firstArg = 0;
// Read the jar path. Note that fileName gets args[0]: the postfix ++
// returns the old value first and only then increments firstArg.
String fileName = args[firstArg++];
File file = new File(fileName);
String mainClassName = null;

JarFile jarFile;
try {
  jarFile = new JarFile(fileName);
} catch(IOException io) {
  throw new IOException("Error opening job jar: " + fileName)
    .initCause(io);
}

/* Get the jar's main class name. If you open
   hadoop-mapreduce-examples-0.23.1.jar with WinRAR, the file
   META-INF/MANIFEST.MF contains
     Main-Class: org.apache.hadoop.examples.ExampleDriver
   This entry is generated at packaging time. The class is declared in
   pom.xml as follows:
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-jar-plugin</artifactId>
     <configuration>
       <archive>
         <manifest>
           <mainClass>org.apache.hadoop.examples.ExampleDriver</mainClass>
         </manifest>
       </archive>
     </configuration>
   </plugin> */
Manifest manifest = jarFile.getManifest();
if (manifest != null) {
  mainClassName = manifest.getMainAttributes().getValue("Main-Class");
}
jarFile.close();

// No Main-Class in the manifest: take the class name from the next argument
if (mainClassName == null) {
  if (args.length < 2) {
    System.err.println(usage);
    System.exit(-1);
  }
  mainClassName = args[firstArg++];
}
mainClassName = mainClassName.replaceAll("/", ".");

File tmpDir = new File(new Configuration().get("hadoop.tmp.dir"));
ensureDirectory(tmpDir);

// Create the temporary working directory the jar is unpacked into
final File workDir;
try {
  workDir = File.createTempFile("hadoop-unjar", "", tmpDir);
} catch (IOException ioe) {
  // If user has insufficient perms to write to tmpDir, default
  // "Permission denied" message doesn't specify a filename.
  System.err.println("Error creating temp dir in hadoop.tmp.dir "
                     + tmpDir + " due to " + ioe.getMessage());
  System.exit(-1);
  return;
}

// createTempFile created a regular file; delete it and reuse the name as a directory
if (!workDir.delete()) {
  System.err.println("Delete failed for " + workDir);
  System.exit(-1);
}
ensureDirectory(workDir);

// Register a shutdown hook that deletes the temporary files on exit
Runtime.getRuntime().addShutdownHook(new Thread() {
    public void run() {
      FileUtil.fullyDelete(workDir);
    }
  });

unJar(file, workDir);

// Build the classpath: the unpacked directory, the jar itself, classes/ and every file under lib/
ArrayList<URL> classPath = new ArrayList<URL>();
classPath.add(new File(workDir+"/").toURI().toURL());
classPath.add(file.toURI().toURL());
classPath.add(new File(workDir, "classes/").toURI().toURL());
File[] libs = new File(workDir, "lib").listFiles();
if (libs != null) {
  for (int i = 0; i < libs.length; i++) {
    classPath.add(libs[i].toURI().toURL());
  }
}

ClassLoader loader =
  new URLClassLoader(classPath.toArray(new URL[0]));

// Load the jar's main class through reflection and call its main method
Thread.currentThread().setContextClassLoader(loader);
Class<?> mainClass = Class.forName(mainClassName, true, loader);
Method main = mainClass.getMethod("main", new Class[] {
  Array.newInstance(String.class, 0).getClass()
});
// Pass on only the arguments that RunJar has not consumed itself
String[] newArgs = Arrays.asList(args)
  .subList(firstArg, args.length).toArray(new String[0]);
try {
  main.invoke(null, new Object[] { newArgs });
} catch (InvocationTargetException e) {
  throw e.getTargetException();
}
}
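The Main-Class lookup and argument handling above can be tried in isolation. The following is a minimal sketch, not Hadoop code: the class ManifestMainClassDemo and its methods writeJar and resolve are names invented for this illustration, and the jar it builds is an empty temporary file that only carries a manifest. It uses nothing beyond the JDK's java.util.jar classes.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class (not part of Hadoop).
final class ManifestMainClassDemo {

    /** Writes a temporary jar whose manifest declares mainClass (or no Main-Class if null). */
    static File writeJar(String mainClass) {
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            Manifest manifest = new Manifest();
            // Manifest-Version must be set, otherwise the main attributes are not written.
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (mainClass != null) {
                manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
            }
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
                // The constructor already wrote META-INF/MANIFEST.MF; no other entries needed.
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Follows the same order as the excerpt: manifest Main-Class first, otherwise
     * the next command-line argument. Returns the main class name at index 0,
     * followed by the arguments that would be passed on to that class.
     */
    static String[] resolve(String[] args) {
        int firstArg = 0;
        String fileName = args[firstArg++]; // reads args[0], then firstArg becomes 1
        String mainClassName = null;
        try (JarFile jarFile = new JarFile(fileName)) {
            Manifest manifest = jarFile.getManifest();
            if (manifest != null) {
                mainClassName = manifest.getMainAttributes().getValue("Main-Class");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (mainClassName == null) {
            if (args.length < 2) {
                throw new IllegalArgumentException("no Main-Class in manifest and none given");
            }
            mainClassName = args[firstArg++];
        }
        String[] rest = Arrays.copyOfRange(args, firstArg, args.length);
        String[] result = new String[rest.length + 1];
        result[0] = mainClassName.replace('/', '.');
        System.arraycopy(rest, 0, result, 1, rest.length);
        return result;
    }

    public static void main(String[] unused) {
        File jar = writeJar("org.apache.hadoop.examples.ExampleDriver");
        String[] resolved = resolve(new String[] {jar.getPath(), "wordcount", "input", "output"});
        // prints [org.apache.hadoop.examples.ExampleDriver, wordcount, input, output]
        System.out.println(Arrays.toString(resolved));
    }
}
```

Because the manifest names a main class, "wordcount" is not consumed here and stays the first argument handed to that class, which is why ExampleDriver receives it in the next step.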
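3. ExampleDriver.java
(When wordcount is run, the command line does not name the class to execute; it only carries the string "wordcount". ExampleDriver resolves that string to the corresponding class and invokes it through ProgramDriver. The relevant code is shown below.)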
// Create the ProgramDriver and register "wordcount" together with its class
ProgramDriver pgd = new ProgramDriver();
try {
  pgd.addClass("wordcount", WordCount.class,
               "A map/reduce program that counts the words in the input files.");
  …
  // Run the program named by the incoming arguments, here wordcount
  exitCode = pgd.driver(argv);
}
catch(Throwable e){
  e.printStackTrace();
}
4. ProgramDriver.java
(The string wordcount is passed to driver, which is where WordCount.class actually gets executed.)
public int driver(String[] args) throws Throwable {
  …
  // Look up the ProgramDescription that wraps WordCount.class by the name "wordcount"
  ProgramDescription pgm = programs.get(args[0]);
  if (pgm == null) {
    System.out.println("Unknown program '" + args[0] + "' chosen.");
    printUsage(programs);
    return -1;
  }

  // Call the main method of WordCount.class through reflection
  // Remove the leading argument and call main
  String[] new_args = new String[args.length - 1];
  for(int i=1; i < args.length; ++i) {
    new_args[i-1] = args[i];
  }
  pgm.invoke(new_args);
  return 0;
}
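The name-to-class dispatch performed by ExampleDriver and ProgramDriver can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not the Hadoop implementation: MiniProgramDriver and FakeWordCount are made-up names, the description string and usage printing are left out, and the registry holds the Class objects directly instead of ProgramDescription wrappers.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for ProgramDriver (not the Hadoop class).
final class MiniProgramDriver {

    // Maps a program name such as "wordcount" to the class whose main method runs it.
    private final Map<String, Class<?>> programs = new TreeMap<>();

    void addClass(String name, Class<?> mainClass) {
        programs.put(name, mainClass);
    }

    /** Returns 0 after invoking the program, -1 if the name is missing or unknown. */
    int driver(String[] args) {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.out.println("Unknown program; valid names: " + programs.keySet());
            return -1;
        }
        // Drop the leading program name; the rest goes to the target's main.
        String[] newArgs = Arrays.copyOfRange(args, 1, args.length);
        try {
            Method main = programs.get(args[0]).getMethod("main", String[].class);
            // The cast keeps the array from being spread as varargs.
            main.invoke(null, (Object) newArgs);
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the target's main itself.
            throw new RuntimeException(e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
        return 0;
    }

    /** Hypothetical stand-in for WordCount: it only records the arguments it receives. */
    public static final class FakeWordCount {
        static String[] lastArgs;

        public static void main(String[] args) {
            lastArgs = args;
            System.out.println("FakeWordCount got " + Arrays.toString(args));
        }
    }

    public static void main(String[] unused) {
        MiniProgramDriver pgd = new MiniProgramDriver();
        pgd.addClass("wordcount", FakeWordCount.class);
        int exitCode = pgd.driver(new String[] {"wordcount", "input", "output"});
        System.out.println("exit code " + exitCode);
    }
}
```

With the arguments from the shell command, the stand-in class receives only input and output, which matches what WordCount's main sees in the next step.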
5. WordCount.java
(There is not much to say about WordCount: it sets a few job parameters and submits the job.)
public static void main(String[] args) throws Exception {
  …
  Job job = new Job(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
  FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
  // waitForCompletion(true) submits the Job and blocks until it finishes
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}
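6. From here on, Job hands the WordCount job to JobSubmitter, which passes it to a class implementing the ClientProtocol interface; that class performs the actual submission.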