Earlier I described building a hadoop 2.7.2 cluster with CentOS 6.4 virtual machines under Ubuntu. For MapReduce development I want to use eclipse, which needs the matching plugin hadoop-eclipse-plugin-2.7.2.jar. A quick note first: up through hadoop 1.x the official hadoop release shipped an eclipse plugin, but as developers' eclipse versions multiplied and diverged, the plugin had to match the IDE, and no bundled jar could be compatible with them all. To keep things simple, current hadoop releases no longer include an eclipse plugin; everyone builds one against their own eclipse.
So we build our own eclipse plugin with ant. My environment and tools:
Ubuntu 14.04 (the OS does not matter, Windows works the same way); IDE: eclipse-jee-mars-2-linux-gtk-x86_64.tar.gz
ant (also flexible: a binary install or apt-get both work; just set up the environment variables)
export ANT_HOME=/usr/local/ant/apache-ant-1.9.7
export PATH=$PATH:$ANT_HOME/bin
If ant complains that it cannot find ant-launcher.jar, add it to the classpath:
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar:$ANT_HOME/lib/ant-launcher.jar
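To make the settings permanent, the exports can go into ~/.bashrc; a minimal sketch, assuming ant was unpacked to /usr/local/ant/apache-ant-1.9.7 as above:

# append the exports to ~/.bashrc, reload the shell, then verify
echo 'export ANT_HOME=/usr/local/ant/apache-ant-1.9.7' >> ~/.bashrc
echo 'export PATH=$PATH:$ANT_HOME/bin' >> ~/.bashrc
source ~/.bashrc
ant -version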
hadoop@hadoop:~$ ant -version
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Building the eclipse plugin with ant requires the hadoop2x-eclipse-plugin sources; here is the GitHub project:
https://github.com/winghc/hadoop2x-eclipse-plugin
Download it as a zip and unzip it to a suitable path. Make sure the path's permissions and ownership belong to the current user.
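For example (a sketch: the archive URL follows GitHub's usual zip naming for the master branch, and the target directory and owner are assumptions matching my setup):

wget https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip -O hadoop2x-eclipse-plugin-master.zip
unzip hadoop2x-eclipse-plugin-master.zip -d /home/hadoop/
# make sure the current user owns the unpacked tree
chown -R hadoop:hadoop /home/hadoop/hadoop2x-eclipse-plugin-master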
The paths of the three tools/resources involved in the build are as follows:
hadoop@hadoop:~$ cd hadoop2x-eclipse-plugin-master
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ pwd
/home/hadoop/hadoop2x-eclipse-plugin-master
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ cd /opt/software/hadoop-2.7.2
hadoop@hadoop:hadoop-2.7.2$ pwd
/opt/software/hadoop-2.7.2
hadoop@hadoop:hadoop-2.7.2$ cd /home/hadoop/eclipse/
hadoop@hadoop:eclipse$ pwd
/home/hadoop/eclipse
Unzip the downloaded hadoop2x-eclipse-plugin, go into hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/, and run the build as the project describes:
How to build
[hdpusr@demo hadoop2x-eclipse-plugin]$ cd src/contrib/eclipse-plugin
# Assume hadoop installation directory is /usr/share/hadoop
[hdpusr@apclt eclipse-plugin]$ ant jar -Dversion=2.4.1 -Dhadoop.version=2.4.1 -Declipse.home=/opt/eclipse -Dhadoop.home=/usr/share/hadoop
final jar will be generated at directory
${hadoop2x-eclipse-plugin}/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.4.1.jar
But what I need is the 2.7.2 eclipse plugin, while the hadoop2x-eclipse-plugin sources from GitHub are configured for a hadoop 2.6 build, so before running ant the build.xml and related files have to be modified.
First file: hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml
At line 83, find the <target name="jar" depends="compile" unless="skip.contrib"> element and add/adjust the following copy child elements,
that is, below line 127:
<copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
Then find the <attribute name="Bundle-ClassPath" element and add/adjust the corresponding lib entries in its value list, as follows:
lib/servlet-api-${servlet-api.version}.jar,
lib/commons-io-${commons-io.version}.jar,
lib/htrace-core-${htrace.version}-incubating.jar"/>
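For orientation, the edited attribute ends up shaped roughly like this (only a sketch: the bracketed placeholder stands for the entries already present in the value list, which stay untouched, and only the three added jars are spelled out):

<attribute name="Bundle-ClassPath"
           value="classes/,
                  [existing lib/*.jar entries, unchanged],
                  lib/servlet-api-${servlet-api.version}.jar,
                  lib/commons-io-${commons-io.version}.jar,
                  lib/htrace-core-${htrace.version}-incubating.jar"/>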
Just adding these libs is not enough, though: many jar versions under share/hadoop/common/lib/ changed between hadoop 2.6 and hadoop 2.7, so the corresponding version numbers have to be updated as well. This cost me half a day, matching them one by one.
Note that this version configuration file lives in the ivy directory at the root of hadoop2x-eclipse-plugin-master, i.e. hadoop2x-eclipse-plugin-master/ivy/libraries.properties.
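Rather than matching versions by guesswork, the versions actually shipped with hadoop 2.7.2 can be read straight from its lib directory and compared with what the plugin build expects; a small sketch using my paths:

# jars bundled with this hadoop release
ls /opt/software/hadoop-2.7.2/share/hadoop/common/lib/ | sort
# versions the plugin build expects
grep -v '^#' /home/hadoop/hadoop2x-eclipse-plugin-master/ivy/libraries.properties | sort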
The end result of my edits is shown below.
To save you the trouble I paste the whole file here; the lines commented out with # are the original values that were replaced.
hadoop.version=2.7.2
hadoop-gpl-compression.version=0.1.0
#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10
asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11
checkstyle.version=4.2
commons-cli.version=1.2
commons-codec.version=1.4
# commons-collections.version=3.2.1
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
# commons-httpclient.version=3.0.1
commons-httpclient.version=3.1
commons-lang.version=2.6
# commons-logging.version=1.0.4
commons-logging.version=1.1.3
# commons-logging-api.version=1.0.4
commons-logging-api.version=1.1.3
# commons-math.version=2.1
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
# commons-io.version=2.1
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2
# hsqldb.version=1.8.0.10
# htrace.version=3.0.4
hsqldb.version=2.0.0
htrace.version=3.1.0
ivy.version=2.1.0
jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
# jets3t.version=0.6.1
jets3t.version=0.9.0
jetty.version=6.1.26
jetty-util.version=6.1.26
# jersey-core.version=1.8
# jersey-json.version=1.8
# jersey-server.version=1.8
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
# junit.version=4.5
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0
kfs.version=0.1
log4j.version=1.2.17
lucene-core.version=2.3.1
mockito-all.version=1.8.5
jsch.version=0.1.42
oro.version=2.0.8
rats-lib.version=0.5.1
servlet.version=4.0.6
servlet-api.version=2.5
# slf4j-api.version=1.7.5
# slf4j-log4j12.version=1.7.5
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10
wagon-http.version=1.0-beta-2
xmlenc.version=0.52
# xerces.version=1.4.4
xerces.version=2.9.1
protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final
With these edits done, the hard part is over; time to run ant.
Go into src/contrib/eclipse-plugin/ and run the ant command as follows.
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ cd src/contrib/eclipse-plugin/
hadoop@hadoop:eclipse-plugin$ ls
build.properties  build.xml.bak  ivy.xml      META-INF    resources
build.xml         ivy            makePlus.sh  plugin.xml  src
hadoop@hadoop:eclipse-plugin$ ant jar -Dhadoop.version=2.7.2 -Declipse.home=/home/hadoop/eclipse -Dhadoop.home=/opt/software/hadoop-2.7.2
This is a bit slow the first time; later runs are fast.
When the output ends like this, the ant build has succeeded:
compile:
     [echo] contrib: eclipse-plugin
    [javac] /home/hadoop/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml:76: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
      [jar] Building jar: /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar
BUILD SUCCESSFUL
Total time: 4 seconds
hadoop@hadoop:eclipse-plugin$
Now copy the freshly built plugin into eclipse's plugins directory,
then restart eclipse, or relaunch it from the shell as shown below; starting it from the shell also prints eclipse's startup output, which makes it easy to find the cause if something goes wrong.
hadoop@hadoop:eclipse-plugin$ cp /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar /home/hadoop/eclipse/plugins/
hadoop@hadoop:eclipse-plugin$ /home/hadoop/eclipse/eclipse -clean
Pick your workspace and enter eclipse. Click Window and open Preferences: the list now contains an extra entry, Hadoop Map/Reduce; point it at a hadoop installation directory.
Open the Map/Reduce Locations view; the familiar little elephant icon appears. Add a new M/R location and configure it as follows.
The Location name can be anything, but the Map/Reduce Master settings must correspond exactly to the core-site.xml and mapred-site.xml of your own hadoop cluster or pseudo-distributed setup; if they are wrong the connection will fail.
My configuration is shown below: Host is hadoop (the name of my master node; the master's IP address works just as well), and the ports are 9000 (the file system port) and 9001 (the jobtracker port of the MapReduce management node).
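For reference, the matching entries in this kind of setup would look roughly like the sketch below; the property names are the classic ones that the plugin's Master settings correspond to, and the host name and ports are from my cluster, so adjust them to your own *-site.xml files:

<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:9000</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop:9001</value>
</property>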
Then start the hadoop cluster, do a quick test in the shell, and exercise the plugin from eclipse: transfer files via DFS Locations, and write test programs against the FileSystem API and the MapReduce API. The point here is just to verify that the plugin works. The hdfs part is simple enough to try on your own, so here I test an MR program: phone-call statistics. The data format is shown below, with the caller's number on the left and the called number on the right; the job tallies how often each number was called and lists who called it.
11500001211 10086
11500001212 10010
15500001213 110
15500001214 120
11500001211 10010
11500001212 10010
15500001213 10086
15500001214 110
The code is as follows.
package hdfs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MR extends Configured implements Tool {

    enum Counter {
        LINESKIP,   // lines that could not be parsed
    }

    public static class WCMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            try {
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0];   // calling number
                String bnum = lineSplit[1];   // called number
                // emit <called number, calling number>
                context.write(new Text(bnum), new Text(anum));
            } catch (Exception e) {
                context.getCounter(Counter.LINESKIP).increment(1);  // error counter +1
            }
        }
    }

    public static class IntSumReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // concatenate all callers of this number, separated by "|"
            String out = "";
            for (Text value : values) {
                out += value.toString() + "|";
            }
            context.write(key, new Text(out));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] strs = new GenericOptionsParser(conf, args).getRemainingArgs();
        Job job = parseInputAndOutput(this, conf, strs);
        if (job == null) {
            return -1;
        }
        job.setJarByClass(MR.class);
        FileInputFormat.addInputPath(job, new Path(strs[0]));
        FileOutputFormat.setOutputPath(job, new Path(strs[1]));
        job.setMapperClass(WCMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        // job.setCombinerClass(IntSumReduce.class);
        job.setReducerClass(IntSumReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws Exception {
        // validate the arguments
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>%n",
                    tool.getClass().getSimpleName());
            return null;
        }
        // create the job
        return Job.getInstance(conf, tool.getClass().getSimpleName());
    }

    public static void main(String[] args) throws Exception {
        // run the map reduce job and exit with its status
        int status = ToolRunner.run(new MR(), args);
        System.exit(status);
    }
}
The files are uploaded to hdfs as follows.
hadoop@hadoop:~$ hdfs dfs -mkdir -p /user/hadoop/mr/wc/input
hadoop@hadoop:~$ hdfs dfs -put top.data /user/hadoop/mr/wc/input
Now run the MR program from eclipse.
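When launching it as a Java application from eclipse, the two program arguments are the input and output paths. A minimal sketch, assuming the directory created above and the fs.defaultFS from the earlier configuration (the output directory must not exist before the run):

hdfs://hadoop:9000/user/hadoop/mr/wc/input hdfs://hadoop:9000/user/hadoop/mr/wc/output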
The run succeeds, the eclipse console prints the execution steps, and we can look at the result.
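The result can also be checked from the shell; a sketch, assuming the output path used above (part-r-* is the standard reducer output file name):

hadoop@hadoop:~$ hdfs dfs -cat /user/hadoop/mr/wc/output/part-r-*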
This shows that the plugin works without any problems.