Building the Eclipse plugin hadoop-eclipse-plugin-2.7.2.jar for Hadoop 2.7.2 (Hadoop 2.x) with Ant

Earlier I described building a Hadoop 2.7.2 cluster on CentOS 6.4 virtual machines under Ubuntu. For MapReduce development you want Eclipse, and Eclipse needs the matching plugin hadoop-eclipse-plugin-2.7.2.jar. Some background first: up through Hadoop 1.x the official Hadoop release shipped with an Eclipse plugin, but as developers' Eclipse versions multiplied and diverged, the plugin has to match the particular IDE, and no single bundled plugin could stay compatible with all of them. To keep things simple, today's Hadoop releases contain no Eclipse plugin at all; everyone compiles their own against their own Eclipse.

Here is how to build your own Eclipse plugin with Ant, starting with my environment and tools:

Ubuntu 14.04 (the OS hardly matters; Windows works the same way); IDE: eclipse-jee-mars-2-linux-gtk-x86_64.tar.gz


Ant (also your choice: a binary install or apt-get both work, as long as the environment variables are set):

export ANT_HOME=/usr/local/ant/apache-ant-1.9.7
export PATH=$PATH:$ANT_HOME/bin

If Ant complains that it cannot find ant-launcher.jar, add it to CLASSPATH:

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar:$ANT_HOME/lib/ant-launcher.jar

hadoop@hadoop:~$ ant -version
Apache Ant(TM) version 1.9.7 compiled on April 9 2016
Building the plugin with Ant requires the hadoop2x-eclipse-plugin project; it is hosted on GitHub:

https://github.com/winghc/hadoop2x-eclipse-plugin

Download it as a zip and extract it to a suitable path. Make sure the path's permissions and ownership belong to the current user; see the sketch below.
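For example, something like this (a sketch; the URL is GitHub's standard zip archive of the master branch):

wget https://github.com/winghc/hadoop2x-eclipse-plugin/archive/master.zip
unzip master.zip -d /home/hadoop/
# only needed if the archive was extracted by another user:
chown -R hadoop:hadoop /home/hadoop/hadoop2x-eclipse-plugin-master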

The three relevant locations (plugin source, Hadoop installation, Eclipse installation) are:

hadoop@hadoop:~$ cd hadoop2x-eclipse-plugin-master
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ pwd
/home/hadoop/hadoop2x-eclipse-plugin-master
hadoop@hadoop:hadoop2x-eclipse-plugin-master$ cd /opt/software/hadoop-2.7.2
hadoop@hadoop:hadoop-2.7.2$ pwd
/opt/software/hadoop-2.7.2
hadoop@hadoop:hadoop-2.7.2$ cd /home/hadoop/eclipse/
hadoop@hadoop:eclipse$ pwd
/home/hadoop/eclipse

Following the "How to build" section of the GitHub README:

Extract the downloaded hadoop2x-eclipse-plugin, enter hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/, and proceed as the README describes:

How to build

[hdpusr@demo hadoop2x-eclipse-plugin]$ cd src/contrib/eclipse-plugin

# Assume hadoop installation directory is /usr/share/hadoop

[hdpusr@apclt eclipse-plugin]$ ant jar -Dversion=2.4.1 -Dhadoop.version=2.4.1 -Declipse.home=/opt/eclipse -Dhadoop.home=/usr/share/hadoop

final jar will be generated at directory

${hadoop2x-eclipse-plugin}/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.4.1.jar

But I need the 2.7.2 Eclipse plugin, while the hadoop2x-eclipse-plugin checkout is configured against a Hadoop 2.6 build environment, so before running ant the build.xml and a few related files have to be modified.

First file: hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml

Around line 83, find the <target name="jar"> element, and below line 127 add/modify the following <copy> child tasks (the three jars mirror the Bundle-ClassPath entries further down):

    <copy file="${hadoop.home}/share/hadoop/common/lib/servlet-api-${servlet-api.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-io-${commons-io.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/htrace-core-${htrace.version}-incubating.jar" todir="${build.dir}/lib" verbose="true"/>

Then find the <jar> element and extend the value of its Bundle-ClassPath manifest attribute so that it ends with:

 lib/servlet-api-${servlet-api.version}.jar,
 lib/commons-io-${commons-io.version}.jar,
 lib/htrace-core-${htrace.version}-incubating.jar"/>

Save and exit. Note that without this change the jar may still build, but once you put the plugin into Eclipse and configure a connection, it will fail with errors (a way to verify the bundled libraries appears after the build output below).

Adding those libs to build.xml is not enough on its own, though: between Hadoop 2.6 and 2.7 many of the jar versions under share/hadoop/common/lib/ changed, so the corresponding version numbers have to be updated as well. This cost me half a day of matching them up one by one.

These version definitions live in the ivy directory under the hadoop2x-eclipse-plugin-master root, i.e. in hadoop2x-eclipse-plugin-master/ivy/libraries.properties.
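Rather than matching versions by trial and error, you can read them off the Hadoop 2.7.2 installation itself and make libraries.properties agree; for example:

ls /opt/software/hadoop-2.7.2/share/hadoop/common/lib/ | sort

Each file name carries its version (commons-io-2.4.jar, htrace-core-3.1.0-incubating.jar, and so on), and those are the numbers the properties below must match.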

The final modified file is reproduced in full below for convenience; the lines commented out with # are the original values being overridden:

hadoop.version=2.7.2
hadoop-gpl-compression.version=0.1.0

#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10

asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11

checkstyle.version=4.2

commons-cli.version=1.2
commons-codec.version=1.4
# commons-collections.version=3.2.1
commons-collections.version=3.2.2
commons-configuration.version=1.6
commons-daemon.version=1.0.13
# commons-httpclient.version=3.0.1
commons-httpclient.version=3.1
commons-lang.version=2.6
# commons-logging.version=1.0.4
commons-logging.version=1.1.3
# commons-logging-api.version=1.0.4
commons-logging-api.version=1.1.3
# commons-math.version=2.1
commons-math.version=3.1.1
commons-el.version=1.0
commons-fileupload.version=1.2
# commons-io.version=2.1
commons-io.version=2.4
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2

# hsqldb.version=1.8.0.10
# htrace.version=3.0.4
hsqldb.version=2.0.0
htrace.version=3.1.0

ivy.version=2.1.0

jasper.version=5.5.12
jackson.version=1.9.13
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
# jets3t.version=0.6.1
jets3t.version=0.9.0
jetty.version=6.1.26
jetty-util.version=6.1.26
# jersey-core.version=1.8
# jersey-json.version=1.8
# jersey-server.version=1.8
jersey-core.version=1.9
jersey-json.version=1.9
jersey-server.version=1.9
# junit.version=4.5
junit.version=4.11
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0

kfs.version=0.1

log4j.version=1.2.17
lucene-core.version=2.3.1

mockito-all.version=1.8.5
jsch.version=0.1.42

oro.version=2.0.8

rats-lib.version=0.5.1

servlet.version=4.0.6
servlet-api.version=2.5
# slf4j-api.version=1.7.5
# slf4j-log4j12.version=1.7.5
slf4j-api.version=1.7.10
slf4j-log4j12.version=1.7.10


wagon-http.version=1.0-beta-2
xmlenc.version=0.52
# xerces.version=1.4.4
xerces.version=2.9.1

protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final
With those edits done, the hard part is over; time to run ant.

Go into src/contrib/eclipse-plugin/ and run the ant command as follows:

hadoop@hadoop:hadoop2x-eclipse-plugin-master$ cd src/contrib/eclipse-plugin/
hadoop@hadoop:eclipse-plugin$ ls
build.properties  build.xml.bak  ivy.xml      META-INF    resources
build.xml         ivy            makePlus.sh  plugin.xml  src
hadoop@hadoop:eclipse-plugin$ ant jar -Dhadoop.version=2.7.2 -Declipse.home=/home/hadoop/eclipse -Dhadoop.home=/opt/software/hadoop-2.7.2
The first run takes a while; after that it is quick.

When the output ends like this, the build has succeeded:

compile:
     [echo] contrib: eclipse-plugin
    [javac] /home/hadoop/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml:76: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds

jar:
      [jar] Building jar: /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar

BUILD SUCCESSFUL
Total time: 4 seconds
hadoop@hadoop:eclipse-plugin$ 
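To confirm that the three extra libraries from the build.xml edit were actually bundled, you can list the jar's contents; a quick check (the jar path comes from the build output above):

unzip -l /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar | grep -E 'servlet-api|commons-io|htrace'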
Now put the plugin you just built into the plugins directory of your Eclipse installation.

Then restart Eclipse, or relaunch it from a shell as shown below; launching from a shell has the bonus that Eclipse's runtime messages stay visible, so errors are easy to spot right away:

hadoop@hadoop:eclipse-plugin$ cp /home/hadoop/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.7.2.jar /home/hadoop/eclipse/plugins/
hadoop@hadoop:eclipse-plugin$ /home/hadoop/eclipse/eclipse -clean

Pick your workspace and, once Eclipse is up, open Window --> Preferences: a new Hadoop Map/Reduce entry appears in the list. Point it at your Hadoop installation directory.



A distributed file system node now shows up in Eclipse's Project Explorer. Click Window --> Show View and select MapReduce Tools.


Open the Map/Reduce Locations view (the familiar elephant icon appears), then add a new M/R location and configure it as follows.


The Location name can be anything. The Map/Reduce Master and DFS Master entries, however, must match your own cluster's (or pseudo-distributed setup's) core-site.xml and mapred-site.xml exactly; get them wrong and the connection will fail.

With my configuration, Host is hadoop (the master node's hostname; the master's IP address also works), and the ports are 9000 (the HDFS filesystem port) and 9001 (the port of the MapReduce management node, the jobtracker).
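If you are unsure which values your own cluster uses, read them straight out of the config files; a sketch (the path is my Hadoop home from earlier, and grep -A1 shows the line after each property name, where the value usually sits):

# the HDFS property may appear as fs.defaultFS or the older fs.default.name
grep -A1 'fs.default' /opt/software/hadoop-2.7.2/etc/hadoop/core-site.xml
grep -A1 'mapred.job.tracker' /opt/software/hadoop-2.7.2/etc/hadoop/mapred-site.xml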



Now start the Hadoop cluster, sanity-check it briefly in the shell, and then exercise the plugin from Eclipse: transfer files through DFS Locations, and test both FileSystem-API code and the MapReduce API. This is only to verify the plugin works. HDFS is simple to test yourself (a small sketch follows the sample data below); the MR test here is a phone-call tally. In each input line the calling number is on the left and the called number on the right; the job groups the records by called number and outputs, for each called number, who called it:

11500001211 10086
11500001212 10010
15500001213 110
15500001214 120
11500001211 10010
11500001212 10010
15500001213 10086
15500001214 110
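Before the MR job, the HDFS side can be smoke-tested from the shell, as mentioned above; a minimal sketch with an arbitrary file name:

echo hello > t.txt
hdfs dfs -put t.txt /user/hadoop/
hdfs dfs -cat /user/hadoop/t.txt   # should print hello back
hdfs dfs -rm /user/hadoop/t.txt    # clean up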
The MapReduce code:

package hdfs;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MR extends Configured implements Tool {
	
	enum Counter{
		LINESKIP,
	}

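	// Mapper: splits each line "caller callee" and emits (callee, caller)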
	public static class WCMapper extends Mapper<LongWritable, Text, Text, Text> {
		@Override
		protected void map(LongWritable key, Text value,Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			
			try {
				String[] lineSplit = line.split(" ");
				String anum = lineSplit[0];
				String bnum = lineSplit[1];
				context.write(new Text(bnum), new Text(anum));
			} catch (Exception e) {
				context.getCounter(Counter.LINESKIP).increment(1); // malformed line: count it and skip
				return;
			}
		}
	}

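	// Reducer: concatenates all callers of a given callee, separated by "|"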
	public static class IntSumReduce extends Reducer<Text, Text, Text, Text> {
		@Override
		protected void reduce(Text key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			
			String valueString;
			String out="";
			for(Text value: values){
				valueString = value.toString();
				out+=valueString+"|";
			}
			
			context.write(key, new Text(out));
		}
	}

	public int run(String[] args) throws Exception {
		Configuration conf = new Configuration();
		String[] strs = new GenericOptionsParser(conf, args).getRemainingArgs();
		Job job = parseInputAndOutput(this, conf, strs); // validate the remaining (non-generic) args
		if (job == null) {
			return -1; // usage error already reported
		}
		
		job.setJarByClass(MR.class);
		FileInputFormat.addInputPath(job, new Path(strs[0]));
		FileOutputFormat.setOutputPath(job, new Path(strs[1]));

		job.setMapperClass(WCMapper.class);
		job.setInputFormatClass(TextInputFormat.class);
		//job.setCombinerClass(IntSumReduce.class);
		job.setReducerClass(IntSumReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		return job.waitForCompletion(true) ? 0 : 1;
	}

	public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws Exception {

		// validate
		if (args.length != 2) {
			System.err.printf("Usage: %s [generic options] <input> <output>%n", tool.getClass().getSimpleName());
			return null;
		}

		// step 2:create job
		Job job = Job.getInstance(conf, tool.getClass().getSimpleName());
		return job;
	}

	public static void main(String[] args) throws Exception {
		// run map reduce
		int status = ToolRunner.run(new MR(), args);
		// step 5 exit
		System.exit(status);
	}

}
Upload the input file to HDFS as follows:

hadoop@hadoop:~$ hdfs dfs -mkdir -p /user/hadoop/mr/wc/input
hadoop@hadoop:~$ hdfs dfs -put top.data /user/hadoop/mr/wc/input

Then run the MR program from within Eclipse.
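The equivalent run from the shell looks like this; the jar name mr.jar is a placeholder, and the two program arguments are the input and output paths the code expects:

hadoop jar mr.jar hdfs.MR /user/hadoop/mr/wc/input /user/hadoop/mr/wc/output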

It runs successfully, and the Eclipse console prints the job's progress; check the result.
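The result can also be read back from the shell; a sketch assuming /user/hadoop/mr/wc/output was given as the output argument:

hdfs dfs -cat /user/hadoop/mr/wc/output/part-r-00000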


This shows the plugin is working without any problems.

