Welcome to Lu Chunli's work notes. Learning is a faith; let time test the power of persistence.
System: Win7 64-bit
Eclipse (JEE edition): Luna Release (4.4.0)
Hadoop: 2.6.0
Hadoop-plugin: hadoop-eclipse-plugin-2.2.0.jar
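0. Preface
The earlier note "Work Notes: Building a Hadoop 2.6 Cluster" set up the Hadoop cluster environment. A MapReduce job is normally packaged as a jar and submitted to the cluster, but to make testing easier, this note prepares an Eclipse environment that can run MapReduce jobs.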
1. Plugin installation
Copy hadoop-eclipse-plugin-2.2.0.jar into the plugins directory under the Eclipse installation directory.
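2. Environment configuration
After copying hadoop-eclipse-plugin-2.2.0.jar into the plugins directory, restart Eclipse so that the plugin is loaded.
a.) Locate the Map/Reduce plugin
b.) Create a new Hadoop location
c.) Configure the General tab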
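Parameters:
Location name: any name you choose.
Map/Reduce(V2) Master: the cluster's JobTracker settings; these should match mapreduce.jobtracker.address in mapred-site.xml.
DFS Master: should match fs.defaultFS in core-site.xml. Point it at the Active NameNode; if it is set to cluster, the plugin treats cluster as a host name and fails to resolve it.
User name: the user I use on the Hadoop cluster, hadoop.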
Note:
I am not sure what many of the settings under Advanced Parameters do, so they are left unchanged here.
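The DFS Master host and port have to be read off the fs.defaultFS value by hand. The sketch below shows one way to split such a value with java.net.URI. The value hdfs://nnode:8020 is a placeholder and does not come from the cluster in these notes, and DfsMasterSketch and hostAndPort are hypothetical names used only for illustration. A value with no port, such as hdfs://cluster, is probably a logical cluster name rather than a host, which would fit the resolution failure described above.

```java
import java.net.URI;

/** Splits an fs.defaultFS-style value into the host and port the location dialog asks for. */
class DfsMasterSketch {

    /** Returns {host, port}; the port is an empty string when the URI carries none. */
    static String[] hostAndPort(String defaultFs) {
        URI uri = URI.create(defaultFs);
        if (!"hdfs".equals(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("expected hdfs://host[:port], got " + defaultFs);
        }
        // URI.getPort() returns -1 when the URI does not specify a port
        int port = uri.getPort();
        return new String[] { uri.getHost(), port == -1 ? "" : String.valueOf(port) };
    }

    public static void main(String[] args) {
        // Placeholder value (assumption): replace with the fs.defaultFS of your own cluster
        String[] parts = hostAndPort("hdfs://nnode:8020");
        System.out.println("Host: " + parts[0] + ", Port: " + parts[1]);
    }
}
```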
d.) Verify the configuration
The directories on HDFS are now visible:
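3. Run wordcount
The Hadoop plugin is now integrated into Eclipse, so the next step is to run wordcount, the introductory MapReduce program.
a.) Create a new MapReduce Project
First unpack the Hadoop distribution on the local machine, so that the jars Hadoop depends on are added automatically when the MapReduce project is created.
b.) Prepare the program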
// MyMapper.java
package com.invic.mapreduce.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final Log LOG = LogFactory.getLog(MyMapper.class);

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        LOG.info("=====================mapper================");
        LOG.info("key : " + key + "\tvalue : " + value);
        IntWritable one = new IntWritable(1);
        Text word = new Text();
        // Split the input line on whitespace and emit (word, 1) for every token
        StringTokenizer token = new StringTokenizer(value.toString());
        while (token.hasMoreTokens()) {
            word.set(token.nextToken());
            LOG.info(word.toString());
            context.write(word, one);
        }
    }
}

// MyReducer.java
package com.invic.mapreduce.wordcount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * @author lucl
 */
public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private static final Log LOG = LogFactory.getLog(MyReducer.class);

    @Override
    public void reduce(Text key, Iterable<IntWritable> value, Context context) throws IOException, InterruptedException {
        LOG.info("=====================reducer================");
        LOG.info("key " + key + "\tvalue : " + value);
        // Sum all counts received for this word
        int result = 0;
        for (Iterator<IntWritable> it = value.iterator(); it.hasNext(); ) {
            IntWritable val = it.next();
            LOG.info("\t\t : " + val.get());
            result += val.get();
        }
        // The original string used "\result", which Java reads as a carriage return followed by "esult"
        LOG.info("total key : " + key + "\tresult : " + result);
        context.write(key, new IntWritable(result));
    }
}

// WordCounterTool.java
package com.invic.mapreduce.wordcount;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author lucl
 */
public class WordCounterTool extends Configured implements Tool {
    private static final Log LOG = LogFactory.getLog(WordCounterTool.class);

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // This system property must be set, otherwise a winutils.exe error is reported
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.6.0\\hadoop-2.6.0");
        try {
            int exit = ToolRunner.run(new WordCounterTool(), args);
            LOG.info("result : " + exit);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration set by ToolRunner instead of creating a fresh one
        Configuration conf = getConf();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            LOG.info("Usage: wordcount <in> [<in>...] <out>");
            return 2;
        }
        // Pass conf to the job so that parsed options are not silently dropped
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCounterTool.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last one is the output path
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
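The driver treats every argument except the last as an input path and the last one as the output path. The following plain-Java sketch isolates that convention so it can be tried without a Hadoop installation. ArgsSketch, inputs and output are hypothetical names, and the argument values in main are assumptions for illustration, not the values used in the original run.

```java
import java.util.Arrays;
import java.util.List;

/** Mirrors the "<in> [<in>...] <out>" argument convention of the wordcount driver. */
class ArgsSketch {

    /** All arguments except the last are input paths. */
    static List<String> inputs(String[] args) {
        requireInAndOut(args);
        return Arrays.asList(args).subList(0, args.length - 1);
    }

    /** The last argument is the output path. */
    static String output(String[] args) {
        requireInAndOut(args);
        return args[args.length - 1];
    }

    private static void requireInAndOut(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: wordcount <in> [<in>...] <out>");
        }
    }

    public static void main(String[] args) {
        // Hypothetical program arguments, as they might be entered in a run configuration
        String[] example = { "data/file1.txt", "data/file2.txt", "output" };
        System.out.println("inputs = " + inputs(example));
        System.out.println("output = " + output(example));
    }
}
```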
c.) Run the MapReduce program
Right-click WordCounterTool, open Run Configurations, set the program arguments, and click the "Run" button.
file1.txt in the data directory contains:
hello world hello markhuang hello hadoop
file2.txt in the data directory contains:
hadoop ok hadoop fail hadoop 2.3
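To make clear what the job should compute on these two files, here is a minimal plain-Java sketch of the same logic, with no Hadoop dependency. It tokenizes each file on whitespace the way MyMapper does, sums the counts per file as the combiner would, and then adds the partial sums as MyReducer does. LocalWordCountSketch and its methods are hypothetical names, and the sketch only imitates the computation; it is not how Hadoop schedules or executes the job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Local imitation of the wordcount map, combine and reduce steps. */
class LocalWordCountSketch {

    /** Map + combine for one input: split on whitespace and sum the count of each word. */
    static Map<String, Integer> mapAndCombine(String text) {
        Map<String, Integer> partial = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            partial.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return partial;
    }

    /** Reduce: add up the partial counts coming from every input. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Contents of file1.txt and file2.txt as listed above
        String file1 = "hello world hello markhuang hello hadoop";
        String file2 = "hadoop ok hadoop fail hadoop 2.3";
        Map<String, Integer> totals = reduce(Arrays.asList(mapAndCombine(file1), mapAndCombine(file2)));
        totals.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For this input the sketch counts hadoop 4 times and hello 3 times, and every other token once. Reusing the reducer as the combiner is safe here because addition is associative and commutative, so summing per file first does not change the totals.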
d.) The program fails
15/07/19 22:17:31 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-Administrator/mapred/staging/Administrator907501946/.staging/job_local907501946_0001
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
    at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
    at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
    at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
    at com.invic.mapreduce.wordcount.WordCounterTool.run(WordCounterTool.java:60)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at com.invic.mapreduce.wordcount.WordCounterTool.main(WordCounterTool.java:31)
Note:
Download the hadoop.dll file that matches Hadoop 2.6 from the internet and put it in the C:\Windows\System32 directory.
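Before running the job again, it can help to confirm that the two Windows-specific files are where Hadoop is expected to look for them. The sketch below is a plain-Java check under two assumptions: that winutils.exe sits in the bin directory under hadoop.home.dir, and that hadoop.dll was placed in the directory named in the note above. NativeFilesCheckSketch and missing are hypothetical names; this is a convenience check, not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Reports which of the Windows helper files for Hadoop cannot be found. */
class NativeFilesCheckSketch {

    /** Returns the expected file locations that do not exist as regular files. */
    static List<Path> missing(Path hadoopHome, Path dllDir) {
        List<Path> missing = new ArrayList<>();
        // Assumption: winutils.exe lives in <hadoop.home.dir>\bin
        Path winutils = hadoopHome.resolve("bin").resolve("winutils.exe");
        Path dll = dllDir.resolve("hadoop.dll");
        if (!Files.isRegularFile(winutils)) {
            missing.add(winutils);
        }
        if (!Files.isRegularFile(dll)) {
            missing.add(dll);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Paths taken from the driver code and the note above; adjust them to your machine
        Path hadoopHome = Paths.get("E:\\hadoop-2.6.0\\hadoop-2.6.0");
        Path dllDir = Paths.get("C:\\Windows\\System32");
        List<Path> missing = missing(hadoopHome, dllDir);
        if (missing.isEmpty()) {
            System.out.println("winutils.exe and hadoop.dll were both found");
        } else {
            missing.forEach(p -> System.out.println("missing: " + p));
        }
    }
}
```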
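e.) Run again
Right-click WordCounterTool and choose Run As --> Run on Hadoop; after a short wait the program completes successfully.
f.) View the output
Summary: the plugin is configured successfully.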