Running WordCount from an Eclipse project on Windows against Hadoop 2.8.3 MapReduce (3)

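For the Hadoop environment setup, see the first post in this Hadoop series (the Hadoop configuration directly affects whether this program runs).

In addition, running a MapReduce program on Windows requires hadoop.dll and winutils.exe, available from https://github.com/steveloughran/winutils

This example uses Hadoop 2.8.3. Copy the hadoop.dll and winutils.exe built for that version into the bin directory of the local Hadoop folder, and also copy hadoop.dll into C:\Windows\System32. (The local Hadoop installation needs no changes to the configuration under its etc directory, but the Windows environment variables HADOOP_HOME and Path must be set and picked up by Eclipse.)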
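
The prerequisites above can be verified before starting Eclipse. The following is a minimal sketch, not part of Hadoop: the class HadoopWindowsCheck and the method missingNativeFiles are hypothetical names, and the sketch only assumes that both files are expected in the bin directory under HADOOP_HOME, as described above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): checks the Windows-side prerequisites.
class HadoopWindowsCheck {

    // Returns the names of the required native files that are absent from binDir.
    static List<String> missingNativeFiles(Path binDir) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(binDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        Path binDir = Paths.get(home, "bin");
        List<String> missing = missingNativeFiles(binDir);
        System.out.println(missing.isEmpty()
                ? "winutils.exe and hadoop.dll found in " + binDir
                : "Missing from " + binDir + ": " + missing);
    }
}
```

If HADOOP_HOME was set after Eclipse was started, the check may report it as unset when run from inside Eclipse, since a running process does not see later changes to the system environment.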

Data preparation:

[hadoop@yourname ~]$ hdfs dfs -mkdir /wordcount
[hadoop@yourname ~]$ hdfs dfs -mkdir /wordcount/input
[hadoop@yourname ~]$ hdfs dfs -copyFromLocal test.txt /wordcount/input/

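For yourname, see the first post in this series. hadoop is the user name used to log in to the Linux system, and ~ is the /home/hadoop directory. test.txt is located in /home/hadoop and is uploaded to the /wordcount/input/ directory in HDFS.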

test.txt

test hadoop
hello hadoop

WordCount.java

package com.hadoop.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WordCount {

    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: emits (word, 1) for every token in a line
    public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: sums the counts collected for each word
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
    
    
    public static void main(String[] args) throws Exception {
        String input = "hdfs://192.168.1.101:9000/wordcount/input";
        String output = "hdfs://192.168.1.101:9000/wordcount/output";

        
        Configuration conf = new Configuration();
        // These settings are required
        conf.set("mapreduce.framework.name","yarn");
        conf.set("yarn.resourcemanager.hostname","192.168.1.101");
        conf.set("fs.defaultFS","hdfs://192.168.1.101:9000/");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        conf.set("mapreduce.jobhistory.address", "192.168.1.101:10020");
        
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);
        // Explicit path to the packaged job jar; set last, so this is the jar that gets submitted
        job.setJar("E:/wordcount.jar");
        job.setJobName("WordCount");
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);

        FileInputFormat.addInputPath(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));

        // Exit with a non-zero status if the job fails
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
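
To see what the mapper and reducer above compute without a cluster, the same logic can be sketched in plain Java. This is only an illustration under the assumption that the input is the two-line test.txt shown earlier; the class LocalWordCount is a hypothetical name and uses no Hadoop classes. The TreeMap stands in for the shuffle phase, which delivers keys to the reducer in sorted order.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the WordCount job; no Hadoop dependency.
class LocalWordCount {

    // "Map" step: split each line into tokens; "reduce" step: sum one per occurrence.
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like reducer input
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The two lines of test.txt from the data preparation step
        count("test hadoop", "hello hadoop")
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the sample input this prints hadoop 2, hello 1, test 1, one pair per line.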

If conf.set("mapreduce.jobhistory.address", "192.168.1.101:10020"); is omitted, the job fails with this exception:
java.io.IOException: java.net.ConnectException: Call From yourname/192.168.182.100 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused

If the settings below are not specified, the MapReduce job does not run on the remote Linux machine (no record of it appears at http://192.168.1.101:8088/cluster):
        conf.set("mapreduce.framework.name","yarn");
        conf.set("yarn.resourcemanager.hostname","192.168.1.101");
        conf.set("fs.defaultFS","hdfs://192.168.1.101:9000/");
        conf.set("mapreduce.app-submission.cross-platform", "true"); // enable cross-platform submission to the remote cluster
        conf.set("mapreduce.jobhistory.address", "192.168.1.101:10020");
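
All five settings depend on the same host, so the address only needs to be written once. The following is a minimal sketch of that idea; ClusterSettings and forHost are hypothetical names, the property keys and ports are the ones used in this post, and a plain Map is used so the sketch runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives the five client-side settings from one host address.
class ClusterSettings {

    static Map<String, String> forHost(String host) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.framework.name", "yarn");
        props.put("yarn.resourcemanager.hostname", host);
        props.put("fs.defaultFS", "hdfs://" + host + ":9000/");
        props.put("mapreduce.app-submission.cross-platform", "true");
        props.put("mapreduce.jobhistory.address", host + ":10020");
        return props;
    }

    public static void main(String[] args) {
        forHost("192.168.1.101")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In the job driver, each entry would be applied with conf.set(key, value) before calling Job.getInstance(conf).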

job.setJar("E:/wordcount.jar"); sets the job jar. Before running the program, package the application and place the jar at that location; otherwise a java.io.FileNotFoundException is thrown.

Running MapReduce: from the sbin directory, run start-dfs.sh, start-yarn.sh, and mr-jobhistory-daemon.sh start historyserver
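
Before submitting the job from Windows, it can help to confirm that the services are accepting connections, since a stopped service shows up as "Connection refused". The sketch below is an assumption-laden illustration: ServicePortCheck and isListening are hypothetical names, and the ports are the ones that appear in this post (9000 for fs.defaultFS, 8088 for the cluster web page, 10020 for the job history server).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class ServicePortCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and unresolvable hosts
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "192.168.1.101";
        int[] ports = {9000, 8088, 10020};
        String[] names = {"HDFS (fs.defaultFS)", "YARN cluster web page", "Job history server"};
        for (int i = 0; i < ports.length; i++) {
            String state = isListening(host, ports[i], 2000) ? "open" : "not reachable";
            System.out.println(names[i] + " " + host + ":" + ports[i] + " " + state);
        }
    }
}
```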

In Eclipse, right-click and choose Run on Hadoop.

View the result through DFS Locations:

[Figure 1: screenshot of DFS Locations in Eclipse]

In Eclipse, double-click part-r-00000 to open the result (from the command line: hdfs dfs -cat /wordcount/output/part-r-00000):

hadoop    2
hello    1
test    1

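To run the job more than once, delete the output directory before each run (for example, hdfs dfs -rm -r /wordcount/output); otherwise the job fails because the output directory already exists.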
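
A different approach, not the one used in this post, is to write each run to a fresh directory so nothing has to be deleted. The following is a minimal sketch of that alternative; OutputPaths and freshOutput are hypothetical names, and the timestamp suffix format is an arbitrary choice.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a per-run output path by appending a timestamp.
class OutputPaths {

    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Appends "-yyyyMMdd-HHmmss" to the base path.
    static String freshOutput(String base, LocalDateTime now) {
        return base + "-" + STAMP.format(now);
    }

    public static void main(String[] args) {
        String base = "hdfs://192.168.1.101:9000/wordcount/output";
        System.out.println(freshOutput(base, LocalDateTime.now()));
    }
}
```

The returned string would replace the fixed output value in main. The trade-off is that old result directories accumulate in HDFS and have to be cleaned up separately.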
