Setting Up a Local Hadoop Development Environment in IntelliJ IDEA

1. Create a new Java project for developing the Mapper and Reducer.
2. Import the Hadoop dependency jars. They can be found under share/hadoop in the Hadoop installation directory. Once added successfully, it looks like this:
(screenshot)
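Which jars to add depends on your Hadoop version. In a typical Hadoop 2.x distribution they live under share/hadoop/common (and its lib subdirectory), share/hadoop/hdfs, share/hadoop/mapreduce, and share/hadoop/yarn; for this word-count example, hadoop-common and hadoop-mapreduce-client-core are the essential ones.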
3. Add an Artifact: in Project Structure, go to Artifacts and add an empty JAR.
(screenshot)
Add the Module Output and choose the current directory.
(screenshot)
4. Create a new Application run configuration.
(screenshots)
In Main class, enter org.apache.hadoop.util.RunJar.
Set Working directory to the project root.
Program arguments sets the default arguments, which are passed to the program when it runs:

D:\workspace\wordCount\out\artifacts\wordCount\wordCount.jar 
Main 
input 
output

The first argument is the path to the jar.
The second is the class containing the main method (include the full package name if there is one, e.g. com.company.Main).
The third and fourth are up to you (they are passed to the program as args[0] and args[1]).
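This run configuration is equivalent to running hadoop jar wordCount.jar Main input output on the command line: org.apache.hadoop.util.RunJar is the class behind the hadoop jar command, and it unpacks the jar and invokes the named main class with the remaining arguments.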

5. Create the Mapper class, the Reducer class, and the Main class:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.StringTokenizer;

// Type parameters: input key/value (byte offset, line of text),
// output key/value (word, count).
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line into whitespace-separated tokens and emit (word, 1) for each.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

// Type parameters: input key/value (word, count), output key/value (word, total).
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum all counts emitted for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }

        result.set(sum);
        context.write(key, result);
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();

        if (args.length != 2) {
            System.err.println("Usage: wordcount <input> <output>");
            System.exit(2);
        }

        // new Job(...) is deprecated; use the factory method instead.
        Job job = Job.getInstance(configuration, "word count");

        job.setJarByClass(Main.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
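Optionally, because summing counts is associative and commutative, the same reducer can also be registered as a combiner with job.setCombinerClass(WordCountReducer.class); this pre-aggregates counts on the map side and cuts down the intermediate data shuffled to the reducer.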

6. Create an input directory under the project root and put your input files in it.
(screenshot)
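For example, suppose input/words.txt contains the following (made-up sample data):

hello world
hello hadoop

After a successful run, output/part-r-00000 should contain one tab-separated word/count pair per line, sorted by key:

hadoop	1
hello	2
world	1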
7. Build Artifacts to generate the jar.
8. Run the Application configuration.

(screenshot)
9. An output folder is generated under the project root.
(screenshot)
On the second run the job will fail, because Hadoop does not delete the output directory automatically; delete it manually before running again.
With that, you can develop and debug Hadoop programs in IntelliJ IDEA.
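If you prefer not to delete it by hand, here is a minimal sketch (assuming the job writes to the default filesystem configured above) that removes a stale output directory; add it to Main after the argument check, with one extra import, org.apache.hadoop.fs.FileSystem:

// Remove the previous run's output directory, if any (recursive delete),
// so FileOutputFormat does not abort because the output path already exists.
FileSystem fs = FileSystem.get(configuration);
Path outputPath = new Path(args[1]);
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true);
}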
