Hadoop Mapreduce之WordCount实现

1.新建一个WCMapper继承Mapper
public class WCMapper extends Mapper {
       @Override
       protected void map(LongWritable key , Text value , Context context )
                   throws IOException, InterruptedException {
             //接收数据V1
            String line = value .toString();
             //切分数据
            String[] wordsStrings = line .split( " " );
             //循环
             for (String w : wordsStrings ) {
                   //出现一次,记一个一,输出
                   context .write( new Text( w ), new LongWritable(1));
            }
      }
}
 
2.新建一个WCReducer继承Reducer
public class WCReducer extends Reducer {
       @Override
       protected void reduce(Text key , Iterable v2s , Context context )
                   throws IOException, InterruptedException {
             // TODO Auto-generated method stub
             //接收数据
             //Text k3 = k2;
             //定义一个计算器
             long counter = 0;
             //循环v2s
             for (LongWritable i : v2s )
            {
                   counter += i .get();
            }
             //输出
             context .write( key , new LongWritable( counter ));
      }
}
3.WordCount类实现Main方法
/*
 * 1.分析具体的业力逻辑,确定输入输出数据样式
 * 2.自定义一个类,这个类要继承import org.apache.hadoop.mapreduce.Mapper;
 * 重写map方法,实现具体业务逻辑,将新的kv输出
 * 3.自定义一个类,这个类要继承import org.apache.hadoop.mapreduce.Reducer;
 * 重写reduce,实现具体业务逻辑
 * 4.将自定义的mapper和reducer通过job对象组装起来
 */
public class WordCount {
       public static void main(String[] args ) throws Exception {
             // 构建Job对象
            Job job = Job.getInstance( new Configuration());
            
             // 注意:main方法所在的类
             job .setJarByClass(WordCount. class );
            
             // 设置Mapper相关属性
             job .setMapperClass(WCMapper. class );
             job .setMapOutputKeyClass(Text. class );
             job .setMapOutputValueClass(LongWritable. class );
            FileInputFormat.setInputPaths( job , new Path( "/words.txt" ));
            
             // 设置Reducer相关属性
             job .setReducerClass(WCReducer. class );
             job .setOutputKeyClass(Text. class );
             job .setOutputValueClass(LongWritable. class );
            FileOutputFormat.setOutputPath( job , new Path( "/wcount619" ));
             // 提交任务
             job .waitForCompletion( true );
      }     
}
 
 
4.打包为wc.jar,并上传到linux,并在Hadoop下运行
     hadoop jar /root/wc.jar

转载于:https://www.cnblogs.com/dulixiaoqiao/p/6985237.html

你可能感兴趣的:(大数据)