Hadoop 2.4.1: A Simple MapReduce Program for Per-User Mobile Traffic Statistics (Part 3)

Continuing from Hadoop 2.4.1: A Simple MapReduce Program for Per-User Mobile Traffic Statistics (Part 2), we now have a new requirement: users' records must be handled by different reducers according to the province their phone numbers belong to.

1. Environment: CentOS 6.5 (32-bit); development is done in a Linux environment.

2. The core code is as follows:

2.1 The Mapper class.

package com.npf.hadoop.partition;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.npf.hadoop.FlowBean;

public class FlowCountPartitionMapper extends Mapper<LongWritable, Text, Text, FlowBean>{

        private FlowBean flowBean = new FlowBean();

        private Text phoneNumKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                // Each input line is a tab-separated record; the phone number is the second field.
                String line = value.toString();
                String[] fields = StringUtils.split(line, "\t");
                String phoneNumber = fields[1];
                // The up-flow and down-flow sit in the third-to-last and second-to-last columns.
                long upFlow = Long.valueOf(fields[fields.length - 3]);
                long downFlow = Long.valueOf(fields[fields.length - 2]);
                flowBean.setPhoneNumber(phoneNumber);
                flowBean.setUpFlow(upFlow);
                flowBean.setDownFlow(downFlow);
                flowBean.setSumFlow(upFlow + downFlow);
                // The phone number is the map output key, so MyPartition can route it by prefix.
                phoneNumKey.set(phoneNumber);
                context.write(phoneNumKey, flowBean);
        }


}
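The FlowBean class is the same one used in the earlier parts of this series and is not repeated here. For readers starting with this post, below is a minimal sketch of what it must provide; the field layout is an assumption reconstructed from how the mapper and reducer use it, so treat your class from the earlier posts as authoritative.

package com.npf.hadoop;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Minimal sketch of FlowBean as used by the mapper and reducer above.
// It must implement Writable so Hadoop can serialize it between map and reduce.
public class FlowBean implements Writable {

        private String phoneNumber;
        private long upFlow;
        private long downFlow;
        private long sumFlow;

        // Hadoop instantiates Writables reflectively, so a no-arg constructor is required.
        public FlowBean() {
        }

        @Override
        public void write(DataOutput out) throws IOException {
                out.writeUTF(phoneNumber);
                out.writeLong(upFlow);
                out.writeLong(downFlow);
                out.writeLong(sumFlow);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
                // Must read the fields in exactly the order they were written.
                phoneNumber = in.readUTF();
                upFlow = in.readLong();
                downFlow = in.readLong();
                sumFlow = in.readLong();
        }

        // toString determines what TextOutputFormat writes after the tab-separated key.
        @Override
        public String toString() {
                return upFlow + "\t" + downFlow + "\t" + sumFlow;
        }

        public long getUpFlow() { return upFlow; }
        public long getDownFlow() { return downFlow; }
        public void setPhoneNumber(String phoneNumber) { this.phoneNumber = phoneNumber; }
        public void setUpFlow(long upFlow) { this.upFlow = upFlow; }
        public void setDownFlow(long downFlow) { this.downFlow = downFlow; }
        public void setSumFlow(long sumFlow) { this.sumFlow = sumFlow; }
}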
2.2 The Reducer class.

package com.npf.hadoop.partition;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import com.npf.hadoop.FlowBean;

public class FlowCountPartitionReducer extends Reducer<Text, FlowBean, Text, FlowBean> {

        private FlowBean flowBean = new FlowBean();

        @Override
        protected void reduce(Text key, Iterable<FlowBean> iterable, Context context) throws IOException, InterruptedException {
                long upFlow = 0L;
                long downFlow = 0L;
                long sumFlow = 0L;
                // Accumulate the up-flow and down-flow over all records for this phone number.
                for (FlowBean bean : iterable) {
                        upFlow = upFlow + bean.getUpFlow();
                        downFlow = downFlow + bean.getDownFlow();
                }
                sumFlow = upFlow + downFlow;
                flowBean.setPhoneNumber(key.toString());
                flowBean.setDownFlow(downFlow);
                flowBean.setUpFlow(upFlow);
                flowBean.setSumFlow(sumFlow);
                context.write(key, flowBean);
        }


}

2.3 The MyPartition class.

Numbers starting with 136 go to reducer 0, numbers starting with 137 to reducer 1, numbers starting with 138 to reducer 2, numbers starting with 139 to reducer 3, and all remaining numbers to reducer 4.

package com.npf.hadoop.partition;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartition<KEY,VALUE> extends Partitioner<KEY,VALUE>{

        // Lookup table from phone-number prefix to reducer index.
        private static Map<String, Integer> areaMap = new HashMap<String, Integer>();

        static {
                areaMap.put("136", 0);
                areaMap.put("137", 1);
                areaMap.put("138", 2);
                areaMap.put("139", 3);
        }

        @Override
        public int getPartition(KEY key, VALUE value, int numPartitions) {
                // Route by the first three digits of the key; unknown prefixes go to reducer 4.
                Integer code = areaMap.get(key.toString().substring(0, 3));
                return code == null ? 4 : code;
        }

}
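A quick way to sanity-check the partitioner without running a job is a small standalone main method. The MyPartitionCheck class and the phone numbers below are hypothetical, made up purely for illustration:

package com.npf.hadoop.partition;

import org.apache.hadoop.io.Text;

// Hypothetical standalone check of MyPartition; not part of the job itself.
public class MyPartitionCheck {

        public static void main(String[] args) {
                MyPartition<Text, Object> partition = new MyPartition<Text, Object>();
                // The phone numbers below are made-up examples.
                System.out.println(partition.getPartition(new Text("13612345678"), null, 5)); // -> 0
                System.out.println(partition.getPartition(new Text("13712345678"), null, 5)); // -> 1
                System.out.println(partition.getPartition(new Text("15012345678"), null, 5)); // -> 4
        }
}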
2.4 The FlowCountPartitionRunner class.

package com.npf.hadoop.partition;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import com.npf.hadoop.FlowBean;

public class FlowCountPartitionRunner {

        public static void main(String[] args) throws Exception {

                Configuration configuration = new Configuration();
                Job job = Job.getInstance(configuration);
                job.setJarByClass(FlowCountPartitionRunner.class);

                job.setMapperClass(FlowCountPartitionMapper.class);
                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(FlowBean.class);

                job.setReducerClass(FlowCountPartitionReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(FlowBean.class);

                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);

                // The reducer count must cover every index MyPartition can return (0 to 4),
                // hence exactly 5 reduce tasks.
                job.setPartitionerClass(MyPartition.class);
                job.setNumReduceTasks(5);

                FileInputFormat.setInputPaths(job, new Path("hdfs://devcitibank:9000/flowCountJob/srcdata"));
                FileOutputFormat.setOutputPath(job, new Path("hdfs://devcitibank:9000/flowCountJob/outputdatapar"));
                // Exit non-zero if the job fails so the failure is visible to the shell.
                System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
}
3. We package the program into a jar with Eclipse and place it under the /root directory, naming it flowpartitioncount.jar.


4. OK, let's verify that the jar is present under the /root/ directory.
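From the shell, a check like the following (using the jar name from step 3) does the job:

ls -l /root/flowpartitioncount.jar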

5. Verify that the Hadoop cluster is running.
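A simple way to check is jps; on a Hadoop 2.x cluster with YARN you would expect to see daemons such as NameNode, DataNode, ResourceManager, and NodeManager, though the exact set depends on the cluster layout:

jps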

6. Verify that the files to be processed are present under the /flowCountJob/srcdata directory in the cluster.
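Using the HDFS input path configured in the runner class:

hadoop fs -ls /flowCountJob/srcdata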

7. Submit flowpartitioncount.jar to the Hadoop cluster for processing.
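Assuming no Main-Class was set in the jar's manifest, the driver class has to be named explicitly on the command line:

hadoop jar /root/flowpartitioncount.jar com.npf.hadoop.partition.FlowCountPartitionRunner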


8. After the job completes successfully, we check the results on the cluster. Since there are 5 reducer instances, five output files, numbered 0 through 4, are produced.
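Listing the output directory configured in the runner class should show a _SUCCESS marker plus the five part files:

hadoop fs -ls /flowCountJob/outputdatapar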


Looking at each of the five files in turn confirms the routing defined in MyPartition:

part-r-00000: numbers starting with 136

part-r-00001: numbers starting with 137

part-r-00002: numbers starting with 138

part-r-00003: numbers starting with 139

part-r-00004: all other numbers
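Each file can be inspected with hadoop fs -cat, for example the first one:

hadoop fs -cat /flowCountJob/outputdatapar/part-r-00000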


9. The full source code is hosted on GitHub: https://github.com/HadoopOrganization/HadoopMapReducerFlowCount
