Hadoop 2.4.1: A Simple MapReduce Program for Counting Users' Mobile Data Traffic (Part 1)

1. Environment: CentOS 6.5, 32-bit. Development is done on Linux.

2. The core code is as follows:

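Because we need to count each user's upstream traffic, downstream traffic, and total traffic, it is natural for the Reducer's output value to be represented by a bean. The user's phone number serves as the key.

This bean is transferred over the network, so it must be serializable: it has to implement the Writable interface and override the write(DataOutput out) and readFields(DataInput in) methods. write serializes the bean and readFields deserializes it, and readFields must read the fields in exactly the same order in which write wrote them.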
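
To make the ordering rule concrete, here is a minimal sketch that uses only java.io rather than Hadoop itself. TrafficRecord and roundTrip are hypothetical names introduced for illustration; the class merely mirrors the write/readFields signatures of Writable, and the sample values are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableOrderSketch {

    // Hypothetical stand-in for a Writable bean; it only mirrors the two method signatures.
    static class TrafficRecord {
        String phoneNumber = "";
        long upFlow;
        long downFlow;

        // Serialize: the phone number first, then the two counters.
        void write(DataOutput out) throws IOException {
            out.writeUTF(phoneNumber);
            out.writeLong(upFlow);
            out.writeLong(downFlow);
        }

        // Deserialize: read the fields back in exactly the order they were written.
        void readFields(DataInput in) throws IOException {
            phoneNumber = in.readUTF();
            upFlow = in.readLong();
            downFlow = in.readLong();
        }
    }

    // Writes a record into a byte array and reads it back into a fresh object.
    static TrafficRecord roundTrip(TrafficRecord source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            TrafficRecord copy = new TrafficRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrafficRecord record = new TrafficRecord();
        record.phoneNumber = "13800000000"; // made-up sample value
        record.upFlow = 120L;
        record.downFlow = 3400L;
        TrafficRecord copy = roundTrip(record);
        System.out.println(copy.phoneNumber + " " + copy.upFlow + " " + copy.downFlow);
    }
}
```

If readFields tried to read a long before the string, the bytes would be decoded incorrectly or the read would fail, which is why both methods must list the fields in the same order.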

2.1 The Mapper class.

package com.npf.hadoop;
import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FlowCountMapper extends Mapper<LongWritable, Text, Text, FlowBean>{

        private FlowBean flowBean = new FlowBean();

        private Text phoneNumKey = new Text();

        @Override
        protected void map(LongWritable key, Text value,Context context)throws IOException, InterruptedException {
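                // Each input line is one tab-separated traffic record.
                String line = value.toString();
                String[] fields = StringUtils.split(line, "\t");
                // The phone number is the second field; the upstream and downstream
                // traffic are the third-to-last and second-to-last fields.
                String phoneNumber = fields[1];
                long upFlow = Long.parseLong(fields[fields.length - 3]);
                long downFlow = Long.parseLong(fields[fields.length - 2]);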
                flowBean.setPhoneNumber(phoneNumber);
                flowBean.setUpFlow(upFlow);
                flowBean.setDownFlow(downFlow);
                flowBean.setSumFlow(upFlow + downFlow);
                phoneNumKey.set(phoneNumber);
                context.write(phoneNumKey, flowBean);
        }
}
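
The field positions used by the Mapper can be tried out without a cluster. The following is a rough sketch under stated assumptions: the sample line and its column layout are invented for illustration, FlowLineParseSketch and ParsedFlow are hypothetical names, and String.split stands in for the commons-lang StringUtils.split used in the Mapper (StringUtils.split treats adjacent separators as one, so the two can disagree on lines with empty columns).

```java
public class FlowLineParseSketch {

    // Hypothetical holder for the three values the Mapper extracts from a line.
    static class ParsedFlow {
        final String phoneNumber;
        final long upFlow;
        final long downFlow;

        ParsedFlow(String phoneNumber, long upFlow, long downFlow) {
            this.phoneNumber = phoneNumber;
            this.upFlow = upFlow;
            this.downFlow = downFlow;
        }

        long sumFlow() {
            return upFlow + downFlow;
        }
    }

    // Extracts the phone number (index 1) and the traffic counters
    // (third-to-last and second-to-last columns) from a tab-separated line.
    static ParsedFlow parse(String line) {
        String[] fields = line.split("\t");
        // At least five columns are needed so the three positions do not overlap.
        if (fields.length < 5) {
            throw new IllegalArgumentException("Too few fields: " + line);
        }
        String phoneNumber = fields[1];
        long upFlow = Long.parseLong(fields[fields.length - 3]);
        long downFlow = Long.parseLong(fields[fields.length - 2]);
        return new ParsedFlow(phoneNumber, upFlow, downFlow);
    }

    public static void main(String[] args) {
        // Made-up record: timestamp, phone, IP, upstream, downstream, status.
        String line = "1363157985066\t13726230503\t120.196.100.82\t2481\t24681\t200";
        ParsedFlow parsed = parse(line);
        System.out.println(parsed.phoneNumber + " up=" + parsed.upFlow
                + " down=" + parsed.downFlow + " sum=" + parsed.sumFlow());
    }
}
```

Indexing the counters from the end of the array keeps the extraction working even if the number of columns in the middle of a record varies, as long as the last three columns stay in place.
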
2.2 The Reducer class.

package com.npf.hadoop;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FlowCountReducer extends Reducer<Text, FlowBean, Text, FlowBean> {

        private FlowBean flowBean = new FlowBean();

        @Override
        protected void reduce(Text key, Iterable<FlowBean> iterable,Context context)throws IOException, InterruptedException {
                long upFlow = 0L;
                long downFlow = 0L;
                long sumFlow = 0L;
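                // Accumulate the traffic of every record that belongs to this phone number.
                for(FlowBean bean : iterable){
                        upFlow = upFlow + bean.getUpFlow();
                        downFlow = downFlow + bean.getDownFlow();
                }
                // The total is derived from the two sums rather than accumulated separately.
                sumFlow = upFlow + downFlow;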
                flowBean.setPhoneNumber(key.toString());
                flowBean.setDownFlow(downFlow);
                flowBean.setUpFlow(upFlow);
                flowBean.setSumFlow(sumFlow);
                context.write(key, flowBean);
        }
}
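
The per-key summation the Reducer performs can also be illustrated outside Hadoop. The sketch below is only a stand-in under stated assumptions: plain arrays and a LinkedHashMap replace the framework's grouping by key, FlowReduceSketch and aggregate are hypothetical names, and the phone numbers and traffic values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlowReduceSketch {

    // Sums {upFlow, downFlow} pairs per phone number.
    // Each result value is {total upFlow, total downFlow, total of both}.
    static Map<String, long[]> aggregate(String[] phones, long[][] flows) {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (int i = 0; i < phones.length; i++) {
            long[] total = totals.computeIfAbsent(phones[i], k -> new long[3]);
            total[0] += flows[i][0];
            total[1] += flows[i][1];
            total[2] = total[0] + total[1];
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up records: two for the first number, one for the second.
        String[] phones = {"13800000000", "13900000000", "13800000000"};
        long[][] flows = {{100L, 200L}, {10L, 20L}, {300L, 400L}};
        for (Map.Entry<String, long[]> entry : aggregate(phones, flows).entrySet()) {
            long[] t = entry.getValue();
            System.out.println(entry.getKey() + "\t" + t[0] + "\t" + t[1] + "\t" + t[2]);
        }
    }
}
```

In the real job the framework does the grouping: every reduce call receives one phone number together with all the FlowBean values emitted for it, so the Reducer only has to do the summation.
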
2.3 The FlowBean class.

package com.npf.hadoop;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class FlowBean implements Writable{

        private String phoneNumber;

        private long upFlow;

        private long downFlow;

        private long sumFlow;

        public String getPhoneNumber() {
                return phoneNumber;
        }

        public void setPhoneNumber(String phoneNumber) {
                this.phoneNumber = phoneNumber;
        }

        public long getUpFlow() {
                return upFlow;
        }

        public void setUpFlow(long upFlow) {
                this.upFlow = upFlow;
        }

        public long getDownFlow() {
                return downFlow;
        }

        public void setDownFlow(long downFlow) {
                this.downFlow = downFlow;
        }

        public long getSumFlow() {
                return sumFlow;
        }

        public void setSumFlow(long sumFlow) {
                this.sumFlow = sumFlow;
        }

        // Serialization: the field order here must match readFields().
        @Override
        public void write(DataOutput out) throws IOException {
                out.writeUTF(phoneNumber);
                out.writeLong(upFlow);
                out.writeLong(downFlow);
                out.writeLong(sumFlow);
        }

        // Deserialization: read the fields in the same order write() wrote them.
        @Override
        public void readFields(DataInput in) throws IOException {
                phoneNumber = in.readUTF();
                upFlow = in.readLong();
                downFlow = in.readLong();
                sumFlow = in.readLong();
        }
}
2.4 The runner: the program's main entry point.

package com.npf.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FlowCountRunner {

        public static void main(String[] args) throws Exception {

                Configuration configuration = new Configuration();
                Job job = Job.getInstance(configuration);
                job.setJarByClass(FlowCountRunner.class);

                job.setMapperClass(FlowCountMapper.class);
                job.setMapOutputKeyClass(Text.class);
                job.setMapOutputValueClass(FlowBean.class);

                job.setReducerClass(FlowCountReducer.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(FlowBean.class);

                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);

                FileInputFormat.setInputPaths(job, new Path("hdfs://devcitibank:9000/flowCountJob/srcdata"));
                FileOutputFormat.setOutputPath(job, new Path("hdfs://devcitibank:9000/flowCountJob/outputdata"));
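                // Block until the job finishes and report success or failure through the exit code.
                System.exit(job.waitForCompletion(true) ? 0 : 1);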
        }
}

3. Using Eclipse, we package the program as a JAR and export it to the /root directory. We name the JAR flowcount.jar.


4. Next, we verify that the JAR exists in the /root/ directory.


5. Verify that the Hadoop cluster is running.


6. Verify that the file to be processed is present in the cluster's /flowCountJob/srcdata directory.


7. Submit flowcount.jar to the Hadoop cluster for processing.


8. After the job completes successfully, we check the result on the Hadoop cluster.


Here we find that the result is not in the form we want. That is because our Java bean (FlowBean) does not override toString(). Let's override FlowBean's toString() method.

        @Override
        public String toString() {
                return upFlow+"  "+downFlow+"  "+sumFlow;
        }
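
Why toString() matters here can be seen in a small stand-alone sketch. TextOutputFormat writes each record as the key, a separator (a tab by default), and the value's toString(); a class without its own toString() falls back to Object.toString(), which yields the class name followed by "@" and a hex hash code. The class names, the formatLine helper, and the sample values below are hypothetical and only mimic that behavior; the tab separator inside toString() is an illustrative choice, whereas the method above uses two spaces.

```java
public class ToStringOutputSketch {

    // No toString() override: printing falls back to Object.toString().
    static class PlainBean {
        long upFlow = 100L;
        long downFlow = 200L;
    }

    // With an override, the text output carries the actual field values.
    static class PrintableBean {
        long upFlow = 100L;
        long downFlow = 200L;

        @Override
        public String toString() {
            return upFlow + "\t" + downFlow + "\t" + (upFlow + downFlow);
        }
    }

    // Mimics a "key TAB value" text line; string concatenation calls value.toString().
    static String formatLine(String key, Object value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        // Prints the key followed by something like ToStringOutputSketch$PlainBean@<hash>.
        System.out.println(formatLine("13800000000", new PlainBean()));
        // Prints the key followed by the three tab-separated numbers.
        System.out.println(formatLine("13800000000", new PrintableBean()));
    }
}
```
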
Then we rebuild the JAR and resubmit it to the Hadoop cluster. The result now looks like this:


9. The source code is hosted on GitHub: https://github.com/HadoopOrganization/HadoopMapReducerFlowCount
