
Hadoop Series, Part 3: Using MapReduce (1)

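1: Introduction

This is one of a series of posts on big data, updated as time allows, covering topics such as Hadoop, Spark, Storm and machine learning.

The Hadoop version used here is 2.6.4.

Previous post: Hadoop Series, Part 2: HDFS commands and the Java client API

Further down we count which words occur most often in a novel (斗破苍穹, Battle Through the Heavens).

I originally meant to cover MapReduce in a single post, but it grew too long, so it has been split up.

Code for every part is available for download.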


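1: Introduction
2: The wordcount word-counting job
2.1: Prepare the files
2.2: Write the Mapper code
2.3: Write the Reducer code
2.4: Write a main method to run the MapReduce job
2.5: Run the code on Hadoop
3: A custom serializable class
3.1: Define a serializable output bean
3.2: Write the mapper
3.3: Write the reducer
3.4: Write the main method
3.5: Run on Hadoop
4: Partitioning data (writing different categories to different outputs)
4.1: The partitioning rule
4.2: Configure the partitioner
4.3: Complete partitioning code
4.4: Run the partitioning code on Hadoop
5: Sorting data and reusing objects
5.1: Write the sorting code
5.2: Write the mapper (object reuse)
5.3: Write the reducer
5.4: Write the driver class
5.5: Complete code
5.6: Run the sort on Hadoop
6: Counting the words in a novel (with a Combiner)
6.1: Preparation
6.2: Configure the Maven build to bundle the word-segmentation dependency
6.3: Aggregating data (Combiner)
6.4: Sorting stage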

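2: The wordcount word-counting job

The code for this part is available from the download link.

2.1: Prepare the files

To count words we first need some documents. We prepare two of them, named text1.txt and text2.txt.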

text1.txt

hello zhangsan
lisi nihao
hai zhangsan
nihao lisi
x xiaoming

text2.txt

 
zhangsan a
lisi b
wangwu c
jiji 7
haha xiaoming
xiaoming is gril

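We create these two files and will shortly count how many times each word occurs.

2.2: Write the Mapper code

Here is the code; the explanations are in the comments.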

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Simple word count. The input is read in automatically by MapReduce.
 *
 * KEYIN: by default the byte offset at which the line starts; a Long, for which Hadoop has its own serializable type, LongWritable
 * VALUEIN: by default the content of the line that was read; Hadoop's serializable type for it is Text
 * KEYOUT: the key emitted by the user's logic, here the word (a String, wrapped in Text)
 * VALUEOUT: the value emitted by the user's logic, here the word's count (a Long, wrapped in LongWritable)
 * @author Administrator
 *
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // One line of text read by MapReduce
        String line = value.toString();
        String[] words = line.split(" ");

        for (String word : words) {
            // Emit the word as the key and 1 as the value; these pairs are sent on to the reducer
            context.write(new Text(word), new LongWritable(1));
        }
    }
}

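2.3: Write the Reducer code

Again, straight to the code.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * First type parameter (Text): the word passed in, as emitted by the Mapper
 * Second (LongWritable): the counts emitted for that word. MapReduce groups them by key, so a word such as hello that occurs 11 times arrives with eleven 1s
 * Third (Text): the word to output; this is written to the result file
 * Fourth (LongWritable): the number of occurrences to output; also written to the result file
 * @author Administrator
 *
 */
public class WordCountReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        long count = 0;
        for (LongWritable num : values) {
            count += num.get();
        }
        context.write(key, new LongWritable(count));
    }
}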
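To make the division of labour between the two classes concrete, here is a minimal Hadoop-free sketch that imitates map, the grouping by key that the framework performs, and reduce inside a single JVM. The class and method names (WordCountSketch, mapAndShuffle, reduce) are made up for illustration, and unlike the mapper above the sketch splits on any run of whitespace and skips empty tokens, which is an assumption of the sketch rather than what the job does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free imitation of map -> group by key -> reduce. */
class WordCountSketch {

    /** "Map" plus "shuffle": emit (word, 1) per token and group the 1s by word, sorted by key. */
    static Map<String, List<Long>> mapAndShuffle(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // blank line: nothing to emit
                }
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
            }
        }
        return grouped;
    }

    /** "Reduce": sum the values of each key, as WordCountReduce does. */
    static Map<String, Long> reduce(Map<String, List<Long>> grouped) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            long count = 0;
            for (long one : entry.getValue()) {
                count += one;
            }
            counts.put(entry.getKey(), count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The lines of text1.txt from section 2.1
        List<String> lines = Arrays.asList(
                "hello zhangsan", "lisi nihao", "hai zhangsan", "nihao lisi", "x xiaoming");
        System.out.println(reduce(mapAndShuffle(lines)));
    }
}
```

In the real job the grouping step is not written by us: Hadoop sorts and groups the mapper's output by key before calling reduce once per key.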

2.4: Write a main method to run the MapReduce job

With the mapper and reducer written, we need a main method to actually run them, so we write the following:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


/**
 * Acts as the client that submits the job to YARN
 * @author Administrator
 *
 */
public class WordCountDriver {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // These two settings are not needed when the jar is packaged and run on the Linux server
/*        // Run on YARN
        conf.set("mapreduce.framework.name", "yarn");
        // Host name of the ResourceManager
        conf.set("yarn.resourcemanager.hostname", "server1");*/
        Job job = Job.getInstance(conf);

        // Lets Hadoop locate the jar that contains this class
        job.setJarByClass(WordCountDriver.class);

        // The Mapper class
        job.setMapperClass(WordCountMapper.class);
        // The Reducer class
        job.setReducerClass(WordCountReduce.class);

        // The Mapper's output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // The final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Input location; taken from the command line for flexibility
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Output location; also taken from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and its configuration to YARN
        //job.submit();
        try {
            job.waitForCompletion(true);
            // Passing true prints the job's progress and result
        } catch (ClassNotFoundException | InterruptedException e) {
            e.printStackTrace();
        }
    }
}

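2.5: Run the code on Hadoop

The code is written; how do we run it?

(1) First, we do not simply run the main method directly, because as written the code has no idea where Hadoop is deployed. Instead we package the project (for a Maven project, with the mvn package command) and upload the resulting jar to the server.

(2) Next, upload the two text files, text1.txt and text2.txt, to HDFS. In a real big-data setting the input would be far larger than these samples, too large for a single server to hold, so it has to live in a distributed store such as HDFS. Use the commands below to upload the files; if they are unfamiliar, read the previous post on using HDFS first.

# Create a directory
hadoop fs -mkdir -p /wordcount/input
# Upload the files
hadoop fs -put text1.txt text2.txt /wordcount/input

(3) Run the code. A class with a main method can be run with the java command, but Hadoop depends on many other jars, so the classpath would make that command very complicated. Hadoop provides a command for this instead: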

hadoop jar wordcount.jar com.zxj.hadoop.demo.mapreduce.wordcount.WordCountDriver /wordcount/input /wordcount/output

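A word on what this command means. jar selects Hadoop's built-in jar subcommand, which runs a jar file. wordcount.jar is the code we wrote earlier, under the name it was given when packaged and uploaded to the server. com.zxj.hadoop.demo.mapreduce.wordcount.WordCountDriver is the class to run: a jar may contain several main methods, so this picks the entry point. The last two arguments, /wordcount/input and /wordcount/output, are the two parameters our own code reads: the first is the input directory (so every file in the directory is read) and the second is the directory for the results.

When the job finishes, check the console output: if no exception was thrown and no failure is reported, and the output mentions success, the job succeeded.

Now we can look at the results.

List the output files:

hadoop fs -ls /wordcount/output

The first file marks the job as successful and the second holds the results; view it with the hadoop fs -cat command.

The output shows that zhangsan, xiaoming and lisi each occur 3 times, nihao occurs twice, and every other word occurs once.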

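3: A custom serializable class

Code for this part: download link

When the output is more complex, types such as Text and LongWritable are no longer enough. In that case we can define our own serializable class. This is not JDK serialization but Hadoop's own, which we have to implement ourselves.

Save the following document (fields separated by tabs) as staff.txt: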

张三    江西    打车    200
李四    广东    住宿    600
王五    北京    伙食    320
张三    江西    话费    50
张三    湖南    打车    900
周六    上海    采购    3000
李四    西藏    旅游    1000
王五    北京    借款    500
李四    上海    话费    50
周六    北京    打车    600
张三    广东    租房    3050

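3.1: Define a serializable output bean

So far we have used LongWritable or Text for our inputs and outputs. If you look at their source, both implement Writable, Hadoop's own serialization interface.

When we need to output more than a single Text can carry, we can define our own class and have it implement Writable.

The code below defines such a serializable bean:

    /**
     * The wrapper bean
     */
    public static class SpendBean implements Writable{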

        private Text userName;

        private IntWritable money;

        public SpendBean(Text userName, IntWritable money) {
            this.userName = userName;
            this.money = money;
        }

        /**
         * A no-arg constructor is required for deserialization
         */
        public SpendBean(){}

        /**
         * Serialization
         * @param out
         * @throws IOException
         */
        @Override
        public void write(DataOutput out) throws IOException {
            userName.write(out);
            money.write(out);
        }

        /**
         * Deserialization; fields are read in the same order write() wrote them
         * @param in
         * @throws IOException
         */
        @Override
        public void readFields(DataInput in) throws IOException {
            userName = new Text();
            userName.readFields(in);
            money = new IntWritable();
            money.readFields(in);
        }

        public Text getUserName() {
            return userName;
        }

        public void setUserName(Text userName) {
            this.userName = userName;
        }

        public IntWritable getMoney() {
            return money;
        }

        public void setMoney(IntWritable money) {
            this.money = money;
        }

        @Override
        public String toString() {
            return userName.toString() + "," + money.get();
        }
    }
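The contract behind write and readFields can be tried out without a cluster. The following is a minimal sketch using only java.io: SpendRecord is a hypothetical stand-in for SpendBean that keeps plain String and int fields instead of Text and IntWritable, and toBytes and fromBytes are helper names invented here. It shows why the no-arg constructor is needed and why the two methods must handle the fields in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical Hadoop-free stand-in for SpendBean with the same write/readFields shape. */
class SpendRecord {
    String userName;
    int money;

    /** An empty object must exist before readFields can fill it. */
    SpendRecord() {}

    SpendRecord(String userName, int money) {
        this.userName = userName;
        this.money = money;
    }

    /** Serialize: name first, then money. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(userName);
        out.writeInt(money);
    }

    /** Deserialize in exactly the order write() used. */
    void readFields(DataInput in) throws IOException {
        userName = in.readUTF();
        money = in.readInt();
    }

    static byte[] toBytes(SpendRecord record) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        record.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static SpendRecord fromBytes(byte[] bytes) throws IOException {
        SpendRecord record = new SpendRecord();
        record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return record;
    }

    public static void main(String[] args) throws IOException {
        SpendRecord copy = fromBytes(toBytes(new SpendRecord("zhangsan", 200)));
        System.out.println(copy.userName + "," + copy.money);
    }
}
```

Swapping the two reads in readFields would make the round trip fail or return garbage, which is the usual symptom of a Writable whose two methods have drifted apart.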

3.2: Write the mapper

The mapper:

    /**
     * Mapper
     */
    public static class GroupUserMapper extends Mapper<LongWritable, Text, Text, SpendBean>{

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String val = value.toString();
            String[] split = val.split("\t");
            // No handling of malformed lines here, to keep the core code simple
            String name = split[0];
            String province = split[1];
            String type = split[2];
            int money = Integer.parseInt(split[3]);
            SpendBean groupUser = new SpendBean();
            groupUser.setUserName(new Text(name));
            groupUser.setMoney(new IntWritable(money));
            context.write(new Text(name),groupUser);
        }
    }

3.3: Write the reducer

The reducer:

/**
     * reducer
     */
    public static class GroupUserReducer extends Reducer<Text, SpendBean, Text, SpendBean> {
        /**
         * Sums the spending recorded for one name
         * @param key the employee's name
         * @param values
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void reduce(Text key, Iterable<SpendBean> values, Context context) throws IOException, InterruptedException {
            int money = 0; // amount spent
            // Iterate over the records
            for(SpendBean bean : values){
                money += bean.getMoney().get();
            }
            // Write the total
            context.write(key,new SpendBean(key,new IntWritable(money)));
        }
    }

3.4: Write the main method

The main method:

    /**
     * The driver
     * @param args
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        job.setJarByClass(GroupUser.class); // the class used to locate the jar

        job.setMapperClass(GroupUserMapper.class); // the mapper class
        job.setReducerClass(GroupUserReducer.class); // the reducer class

        job.setMapOutputKeyClass(Text.class); // the mapper's output key
        job.setMapOutputValueClass(SpendBean.class); // the mapper's output value

        job.setOutputKeyClass(Text.class); // the final output types
        job.setOutputValueClass(SpendBean.class);

        FileInputFormat.setInputPaths(job,new Path(args[0])); // input location
        FileOutputFormat.setOutputPath(job,new Path(args[1])); // output location

        boolean b = job.waitForCompletion(true); // wait for completion; true prints progress and details
        if(b){
            //success
        }

    }

The complete code follows; the classes are all written in one file here.

package com.zxj.hadoop.demo.mapreduce.staffspend.groupuser;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @Author 朱小杰
 * Date 2017-07-23 16:33
 * Notes ...
 */
public class GroupUser {
    /**
     * Mapper
     */
    public static class GroupUserMapper extends Mapper<LongWritable, Text, Text, SpendBean>{

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String val = value.toString();
            String[] split = val.split("\t");
            // No handling of malformed lines here, to keep the core code simple
            String name = split[0];
            String province = split[1];
            String type = split[2];
            int money = Integer.parseInt(split[3]);
            SpendBean groupUser = new SpendBean();
            groupUser.setUserName(new Text(name));
            groupUser.setMoney(new IntWritable(money));
            context.write(new Text(name),groupUser);
        }
    }

    /**
     * reducer
     */
    public static class GroupUserReducer extends Reducer<Text, SpendBean, Text, SpendBean> {
        /**
         * Sums the spending recorded for one name
         * @param key the employee's name
         * @param values
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void reduce(Text key, Iterable<SpendBean> values, Context context) throws IOException, InterruptedException {
            int money = 0; // amount spent
            // Iterate over the records
            for(SpendBean bean : values){
                money += bean.getMoney().get();
            }
            // Write the total
            context.write(key,new SpendBean(key,new IntWritable(money)));
        }
    }

    /**
     * The wrapper bean
     */
    public static class SpendBean implements Writable{

        private Text userName;

        private IntWritable money;

        public SpendBean(Text userName, IntWritable money) {
            this.userName = userName;
            this.money = money;
        }

        /**
         * A no-arg constructor is required for deserialization
         */
        public SpendBean(){}

        /**
         * Serialization
         * @param out
         * @throws IOException
         */
        @Override
        public void write(DataOutput out) throws IOException {
            userName.write(out);
            money.write(out);
        }

        /**
         * Deserialization; fields are read in the same order write() wrote them
         * @param in
         * @throws IOException
         */
        @Override
        public void readFields(DataInput in) throws IOException {
            userName = new Text();
            userName.readFields(in);
            money = new IntWritable();
            money.readFields(in);
        }

        public Text getUserName() {
            return userName;
        }

        public void setUserName(Text userName) {
            this.userName = userName;
        }

        public IntWritable getMoney() {
            return money;
        }

        public void setMoney(IntWritable money) {
            this.money = money;
        }

        @Override
        public String toString() {
            return userName.toString() + "," + money.get();
        }
    }


    /**
     * The driver
     * @param args
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        job.setJarByClass(GroupUser.class); // the class used to locate the jar

        job.setMapperClass(GroupUserMapper.class); // the mapper class
        job.setReducerClass(GroupUserReducer.class); // the reducer class

        job.setMapOutputKeyClass(Text.class); // the mapper's output key
        job.setMapOutputValueClass(SpendBean.class); // the mapper's output value

        job.setOutputKeyClass(Text.class); // the final output types
        job.setOutputValueClass(SpendBean.class);

        FileInputFormat.setInputPaths(job,new Path(args[0])); // input location
        FileOutputFormat.setOutputPath(job,new Path(args[1])); // output location

        boolean b = job.waitForCompletion(true); // wait for completion; true prints progress and details
        if(b){
            //success
        }

    }
}

3.5: Run on Hadoop

Run mvn clean package to rebuild the jar, and upload it to the server.

We again create a directory, this time for the staff spending data:

hadoop fs -mkdir -p /staffspend/input

Upload the staff file prepared earlier to that directory:

hadoop fs -put staff.txt /staffspend/input

Then run the job:

hadoop jar hadoop-mapreduce-1.0.jar com.zxj.hadoop.demo.mapreduce.staffspend.groupuser.GroupUser /staffspend/input /staffspend/output

Once it succeeds, view the output file:

hadoop fs -cat /staffspend/output/part-r-00000

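4: Partitioning data (writing different categories to different outputs)

Code for this part: download link

This need also comes up often: a grand total is not always enough, and we may want a breakdown by category. With the file from Part 3, for example, we may want to see how much each person spent within each province.

To do this we modify the code from Part 3.

We add a province field to the output bean; the changes are the province field itself and the lines that read, write and expose it.

/**
     * The wrapper bean
     */
    public static class SpendBean implements Writable{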

        private Text userName;

        private IntWritable money;

        private Text province;


        public SpendBean(Text userName, IntWritable money, Text province) {
            this.userName = userName;
            this.money = money;
            this.province = province;
        }

        /**
         * A no-arg constructor is required for deserialization
         */
        public SpendBean(){}

        /**
         * Serialization
         * @param out
         * @throws IOException
         */
        @Override
        public void write(DataOutput out) throws IOException {
            userName.write(out);
            money.write(out);
            province.write(out);
        }

        /**
         * Deserialization; fields are read in the same order write() wrote them
         * @param in
         * @throws IOException
         */
        @Override
        public void readFields(DataInput in) throws IOException {
            userName = new Text();
            userName.readFields(in);
            money = new IntWritable();
            money.readFields(in);
            province = new Text();
            province.readFields(in);
        }

        public Text getUserName() {
            return userName;
        }

        public Text getProvince() {
            return province;
        }

        public void setProvince(Text province) {
            this.province = province;
        }

        public void setUserName(Text userName) {
            this.userName = userName;
        }

        public IntWritable getMoney() {
            return money;
        }

        public void setMoney(IntWritable money) {
            this.money = money;
        }

        @Override
        public String toString() {
            return "SpendBean{" +
                    "userName=" + userName +
                    ", money=" + money +
                    ", province=" + province +
                    '}';
        }
    }

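As you can see, nothing special changed in the bean; a province field was simply added.

4.1: The partitioning rule

To partition by data we have to tell Hadoop how, so we write a partitioner class that extends org.apache.hadoop.mapreduce.Partitioner.

Partitioning in Hadoop is applied to the mapper's output: each (key, value) pair the mapper emits is handed to the partitioner, which decides which reducer receives it. We define a rule per province; Jiangxi (江西), for example, maps to partition 0.

The code: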

package com.zxj.hadoop.demo.mapreduce.staffspend.groupuser;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

import java.util.HashMap;
import java.util.Map;

/**
 * @Author 朱小杰
 * Date 2017-07-29 11:14
 * Notes
 * The key and value types are the mapper's output types, because partitioning is applied to the mapper's output
 */
public class ProvincePartitioner extends Partitioner<Text, GroupUser.SpendBean> {
    private static Map<String, Integer> provinces = new HashMap<>();
    static {
        // Assign a partition to each province
        provinces.put("江西",0);
        provinces.put("广东",1);
        provinces.put("北京",2);
        provinces.put("湖南",3);
        provinces.put("上海",4);
        provinces.put("西藏",5);
    }

    /**
     * Assigns a partition to the given record
     * @param text
     * @param spendBean
     * @param numPartitions
     * @return
     */
    @Override
    public int getPartition(Text text, GroupUser.SpendBean spendBean, int numPartitions) {
        Integer province = provinces.get(spendBean.getProvince().toString());
        province = province == null ? 6 : province;  // provinces not in the list fall into a default partition
        return province;
    }
}

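The code is simple: we defined six partitions, and any record whose province is not among them goes into a seventh.

4.2: Configure the partitioner

With the partitioner written, the job has to be told to use it. This is done in the main method, as follows.

The setPartitionerClass and setNumReduceTasks lines are the key changes.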

/**
     * The driver
     * @param args
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        job.setJarByClass(GroupUser.class); // the class used to locate the jar

        job.setMapperClass(GroupUserMapper.class); // the mapper class
        job.setReducerClass(GroupUserReducer.class); // the reducer class

        job.setPartitionerClass(ProvincePartitioner.class); // the partitioning rule; optional, depends on business needs
        job.setNumReduceTasks(7); // number of reducers; must match the largest number of partitions

        job.setMapOutputKeyClass(Text.class); // the mapper's output key
        job.setMapOutputValueClass(SpendBean.class); // the mapper's output value

        job.setOutputKeyClass(Text.class); // the final output types
        job.setOutputValueClass(SpendBean.class);

        FileInputFormat.setInputPaths(job,new Path(args[0])); // input location
        FileOutputFormat.setOutputPath(job,new Path(args[1])); // output location

        boolean b = job.waitForCompletion(true); // wait for completion; true prints progress and details
        if(b){
            //success
        }

    }

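A further note on job.setNumReduceTasks(7):

If the value is 1, all the data is written to a single file.

If it is 2, the job fails, because the partitioner returns partition numbers for which no reducer exists (Hadoop reports an illegal partition).

If it exceeds the number of partitions, say 10, the extra output files are simply empty. So the usual choice is the largest number of partitions needed.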
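The three cases can be modelled in a few lines of plain Java. This is a simplified sketch of the behaviour described above, not Hadoop's actual code: PartitionSketch and route are names invented here, the exception type is an arbitrary choice, and the province keys are romanized stand-ins for the Chinese keys used by ProvincePartitioner.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of how a partition number relates to the number of reduce tasks. */
class PartitionSketch {
    // Romanized stand-ins for the province keys in ProvincePartitioner
    private static final Map<String, Integer> PROVINCES = new HashMap<>();
    static {
        PROVINCES.put("Jiangxi", 0);
        PROVINCES.put("Guangdong", 1);
        PROVINCES.put("Beijing", 2);
        PROVINCES.put("Hunan", 3);
        PROVINCES.put("Shanghai", 4);
        PROVINCES.put("Xizang", 5);
    }

    /** Same lookup as ProvincePartitioner: known provinces get 0..5, anything else gets 6. */
    static int getPartition(String province) {
        return PROVINCES.getOrDefault(province, 6);
    }

    /** Decide which reduce task receives a record, rejecting partitions that do not exist. */
    static int route(String province, int numReduceTasks) {
        if (numReduceTasks == 1) {
            return 0; // a single reducer receives everything, so one output file
        }
        int partition = getPartition(province);
        if (partition < 0 || partition >= numReduceTasks) {
            throw new IllegalStateException(
                    "Illegal partition " + partition + " for " + numReduceTasks + " reduce tasks");
        }
        return partition;
    }

    public static void main(String[] args) {
        System.out.println(route("Shanghai", 7));  // valid: partition 4 exists
        System.out.println(route("Shanghai", 10)); // valid: partitions 7..9 simply stay empty
        System.out.println(route("Shanghai", 1));  // everything goes to the only reducer
        try {
            route("Shanghai", 2);                  // partition 4 does not exist
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For comparison, when no partitioner is set Hadoop uses HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks and therefore can never return an out-of-range partition.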

4.3: Complete partitioning code

The complete code follows.

The partitioner:

package com.zxj.hadoop.demo.mapreduce.staffspend.groupuser;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

import java.util.HashMap;
import java.util.Map;

/**
 * @Author 朱小杰
 * Date 2017-07-29 11:14
 * Notes
 * The key and value types are the mapper's output types, because partitioning is applied to the mapper's output
 */
public class ProvincePartitioner extends Partitioner<Text, GroupUser.SpendBean> {
    private static Map<String, Integer> provinces = new HashMap<>();
    static {
        // Assign a partition to each province
        provinces.put("江西",0);
        provinces.put("广东",1);
        provinces.put("北京",2);
        provinces.put("湖南",3);
        provinces.put("上海",4);
        provinces.put("西藏",5);
    }

    /**
     * Assigns a partition to the given record
     * @param text
     * @param spendBean
     * @param numPartitions
     * @return
     */
    @Override
    public int getPartition(Text text, GroupUser.SpendBean spendBean, int numPartitions) {
        Integer province = provinces.get(spendBean.getProvince().toString());
        province = province == null ? 6 : province;  // provinces not in the list fall into a default partition
        return province;
    }
}

The rest of the code, all written in one file:

package com.zxj.hadoop.demo.mapreduce.staffspend.groupuser;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @Author 朱小杰
 * Date 2017-07-23 16:33
 * Notes ...
 */
public class GroupUser {
    /**
     * Mapper
     */
    public static class GroupUserMapper extends Mapper<LongWritable, Text, Text, SpendBean>{

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String val = value.toString();
            String[] split = val.split("\t");
            // No handling of malformed lines here, to keep the core code simple
            String name = split[0];
            String province = split[1];
            String type = split[2];
            int money = Integer.parseInt(split[3]);
            SpendBean groupUser = new SpendBean();
            groupUser.setUserName(new Text(name));
            groupUser.setMoney(new IntWritable(money));
            groupUser.setProvince(new Text(province));
            context.write(new Text(name),groupUser);
        }
    }

    /**
     * reducer
     */
    public static class GroupUserReducer extends Reducer<Text, SpendBean, Text, SpendBean> {
        /**
         * Sums the spending recorded for one name within this partition
         * @param key the employee's name
         * @param values
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void reduce(Text key, Iterable<SpendBean> values, Context context) throws IOException, InterruptedException {
            int money = 0; // amount spent
            // Iterate over the records
            Text province = null;
            for(SpendBean bean : values){
                money += bean.getMoney().get();
                province = bean.getProvince();
            }
            // Write the total
            context.write(key,new SpendBean(key,new IntWritable(money),province));
        }
    }

    /**
     * The wrapper bean
     */
    public static class SpendBean implements Writable{

        private Text userName;

        private IntWritable money;

        private Text province;


        public SpendBean(Text userName, IntWritable money, Text province) {
            this.userName = userName;
            this.money = money;
            this.province = province;
        }

        /**
         * A no-arg constructor is required for deserialization
         */
        public SpendBean(){}

        /**
         * Serialization
         * @param out
         * @throws IOException
         */
        @Override
        public void write(DataOutput out) throws IOException {
            userName.write(out);
            money.write(out);
            province.write(out);
        }

        /**
         * Deserialization; fields are read in the same order write() wrote them
         * @param in
         * @throws IOException
         */
        @Override
        public void readFields(DataInput in) throws IOException {
            userName = new Text();
            userName.readFields(in);
            money = new IntWritable();
            money.readFields(in);
            province = new Text();
            province.readFields(in);
        }

        public Text getUserName() {
            return userName;
        }

        public Text getProvince() {
            return province;
        }

        public void setProvince(Text province) {
            this.province = province;
        }

        public void setUserName(Text userName) {
            this.userName = userName;
        }

        public IntWritable getMoney() {
            return money;
        }

        public void setMoney(IntWritable money) {
            this.money = money;
        }

        @Override
        public String toString() {
            return "SpendBean{" +
                    "userName=" + userName +
                    ", money=" + money +
                    ", province=" + province +
                    '}';
        }
    }


    /**
     * The driver
     * @param args
     */
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        job.setJarByClass(GroupUser.class); // the class used to locate the jar

        job.setMapperClass(GroupUserMapper.class); // the mapper class
        job.setReducerClass(GroupUserReducer.class); // the reducer class

        job.setPartitionerClass(ProvincePartitioner.class); // the partitioning rule; optional, depends on business needs
        job.setNumReduceTasks(7); // number of reducers; must match the largest number of partitions

        job.setMapOutputKeyClass(Text.class); // the mapper's output key
        job.setMapOutputValueClass(SpendBean.class); // the mapper's output value

        job.setOutputKeyClass(Text.class); // the final output types
        job.setOutputValueClass(SpendBean.class);

        FileInputFormat.setInputPaths(job,new Path(args[0])); // input location
        FileOutputFormat.setOutputPath(job,new Path(args[1])); // output location

        boolean b = job.waitForCompletion(true); // wait for completion; true prints progress and details
        if(b){
            //success
        }

    }
}

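4.4: Run the partitioning code on Hadoop

Repackage the project, upload it to the server again, and run it directly:

hadoop jar hadoop-mapreduce-1.0.jar com.zxj.hadoop.demo.mapreduce.staffspend.groupuser.GroupUser /staffspend/input /staffspend/output2

You will notice that the reduce phase is clearly slower. The partitioning itself is cheap; the likely reason is that seven reduce tasks now have to be started and run instead of one.

When the job has finished, list the output:

hadoop fs -ls /staffspend/output2

There are seven result files, one per partition; view their contents with hadoop fs -cat.

Each file shows how much each person spent within one province.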

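5: Sorting data and reusing objects

Code for this part: download link

This part covers sorting, another common requirement: in the example above, say, I want to know which employee spent the most.

It also covers object reuse. With big data, map may be called hundreds of millions of times or far more; do we really have to create a new bean on every call?

First prepare some data. We could reuse the results computed earlier, but they were printed in the default toString() format, which is awkward to parse, so I prepared a separate data set.

Let's start coding.

5.1: Write the sorting code

First we prepare another bean. Unlike the earlier ones, it has to implement the sorting interface.

/**
     * We implement a new interface that combines the sorting and serialization interfaces
     */
    public static class Spend implements WritableComparable<Spend>{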
        private Text name; //姓名
        private IntWritable money; //花费

        public Spend(){}

        public Spend(Text name, IntWritable money) {
            this.name = name;
            this.money = money;
        }

        public void set(Text name, IntWritable money) {
            this.name = name;
            this.money = money;
        }
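        @Override
        public int compareTo(Spend o) {
            // Descending by money; Integer.compare avoids the overflow that subtraction can cause
            return Integer.compare(o.getMoney().get(), this.money.get());
        }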

        @Override
        public void write(DataOutput out) throws IOException {
            name.write(out);
            money.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            name = new Text();
            name.readFields(in);
            money = new IntWritable();
            money.readFields(in);
        }


        public Text getName() {
            return name;
        }

        public void setName(Text name) {
            this.name = name;
        }

        public IntWritable getMoney() {
            return money;
        }

        public void setMoney(IntWritable money) {
            this.money = money;
        }

        @Override
        public String toString() {
            return name.toString() + "\t" + money.get();
        }
    }

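The sorting half of this interface is simply the JDK's own Comparable (WritableComparable combines Writable with Comparable), used exactly as in the JDK, so I won't go into it in depth. The sort order of the keys is determined by compareTo.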
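As a small illustration of that ordering, here is a Hadoop-free sketch. SpendSortSketch and its nested Spend are hypothetical stand-ins with plain fields; the amounts are the per-person totals worked out by hand from staff.txt, with romanized names. It also shows why comparing by subtraction is fragile: the difference of two ints can overflow and flip the sign. Real spending amounts would not trigger this, so treat it as a general pitfall rather than a bug observed in this job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in showing a descending compareTo, as used by the Spend key. */
class SpendSortSketch {

    static final class Spend implements Comparable<Spend> {
        final String name;
        final int money;

        Spend(String name, int money) {
            this.name = name;
            this.money = money;
        }

        @Override
        public int compareTo(Spend o) {
            // Other first, this second: larger amounts sort earlier
            return Integer.compare(o.money, this.money);
        }
    }

    /** Names ordered from the biggest spender to the smallest. */
    static List<String> namesByMoneyDesc(List<Spend> spends) {
        List<Spend> sorted = new ArrayList<>(spends);
        Collections.sort(sorted);
        List<String> names = new ArrayList<>();
        for (Spend spend : sorted) {
            names.add(spend.name);
        }
        return names;
    }

    /** The subtraction idiom; overflows when the operands are far apart. */
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    public static void main(String[] args) {
        List<Spend> totals = Arrays.asList(
                new Spend("lisi", 1650), new Spend("zhangsan", 4200),
                new Spend("wangwu", 820), new Spend("zhouliu", 3600));
        System.out.println(namesByMoneyDesc(totals));

        // -2,000,000,000 is smaller than 2,000,000,000, yet the subtraction result is positive
        System.out.println(compareBySubtraction(-2_000_000_000, 2_000_000_000) > 0);
        System.out.println(Integer.compare(-2_000_000_000, 2_000_000_000) < 0);
    }
}
```

In the job itself nobody calls sort: Hadoop sorts the mapper's output keys during the shuffle, which is why the bean is used as the key rather than the value.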

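5.2: Write the mapper (object reuse)

The mapper here is simple; apart from the reused objects there is nothing special to explain.

    public static class SortMapper extends Mapper<LongWritable, Text, Spend, Text>{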
        private Spend spend = new Spend();
        private IntWritable moneyWritable = new IntWritable();
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = value.toString().split("\t"); // no error handling here; core logic only
            String name = split[0];
            int money = Integer.parseInt(split[1]);
            text.set(name);
            moneyWritable.set(money);
            spend.set(text, moneyWritable);
            context.write(spend,text);
        }
    }

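There is little to say about the logic: the data is already aggregated and only needs sorting, and the sorting itself lives in the interface implemented by the bean. The point of interest here is object reuse.

Big-data jobs routinely process billions or tens of billions of records. Creating a new object for each one adds work for the garbage collector. Instead we can reuse objects: declare them as fields, as above, and call their set methods to update their values.

You might wonder whether the second use of the object changes its value. It does. Then does the record written on the first call, which referred to the same object, change as well? It does not. As soon as the bean is written out it is serialized, and from then on it exists only as serialized data, which is deserialized again on the reducer side and has no further connection to the original object.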
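The argument can be checked in plain Java. The sketch below is a hypothetical model (ReuseSketch, Holder and the method names are invented here, and java.io streams stand in for Hadoop's output buffer): one mutable holder is reused for every record. Records that are serialized at write time keep their own values, whereas a list that merely keeps references ends up with every element showing the last value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of why reusing one object is safe once each write serializes it. */
class ReuseSketch {

    /** Mutable holder reused for every record, like the mapper's Spend field. */
    static final class Holder {
        String name;
        int money;

        void set(String name, int money) {
            this.name = name;
            this.money = money;
        }
    }

    /** Serialize the holder's current state on every "write"; later set() calls cannot affect it. */
    static List<byte[]> writeSerialized(String[] names, int[] amounts) throws IOException {
        Holder reused = new Holder();
        List<byte[]> written = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            reused.set(names[i], amounts[i]);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(reused.name);
            out.writeInt(reused.money);
            written.add(buffer.toByteArray());
        }
        return written;
    }

    /** Deserialize one record back into a printable form. */
    static String read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readUTF() + "\t" + in.readInt();
    }

    /** The unsafe variant: keeping references means every element is the same object. */
    static List<Holder> keepReferences(String[] names, int[] amounts) {
        Holder reused = new Holder();
        List<Holder> kept = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            reused.set(names[i], amounts[i]);
            kept.add(reused);
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        String[] names = {"zhangsan", "lisi"};
        int[] amounts = {4200, 1650};
        System.out.println(read(writeSerialized(names, amounts).get(0))); // first record is intact
        System.out.println(keepReferences(names, amounts).get(0).name);   // overwritten by the last record
    }
}
```

The same reasoning is a warning for code of our own: reuse is safe for objects handed to context.write, but not for objects we store in a collection and read later.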

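5.3: Write the reducer

The reducer is unremarkable and there is really nothing to explain; it just writes out the results.

public static class SortReducer extends Reducer<Spend, Text, Text, Spend>{
        /**
         * The input is already aggregated, so we simply write it out
         * @param key
         * @param values  the names; normally just one, but employees who spent the same amount compare as equal and arrive together
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void reduce(Spend key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Write every name; the framework refreshes key as values is iterated
            for (Text name : values) {
                context.write(name, key);
            }
        }
    }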

5.4: Write the driver class

The driver is much the same as before, except that no partitioner code is needed.

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration config = new Configuration();

        Job job = Job.getInstance(config);

        job.setJarByClass(SortGroupUser.class);

        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);

        job.setMapOutputKeyClass(Spend.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Spend.class);

        FileInputFormat.setInputPaths(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if(b){
            //success
        }

    }

The code here has no comments; see the code in the earlier parts for them.

5.5: Complete code

For the perfectionists, here is the complete code.

package com.zxj.hadoop.demo.mapreduce.staffspend.groupuser;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * @Author 朱小杰
 * Date 2017-07-29 15:48
 * Notes: aggregation with sorting
 */
public class SortGroupUser {

    public static class SortMapper extends Mapper<LongWritable, Text, Spend, Text>{
        private Spend spend = new Spend();
        private IntWritable moneyWritable = new IntWritable();
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] split = value.toString().split("\t"); // no error handling here; only the core logic is shown
            String name = split[0];
            int money = Integer.parseInt(split[1]);
            text.set(name);
            moneyWritable.set(money);
            spend.set(text, moneyWritable);
            context.write(spend,text);
        }
    }

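    public static class SortReducer extends Reducer<Spend, Text, Text, Spend> {
        /**
         * The data was already aggregated earlier, so the records are written out directly.
         * @param key     the Spend record; records with equal amounts are grouped into one call
         * @param values  the names for this key (one, unless several users tie on the amount)
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        @Override
        protected void reduce(Spend key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            // Hadoop refreshes key while iterating, so each name is written with its own record.
            for (Text name : values) {
                context.write(name, key);
            }
        }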
    }


    /**
     * Implements WritableComparable, which combines the sorting interface with the serialization interface.
     */
    public static class Spend implements WritableComparable<Spend> {
        private Text name; // name
        private IntWritable money; // amount spent

        public Spend(){}

        public Spend(Text name, IntWritable money) {
            this.name = name;
            this.money = money;
        }

        public void set(Text name, IntWritable money) {
            this.name = name;
            this.money = money;
        }
        @Override
        public int compareTo(Spend o) {
            // Descending by amount; Integer.compare cannot overflow the way subtraction can.
            return Integer.compare(o.getMoney().get(), this.money.get());
        }

        @Override
        public void write(DataOutput out) throws IOException {
            name.write(out);
            money.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            name = new Text();
            name.readFields(in);
            money = new IntWritable();
            money.readFields(in);
        }


        public Text getName() {
            return name;
        }

        public void setName(Text name) {
            this.name = name;
        }

        public IntWritable getMoney() {
            return money;
        }

        public void setMoney(IntWritable money) {
            this.money = money;
        }

        @Override
        public String toString() {
            return name.toString() + "\t" + money.get();
        }
    }


    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration config = new Configuration();

        Job job = Job.getInstance(config);

        job.setJarByClass(SortGroupUser.class);

        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);

        job.setMapOutputKeyClass(Spend.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Spend.class);

        FileInputFormat.setInputPaths(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if(b){
            //success
        }

    }
}

The mapper and the reducer are both nested inside this class.
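
The write/readFields pair in Spend only works because both methods handle the fields in the same order. The following is a minimal sketch of that contract in plain Java. It uses java.io streams with writeUTF/writeInt instead of Hadoop's Text and IntWritable, and SpendRecord is a hypothetical stand-in rather than the class from this post.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical plain-Java analogue of Spend: same write/readFields contract, no Hadoop types. */
final class SpendRecord {
    String name;
    int money;

    void write(DataOutput out) throws IOException {
        out.writeUTF(name);  // field 1
        out.writeInt(money); // field 2
    }

    void readFields(DataInput in) throws IOException {
        name = in.readUTF();  // read first, because it was written first
        money = in.readInt(); // read second
    }

    /** Serializes a record to bytes and reads it back into a fresh instance. */
    static SpendRecord roundTrip(SpendRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            SpendRecord copy = new SpendRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SpendRecord record = new SpendRecord();
        record.name = "zhangsan";
        record.money = 300;
        SpendRecord copy = roundTrip(record);
        System.out.println(copy.name + "\t" + copy.money);
    }
}
```

Swapping the two lines in readFields would make the reader interpret the name bytes as an int, which is the kind of mistake this ordering rule guards against.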

5.6: Running the sort on Hadoop

Name the newly prepared data file all.txt, upload it to the server, and then put it into HDFS.

Create the directory

hadoop fs -mkdir -p /staffsort/input

Upload the file

hadoop fs -put all.txt /staffsort/input

Run the job

hadoop jar hadoop-mapreduce-1.0.jar com.zxj.hadoop.demo.mapreduce.staffspend.groupuser.SortGroupUser /staffsort/input /staffsort/output

View the output

hadoop fs -ls /staffsort/output
hadoop fs -cat /staffsort/output/part-r-00000

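OK, done.

6: Counting the words in a novel (with a Combiner)

Code download: click here

This part covers the Combiner. The use case is finding which words occur most often in the novel 斗破苍穹. Getting there takes two MapReduce jobs: the first aggregates the counts and the second sorts them.

6.1: Preparation

1: 斗破苍穹.txt (download it yourself)

2: The Chinese word segmenter ansj (another segmenter would also work)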

        
        <dependency>
            <groupId>org.ansj</groupId>
            <artifactId>ansj_seg</artifactId>
            <version>5.1.1</version>
        </dependency>

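6.2: Configuring Maven to package the segmenter dependency

Our code has to be packaged as a jar and run on Hadoop. The earlier examples had no third-party dependencies, but this one depends on a word segmenter that Hadoop does not ship, so the segmenter has to go into the jar as well. We use maven-assembly-plugin for that. Many of you have probably used this plugin, but simply bundling every dependency is not what we want here. The pom also lists the Hadoop jars, and those should stay out of the package: the Hadoop runtime already provides them, and including them makes the jar much larger.

What we want is a jar that contains only our own code plus the segmenter's jars.

Here is how. If you know a better approach, please point it out in the comments; I would be grateful.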

            
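            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <id>make-jar</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                        <configuration>
                            <descriptors>
                                <descriptor>src/main/assemble/package.xml</descriptor>
                            </descriptors>
                        </configuration>
                    </execution>
                </executions>
            </plugin>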

That is the configuration in the pom. It refers to a separate descriptor file, which we create at the path given above with the following content:

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    
    <id>a</id>

    <includeBaseDirectory>false</includeBaseDirectory>
    <formats>
        <format>jar</format>
    </formats>

    <fileSets>
        <fileSet>
            <directory>${project.build.directory}/classes</directory>
            <outputDirectory></outputDirectory>
        </fileSet>
    </fileSets>

    <dependencySets>
        <dependencySet>
            <useProjectArtifact>true</useProjectArtifact>
            <useProjectAttachments>true</useProjectAttachments>
            <outputDirectory></outputDirectory>
            <unpack>true</unpack>
            <includes>
                <include>org.ansj:ansj_seg</include>
                <include>org.nlpcn:nlp-lang</include>
                <include>org.nutz:nutz</include>
            </includes>
        </dependencySet>
    </dependencySets>
</assembly>

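That completes the configuration.

6.3: Aggregating the data (Combiner)

First we aggregate the data; without counts there is nothing to sort. The aggregation code is much like the earlier wordcount, but the input is far larger: it is a whole novel rather than test data I made up. That is why we use a Combiner here.

In short, a Combiner performs a first round of aggregation on the map side, which reduces the amount of data the reducer has to aggregate. More on this shortly.

First, the Mapper. Its output is just a word and a count, so no custom bean is needed.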

package com.zxj.hadoop.demo.mapreduce.story;

import org.ansj.domain.Result;
import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

/**
 * @author 朱小杰
 * Date: 2017-07-29 19:00
 * Description: counts which words occur most often in a novel
 */
public class StoryMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private Text text = new Text();
    private LongWritable longWritable = new LongWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString().trim();
        // skip blank lines
        if(!StringUtils.isBlank(line)){
            // word segmentation
            Result parse = ToAnalysis.parse(line);
            List<Term> terms = parse.getTerms();
            Iterator<Term> iterator = terms.iterator();
            while (iterator.hasNext()){
                Term term = iterator.next();
                longWritable.set(1);
                text.set(term.getName());
                context.write(text,longWritable);
            }
        }
    }
}

The difference from the earlier code is the word segmentation: every segmented word is emitted as a key.

The reducer code

package com.zxj.hadoop.demo.mapreduce.story;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

/**
 * @author 朱小杰
 * Date: 2017-07-29 19:10
 * Description: word counts for the novel
 */
public class StoryReducer extends Reducer<Text, LongWritable, LongWritable, Text> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        Iterator<LongWritable> iterator = values.iterator();
        long num = 0;
        while (iterator.hasNext()){
            LongWritable longWritable = iterator.next();
            num += longWritable.get();
        }
        context.write(new LongWritable(num),key);
    }
}

The reducer simply sums the counts and writes the result to the output text.

Now for the Combiner. First, how to set one:

Job job = ..
job.setCombinerClass(SortCombiner.class); // set the Combiner

Next, look at what setCombinerClass expects as its argument:

  /**
   * Set the combiner class for the job.
   * @param cls the combiner to use
   * @throws IllegalStateException if the job is submitted
   */
  public void setCombinerClass(Class<? extends Reducer> cls
                               ) throws IllegalStateException {
    ensureState(JobState.DEFINE);
    conf.setClass(COMBINE_CLASS_ATTR, cls, Reducer.class);
  }

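Odd, isn't it? It takes a Reducer. So can we just pass our reducer class? Not in this job. The Combiner runs at a different stage: it aggregates once right after map, and its output is then passed to the reducer for the final aggregation. Its output types therefore have to match the mapper's output types, whereas StoryReducer writes (count, word). I drew a rough pair of diagrams of the flow:

This is the flow without a Combiner

This is the flow with a Combiner

As the flow shows, when a Combiner is present the data passes through it after the Mapper and before the Reducer. In other words, the Mapper's output becomes the Combiner's input, and the Combiner's output becomes the Reducer's input.

A Combiner has to follow one rule, though: it must be a pluggable, optional component. Removing it must not change the result in any way.

Why use a Combiner at all? It pre-aggregates within each map task, so less data reaches the reducer stage, which can improve efficiency considerably.

Here is the Combiner code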

package com.zxj.hadoop.demo.mapreduce.story;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.Iterator;

/**
 * @author 朱小杰
 * Date: 2017-07-29 20:03
 * Description: ...
 */
public class SortCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
    private LongWritable longWritable = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        Iterator<LongWritable> iterator = values.iterator();
        long num = 0;
        while (iterator.hasNext()){
            LongWritable value = iterator.next();
            num += value.get();
        }
        longWritable.set(num);
        context.write(key,longWritable);
    }
}

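As you can see, the logic is almost the same as the reducer's; it is just one round of aggregation on the map side. Note that the output types are the same as the input types, because the Combiner's result still goes to the reducer for the final aggregation.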
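
The "removable without changing the result" rule can be checked with a small plain-Java simulation. This is only a sketch under simplifying assumptions: each String array stands for one input split, the shuffle is a plain list, and CombinerSketch with its methods is invented for illustration rather than taken from Hadoop.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of map -> (optional) combine -> reduce for word counting. */
final class CombinerSketch {

    /** Sums counts per word; serves as the "combiner" (per split) and as the final "reducer". */
    static Map<String, Long> sum(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    /** "Mapper": emits (word, 1) for every token of one input split. */
    static List<Map.Entry<String, Long>> map(String[] split) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : split) {
            out.add(new AbstractMap.SimpleEntry<>(word, 1L));
        }
        return out;
    }

    /** Runs the job; shuffled[0] receives the number of records handed to the reducer. */
    static Map<String, Long> run(String[][] splits, boolean useCombiner, int[] shuffled) {
        List<Map.Entry<String, Long>> reducerInput = new ArrayList<>();
        for (String[] split : splits) {
            List<Map.Entry<String, Long>> mapOutput = map(split);
            if (useCombiner) {
                // Pre-aggregate inside the split; the output type equals the input type.
                reducerInput.addAll(sum(mapOutput).entrySet());
            } else {
                reducerInput.addAll(mapOutput);
            }
        }
        shuffled[0] = reducerInput.size();
        return sum(reducerInput);
    }

    public static void main(String[] args) {
        String[][] splits = {{"zhangsan", "lisi", "zhangsan"}, {"lisi", "zhangsan", "zhangsan"}};
        int[] shuffled = new int[1];
        Map<String, Long> plain = run(splits, false, shuffled);
        System.out.println(plain + " with " + shuffled[0] + " shuffled records");
        Map<String, Long> combined = run(splits, true, shuffled);
        System.out.println(combined + " with " + shuffled[0] + " shuffled records");
    }
}
```

Both runs produce the same totals, while the run with the combiner hands four records to the reducer instead of six. Summation works here because it is associative and commutative; an operation such as averaging would not survive this kind of pre-aggregation unchanged.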

Finally the main method, which is also much the same; it only adds the line that sets the Combiner.

package com.zxj.hadoop.demo.mapreduce.story;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @author 朱小杰
 * Date: 2017-07-29 19:14
 * Description: ...
 */
public class StoryDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        job.setJarByClass(StoryDriver.class);

        job.setMapperClass(StoryMapper.class);
        job.setReducerClass(StoryReducer.class);

        job.setCombinerClass(SortCombiner.class); // set the Combiner

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if(b){

        }

    }
}

Name the novel dpcq.txt and upload it to Hadoop. Mind the file encoding; UTF-8 is best.

hadoop fs -mkdir -p /story/input
hadoop fs -put dpcq.txt /story/input

Then build the package, upload the jar that includes the segmenter to the server, and run it on Hadoop

hadoop jar hadoop-mapreduce-1.0-a.jar com.zxj.hadoop.demo.mapreduce.story.StoryDriver /story/input /story/output

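The result of the run is as follows

But this is not yet what we want. We need the words sorted by how often they occur, so one more MapReduce job is needed for sorting.

6.4: The sorting stage

The aggregation above gave us the number of occurrences of each word; this part sorts them. It is very simple. We have already seen how sorting works: implement a Comparable interface. Here, though, we do not even need to implement it ourselves, because we sort by occurrence count. Take a look at the source of LongWritable.

As you might guess, LongWritable already implements the comparison interface, so there is nothing for us to do. However, LongWritable sorts in ascending order, so we would have to scroll to the very bottom to see the most frequent word. To sort in descending order we have to implement it ourselves. The following class makes long values sort in descending order: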

package com.zxj.hadoop.demo.mapreduce.story.sort;

import org.apache.hadoop.io.LongWritable;

/**
 * @author 朱小杰
 * Date: 2017-07-29 21:00
 * Description: a long that sorts in descending order
 */
public class MyLongWritable extends LongWritable {
    @Override
    public int compareTo(LongWritable o) {
        if(o.get() > this.get()){
            return 1;
        }else if (o.get() == this.get()){
            return 0;
        }else{
            return -1;
        }
    }
}

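It extends LongWritable directly and overrides its comparison code. One question left as a teaser: why doesn't the implementation simply use

return (int)(o.get() - this.get())

Wouldn't that be much simpler? Why not use it? Feel free to answer in the comments!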
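
One possible answer, sketched in plain Java: the difference of two long values does not always fit in an int, so the cast can drop the high bits and report a wrong sign or a false "equal". The class and method names below are made up for the demonstration; Long.compare is the standard JDK method and behaves like the if/else chain in MyLongWritable when called with the arguments swapped.

```java
/** Shows why casting a long difference to int is an unsafe comparison. */
final class DescendingCompareSketch {

    /** The tempting shortcut from the text. */
    static int bySubtraction(long self, long other) {
        return (int) (other - self);
    }

    /** Safe descending comparison: positive when other is larger than self. */
    static int byCompare(long self, long other) {
        return Long.compare(other, self);
    }

    public static void main(String[] args) {
        long self = 0L;
        long other = 1L << 32; // 4294967296: larger than self, but its low 32 bits are all zero
        System.out.println(bySubtraction(self, other)); // prints 0, wrongly claiming the values are equal
        System.out.println(byCompare(self, other));     // prints 1, so self sorts after other

        long half = 1L << 31; // 2147483648 does not fit in an int
        System.out.println(bySubtraction(self, half));  // prints -2147483648, the sign is flipped
    }
}
```

Word counts in one novel will never get that large, but a comparison method that is correct for every input is the better habit.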

Good, we now have a descending MyLongWritable, and we will use it for sorting.

The rest of the code is very simple. Here is the mapper:

package com.zxj.hadoop.demo.mapreduce.story.sort;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * @author 朱小杰
 * Date: 2017-07-29 20:43
 * Description: ...
 */
public class SortMapper extends Mapper<LongWritable, Text, MyLongWritable, Text> {
    private Text text = new Text();
    private MyLongWritable longWritable = new MyLongWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String content = value.toString().trim();
        if(!StringUtils.isBlank(content)){
            String[] split = content.split("\t");
            if(split.length == 2){
                long number = Long.parseLong(split[0]); // number of occurrences
                String word = split[1]; // the word
                longWritable.set(number);
                text.set(word);
                context.write(longWritable,text);
            }
        }
    }
}

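If you followed the explanations above, this code should be easy to read; if not, go back and review. Why is the input key a plain LongWritable rather than our custom MyLongWritable? Because that key is supplied by Hadoop and represents the position in the file being read, so our custom sorting long cannot be used there. Everywhere else it can, for example in the output.

Next, the reducer code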

package com.zxj.hadoop.demo.mapreduce.story.sort;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

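/**
 * @author 朱小杰
 * Date: 2017-07-29 20:49
 * Description: ...
 */
public class SortReducer extends Reducer<MyLongWritable, Text, Text, MyLongWritable> {
    @Override
    protected void reduce(MyLongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Words with the same count share one key, so write every value rather than only the first.
        for (Text word : values) {
            context.write(word, key);
        }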
    }
}

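The reducer could hardly be simpler: it writes out what it is given. As we now know, sorting is done on the reducer's input key, so the records are ordered by our custom comparison rule.

Now the main method. I was almost tempted not to paste it, since there is nothing much to it.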

package com.zxj.hadoop.demo.mapreduce.story.sort;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @author 朱小杰
 * Date: 2017-07-29 20:50
 * Description: ...
 */
public class SortDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        job.setJarByClass(SortDriver.class);

        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);


        job.setMapOutputKeyClass(MyLongWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(MyLongWritable.class);

        FileInputFormat.setInputPaths(job,new Path(args[0]));
        FileOutputFormat.setOutputPath(job,new Path(args[1]));

        boolean b = job.waitForCompletion(true);
        if(b){
            //success
        }
    }
}

With the code written, we run it on MapReduce: package it, upload it to the server, and run the command

hadoop jar  hadoop-mapreduce-1.0-a.jar com.zxj.hadoop.demo.mapreduce.story.sort.SortDriver /story/output /story/output2

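Of the custom arguments, the first one, /story/output, is the output directory of the earlier word-count job. The sort has to read exactly that output, so the path is not arbitrary.

When the job finishes, check the result

The most frequent token is the comma. Fair enough; we should have excluded punctuation.

The words are produced by the segmenter and have nothing to do with Hadoop. If the segmentation seems inaccurate, you can switch to another segmenter or add custom words of your own.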
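
Excluding punctuation could be done with a small check before the mapper emits a token. The helper below is a hypothetical sketch, not part of the original code: it keeps a token only if it contains at least one letter or digit, relying on the JDK's Character.isLetterOrDigit, which treats Chinese characters as letters.

```java
/** Hypothetical helper: decides whether a segmented token is worth counting. */
final class TokenFilterSketch {

    /** Returns true when the token contains at least one letter or digit. */
    static boolean isCountable(String token) {
        if (token == null) {
            return false;
        }
        return token.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        String fullWidthComma = "\uFF0C";  // the full-width comma used in Chinese text
        String twoHanChars = "\u6597\u7834"; // first two characters of the novel's title
        System.out.println(isCountable(fullWidthComma)); // false
        System.out.println(isCountable(twoHanChars));    // true
        System.out.println(isCountable("hadoop"));       // true
    }
}
```

In StoryMapper the call would wrap the context.write line, so punctuation-only tokens never reach the combiner or the reducer.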
