499. Word Count (Map Reduce version)

Key points
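[StringTokenizer usage](https://blog.csdn.net/catoop/article/details/50630106): when processing large amounts of data, StringTokenizer performs better than `split` (which takes a regular expression) and `substring`-based parsing.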
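StringTokenizer and `split` also differ in output, not just speed. The sketch below tokenizes the same line three ways; the class name `TokenizeDemo` and the sample line are made up for illustration. `StringTokenizer` with its default delimiters skips any run of whitespace, while `String.split` interprets its argument as a regular expression and can produce empty strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    // Collect every token StringTokenizer produces with its default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "  hello   world\tmap reduce ";
        // Runs of delimiters are skipped, so no empty tokens appear.
        System.out.println(tokenize(line));                      // [hello, world, map, reduce]
        // A leading whitespace run still yields a leading empty string.
        System.out.println(Arrays.toString(line.split("\\s+"))); // [, hello, world, map, reduce]
        // Splitting on a single space yields empty strings for repeated spaces
        // and does not split on the tab at all.
        System.out.println(Arrays.toString(line.split(" ")));
    }
}
```

For word counting, the empty strings from `split` would be emitted as a bogus "word" unless filtered out, which is one more reason the mapper below uses `StringTokenizer`.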

Code

```java
import java.util.Iterator;
import java.util.StringTokenizer;

/**
 * Definition of OutputCollector:
 * class OutputCollector<K, V> {
 *     public void collect(K key, V value);
 *         // Adds a key/value pair to the output buffer
 * }
 */
public class WordCount {

    public static class Map {
        // Map phase: split the input text into words and emit (word, 1) for each one.
        // Results go into the output buffer via output.collect(String key, int value).
        public void map(String key, String value, OutputCollector<String, Integer> output) {
            StringTokenizer tokenizer = new StringTokenizer(value);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                output.collect(word, 1);
            }
        }
    }

    // Reduce phase: all values for the same key have already been gathered into one collection,
    // so summing them gives the total count for that word.
    public static class Reduce {
        public void reduce(String key, Iterator<Integer> values,
                           OutputCollector<String, Integer> output) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            output.collect(key, sum);
        }
    }
}
```
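The judge that runs this solution supplies `OutputCollector` and performs the grouping between map and reduce. To try the logic locally, here is a minimal single-process sketch under stated assumptions: the `LocalWordCount` class, its list-backed `OutputCollector`, the `run` method, and the sample documents are all hypothetical, and the real platform's harness may differ. It runs map over each document, groups the emitted `(word, 1)` pairs by word (the shuffle), then reduces each group.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Hypothetical stand-in for the judge's OutputCollector: it just buffers pairs in a list.
    static class OutputCollector<K, V> {
        final List<Map.Entry<K, V>> pairs = new ArrayList<>();

        public void collect(K key, V value) {
            pairs.add(new AbstractMap.SimpleEntry<>(key, value));
        }
    }

    // Same logic as WordCount.Map: emit (word, 1) for every whitespace-separated token.
    static void map(String key, String value, OutputCollector<String, Integer> output) {
        StringTokenizer tokenizer = new StringTokenizer(value);
        while (tokenizer.hasMoreTokens()) {
            output.collect(tokenizer.nextToken(), 1);
        }
    }

    // Same logic as WordCount.Reduce: sum all counts for one word.
    static void reduce(String key, Iterator<Integer> values, OutputCollector<String, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.collect(key, sum);
    }

    // Map every document, group intermediate values by key, then reduce each group.
    static Map<String, Integer> run(List<String> documents) {
        OutputCollector<String, Integer> mapOut = new OutputCollector<>();
        for (int i = 0; i < documents.size(); i++) {
            map(String.valueOf(i), documents.get(i), mapOut);
        }

        // Shuffle: gather all values emitted for the same key into one list (TreeMap keeps keys sorted).
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOut.pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        OutputCollector<String, Integer> reduceOut = new OutputCollector<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            reduce(group.getKey(), group.getValue().iterator(), reduceOut);
        }

        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : reduceOut.pairs) {
            counts.put(pair.getKey(), pair.getValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("This is the content of the file", "This is hello world");
        System.out.println(run(docs));
    }
}
```

With the two sample lines it prints `{This=2, content=1, file=1, hello=1, is=2, of=1, the=2, world=1}`. Counting is case-sensitive, and `TreeMap` orders keys by `String.compareTo`, so the capitalized `This` sorts first.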
