Word Count (Map Reduce)

Question

Use MapReduce to count how often each word occurs in the input.

Example
chunk1: "Google Bye GoodBye Hadoop code"
chunk2: "lintcode code Bye"
The MapReduce result is:
Bye: 2
GoodBye: 1
Google: 1
Hadoop: 1
code: 2
lintcode: 1

Solution

This problem only needs the basic map and reduce operations of MapReduce.

class WordCount:

    # @param {str} line a text, for example "Bye Bye see you next"
    def mapper(self, _, line):
        # Write your code here
        # Please use 'yield key, value'
        
        # This is just plain word counting; because we use yield, the
        # emitted (word, 1) pairs are collected in a buffer by the framework.
        for word in line.split():
            yield word, 1

    # @param key is from mapper
    # @param values is an iterable of the values that share the same key
    def reducer(self, key, values):
        # Write your code here
        # Please use 'yield key, value'
        
        # values is a group of numbers: the counts emitted for this key
        # by the different mappers or chunks
        yield key, sum(values)
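
The class above only defines the two callbacks; the judge's framework is what feeds chunks to the mapper and groups values by key before calling the reducer. As a rough local sketch of that flow, the snippet below repeats the class and adds a small driver. The name run_word_count and the in-memory grouping step are assumptions made for illustration, not part of LintCode or of any real MapReduce runtime.

```python
from collections import defaultdict


class WordCount:
    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated word in the line.
        for word in line.split():
            yield word, 1

    def reducer(self, key, values):
        # Sum all the counts that were emitted for this key.
        yield key, sum(values)


def run_word_count(chunks):
    """Hypothetical local driver: map every chunk, group by key, then reduce."""
    job = WordCount()

    # Map phase plus a simulated shuffle: collect every value under its key.
    grouped = defaultdict(list)
    for chunk in chunks:
        for key, value in job.mapper(None, chunk):
            grouped[key].append(value)

    # Reduce phase: call the reducer once per key, in sorted key order.
    result = {}
    for key in sorted(grouped):
        for out_key, out_value in job.reducer(key, grouped[key]):
            result[out_key] = out_value
    return result


counts = run_word_count(["Google Bye GoodBye Hadoop code", "lintcode code Bye"])
for word, count in counts.items():
    print(f"{word}: {count}")
```

Run on the two example chunks, this prints the same six lines as the expected result in the Example section. A real cluster would run mappers in parallel and shuffle across machines; the dictionary here only stands in for that grouping step.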