Writing a MapReduce Program in Python

This post is based on the Hadoop platform that has already been set up in our lab.

Reference: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

1. Write mapper.py

#!/usr/bin/python
import sys

# Read lines from standard input, split each line into words, and emit
# one "word<TAB>1" pair per word; the tab is Hadoop Streaming's default
# key/value separator.
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '%s\t%s' % (word, 1)
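
Before touching the cluster, it is worth sanity-checking the mapper by piping a line of text into it from the shell; the sample sentence is just an illustration, and it assumes mapper.py sits in the current directory:

echo "foo foo quux labs foo bar quux" | python mapper.py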

2. Write reducer.py

#!/usr/bin/python

import sys

current_word = None
current_count = 0
word = None

# Hadoop Streaming sorts the mapper output by key before it reaches the
# reducer, so all "word<TAB>count" pairs for the same word arrive on
# consecutive lines.
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)

    try:
        count = int(count)
    except ValueError:
        # Ignore lines whose count is not a number.
        continue

    if current_word == word:
        current_count += count
    else:
        if current_word:
            # The key changed: emit the total for the previous word.
            print '%s\t%s' % (current_word, current_count)
        current_word = word
        current_count = count

# Emit the total for the last word, if there was any input at all.
if current_word == word:
    print '%s\t%s' % (current_word, current_count)
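
Because Hadoop sorts the map output by key before the reduce phase, the whole word-count pipeline can be simulated locally by placing sort between the two scripts (again assuming both files are in the current directory); for the sample sentence above this should print bar 1, foo 3, labs 1 and quux 2, tab-separated:

echo "foo foo quux labs foo bar quux" | python mapper.py | sort -k1,1 | python reducer.py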

 

3. Upload mapper.py and reducer.py to the /home/hduser directory on HadoopMaster


Make sure both files are executable:

chmod +x /home/hduser/mapper.py
chmod +x /home/hduser/reducer.py

Note: if the following error appears when the scripts are run: /usr/bin/python^M: bad interpreter: No such file or directory

The cause is that files written on Windows use DOS line endings; after uploading them to HadoopMaster (a Linux system), the file format has to be converted to Unix:

vi filename      # open the file
:set ff          # show the current file format
:set ff=unix     # change the format to unix
:wq              # save and quit
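
Alternatively, if the dos2unix utility is installed on HadoopMaster (an assumption, it is not always present), a single command converts both files in place:

dos2unix /home/hduser/mapper.py /home/hduser/reducer.py   # assumes dos2unix is available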

 

4. Upload the test file to HDFS via Hue

[Figure 1]
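
If you prefer the command line over Hue, the test file can also be uploaded with hdfs dfs; the file name and HDFS directory below are placeholders, adjust them to your cluster:

hdfs dfs -mkdir -p /user/hdfs/input                      # hypothetical HDFS input directory
hdfs dfs -put /home/hduser/test.txt /user/hdfs/input/    # test.txt is a placeholder name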

 

5. Switch to the hdfs user and run the hadoop jar command
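
The exact invocation depends on where your distribution keeps the Hadoop Streaming jar; the sketch below assumes a CDH-style jar path and the HDFS input/output directories used in the previous step, so adjust both to your cluster:

su - hdfs
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -file /home/hduser/mapper.py  -mapper mapper.py \
    -file /home/hduser/reducer.py -reducer reducer.py \
    -input /user/hdfs/input \
    -output /user/hdfs/output

The -file options ship the two scripts to every task node, which is why -mapper and -reducer can refer to them by bare file name.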

[Figure 2]

 

6. Results

[Figure 3]

Reposted from: https://my.oschina.net/wolfoxliu/blog/901912
