hadoop-streaming + Python: running a Hadoop job

The command is as follows:

hadoop jar /usr/local/hadoop/hadoop-streaming-0.23.6.jar \
    -input /hdfs/input/path -output /hdfs/output/path \
    -mapper "python mapper.py" -reducer "python reducer.py" \
    -file mapper.py -file reducer.py
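
The command refers to mapper.py and reducer.py without showing them. As a rough illustration only, here is a minimal word-count sketch of what the two scripts might contain. The function names map_lines and reduce_lines and the word-count task itself are assumptions, not part of the original setup. It relies on the usual streaming convention: the mapper reads raw lines from stdin and writes tab-separated key/value lines to stdout, and the reducer receives those lines sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic (hypothetical): emit one "word<TAB>1" line per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def reduce_lines(lines):
    """Reducer logic (hypothetical): sum the counts of each word.

    Assumes the input lines are already sorted by key, as they are
    when the streaming framework hands them to the reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


# mapper.py would finish with something like:
#     import sys
#     for out in map_lines(sys.stdin):
#         print(out)
# and reducer.py with the same loop over reduce_lines(sys.stdin).
```

Keeping the logic in plain functions makes it possible to check it locally, for example by feeding sorted(map_lines(...)) into reduce_lines, before submitting the job to the cluster.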

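Notes:

Run the command as the hdfs user.

-input and -output are HDFS paths, and the output path must not already exist.

In -mapper and -reducer, prefix each .py script with python, i.e. write python *.py.

-file is required: it packages the local *.py files and ships them to the cluster so that the other machines in the cluster can run them.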
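
The notes above can be folded into a small launcher script. The following is only a sketch under stated assumptions: the helper names build_streaming_command, output_exists and run_job are hypothetical, the jar path is copied from the command above, and the existence check relies on hadoop fs -test -e, which exits with status 0 when the path exists.

```python
import subprocess

# Jar path taken from the command above; adjust for your installation.
STREAMING_JAR = "/usr/local/hadoop/hadoop-streaming-0.23.6.jar"


def build_streaming_command(input_path, output_path,
                            mapper="mapper.py", reducer="reducer.py",
                            jar=STREAMING_JAR):
    """Return the streaming command as an argument list (hypothetical helper)."""
    return [
        "hadoop", "jar", jar,
        "-input", input_path,
        "-output", output_path,
        # The scripts are invoked through the python interpreter.
        "-mapper", "python " + mapper,
        "-reducer", "python " + reducer,
        # -file ships the local scripts to the cluster nodes.
        "-file", mapper,
        "-file", reducer,
    ]


def output_exists(output_path):
    """True if the HDFS path exists (`hadoop fs -test -e` exits with 0)."""
    return subprocess.call(["hadoop", "fs", "-test", "-e", output_path]) == 0


def run_job(input_path, output_path):
    """Refuse to start when the output path already exists, then run the job."""
    if output_exists(output_path):
        raise RuntimeError("output path already exists: " + output_path)
    subprocess.check_call(build_streaming_command(input_path, output_path))
```

Calling run_job("/hdfs/input/path", "/hdfs/output/path") as the hdfs user would then correspond to the command shown earlier, with the output-path check performed first.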
