Counting the lines of a large file with Hadoop MapReduce (streaming)

Each map task runs wc -l over its own input split and emits a partial line count; the reducer script count.sh then sums these partial counts into the final total.

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.0.1.jar -files count.sh -input /input/CHANGES.txt -output /output/test -mapper 'wc -l' -reducer "sh count.sh"
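Note that a streaming job aborts if the output path already exists, so clear any previous result before resubmitting (hadoop fs -rmr is the Hadoop 1.x form of the recursive remove):

hadoop fs -rmr /output/test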

The reducer script, count.sh:

#!/bin/bash
# Reducer: sum the partial line counts emitted by the mappers.
# Each input line is the output of one map task's `wc -l`.
count=0
while read LINE; do
  count=$((count + LINE))
done
echo $count
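The mapper/reducer pair can be tested locally before submitting the job, by reproducing the same pipe by hand. A minimal sketch, assuming a local copy of CHANGES.txt; the two wc -l invocations stand in for two map tasks, each counting its own "split":

# Simulate two map tasks, each counting part of the file,
# then feed their partial counts into the reducer script.
(head -n 100 CHANGES.txt | wc -l; tail -n +101 CHANGES.txt | wc -l) | sh count.sh
# The printed total should match: wc -l < CHANGES.txt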

After the job completes successfully, the result file appears under the HDFS output directory:

 hadoop fs -cat /output/test/part-00000
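To sanity-check the result, the total can be compared against a single streamed count of the same input (this pulls the entire file through one pipe, so it is only practical as a verification step):

hadoop fs -cat /input/CHANGES.txt | wc -l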

