Hadoop 2.2 programming: a complete example of running Hadoop with Ruby

Be careful!
Ruby must be installed on all nodes!
#!/usr/bin/ruby
# Ruby code for map.rb

ARGF.each do |line|

   # remove any newline
   line = line.chomp

   # do nothing with lines shorter than 2 characters
   next if line.length < 2

   # grab our key as the two-character prefix (lower-cased)
   key = line[0,2].downcase

   # value is a count of 1 occurrence
   value = 1

   # output to STDOUT
   # <key><tab><value><newline>
   puts key + "\t" + value.to_s

end

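To check the mapper's output format without a cluster, the same per-line logic can be wrapped in a function and run on a few sample lines. This is only a sketch: the helper name `map_line` and the sample input are assumptions made for illustration, not part of the original scripts.

```ruby
# Hypothetical helper mirroring map.rb's per-line logic (the name is illustrative).
# Returns "<key>\t1" for lines of at least 2 characters, otherwise nil.
def map_line(line)
  line = line.chomp
  return nil if line.length < 2
  "#{line[0, 2].downcase}\t1"
end

# Made-up sample input, not data from the original post.
sample = ["Hadoop\n", "hat\n", "x\n", "Ruby\n"]

# Drop the nil produced for the one-character line, then print the pairs.
mapped = sample.map { |l| map_line(l) }.compact
puts mapped
```

With this input, "Hadoop" and "hat" both map to the key `ha`, the one-character line is skipped, and "Ruby" maps to `ru`.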
#!/usr/bin/ruby
# Ruby code for reduce.rb

prev_key = nil
key_total = 0

ARGF.each do |line|

   # remove any newline
   line = line.chomp

   # split key and value on tab character
   (key, value) = line.split(/\t/)

   # check for new key
   if prev_key && key != prev_key && key_total > 0

      # output total for previous key
      # <key><tab><value><newline>
      puts prev_key + "\t" + key_total.to_s

      # reset key total for new key
      prev_key = key
      key_total = 0

   elsif ! prev_key
      prev_key = key

   end

   # add to count for this current key
   key_total += value.to_i

end

# output the total for the last key, which the loop above never prints
puts prev_key + "\t" + key_total.to_s if prev_key

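Hadoop Streaming sorts the mapper output by key before handing it to the reducer, which is why reduce.rb only has to compare each key with the previous one. The sketch below imitates that sort-then-aggregate step locally. The names `reduce_sorted` and `pairs` and the input data are assumptions made for illustration.

```ruby
# Hypothetical helper: sums counts over key-sorted "<key>\t<value>" lines,
# using the same previous-key comparison as reduce.rb (the name is illustrative).
def reduce_sorted(lines)
  totals = []
  prev_key = nil
  key_total = 0
  lines.each do |line|
    key, value = line.chomp.split("\t")
    # a change of key means the previous key's group is complete
    if prev_key && key != prev_key
      totals << "#{prev_key}\t#{key_total}"
      key_total = 0
    end
    prev_key = key
    key_total += value.to_i
  end
  # flush the last key once the input ends
  totals << "#{prev_key}\t#{key_total}" if prev_key
  totals
end

# Made-up mapper output; Array#sort stands in for Hadoop's shuffle phase.
pairs = ["ru\t1", "ha\t1", "py\t1", "ha\t1"]
result = reduce_sorted(pairs.sort)
puts result
```

The same check is commonly run from a shell by piping a local sample file through `./map.rb | sort | ./reduce.rb` before submitting the job.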
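#!/bin/bash

HADOOP_HOME=/home/grid/hadoop
# Streaming jar path as used with Hadoop 0.20.2; on Hadoop 2.x the jar is
# usually share/hadoop/tools/lib/hadoop-streaming-<version>.jar
JAR=contrib/streaming/hadoop-0.20.2-streaming.jar

HSTREAMING="$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/$JAR"

# -mapper and -reducer may also be written as 'ruby map.rb' and 'ruby reduce.rb'
# -file ships a local script with the job; a full path is not required
# (comments cannot follow a trailing backslash, so they are collected here)
$HSTREAMING \
 -mapper  'map.rb' \
 -reducer 'reduce.rb' \
 -file map.rb \
 -file reduce.rb \
 -input '/user/grid/input/*' \
 -output '/user/grid/output'
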
Command line:

% bin/hadoop jar ~/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -input NCDC/files -output output -mapper Map.rb -reducer Reduce.rb

 
