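I'm new to posting and only just getting started in many of these areas, so please forgive any mistakes; criticism and corrections are welcome.
Last time I posted "Testing KMeans on Hadoop (1): Running the Source Example". This time I analyze the output of the whole MapReduce run. The test data file is still the 15 points from part 1:
(20,30) (50,61) (20,32) (50,64) (59,67) (24,34) (19,39) (20,32) (50,65) (50,77) (20,30) (20,31) (20,32) (50,64) (50,67)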
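All 15 points sit on a single line. Assuming the default TextInputFormat, that is why the counters further down report "Map input records=1".

The sketch below shows one way to split such a line into points. It assumes integer coordinates written as "(x,y)". `PointParser` is a hypothetical name, not a class from the source. The regular expression also tolerates points written back to back without a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: turns one input line such as "(20,30) (50,61) ..." into integer points.
// Assumes integer coordinates written as "(x,y)"; the post does not show the original parsing code.
class PointParser {
    // "(x,y)" with optional spaces; also copes with tokens written back to back, e.g. "(59,67)(24,34)".
    private static final Pattern POINT =
            Pattern.compile("\\(\\s*(-?\\d+)\\s*,\\s*(-?\\d+)\\s*\\)");

    static List<int[]> parse(String line) {
        List<int[]> points = new ArrayList<>();
        Matcher m = POINT.matcher(line);
        while (m.find()) {
            points.add(new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))});
        }
        return points;
    }

    public static void main(String[] args) {
        String line = "(20,30) (50,61) (20,32) (50,64) (59,67) (24,34) (19,39) (20,32) "
                + "(50,65) (50,77) (20,30) (20,31) (20,32) (50,64) (50,67)";
        System.out.println(parse(line).size() + " points"); // prints "15 points"
    }
}
```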
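First, a flowchart of the program as I understand it; pay particular attention to the <key, value> inputs and outputs.
Now for the output itself. Lines marked --***-- come from println calls I added to the program, and the notes after // are my own comments.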
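--main::start--//entering the main function of KMeans
--CenterInitial::run--//entering CenterInitial.java to initialize the cluster centers: CenterInitial centerInitial = new CenterInitial();
CenterInitial::The initial center is:(50,61) (50,64) (20,30)//at the start, K distinct points are chosen at random as the centers and stored in the center file on HDFS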
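The post does not show how CenterInitial picks its points. The sketch below shows one plausible approach, and it is an assumption:

- remove duplicates (the data contains (20,30), (20,32) and (50,64) more than once), so the K centers really are distinct;
- shuffle the remaining points;
- keep the first K.

`InitialCenters.pick` and the fixed seed are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

// Sketch of "choose K distinct points at random as the initial centers".
// Assumptions: points are "(x,y)" strings, repeated points count once, and the seed
// only makes runs repeatable. This is not the post's CenterInitial code.
class InitialCenters {
    static List<String> pick(List<String> points, int k, long seed) {
        // Deduplicate, keeping first-seen order, so the K centers are guaranteed distinct.
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(points));
        if (distinct.size() < k) {
            throw new IllegalArgumentException("need at least k distinct points");
        }
        Collections.shuffle(distinct, new Random(seed));
        return new ArrayList<>(distinct.subList(0, k));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("(20,30)", "(50,61)", "(20,32)", "(50,64)", "(59,67)",
                "(24,34)", "(19,39)", "(20,32)", "(50,65)", "(50,77)", "(20,30)", "(20,31)",
                "(20,32)", "(50,64)", "(50,67)");
        // Prints three distinct points separated by spaces, the same layout as the log's center line.
        System.out.println(String.join(" ", pick(data, 3, 42L)));
    }
}
```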
//once initialization is done, the job is started and enters the Map-->Reduce process
13/05/28 11:31:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/05/28 11:31:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/05/28 11:31:33 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/05/28 11:31:33 INFO input.FileInputFormat: Total input paths to process : 1
13/05/28 11:31:33 WARN snappy.LoadSnappy: Snappy native library not loaded
13/05/28 11:31:33 INFO mapred.JobClient: Running job: job_local_0001
13/05/28 11:31:33 INFO util.ProcessTree: setsid exited with exit code 0
13/05/28 11:31:33 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6754d6
13/05/28 11:31:33 INFO mapred.MapTask: io.sort.mb = 100
13/05/28 11:31:33 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/28 11:31:33 INFO mapred.MapTask: record buffer = 262144/327680
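//Now in KMapper.java. The setup function is called first: it reads in the initial cluster centers and stores them in the KMapper field center. Why does the framework call setup automatically? The Hadoop API documentation explains it: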
/*The framework first calls setup(org.apache.hadoop.mapreduce.Mapper.Context),
followed by map(Object, Object, Context) for each key/value pair in the InputSplit.
Finally cleanup(Context) is called.*/
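The documented order can be imitated without Hadoop to see it in action. `MiniMapper` and `TracingMapper` are hypothetical classes, not part of the Hadoop API. The real `Mapper.run(Context)` reads its records from the Context, whereas this sketch takes them from two lists. The single record mirrors this run: key = byte offset 0, value = the line of points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A framework-free imitation of the call order quoted above for
// org.apache.hadoop.mapreduce.Mapper: setup() once, map() for each record, cleanup() last.
// MiniMapper, TracingMapper and run() are hypothetical names used only for illustration.
abstract class MiniMapper<K, V> {
    protected void setup() {}

    protected abstract void map(K key, V value);

    protected void cleanup() {}

    public final void run(List<K> keys, List<V> values) {
        setup();
        try {
            for (int i = 0; i < keys.size(); i++) {
                map(keys.get(i), values.get(i));
            }
        } finally {
            cleanup(); // this sketch chooses to run cleanup even if map() throws
        }
    }
}

class TracingMapper extends MiniMapper<Long, String> {
    final List<String> calls = new ArrayList<>();

    @Override
    protected void setup() { calls.add("setup"); } // KMapper reads the center file at this point

    @Override
    protected void map(Long offset, String line) { calls.add("map@" + offset); }

    @Override
    protected void cleanup() { calls.add("cleanup"); }

    public static void main(String[] args) {
        TracingMapper m = new TracingMapper();
        // One record: key = byte offset 0, value = the start of the line of points.
        m.run(Arrays.asList(0L), Arrays.asList("(20,30) (50,61) (20,32)"));
        System.out.println(m.calls); // [setup, map@0, cleanup]
    }
}
```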
--Mapper::setup--start--
--Mapper::setup--end--
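--Mapper::map--start--//after setup returns, map is called. Debugging shows that map's input <key, value> is <0, the line holding all 15 points of the cluster file>. This is the framework's default reading: the byte offset of the line is the key and the line text is the value. After processing by map, the output <key, value> pairs are: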
center[pos]:(20,30)outvalue:(20,30)
center[pos]:(50,61)outvalue:(50,61)
center[pos]:(20,30)outvalue:(20,32)
center[pos]:(50,64)outvalue:(50,64)
center[pos]:(50,64)outvalue:(59,67)
center[pos]:(20,30)outvalue:(24,34)
center[pos]:(20,30)outvalue:(19,39)
center[pos]:(20,30)outvalue:(20,32)
center[pos]:(50,64)outvalue:(50,65)
center[pos]:(50,64)outvalue:(50,77)
center[pos]:(20,30)outvalue:(20,30)
center[pos]:(20,30)outvalue:(20,31)
center[pos]:(20,30)outvalue:(20,32)
center[pos]:(50,64)outvalue:(50,64)
center[pos]:(50,64)outvalue:(50,67)
//As the output shows, the values of the emitted <key, value> pairs are the 15 data points, and each key is the center nearest to that point.
--Mapper::map--end--//map finished
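The map rule visible in the output above is: nearest center as the key, the point as the value. It can be checked with a short sketch. Squared Euclidean distance is an assumption, since KMapper's distance code is not shown. Squaring picks the same nearest center as the plain distance.

```java
// Sketch of the map-side rule seen above: emit <nearest center, point> for every point.
// Assumption: squared Euclidean distance (the post does not show KMapper's distance code;
// squaring does not change which center is nearest).
class NearestCenter {
    static int nearest(float[][] centers, float x, float y) {
        int best = 0;
        float bestD = Float.MAX_VALUE;
        for (int i = 0; i < centers.length; i++) {
            float dx = x - centers[i][0];
            float dy = y - centers[i][1];
            float d = dx * dx + dy * dy;
            if (d < bestD) { // strict '<': on a tie the earlier center wins
                bestD = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] centers = {{50, 61}, {50, 64}, {20, 30}}; // initial centers from this run
        int[][] points = {{20, 30}, {59, 67}, {24, 34}, {50, 77}};
        for (int[] p : points) {
            float[] c = centers[nearest(centers, p[0], p[1])];
            System.out.println("key (" + (int) c[0] + "," + (int) c[1] + ") <- value ("
                    + p[0] + "," + p[1] + ")");
        }
    }
}
```

For the four sample points it pairs (20,30) and (24,34) with center (20,30), and (59,67) and (50,77) with center (50,64). This matches the corresponding map output lines in the log.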
13/05/28 11:31:33 INFO mapred.MapTask: Starting flush of map output
13/05/28 11:31:33 INFO mapred.MapTask: Finished spill 0
13/05/28 11:31:33 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/05/28 11:31:34 INFO mapred.JobClient: map 0% reduce 0%
13/05/28 11:31:36 INFO mapred.LocalJobRunner:
13/05/28 11:31:36 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/05/28 11:31:36 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@78bc3b
13/05/28 11:31:36 INFO mapred.LocalJobRunner:
13/05/28 11:31:36 INFO mapred.Merger: Merging 1 sorted segments
13/05/28 11:31:36 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 272 bytes
13/05/28 11:31:36 INFO mapred.LocalJobRunner:
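--KReducer::reduce--start--(20,30)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@12c3327 //reduce starts. Its input key is the cluster center (20,30), and its values are the 8 data points that map found to be nearest to this center. There are three cluster centers, so reduce() is called three times, once per key. All three calls happen inside the single reduce task attempt_local_0001_r_000000_0 (see the task IDs below).
(20.375,32.5)//the new center of this group, printed just before KReducer finishes
key:(20,30)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)//this is the reduce output <key, value>: the key, then the group's points followed by its new center
--KReducer::reduce--end--//as I understand it, reduce merges the data points that belong to the input key. The three keys are merged in three separate reduce() calls, and stepping through in a debugger also shows the three centers being handled separately
--KReducer::reduce--start--(50,61)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@12c3327//as above, the second reduce() call merges the data points whose center is (50,61)
(50.0,61.0)
key:(50,61)outval+center:(50,61) (50.0,61.0)
--KReducer::reduce--end--
--KReducer::reduce--start--(50,64)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@12c3327//as above, the third reduce() call merges the data points whose center is (50,64)
(51.5,67.333336)
key:(50,64)outval+center:(50,65) (50,64) (59,67) (50,77) (50,67) (50,64) (51.5,67.333336)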
--KReducer::reduce--end--
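Each reduce() call above turns its group into a new center by averaging the points. A minimal sketch of that step follows.

It uses float, and that choice is an assumption. Float reproduces the log exactly: 404/6 printed as a float is 67.333336, while a double would print more digits. `ClusterMean` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the reduce-side rule: the new center is the mean of the points grouped under a key.
// float is an assumption chosen because it reproduces the printed values in the log.
class ClusterMean {
    static float[] mean(List<int[]> points) {
        float sx = 0, sy = 0;
        for (int[] p : points) {
            sx += p[0];
            sy += p[1];
        }
        return new float[] {sx / points.size(), sy / points.size()};
    }

    static String format(float[] c) {
        return "(" + c[0] + "," + c[1] + ")"; // float concatenation uses Float.toString
    }

    public static void main(String[] args) {
        // The six points grouped under key (50,64) in the first iteration.
        List<int[]> group = Arrays.asList(new int[] {50, 65}, new int[] {50, 64},
                new int[] {59, 67}, new int[] {50, 77}, new int[] {50, 67}, new int[] {50, 64});
        System.out.println(format(mean(group))); // (51.5,67.333336)
    }
}
```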
13/05/28 11:31:36 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/05/28 11:31:36 INFO mapred.LocalJobRunner:
13/05/28 11:31:36 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/05/28 11:31:36 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.56.171:9000/ouput
13/05/28 11:31:37 INFO mapred.JobClient: map 100% reduce 0%
13/05/28 11:31:39 INFO mapred.LocalJobRunner: reduce > reduce
13/05/28 11:31:39 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/05/28 11:31:40 INFO mapred.JobClient: map 100% reduce 100%
13/05/28 11:31:40 INFO mapred.JobClient: Job complete: job_local_0001
13/05/28 11:31:40 INFO mapred.JobClient: Counters: 22
13/05/28 11:31:40 INFO mapred.JobClient: File Output Format Counters
13/05/28 11:31:40 INFO mapred.JobClient: Bytes Written=187
13/05/28 11:31:40 INFO mapred.JobClient: FileSystemCounters
13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_READ=598
13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_READ=532
13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=81540
13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=235
13/05/28 11:31:40 INFO mapred.JobClient: File Input Format Counters
13/05/28 11:31:40 INFO mapred.JobClient: Bytes Read=121
13/05/28 11:31:40 INFO mapred.JobClient: Map-Reduce Framework
13/05/28 11:31:40 INFO mapred.JobClient: Map output materialized bytes=276
13/05/28 11:31:40 INFO mapred.JobClient: Map input records=1
13/05/28 11:31:40 INFO mapred.JobClient: Reduce shuffle bytes=0
13/05/28 11:31:40 INFO mapred.JobClient: Spilled Records=30
13/05/28 11:31:40 INFO mapred.JobClient: Map output bytes=240
13/05/28 11:31:40 INFO mapred.JobClient: Total committed heap usage (bytes)=258342912
13/05/28 11:31:40 INFO mapred.JobClient: CPU time spent (ms)=0
13/05/28 11:31:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=107
13/05/28 11:31:40 INFO mapred.JobClient: Combine input records=0
13/05/28 11:31:40 INFO mapred.JobClient: Reduce input records=15//15 records went into reduce
13/05/28 11:31:40 INFO mapred.JobClient: Reduce input groups=3//3 key groups, i.e. three reduce() calls
13/05/28 11:31:40 INFO mapred.JobClient: Combine output records=0
13/05/28 11:31:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/05/28 11:31:40 INFO mapred.JobClient: Reduce output records=3
13/05/28 11:31:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/05/28 11:31:40 INFO mapred.JobClient: Map output records=15
13/05/28 11:31:40 INFO mapred.JobClient: Running job: job_local_0001
13/05/28 11:31:40 INFO mapred.JobClient: Job complete: job_local_0001
13/05/28 11:31:40 INFO mapred.JobClient: Counters: 22
13/05/28 11:31:40 INFO mapred.JobClient: File Output Format Counters
13/05/28 11:31:40 INFO mapred.JobClient: Bytes Written=187
13/05/28 11:31:40 INFO mapred.JobClient: FileSystemCounters
13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_READ=598
13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_READ=532
13/05/28 11:31:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=81540
13/05/28 11:31:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=235
13/05/28 11:31:40 INFO mapred.JobClient: File Input Format Counters
13/05/28 11:31:40 INFO mapred.JobClient: Bytes Read=121
13/05/28 11:31:40 INFO mapred.JobClient: Map-Reduce Framework
13/05/28 11:31:40 INFO mapred.JobClient: Map output materialized bytes=276
13/05/28 11:31:40 INFO mapred.JobClient: Map input records=1
13/05/28 11:31:40 INFO mapred.JobClient: Reduce shuffle bytes=0
13/05/28 11:31:40 INFO mapred.JobClient: Spilled Records=30
13/05/28 11:31:40 INFO mapred.JobClient: Map output bytes=240
13/05/28 11:31:40 INFO mapred.JobClient: Total committed heap usage (bytes)=258342912
13/05/28 11:31:40 INFO mapred.JobClient: CPU time spent (ms)=0
13/05/28 11:31:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=107
13/05/28 11:31:40 INFO mapred.JobClient: Combine input records=0
13/05/28 11:31:40 INFO mapred.JobClient: Reduce input records=15
13/05/28 11:31:40 INFO mapred.JobClient: Reduce input groups=3
13/05/28 11:31:40 INFO mapred.JobClient: Combine output records=0
13/05/28 11:31:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/05/28 11:31:40 INFO mapred.JobClient: Reduce output records=3
13/05/28 11:31:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/05/28 11:31:40 INFO mapred.JobClient: Map output records=15
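The counters "Reduce input records=15" and "Reduce input groups=3" come from the shuffle between map and reduce. Map output pairs are grouped by key, keys are visited in sorted order, and reduce() runs once per group inside the single reduce task.

The sketch below simulates that grouping with a `TreeMap`. It assumes the keys are strings such as "(20,30)". The reduce order in the log, (20,30), then (50,61), then (50,64), is consistent with byte-wise ordering of such keys.

Value order is a different matter. Hadoop does not sort values within a group: the log's (50,64) group lists its values in a different order from the map output. So the value order in this sketch should not be relied on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the shuffle: group <key, value> pairs by key, with keys kept in sorted order.
// Assumption: keys are strings like "(20,30)"; this is a simulation, not Hadoop's shuffle code.
class ShuffleSketch {
    // The 15 <key, value> pairs emitted by the first map pass, in map output order.
    static final String[][] MAP_OUTPUT = {
        {"(20,30)", "(20,30)"}, {"(50,61)", "(50,61)"}, {"(20,30)", "(20,32)"},
        {"(50,64)", "(50,64)"}, {"(50,64)", "(59,67)"}, {"(20,30)", "(24,34)"},
        {"(20,30)", "(19,39)"}, {"(20,30)", "(20,32)"}, {"(50,64)", "(50,65)"},
        {"(50,64)", "(50,77)"}, {"(20,30)", "(20,30)"}, {"(20,30)", "(20,31)"},
        {"(20,30)", "(20,32)"}, {"(50,64)", "(50,64)"}, {"(50,64)", "(50,67)"}
    };

    static TreeMap<String, List<String>> group(String[][] pairs) {
        TreeMap<String, List<String>> groups = new TreeMap<>(); // iterates keys in sorted order
        for (String[] kv : pairs) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        TreeMap<String, List<String>> groups = group(MAP_OUTPUT);
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            System.out.println("reduce(" + e.getKey() + ", " + e.getValue() + ")");
        }
        // 15 values in total split into 3 keys, matching the two counters above.
        System.out.println(MAP_OUTPUT.length + " records, " + groups.size() + " groups");
    }
}
```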
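--NewCenter::run--start--//the function that computes the new centers starts. It first reads the reduce output file /part-r-00000, i.e. the reduce output <key, value> explained above:
(20,30) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
(50,61) (50,61) (50.0,61.0)
(50,64) (50,65) (50,64) (59,67) (50,77) (50,67) (50,64) (51.5,67.333336)
//compute the new cluster centers (the last point on each line) and overwrite the initial center file center
(20.375,32.5) (50.0,61.0) (51.5,67.333336)
--NewCenter::run--end--//done computing the new centers. Control returns to main, which checks the change in the centers against a threshold. If the requirement is not met, the while loop runs another map-->reduce iteration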
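The NewCenter step and the threshold check can be sketched together. The following are assumptions, not taken from the post's code:

- On each reduce output line, the first point is the old center (the key) and the last point is the new center.
- "How far the centers moved" is the summed Euclidean shift.
- `THRESHOLD` is a hypothetical value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of NewCenter plus the stop test. THRESHOLD, the shift measure and the line layout
// (first point = old center, last point = new center) are assumptions.
class NewCenterSketch {
    static final double THRESHOLD = 1e-6; // hypothetical
    private static final Pattern POINT = Pattern.compile("\\((-?[0-9.]+),(-?[0-9.]+)\\)");

    static final class Update {
        final String centerLine;  // what would overwrite the center file
        final double totalShift;  // summed distance the centers moved
        Update(String centerLine, double totalShift) {
            this.centerLine = centerLine;
            this.totalShift = totalShift;
        }
        boolean converged() { return totalShift <= THRESHOLD; }
    }

    static List<float[]> points(String line) {
        List<float[]> pts = new ArrayList<>();
        Matcher m = POINT.matcher(line);
        while (m.find()) {
            pts.add(new float[] {Float.parseFloat(m.group(1)), Float.parseFloat(m.group(2))});
        }
        return pts;
    }

    static Update update(List<String> reduceOutputLines) {
        StringBuilder sb = new StringBuilder();
        double total = 0;
        for (String line : reduceOutputLines) {
            List<float[]> pts = points(line);
            float[] oldC = pts.get(0);              // the reduce key
            float[] newC = pts.get(pts.size() - 1); // the mean appended by the reducer
            total += Math.hypot(newC[0] - oldC[0], newC[1] - oldC[1]);
            if (sb.length() > 0) sb.append(' ');
            sb.append('(').append(newC[0]).append(',').append(newC[1]).append(')');
        }
        return new Update(sb.toString(), total);
    }

    public static void main(String[] args) {
        // The three lines NewCenter read after the first iteration.
        Update u = update(Arrays.asList(
                "(20,30) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)",
                "(50,61) (50,61) (50.0,61.0)",
                "(50,64) (50,65) (50,64) (59,67) (50,77) (50,67) (50,64) (51.5,67.333336)"));
        System.out.println(u.centerLine); // (20.375,32.5) (50.0,61.0) (51.5,67.333336)
        System.out.println(u.converged() ? "stop" : "run another map-->reduce pass");
    }
}
```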
13/05/28 11:31:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/05/28 11:31:40 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/05/28 11:31:40 INFO input.FileInputFormat: Total input paths to process : 1
13/05/28 11:31:40 INFO mapred.JobClient: Running job: job_local_0002
13/05/28 11:31:40 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1884a40
13/05/28 11:31:40 INFO mapred.MapTask: io.sort.mb = 100
13/05/28 11:31:40 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/28 11:31:40 INFO mapred.MapTask: record buffer = 262144/327680
//the next map-->reduce iteration begins
--Mapper::setup--start--
--Mapper::setup--end--
--Mapper::map--start--
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(50.0,61.0)outvalue:(50,61)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,61.0)outvalue:(50,64)
center[pos]:(51.5,67.333336)outvalue:(59,67)
center[pos]:(20.375,32.5)outvalue:(24,34)
center[pos]:(20.375,32.5)outvalue:(19,39)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(51.5,67.333336)outvalue:(50,65)
center[pos]:(51.5,67.333336)outvalue:(50,77)
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(20.375,32.5)outvalue:(20,31)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,61.0)outvalue:(50,64)
center[pos]:(51.5,67.333336)outvalue:(50,67)
--Mapper::map--end--
13/05/28 11:31:40 INFO mapred.MapTask: Starting flush of map output
13/05/28 11:31:40 INFO mapred.MapTask: Finished spill 0
13/05/28 11:31:40 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
.......
--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@1712651
(20.375,32.5)
key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
--KReducer::reduce--end--
--KReducer::reduce--start--(50.0,61.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@1712651
(50.0,63.0)
key:(50.0,61.0)outval+center:(50,61) (50,64) (50,64) (50.0,63.0)
--KReducer::reduce--end--
--KReducer::reduce--start--(51.5,67.333336)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@1712651
(52.25,69.0)
key:(51.5,67.333336)outval+center:(50,65) (59,67) (50,77) (50,67) (52.25,69.0)
--KReducer::reduce--end--
13/05/28 11:31:43 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
13/05/28 11:31:43 INFO mapred.LocalJobRunner:
13/05/28 11:31:43 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
13/05/28 11:31:43 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.56.171:9000/ouput
.......
--NewCenter::run--start--
(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
(50.0,61.0) (50,61) (50,64) (50,64) (50.0,63.0)
(51.5,67.333336) (50,65) (59,67) (50,77) (50,67) (52.25,69.0)
(20.375,32.5) (50.0,63.0) (52.25,69.0)
--NewCenter::run--end--
13/05/28 11:31:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/05/28 11:31:47 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/05/28 11:31:47 INFO input.FileInputFormat: Total input paths to process : 1
13/05/28 11:31:47 INFO mapred.JobClient: Running job: job_local_0003
13/05/28 11:31:47 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11ef443
13/05/28 11:31:47 INFO mapred.MapTask: io.sort.mb = 100
13/05/28 11:31:47 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/28 11:31:47 INFO mapred.MapTask: record buffer = 262144/327680
--Mapper::setup--start--
--Mapper::setup--end--
--Mapper::map--start--
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(50.0,63.0)outvalue:(50,61)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,63.0)outvalue:(50,64)
center[pos]:(52.25,69.0)outvalue:(59,67)
center[pos]:(20.375,32.5)outvalue:(24,34)
center[pos]:(20.375,32.5)outvalue:(19,39)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,63.0)outvalue:(50,65)
center[pos]:(52.25,69.0)outvalue:(50,77)
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(20.375,32.5)outvalue:(20,31)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,63.0)outvalue:(50,64)
center[pos]:(52.25,69.0)outvalue:(50,67)
--Mapper::map--end--
13/05/28 11:31:47 INFO mapred.MapTask: Starting flush of map output
13/05/28 11:31:47 INFO mapred.MapTask: Finished spill 0
13/05/28 11:31:47 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
13/05/28 11:31:48 INFO mapred.JobClient: map 0% reduce 0%
13/05/28 11:31:50 INFO mapred.LocalJobRunner:
13/05/28 11:31:50 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.
.......
--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@35bb0f
(20.375,32.5)
key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
--KReducer::reduce--end--
--KReducer::reduce--start--(50.0,63.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@35bb0f
(50.0,63.5)
key:(50.0,63.0)outval+center:(50,61) (50,65) (50,64) (50,64) (50.0,63.5)
--KReducer::reduce--end--
--KReducer::reduce--start--(52.25,69.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@35bb0f
(53.0,70.333336)
key:(52.25,69.0)outval+center:(59,67) (50,77) (50,67) (53.0,70.333336)
--KReducer::reduce--end--
13/05/28 11:31:50 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
13/05/28 11:31:50 INFO mapred.LocalJobRunner:
13/05/28 11:31:50 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now
13/05/28 11:31:50 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.56.171:9000/ouput
13/05/28 11:31:51 INFO mapred.JobClient: map 100% reduce 0%
13/05/28 11:31:53 INFO mapred.LocalJobRunner: reduce > reduce
13/05/28 11:31:53 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.
.......
13/05/28 11:31:54 INFO mapred.JobClient: Map output records=15
--NewCenter::run--start--
(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
(50.0,63.0) (50,61) (50,65) (50,64) (50,64) (50.0,63.5)
(52.25,69.0) (59,67) (50,77) (50,67) (53.0,70.333336)
(20.375,32.5) (50.0,63.5) (53.0,70.333336)
--NewCenter::run--end--
13/05/28 11:31:54 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/05/28 11:31:54 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/05/28 11:31:54 INFO input.FileInputFormat: Total input paths to process : 1
13/05/28 11:31:54 INFO mapred.JobClient: Running job: job_local_0004
13/05/28 11:31:54 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1958bf9
13/05/28 11:31:54 INFO mapred.MapTask: io.sort.mb = 100
13/05/28 11:31:54 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/28 11:31:54 INFO mapred.MapTask: record buffer = 262144/327680
--Mapper::setup--start--
--Mapper::setup--end--
--Mapper::map--start--
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(50.0,63.5)outvalue:(50,61)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,63.5)outvalue:(50,64)
center[pos]:(53.0,70.333336)outvalue:(59,67)
center[pos]:(20.375,32.5)outvalue:(24,34)
center[pos]:(20.375,32.5)outvalue:(19,39)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,63.5)outvalue:(50,65)
center[pos]:(53.0,70.333336)outvalue:(50,77)
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(20.375,32.5)outvalue:(20,31)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,63.5)outvalue:(50,64)
center[pos]:(50.0,63.5)outvalue:(50,67)
--Mapper::map--end--
13/05/28 11:31:54 INFO mapred.MapTask: Starting flush of map output
.......
13/05/28 11:31:57 INFO mapred.LocalJobRunner:
--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@6db724
(20.375,32.5)
key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
--KReducer::reduce--end--
--KReducer::reduce--start--(50.0,63.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@6db724
(50.0,64.2)
key:(50.0,63.5)outval+center:(50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)
--KReducer::reduce--end--
--KReducer::reduce--start--(53.0,70.333336)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@6db724
(54.5,72.0)
key:(53.0,70.333336)outval+center:(59,67) (50,77) (54.5,72.0)
--KReducer::reduce--end--
13/05/28 11:31:57 INFO mapred.Task: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
.......
13/05/28 11:32:01 INFO mapred.JobClient: Map output records=15
--NewCenter::run--start--
(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
(50.0,63.5) (50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)
(53.0,70.333336) (59,67) (50,77) (54.5,72.0)
(20.375,32.5) (50.0,64.2) (54.5,72.0)
--NewCenter::run--end--
13/05/28 11:32:01 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
.......
13/05/28 11:32:01 INFO mapred.MapTask: record buffer = 262144/327680
--Mapper::setup--start--
--Mapper::setup--end--
--Mapper::map--start--
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(50.0,64.2)outvalue:(50,61)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,64.2)outvalue:(50,64)
center[pos]:(54.5,72.0)outvalue:(59,67)
center[pos]:(20.375,32.5)outvalue:(24,34)
center[pos]:(20.375,32.5)outvalue:(19,39)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,64.2)outvalue:(50,65)
center[pos]:(54.5,72.0)outvalue:(50,77)
center[pos]:(20.375,32.5)outvalue:(20,30)
center[pos]:(20.375,32.5)outvalue:(20,31)
center[pos]:(20.375,32.5)outvalue:(20,32)
center[pos]:(50.0,64.2)outvalue:(50,64)
center[pos]:(50.0,64.2)outvalue:(50,67)
--Mapper::map--end--
13/05/28 11:32:01 INFO mapred.MapTask: Starting flush of map output
13/05/28 11:32:01 INFO mapred.MapTask: Finished spill 0
13/05/28 11:32:01 INFO mapred.Task: Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
13/05/28 11:32:02 INFO mapred.JobClient: map 0% reduce 0%
13/05/28 11:32:04 INFO mapred.LocalJobRunner:
13/05/28 11:32:04 INFO mapred.Task: Task 'attempt_local_0005_m_000000_0' done.
13/05/28 11:32:04 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c06258
13/05/28 11:32:04 INFO mapred.LocalJobRunner:
13/05/28 11:32:04 INFO mapred.Merger: Merging 1 sorted segments
13/05/28 11:32:04 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 348 bytes
13/05/28 11:32:04 INFO mapred.LocalJobRunner:
--KReducer::reduce--start--(20.375,32.5)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@cffc79
(20.375,32.5)
key:(20.375,32.5)outval+center:(20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
--KReducer::reduce--end--
--KReducer::reduce--start--(50.0,64.2)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@cffc79
(50.0,64.2)
key:(50.0,64.2)outval+center:(50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)
--KReducer::reduce--end--
--KReducer::reduce--start--(54.5,72.0)org.apache.hadoop.mapreduce.ReduceContext$ValueIterable@cffc79
(54.5,72.0)
key:(54.5,72.0)outval+center:(59,67) (50,77) (54.5,72.0)
--KReducer::reduce--end--
13/05/28 11:32:04 INFO mapred.Task: Task:attempt_local_0005_r_000000_0 is done. And is in the process of commiting
.......
13/05/28 11:32:08 INFO mapred.JobClient: Map output records=15
--NewCenter::run--start--
(20.375,32.5) (20,30) (20,32) (24,34) (19,39) (20,32) (20,30) (20,31) (20,32) (20.375,32.5)
(50.0,64.2) (50,61) (50,65) (50,64) (50,67) (50,64) (50.0,64.2)
(54.5,72.0) (59,67) (50,77) (54.5,72.0)
(20.375,32.5) (50.0,64.2) (54.5,72.0)
--NewCenter::run--end--
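Iterator: 5//finally, the number of iterations is printed. The fifth pass left all three centers unchanged, so the loop stopped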
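The whole loop can be reproduced in a single process, without Hadoop, as a check on the numbers above. The sketch repeats three steps: assign each point to its nearest center (map), average each group (reduce), and stop once no center moves (the NewCenter check).

Assumptions:

- squared Euclidean distance;
- float arithmetic;
- an empty cluster keeps its old center;
- "no change" means bit-for-bit equal centers;
- `maxRounds` is a hypothetical safety cap.

`LocalKMeans` is not the post's driver code.

```java
import java.util.Arrays;

// Single-process sketch of the KMeans loop described in this post (not the Hadoop driver).
class LocalKMeans {
    static final int[][] DATA = {{20, 30}, {50, 61}, {20, 32}, {50, 64}, {59, 67}, {24, 34},
        {19, 39}, {20, 32}, {50, 65}, {50, 77}, {20, 30}, {20, 31}, {20, 32}, {50, 64}, {50, 67}};

    static final class Result {
        final float[][] centers;
        final int rounds;
        Result(float[][] centers, int rounds) { this.centers = centers; this.rounds = rounds; }
    }

    // One map + reduce pass: nearest-center assignment, then per-cluster mean.
    static float[][] step(float[][] centers) {
        int k = centers.length;
        float[] sx = new float[k];
        float[] sy = new float[k];
        int[] n = new int[k];
        for (int[] p : DATA) {
            int best = 0;
            float bestD = Float.MAX_VALUE;
            for (int i = 0; i < k; i++) {
                float dx = p[0] - centers[i][0];
                float dy = p[1] - centers[i][1];
                float d = dx * dx + dy * dy;
                if (d < bestD) {
                    bestD = d;
                    best = i;
                }
            }
            sx[best] += p[0];
            sy[best] += p[1];
            n[best]++;
        }
        float[][] next = new float[k][];
        for (int i = 0; i < k; i++) {
            next[i] = n[i] == 0 ? centers[i].clone() : new float[] {sx[i] / n[i], sy[i] / n[i]};
        }
        return next;
    }

    static Result run(float[][] initial, int maxRounds) {
        float[][] centers = initial;
        int rounds = 0;
        while (rounds < maxRounds) {
            float[][] next = step(centers);
            rounds++;
            boolean unchanged = Arrays.deepEquals(next, centers);
            centers = next;
            if (unchanged) {
                break;
            }
        }
        return new Result(centers, rounds);
    }

    public static void main(String[] args) {
        float[][] initial = {{50, 61}, {50, 64}, {20, 30}}; // the initial centers from this run
        Result r = run(initial, 100);
        StringBuilder sb = new StringBuilder();
        for (float[] c : r.centers) {
            sb.append('(').append(c[0]).append(',').append(c[1]).append(") ");
        }
        System.out.println(sb.toString().trim());   // (50.0,64.2) (54.5,72.0) (20.375,32.5)
        System.out.println("Iterator: " + r.rounds); // Iterator: 5
    }
}
```

Starting from the initial centers of this run, the sketch needs 5 passes and ends at the same three centers as the log. It lists them in index order rather than in the sorted key order of the center file.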
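That is the annotated output of the whole KMeans source. The lines marked --***-- come from println calls I added, which makes the surrounding output easier to analyze. Personally, though, I find that stepping through the program in a debugger shows its control flow and the value of every variable more quickly. To sum up, there are four things to keep in mind about how this source works:
1. The framework calls setup automatically before calling map; the reason is explained above.
2. Know how the work is divided. I set k to 3, and this run used 1 map task and 1 reduce task, with reduce() called 3 times inside that task, once per cluster center (hence "Reduce input groups=3").
3. Pay attention to what the <key, value> pairs are at map's input and output and at reduce's input and output. Here map turns <byte offset, line of points> into <nearest center, point>, and reduce turns <center, its points> into <center, the points followed by the new center>. Knowing this makes the program's structure clear, and it also makes it easier to modify the source for your own needs.
4. Beginners like me also need to understand how files are read from and written to HDFS; see my earlier note "Reading and writing data with Hadoop's FileSystem API".
I have only recently started with Hadoop, so if any of this analysis is wrong, I welcome discussion so we can all improve together!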