According to the Mahout wiki (https://cwiki.apache.org/confluence/display/MAHOUT/Twenty+Newsgroups), running this algorithm takes just one command:
mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh
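But my very first run failed. Since I am not the root account, I changed the path first: open classify-20newsgroups.sh and replace /tmp/mahout-work- with /home/mahout/mahout-work-, so that the user mahout has permission to work in that directory. It still failed, this time complaining that the curl command could not be found. Fair enough, I had not installed it: sudo apt-get install curl. OK, Ubuntu really is convenient.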
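The path change described above can also be scripted instead of edited by hand. The following is only a sketch: the function name relocate_work_dir and the stand-in file are made up for illustration, and the WORK_DIR line written into the stand-in is an assumption about what the real script contains, so check your own copy of classify-20newsgroups.sh before running anything like this against it.

```shell
#!/bin/sh
# Sketch: rewrite the /tmp work-directory prefix to a directory the current user owns.
# relocate_work_dir is a hypothetical helper, not part of Mahout.
relocate_work_dir() {
    script="$1"     # path to the script to patch
    new_base="$2"   # new parent directory, e.g. /home/mahout
    # Keep a backup, then replace every occurrence of the old prefix.
    cp "$script" "$script.bak"
    sed "s#/tmp/mahout-work-#${new_base}/mahout-work-#g" "$script.bak" > "$script"
}

# The script downloads the data set with curl, so check for it up front.
if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found; on Ubuntu: sudo apt-get install curl"
fi

# Demo on a stand-in file (assumed content) so the real script is not touched.
demo=$(mktemp)
printf 'WORK_DIR=/tmp/mahout-work-${USER}\n' > "$demo"
relocate_work_dir "$demo" /home/mahout
result=$(cat "$demo")
echo "$result"
rm -f "$demo" "$demo.bak"
```

To patch the real file, the call would be relocate_work_dir ./classify-20newsgroups.sh /home/mahout, run from examples/bin.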
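Then I ran it again, and it still failed about two thirds of the way through. When I looked at the details, the number of map input records was 0. What does that mean? Well, it looks like local file operations and HDFS file operations got mixed up. In fact, before executing:
+ ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
the local 20news-all directory should first be uploaded to HDFS; after that, rerunning the first command works. The full output is below (it is very long, so I am not sure all of it will fit):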
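Before the log, here is a rough sketch of that upload step. It assumes a Hadoop 1.x client on the PATH and reuses the local path as the HDFS path, since seqdirectory is given the same -i argument either way. The run wrapper and the DRY_RUN variable are illustrative helpers, not part of Mahout or Hadoop; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: copy the locally prepared 20news-all directory into HDFS
# so that seqdirectory finds input when Mahout runs on Hadoop.
WORK_DIR=/home/mahout/mahout-work-mahout   # work directory used in this post
DRY_RUN=${DRY_RUN:-1}                      # hypothetical switch: 1 = print only, 0 = execute

# Print the command in the same "+ cmd" style as the script trace,
# and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# Upload the local directory to the same path in HDFS.
run hadoop fs -put "$WORK_DIR/20news-all" "$WORK_DIR/20news-all"
# List the uploaded directory to confirm the files arrived.
run hadoop fs -ls "$WORK_DIR/20news-all"
```

Running it with DRY_RUN=0 performs the upload. If the target already exists in HDFS from an earlier attempt, the put will refuse to overwrite it, so the old copy has to be removed first.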
mahout@ubuntu:~/mahout-d-0.7/examples/bin$ ./classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /home/mahout/mahout-work-mahout
Enter your choice : 2
ok. You chose 2 and we'll use naivebayes
creating work directory at /home/mahout/mahout-work-mahout
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /home/mahout/mahout-work-mahout/20news-all
+ mkdir /home/mahout/mahout-work-mahout/20news-all
+ cp -R /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-test/talk.religion.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/alt.atheism /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.graphics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/comp.windows.x /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/misc.forsale /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.autos /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.motorcycles /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.baseball /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/rec.sport.hockey /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.crypt /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.electronics /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.med /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/sci.space /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/soc.religion.christian /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.guns /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.mideast /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.politics.misc /home/mahout/mahout-work-mahout/20news-bydate/20news-bydate-train/talk.religion.misc /home/mahout/mahout-work-mahout/20news-all
+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /home/mahout/mahout-work-mahout/20news-all -o /home/mahout/mahout-work-mahout/20news-seq
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:38:49 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/home/mahout/mahout-work-mahout/20news-all], --keyPrefix=[], --output=[/home/mahout/mahout-work-mahout/20news-seq], --startPhase=[0], --tempDir=[temp]}
13/08/26 23:42:57 INFO driver.MahoutDriver: Program took 248530 ms (Minutes: 4.142166666666666)
+ echo 'Converting sequence files to vectors'
Converting sequence files to vectors
+ ./bin/mahout seq2sparse -i /home/mahout/mahout-work-mahout/20news-seq -o /home/mahout/mahout-work-mahout/20news-vectors -lnorm -nv -wt tfidf
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1 13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0 13/08/26 23:43:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1 13/08/26 23:43:17 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:43:17 INFO mapred.JobClient: Running job: job_201308212334_0056 13/08/26 23:43:18 INFO mapred.JobClient: map 0% reduce 0% 13/08/26 23:43:45 INFO mapred.JobClient: map 78% reduce 0% 13/08/26 23:43:51 INFO mapred.JobClient: map 100% reduce 0% 13/08/26 23:43:56 INFO mapred.JobClient: Job complete: job_201308212334_0056 13/08/26 23:43:56 INFO mapred.JobClient: Counters: 19 13/08/26 23:43:56 INFO mapred.JobClient: Job Counters 13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32883 13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:43:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:43:56 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:43:56 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:43:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 13/08/26 23:43:56 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:43:56 INFO mapred.JobClient: Bytes Written=27503580 13/08/26 23:43:56 INFO mapred.JobClient: FileSystemCounters 13/08/26 23:43:56 INFO mapred.JobClient: HDFS_BYTES_READ=36694022 13/08/26 23:43:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=21899 13/08/26 23:43:56 INFO mapred.JobClient: 
HDFS_BYTES_WRITTEN=27503580 13/08/26 23:43:56 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:43:56 INFO mapred.JobClient: Bytes Read=36693889 13/08/26 23:43:56 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:43:56 INFO mapred.JobClient: Map input records=18846 13/08/26 23:43:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=75157504 13/08/26 23:43:56 INFO mapred.JobClient: Spilled Records=0 13/08/26 23:43:56 INFO mapred.JobClient: CPU time spent (ms)=5730 13/08/26 23:43:56 INFO mapred.JobClient: Total committed heap usage (bytes)=15859712 13/08/26 23:43:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=974381056 13/08/26 23:43:56 INFO mapred.JobClient: Map output records=18846 13/08/26 23:43:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=133 13/08/26 23:43:56 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:43:56 INFO mapred.JobClient: Running job: job_201308212334_0057 13/08/26 23:43:57 INFO mapred.JobClient: map 0% reduce 0% 13/08/26 23:44:15 INFO mapred.JobClient: map 3% reduce 0% 13/08/26 23:44:18 INFO mapred.JobClient: map 23% reduce 0% 13/08/26 23:44:21 INFO mapred.JobClient: map 60% reduce 0% 13/08/26 23:44:24 INFO mapred.JobClient: map 100% reduce 0% 13/08/26 23:44:48 INFO mapred.JobClient: map 100% reduce 100% 13/08/26 23:44:53 INFO mapred.JobClient: Job complete: job_201308212334_0057 13/08/26 23:44:53 INFO mapred.JobClient: Counters: 29 13/08/26 23:44:53 INFO mapred.JobClient: Job Counters 13/08/26 23:44:53 INFO mapred.JobClient: Launched reduce tasks=1 13/08/26 23:44:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=31312 13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:44:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:44:53 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:44:53 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:44:53 INFO mapred.JobClient: 
SLOTS_MILLIS_REDUCES=18422 13/08/26 23:44:53 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:44:53 INFO mapred.JobClient: Bytes Written=2315037 13/08/26 23:44:53 INFO mapred.JobClient: FileSystemCounters 13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_READ=11857906 13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_READ=27503742 13/08/26 23:44:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15440401 13/08/26 23:44:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037 13/08/26 23:44:53 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:44:53 INFO mapred.JobClient: Bytes Read=27503580 13/08/26 23:44:53 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:44:53 INFO mapred.JobClient: Map output materialized bytes=3538084 13/08/26 23:44:53 INFO mapred.JobClient: Map input records=18846 13/08/26 23:44:53 INFO mapred.JobClient: Reduce shuffle bytes=0 13/08/26 23:44:53 INFO mapred.JobClient: Spilled Records=849345 13/08/26 23:44:53 INFO mapred.JobClient: Map output bytes=39462740 13/08/26 23:44:53 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792 13/08/26 23:44:53 INFO mapred.JobClient: CPU time spent (ms)=14080 13/08/26 23:44:53 INFO mapred.JobClient: Combine input records=3026242 13/08/26 23:44:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=162 13/08/26 23:44:53 INFO mapred.JobClient: Reduce input records=192904 13/08/26 23:44:53 INFO mapred.JobClient: Reduce input groups=192904 13/08/26 23:44:53 INFO mapred.JobClient: Combine output records=554873 13/08/26 23:44:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=283111424 13/08/26 23:44:53 INFO mapred.JobClient: Reduce output records=93563 13/08/26 23:44:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896 13/08/26 23:44:53 INFO mapred.JobClient: Map output records=2664273 13/08/26 23:44:54 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:44:55 INFO mapred.JobClient: Running job: job_201308212334_0058 13/08/26 23:44:56 INFO 
mapred.JobClient: map 0% reduce 0% 13/08/26 23:45:13 INFO mapred.JobClient: map 94% reduce 0% 13/08/26 23:45:16 INFO mapred.JobClient: map 100% reduce 0% 13/08/26 23:45:43 INFO mapred.JobClient: map 100% reduce 100% 13/08/26 23:45:48 INFO mapred.JobClient: Job complete: job_201308212334_0058 13/08/26 23:45:48 INFO mapred.JobClient: Counters: 29 13/08/26 23:45:48 INFO mapred.JobClient: Job Counters 13/08/26 23:45:48 INFO mapred.JobClient: Launched reduce tasks=1 13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21298 13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:45:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:45:48 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:45:48 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:45:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24763 13/08/26 23:45:48 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:45:48 INFO mapred.JobClient: Bytes Written=29314118 13/08/26 23:45:48 INFO mapred.JobClient: FileSystemCounters 13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_READ=27274291 13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_READ=29440826 13/08/26 23:45:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54595105 13/08/26 23:45:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118 13/08/26 23:45:48 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:45:48 INFO mapred.JobClient: Bytes Read=27503580 13/08/26 23:45:48 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:45:48 INFO mapred.JobClient: Map output materialized bytes=27274291 13/08/26 23:45:48 INFO mapred.JobClient: Map input records=18846 13/08/26 23:45:48 INFO mapred.JobClient: Reduce shuffle bytes=0 13/08/26 23:45:48 INFO mapred.JobClient: Spilled Records=37692 13/08/26 23:45:48 INFO mapred.JobClient: Map output bytes=27199343 13/08/26 23:45:48 INFO mapred.JobClient: Total committed heap 
usage (bytes)=215695360 13/08/26 23:45:48 INFO mapred.JobClient: CPU time spent (ms)=12980 13/08/26 23:45:48 INFO mapred.JobClient: Combine input records=0 13/08/26 23:45:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=162 13/08/26 23:45:48 INFO mapred.JobClient: Reduce input records=18846 13/08/26 23:45:48 INFO mapred.JobClient: Reduce input groups=18846 13/08/26 23:45:48 INFO mapred.JobClient: Combine output records=0 13/08/26 23:45:48 INFO mapred.JobClient: Physical memory (bytes) snapshot=332349440 13/08/26 23:45:48 INFO mapred.JobClient: Reduce output records=18846 13/08/26 23:45:48 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896 13/08/26 23:45:48 INFO mapred.JobClient: Map output records=18846 13/08/26 23:45:49 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:45:49 INFO mapred.JobClient: Running job: job_201308212334_0059 13/08/26 23:45:50 INFO mapred.JobClient: map 0% reduce 0% 13/08/26 23:46:10 INFO mapred.JobClient: map 100% reduce 0% 13/08/26 23:46:25 INFO mapred.JobClient: map 100% reduce 92% 13/08/26 23:46:31 INFO mapred.JobClient: map 100% reduce 100% 13/08/26 23:46:36 INFO mapred.JobClient: Job complete: job_201308212334_0059 13/08/26 23:46:36 INFO mapred.JobClient: Counters: 29 13/08/26 23:46:36 INFO mapred.JobClient: Job Counters 13/08/26 23:46:36 INFO mapred.JobClient: Launched reduce tasks=1 13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18217 13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:46:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:46:36 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:46:36 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:46:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20981 13/08/26 23:46:36 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:46:36 INFO mapred.JobClient: Bytes Written=29314118 13/08/26 23:46:36 INFO 
mapred.JobClient: FileSystemCounters 13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_READ=29059398 13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_READ=29314278 13/08/26 23:46:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58163419 13/08/26 23:46:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118 13/08/26 23:46:36 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:46:36 INFO mapred.JobClient: Bytes Read=29314118 13/08/26 23:46:36 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:46:36 INFO mapred.JobClient: Map output materialized bytes=29059398 13/08/26 23:46:36 INFO mapred.JobClient: Map input records=18846 13/08/26 23:46:36 INFO mapred.JobClient: Reduce shuffle bytes=0 13/08/26 23:46:36 INFO mapred.JobClient: Spilled Records=37692 13/08/26 23:46:36 INFO mapred.JobClient: Map output bytes=28984080 13/08/26 23:46:36 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984 13/08/26 23:46:36 INFO mapred.JobClient: CPU time spent (ms)=8650 13/08/26 23:46:36 INFO mapred.JobClient: Combine input records=0 13/08/26 23:46:37 INFO mapred.JobClient: SPLIT_RAW_BYTES=160 13/08/26 23:46:37 INFO mapred.JobClient: Reduce input records=18846 13/08/26 23:46:37 INFO mapred.JobClient: Reduce input groups=18846 13/08/26 23:46:37 INFO mapred.JobClient: Combine output records=0 13/08/26 23:46:37 INFO mapred.JobClient: Physical memory (bytes) snapshot=313606144 13/08/26 23:46:37 INFO mapred.JobClient: Reduce output records=18846 13/08/26 23:46:37 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896 13/08/26 23:46:37 INFO mapred.JobClient: Map output records=18846 13/08/26 23:46:37 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0 13/08/26 23:46:37 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:46:37 INFO mapred.JobClient: Running job: job_201308212334_0060 13/08/26 23:46:38 INFO mapred.JobClient: map 0% reduce 0% 13/08/26 23:46:56 INFO mapred.JobClient: map 100% 
reduce 0% 13/08/26 23:47:14 INFO mapred.JobClient: map 100% reduce 100% 13/08/26 23:47:19 INFO mapred.JobClient: Job complete: job_201308212334_0060 13/08/26 23:47:19 INFO mapred.JobClient: Counters: 29 13/08/26 23:47:19 INFO mapred.JobClient: Job Counters 13/08/26 23:47:19 INFO mapred.JobClient: Launched reduce tasks=1 13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21504 13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:47:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:47:19 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:47:19 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:47:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14273 13/08/26 23:47:19 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:47:19 INFO mapred.JobClient: Bytes Written=1890073 13/08/26 23:47:19 INFO mapred.JobClient: FileSystemCounters 13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_READ=4880788 13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_READ=29314271 13/08/26 23:47:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6235019 13/08/26 23:47:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073 13/08/26 23:47:19 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:47:19 INFO mapred.JobClient: Bytes Read=29314118 13/08/26 23:47:19 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:47:19 INFO mapred.JobClient: Map output materialized bytes=1309902 13/08/26 23:47:19 INFO mapred.JobClient: Map input records=18846 13/08/26 23:47:19 INFO mapred.JobClient: Reduce shuffle bytes=0 13/08/26 23:47:19 INFO mapred.JobClient: Spilled Records=442187 13/08/26 23:47:19 INFO mapred.JobClient: Map output bytes=31005336 13/08/26 23:47:19 INFO mapred.JobClient: Total committed heap usage (bytes)=176033792 13/08/26 23:47:19 INFO mapred.JobClient: CPU time spent (ms)=9210 13/08/26 23:47:19 INFO mapred.JobClient: Combine input 
records=2838837 13/08/26 23:47:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=153 13/08/26 23:47:19 INFO mapred.JobClient: Reduce input records=93564 13/08/26 23:47:19 INFO mapred.JobClient: Reduce input groups=93564 13/08/26 23:47:19 INFO mapred.JobClient: Combine output records=348623 13/08/26 23:47:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=284684288 13/08/26 23:47:19 INFO mapred.JobClient: Reduce output records=93564 13/08/26 23:47:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896 13/08/26 23:47:19 INFO mapred.JobClient: Map output records=2583778 13/08/26 23:47:19 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:47:19 INFO mapred.JobClient: Running job: job_201308212334_0061 13/08/26 23:47:20 INFO mapred.JobClient: map 0% reduce 0% 13/08/26 23:47:38 INFO mapred.JobClient: map 100% reduce 0% 13/08/26 23:47:53 INFO mapred.JobClient: map 100% reduce 67% 13/08/26 23:47:59 INFO mapred.JobClient: map 100% reduce 100% 13/08/26 23:48:04 INFO mapred.JobClient: Job complete: job_201308212334_0061 13/08/26 23:48:04 INFO mapred.JobClient: Counters: 29 13/08/26 23:48:04 INFO mapred.JobClient: Job Counters 13/08/26 23:48:04 INFO mapred.JobClient: Launched reduce tasks=1 13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18292 13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:48:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:48:04 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:48:04 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:48:04 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19293 13/08/26 23:48:04 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:48:04 INFO mapred.JobClient: Bytes Written=28689283 13/08/26 23:48:04 INFO mapred.JobClient: FileSystemCounters 13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_READ=29059398 13/08/26 23:48:04 INFO 
mapred.JobClient: HDFS_BYTES_READ=31204324 13/08/26 23:48:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58165045 13/08/26 23:48:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283 13/08/26 23:48:04 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:48:04 INFO mapred.JobClient: Bytes Read=29314118 13/08/26 23:48:04 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:48:04 INFO mapred.JobClient: Map output materialized bytes=29059398 13/08/26 23:48:04 INFO mapred.JobClient: Map input records=18846 13/08/26 23:48:04 INFO mapred.JobClient: Reduce shuffle bytes=0 13/08/26 23:48:04 INFO mapred.JobClient: Spilled Records=37692 13/08/26 23:48:04 INFO mapred.JobClient: Map output bytes=28984080 13/08/26 23:48:04 INFO mapred.JobClient: Total committed heap usage (bytes)=205225984 13/08/26 23:48:04 INFO mapred.JobClient: CPU time spent (ms)=8770 13/08/26 23:48:04 INFO mapred.JobClient: Combine input records=0 13/08/26 23:48:04 INFO mapred.JobClient: SPLIT_RAW_BYTES=153 13/08/26 23:48:04 INFO mapred.JobClient: Reduce input records=18846 13/08/26 23:48:04 INFO mapred.JobClient: Reduce input groups=18846 13/08/26 23:48:04 INFO mapred.JobClient: Combine output records=0 13/08/26 23:48:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=320401408 13/08/26 23:48:04 INFO mapred.JobClient: Reduce output records=18846 13/08/26 23:48:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896 13/08/26 23:48:04 INFO mapred.JobClient: Map output records=18846 13/08/26 23:48:05 INFO input.FileInputFormat: Total input paths to process : 1 13/08/26 23:48:05 INFO mapred.JobClient: Running job: job_201308212334_0062 13/08/26 23:48:06 INFO mapred.JobClient: map 0% reduce 0% 13/08/26 23:48:24 INFO mapred.JobClient: map 100% reduce 0% 13/08/26 23:48:36 INFO mapred.JobClient: map 100% reduce 33% 13/08/26 23:48:39 INFO mapred.JobClient: map 100% reduce 86% 13/08/26 23:48:48 INFO mapred.JobClient: map 100% reduce 100% 13/08/26 23:48:53 INFO mapred.JobClient: Job 
complete: job_201308212334_0062 13/08/26 23:48:53 INFO mapred.JobClient: Counters: 29 13/08/26 23:48:53 INFO mapred.JobClient: Job Counters 13/08/26 23:48:53 INFO mapred.JobClient: Launched reduce tasks=1 13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18225 13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 13/08/26 23:48:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 13/08/26 23:48:53 INFO mapred.JobClient: Launched map tasks=1 13/08/26 23:48:53 INFO mapred.JobClient: Data-local map tasks=1 13/08/26 23:48:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21045 13/08/26 23:48:53 INFO mapred.JobClient: File Output Format Counters 13/08/26 23:48:53 INFO mapred.JobClient: Bytes Written=28689283 13/08/26 23:48:53 INFO mapred.JobClient: FileSystemCounters 13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_READ=28437750 13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_READ=28689443 13/08/26 23:48:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56920127 13/08/26 23:48:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283 13/08/26 23:48:53 INFO mapred.JobClient: File Input Format Counters 13/08/26 23:48:53 INFO mapred.JobClient: Bytes Read=28689283 13/08/26 23:48:53 INFO mapred.JobClient: Map-Reduce Framework 13/08/26 23:48:53 INFO mapred.JobClient: Map output materialized bytes=28437750 13/08/26 23:48:53 INFO mapred.JobClient: Map input records=18846 13/08/26 23:48:53 INFO mapred.JobClient: Reduce shuffle bytes=0 13/08/26 23:48:53 INFO mapred.JobClient: Spilled Records=37692 13/08/26 23:48:53 INFO mapred.JobClient: Map output bytes=28362505 13/08/26 23:48:53 INFO mapred.JobClient: Total committed heap usage (bytes)=204603392 13/08/26 23:48:53 INFO mapred.JobClient: CPU time spent (ms)=8340 13/08/26 23:48:53 INFO mapred.JobClient: Combine input records=0 13/08/26 23:48:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=160 13/08/26 23:48:53 INFO mapred.JobClient: Reduce 
input records=18846 13/08/26 23:48:53 INFO mapred.JobClient: Reduce input groups=18846 13/08/26 23:48:53 INFO mapred.JobClient: Combine output records=0 13/08/26 23:48:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=313868288 13/08/26 23:48:53 INFO mapred.JobClient: Reduce output records=18846 13/08/26 23:48:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1957584896 13/08/26 23:48:53 INFO mapred.JobClient: Map output records=18846 13/08/26 23:48:53 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-vectors/partial-vectors-0 13/08/26 23:48:53 INFO driver.MahoutDriver: Program took 339621 ms (Minutes: 5.66035) + echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset' Creating training and holdout set with a random 80-20 split of the generated vector dataset + ./bin/mahout split -i /home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors --trainingOutput /home/mahout/mahout-work-mahout/20news-train-vectors --testOutput /home/mahout/mahout-work-mahout/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR= MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar Warning: $HADOOP_HOME is deprecated. SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
13/08/26 23:49:06 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only 13/08/26 23:49:07 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/home/mahout/mahout-work-mahout/20news-test-vectors], --trainingOutput=[/home/mahout/mahout-work-mahout/20news-train-vectors]} 13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 has 162419 lines 13/08/26 23:49:11 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40 13/08/26 23:49:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library 13/08/26 23:49:11 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor 13/08/26 23:49:11 INFO compress.CodecPool: Got brand-new compressor 13/08/26 23:49:16 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11321, test: 7525 starting at 0 13/08/26 23:49:16 INFO driver.MahoutDriver: Program took 9786 ms (Minutes: 0.1631) + echo 'Training Naive Bayes model' Training Naive Bayes model + ./bin/mahout trainnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -el -o /home/mahout/mahout-work-mahout/model -li /home/mahout/mahout-work-mahout/labelindex -ow Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR= MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar Warning: $HADOOP_HOME is deprecated. SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:49:22 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
13/08/26 23:49:22 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --output=[/home/mahout/mahout-work-mahout/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:49:23 INFO common.HadoopUtil: Deleting temp
13/08/26 23:49:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/26 23:49:23 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/08/26 23:49:23 INFO compress.CodecPool: Got brand-new decompressor
13/08/26 23:49:26 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:49:26 INFO mapred.JobClient: Running job: job_201308212334_0063
13/08/26 23:49:27 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:49:49 INFO mapred.JobClient: map 43% reduce 0%
13/08/26 23:49:52 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:50:13 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:50:18 INFO mapred.JobClient: Job complete: job_201308212334_0063
13/08/26 23:50:18 INFO mapred.JobClient: Counters: 29
13/08/26 23:50:18 INFO mapred.JobClient: Job Counters
13/08/26 23:50:18 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22816
13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:50:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:50:18 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:50:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=20680
13/08/26 23:50:18 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:50:18 INFO mapred.JobClient: Bytes Written=2718605
13/08/26 23:50:18 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_READ=1404371
13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_READ=12669237
13/08/26 23:50:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2854477
13/08/26 23:50:18 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2718605
13/08/26 23:50:18 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:50:18 INFO mapred.JobClient: Bytes Read=12668431
13/08/26 23:50:18 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:50:18 INFO mapred.JobClient: Map output materialized bytes=1404363
13/08/26 23:50:18 INFO mapred.JobClient: Map input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: Reduce shuffle bytes=1404363
13/08/26 23:50:18 INFO mapred.JobClient: Spilled Records=40
13/08/26 23:50:18 INFO mapred.JobClient: Map output bytes=16682576
13/08/26 23:50:18 INFO mapred.JobClient: Total committed heap usage (bytes)=176164864
13/08/26 23:50:18 INFO mapred.JobClient: CPU time spent (ms)=8190
13/08/26 23:50:18 INFO mapred.JobClient: Combine input records=11321
13/08/26 23:50:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
13/08/26 23:50:18 INFO mapred.JobClient: Reduce input records=20
13/08/26 23:50:18 INFO mapred.JobClient: Reduce input groups=20
13/08/26 23:50:18 INFO mapred.JobClient: Combine output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Physical memory (bytes) snapshot=294400000
13/08/26 23:50:18 INFO mapred.JobClient: Reduce output records=20
13/08/26 23:50:18 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
13/08/26 23:50:18 INFO mapred.JobClient: Map output records=11321
13/08/26 23:50:18 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:50:18 INFO mapred.JobClient: Running job: job_201308212334_0064
13/08/26 23:50:19 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:50:40 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:51:01 INFO mapred.JobClient: map 100% reduce 100%
13/08/26 23:51:06 INFO mapred.JobClient: Job complete: job_201308212334_0064
13/08/26 23:51:06 INFO mapred.JobClient: Counters: 29
13/08/26 23:51:06 INFO mapred.JobClient: Job Counters
13/08/26 23:51:06 INFO mapred.JobClient: Launched reduce tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24609
13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:51:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:51:06 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:51:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=15258
13/08/26 23:51:06 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:51:06 INFO mapred.JobClient: Bytes Written=893560
13/08/26 23:51:06 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_READ=362674
13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_READ=2718737
13/08/26 23:51:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=771195
13/08/26 23:51:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=893560
13/08/26 23:51:06 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:51:06 INFO mapred.JobClient: Bytes Read=2718605
13/08/26 23:51:06 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:51:06 INFO mapred.JobClient: Map output materialized bytes=362666
13/08/26 23:51:06 INFO mapred.JobClient: Map input records=20
13/08/26 23:51:06 INFO mapred.JobClient: Reduce shuffle bytes=362666
13/08/26 23:51:06 INFO mapred.JobClient: Spilled Records=4
13/08/26 23:51:06 INFO mapred.JobClient: Map output bytes=893434
13/08/26 23:51:06 INFO mapred.JobClient: Total committed heap usage (bytes)=223264768
13/08/26 23:51:06 INFO mapred.JobClient: CPU time spent (ms)=5370
13/08/26 23:51:06 INFO mapred.JobClient: Combine input records=2
13/08/26 23:51:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=132
13/08/26 23:51:06 INFO mapred.JobClient: Reduce input records=2
13/08/26 23:51:06 INFO mapred.JobClient: Reduce input groups=2
13/08/26 23:51:06 INFO mapred.JobClient: Combine output records=2
13/08/26 23:51:06 INFO mapred.JobClient: Physical memory (bytes) snapshot=300597248
13/08/26 23:51:06 INFO mapred.JobClient: Reduce output records=2
13/08/26 23:51:06 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1961967616
13/08/26 23:51:06 INFO mapred.JobClient: Map output records=2
13/08/26 23:51:07 INFO driver.MahoutDriver: Program took 104944 ms (Minutes: 1.7490666666666668)
+ echo 'Self testing on training set'
Self testing on training set
+ ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-train-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:51:19 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
13/08/26 23:51:19 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-train-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:51:20 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:51:21 INFO mapred.JobClient: Running job: job_201308212334_0065
13/08/26 23:51:22 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:51:45 INFO mapred.JobClient: map 51% reduce 0%
13/08/26 23:51:48 INFO mapred.JobClient: map 89% reduce 0%
13/08/26 23:51:54 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:51:58 INFO mapred.JobClient: Job complete: job_201308212334_0065
13/08/26 23:51:58 INFO mapred.JobClient: Counters: 19
13/08/26 23:51:58 INFO mapred.JobClient: Job Counters
13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34216
13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:51:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:51:58 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:51:58 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:51:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:51:58 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:51:58 INFO mapred.JobClient: Bytes Written=2132486
13/08/26 23:51:58 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_READ=16279896
13/08/26 23:51:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22523
13/08/26 23:51:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2132486
13/08/26 23:51:58 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:51:58 INFO mapred.JobClient: Bytes Read=12668431
13/08/26 23:51:58 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:51:58 INFO mapred.JobClient: Map input records=11321
13/08/26 23:51:58 INFO mapred.JobClient: Physical memory (bytes) snapshot=87547904
13/08/26 23:51:58 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:51:58 INFO mapred.JobClient: CPU time spent (ms)=9380
13/08/26 23:51:58 INFO mapred.JobClient: Total committed heap usage (bytes)=28131328
13/08/26 23:51:58 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
13/08/26 23:51:58 INFO mapred.JobClient: Map output records=11321
13/08/26 23:51:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=148
13/08/26 23:51:59 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11256 99.4258%
Incorrectly Classified Instances : 65 0.5742%
Total Classified Instances : 11321
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
454 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 | 458 a = alt.atheism
0 588 0 3 0 2 0 0 0 0 0 1 0 1 0 0 0 0 0 0 | 595 b = comp.graphics
0 3 553 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 563 c = comp.os.ms-windows.misc
0 0 0 592 1 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 | 595 d = comp.sys.ibm.pc.hardware
0 0 0 1 593 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 594 e = comp.sys.mac.hardware
0 2 0 1 0 576 1 0 0 0 0 0 0 0 1 0 0 0 0 0 | 581 f = comp.windows.x
0 1 0 0 0 0 579 0 0 0 0 0 1 0 0 0 0 0 0 0 | 581 g = misc.forsale
0 0 0 0 0 0 1 594 0 0 0 0 1 0 0 0 0 0 0 0 | 596 h = rec.autos
0 0 0 0 0 0 1 2 591 0 0 0 0 0 0 0 0 0 0 0 | 594 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 615 1 0 0 0 0 0 0 0 0 0 | 616 j = rec.sport.baseball
0 0 0 0 0 0 1 0 0 1 581 0 0 0 0 0 0 0 0 0 | 583 k = rec.sport.hockey
0 0 1 0 0 0 0 0 0 0 0 627 1 0 0 0 0 1 0 0 | 630 l = sci.crypt
0 0 0 2 0 0 1 0 0 0 0 0 588 0 0 0 0 0 0 0 | 591 m = sci.electronics
0 1 0 0 0 0 0 0 0 0 0 0 0 586 1 0 0 0 0 0 | 588 n = sci.med
0 0 0 0 0 0 0 0 0 0 0 0 0 0 615 0 0 0 0 0 | 615 o = sci.space
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 619 1 0 0 0 | 620 p = soc.religion.christian
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 541 0 0 0 | 543 q = talk.politics.mideast
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 560 0 0 | 561 r = talk.politics.guns
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 1 351 0 | 359 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 4 0 453 | 458 t = talk.politics.misc
13/08/26 23:51:59 INFO driver.MahoutDriver: Program took 40214 ms (Minutes: 0.6702333333333333)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /home/mahout/mahout-work-mahout/20news-test-vectors -m /home/mahout/mahout-work-mahout/model -l /home/mahout/mahout-work-mahout/labelindex -ow -o /home/mahout/mahout-work-mahout/20news-testing
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/mahout/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/mahout/mahout-d-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mahout/hadoop-1.0.4/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/08/26 23:52:09 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
13/08/26 23:52:09 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/home/mahout/mahout-work-mahout/20news-test-vectors], --labelIndex=[/home/mahout/mahout-work-mahout/labelindex], --model=[/home/mahout/mahout-work-mahout/model], --output=[/home/mahout/mahout-work-mahout/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/08/26 23:52:10 INFO common.HadoopUtil: Deleting /home/mahout/mahout-work-mahout/20news-testing
13/08/26 23:52:10 INFO input.FileInputFormat: Total input paths to process : 1
13/08/26 23:52:11 INFO mapred.JobClient: Running job: job_201308212334_0066
13/08/26 23:52:12 INFO mapred.JobClient: map 0% reduce 0%
13/08/26 23:52:30 INFO mapred.JobClient: map 85% reduce 0%
13/08/26 23:52:36 INFO mapred.JobClient: map 100% reduce 0%
13/08/26 23:52:41 INFO mapred.JobClient: Job complete: job_201308212334_0066
13/08/26 23:52:41 INFO mapred.JobClient: Counters: 19
13/08/26 23:52:41 INFO mapred.JobClient: Job Counters
13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=25113
13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/08/26 23:52:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/08/26 23:52:41 INFO mapred.JobClient: Launched map tasks=1
13/08/26 23:52:41 INFO mapred.JobClient: Data-local map tasks=1
13/08/26 23:52:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/08/26 23:52:41 INFO mapred.JobClient: File Output Format Counters
13/08/26 23:52:41 INFO mapred.JobClient: Bytes Written=1417942
13/08/26 23:52:41 INFO mapred.JobClient: FileSystemCounters
13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_READ=12148944
13/08/26 23:52:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=22522
13/08/26 23:52:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1417942
13/08/26 23:52:41 INFO mapred.JobClient: File Input Format Counters
13/08/26 23:52:41 INFO mapred.JobClient: Bytes Read=8537480
13/08/26 23:52:41 INFO mapred.JobClient: Map-Reduce Framework
13/08/26 23:52:41 INFO mapred.JobClient: Map input records=7525
13/08/26 23:52:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=85057536
13/08/26 23:52:41 INFO mapred.JobClient: Spilled Records=0
13/08/26 23:52:41 INFO mapred.JobClient: CPU time spent (ms)=6630
13/08/26 23:52:41 INFO mapred.JobClient: Total committed heap usage (bytes)=28155904
13/08/26 23:52:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=976572416
13/08/26 23:52:41 INFO mapred.JobClient: Map output records=7525
13/08/26 23:52:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=147
13/08/26 23:52:42 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6801 90.3787%
Incorrectly Classified Instances : 724 9.6213%
Total Classified Instances : 7525
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j k l m n o p q r s t <--Classified as
318 0 0 0 1 0 0 0 1 0 0 0 0 0 1 4 0 0 15 1 | 341 a = alt.atheism
1 318 7 20 4 7 7 2 0 1 0 1 1 2 6 0 0 0 0 1 | 378 b = comp.graphics
0 25 277 78 12 15 5 0 0 0 0 2 4 0 1 0 0 0 0 3 | 422 c = comp.os.ms-windows.misc
1 4 3 336 20 3 8 0 0 0 0 1 11 0 0 0 0 0 0 0 | 387 d = comp.sys.ibm.pc.hardware
0 3 1 6 350 1 3 0 0 0 0 1 3 1 0 0 0 0 0 0 | 369 e = comp.sys.mac.hardware
1 20 3 6 7 365 3 0 0 0 0 1 0 0 0 0 1 0 0 0 | 407 f = comp.windows.x
0 1 1 19 8 0 329 13 1 0 0 2 14 0 4 0 0 1 1 0 | 394 g = misc.forsale
0 2 1 2 3 1 10 361 8 0 0 0 4 0 0 0 0 1 0 1 | 394 h = rec.autos
0 0 0 1 0 0 2 3 393 1 0 0 0 0 0 0 0 1 0 1 | 402 i = rec.motorcycles
0 0 0 1 0 0 2 3 0 360 6 0 2 2 1 0 0 0 0 1 | 378 j = rec.sport.baseball
0 1 0 2 1 0 0 0 2 5 401 0 1 0 0 1 0 0 0 2 | 416 k = rec.sport.hockey
1 1 0 1 3 2 1 1 0 0 0 344 1 1 2 0 1 1 0 1 | 361 l = sci.crypt
0 5 0 15 14 0 5 1 1 0 0 2 348 1 1 0 0 0 0 0 | 393 m = sci.electronics
1 2 1 1 1 0 1 0 0 1 0 1 4 381 5 0 0 1 1 1 | 402 n = sci.med
1 4 0 0 2 0 2 1 0 0 0 1 2 1 356 0 0 1 0 1 | 372 o = sci.space
5 0 0 1 1 0 0 1 0 0 1 0 0 1 0 359 3 0 4 1 | 377 p = soc.religion.christian
0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 2 389 0 0 2 | 397 q = talk.politics.mideast
0 0 1 0 1 1 0 1 0 0 0 2 1 1 0 0 0 335 0 6 | 349 r = talk.politics.guns
29 1 0 1 0 0 1 0 0 1 0 0 0 0 2 24 0 8 197 5 | 269 s = talk.religion.misc
2 0 0 0 2 0 0 1 0 1 1 1 0 1 2 0 2 17 3 284 | 317 t = talk.politics.misc
13/08/26 23:52:42 INFO driver.MahoutDriver: Program took 32480 ms (Minutes: 0.5413333333333333)
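As a quick sanity check on the two Summary blocks in the log, the reported percentages should simply be correct / total. A minimal sketch (the `accuracy` helper is a hypothetical name of my own, not something Mahout provides):

```shell
#!/bin/sh
# accuracy CORRECT TOTAL
# Prints CORRECT/TOTAL as a percentage with four decimals,
# the same precision the Mahout summary uses.
accuracy() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.4f%%\n", 100 * c / t }'
}

accuracy 11256 11321   # self test on the training set
accuracy 6801 7525     # test on the holdout set
```

The two results should match the 99.4258% and 90.3787% printed by testnb. The gap between them is expected: the first run scores the model on the same documents it was trained on.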
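The job information page lists every job that was run.
Going through the jobs one by one and looking at the mapper and reducer behind each of them is then enough to analyze how the algorithm works.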
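To match the console log against the individual jobs, one option is to pull the job IDs out of a saved copy of the output. A minimal sketch, assuming the output of classify-20newsgroups.sh was saved to a file; the file name run.log and the helper `list_job_ids` are hypothetical:

```shell
#!/bin/sh
# list_job_ids LOGFILE
# Prints each MapReduce job ID found in LOGFILE once,
# in order of first appearance.
list_job_ids() {
  grep -o 'job_[0-9]*_[0-9]*' "$1" | awk '!seen[$0]++'
}

# Example (run.log is an assumed file name):
#   list_job_ids run.log
#   for id in $(list_job_ids run.log); do hadoop job -status "$id"; done
```

For the part of the log shown here, the list would include job_201308212334_0063 through job_201308212334_0066. `hadoop job -status <job-id>` is part of the Hadoop 1.x command line and prints the completion state and counters of a job, which can then be compared with the mapper and reducer classes of that step.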
Share, enjoy, grow.
If you repost this, please credit the source: http://blog.csdn.net/fansy1990