Installing Mahout

1. Download
Download URL: http://archive.apache.org/dist/mahout/ . Pick the version you need and download its installation package.
For example: wget http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.tar.gz
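
If you switch versions often, the URL can be assembled from a version number. This is a minimal sketch that assumes the release uses the same directory and file naming as the 0.9 example above (other releases may be named differently, so check the archive listing first). The helper name mahout_url and the MAHOUT_VERSION variable are hypothetical.

```shell
#!/bin/sh
# Build the archive URL for a given Mahout version.
# Assumption: the file is named like the 0.9 release shown above.
mahout_url() {
    version="$1"
    printf 'http://archive.apache.org/dist/mahout/%s/mahout-distribution-%s.tar.gz\n' \
        "$version" "$version"
}

# Only prints the wget command; nothing is downloaded here.
MAHOUT_VERSION="${MAHOUT_VERSION:-0.9}"
echo "Download with: wget $(mahout_url "$MAHOUT_VERSION")"
```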

2. Extract
[grid@hadoop1 ~]$ tar -zxvf mahout-distribution-0.9.tar.gz

3. Add environment variables
[grid@hadoop1 ~]$ vi .bash_profile
export HADOOP_HOME=/home/grid/hadoop-1.2.1
export HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
export MAHOUT_HOME=/home/grid/mahout-distribution-0.9
export MAHOUT_CONF_DIR=/home/grid/mahout-distribution-0.9/conf
export PATH=$PATH:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin

Make the environment variables take effect:
[grid@hadoop1 ~]$ source .bash_profile
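
After sourcing the profile, a quick check can confirm that the four variables exported above are visible in the current shell. This is only a sketch; missing_mahout_vars is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name of every variable from step 3 that is unset or empty.
missing_mahout_vars() {
    for name in HADOOP_HOME HADOOP_CONF_DIR MAHOUT_HOME MAHOUT_CONF_DIR; do
        # eval reads the variable whose name is stored in $name
        # (plain POSIX sh has no ${!name} indirection).
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}

missing="$(missing_mahout_vars)"
if [ -z "$missing" ]; then
    echo "all Mahout/Hadoop variables are set"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "not set:" $missing
fi
```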

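Appendix: several important environment variables
JAVA_HOME the JDK directory; must be set for Mahout to run
MAHOUT_JAVA_HOME if set, overrides the value of JAVA_HOME
HADOOP_HOME if set, Mahout runs on the Hadoop cluster; otherwise it runs standalone
HADOOP_CONF_DIR the directory containing the Hadoop configuration files
MAHOUT_LOCAL if non-empty, Mahout runs standalone even if HADOOP_HOME is set
MAHOUT_CONF_DIR the path to the Mahout configuration files; defaults to $MAHOUT_HOME/src/conf
MAHOUT_HEAPSIZE the maximum heap size available to Mahout at run time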
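
The interaction between MAHOUT_LOCAL and HADOOP_HOME in the list above can be sketched as a small decision function. This is a simplification based only on the descriptions in the list, not a copy of the launcher script's logic, and mahout_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Decide which mode Mahout would run in, following the variable list above.
mahout_mode() {
    if [ -n "${MAHOUT_LOCAL:-}" ]; then
        echo "local"     # MAHOUT_LOCAL non-empty forces standalone mode
    elif [ -n "${HADOOP_HOME:-}" ]; then
        echo "hadoop"    # HADOOP_HOME set: run on the Hadoop cluster
    else
        echo "local"     # no Hadoop configured: standalone mode
    fi
}

echo "mahout would run in $(mahout_mode) mode"
```

With the variables from step 3 exported and MAHOUT_LOCAL unset, this reports hadoop mode.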

4. Copy Mahout to the other machines in the cluster and set the same environment variables on each of them
[grid@hadoop1 ~]$ scp -r ~/mahout-distribution-0.9 grid@hadoop2:~
[grid@hadoop1 ~]$ scp -r ~/mahout-distribution-0.9 grid@hadoop3:~
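
With more than a couple of workers, the scp commands can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; the host names follow the example above, copy_cmds is a hypothetical helper name, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Print one scp command per worker host.
# Usage: copy_cmds SOURCE_DIR HOST...
copy_cmds() {
    src="$1"; shift
    for host in "$@"; do
        printf 'scp -r %s grid@%s:~\n' "$src" "$host"
    done
}

# Prints the commands only; pipe the output to sh to actually run them.
copy_cmds "$HOME/mahout-distribution-0.9" hadoop2 hadoop3
```

The environment variables from step 3 still have to be added to .bash_profile on each target host.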

5. Start Hadoop
[grid@hadoop1 ~]$ sh ~/hadoop-1.2.1/bin/start-all.sh

6. Verify that Mahout was installed successfully
[grid@hadoop2 ~]$ mahout -help
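
If the command above reports "command not found", the launcher is probably not on PATH. A minimal check, assuming mahout is expected to be found through the PATH entry added in step 3; has_mahout is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds when a mahout executable is reachable through PATH.
has_mahout() {
    command -v mahout >/dev/null 2>&1
}

if has_mahout; then
    echo "mahout found at $(command -v mahout)"
else
    echo "mahout not on PATH; re-check MAHOUT_HOME/bin in .bash_profile"
fi
```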

7. Run the 20newsgroups test example
[grid@hadoop1 ~]$ sh $MAHOUT_HOME/examples/bin/classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /tmp/mahout-work-grid
Enter your choice : 1
ok. You chose 1 and we'll use cnaivebayes
creating work directory at /tmp/mahout-work-grid
Downloading 20news-bydate
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.7M  100 13.7M    0     0  86594      0  0:02:47  0:02:47 --:--:--  122k
Extracting...
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-grid/20news-all
+ mkdir /tmp/mahout-work-grid/20news-all
+ cp -R /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/alt.atheism /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.graphics /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.windows.x /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/misc.forsale /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.autos /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.motorcycles /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.baseball /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.hockey /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.crypt /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.electronics /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.med /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.space /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/soc.religion.christian /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.guns /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.mideast /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.religion.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/alt.atheism /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.graphics /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.windows.x /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/misc.forsale 
/tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.autos /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.motorcycles /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.baseball /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.hockey /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.crypt /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.electronics /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.med /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.space /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/soc.religion.christian /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.guns /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.mideast /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.religion.misc /tmp/mahout-work-grid/20news-all
+ '[' /home/grid/hadoop-1.2.1 '!=' '' ']'
+ '[' '' == '' ']'
+ echo 'Copying 20newsgroups data to HDFS'
Copying 20newsgroups data to HDFS
+ set +e
+ /home/grid/hadoop-1.2.1/bin/hadoop dfs -rmr /tmp/mahout-work-grid/20news-all
Warning: $HADOOP_HOME is deprecated.

Deleted hdfs://hadoop1:9000/tmp/mahout-work-grid/20news-all
+ set -e
+ /home/grid/hadoop-1.2.1/bin/hadoop dfs -put /tmp/mahout-work-grid/20news-all /tmp/mahout-work-grid/20news-all
Warning: $HADOOP_HOME is deprecated.

+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /tmp/mahout-work-grid/20news-all -o /tmp/mahout-work-grid/20news-seq -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.

15/03/08 20:54:40 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/tmp/mahout-work-grid/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/tmp/mahout-work-grid/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
15/03/08 20:54:48 INFO input.FileInputFormat: Total input paths to process : 18846
15/03/08 20:54:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/03/08 20:54:50 WARN snappy.LoadSnappy: Snappy native library not loaded
15/03/08 20:55:36 INFO mapred.JobClient: Running job: job_201503081659_0001
15/03/08 20:55:37 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 20:56:05 INFO mapred.JobClient:  map 1% reduce 0%
15/03/08 20:56:08 INFO mapred.JobClient:  map 2% reduce 0%
15/03/08 20:56:11 INFO mapred.JobClient:  map 4% reduce 0%
15/03/08 20:56:14 INFO mapred.JobClient:  map 6% reduce 0%
15/03/08 20:56:17 INFO mapred.JobClient:  map 7% reduce 0%
15/03/08 20:56:20 INFO mapred.JobClient:  map 8% reduce 0%
15/03/08 20:56:23 INFO mapred.JobClient:  map 10% reduce 0%
15/03/08 20:56:26 INFO mapred.JobClient:  map 11% reduce 0%
15/03/08 20:56:29 INFO mapred.JobClient:  map 13% reduce 0%
15/03/08 20:56:32 INFO mapred.JobClient:  map 17% reduce 0%
15/03/08 20:56:35 INFO mapred.JobClient:  map 18% reduce 0%
15/03/08 20:56:38 INFO mapred.JobClient:  map 19% reduce 0%
15/03/08 20:56:41 INFO mapred.JobClient:  map 20% reduce 0%
15/03/08 20:56:44 INFO mapred.JobClient:  map 22% reduce 0%
15/03/08 20:56:47 INFO mapred.JobClient:  map 23% reduce 0%
15/03/08 20:56:51 INFO mapred.JobClient:  map 26% reduce 0%
15/03/08 20:56:54 INFO mapred.JobClient:  map 28% reduce 0%
15/03/08 20:56:57 INFO mapred.JobClient:  map 29% reduce 0%
15/03/08 20:57:00 INFO mapred.JobClient:  map 31% reduce 0%
15/03/08 20:57:03 INFO mapred.JobClient:  map 32% reduce 0%
15/03/08 20:57:06 INFO mapred.JobClient:  map 33% reduce 0%
15/03/08 20:57:09 INFO mapred.JobClient:  map 35% reduce 0%
15/03/08 20:57:12 INFO mapred.JobClient:  map 37% reduce 0%
15/03/08 20:57:15 INFO mapred.JobClient:  map 38% reduce 0%
15/03/08 20:57:18 INFO mapred.JobClient:  map 40% reduce 0%
15/03/08 20:57:21 INFO mapred.JobClient:  map 41% reduce 0%
15/03/08 20:57:24 INFO mapred.JobClient:  map 43% reduce 0%
15/03/08 20:57:27 INFO mapred.JobClient:  map 45% reduce 0%
15/03/08 20:57:30 INFO mapred.JobClient:  map 47% reduce 0%
15/03/08 20:57:33 INFO mapred.JobClient:  map 50% reduce 0%
15/03/08 20:57:36 INFO mapred.JobClient:  map 52% reduce 0%
15/03/08 20:57:39 INFO mapred.JobClient:  map 54% reduce 0%
15/03/08 20:57:42 INFO mapred.JobClient:  map 56% reduce 0%
15/03/08 20:57:46 INFO mapred.JobClient:  map 57% reduce 0%
15/03/08 20:57:49 INFO mapred.JobClient:  map 59% reduce 0%
15/03/08 20:57:52 INFO mapred.JobClient:  map 62% reduce 0%
15/03/08 20:57:55 INFO mapred.JobClient:  map 65% reduce 0%
15/03/08 20:57:58 INFO mapred.JobClient:  map 67% reduce 0%
15/03/08 20:58:01 INFO mapred.JobClient:  map 69% reduce 0%
15/03/08 20:58:04 INFO mapred.JobClient:  map 72% reduce 0%
15/03/08 20:58:07 INFO mapred.JobClient:  map 75% reduce 0%
15/03/08 20:58:10 INFO mapred.JobClient:  map 78% reduce 0%
15/03/08 20:58:13 INFO mapred.JobClient:  map 80% reduce 0%
15/03/08 20:58:16 INFO mapred.JobClient:  map 84% reduce 0%
15/03/08 20:58:19 INFO mapred.JobClient:  map 87% reduce 0%
15/03/08 20:58:22 INFO mapred.JobClient:  map 91% reduce 0%
15/03/08 20:58:25 INFO mapred.JobClient:  map 94% reduce 0%
15/03/08 20:58:28 INFO mapred.JobClient:  map 97% reduce 0%
15/03/08 20:58:32 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 20:58:36 INFO mapred.JobClient: Job complete: job_201503081659_0001
15/03/08 20:58:36 INFO mapred.JobClient: Counters: 18
15/03/08 20:58:36 INFO mapred.JobClient:   Job Counters 
15/03/08 20:58:36 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=171122
15/03/08 20:58:36 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 20:58:36 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 20:58:36 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 20:58:36 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/03/08 20:58:36 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 20:58:36 INFO mapred.JobClient:     Bytes Written=19202391
15/03/08 20:58:36 INFO mapred.JobClient:   FileSystemCounters
15/03/08 20:58:36 INFO mapred.JobClient:     HDFS_BYTES_READ=37565643
15/03/08 20:58:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=60041
15/03/08 20:58:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=19202391
15/03/08 20:58:36 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 20:58:36 INFO mapred.JobClient:     Bytes Read=0
15/03/08 20:58:36 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 20:58:36 INFO mapred.JobClient:     Map input records=18846
15/03/08 20:58:36 INFO mapred.JobClient:     Physical memory (bytes) snapshot=113790976
15/03/08 20:58:36 INFO mapred.JobClient:     Spilled Records=0
15/03/08 20:58:36 INFO mapred.JobClient:     CPU time spent (ms)=109350
15/03/08 20:58:36 INFO mapred.JobClient:     Total committed heap usage (bytes)=46481408
15/03/08 20:58:36 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=724959232
15/03/08 20:58:36 INFO mapred.JobClient:     Map output records=18846
15/03/08 20:58:36 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1710640
15/03/08 20:58:36 INFO driver.MahoutDriver: Program took 237312 ms (Minutes: 3.9552)
+ echo 'Converting sequence files to vectors'
Converting sequence files to vectors
+ ./bin/mahout seq2sparse -i /tmp/mahout-work-grid/20news-seq -o /tmp/mahout-work-grid/20news-vectors -lnorm -nv -wt tfidf
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.

15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Tokenizing documents in /tmp/mahout-work-grid/20news-seq
15/03/08 20:58:47 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 20:58:49 INFO mapred.JobClient: Running job: job_201503081659_0002
15/03/08 20:58:50 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 20:59:11 INFO mapred.JobClient:  map 46% reduce 0%
15/03/08 20:59:14 INFO mapred.JobClient:  map 97% reduce 0%
15/03/08 20:59:19 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 20:59:24 INFO mapred.JobClient: Job complete: job_201503081659_0002
15/03/08 20:59:24 INFO mapred.JobClient: Counters: 19
15/03/08 20:59:24 INFO mapred.JobClient:   Job Counters 
15/03/08 20:59:24 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29191
15/03/08 20:59:24 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 20:59:24 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 20:59:24 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 20:59:24 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 20:59:24 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/03/08 20:59:24 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 20:59:24 INFO mapred.JobClient:     Bytes Written=27503580
15/03/08 20:59:24 INFO mapred.JobClient:   FileSystemCounters
15/03/08 20:59:24 INFO mapred.JobClient:     HDFS_BYTES_READ=19202520
15/03/08 20:59:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58026
15/03/08 20:59:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=27503580
15/03/08 20:59:24 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 20:59:24 INFO mapred.JobClient:     Bytes Read=19202391
15/03/08 20:59:24 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 20:59:24 INFO mapred.JobClient:     Map input records=18846
15/03/08 20:59:24 INFO mapred.JobClient:     Physical memory (bytes) snapshot=84549632
15/03/08 20:59:24 INFO mapred.JobClient:     Spilled Records=0
15/03/08 20:59:24 INFO mapred.JobClient:     CPU time spent (ms)=15470
15/03/08 20:59:24 INFO mapred.JobClient:     Total committed heap usage (bytes)=23855104
15/03/08 20:59:24 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=724852736
15/03/08 20:59:24 INFO mapred.JobClient:     Map output records=18846
15/03/08 20:59:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=129
15/03/08 20:59:24 INFO vectorizer.SparseVectorsFromSequenceFiles: Creating Term Frequency Vectors
15/03/08 20:59:24 INFO vectorizer.DictionaryVectorizer: Creating dictionary from /tmp/mahout-work-grid/20news-vectors/tokenized-documents and saving at /tmp/mahout-work-grid/20news-vectors/wordcount
15/03/08 20:59:26 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 20:59:27 INFO mapred.JobClient: Running job: job_201503081659_0003
15/03/08 20:59:28 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:00:00 INFO mapred.JobClient:  map 32% reduce 0%
15/03/08 21:00:03 INFO mapred.JobClient:  map 58% reduce 0%
15/03/08 21:00:06 INFO mapred.JobClient:  map 90% reduce 0%
15/03/08 21:00:08 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:00:23 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:00:27 INFO mapred.JobClient: Job complete: job_201503081659_0003
15/03/08 21:00:27 INFO mapred.JobClient: Counters: 29
15/03/08 21:00:27 INFO mapred.JobClient:   Job Counters 
15/03/08 21:00:27 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:00:27 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=37260
15/03/08 21:00:27 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:00:27 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:00:27 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:00:27 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:00:27 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=14824
15/03/08 21:00:27 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:00:27 INFO mapred.JobClient:     Bytes Written=2315037
15/03/08 21:00:27 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:00:27 INFO mapred.JobClient:     FILE_BYTES_READ=11857906
15/03/08 21:00:27 INFO mapred.JobClient:     HDFS_BYTES_READ=27503733
15/03/08 21:00:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15513248
15/03/08 21:00:27 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2315037
15/03/08 21:00:27 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:00:27 INFO mapred.JobClient:     Bytes Read=27503580
15/03/08 21:00:27 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:00:27 INFO mapred.JobClient:     Map output materialized bytes=3538084
15/03/08 21:00:27 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:00:27 INFO mapred.JobClient:     Reduce shuffle bytes=3538084
15/03/08 21:00:27 INFO mapred.JobClient:     Spilled Records=849345
15/03/08 21:00:27 INFO mapred.JobClient:     Map output bytes=39462740
15/03/08 21:00:27 INFO mapred.JobClient:     Total committed heap usage (bytes)=147394560
15/03/08 21:00:27 INFO mapred.JobClient:     CPU time spent (ms)=26820
15/03/08 21:00:27 INFO mapred.JobClient:     Combine input records=3026242
15/03/08 21:00:27 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153
15/03/08 21:00:27 INFO mapred.JobClient:     Reduce input records=192904
15/03/08 21:00:27 INFO mapred.JobClient:     Reduce input groups=192904
15/03/08 21:00:27 INFO mapred.JobClient:     Combine output records=554873
15/03/08 21:00:27 INFO mapred.JobClient:     Physical memory (bytes) snapshot=278233088
15/03/08 21:00:27 INFO mapred.JobClient:     Reduce output records=93563
15/03/08 21:00:27 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1456336896
15/03/08 21:00:27 INFO mapred.JobClient:     Map output records=2664273
15/03/08 21:00:32 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:00:33 INFO mapred.JobClient: Running job: job_201503081659_0004
15/03/08 21:00:34 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:01:09 INFO mapred.JobClient:  map 31% reduce 0%
15/03/08 21:01:12 INFO mapred.JobClient:  map 72% reduce 0%
15/03/08 21:01:15 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:01:33 INFO mapred.JobClient:  map 100% reduce 67%
15/03/08 21:01:36 INFO mapred.JobClient:  map 100% reduce 79%
15/03/08 21:01:39 INFO mapred.JobClient:  map 100% reduce 97%
15/03/08 21:01:40 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:01:46 INFO mapred.JobClient: Job complete: job_201503081659_0004
15/03/08 21:01:46 INFO mapred.JobClient: Counters: 29
15/03/08 21:01:46 INFO mapred.JobClient:   Job Counters 
15/03/08 21:01:46 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:01:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=37702
15/03/08 21:01:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:01:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:01:46 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:01:46 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:01:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=25246
15/03/08 21:01:46 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:01:46 INFO mapred.JobClient:     Bytes Written=29314118
15/03/08 21:01:46 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:01:46 INFO mapred.JobClient:     FILE_BYTES_READ=29226519
15/03/08 21:01:46 INFO mapred.JobClient:     HDFS_BYTES_READ=27503733
15/03/08 21:01:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=54669982
15/03/08 21:01:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
15/03/08 21:01:46 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:01:46 INFO mapred.JobClient:     Bytes Read=27503580
15/03/08 21:01:46 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:01:46 INFO mapred.JobClient:     Map output materialized bytes=27274291
15/03/08 21:01:46 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:01:46 INFO mapred.JobClient:     Reduce shuffle bytes=27274291
15/03/08 21:01:46 INFO mapred.JobClient:     Spilled Records=37692
15/03/08 21:01:46 INFO mapred.JobClient:     Map output bytes=27199343
15/03/08 21:01:46 INFO mapred.JobClient:     Total committed heap usage (bytes)=174735360
15/03/08 21:01:46 INFO mapred.JobClient:     CPU time spent (ms)=33560
15/03/08 21:01:46 INFO mapred.JobClient:     Combine input records=0
15/03/08 21:01:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=153
15/03/08 21:01:46 INFO mapred.JobClient:     Reduce input records=18846
15/03/08 21:01:46 INFO mapred.JobClient:     Reduce input groups=18846
15/03/08 21:01:46 INFO mapred.JobClient:     Combine output records=0
15/03/08 21:01:46 INFO mapred.JobClient:     Physical memory (bytes) snapshot=310116352
15/03/08 21:01:46 INFO mapred.JobClient:     Reduce output records=18846
15/03/08 21:01:46 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1474854912
15/03/08 21:01:46 INFO mapred.JobClient:     Map output records=18846
15/03/08 21:01:48 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:01:49 INFO mapred.JobClient: Running job: job_201503081659_0005
15/03/08 21:01:50 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:02:17 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:02:30 INFO mapred.JobClient:  map 100% reduce 71%
15/03/08 21:02:33 INFO mapred.JobClient:  map 100% reduce 99%
15/03/08 21:02:35 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:02:39 INFO mapred.JobClient: Job complete: job_201503081659_0005
15/03/08 21:02:39 INFO mapred.JobClient: Counters: 29
15/03/08 21:02:39 INFO mapred.JobClient:   Job Counters 
15/03/08 21:02:39 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:02:39 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=27104
15/03/08 21:02:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:02:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:02:39 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:02:39 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:02:39 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=18088
15/03/08 21:02:39 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:02:39 INFO mapred.JobClient:     Bytes Written=29314118
15/03/08 21:02:39 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:02:39 INFO mapred.JobClient:     FILE_BYTES_READ=29059398
15/03/08 21:02:39 INFO mapred.JobClient:     HDFS_BYTES_READ=29314269
15/03/08 21:02:39 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58236992
15/03/08 21:02:39 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
15/03/08 21:02:39 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:02:39 INFO mapred.JobClient:     Bytes Read=29314118
15/03/08 21:02:39 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:02:39 INFO mapred.JobClient:     Map output materialized bytes=29059398
15/03/08 21:02:39 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:02:39 INFO mapred.JobClient:     Reduce shuffle bytes=29059398
15/03/08 21:02:39 INFO mapred.JobClient:     Spilled Records=37692
15/03/08 21:02:39 INFO mapred.JobClient:     Map output bytes=28984080
15/03/08 21:02:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=176521216
15/03/08 21:02:39 INFO mapred.JobClient:     CPU time spent (ms)=21310
15/03/08 21:02:39 INFO mapred.JobClient:     Combine input records=0
15/03/08 21:02:39 INFO mapred.JobClient:     SPLIT_RAW_BYTES=151
15/03/08 21:02:39 INFO mapred.JobClient:     Reduce input records=18846
15/03/08 21:02:39 INFO mapred.JobClient:     Reduce input groups=18846
15/03/08 21:02:39 INFO mapred.JobClient:     Combine output records=0
15/03/08 21:02:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=305786880
15/03/08 21:02:39 INFO mapred.JobClient:     Reduce output records=18846
15/03/08 21:02:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1476845568
15/03/08 21:02:39 INFO mapred.JobClient:     Map output records=18846
15/03/08 21:02:39 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/partial-vectors-0
15/03/08 21:02:39 INFO vectorizer.SparseVectorsFromSequenceFiles: Calculating IDF
15/03/08 21:02:42 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:02:43 INFO mapred.JobClient: Running job: job_201503081659_0006
15/03/08 21:02:44 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:03:08 INFO mapred.JobClient:  map 47% reduce 0%
15/03/08 21:03:11 INFO mapred.JobClient:  map 88% reduce 0%
15/03/08 21:03:13 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:03:23 INFO mapred.JobClient:  map 100% reduce 33%
15/03/08 21:03:27 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:03:31 INFO mapred.JobClient: Job complete: job_201503081659_0006
15/03/08 21:03:31 INFO mapred.JobClient: Counters: 29
15/03/08 21:03:31 INFO mapred.JobClient:   Job Counters 
15/03/08 21:03:31 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:03:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29689
15/03/08 21:03:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:03:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:03:31 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:03:31 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:03:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13909
15/03/08 21:03:31 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:03:31 INFO mapred.JobClient:     Bytes Written=1890073
15/03/08 21:03:31 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:03:31 INFO mapred.JobClient:     FILE_BYTES_READ=4880830
15/03/08 21:03:31 INFO mapred.JobClient:     HDFS_BYTES_READ=29314270
15/03/08 21:03:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6307710
15/03/08 21:03:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1890073
15/03/08 21:03:31 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:03:31 INFO mapred.JobClient:     Bytes Read=29314118
15/03/08 21:03:31 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:03:31 INFO mapred.JobClient:     Map output materialized bytes=1309902
15/03/08 21:03:31 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:03:31 INFO mapred.JobClient:     Reduce shuffle bytes=1309902
15/03/08 21:03:31 INFO mapred.JobClient:     Spilled Records=442190
15/03/08 21:03:31 INFO mapred.JobClient:     Map output bytes=31005336
15/03/08 21:03:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=147394560
15/03/08 21:03:31 INFO mapred.JobClient:     CPU time spent (ms)=23130
15/03/08 21:03:31 INFO mapred.JobClient:     Combine input records=2838840
15/03/08 21:03:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=152
15/03/08 21:03:31 INFO mapred.JobClient:     Reduce input records=93564
15/03/08 21:03:31 INFO mapred.JobClient:     Reduce input groups=93564
15/03/08 21:03:31 INFO mapred.JobClient:     Combine output records=348626
15/03/08 21:03:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=281788416
15/03/08 21:03:31 INFO mapred.JobClient:     Reduce output records=93564
15/03/08 21:03:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1457623040
15/03/08 21:03:31 INFO mapred.JobClient:     Map output records=2583778
15/03/08 21:03:32 INFO vectorizer.SparseVectorsFromSequenceFiles: Pruning
15/03/08 21:03:33 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:03:34 INFO mapred.JobClient: Running job: job_201503081659_0007
15/03/08 21:03:35 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:03:58 INFO mapred.JobClient:  map 40% reduce 0%
15/03/08 21:04:01 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:04:26 INFO mapred.JobClient:  map 100% reduce 33%
15/03/08 21:04:30 INFO mapred.JobClient:  map 100% reduce 66%
15/03/08 21:04:33 INFO mapred.JobClient:  map 100% reduce 69%
15/03/08 21:04:36 INFO mapred.JobClient:  map 100% reduce 73%
15/03/08 21:04:38 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:04:45 INFO mapred.JobClient: Job complete: job_201503081659_0007
15/03/08 21:04:45 INFO mapred.JobClient: Counters: 29
15/03/08 21:04:45 INFO mapred.JobClient:   Job Counters 
15/03/08 21:04:45 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:04:45 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33546
15/03/08 21:04:45 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:04:45 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:04:45 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:04:45 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:04:45 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=32143
15/03/08 21:04:45 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:04:45 INFO mapred.JobClient:     Bytes Written=28689283
15/03/08 21:04:45 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:04:45 INFO mapred.JobClient:     FILE_BYTES_READ=9646422
15/03/08 21:04:45 INFO mapred.JobClient:     HDFS_BYTES_READ=29314270
15/03/08 21:04:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15602818
15/03/08 21:04:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
15/03/08 21:04:45 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:04:45 INFO mapred.JobClient:     Bytes Read=29314118
15/03/08 21:04:45 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:04:45 INFO mapred.JobClient:     Map output materialized bytes=7741585
15/03/08 21:04:45 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:04:45 INFO mapred.JobClient:     Reduce shuffle bytes=7741585
15/03/08 21:04:45 INFO mapred.JobClient:     Spilled Records=37692
15/03/08 21:04:45 INFO mapred.JobClient:     Map output bytes=28984080
15/03/08 21:04:45 INFO mapred.JobClient:     Total committed heap usage (bytes)=176521216
15/03/08 21:04:45 INFO mapred.JobClient:     CPU time spent (ms)=49000
15/03/08 21:04:45 INFO mapred.JobClient:     Combine input records=0
15/03/08 21:04:45 INFO mapred.JobClient:     SPLIT_RAW_BYTES=152
15/03/08 21:04:45 INFO mapred.JobClient:     Reduce input records=18846
15/03/08 21:04:45 INFO mapred.JobClient:     Reduce input groups=18846
15/03/08 21:04:45 INFO mapred.JobClient:     Combine output records=0
15/03/08 21:04:45 INFO mapred.JobClient:     Physical memory (bytes) snapshot=305831936
15/03/08 21:04:45 INFO mapred.JobClient:     Reduce output records=18846
15/03/08 21:04:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1458589696
15/03/08 21:04:45 INFO mapred.JobClient:     Map output records=18846
15/03/08 21:04:47 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:04:48 INFO mapred.JobClient: Running job: job_201503081659_0008
15/03/08 21:04:49 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:05:18 INFO mapred.JobClient:  map 92% reduce 0%
15/03/08 21:05:20 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:05:40 INFO mapred.JobClient:  map 100% reduce 71%
15/03/08 21:05:44 INFO mapred.JobClient:  map 100% reduce 87%
15/03/08 21:05:47 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:05:55 INFO mapred.JobClient: Job complete: job_201503081659_0008
15/03/08 21:05:55 INFO mapred.JobClient: Counters: 29
15/03/08 21:05:55 INFO mapred.JobClient:   Job Counters 
15/03/08 21:05:55 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:05:55 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=35779
15/03/08 21:05:55 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:05:55 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:05:55 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:05:55 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:05:55 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=26799
15/03/08 21:05:55 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:05:55 INFO mapred.JobClient:     Bytes Written=28689283
15/03/08 21:05:55 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:05:55 INFO mapred.JobClient:     FILE_BYTES_READ=28437750
15/03/08 21:05:55 INFO mapred.JobClient:     HDFS_BYTES_READ=28689445
15/03/08 21:05:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56992598
15/03/08 21:05:55 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
15/03/08 21:05:55 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:05:55 INFO mapred.JobClient:     Bytes Read=28689283
15/03/08 21:05:55 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:05:55 INFO mapred.JobClient:     Map output materialized bytes=28437750
15/03/08 21:05:55 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:05:55 INFO mapred.JobClient:     Reduce shuffle bytes=28437750
15/03/08 21:05:55 INFO mapred.JobClient:     Spilled Records=37692
15/03/08 21:05:55 INFO mapred.JobClient:     Map output bytes=28362505
15/03/08 21:05:55 INFO mapred.JobClient:     Total committed heap usage (bytes)=175898624
15/03/08 21:05:55 INFO mapred.JobClient:     CPU time spent (ms)=34270
15/03/08 21:05:55 INFO mapred.JobClient:     Combine input records=0
15/03/08 21:05:55 INFO mapred.JobClient:     SPLIT_RAW_BYTES=162
15/03/08 21:05:55 INFO mapred.JobClient:     Reduce input records=18846
15/03/08 21:05:55 INFO mapred.JobClient:     Reduce input groups=18846
15/03/08 21:05:55 INFO mapred.JobClient:     Combine output records=0
15/03/08 21:05:55 INFO mapred.JobClient:     Physical memory (bytes) snapshot=290082816
15/03/08 21:05:55 INFO mapred.JobClient:     Reduce output records=18846
15/03/08 21:05:55 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1465872384
15/03/08 21:05:55 INFO mapred.JobClient:     Map output records=18846
15/03/08 21:05:55 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/tf-vectors-partial
15/03/08 21:05:55 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/tf-vectors-toprune
15/03/08 21:05:58 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:05:59 INFO mapred.JobClient: Running job: job_201503081659_0009
15/03/08 21:06:00 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:06:24 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:06:37 INFO mapred.JobClient:  map 100% reduce 68%
15/03/08 21:06:40 INFO mapred.JobClient:  map 100% reduce 87%
15/03/08 21:06:43 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:06:47 INFO mapred.JobClient: Job complete: job_201503081659_0009
15/03/08 21:06:47 INFO mapred.JobClient: Counters: 29
15/03/08 21:06:47 INFO mapred.JobClient:   Job Counters 
15/03/08 21:06:47 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:06:47 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=21911
15/03/08 21:06:47 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:06:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:06:48 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:06:48 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:06:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=19167
15/03/08 21:06:48 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:06:48 INFO mapred.JobClient:     Bytes Written=28689283
15/03/08 21:06:48 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:06:48 INFO mapred.JobClient:     FILE_BYTES_READ=30342579
15/03/08 21:06:48 INFO mapred.JobClient:     HDFS_BYTES_READ=28689427
15/03/08 21:06:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56996636
15/03/08 21:06:48 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
15/03/08 21:06:48 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:06:48 INFO mapred.JobClient:     Bytes Read=28689283
15/03/08 21:06:48 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:06:48 INFO mapred.JobClient:     Map output materialized bytes=28437750
15/03/08 21:06:48 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:06:48 INFO mapred.JobClient:     Reduce shuffle bytes=28437750
15/03/08 21:06:48 INFO mapred.JobClient:     Spilled Records=37692
15/03/08 21:06:48 INFO mapred.JobClient:     Map output bytes=28362505
15/03/08 21:06:48 INFO mapred.JobClient:     Total committed heap usage (bytes)=175898624
15/03/08 21:06:48 INFO mapred.JobClient:     CPU time spent (ms)=23140
15/03/08 21:06:48 INFO mapred.JobClient:     Combine input records=0
15/03/08 21:06:48 INFO mapred.JobClient:     SPLIT_RAW_BYTES=144
15/03/08 21:06:48 INFO mapred.JobClient:     Reduce input records=18846
15/03/08 21:06:48 INFO mapred.JobClient:     Reduce input groups=18846
15/03/08 21:06:48 INFO mapred.JobClient:     Combine output records=0
15/03/08 21:06:48 INFO mapred.JobClient:     Physical memory (bytes) snapshot=308453376
15/03/08 21:06:48 INFO mapred.JobClient:     Reduce output records=18846
15/03/08 21:06:48 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1454231552
15/03/08 21:06:48 INFO mapred.JobClient:     Map output records=18846
15/03/08 21:06:49 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:06:50 INFO mapred.JobClient: Running job: job_201503081659_0010
15/03/08 21:06:51 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:07:12 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:07:25 INFO mapred.JobClient:  map 100% reduce 71%
15/03/08 21:07:28 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:07:33 INFO mapred.JobClient: Job complete: job_201503081659_0010
15/03/08 21:07:33 INFO mapred.JobClient: Counters: 29
15/03/08 21:07:33 INFO mapred.JobClient:   Job Counters 
15/03/08 21:07:33 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:07:33 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=20931
15/03/08 21:07:33 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:07:33 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:07:33 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:07:33 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:07:33 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=17614
15/03/08 21:07:33 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:07:33 INFO mapred.JobClient:     Bytes Written=28689283
15/03/08 21:07:33 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:07:33 INFO mapred.JobClient:     FILE_BYTES_READ=28437750
15/03/08 21:07:33 INFO mapred.JobClient:     HDFS_BYTES_READ=28689434
15/03/08 21:07:33 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56993684
15/03/08 21:07:33 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
15/03/08 21:07:33 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:07:33 INFO mapred.JobClient:     Bytes Read=28689283
15/03/08 21:07:33 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:07:33 INFO mapred.JobClient:     Map output materialized bytes=28437750
15/03/08 21:07:33 INFO mapred.JobClient:     Map input records=18846
15/03/08 21:07:33 INFO mapred.JobClient:     Reduce shuffle bytes=28437750
15/03/08 21:07:33 INFO mapred.JobClient:     Spilled Records=37692
15/03/08 21:07:33 INFO mapred.JobClient:     Map output bytes=28362505
15/03/08 21:07:33 INFO mapred.JobClient:     Total committed heap usage (bytes)=175898624
15/03/08 21:07:33 INFO mapred.JobClient:     CPU time spent (ms)=20320
15/03/08 21:07:33 INFO mapred.JobClient:     Combine input records=0
15/03/08 21:07:33 INFO mapred.JobClient:     SPLIT_RAW_BYTES=151
15/03/08 21:07:33 INFO mapred.JobClient:     Reduce input records=18846
15/03/08 21:07:33 INFO mapred.JobClient:     Reduce input groups=18846
15/03/08 21:07:33 INFO mapred.JobClient:     Combine output records=0
15/03/08 21:07:33 INFO mapred.JobClient:     Physical memory (bytes) snapshot=298102784
15/03/08 21:07:33 INFO mapred.JobClient:     Reduce output records=18846
15/03/08 21:07:33 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1454256128
15/03/08 21:07:33 INFO mapred.JobClient:     Map output records=18846
15/03/08 21:07:33 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/partial-vectors-0
15/03/08 21:07:33 INFO driver.MahoutDriver: Program took 529754 ms (Minutes: 8.829233333333333)
+ echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
Creating training and holdout set with a random 80-20 split of the generated vector dataset
+ ./bin/mahout split -i /tmp/mahout-work-grid/20news-vectors/tfidf-vectors --trainingOutput /tmp/mahout-work-grid/20news-train-vectors --testOutput /tmp/mahout-work-grid/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.

15/03/08 21:07:40 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
15/03/08 21:07:41 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-grid/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/tmp/mahout-work-grid/20news-test-vectors], --trainingOutput=[/tmp/mahout-work-grid/20news-train-vectors]}
15/03/08 21:07:46 INFO utils.SplitInput: part-r-00000 has 162419 lines
15/03/08 21:07:46 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
15/03/08 21:07:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/03/08 21:07:46 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/03/08 21:07:46 INFO compress.CodecPool: Got brand-new compressor
15/03/08 21:07:46 INFO compress.CodecPool: Got brand-new compressor
15/03/08 21:08:01 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11441, test: 7405 starting at 0
15/03/08 21:08:01 INFO driver.MahoutDriver: Program took 20712 ms (Minutes: 0.3452)
+ echo 'Training Naive Bayes model'
Training Naive Bayes model
+ ./bin/mahout trainnb -i /tmp/mahout-work-grid/20news-train-vectors -el -o /tmp/mahout-work-grid/model -li /tmp/mahout-work-grid/labelindex -ow -c
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.

15/03/08 21:08:13 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
15/03/08 21:08:14 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/tmp/mahout-work-grid/20news-train-vectors], --labelIndex=[/tmp/mahout-work-grid/labelindex], --output=[/tmp/mahout-work-grid/model], --overwrite=null, --startPhase=[0], --tempDir=[temp], --trainComplementary=null}
15/03/08 21:08:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/03/08 21:08:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/03/08 21:08:17 INFO compress.CodecPool: Got brand-new decompressor
15/03/08 21:08:26 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:08:28 INFO mapred.JobClient: Running job: job_201503081659_0011
15/03/08 21:08:29 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:08:54 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:09:09 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:09:13 INFO mapred.JobClient: Job complete: job_201503081659_0011
15/03/08 21:09:14 INFO mapred.JobClient: Counters: 29
15/03/08 21:09:14 INFO mapred.JobClient:   Job Counters 
15/03/08 21:09:14 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:09:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=24934
15/03/08 21:09:14 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:09:14 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:09:14 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:09:14 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:09:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13530
15/03/08 21:09:14 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:09:14 INFO mapred.JobClient:     Bytes Written=2757285
15/03/08 21:09:14 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:09:14 INFO mapred.JobClient:     FILE_BYTES_READ=1527193
15/03/08 21:09:14 INFO mapred.JobClient:     HDFS_BYTES_READ=12920362
15/03/08 21:09:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3173016
15/03/08 21:09:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2757285
15/03/08 21:09:14 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:09:14 INFO mapred.JobClient:     Bytes Read=12920223
15/03/08 21:09:14 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:09:14 INFO mapred.JobClient:     Map output materialized bytes=1526511
15/03/08 21:09:14 INFO mapred.JobClient:     Map input records=11441
15/03/08 21:09:14 INFO mapred.JobClient:     Reduce shuffle bytes=1526511
15/03/08 21:09:14 INFO mapred.JobClient:     Spilled Records=40
15/03/08 21:09:14 INFO mapred.JobClient:     Map output bytes=16896230
15/03/08 21:09:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=147394560
15/03/08 21:09:14 INFO mapred.JobClient:     CPU time spent (ms)=16930
15/03/08 21:09:14 INFO mapred.JobClient:     Combine input records=11441
15/03/08 21:09:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=139
15/03/08 21:09:14 INFO mapred.JobClient:     Reduce input records=20
15/03/08 21:09:14 INFO mapred.JobClient:     Reduce input groups=20
15/03/08 21:09:14 INFO mapred.JobClient:     Combine output records=20
15/03/08 21:09:14 INFO mapred.JobClient:     Physical memory (bytes) snapshot=275156992
15/03/08 21:09:14 INFO mapred.JobClient:     Reduce output records=20
15/03/08 21:09:14 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1458122752
15/03/08 21:09:14 INFO mapred.JobClient:     Map output records=11441
15/03/08 21:09:15 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:09:16 INFO mapred.JobClient: Running job: job_201503081659_0012
15/03/08 21:09:17 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:09:32 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:09:42 INFO mapred.JobClient:  map 100% reduce 33%
15/03/08 21:09:45 INFO mapred.JobClient:  map 100% reduce 100%
15/03/08 21:09:49 INFO mapred.JobClient: Job complete: job_201503081659_0012
15/03/08 21:09:49 INFO mapred.JobClient: Counters: 29
15/03/08 21:09:49 INFO mapred.JobClient:   Job Counters 
15/03/08 21:09:49 INFO mapred.JobClient:     Launched reduce tasks=1
15/03/08 21:09:49 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=15773
15/03/08 21:09:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:09:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:09:49 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:09:49 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:09:49 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12833
15/03/08 21:09:49 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:09:49 INFO mapred.JobClient:     Bytes Written=891489
15/03/08 21:09:49 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:09:49 INFO mapred.JobClient:     FILE_BYTES_READ=441934
15/03/08 21:09:49 INFO mapred.JobClient:     HDFS_BYTES_READ=2757416
15/03/08 21:09:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1004208
15/03/08 21:09:49 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=891489
15/03/08 21:09:49 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:09:49 INFO mapred.JobClient:     Bytes Read=2757285
15/03/08 21:09:49 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:09:49 INFO mapred.JobClient:     Map output materialized bytes=441926
15/03/08 21:09:49 INFO mapred.JobClient:     Map input records=20
15/03/08 21:09:49 INFO mapred.JobClient:     Reduce shuffle bytes=441926
15/03/08 21:09:49 INFO mapred.JobClient:     Spilled Records=4
15/03/08 21:09:49 INFO mapred.JobClient:     Map output bytes=891363
15/03/08 21:09:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=226623488
15/03/08 21:09:49 INFO mapred.JobClient:     CPU time spent (ms)=8220
15/03/08 21:09:49 INFO mapred.JobClient:     Combine input records=2
15/03/08 21:09:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=131
15/03/08 21:09:49 INFO mapred.JobClient:     Reduce input records=2
15/03/08 21:09:49 INFO mapred.JobClient:     Reduce input groups=2
15/03/08 21:09:49 INFO mapred.JobClient:     Combine output records=2
15/03/08 21:09:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=293044224
15/03/08 21:09:49 INFO mapred.JobClient:     Reduce output records=2
15/03/08 21:09:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1475592192
15/03/08 21:09:49 INFO mapred.JobClient:     Map output records=2
15/03/08 21:09:51 INFO driver.MahoutDriver: Program took 97669 ms (Minutes: 1.6278333333333332)
+ echo 'Self testing on training set'
Self testing on training set
+ ./bin/mahout testnb -i /tmp/mahout-work-grid/20news-train-vectors -m /tmp/mahout-work-grid/model -l /tmp/mahout-work-grid/labelindex -ow -o /tmp/mahout-work-grid/20news-testing -c
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.

15/03/08 21:09:57 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
15/03/08 21:09:58 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-grid/20news-train-vectors], --labelIndex=[/tmp/mahout-work-grid/labelindex], --model=[/tmp/mahout-work-grid/model], --output=[/tmp/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp], --testComplementary=null}
15/03/08 21:10:01 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:10:02 INFO mapred.JobClient: Running job: job_201503081659_0013
15/03/08 21:10:03 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:10:23 INFO mapred.JobClient:  map 20% reduce 0%
15/03/08 21:10:26 INFO mapred.JobClient:  map 36% reduce 0%
15/03/08 21:10:29 INFO mapred.JobClient:  map 54% reduce 0%
15/03/08 21:10:32 INFO mapred.JobClient:  map 71% reduce 0%
15/03/08 21:10:35 INFO mapred.JobClient:  map 89% reduce 0%
15/03/08 21:10:38 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:10:42 INFO mapred.JobClient: Job complete: job_201503081659_0013
15/03/08 21:10:42 INFO mapred.JobClient: Counters: 20
15/03/08 21:10:42 INFO mapred.JobClient:   Job Counters 
15/03/08 21:10:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=34759
15/03/08 21:10:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:10:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:10:42 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:10:42 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:10:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/03/08 21:10:42 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:10:42 INFO mapred.JobClient:     Bytes Written=2155560
15/03/08 21:10:42 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:10:42 INFO mapred.JobClient:     FILE_BYTES_READ=3676597
15/03/08 21:10:42 INFO mapred.JobClient:     HDFS_BYTES_READ=12920362
15/03/08 21:10:42 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59447
15/03/08 21:10:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2155560
15/03/08 21:10:42 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:10:42 INFO mapred.JobClient:     Bytes Read=12920223
15/03/08 21:10:42 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:10:42 INFO mapred.JobClient:     Map input records=11441
15/03/08 21:10:42 INFO mapred.JobClient:     Physical memory (bytes) snapshot=99229696
15/03/08 21:10:42 INFO mapred.JobClient:     Spilled Records=0
15/03/08 21:10:42 INFO mapred.JobClient:     CPU time spent (ms)=25750
15/03/08 21:10:42 INFO mapred.JobClient:     Total committed heap usage (bytes)=35360768
15/03/08 21:10:42 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=724860928
15/03/08 21:10:42 INFO mapred.JobClient:     Map output records=11441
15/03/08 21:10:42 INFO mapred.JobClient:     SPLIT_RAW_BYTES=139
15/03/08 21:10:44 INFO test.TestNaiveBayesDriver: Complementary Results: 
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :      11313	   98.8812%
Incorrectly Classified Instances        :        128	    1.1188%
Total Classified Instances              :      11441

=======================================================
Confusion Matrix
-------------------------------------------------------
a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k    	l    	m    	n     o    	p    	q    	r    	s    	t    	<--Classified as
480  	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0     1    	0    	0    	0    	1    	0    	 |  482   	a     = alt.atheism
0    	568  	2    	1    	1    	4    	1    	0    	0    	0    	1    	1    	0    	0     1    	0    	0    	0    	0    	0    	 |  580   	b     = comp.graphics
1    	2    	608  	3    	0    	1    	1    	0    	0    	0    	0    	0    	0    	0     0    	0    	0    	0    	0    	0    	 |  616   	c     = comp.os.ms-windows.misc
1    	0    	3    	596  	0    	0    	1    	1    	0    	0    	0    	0    	0    	0     0    	0    	0    	0    	0    	0    	 |  602   	d     = comp.sys.ibm.pc.hardware
0    	1    	2    	2    	565  	0    	1    	0    	0    	0    	0    	0    	1    	0     0    	0    	1    	1    	0    	0    	 |  574   	e     = comp.sys.mac.hardware
0    	1    	1    	0    	0    	610  	1    	0    	0    	0    	0    	0    	0    	0     2    	0    	1    	0    	0    	0    	 |  616   	f     = comp.windows.x
0    	0    	0    	5    	2    	0    	552  	4    	2    	0    	2    	1    	3    	2     0    	0    	1    	0    	0    	0    	 |  574   	g     = misc.forsale
0    	0    	0    	1    	0    	0    	1    	586  	1    	0    	0    	1    	0    	0     0    	0    	0    	0    	0    	0    	 |  590   	h     = rec.autos
0    	0    	0    	0    	0    	0    	0    	0    	601  	0    	0    	0    	0    	0     0    	0    	0    	0    	0    	0    	 |  601   	i     = rec.motorcycles
0    	0    	0    	0    	0    	0    	0    	0    	0    	623  	3    	1    	0    	0     0    	0    	0    	0    	0    	0    	 |  627   	j     = rec.sport.baseball
0    	0    	0    	0    	0    	0    	0    	0    	0    	1    	620  	0    	0    	0     0    	0    	0    	0    	0    	0    	 |  621   	k     = rec.sport.hockey
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	620  	0    	0     0    	0    	0    	1    	0    	0    	 |  621   	l     = sci.crypt
0    	0    	0    	1    	0    	0    	0    	1    	0    	0    	0    	0    	583  	0     2    	0    	1    	0    	0    	0    	 |  588   	m     = sci.electronics
0    	0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	602   0    	0    	0    	0    	0    	0    	 |  603   	n     = sci.med
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	1     581  	0    	0    	0    	0    	0    	 |  582   	o     = sci.space
0    	0    	0    	0    	0    	0    	0    	0    	1    	0    	0    	0    	0    	1     0    	599  	0    	0    	0    	0    	 |  601   	p     = soc.religion.christian
0    	0    	0    	0    	0    	0    	0    	0    	0    	2    	0    	0    	0    	0     0    	0    	546  	0    	0    	0    	 |  548   	q     = talk.politics.mideast
0    	0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	2    	0    	0     1    	0    	0    	559  	0    	0    	 |  563   	r     = talk.politics.guns
21   	0    	0    	0    	0    	0    	0    	0    	0    	1    	0    	0    	0    	2     0    	3    	2    	1    	348  	1    	 |  379   	s     = talk.religion.misc
0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	1    	0    	0     1    	0    	1    	4    	0    	466  	 |  473   	t     = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                         0.98
Accuracy                                   98.8812%
Reliability                                94.0454%
Reliability (standard deviation)            0.2162

15/03/08 21:10:44 INFO driver.MahoutDriver: Program took 46839 ms (Minutes: 0.78065)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /tmp/mahout-work-grid/20news-test-vectors -m /tmp/mahout-work-grid/model -l /tmp/mahout-work-grid/labelindex -ow -o /tmp/mahout-work-grid/20news-testing -c
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.

15/03/08 21:10:51 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
15/03/08 21:10:52 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-grid/20news-test-vectors], --labelIndex=[/tmp/mahout-work-grid/labelindex], --model=[/tmp/mahout-work-grid/model], --output=[/tmp/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp], --testComplementary=null}
15/03/08 21:10:53 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-testing
15/03/08 21:10:55 INFO input.FileInputFormat: Total input paths to process : 1
15/03/08 21:10:56 INFO mapred.JobClient: Running job: job_201503081659_0014
15/03/08 21:10:57 INFO mapred.JobClient:  map 0% reduce 0%
15/03/08 21:11:15 INFO mapred.JobClient:  map 28% reduce 0%
15/03/08 21:11:19 INFO mapred.JobClient:  map 54% reduce 0%
15/03/08 21:11:22 INFO mapred.JobClient:  map 81% reduce 0%
15/03/08 21:11:25 INFO mapred.JobClient:  map 100% reduce 0%
15/03/08 21:11:29 INFO mapred.JobClient: Job complete: job_201503081659_0014
15/03/08 21:11:29 INFO mapred.JobClient: Counters: 20
15/03/08 21:11:29 INFO mapred.JobClient:   Job Counters 
15/03/08 21:11:29 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=29556
15/03/08 21:11:29 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/03/08 21:11:29 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/03/08 21:11:29 INFO mapred.JobClient:     Launched map tasks=1
15/03/08 21:11:29 INFO mapred.JobClient:     Data-local map tasks=1
15/03/08 21:11:29 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/03/08 21:11:29 INFO mapred.JobClient:   File Output Format Counters 
15/03/08 21:11:29 INFO mapred.JobClient:     Bytes Written=1394868
15/03/08 21:11:29 INFO mapred.JobClient:   FileSystemCounters
15/03/08 21:11:29 INFO mapred.JobClient:     FILE_BYTES_READ=3676597
15/03/08 21:11:29 INFO mapred.JobClient:     HDFS_BYTES_READ=8434407
15/03/08 21:11:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59446
15/03/08 21:11:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1394868
15/03/08 21:11:29 INFO mapred.JobClient:   File Input Format Counters 
15/03/08 21:11:29 INFO mapred.JobClient:     Bytes Read=8434269
15/03/08 21:11:29 INFO mapred.JobClient:   Map-Reduce Framework
15/03/08 21:11:29 INFO mapred.JobClient:     Map input records=7405
15/03/08 21:11:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=85610496
15/03/08 21:11:29 INFO mapred.JobClient:     Spilled Records=0
15/03/08 21:11:29 INFO mapred.JobClient:     CPU time spent (ms)=17800
15/03/08 21:11:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=35471360
15/03/08 21:11:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=724860928
15/03/08 21:11:29 INFO mapred.JobClient:     Map output records=7405
15/03/08 21:11:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=138
15/03/08 21:11:31 INFO test.TestNaiveBayesDriver: Complementary Results: 
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       6615	   89.3315%
Incorrectly Classified Instances        :        790	   10.6685%
Total Classified Instances              :       7405

=======================================================
Confusion Matrix
-------------------------------------------------------
a    	b    	c    	d    	e    	f    	g    	h    	i    	j    	k    	l    	m    	n     o    	p    	q    	r    	s    	t    	<--Classified as
299  	0    	0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	0    	1     2    	4    	2    	1    	7    	0    	 |  317   	a     = alt.atheism
0    	303  	15   	12   	3    	21   	7    	2    	1    	0    	6    	9    	4    	1     3    	1    	1    	1    	1    	2    	 |  393   	b     = comp.graphics
1    	9    	290  	30   	6    	15   	6    	2    	0    	0    	2    	2    	3    	1     1    	0    	0    	0    	0    	1    	 |  369   	c     = comp.os.ms-windows.misc
1    	9    	18   	303  	7    	6    	12   	1    	2    	0    	3    	0    	9    	4     3    	0    	0    	2    	0    	0    	 |  380   	d     = comp.sys.ibm.pc.hardware
1    	5    	3    	6    	338  	5    	3    	2    	1    	3    	5    	2    	6    	2     6    	0    	0    	1    	0    	0    	 |  389   	e     = comp.sys.mac.hardware
1    	14   	2    	5    	1    	339  	2    	1    	1    	0    	1    	2    	0    	2     1    	0    	0    	0    	0    	0    	 |  372   	f     = comp.windows.x
0    	9    	7    	26   	14   	3    	282  	15   	2    	4    	3    	3    	14   	1     5    	2    	4    	2    	5    	0    	 |  401   	g     = misc.forsale
0    	1    	0    	0    	3    	0    	4    	375  	5    	1    	2    	0    	3    	0     1    	0    	0    	4    	0    	1    	 |  400   	h     = rec.autos
1    	0    	0    	0    	0    	0    	1    	3    	386  	0    	1    	0    	1    	1     1    	0    	0    	0    	0    	0    	 |  395   	i     = rec.motorcycles
0    	0    	1    	0    	0    	0    	0    	1    	0    	346  	12   	0    	2    	0     1    	1    	2    	0    	1    	0    	 |  367   	j     = rec.sport.baseball
0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	376  	0    	0    	0     0    	0    	0    	0    	0    	1    	 |  378   	k     = rec.sport.hockey
0    	0    	2    	0    	0    	2    	0    	1    	0    	0    	0    	360  	0    	2     1    	0    	0    	0    	0    	2    	 |  370   	l     = sci.crypt
0    	3    	2    	9    	4    	4    	6    	5    	7    	2    	2    	1    	337  	2     7    	4    	0    	0    	0    	1    	 |  396   	m     = sci.electronics
0    	0    	0    	1    	0    	1    	2    	2    	2    	0    	3    	2    	1    	365   1    	1    	3    	2    	1    	0    	 |  387   	n     = sci.med
0    	2    	1    	0    	0    	0    	3    	2    	0    	0    	0    	0    	2    	2     389  	0    	1    	1    	1    	1    	 |  405   	o     = sci.space
3    	0    	0    	2    	2    	0    	0    	1    	0    	0    	0    	0    	1    	1     1    	380  	3    	0    	1    	1    	 |  396   	p     = soc.religion.christian
0    	1    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0    	0     0    	1    	390  	0    	0    	0    	 |  392   	q     = talk.politics.mideast
1    	0    	0    	0    	0    	1    	0    	0    	2    	0    	0    	5    	0    	2     0    	1    	2    	324  	3    	6    	 |  347   	r     = talk.politics.guns
26   	0    	0    	0    	3    	0    	0    	1    	0    	0    	2    	0    	0    	2     3    	25   	2    	6    	173  	6    	 |  249   	s     = talk.religion.misc
0    	0    	1    	1    	0    	0    	1    	0    	0    	2    	2    	3    	0    	2     0    	0    	7    	20   	3    	260  	 |  302   	t     = talk.politics.misc

=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8602
Accuracy                                   89.3315%
Reliability                                84.7841%
Reliability (standard deviation)            0.2149

15/03/08 21:11:31 INFO driver.MahoutDriver: Program took 40182 ms (Minutes: 0.6697)
[grid@hadoop1 ~]$
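
The percentages in the log above can be rechecked from the raw counts. The script prints "random 80-20 split", but it passes `--randomSelectionPct 40`, and SplitInput reports `train: 11441, test: 7405`, so the holdout set is closer to 40% of the 18846 documents. The sketch below recomputes that share and the two accuracy figures from the testnb Summary blocks. It is only a sanity check under the assumption that `awk` is available; the `pct` helper is a hypothetical name and not part of Mahout.

```shell
#!/bin/sh
# Recompute the percentages reported in the log from the raw counts.
# All numbers are copied from the SplitInput and testnb Summary lines.

# pct PART TOTAL -> prints PART/TOTAL as a percentage with 4 decimals.
# (Hypothetical helper for illustration; not a Mahout command.)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.4f\n", 100 * part / total }'
}

train=11441    # "train: 11441" from utils.SplitInput
holdout=7405   # "test: 7405" from utils.SplitInput

echo "holdout share of documents: $(pct "$holdout" $((train + holdout)))%"
echo "accuracy on training set:   $(pct 11313 "$train")%"
echo "accuracy on holdout set:    $(pct 6615 "$holdout")%"
```

The last two values should match the `Accuracy` lines printed by testnb. The gap between them (about 98.9% on the training vectors versus about 89.3% on the holdout vectors) is the reason the example script tests on data the model has not seen.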



