[grid@hadoop1 ~]$ sh $MAHOUT_HOME/examples/bin/classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /tmp/mahout-work-grid
Enter your choice : 1
ok. You chose 1 and we'll use cnaivebayes
creating work directory at /tmp/mahout-work-grid
Downloading 20news-bydate
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.7M  100 13.7M    0     0  86594      0  0:02:47  0:02:47 --:--:--  122k
Extracting...
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-grid/20news-all
+ mkdir /tmp/mahout-work-grid/20news-all
+ cp -R /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/alt.atheism /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.graphics /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/comp.windows.x /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/misc.forsale /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.autos /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.motorcycles /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.baseball /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/rec.sport.hockey /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.crypt /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.electronics /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.med /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/sci.space /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/soc.religion.christian /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.guns /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.mideast /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.politics.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-test/talk.religion.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/alt.atheism /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.graphics /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/comp.windows.x /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/misc.forsale /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.autos /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.motorcycles /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.baseball /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/rec.sport.hockey /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.crypt /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.electronics /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.med /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/sci.space /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/soc.religion.christian /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.guns /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.mideast /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.politics.misc /tmp/mahout-work-grid/20news-bydate/20news-bydate-train/talk.religion.misc /tmp/mahout-work-grid/20news-all
+ '[' /home/grid/hadoop-1.2.1 '!=' '' ']'
+ '[' '' == '' ']'
+ echo 'Copying 20newsgroups data to HDFS'
Copying 20newsgroups data to HDFS
+ set +e
+ /home/grid/hadoop-1.2.1/bin/hadoop dfs -rmr /tmp/mahout-work-grid/20news-all
Warning: $HADOOP_HOME is deprecated.
Deleted hdfs://hadoop1:9000/tmp/mahout-work-grid/20news-all
+ set -e
+ /home/grid/hadoop-1.2.1/bin/hadoop dfs -put /tmp/mahout-work-grid/20news-all /tmp/mahout-work-grid/20news-all
Warning: $HADOOP_HOME is deprecated.
+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /tmp/mahout-work-grid/20news-all -o /tmp/mahout-work-grid/20news-seq -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
15/03/08 20:54:40 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/tmp/mahout-work-grid/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/tmp/mahout-work-grid/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
15/03/08 20:54:48 INFO input.FileInputFormat: Total input paths to process : 18846
15/03/08 20:54:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/03/08 20:54:50 WARN snappy.LoadSnappy: Snappy native library not loaded
15/03/08 20:55:36 INFO mapred.JobClient: Running job: job_201503081659_0001
15/03/08 20:55:37 INFO mapred.JobClient: map 0% reduce 0%
15/03/08 20:56:05 INFO mapred.JobClient: map 1% reduce 0%
15/03/08 20:56:08 INFO mapred.JobClient: map 2% reduce 0%
15/03/08 20:56:11 INFO mapred.JobClient: map 4% reduce 0%
15/03/08 20:56:14 INFO mapred.JobClient: map 6% reduce 0%
15/03/08 20:56:17 INFO mapred.JobClient: map 7% reduce 0%
15/03/08 20:56:20 INFO mapred.JobClient: map 8% reduce 0%
15/03/08 20:56:23 INFO mapred.JobClient: map 10% reduce 0%
15/03/08 20:56:26 INFO mapred.JobClient: map 11% reduce 0%
15/03/08 20:56:29 INFO mapred.JobClient: 
map 13% reduce 0% 15/03/08 20:56:32 INFO mapred.JobClient: map 17% reduce 0% 15/03/08 20:56:35 INFO mapred.JobClient: map 18% reduce 0% 15/03/08 20:56:38 INFO mapred.JobClient: map 19% reduce 0% 15/03/08 20:56:41 INFO mapred.JobClient: map 20% reduce 0% 15/03/08 20:56:44 INFO mapred.JobClient: map 22% reduce 0% 15/03/08 20:56:47 INFO mapred.JobClient: map 23% reduce 0% 15/03/08 20:56:51 INFO mapred.JobClient: map 26% reduce 0% 15/03/08 20:56:54 INFO mapred.JobClient: map 28% reduce 0% 15/03/08 20:56:57 INFO mapred.JobClient: map 29% reduce 0% 15/03/08 20:57:00 INFO mapred.JobClient: map 31% reduce 0% 15/03/08 20:57:03 INFO mapred.JobClient: map 32% reduce 0% 15/03/08 20:57:06 INFO mapred.JobClient: map 33% reduce 0% 15/03/08 20:57:09 INFO mapred.JobClient: map 35% reduce 0% 15/03/08 20:57:12 INFO mapred.JobClient: map 37% reduce 0% 15/03/08 20:57:15 INFO mapred.JobClient: map 38% reduce 0% 15/03/08 20:57:18 INFO mapred.JobClient: map 40% reduce 0% 15/03/08 20:57:21 INFO mapred.JobClient: map 41% reduce 0% 15/03/08 20:57:24 INFO mapred.JobClient: map 43% reduce 0% 15/03/08 20:57:27 INFO mapred.JobClient: map 45% reduce 0% 15/03/08 20:57:30 INFO mapred.JobClient: map 47% reduce 0% 15/03/08 20:57:33 INFO mapred.JobClient: map 50% reduce 0% 15/03/08 20:57:36 INFO mapred.JobClient: map 52% reduce 0% 15/03/08 20:57:39 INFO mapred.JobClient: map 54% reduce 0% 15/03/08 20:57:42 INFO mapred.JobClient: map 56% reduce 0% 15/03/08 20:57:46 INFO mapred.JobClient: map 57% reduce 0% 15/03/08 20:57:49 INFO mapred.JobClient: map 59% reduce 0% 15/03/08 20:57:52 INFO mapred.JobClient: map 62% reduce 0% 15/03/08 20:57:55 INFO mapred.JobClient: map 65% reduce 0% 15/03/08 20:57:58 INFO mapred.JobClient: map 67% reduce 0% 15/03/08 20:58:01 INFO mapred.JobClient: map 69% reduce 0% 15/03/08 20:58:04 INFO mapred.JobClient: map 72% reduce 0% 15/03/08 20:58:07 INFO mapred.JobClient: map 75% reduce 0% 15/03/08 20:58:10 INFO mapred.JobClient: map 78% reduce 0% 15/03/08 20:58:13 INFO 
mapred.JobClient: map 80% reduce 0% 15/03/08 20:58:16 INFO mapred.JobClient: map 84% reduce 0% 15/03/08 20:58:19 INFO mapred.JobClient: map 87% reduce 0% 15/03/08 20:58:22 INFO mapred.JobClient: map 91% reduce 0% 15/03/08 20:58:25 INFO mapred.JobClient: map 94% reduce 0% 15/03/08 20:58:28 INFO mapred.JobClient: map 97% reduce 0% 15/03/08 20:58:32 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 20:58:36 INFO mapred.JobClient: Job complete: job_201503081659_0001 15/03/08 20:58:36 INFO mapred.JobClient: Counters: 18 15/03/08 20:58:36 INFO mapred.JobClient: Job Counters 15/03/08 20:58:36 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=171122 15/03/08 20:58:36 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 20:58:36 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 20:58:36 INFO mapred.JobClient: Launched map tasks=1 15/03/08 20:58:36 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 15/03/08 20:58:36 INFO mapred.JobClient: File Output Format Counters 15/03/08 20:58:36 INFO mapred.JobClient: Bytes Written=19202391 15/03/08 20:58:36 INFO mapred.JobClient: FileSystemCounters 15/03/08 20:58:36 INFO mapred.JobClient: HDFS_BYTES_READ=37565643 15/03/08 20:58:36 INFO mapred.JobClient: FILE_BYTES_WRITTEN=60041 15/03/08 20:58:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=19202391 15/03/08 20:58:36 INFO mapred.JobClient: File Input Format Counters 15/03/08 20:58:36 INFO mapred.JobClient: Bytes Read=0 15/03/08 20:58:36 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 20:58:36 INFO mapred.JobClient: Map input records=18846 15/03/08 20:58:36 INFO mapred.JobClient: Physical memory (bytes) snapshot=113790976 15/03/08 20:58:36 INFO mapred.JobClient: Spilled Records=0 15/03/08 20:58:36 INFO mapred.JobClient: CPU time spent (ms)=109350 15/03/08 20:58:36 INFO mapred.JobClient: Total committed heap usage (bytes)=46481408 15/03/08 20:58:36 INFO mapred.JobClient: Virtual memory (bytes) 
snapshot=724959232 15/03/08 20:58:36 INFO mapred.JobClient: Map output records=18846 15/03/08 20:58:36 INFO mapred.JobClient: SPLIT_RAW_BYTES=1710640 15/03/08 20:58:36 INFO driver.MahoutDriver: Program took 237312 ms (Minutes: 3.9552) + echo 'Converting sequence files to vectors' Converting sequence files to vectors + ./bin/mahout seq2sparse -i /tmp/mahout-work-grid/20news-seq -o /tmp/mahout-work-grid/20news-vectors -lnorm -nv -wt tfidf MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar Warning: $HADOOP_HOME is deprecated. 15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1 15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0 15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1 15/03/08 20:58:44 INFO vectorizer.SparseVectorsFromSequenceFiles: Tokenizing documents in /tmp/mahout-work-grid/20news-seq 15/03/08 20:58:47 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 20:58:49 INFO mapred.JobClient: Running job: job_201503081659_0002 15/03/08 20:58:50 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 20:59:11 INFO mapred.JobClient: map 46% reduce 0% 15/03/08 20:59:14 INFO mapred.JobClient: map 97% reduce 0% 15/03/08 20:59:19 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 20:59:24 INFO mapred.JobClient: Job complete: job_201503081659_0002 15/03/08 20:59:24 INFO mapred.JobClient: Counters: 19 15/03/08 20:59:24 INFO mapred.JobClient: Job Counters 15/03/08 20:59:24 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29191 15/03/08 20:59:24 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 20:59:24 INFO mapred.JobClient: Total time spent by all maps waiting after 
reserving slots (ms)=0 15/03/08 20:59:24 INFO mapred.JobClient: Launched map tasks=1 15/03/08 20:59:24 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 20:59:24 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 15/03/08 20:59:24 INFO mapred.JobClient: File Output Format Counters 15/03/08 20:59:24 INFO mapred.JobClient: Bytes Written=27503580 15/03/08 20:59:24 INFO mapred.JobClient: FileSystemCounters 15/03/08 20:59:24 INFO mapred.JobClient: HDFS_BYTES_READ=19202520 15/03/08 20:59:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58026 15/03/08 20:59:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27503580 15/03/08 20:59:24 INFO mapred.JobClient: File Input Format Counters 15/03/08 20:59:24 INFO mapred.JobClient: Bytes Read=19202391 15/03/08 20:59:24 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 20:59:24 INFO mapred.JobClient: Map input records=18846 15/03/08 20:59:24 INFO mapred.JobClient: Physical memory (bytes) snapshot=84549632 15/03/08 20:59:24 INFO mapred.JobClient: Spilled Records=0 15/03/08 20:59:24 INFO mapred.JobClient: CPU time spent (ms)=15470 15/03/08 20:59:24 INFO mapred.JobClient: Total committed heap usage (bytes)=23855104 15/03/08 20:59:24 INFO mapred.JobClient: Virtual memory (bytes) snapshot=724852736 15/03/08 20:59:24 INFO mapred.JobClient: Map output records=18846 15/03/08 20:59:24 INFO mapred.JobClient: SPLIT_RAW_BYTES=129 15/03/08 20:59:24 INFO vectorizer.SparseVectorsFromSequenceFiles: Creating Term Frequency Vectors 15/03/08 20:59:24 INFO vectorizer.DictionaryVectorizer: Creating dictionary from /tmp/mahout-work-grid/20news-vectors/tokenized-documents and saving at /tmp/mahout-work-grid/20news-vectors/wordcount 15/03/08 20:59:26 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 20:59:27 INFO mapred.JobClient: Running job: job_201503081659_0003 15/03/08 20:59:28 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:00:00 INFO mapred.JobClient: map 32% reduce 0% 15/03/08 21:00:03 INFO mapred.JobClient: map 58% reduce 0% 
15/03/08 21:00:06 INFO mapred.JobClient: map 90% reduce 0% 15/03/08 21:00:08 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:00:23 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:00:27 INFO mapred.JobClient: Job complete: job_201503081659_0003 15/03/08 21:00:27 INFO mapred.JobClient: Counters: 29 15/03/08 21:00:27 INFO mapred.JobClient: Job Counters 15/03/08 21:00:27 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:00:27 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=37260 15/03/08 21:00:27 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:00:27 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:00:27 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:00:27 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:00:27 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=14824 15/03/08 21:00:27 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:00:27 INFO mapred.JobClient: Bytes Written=2315037 15/03/08 21:00:27 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:00:27 INFO mapred.JobClient: FILE_BYTES_READ=11857906 15/03/08 21:00:27 INFO mapred.JobClient: HDFS_BYTES_READ=27503733 15/03/08 21:00:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15513248 15/03/08 21:00:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2315037 15/03/08 21:00:27 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:00:27 INFO mapred.JobClient: Bytes Read=27503580 15/03/08 21:00:27 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:00:27 INFO mapred.JobClient: Map output materialized bytes=3538084 15/03/08 21:00:27 INFO mapred.JobClient: Map input records=18846 15/03/08 21:00:27 INFO mapred.JobClient: Reduce shuffle bytes=3538084 15/03/08 21:00:27 INFO mapred.JobClient: Spilled Records=849345 15/03/08 21:00:27 INFO mapred.JobClient: Map output bytes=39462740 15/03/08 21:00:27 INFO mapred.JobClient: Total committed heap usage (bytes)=147394560 15/03/08 
21:00:27 INFO mapred.JobClient: CPU time spent (ms)=26820 15/03/08 21:00:27 INFO mapred.JobClient: Combine input records=3026242 15/03/08 21:00:27 INFO mapred.JobClient: SPLIT_RAW_BYTES=153 15/03/08 21:00:27 INFO mapred.JobClient: Reduce input records=192904 15/03/08 21:00:27 INFO mapred.JobClient: Reduce input groups=192904 15/03/08 21:00:27 INFO mapred.JobClient: Combine output records=554873 15/03/08 21:00:27 INFO mapred.JobClient: Physical memory (bytes) snapshot=278233088 15/03/08 21:00:27 INFO mapred.JobClient: Reduce output records=93563 15/03/08 21:00:27 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1456336896 15/03/08 21:00:27 INFO mapred.JobClient: Map output records=2664273 15/03/08 21:00:32 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:00:33 INFO mapred.JobClient: Running job: job_201503081659_0004 15/03/08 21:00:34 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:01:09 INFO mapred.JobClient: map 31% reduce 0% 15/03/08 21:01:12 INFO mapred.JobClient: map 72% reduce 0% 15/03/08 21:01:15 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:01:33 INFO mapred.JobClient: map 100% reduce 67% 15/03/08 21:01:36 INFO mapred.JobClient: map 100% reduce 79% 15/03/08 21:01:39 INFO mapred.JobClient: map 100% reduce 97% 15/03/08 21:01:40 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:01:46 INFO mapred.JobClient: Job complete: job_201503081659_0004 15/03/08 21:01:46 INFO mapred.JobClient: Counters: 29 15/03/08 21:01:46 INFO mapred.JobClient: Job Counters 15/03/08 21:01:46 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:01:46 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=37702 15/03/08 21:01:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:01:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:01:46 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:01:46 INFO mapred.JobClient: Data-local map tasks=1 
15/03/08 21:01:46 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=25246 15/03/08 21:01:46 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:01:46 INFO mapred.JobClient: Bytes Written=29314118 15/03/08 21:01:46 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:01:46 INFO mapred.JobClient: FILE_BYTES_READ=29226519 15/03/08 21:01:46 INFO mapred.JobClient: HDFS_BYTES_READ=27503733 15/03/08 21:01:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54669982 15/03/08 21:01:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118 15/03/08 21:01:46 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:01:46 INFO mapred.JobClient: Bytes Read=27503580 15/03/08 21:01:46 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:01:46 INFO mapred.JobClient: Map output materialized bytes=27274291 15/03/08 21:01:46 INFO mapred.JobClient: Map input records=18846 15/03/08 21:01:46 INFO mapred.JobClient: Reduce shuffle bytes=27274291 15/03/08 21:01:46 INFO mapred.JobClient: Spilled Records=37692 15/03/08 21:01:46 INFO mapred.JobClient: Map output bytes=27199343 15/03/08 21:01:46 INFO mapred.JobClient: Total committed heap usage (bytes)=174735360 15/03/08 21:01:46 INFO mapred.JobClient: CPU time spent (ms)=33560 15/03/08 21:01:46 INFO mapred.JobClient: Combine input records=0 15/03/08 21:01:46 INFO mapred.JobClient: SPLIT_RAW_BYTES=153 15/03/08 21:01:46 INFO mapred.JobClient: Reduce input records=18846 15/03/08 21:01:46 INFO mapred.JobClient: Reduce input groups=18846 15/03/08 21:01:46 INFO mapred.JobClient: Combine output records=0 15/03/08 21:01:46 INFO mapred.JobClient: Physical memory (bytes) snapshot=310116352 15/03/08 21:01:46 INFO mapred.JobClient: Reduce output records=18846 15/03/08 21:01:46 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1474854912 15/03/08 21:01:46 INFO mapred.JobClient: Map output records=18846 15/03/08 21:01:48 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:01:49 INFO mapred.JobClient: Running job: 
job_201503081659_0005 15/03/08 21:01:50 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:02:17 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:02:30 INFO mapred.JobClient: map 100% reduce 71% 15/03/08 21:02:33 INFO mapred.JobClient: map 100% reduce 99% 15/03/08 21:02:35 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:02:39 INFO mapred.JobClient: Job complete: job_201503081659_0005 15/03/08 21:02:39 INFO mapred.JobClient: Counters: 29 15/03/08 21:02:39 INFO mapred.JobClient: Job Counters 15/03/08 21:02:39 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:02:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=27104 15/03/08 21:02:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:02:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:02:39 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:02:39 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:02:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18088 15/03/08 21:02:39 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:02:39 INFO mapred.JobClient: Bytes Written=29314118 15/03/08 21:02:39 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:02:39 INFO mapred.JobClient: FILE_BYTES_READ=29059398 15/03/08 21:02:39 INFO mapred.JobClient: HDFS_BYTES_READ=29314269 15/03/08 21:02:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=58236992 15/03/08 21:02:39 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=29314118 15/03/08 21:02:39 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:02:39 INFO mapred.JobClient: Bytes Read=29314118 15/03/08 21:02:39 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:02:39 INFO mapred.JobClient: Map output materialized bytes=29059398 15/03/08 21:02:39 INFO mapred.JobClient: Map input records=18846 15/03/08 21:02:39 INFO mapred.JobClient: Reduce shuffle bytes=29059398 15/03/08 21:02:39 INFO mapred.JobClient: Spilled Records=37692 15/03/08 
21:02:39 INFO mapred.JobClient: Map output bytes=28984080 15/03/08 21:02:39 INFO mapred.JobClient: Total committed heap usage (bytes)=176521216 15/03/08 21:02:39 INFO mapred.JobClient: CPU time spent (ms)=21310 15/03/08 21:02:39 INFO mapred.JobClient: Combine input records=0 15/03/08 21:02:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=151 15/03/08 21:02:39 INFO mapred.JobClient: Reduce input records=18846 15/03/08 21:02:39 INFO mapred.JobClient: Reduce input groups=18846 15/03/08 21:02:39 INFO mapred.JobClient: Combine output records=0 15/03/08 21:02:39 INFO mapred.JobClient: Physical memory (bytes) snapshot=305786880 15/03/08 21:02:39 INFO mapred.JobClient: Reduce output records=18846 15/03/08 21:02:39 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1476845568 15/03/08 21:02:39 INFO mapred.JobClient: Map output records=18846 15/03/08 21:02:39 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/partial-vectors-0 15/03/08 21:02:39 INFO vectorizer.SparseVectorsFromSequenceFiles: Calculating IDF 15/03/08 21:02:42 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:02:43 INFO mapred.JobClient: Running job: job_201503081659_0006 15/03/08 21:02:44 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:03:08 INFO mapred.JobClient: map 47% reduce 0% 15/03/08 21:03:11 INFO mapred.JobClient: map 88% reduce 0% 15/03/08 21:03:13 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:03:23 INFO mapred.JobClient: map 100% reduce 33% 15/03/08 21:03:27 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:03:31 INFO mapred.JobClient: Job complete: job_201503081659_0006 15/03/08 21:03:31 INFO mapred.JobClient: Counters: 29 15/03/08 21:03:31 INFO mapred.JobClient: Job Counters 15/03/08 21:03:31 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:03:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29689 15/03/08 21:03:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:03:31 INFO 
mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:03:31 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:03:31 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:03:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13909 15/03/08 21:03:31 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:03:31 INFO mapred.JobClient: Bytes Written=1890073 15/03/08 21:03:31 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:03:31 INFO mapred.JobClient: FILE_BYTES_READ=4880830 15/03/08 21:03:31 INFO mapred.JobClient: HDFS_BYTES_READ=29314270 15/03/08 21:03:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6307710 15/03/08 21:03:31 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1890073 15/03/08 21:03:31 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:03:31 INFO mapred.JobClient: Bytes Read=29314118 15/03/08 21:03:31 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:03:31 INFO mapred.JobClient: Map output materialized bytes=1309902 15/03/08 21:03:31 INFO mapred.JobClient: Map input records=18846 15/03/08 21:03:31 INFO mapred.JobClient: Reduce shuffle bytes=1309902 15/03/08 21:03:31 INFO mapred.JobClient: Spilled Records=442190 15/03/08 21:03:31 INFO mapred.JobClient: Map output bytes=31005336 15/03/08 21:03:31 INFO mapred.JobClient: Total committed heap usage (bytes)=147394560 15/03/08 21:03:31 INFO mapred.JobClient: CPU time spent (ms)=23130 15/03/08 21:03:31 INFO mapred.JobClient: Combine input records=2838840 15/03/08 21:03:31 INFO mapred.JobClient: SPLIT_RAW_BYTES=152 15/03/08 21:03:31 INFO mapred.JobClient: Reduce input records=93564 15/03/08 21:03:31 INFO mapred.JobClient: Reduce input groups=93564 15/03/08 21:03:31 INFO mapred.JobClient: Combine output records=348626 15/03/08 21:03:31 INFO mapred.JobClient: Physical memory (bytes) snapshot=281788416 15/03/08 21:03:31 INFO mapred.JobClient: Reduce output records=93564 15/03/08 21:03:31 INFO mapred.JobClient: Virtual memory (bytes) 
snapshot=1457623040 15/03/08 21:03:31 INFO mapred.JobClient: Map output records=2583778 15/03/08 21:03:32 INFO vectorizer.SparseVectorsFromSequenceFiles: Pruning 15/03/08 21:03:33 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:03:34 INFO mapred.JobClient: Running job: job_201503081659_0007 15/03/08 21:03:35 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:03:58 INFO mapred.JobClient: map 40% reduce 0% 15/03/08 21:04:01 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:04:26 INFO mapred.JobClient: map 100% reduce 33% 15/03/08 21:04:30 INFO mapred.JobClient: map 100% reduce 66% 15/03/08 21:04:33 INFO mapred.JobClient: map 100% reduce 69% 15/03/08 21:04:36 INFO mapred.JobClient: map 100% reduce 73% 15/03/08 21:04:38 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:04:45 INFO mapred.JobClient: Job complete: job_201503081659_0007 15/03/08 21:04:45 INFO mapred.JobClient: Counters: 29 15/03/08 21:04:45 INFO mapred.JobClient: Job Counters 15/03/08 21:04:45 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:04:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33546 15/03/08 21:04:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:04:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:04:45 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:04:45 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:04:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=32143 15/03/08 21:04:45 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:04:45 INFO mapred.JobClient: Bytes Written=28689283 15/03/08 21:04:45 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:04:45 INFO mapred.JobClient: FILE_BYTES_READ=9646422 15/03/08 21:04:45 INFO mapred.JobClient: HDFS_BYTES_READ=29314270 15/03/08 21:04:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=15602818 15/03/08 21:04:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283 15/03/08 
21:04:45 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:04:45 INFO mapred.JobClient: Bytes Read=29314118 15/03/08 21:04:45 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:04:45 INFO mapred.JobClient: Map output materialized bytes=7741585 15/03/08 21:04:45 INFO mapred.JobClient: Map input records=18846 15/03/08 21:04:45 INFO mapred.JobClient: Reduce shuffle bytes=7741585 15/03/08 21:04:45 INFO mapred.JobClient: Spilled Records=37692 15/03/08 21:04:45 INFO mapred.JobClient: Map output bytes=28984080 15/03/08 21:04:45 INFO mapred.JobClient: Total committed heap usage (bytes)=176521216 15/03/08 21:04:45 INFO mapred.JobClient: CPU time spent (ms)=49000 15/03/08 21:04:45 INFO mapred.JobClient: Combine input records=0 15/03/08 21:04:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=152 15/03/08 21:04:45 INFO mapred.JobClient: Reduce input records=18846 15/03/08 21:04:45 INFO mapred.JobClient: Reduce input groups=18846 15/03/08 21:04:45 INFO mapred.JobClient: Combine output records=0 15/03/08 21:04:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=305831936 15/03/08 21:04:45 INFO mapred.JobClient: Reduce output records=18846 15/03/08 21:04:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1458589696 15/03/08 21:04:45 INFO mapred.JobClient: Map output records=18846 15/03/08 21:04:47 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:04:48 INFO mapred.JobClient: Running job: job_201503081659_0008 15/03/08 21:04:49 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:05:18 INFO mapred.JobClient: map 92% reduce 0% 15/03/08 21:05:20 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:05:40 INFO mapred.JobClient: map 100% reduce 71% 15/03/08 21:05:44 INFO mapred.JobClient: map 100% reduce 87% 15/03/08 21:05:47 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:05:55 INFO mapred.JobClient: Job complete: job_201503081659_0008 15/03/08 21:05:55 INFO mapred.JobClient: Counters: 29 15/03/08 21:05:55 INFO mapred.JobClient: 
Job Counters 15/03/08 21:05:55 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:05:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=35779 15/03/08 21:05:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:05:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:05:55 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:05:55 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:05:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=26799 15/03/08 21:05:55 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:05:55 INFO mapred.JobClient: Bytes Written=28689283 15/03/08 21:05:55 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:05:55 INFO mapred.JobClient: FILE_BYTES_READ=28437750 15/03/08 21:05:55 INFO mapred.JobClient: HDFS_BYTES_READ=28689445 15/03/08 21:05:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56992598 15/03/08 21:05:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283 15/03/08 21:05:55 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:05:55 INFO mapred.JobClient: Bytes Read=28689283 15/03/08 21:05:55 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:05:55 INFO mapred.JobClient: Map output materialized bytes=28437750 15/03/08 21:05:55 INFO mapred.JobClient: Map input records=18846 15/03/08 21:05:55 INFO mapred.JobClient: Reduce shuffle bytes=28437750 15/03/08 21:05:55 INFO mapred.JobClient: Spilled Records=37692 15/03/08 21:05:55 INFO mapred.JobClient: Map output bytes=28362505 15/03/08 21:05:55 INFO mapred.JobClient: Total committed heap usage (bytes)=175898624 15/03/08 21:05:55 INFO mapred.JobClient: CPU time spent (ms)=34270 15/03/08 21:05:55 INFO mapred.JobClient: Combine input records=0 15/03/08 21:05:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=162 15/03/08 21:05:55 INFO mapred.JobClient: Reduce input records=18846 15/03/08 21:05:55 INFO mapred.JobClient: Reduce input groups=18846 15/03/08 21:05:55 INFO 
mapred.JobClient: Combine output records=0 15/03/08 21:05:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=290082816 15/03/08 21:05:55 INFO mapred.JobClient: Reduce output records=18846 15/03/08 21:05:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1465872384 15/03/08 21:05:55 INFO mapred.JobClient: Map output records=18846 15/03/08 21:05:55 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/tf-vectors-partial 15/03/08 21:05:55 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/tf-vectors-toprune 15/03/08 21:05:58 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:05:59 INFO mapred.JobClient: Running job: job_201503081659_0009 15/03/08 21:06:00 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:06:24 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:06:37 INFO mapred.JobClient: map 100% reduce 68% 15/03/08 21:06:40 INFO mapred.JobClient: map 100% reduce 87% 15/03/08 21:06:43 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:06:47 INFO mapred.JobClient: Job complete: job_201503081659_0009 15/03/08 21:06:47 INFO mapred.JobClient: Counters: 29 15/03/08 21:06:47 INFO mapred.JobClient: Job Counters 15/03/08 21:06:47 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:06:47 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21911 15/03/08 21:06:47 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:06:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:06:48 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:06:48 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:06:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=19167 15/03/08 21:06:48 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:06:48 INFO mapred.JobClient: Bytes Written=28689283 15/03/08 21:06:48 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:06:48 INFO mapred.JobClient: 
FILE_BYTES_READ=30342579 15/03/08 21:06:48 INFO mapred.JobClient: HDFS_BYTES_READ=28689427 15/03/08 21:06:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56996636 15/03/08 21:06:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283 15/03/08 21:06:48 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:06:48 INFO mapred.JobClient: Bytes Read=28689283 15/03/08 21:06:48 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:06:48 INFO mapred.JobClient: Map output materialized bytes=28437750 15/03/08 21:06:48 INFO mapred.JobClient: Map input records=18846 15/03/08 21:06:48 INFO mapred.JobClient: Reduce shuffle bytes=28437750 15/03/08 21:06:48 INFO mapred.JobClient: Spilled Records=37692 15/03/08 21:06:48 INFO mapred.JobClient: Map output bytes=28362505 15/03/08 21:06:48 INFO mapred.JobClient: Total committed heap usage (bytes)=175898624 15/03/08 21:06:48 INFO mapred.JobClient: CPU time spent (ms)=23140 15/03/08 21:06:48 INFO mapred.JobClient: Combine input records=0 15/03/08 21:06:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=144 15/03/08 21:06:48 INFO mapred.JobClient: Reduce input records=18846 15/03/08 21:06:48 INFO mapred.JobClient: Reduce input groups=18846 15/03/08 21:06:48 INFO mapred.JobClient: Combine output records=0 15/03/08 21:06:48 INFO mapred.JobClient: Physical memory (bytes) snapshot=308453376 15/03/08 21:06:48 INFO mapred.JobClient: Reduce output records=18846 15/03/08 21:06:48 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1454231552 15/03/08 21:06:48 INFO mapred.JobClient: Map output records=18846 15/03/08 21:06:49 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:06:50 INFO mapred.JobClient: Running job: job_201503081659_0010 15/03/08 21:06:51 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:07:12 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:07:25 INFO mapred.JobClient: map 100% reduce 71% 15/03/08 21:07:28 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:07:33 INFO mapred.JobClient: Job complete: 
job_201503081659_0010 15/03/08 21:07:33 INFO mapred.JobClient: Counters: 29 15/03/08 21:07:33 INFO mapred.JobClient: Job Counters 15/03/08 21:07:33 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:07:33 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=20931 15/03/08 21:07:33 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:07:33 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:07:33 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:07:33 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:07:33 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=17614 15/03/08 21:07:33 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:07:33 INFO mapred.JobClient: Bytes Written=28689283 15/03/08 21:07:33 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:07:33 INFO mapred.JobClient: FILE_BYTES_READ=28437750 15/03/08 21:07:33 INFO mapred.JobClient: HDFS_BYTES_READ=28689434 15/03/08 21:07:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=56993684 15/03/08 21:07:33 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=28689283 15/03/08 21:07:33 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:07:33 INFO mapred.JobClient: Bytes Read=28689283 15/03/08 21:07:33 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:07:33 INFO mapred.JobClient: Map output materialized bytes=28437750 15/03/08 21:07:33 INFO mapred.JobClient: Map input records=18846 15/03/08 21:07:33 INFO mapred.JobClient: Reduce shuffle bytes=28437750 15/03/08 21:07:33 INFO mapred.JobClient: Spilled Records=37692 15/03/08 21:07:33 INFO mapred.JobClient: Map output bytes=28362505 15/03/08 21:07:33 INFO mapred.JobClient: Total committed heap usage (bytes)=175898624 15/03/08 21:07:33 INFO mapred.JobClient: CPU time spent (ms)=20320 15/03/08 21:07:33 INFO mapred.JobClient: Combine input records=0 15/03/08 21:07:33 INFO mapred.JobClient: SPLIT_RAW_BYTES=151 15/03/08 21:07:33 INFO mapred.JobClient: Reduce 
input records=18846 15/03/08 21:07:33 INFO mapred.JobClient: Reduce input groups=18846 15/03/08 21:07:33 INFO mapred.JobClient: Combine output records=0 15/03/08 21:07:33 INFO mapred.JobClient: Physical memory (bytes) snapshot=298102784 15/03/08 21:07:33 INFO mapred.JobClient: Reduce output records=18846 15/03/08 21:07:33 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1454256128 15/03/08 21:07:33 INFO mapred.JobClient: Map output records=18846 15/03/08 21:07:33 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-vectors/partial-vectors-0 15/03/08 21:07:33 INFO driver.MahoutDriver: Program took 529754 ms (Minutes: 8.829233333333333) + echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset' Creating training and holdout set with a random 80-20 split of the generated vector dataset + ./bin/mahout split -i /tmp/mahout-work-grid/20news-vectors/tfidf-vectors --trainingOutput /tmp/mahout-work-grid/20news-train-vectors --testOutput /tmp/mahout-work-grid/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar Warning: $HADOOP_HOME is deprecated. 
15/03/08 21:07:40 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only 15/03/08 21:07:41 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-grid/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/tmp/mahout-work-grid/20news-test-vectors], --trainingOutput=[/tmp/mahout-work-grid/20news-train-vectors]} 15/03/08 21:07:46 INFO utils.SplitInput: part-r-00000 has 162419 lines 15/03/08 21:07:46 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40 15/03/08 21:07:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library 15/03/08 21:07:46 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 15/03/08 21:07:46 INFO compress.CodecPool: Got brand-new compressor 15/03/08 21:07:46 INFO compress.CodecPool: Got brand-new compressor 15/03/08 21:08:01 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11441, test: 7405 starting at 0 15/03/08 21:08:01 INFO driver.MahoutDriver: Program took 20712 ms (Minutes: 0.3452) + echo 'Training Naive Bayes model' Training Naive Bayes model + ./bin/mahout trainnb -i /tmp/mahout-work-grid/20news-train-vectors -el -o /tmp/mahout-work-grid/model -li /tmp/mahout-work-grid/labelindex -ow -c MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar Warning: $HADOOP_HOME is deprecated. 
15/03/08 21:08:13 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only 15/03/08 21:08:14 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/tmp/mahout-work-grid/20news-train-vectors], --labelIndex=[/tmp/mahout-work-grid/labelindex], --output=[/tmp/mahout-work-grid/model], --overwrite=null, --startPhase=[0], --tempDir=[temp], --trainComplementary=null} 15/03/08 21:08:16 INFO util.NativeCodeLoader: Loaded the native-hadoop library 15/03/08 21:08:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 15/03/08 21:08:17 INFO compress.CodecPool: Got brand-new decompressor 15/03/08 21:08:26 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:08:28 INFO mapred.JobClient: Running job: job_201503081659_0011 15/03/08 21:08:29 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:08:54 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:09:09 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:09:13 INFO mapred.JobClient: Job complete: job_201503081659_0011 15/03/08 21:09:14 INFO mapred.JobClient: Counters: 29 15/03/08 21:09:14 INFO mapred.JobClient: Job Counters 15/03/08 21:09:14 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:09:14 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24934 15/03/08 21:09:14 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:09:14 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:09:14 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:09:14 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:09:14 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13530 15/03/08 21:09:14 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:09:14 INFO mapred.JobClient: Bytes Written=2757285 15/03/08 21:09:14 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:09:14 INFO 
mapred.JobClient: FILE_BYTES_READ=1527193 15/03/08 21:09:14 INFO mapred.JobClient: HDFS_BYTES_READ=12920362 15/03/08 21:09:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3173016 15/03/08 21:09:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2757285 15/03/08 21:09:14 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:09:14 INFO mapred.JobClient: Bytes Read=12920223 15/03/08 21:09:14 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:09:14 INFO mapred.JobClient: Map output materialized bytes=1526511 15/03/08 21:09:14 INFO mapred.JobClient: Map input records=11441 15/03/08 21:09:14 INFO mapred.JobClient: Reduce shuffle bytes=1526511 15/03/08 21:09:14 INFO mapred.JobClient: Spilled Records=40 15/03/08 21:09:14 INFO mapred.JobClient: Map output bytes=16896230 15/03/08 21:09:14 INFO mapred.JobClient: Total committed heap usage (bytes)=147394560 15/03/08 21:09:14 INFO mapred.JobClient: CPU time spent (ms)=16930 15/03/08 21:09:14 INFO mapred.JobClient: Combine input records=11441 15/03/08 21:09:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=139 15/03/08 21:09:14 INFO mapred.JobClient: Reduce input records=20 15/03/08 21:09:14 INFO mapred.JobClient: Reduce input groups=20 15/03/08 21:09:14 INFO mapred.JobClient: Combine output records=20 15/03/08 21:09:14 INFO mapred.JobClient: Physical memory (bytes) snapshot=275156992 15/03/08 21:09:14 INFO mapred.JobClient: Reduce output records=20 15/03/08 21:09:14 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1458122752 15/03/08 21:09:14 INFO mapred.JobClient: Map output records=11441 15/03/08 21:09:15 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:09:16 INFO mapred.JobClient: Running job: job_201503081659_0012 15/03/08 21:09:17 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:09:32 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:09:42 INFO mapred.JobClient: map 100% reduce 33% 15/03/08 21:09:45 INFO mapred.JobClient: map 100% reduce 100% 15/03/08 21:09:49 INFO mapred.JobClient: Job 
complete: job_201503081659_0012 15/03/08 21:09:49 INFO mapred.JobClient: Counters: 29 15/03/08 21:09:49 INFO mapred.JobClient: Job Counters 15/03/08 21:09:49 INFO mapred.JobClient: Launched reduce tasks=1 15/03/08 21:09:49 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=15773 15/03/08 21:09:49 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:09:49 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:09:49 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:09:49 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:09:49 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=12833 15/03/08 21:09:49 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:09:49 INFO mapred.JobClient: Bytes Written=891489 15/03/08 21:09:49 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:09:49 INFO mapred.JobClient: FILE_BYTES_READ=441934 15/03/08 21:09:49 INFO mapred.JobClient: HDFS_BYTES_READ=2757416 15/03/08 21:09:49 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1004208 15/03/08 21:09:49 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=891489 15/03/08 21:09:49 INFO mapred.JobClient: File Input Format Counters 15/03/08 21:09:49 INFO mapred.JobClient: Bytes Read=2757285 15/03/08 21:09:49 INFO mapred.JobClient: Map-Reduce Framework 15/03/08 21:09:49 INFO mapred.JobClient: Map output materialized bytes=441926 15/03/08 21:09:49 INFO mapred.JobClient: Map input records=20 15/03/08 21:09:49 INFO mapred.JobClient: Reduce shuffle bytes=441926 15/03/08 21:09:49 INFO mapred.JobClient: Spilled Records=4 15/03/08 21:09:49 INFO mapred.JobClient: Map output bytes=891363 15/03/08 21:09:49 INFO mapred.JobClient: Total committed heap usage (bytes)=226623488 15/03/08 21:09:49 INFO mapred.JobClient: CPU time spent (ms)=8220 15/03/08 21:09:49 INFO mapred.JobClient: Combine input records=2 15/03/08 21:09:49 INFO mapred.JobClient: SPLIT_RAW_BYTES=131 15/03/08 21:09:49 INFO mapred.JobClient: Reduce input 
records=2 15/03/08 21:09:49 INFO mapred.JobClient: Reduce input groups=2 15/03/08 21:09:49 INFO mapred.JobClient: Combine output records=2 15/03/08 21:09:49 INFO mapred.JobClient: Physical memory (bytes) snapshot=293044224 15/03/08 21:09:49 INFO mapred.JobClient: Reduce output records=2 15/03/08 21:09:49 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1475592192 15/03/08 21:09:49 INFO mapred.JobClient: Map output records=2 15/03/08 21:09:51 INFO driver.MahoutDriver: Program took 97669 ms (Minutes: 1.6278333333333332) + echo 'Self testing on training set' Self testing on training set + ./bin/mahout testnb -i /tmp/mahout-work-grid/20news-train-vectors -m /tmp/mahout-work-grid/model -l /tmp/mahout-work-grid/labelindex -ow -o /tmp/mahout-work-grid/20news-testing -c MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Warning: $HADOOP_HOME is deprecated. Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar Warning: $HADOOP_HOME is deprecated. 
15/03/08 21:09:57 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only 15/03/08 21:09:58 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-grid/20news-train-vectors], --labelIndex=[/tmp/mahout-work-grid/labelindex], --model=[/tmp/mahout-work-grid/model], --output=[/tmp/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp], --testComplementary=null} 15/03/08 21:10:01 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:10:02 INFO mapred.JobClient: Running job: job_201503081659_0013 15/03/08 21:10:03 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:10:23 INFO mapred.JobClient: map 20% reduce 0% 15/03/08 21:10:26 INFO mapred.JobClient: map 36% reduce 0% 15/03/08 21:10:29 INFO mapred.JobClient: map 54% reduce 0% 15/03/08 21:10:32 INFO mapred.JobClient: map 71% reduce 0% 15/03/08 21:10:35 INFO mapred.JobClient: map 89% reduce 0% 15/03/08 21:10:38 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:10:42 INFO mapred.JobClient: Job complete: job_201503081659_0013 15/03/08 21:10:42 INFO mapred.JobClient: Counters: 20 15/03/08 21:10:42 INFO mapred.JobClient: Job Counters 15/03/08 21:10:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=34759 15/03/08 21:10:42 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:10:42 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:10:42 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:10:42 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:10:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 15/03/08 21:10:42 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:10:42 INFO mapred.JobClient: Bytes Written=2155560 15/03/08 21:10:42 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:10:42 INFO mapred.JobClient: FILE_BYTES_READ=3676597 15/03/08 21:10:42 INFO 
mapred.JobClient: HDFS_BYTES_READ=12920362
15/03/08 21:10:42 INFO mapred.JobClient: FILE_BYTES_WRITTEN=59447
15/03/08 21:10:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=2155560
15/03/08 21:10:42 INFO mapred.JobClient: File Input Format Counters
15/03/08 21:10:42 INFO mapred.JobClient: Bytes Read=12920223
15/03/08 21:10:42 INFO mapred.JobClient: Map-Reduce Framework
15/03/08 21:10:42 INFO mapred.JobClient: Map input records=11441
15/03/08 21:10:42 INFO mapred.JobClient: Physical memory (bytes) snapshot=99229696
15/03/08 21:10:42 INFO mapred.JobClient: Spilled Records=0
15/03/08 21:10:42 INFO mapred.JobClient: CPU time spent (ms)=25750
15/03/08 21:10:42 INFO mapred.JobClient: Total committed heap usage (bytes)=35360768
15/03/08 21:10:42 INFO mapred.JobClient: Virtual memory (bytes) snapshot=724860928
15/03/08 21:10:42 INFO mapred.JobClient: Map output records=11441
15/03/08 21:10:42 INFO mapred.JobClient: SPLIT_RAW_BYTES=139
15/03/08 21:10:44 INFO test.TestNaiveBayesDriver: Complementary Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances   : 11313   98.8812%
Incorrectly Classified Instances :   128    1.1188%
Total Classified Instances       : 11441
=======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t <--Classified as
 480   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   1   0 | 482 a = alt.atheism
   0 568   2   1   1   4   1   0   0   0   1   1   0   0   1   0   0   0   0   0 | 580 b = comp.graphics
   1   2 608   3   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0 | 616 c = comp.os.ms-windows.misc
   1   0   3 596   0   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0 | 602 d = comp.sys.ibm.pc.hardware
   0   1   2   2 565   0   1   0   0   0   0   0   1   0   0   0   1   1   0   0 | 574 e = comp.sys.mac.hardware
   0   1   1   0   0 610   1   0   0   0   0   0   0   0   2   0   1   0   0   0 | 616 f = comp.windows.x
   0   0   0   5   2   0 552   4   2   0   2   1   3   2   0   0   1   0   0   0 | 574 g = misc.forsale
   0   0   0   1   0   0   1 586   1   0   0   1   0   0   0   0   0   0   0   0 | 590 h = rec.autos
   0   0   0   0   0   0   0   0 601   0   0   0   0   0   0   0   0   0   0   0 | 601 i = rec.motorcycles
   0   0   0   0   0   0   0   0   0 623   3   1   0   0   0   0   0   0   0   0 | 627 j = rec.sport.baseball
   0   0   0   0   0   0   0   0   0   1 620   0   0   0   0   0   0   0   0   0 | 621 k = rec.sport.hockey
   0   0   0   0   0   0   0   0   0   0   0 620   0   0   0   0   0   1   0   0 | 621 l = sci.crypt
   0   0   0   1   0   0   0   1   0   0   0   0 583   0   2   0   1   0   0   0 | 588 m = sci.electronics
   0   0   1   0   0   0   0   0   0   0   0   0   0 602   0   0   0   0   0   0 | 603 n = sci.med
   0   0   0   0   0   0   0   0   0   0   0   0   0   1 581   0   0   0   0   0 | 582 o = sci.space
   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0 599   0   0   0   0 | 601 p = soc.religion.christian
   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0   0 546   0   0   0 | 548 q = talk.politics.mideast
   0   0   1   0   0   0   0   0   0   0   0   2   0   0   1   0   0 559   0   0 | 563 r = talk.politics.guns
  21   0   0   0   0   0   0   0   0   1   0   0   0   2   0   3   2   1 348   1 | 379 s = talk.religion.misc
   0   0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   1   4   0 466 | 473 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa                                 0.98
Accuracy                              98.8812%
Reliability                           94.0454%
Reliability (standard deviation)      0.2162
15/03/08 21:10:44 INFO driver.MahoutDriver: Program took 46839 ms (Minutes: 0.78065)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /tmp/mahout-work-grid/20news-test-vectors -m /tmp/mahout-work-grid/model -l /tmp/mahout-work-grid/labelindex -ow -o /tmp/mahout-work-grid/20news-testing -c
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/grid/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/grid/hadoop-1.2.1/conf
MAHOUT-JOB: /home/grid/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
15/03/08 21:10:51 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only 15/03/08 21:10:52 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-grid/20news-test-vectors], --labelIndex=[/tmp/mahout-work-grid/labelindex], --model=[/tmp/mahout-work-grid/model], --output=[/tmp/mahout-work-grid/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp], --testComplementary=null} 15/03/08 21:10:53 INFO common.HadoopUtil: Deleting /tmp/mahout-work-grid/20news-testing 15/03/08 21:10:55 INFO input.FileInputFormat: Total input paths to process : 1 15/03/08 21:10:56 INFO mapred.JobClient: Running job: job_201503081659_0014 15/03/08 21:10:57 INFO mapred.JobClient: map 0% reduce 0% 15/03/08 21:11:15 INFO mapred.JobClient: map 28% reduce 0% 15/03/08 21:11:19 INFO mapred.JobClient: map 54% reduce 0% 15/03/08 21:11:22 INFO mapred.JobClient: map 81% reduce 0% 15/03/08 21:11:25 INFO mapred.JobClient: map 100% reduce 0% 15/03/08 21:11:29 INFO mapred.JobClient: Job complete: job_201503081659_0014 15/03/08 21:11:29 INFO mapred.JobClient: Counters: 20 15/03/08 21:11:29 INFO mapred.JobClient: Job Counters 15/03/08 21:11:29 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29556 15/03/08 21:11:29 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 15/03/08 21:11:29 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 15/03/08 21:11:29 INFO mapred.JobClient: Launched map tasks=1 15/03/08 21:11:29 INFO mapred.JobClient: Data-local map tasks=1 15/03/08 21:11:29 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0 15/03/08 21:11:29 INFO mapred.JobClient: File Output Format Counters 15/03/08 21:11:29 INFO mapred.JobClient: Bytes Written=1394868 15/03/08 21:11:29 INFO mapred.JobClient: FileSystemCounters 15/03/08 21:11:29 INFO mapred.JobClient: FILE_BYTES_READ=3676597 15/03/08 21:11:29 INFO mapred.JobClient: HDFS_BYTES_READ=8434407 
15/03/08 21:11:29 INFO mapred.JobClient: FILE_BYTES_WRITTEN=59446
15/03/08 21:11:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1394868
15/03/08 21:11:29 INFO mapred.JobClient: File Input Format Counters
15/03/08 21:11:29 INFO mapred.JobClient: Bytes Read=8434269
15/03/08 21:11:29 INFO mapred.JobClient: Map-Reduce Framework
15/03/08 21:11:29 INFO mapred.JobClient: Map input records=7405
15/03/08 21:11:29 INFO mapred.JobClient: Physical memory (bytes) snapshot=85610496
15/03/08 21:11:29 INFO mapred.JobClient: Spilled Records=0
15/03/08 21:11:29 INFO mapred.JobClient: CPU time spent (ms)=17800
15/03/08 21:11:29 INFO mapred.JobClient: Total committed heap usage (bytes)=35471360
15/03/08 21:11:29 INFO mapred.JobClient: Virtual memory (bytes) snapshot=724860928
15/03/08 21:11:29 INFO mapred.JobClient: Map output records=7405
15/03/08 21:11:29 INFO mapred.JobClient: SPLIT_RAW_BYTES=138
15/03/08 21:11:31 INFO test.TestNaiveBayesDriver: Complementary Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances   : 6615   89.3315%
Incorrectly Classified Instances :  790   10.6685%
Total Classified Instances       : 7405
=======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t <--Classified as
 299   0   0   1   0   0   0   0   0   0   0   0   0   1   2   4   2   1   7   0 | 317 a = alt.atheism
   0 303  15  12   3  21   7   2   1   0   6   9   4   1   3   1   1   1   1   2 | 393 b = comp.graphics
   1   9 290  30   6  15   6   2   0   0   2   2   3   1   1   0   0   0   0   1 | 369 c = comp.os.ms-windows.misc
   1   9  18 303   7   6  12   1   2   0   3   0   9   4   3   0   0   2   0   0 | 380 d = comp.sys.ibm.pc.hardware
   1   5   3   6 338   5   3   2   1   3   5   2   6   2   6   0   0   1   0   0 | 389 e = comp.sys.mac.hardware
   1  14   2   5   1 339   2   1   1   0   1   2   0   2   1   0   0   0   0   0 | 372 f = comp.windows.x
   0   9   7  26  14   3 282  15   2   4   3   3  14   1   5   2   4   2   5   0 | 401 g = misc.forsale
   0   1   0   0   3   0   4 375   5   1   2   0   3   0   1   0   0   4   0   1 | 400 h = rec.autos
   1   0   0   0   0   0   1   3 386   0   1   0   1   1   1   0   0   0   0   0 | 395 i = rec.motorcycles
   0   0   1   0   0   0   0   1   0 346  12   0   2   0   1   1   2   0   1   0 | 367 j = rec.sport.baseball
   0   1   0   0   0   0   0   0   0   0 376   0   0   0   0   0   0   0   0   1 | 378 k = rec.sport.hockey
   0   0   2   0   0   2   0   1   0   0   0 360   0   2   1   0   0   0   0   2 | 370 l = sci.crypt
   0   3   2   9   4   4   6   5   7   2   2   1 337   2   7   4   0   0   0   1 | 396 m = sci.electronics
   0   0   0   1   0   1   2   2   2   0   3   2   1 365   1   1   3   2   1   0 | 387 n = sci.med
   0   2   1   0   0   0   3   2   0   0   0   0   2   2 389   0   1   1   1   1 | 405 o = sci.space
   3   0   0   2   2   0   0   1   0   0   0   0   1   1   1 380   3   0   1   1 | 396 p = soc.religion.christian
   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   1 390   0   0   0 | 392 q = talk.politics.mideast
   1   0   0   0   0   1   0   0   2   0   0   5   0   2   0   1   2 324   3   6 | 347 r = talk.politics.guns
  26   0   0   0   3   0   0   1   0   0   2   0   0   2   3  25   2   6 173   6 | 249 s = talk.religion.misc
   0   0   1   1   0   0   1   0   0   2   2   3   0   2   0   0   7  20   3 260 | 302 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa                                 0.8602
Accuracy                              89.3315%
Reliability                           84.7841%
Reliability (standard deviation)      0.2149
15/03/08 21:11:31 INFO driver.MahoutDriver: Program took 40182 ms (Minutes: 0.6697)
[grid@hadoop1 ~]$
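Note on the figures above: the summary percentages and the train/holdout proportions can be re-derived from the counts printed in the log. The following is a minimal Python sketch for doing so. It is not part of the Mahout script; the helper name `pct` and the variable names are illustrative assumptions, and the numbers are copied by hand from the SplitInput line and the two testnb summaries.

```python
# Counts copied from the log above (assumed to be transcribed correctly).
train_correct, train_total = 11313, 11441      # "Self testing on training set"
holdout_correct, holdout_total = 6615, 7405    # "Testing on holdout set"


def pct(part, total):
    """Return part/total as a percentage rounded to 4 decimals."""
    return round(100.0 * part / total, 4)


# Accuracy = correctly classified / total classified.
train_accuracy = pct(train_correct, train_total)
holdout_accuracy = pct(holdout_correct, holdout_total)

# Share of all vectors that ended up in the holdout set.
all_vectors = train_total + holdout_total
holdout_share = pct(holdout_total, all_vectors)

print("train accuracy   %:", train_accuracy)
print("holdout accuracy %:", holdout_accuracy)
print("holdout share    %:", holdout_share, "of", all_vectors, "vectors")
```

The holdout share comes out close to 40%, which matches the `--randomSelectionPct 40` flag passed to `mahout split` rather than the "80-20" wording of the echo message. The gap between training accuracy and holdout accuracy is the figure to watch when judging how well the model generalizes.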