landen@landen-Lenovo:~/文档/20news$ mahout trainclassifier --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
Usage:
[--gramSize <gramSize> --help --input <input> --output <output>
--classifierType <classifierType> --dataSource <dataSource> --alpha <a> --minDf
<minDf> --minSupport <minSupport> --skipCleanup]
Options
  --gramSize (-ng) gramSize                Size of the n-gram. Default Value: 1
  --help (-h)                              Print out help
  --input (-i) input                       Path to job input directory.
  --output (-o) output                     The directory pathname for output.
  --classifierType (-type) classifierType  Type of classifier: bayes|cbayes. Default: bayes
  --dataSource (-source) dataSource        Location of model: hdfs. Default Value: hdfs
  --alpha (-a) a                           Smoothing parameter Default Value: 1.0
  --minDf (-mf) minDf                      Minimum Term Document Frequency: 1
  --minSupport (-ms) minSupport            Minimum Support (Term Frequency): 1
  --skipCleanup (-sc)                      Skip cleanup of feature extraction output
13/07/12 16:32:22 INFO driver.MahoutDriver: Program took 52 ms (Minutes: 9.5E-4)
landen@landen-Lenovo:~/文档/20news$ mahout testclassifier --help
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
Usage:
[--defaultCat <defaultCat> --testDir <testDir> --encoding <encoding>
--gramSize <gramSize> --model <model> --classifierType <classifierType>
--dataSource <dataSource> --help --method <method> --verbose --alpha <a>
--confusionMatrix <confusionMatrix>]
Options
  --defaultCat (-default) defaultCat       The default category Default Value: unknown
  --testDir (-d) testDir                   The directory where test documents resides in
  --encoding (-e) encoding                 The file encoding. Defaults to UTF-8
  --gramSize (-ng) gramSize                Size of the n-gram. Default Value: 1
  --model (-m) model                       The path on HDFS as defined by the -source parameter
  --classifierType (-type) classifierType  Type of classifier: bayes|cbayes. Default Value: bayes
  --dataSource (-source) dataSource        Location of model: hdfs
  --help (-h)                              Print out help
  --method (-method) method                Method of Classification: sequential|mapreduce. Default Value: mapreduce
  --verbose (-v)                           Output which values were correctly and incorrectly classified
  --alpha (-a) a                           Smoothing parameter Default Value: 1.0
  --confusionMatrix (-cm) confusionMatrix  Export ConfusionMatrix as SequenceFile
13/07/12 16:32:37 INFO driver.MahoutDriver: Program took 42 ms (Minutes: 7.0E-4)
landen@landen-Lenovo:~/文档/20news$ hadoop fs -ls /20news
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - landen supergroup 0 2013-07-11 17:16 /20news/20news-test
drwxr-xr-x - landen supergroup 0 2013-07-11 17:16 /20news/20news-train
drwxr-xr-x - landen supergroup 0 2013-07-11 21:54 /20news/model
landen@landen-Lenovo:~/文档/20news$ mahout testclassifier -m /20news/model -d /20news/20news-test -type bayes -ng 3 -source hdfs -method mapreduce
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/landen/UntarFile/hadoop-1.0.4
No HADOOP_CONF_DIR set, using /home/landen/UntarFile/hadoop-1.0.4/conf
MAHOUT-JOB: /home/landen/UntarFile/mahout-distribution-0.6/mahout-examples-0.6-job.jar
Warning: $HADOOP_HOME is deprecated.
13/07/12 16:39:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 16:40:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/12 16:40:00 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/12 16:40:00 INFO mapred.FileInputFormat: Total input paths to process : 20
13/07/12 16:40:01 INFO mapred.JobClient: Running job: job_201307111633_0009
13/07/12 16:40:02 INFO mapred.JobClient: map 0% reduce 0%
13/07/12 16:43:18 INFO mapred.JobClient: map 3% reduce 0%
13/07/12 16:43:22 INFO mapred.JobClient: map 5% reduce 0%
13/07/12 16:43:28 INFO mapred.JobClient: map 6% reduce 0%
13/07/12 16:43:37 INFO mapred.JobClient: map 8% reduce 0%
13/07/12 16:43:42 INFO mapred.JobClient: map 4% reduce 0%
13/07/12 16:43:56 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000001_0, Status : FAILED
13/07/12 16:44:06 INFO mapred.JobClient: map 5% reduce 1%
13/07/12 16:44:13 INFO mapred.JobClient: map 6% reduce 1%
13/07/12 16:44:23 INFO mapred.JobClient: map 7% reduce 1%
13/07/12 16:44:29 INFO mapred.JobClient: map 8% reduce 1%
13/07/12 16:44:35 INFO mapred.JobClient: map 11% reduce 1%
13/07/12 16:44:38 INFO mapred.JobClient: map 12% reduce 1%
13/07/12 16:44:44 INFO mapred.JobClient: map 13% reduce 1%
13/07/12 16:44:47 INFO mapred.JobClient: map 9% reduce 1%
13/07/12 16:44:53 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000002_0, Status : FAILED
Error: Java heap space
attempt_201307111633_0009_m_000002_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201307111633_0009_m_000002_0: log4j:WARN Please initialize the log4j system properly.
13/07/12 16:45:03 INFO mapred.JobClient: map 9% reduce 3%
13/07/12 16:45:28 INFO mapred.JobClient: map 14% reduce 3%
13/07/12 16:45:31 INFO mapred.JobClient: map 17% reduce 3%
13/07/12 16:45:34 INFO mapred.JobClient: map 20% reduce 3%
13/07/12 16:45:37 INFO mapred.JobClient: map 20% reduce 5%
13/07/12 16:45:46 INFO mapred.JobClient: map 20% reduce 6%
13/07/12 16:45:55 INFO mapred.JobClient: map 22% reduce 6%
13/07/12 16:45:58 INFO mapred.JobClient: map 24% reduce 6%
13/07/12 16:46:01 INFO mapred.JobClient: map 25% reduce 6%
13/07/12 16:46:07 INFO mapred.JobClient: map 25% reduce 8%
13/07/12 16:46:22 INFO mapred.JobClient: map 26% reduce 8%
13/07/12 16:46:25 INFO mapred.JobClient: map 27% reduce 8%
13/07/12 16:46:31 INFO mapred.JobClient: map 28% reduce 8%
13/07/12 16:46:40 INFO mapred.JobClient: map 29% reduce 8%
13/07/12 16:47:04 INFO mapred.JobClient: map 30% reduce 8%
13/07/12 16:47:16 INFO mapred.JobClient: map 30% reduce 10%
13/07/12 16:47:32 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000007_0, Status : FAILED
Error: Java heap space
13/07/12 16:47:56 INFO mapred.JobClient: map 34% reduce 10%
13/07/12 16:48:13 INFO mapred.JobClient: map 34% reduce 11%
13/07/12 16:48:19 INFO mapred.JobClient: map 39% reduce 11%
13/07/12 16:48:22 INFO mapred.JobClient: map 40% reduce 11%
13/07/12 16:48:34 INFO mapred.JobClient: map 40% reduce 13%
13/07/12 16:48:43 INFO mapred.JobClient: map 44% reduce 13%
13/07/12 16:48:46 INFO mapred.JobClient: map 45% reduce 13%
13/07/12 16:48:58 INFO mapred.JobClient: map 45% reduce 15%
13/07/12 16:49:04 INFO mapred.JobClient: map 48% reduce 15%
13/07/12 16:49:07 INFO mapred.JobClient: map 50% reduce 15%
13/07/12 16:49:13 INFO mapred.JobClient: map 50% reduce 16%
13/07/12 16:49:25 INFO mapred.JobClient: map 53% reduce 16%
13/07/12 16:49:28 INFO mapred.JobClient: map 54% reduce 16%
13/07/12 16:49:43 INFO mapred.JobClient: map 59% reduce 18%
13/07/12 16:49:58 INFO mapred.JobClient: map 59% reduce 20%
13/07/12 16:50:04 INFO mapred.JobClient: map 64% reduce 20%
13/07/12 16:50:13 INFO mapred.JobClient: map 64% reduce 21%
13/07/12 16:50:25 INFO mapred.JobClient: map 69% reduce 21%
13/07/12 16:50:43 INFO mapred.JobClient: map 69% reduce 23%
13/07/12 16:50:46 INFO mapred.JobClient: map 73% reduce 23%
13/07/12 16:50:49 INFO mapred.JobClient: map 75% reduce 23%
13/07/12 16:50:58 INFO mapred.JobClient: map 75% reduce 25%
13/07/12 16:51:08 INFO mapred.JobClient: map 78% reduce 25%
13/07/12 16:51:11 INFO mapred.JobClient: map 80% reduce 25%
13/07/12 16:51:23 INFO mapred.JobClient: map 80% reduce 26%
13/07/12 16:51:29 INFO mapred.JobClient: map 83% reduce 26%
13/07/12 16:51:32 INFO mapred.JobClient: map 85% reduce 26%
13/07/12 16:51:44 INFO mapred.JobClient: map 85% reduce 28%
13/07/12 16:51:50 INFO mapred.JobClient: map 89% reduce 28%
13/07/12 16:51:53 INFO mapred.JobClient: map 90% reduce 28%
13/07/12 16:52:14 INFO mapred.JobClient: map 90% reduce 30%
13/07/12 16:52:20 INFO mapred.JobClient: map 95% reduce 30%
13/07/12 16:52:26 INFO mapred.JobClient: map 95% reduce 31%
13/07/12 16:52:49 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_0, Status : FAILED
org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File exists
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:167)
at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:312)
at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:385)
at org.apache.hadoop.mapred.Child$4.run(Child.java:257)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: EEXIST: File exists
at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
... 7 more
attempt_201307111633_0009_m_000004_0: Exception in thread "Thread for syncLogs" java.lang.OutOfMemoryError: Java heap space
attempt_201307111633_0009_m_000004_0: at java.util.Arrays.copyOfRange(Arrays.java:2694)
attempt_201307111633_0009_m_000004_0: at java.lang.String.<init>(String.java:203)
attempt_201307111633_0009_m_000004_0: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread for syncLogs"
13/07/12 16:53:02 INFO mapred.JobClient: map 97% reduce 31%
13/07/12 16:53:05 INFO mapred.JobClient: map 95% reduce 31%
13/07/12 16:53:10 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_1, Status : FAILED
Error: Java heap space
13/07/12 16:53:20 INFO mapred.JobClient: map 96% reduce 31%
13/07/12 16:53:23 INFO mapred.JobClient: map 98% reduce 31%
13/07/12 16:53:26 INFO mapred.JobClient: map 100% reduce 31%
13/07/12 16:53:35 INFO mapred.JobClient: map 100% reduce 100%
13/07/12 16:53:41 INFO mapred.JobClient: Job complete: job_201307111633_0009
13/07/12 16:53:41 INFO mapred.JobClient: Counters: 30
13/07/12 16:53:41 INFO mapred.JobClient: Job Counters
13/07/12 16:53:41 INFO mapred.JobClient: Launched reduce tasks=1
13/07/12 16:53:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1153539
13/07/12 16:53:41 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 16:53:41 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 16:53:41 INFO mapred.JobClient: Launched map tasks=25
13/07/12 16:53:41 INFO mapred.JobClient: Data-local map tasks=25
13/07/12 16:53:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=596582
13/07/12 16:53:41 INFO mapred.JobClient: File Input Format Counters
13/07/12 16:53:41 INFO mapred.JobClient: Bytes Read=10399829
13/07/12 16:53:41 INFO mapred.JobClient: File Output Format Counters
13/07/12 16:53:41 INFO mapred.JobClient: Bytes Written=13482
13/07/12 16:53:41 INFO mapred.JobClient: FileSystemCounters
13/07/12 16:53:41 INFO mapred.JobClient: FILE_BYTES_READ=11889
13/07/12 16:53:41 INFO mapred.JobClient: HDFS_BYTES_READ=421848302
13/07/12 16:53:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=497127
13/07/12 16:53:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=13482
13/07/12 16:53:41 INFO mapred.JobClient: Map-Reduce Framework
13/07/12 16:53:41 INFO mapred.JobClient: Map output materialized bytes=12003
13/07/12 16:53:41 INFO mapred.JobClient: Map input records=7532
13/07/12 16:53:41 INFO mapred.JobClient: Reduce shuffle bytes=11395
13/07/12 16:53:41 INFO mapred.JobClient: Spilled Records=460
13/07/12 16:53:41 INFO mapred.JobClient: Map output bytes=377830
13/07/12 16:53:41 INFO mapred.JobClient: Total committed heap usage (bytes)=2999517184
13/07/12 16:53:41 INFO mapred.JobClient: CPU time spent (ms)=293160
13/07/12 16:53:41 INFO mapred.JobClient: Map input bytes=10399829
13/07/12 16:53:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=2273
13/07/12 16:53:41 INFO mapred.JobClient: Combine input records=7532
13/07/12 16:53:41 INFO mapred.JobClient: Reduce input records=230
13/07/12 16:53:41 INFO mapred.JobClient: Reduce input groups=230
13/07/12 16:53:41 INFO mapred.JobClient: Combine output records=230
13/07/12 16:53:41 INFO mapred.JobClient: Physical memory (bytes) snapshot=3793125376
13/07/12 16:53:41 INFO mapred.JobClient: Reduce output records=230
13/07/12 16:53:41 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8323325952
13/07/12 16:53:41 INFO mapred.JobClient: Map output records=7532
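The job counters appear consistent with the failures logged above. The 20 input paths (presumably one file per newsgroup, each far smaller than an HDFS block) should give 20 map tasks, and the five FAILED attempts, two of them for task m_000004, would bring that to `Launched map tasks=25`. Three of those failures report `Error: Java heap space`, m_000004_0 shows the EEXIST exception alongside an OutOfMemoryError in its syncLogs thread, and m_000001_0 prints no reason in this log. Because failed attempts are retried, the job still completed. A minimal Python sketch of that tally follows; `failed_attempts` and `failures_per_task` are hypothetical helpers (not part of Hadoop or Mahout), and the input lines are copied from this log.

```python
import re
from collections import Counter

# Matches a JobClient line reporting a failed attempt, e.g.
# "... Task Id : attempt_201307111633_0009_m_000001_0, Status : FAILED"
FAILED_RE = re.compile(r"Task Id : (attempt_\w+), Status : FAILED")

# The five FAILED lines, copied verbatim from the job log above.
FAILED_LINES = [
    "13/07/12 16:43:56 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000001_0, Status : FAILED",
    "13/07/12 16:44:53 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000002_0, Status : FAILED",
    "13/07/12 16:47:32 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000007_0, Status : FAILED",
    "13/07/12 16:52:49 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_0, Status : FAILED",
    "13/07/12 16:53:10 INFO mapred.JobClient: Task Id : attempt_201307111633_0009_m_000004_1, Status : FAILED",
]


def failed_attempts(log_lines):
    """Return the attempt IDs of all FAILED task attempts, in log order."""
    ids = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            ids.append(match.group(1))
    return ids


def failures_per_task(attempt_ids):
    """Group attempt IDs by task: attempt_<job>_m_<n>_<try> -> task_<job>_m_<n>."""
    return Counter("task_" + a.split("_", 1)[1].rsplit("_", 1)[0] for a in attempt_ids)


if __name__ == "__main__":
    failed = failed_attempts(FAILED_LINES)
    input_paths = 20  # from "Total input paths to process : 20"
    print("failed attempts:", len(failed))                 # failed attempts: 5
    print("maps + retries:", input_paths + len(failed))    # maps + retries: 25
    for task, n in failures_per_task(failed).items():
        print(task, n)
```

If the heap errors need addressing, the per-task JVM heap in Hadoop 1.x comes from `mapred.child.java.opts` (shipped default `-Xmx200m`), usually raised in `conf/mapred-site.xml`. The `GenericOptionsParser` warning at the start of the job suggests that a `-D` override on the `mahout testclassifier` command line may not be picked up. This log does not show which heap size would be enough.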
13/07/12 16:53:43 INFO bayes.BayesClassifierDriver: =======================================================
Confusion Matrix
-------------------------------------------------------
   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t  <--Classified as
 381   0   0   0   0   9   2   0   1   0   1   0   1   0   0   0   0   0   3   0  |  398  a = rec.motorcycles
   1 284   0   0   0   1   4   0   6   2  11   0   3  65   0   0   5   0   3  10  |  395  b = comp.windows.x
   1   0 340   3   0   2   6   1   0   0   0   0   1   1  12   0   7   0   2   0  |  376  c = talk.politics.mideast
   4   0   1 330   0   2   2   0   0   2   1   1   3   0   1   3  12   0   2   0  |  364  d = talk.politics.guns
   3   0   4  31  37   6   9   1   0  10   0   0   0   6  93   9   6  36   0   0  |  251  e = talk.religion.misc
   7   0   0   0   0 361   2   2   0   1   3   0   6   1   0   1   0   0  11   1  |  396  f = rec.autos
   0   0   0   0   0   1 383   9   1   0   0   0   0   0   0   0   0   0   3   0  |  397  g = rec.sport.baseball
   1   0   0   0   0   0   8 382   1   0   0   0   2   1   1   0   2   0   1   0  |  399  h = rec.sport.hockey
   1   0   0   0   0   3   3   0 335   4   5   0  10   4   0   0   2   0  10   8  |  385  i = comp.sys.mac.hardware
   0   3   0   0   0   0   1   0   0 367   0   0   5  10   1   3   2   0   2   0  |  394  j = sci.space
   0   0   0   0   0   2   1   0  27   1 300   0  19  11   0   0   0   0  11  20  |  392  k = comp.sys.ibm.pc.hardware
   6   0   2 110   0   6  11   4   1  14   0 104   2   1  11  10  26   1   1   0  |  310  l = talk.politics.misc
   6   0   1   0   0   4   1   0   8   2  16   0 314   9   0   4  15   0   5   8  |  393  m = sci.electronics
   0  13   1   0   0   2   6   0  11   5  11   0  11 304   0   2  10   0   5   8  |  389  n = comp.graphics
   2   0   0   0   0   0   5   1   0   2   1   0   1   3 373   5   0   2   1   2  |  398  o = soc.religion.christian
   3   0   0   1   0   2   3   3   2   3   2   0  12  10   8 337   1   0   9   0  |  396  p = sci.med
   0   1   0   1   0   0   4   0   3   0   1   0   3   8   0   2 370   0   2   1  |  396  q = sci.crypt
   9   0   4  10   1   4   6   1   2   4   2   0   0   2  77  14  12 170   0   1  |  319  r = alt.atheism
   4   0   0   0   0   9   1   1   9   1  12   0   6   3   0   2   0   0 340   2  |  390  s = misc.forsale
   6   5   0   0   0   1   8   0   8   5  50   0   2  39   1   0   8   0   3 258  |  394  t = comp.os.ms-windows.misc
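The driver prints only the matrix here, so the overall accuracy has to be read off it. Summing the diagonal gives 6070 correct out of 7532 test documents (the same count as `Map input records=7532`), about 80.6%. Much of the error sits in a few overlapping groups: for example, 110 talk.politics.misc posts were labeled talk.politics.guns, and 93 talk.religion.misc posts were labeled soc.religion.christian. The Python sketch below recomputes the accuracy and lists the three classes with the lowest recall from the matrix as transcribed; `accuracy` and `recall_per_class` are hypothetical helpers, not part of Mahout.

```python
# Rows are the true newsgroup, columns the predicted one, in the a..t order
# printed by BayesClassifierDriver above.
LABELS = [
    "rec.motorcycles", "comp.windows.x", "talk.politics.mideast", "talk.politics.guns",
    "talk.religion.misc", "rec.autos", "rec.sport.baseball", "rec.sport.hockey",
    "comp.sys.mac.hardware", "sci.space", "comp.sys.ibm.pc.hardware", "talk.politics.misc",
    "sci.electronics", "comp.graphics", "soc.religion.christian", "sci.med",
    "sci.crypt", "alt.atheism", "misc.forsale", "comp.os.ms-windows.misc",
]

MATRIX = [
    [381, 0, 0, 0, 0, 9, 2, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 3, 0],
    [1, 284, 0, 0, 0, 1, 4, 0, 6, 2, 11, 0, 3, 65, 0, 0, 5, 0, 3, 10],
    [1, 0, 340, 3, 0, 2, 6, 1, 0, 0, 0, 0, 1, 1, 12, 0, 7, 0, 2, 0],
    [4, 0, 1, 330, 0, 2, 2, 0, 0, 2, 1, 1, 3, 0, 1, 3, 12, 0, 2, 0],
    [3, 0, 4, 31, 37, 6, 9, 1, 0, 10, 0, 0, 0, 6, 93, 9, 6, 36, 0, 0],
    [7, 0, 0, 0, 0, 361, 2, 2, 0, 1, 3, 0, 6, 1, 0, 1, 0, 0, 11, 1],
    [0, 0, 0, 0, 0, 1, 383, 9, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0],
    [1, 0, 0, 0, 0, 0, 8, 382, 1, 0, 0, 0, 2, 1, 1, 0, 2, 0, 1, 0],
    [1, 0, 0, 0, 0, 3, 3, 0, 335, 4, 5, 0, 10, 4, 0, 0, 2, 0, 10, 8],
    [0, 3, 0, 0, 0, 0, 1, 0, 0, 367, 0, 0, 5, 10, 1, 3, 2, 0, 2, 0],
    [0, 0, 0, 0, 0, 2, 1, 0, 27, 1, 300, 0, 19, 11, 0, 0, 0, 0, 11, 20],
    [6, 0, 2, 110, 0, 6, 11, 4, 1, 14, 0, 104, 2, 1, 11, 10, 26, 1, 1, 0],
    [6, 0, 1, 0, 0, 4, 1, 0, 8, 2, 16, 0, 314, 9, 0, 4, 15, 0, 5, 8],
    [0, 13, 1, 0, 0, 2, 6, 0, 11, 5, 11, 0, 11, 304, 0, 2, 10, 0, 5, 8],
    [2, 0, 0, 0, 0, 0, 5, 1, 0, 2, 1, 0, 1, 3, 373, 5, 0, 2, 1, 2],
    [3, 0, 0, 1, 0, 2, 3, 3, 2, 3, 2, 0, 12, 10, 8, 337, 1, 0, 9, 0],
    [0, 1, 0, 1, 0, 0, 4, 0, 3, 0, 1, 0, 3, 8, 0, 2, 370, 0, 2, 1],
    [9, 0, 4, 10, 1, 4, 6, 1, 2, 4, 2, 0, 0, 2, 77, 14, 12, 170, 0, 1],
    [4, 0, 0, 0, 0, 9, 1, 1, 9, 1, 12, 0, 6, 3, 0, 2, 0, 0, 340, 2],
    [6, 5, 0, 0, 0, 1, 8, 0, 8, 5, 50, 0, 2, 39, 1, 0, 8, 0, 3, 258],
]

# Row totals from the "| N" column, used to check the transcription.
ROW_TOTALS = [398, 395, 376, 364, 251, 396, 397, 399, 385, 394,
              392, 310, 393, 389, 398, 396, 396, 319, 390, 394]


def accuracy(matrix):
    """Overall accuracy: correctly classified (diagonal) over all documents."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


def recall_per_class(matrix, labels):
    """Per-class recall: diagonal entry divided by that class's row total."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


if __name__ == "__main__":
    # Every row must add up to the total the driver printed for it.
    assert all(sum(row) == n for row, n in zip(MATRIX, ROW_TOTALS))
    correct, total, acc = accuracy(MATRIX)
    print(f"{correct}/{total} correct = {acc:.2%}")  # 6070/7532 correct = 80.59%
    recall = recall_per_class(MATRIX, LABELS)
    for label in sorted(recall, key=recall.get)[:3]:
        print(f"{label:24s} recall {recall[label]:.2%}")
```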
13/07/12 16:53:43 INFO driver.MahoutDriver: Program took 824521 ms (Minutes: 13.742016666666666)
landen@landen-Lenovo:~/文档/20news$