47. Install the Mahout client on the master node, open a Linux shell, and run the mahout command to list the example programs bundled with Mahout. The query result is shown below.
[root@master ~]# mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.4.3.0-227/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hdp/2.4.3.0-227/hadoop/conf
MAHOUT-JOB: /usr/hdp/2.4.3.0-227/mahout/mahout-examples-0.9.0.2.4.3.0-227-job.jar
WARNING: Use "yarn jar" to launch YARN applications.
An example program must be given as the first argument.
Valid program names are:
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
buildforest: : Build the random forest classifier
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
describe: : Describe the fields and target variable in a data set
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
lucene2seq: : Generate Text SequenceFiles from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testforest: : Test the random forest classifier
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
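Note: each driver listed above has its own options. A quick way to see them, assuming this Mahout 0.9 build exposes the standard help flag (an assumption, not shown in the output above), is:

# Print the usage and option list of a single driver (here: seqdirectory); the --help flag is assumed
mahout seqdirectory --help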
48. Use Mahout to convert the extracted contents of 20news-bydate.tar.gz into sequence files, save them to the /data/mahout/20news/output/20news-seq/ directory, and list the contents of that directory. The commands and query results are shown below.
[root@master ~]# mkdir 20news
[root@master ~]# tar -xzf 20news-bydate.tar.gz -C 20news
[root@master ~]# hadoop fs -mkdir -p /data/mahout/20news/20news-all
[root@master ~]# hadoop fs -put 20news/* /data/mahout/20news/20news-all
[root@master ~]# mahout seqdirectory -i /data/mahout/20news/20news-all -o /data/mahout/20news/output/20news-seq
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.4.3.0-227/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hdp/2.4.3.0-227/hadoop/conf
MAHOUT-JOB: /usr/hdp/2.4.3.0-227/mahout/mahout-examples-0.9.0.2.4.3.0-227-job.jar
WARNING: Use "yarn jar" to launch YARN applications.
17/05/12 05:04:32 WARN driver.MahoutDriver: No seqdirectory.props found on classpath, will use command-line arguments only
17/05/12 05:04:32 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/data/mahout/20news/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/data/mahout/20news/output/20news-seq], --startPhase=[0], --tempDir=[temp]}
17/05/12 05:04:35 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/12 05:04:35 INFO client.RMProxy: Connecting to ResourceManager at slaver1/10.0.0.108:8050
17/05/12 05:04:53 INFO input.FileInputFormat: Total input paths to process : 4262
17/05/12 05:04:53 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 2, size left: 8691977
17/05/12 05:05:10 INFO mapreduce.JobSubmitter: number of splits:1
17/05/12 05:05:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494563840869_0001
17/05/12 05:05:21 INFO impl.YarnClientImpl: Submitted application application_1494563840869_0001
17/05/12 05:05:21 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494563840869_0001/
17/05/12 05:05:21 INFO mapreduce.Job: Running job: job_1494563840869_0001
17/05/12 05:06:34 INFO mapreduce.Job: Job job_1494563840869_0001 running in uber mode : false
17/05/12 05:06:34 INFO mapreduce.Job: map 0% reduce 0%
17/05/12 05:06:59 INFO mapreduce.Job: map 14% reduce 0%
17/05/12 05:07:02 INFO mapreduce.Job: map 37% reduce 0%
17/05/12 05:07:05 INFO mapreduce.Job: map 66% reduce 0%
17/05/12 05:07:08 INFO mapreduce.Job: map 100% reduce 0%
17/05/12 05:07:15 INFO mapreduce.Job: Job job_1494563840869_0001 completed successfully
17/05/12 05:07:15 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=140047
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=9148620
HDFS: Number of bytes written=3244364
HDFS: Number of read operations=17052
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=49028
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=24514
Total vcore-seconds taken by all map tasks=24514
Total megabyte-seconds taken by all map tasks=24612056
Map-Reduce Framework
Map input records=4262
Map output records=4262
Input split bytes=456643
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=176
CPU time spent (ms)=15710
Physical memory (bytes) snapshot=250163200
Virtual memory (bytes) snapshot=2792144896
Total committed heap usage (bytes)=123207680
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=3244364
17/05/12 05:07:15 INFO driver.MahoutDriver: Program took 163575 ms (Minutes: 2.72625)
[root@master ~]# hadoop fs -ls /data/mahout/20news/output/20news-seq
Found 2 items
-rw-r--r--   3 root hdfs        0 2017-05-12 05:07 /data/mahout/20news/output/20news-seq/_SUCCESS
-rw-r--r--   3 root hdfs  3244364 2017-05-12 05:07 /data/mahout/20news/output/20news-seq/part-m-00000
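Optional check: besides listing the directory, the generated sequence file can be spot-checked with the seqdumper driver from the program list above. A minimal sketch, reusing the paths of this exercise:

# Dump the key/value pairs of the sequence file to stdout; -i is the HDFS input path
mahout seqdumper -i /data/mahout/20news/output/20news-seq/part-m-00000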
49. Use Mahout to convert the extracted contents of 20news-bydate.tar.gz into sequence files, save them to the /data/mahout/20news/output/20news-seq/ directory, and view the sequence file contents with the -text command (the first 20 lines are sufficient). The commands and query results are shown below.
[root@master ~]# mkdir 20news
[root@master ~]# tar -xzf 20news-bydate.tar.gz -C 20news
[root@master ~]# hadoop fs -mkdir -p /data/mahout/20news/20news-all
[root@master ~]# hadoop fs -put 20news/* /data/mahout/20news/20news-all
[root@master ~]# mahout seqdirectory -i /data/mahout/20news/20news-all -o /data/mahout/20news/output/20news-seq
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.4.3.0-227/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hdp/2.4.3.0-227/hadoop/conf
MAHOUT-JOB: /usr/hdp/2.4.3.0-227/mahout/mahout-examples-0.9.0.2.4.3.0-227-job.jar
WARNING: Use "yarn jar" to launch YARN applications.
17/05/12 05:04:32 WARN driver.MahoutDriver: No seqdirectory.props found on classpath, will use command-line arguments only
17/05/12 05:04:32 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/data/mahout/20news/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/data/mahout/20news/output/20news-seq], --startPhase=[0], --tempDir=[temp]}
17/05/12 05:04:35 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/12 05:04:35 INFO client.RMProxy: Connecting to ResourceManager at slaver1/10.0.0.108:8050
17/05/12 05:04:53 INFO input.FileInputFormat: Total input paths to process : 4262
17/05/12 05:04:53 INFO input.CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 2, size left: 8691977
17/05/12 05:05:10 INFO mapreduce.JobSubmitter: number of splits:1
17/05/12 05:05:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494563840869_0001
17/05/12 05:05:21 INFO impl.YarnClientImpl: Submitted application application_1494563840869_0001
17/05/12 05:05:21 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494563840869_0001/
17/05/12 05:05:21 INFO mapreduce.Job: Running job: job_1494563840869_0001
17/05/12 05:06:34 INFO mapreduce.Job: Job job_1494563840869_0001 running in uber mode : false
17/05/12 05:06:34 INFO mapreduce.Job: map 0% reduce 0%
17/05/12 05:06:59 INFO mapreduce.Job: map 14% reduce 0%
17/05/12 05:07:02 INFO mapreduce.Job: map 37% reduce 0%
17/05/12 05:07:05 INFO mapreduce.Job: map 66% reduce 0%
17/05/12 05:07:08 INFO mapreduce.Job: map 100% reduce 0%
17/05/12 05:07:15 INFO mapreduce.Job: Job job_1494563840869_0001 completed successfully
17/05/12 05:07:15 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=140047
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=9148620
HDFS: Number of bytes written=3244364
HDFS: Number of read operations=17052
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=49028
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=24514
Total vcore-seconds taken by all map tasks=24514
Total megabyte-seconds taken by all map tasks=24612056
Map-Reduce Framework
Map input records=4262
Map output records=4262
Input split bytes=456643
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=176
CPU time spent (ms)=15710
Physical memory (bytes) snapshot=250163200
Virtual memory (bytes) snapshot=2792144896
Total committed heap usage (bytes)=123207680
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=3244364
17/05/12 05:07:15 INFO driver.MahoutDriver: Program took 163575 ms (Minutes: 2.72625)
[root@master ~]# hadoop fs -text /data/mahout/20news/output/20news-seq/part-m-00000 | head -n 20
17/05/12 05:26:18 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
17/05/12 05:26:18 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
17/05/12 05:26:18 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
17/05/12 05:26:18 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
17/05/12 05:26:18 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
/20news-bydate-test/alt.atheism/53068 From: [email protected] (dean.kaflowitz)
Subject: Re: about the bible quiz answers
Organization: AT&T
Distribution: na
Lines: 18
In article
>
>
> #12) The 2 cheribums are on the Ark of the Covenant. When God said make no
> graven image, he was refering to idols, which were created to be worshipped.
> The Ark of the Covenant wasn't wrodhipped and only the high priest could
> enter the Holy of Holies where it was kept once a year, on the Day of
> Atonement.
I am not familiar with, or knowledgeable about the original language,
but I believe there is a word for "idol" and that the translator
would have used the word "idol" instead of "graven image" had
the original said "idol." So I think you're wrong here, but
then again I could be too. I just suggesting a way to determine
text: Unable to write to output stream.
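The trailing "text: Unable to write to output stream." message only means that head closed the pipe after 20 lines. Assuming the Hadoop shell writes its log and error messages to stderr (the usual default), they can be hidden like this:

# Discard stderr so only the first 20 lines of file content are printed
hadoop fs -text /data/mahout/20news/output/20news-seq/part-m-00000 2>/dev/null | head -n 20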
50. Use Mahout to generate item recommendations for the dataset user-item-score.txt (user-item-score), using the item-based collaborative filtering algorithm with the Euclidean distance similarity measure, 3 recommendations per user, non-Boolean data, a maximum preference count of 4 and a minimum preference count of 1. Save the recommendation output to the output directory and query the contents of the output file part-r-00000 with the -cat command. The commands for running the recommendation algorithm and the query results are shown below.
[hdfs@master ~]$ hadoop fs -mkdir -p /data/mahout/project
[hdfs@master ~]$ hadoop fs -put user-item-score.txt /data/mahout/project
[hdfs@master ~]$ mahout recommenditembased -i /data/mahout/project/user-item-score.txt -o /data/mahout/project/output -n 3 -b false -s SIMILARITY_EUCLIDEAN_DISTANCE --maxPrefsPerUser 4 --minPrefsPerUser 1 --maxPrefsInItemSimilarity 4 --tempDir /data/mahout/project/temp
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /usr/hdp/2.4.3.0-227/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/hdp/2.4.3.0-227/hadoop/conf
MAHOUT-JOB: /usr/hdp/2.4.3.0-227/mahout/mahout-examples-0.9.0.2.4.3.0-227-job.jar
WARNING: Use "yarn jar" to launch YARN applications.
17/05/15 19:33:06 WARN driver.MahoutDriver: No recommenditembased.props found on classpath, will use command-line arguments only
17/05/15 19:33:07 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[/data/mahout/project/user.txt], --maxPrefsInItemSimilarity=[4], --maxPrefsPerUser=[4], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[3], --output=[/data/mahout/project/output], --similarityClassname=[SIMILARITY_EUCLIDEAN_DISTANCE], --startPhase=[0], --tempDir=[/data/mahout/project/temp]}
17/05/15 19:33:07 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[/data/mahout/project/user.txt], --minPrefsPerUser=[1], --output=[/data/mahout/project/temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[/data/mahout/project/temp]}
17/05/15 19:33:08 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:33:08 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:33:10 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:33:10 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:33:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0013
17/05/15 19:33:11 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0013
17/05/15 19:33:11 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0013/
17/05/15 19:33:11 INFO mapreduce.Job: Running job: job_1494874269419_0013
17/05/15 19:33:18 INFO mapreduce.Job: Job job_1494874269419_0013 running in uber mode : false
17/05/15 19:33:18 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:33:25 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:33:33 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:33:36 INFO mapreduce.Job: Job job_1494874269419_0013 completed successfully
17/05/15 19:33:37 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=54
FILE: Number of bytes written=272323
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=341
HDFS: Number of bytes written=187
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3313
Total time spent by all reduces in occupied slots (ms)=12410
Total time spent by all map tasks (ms)=3313
Total time spent by all reduce tasks (ms)=6205
Total vcore-seconds taken by all map tasks=3313
Total vcore-seconds taken by all reduce tasks=6205
Total megabyte-seconds taken by all map tasks=1696256
Total megabyte-seconds taken by all reduce tasks=6353920
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=84
Map output materialized bytes=46
Input split bytes=112
Combine input records=21
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=46
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=116
CPU time spent (ms)=2080
Physical memory (bytes) snapshot=656359424
Virtual memory (bytes) snapshot=5180207104
Total committed heap usage (bytes)=484442112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=229
File Output Format Counters
Bytes Written=187
17/05/15 19:33:37 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:33:37 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:33:38 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:33:40 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:33:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0014
17/05/15 19:33:43 INFO impl.YarnClientImpl: Application submission is not finished, submitted application application_1494874269419_0014 is still in NEW_SAVING
17/05/15 19:33:44 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0014
17/05/15 19:33:44 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0014/
17/05/15 19:33:44 INFO mapreduce.Job: Running job: job_1494874269419_0014
17/05/15 19:33:55 INFO mapreduce.Job: Job job_1494874269419_0014 running in uber mode : false
17/05/15 19:33:55 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:34:03 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:34:12 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:34:27 INFO mapreduce.Job: Job job_1494874269419_0014 completed successfully
17/05/15 19:34:27 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=113
FILE: Number of bytes written=273073
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=341
HDFS: Number of bytes written=288
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4912
Total time spent by all reduces in occupied slots (ms)=13474
Total time spent by all map tasks (ms)=4912
Total time spent by all reduce tasks (ms)=6737
Total vcore-seconds taken by all map tasks=4912
Total vcore-seconds taken by all reduce tasks=6737
Total megabyte-seconds taken by all map tasks=2514944
Total megabyte-seconds taken by all reduce tasks=6898688
Map-Reduce Framework
Map input records=21
Map output records=21
Map output bytes=147
Map output materialized bytes=105
Input split bytes=112
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=105
Reduce input records=21
Reduce output records=5
Spilled Records=42
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=125
CPU time spent (ms)=1830
Physical memory (bytes) snapshot=666886144
Virtual memory (bytes) snapshot=5178028032
Total committed heap usage (bytes)=488636416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=229
File Output Format Counters
Bytes Written=288
org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
USERS=5
17/05/15 19:34:28 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:34:28 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:34:29 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:34:29 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:34:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0015
17/05/15 19:34:29 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0015
17/05/15 19:34:29 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0015/
17/05/15 19:34:29 INFO mapreduce.Job: Running job: job_1494874269419_0015
17/05/15 19:34:36 INFO mapreduce.Job: Job job_1494874269419_0015 running in uber mode : false
17/05/15 19:34:36 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:34:45 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:34:51 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:34:52 INFO mapreduce.Job: Job job_1494874269419_0015 completed successfully
17/05/15 19:34:52 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=126
FILE: Number of bytes written=272583
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=445
HDFS: Number of bytes written=335
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5388
Total time spent by all reduces in occupied slots (ms)=6558
Total time spent by all map tasks (ms)=5388
Total time spent by all reduce tasks (ms)=3279
Total vcore-seconds taken by all map tasks=5388
Total vcore-seconds taken by all reduce tasks=3279
Total megabyte-seconds taken by all map tasks=2758656
Total megabyte-seconds taken by all reduce tasks=3357696
Map-Reduce Framework
Map input records=5
Map output records=21
Map output bytes=336
Map output materialized bytes=118
Input split bytes=157
Combine input records=21
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=118
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=127
CPU time spent (ms)=1970
Physical memory (bytes) snapshot=661454848
Virtual memory (bytes) snapshot=5178253312
Total committed heap usage (bytes)=486014976
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=288
File Output Format Counters
Bytes Written=335
17/05/15 19:34:52 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --excludeSelfSimilarity=[true], --input=[/data/mahout/project/temp/preparePreferenceMatrix/ratingMatrix], --maxObservationsPerColumn=[4], --maxObservationsPerRow=[4], --maxSimilaritiesPerRow=[100], --numberOfColumns=[5], --output=[/data/mahout/project/temp/similarityMatrix], --randomSeed=[-9223372036854775808], --similarityClassname=[SIMILARITY_EUCLIDEAN_DISTANCE], --startPhase=[0], --tempDir=[/data/mahout/project/temp], --threshold=[4.9E-324]}
17/05/15 19:34:52 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:34:52 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:34:52 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:34:53 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:34:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0016
17/05/15 19:34:53 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0016
17/05/15 19:34:53 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0016/
17/05/15 19:34:53 INFO mapreduce.Job: Running job: job_1494874269419_0016
17/05/15 19:35:00 INFO mapreduce.Job: Job job_1494874269419_0016 running in uber mode : false
17/05/15 19:35:00 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:35:05 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:35:11 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:35:13 INFO mapreduce.Job: Job job_1494874269419_0016 completed successfully
17/05/15 19:35:13 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=272971
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=493
HDFS: Number of bytes written=150
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3341
Total time spent by all reduces in occupied slots (ms)=6930
Total time spent by all map tasks (ms)=3341
Total time spent by all reduce tasks (ms)=3465
Total vcore-seconds taken by all map tasks=3341
Total vcore-seconds taken by all reduce tasks=3465
Total megabyte-seconds taken by all map tasks=1710592
Total megabyte-seconds taken by all reduce tasks=3548160
Map-Reduce Framework
Map input records=7
Map output records=1
Map output bytes=52
Map output materialized bytes=42
Input split bytes=158
Combine input records=1
Combine output records=1
Reduce input groups=1
Reduce shuffle bytes=42
Reduce input records=1
Reduce output records=0
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=129
CPU time spent (ms)=1890
Physical memory (bytes) snapshot=665804800
Virtual memory (bytes) snapshot=5178810368
Total committed heap usage (bytes)=488636416
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=335
File Output Format Counters
Bytes Written=98
17/05/15 19:35:14 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:35:14 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:35:15 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:35:16 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:35:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0017
17/05/15 19:35:17 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0017
17/05/15 19:35:17 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0017/
17/05/15 19:35:17 INFO mapreduce.Job: Running job: job_1494874269419_0017
17/05/15 19:35:24 INFO mapreduce.Job: Job job_1494874269419_0017 running in uber mode : false
17/05/15 19:35:24 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:35:33 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:35:39 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:35:40 INFO mapreduce.Job: Job job_1494874269419_0017 completed successfully
17/05/15 19:35:40 INFO mapreduce.Job: Counters: 52
File System Counters
FILE: Number of bytes read=166
FILE: Number of bytes written=276957
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=545
HDFS: Number of bytes written=447
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=5
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6262
Total time spent by all reduces in occupied slots (ms)=7262
Total time spent by all map tasks (ms)=6262
Total time spent by all reduce tasks (ms)=3631
Total vcore-seconds taken by all map tasks=6262
Total vcore-seconds taken by all reduce tasks=3631
Total megabyte-seconds taken by all map tasks=3206144
Total megabyte-seconds taken by all reduce tasks=3718144
Map-Reduce Framework
Map input records=7
Map output records=22
Map output bytes=476
Map output materialized bytes=158
Input split bytes=158
Combine input records=22
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=158
Reduce input records=8
Reduce output records=5
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=154
CPU time spent (ms)=4190
Physical memory (bytes) snapshot=666284032
Virtual memory (bytes) snapshot=5179322368
Total committed heap usage (bytes)=489684992
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=335
File Output Format Counters
Bytes Written=363
org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
NEGLECTED_OBSERVATIONS=2
ROWS=7
USED_OBSERVATIONS=19
17/05/15 19:35:40 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:35:40 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:35:44 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:35:45 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:35:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0018
17/05/15 19:35:45 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0018
17/05/15 19:35:45 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0018/
17/05/15 19:35:45 INFO mapreduce.Job: Running job: job_1494874269419_0018
17/05/15 19:35:57 INFO mapreduce.Job: Job job_1494874269419_0018 running in uber mode : false
17/05/15 19:35:57 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:36:07 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:36:14 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:36:15 INFO mapreduce.Job: Job job_1494874269419_0018 completed successfully
17/05/15 19:36:15 INFO mapreduce.Job: Counters: 51
File System Counters
FILE: Number of bytes read=160
FILE: Number of bytes written=275869
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=576
HDFS: Number of bytes written=365
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8191
Total time spent by all reduces in occupied slots (ms)=8114
Total time spent by all map tasks (ms)=8191
Total time spent by all reduce tasks (ms)=4057
Total vcore-seconds taken by all map tasks=8191
Total vcore-seconds taken by all reduce tasks=4057
Total megabyte-seconds taken by all map tasks=4193792
Total megabyte-seconds taken by all reduce tasks=4154368
Map-Reduce Framework
Map input records=5
Map output records=19
Map output bytes=632
Map output materialized bytes=152
Input split bytes=129
Combine input records=19
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=152
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=195
CPU time spent (ms)=6470
Physical memory (bytes) snapshot=681738240
Virtual memory (bytes) snapshot=5182803968
Total committed heap usage (bytes)=490733568
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=363
File Output Format Counters
Bytes Written=365
org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
COOCCURRENCES=47
PRUNED_COOCCURRENCES=0
17/05/15 19:36:16 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:36:16 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:36:17 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:36:17 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:36:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0019
17/05/15 19:36:18 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0019
17/05/15 19:36:18 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0019/
17/05/15 19:36:18 INFO mapreduce.Job: Running job: job_1494874269419_0019
17/05/15 19:36:25 INFO mapreduce.Job: Job job_1494874269419_0019 running in uber mode : false
17/05/15 19:36:25 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:36:31 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:36:37 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:36:38 INFO mapreduce.Job: Job job_1494874269419_0019 completed successfully
17/05/15 19:36:38 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=249
FILE: Number of bytes written=273343
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=505
HDFS: Number of bytes written=500
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4469
Total time spent by all reduces in occupied slots (ms)=6400
Total time spent by all map tasks (ms)=4469
Total time spent by all reduce tasks (ms)=3200
Total vcore-seconds taken by all map tasks=4469
Total vcore-seconds taken by all reduce tasks=3200
Total megabyte-seconds taken by all map tasks=2288128
Total megabyte-seconds taken by all reduce tasks=3276800
Map-Reduce Framework
Map input records=7
Map output records=22
Map output bytes=512
Map output materialized bytes=241
Input split bytes=140
Combine input records=22
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=241
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=113
CPU time spent (ms)=3290
Physical memory (bytes) snapshot=665755648
Virtual memory (bytes) snapshot=5179650048
Total committed heap usage (bytes)=486014976
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=365
File Output Format Counters
Bytes Written=500
17/05/15 19:36:38 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:36:38 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:36:38 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:36:38 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:36:39 INFO mapreduce.JobSubmitter: number of splits:2
17/05/15 19:36:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0020
17/05/15 19:36:39 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0020
17/05/15 19:36:39 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0020/
17/05/15 19:36:39 INFO mapreduce.Job: Running job: job_1494874269419_0020
17/05/15 19:36:47 INFO mapreduce.Job: Job job_1494874269419_0020 running in uber mode : false
17/05/15 19:36:47 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:36:54 INFO mapreduce.Job: map 50% reduce 0%
17/05/15 19:36:55 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:37:00 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:37:01 INFO mapreduce.Job: Job job_1494874269419_0020 completed successfully
17/05/15 19:37:01 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=309
FILE: Number of bytes written=410207
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1453
HDFS: Number of bytes written=542
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=11488
Total time spent by all reduces in occupied slots (ms)=6586
Total time spent by all map tasks (ms)=11488
Total time spent by all reduce tasks (ms)=3293
Total vcore-seconds taken by all map tasks=11488
Total vcore-seconds taken by all reduce tasks=3293
Total megabyte-seconds taken by all map tasks=5881856
Total megabyte-seconds taken by all reduce tasks=3372032
Map-Reduce Framework
Map input records=12
Map output records=28
Map output bytes=423
Map output materialized bytes=306
Input split bytes=665
Combine input records=0
Combine output records=0
Reduce input groups=7
Reduce shuffle bytes=306
Reduce input records=28
Reduce output records=7
Spilled Records=56
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=244
CPU time spent (ms)=6050
Physical memory (bytes) snapshot=1123991552
Virtual memory (bytes) snapshot=7530958848
Total committed heap usage (bytes)=849870848
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=542
17/05/15 19:37:01 INFO impl.TimelineClientImpl: Timeline service address: http://slaver1:8188/ws/v1/timeline/
17/05/15 19:37:01 INFO client.RMProxy: Connecting to ResourceManager at slaver1/172.16.180.123:8050
17/05/15 19:37:02 INFO input.FileInputFormat: Total input paths to process : 1
17/05/15 19:37:03 INFO mapreduce.JobSubmitter: number of splits:1
17/05/15 19:37:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494874269419_0021
17/05/15 19:37:03 INFO impl.YarnClientImpl: Submitted application application_1494874269419_0021
17/05/15 19:37:03 INFO mapreduce.Job: The url to track the job: http://slaver1:8088/proxy/application_1494874269419_0021/
17/05/15 19:37:03 INFO mapreduce.Job: Running job: job_1494874269419_0021
17/05/15 19:37:10 INFO mapreduce.Job: Job job_1494874269419_0021 running in uber mode : false
17/05/15 19:37:10 INFO mapreduce.Job: map 0% reduce 0%
17/05/15 19:37:17 INFO mapreduce.Job: map 100% reduce 0%
17/05/15 19:37:24 INFO mapreduce.Job: map 100% reduce 100%
17/05/15 19:37:25 INFO mapreduce.Job: Job job_1494874269419_0021 completed successfully
17/05/15 19:37:25 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=274
FILE: Number of bytes written=273455
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=866
HDFS: Number of bytes written=185
HDFS: Number of read operations=10
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4874
Total time spent by all reduces in occupied slots (ms)=6604
Total time spent by all map tasks (ms)=4874
Total time spent by all reduce tasks (ms)=3302
Total vcore-seconds taken by all map tasks=4874
Total vcore-seconds taken by all reduce tasks=3302
Total megabyte-seconds taken by all map tasks=2495488
Total megabyte-seconds taken by all reduce tasks=3381248
Map-Reduce Framework
Map input records=7
Map output records=19
Map output bytes=768
Map output materialized bytes=266
Input split bytes=137
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=266
Reduce input records=19
Reduce output records=5
Spilled Records=38
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=124
CPU time spent (ms)=2150
Physical memory (bytes) snapshot=597028864
Virtual memory (bytes) snapshot=5181710336
Total committed heap usage (bytes)=401080320
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=542
File Output Format Counters
Bytes Written=185
17/05/15 19:37:25 INFO driver.MahoutDriver: Program took 259068 ms (Minutes: 4.3178)
[hdfs@master ~]$ hadoop fs -cat /data/mahout/project/output/part-r-00000
1 [105:3.5941463,104:3.4639049]
2 [106:3.5,105:2.714964,107:2.0]
3 [103:3.59246,102:3.458911]
4 [107:4.7381864,105:4.2794304,102:4.170158]
5 [103:3.8962872,102:3.8564017,107:3.7692602]
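Each output line has the form userID<TAB>[itemID:predictedScore,...]. As an optional sketch (field layout assumed from the listing above), the top-ranked recommendation per user can be extracted with awk:

# Strip the brackets around field 2, split it on commas, and print the user ID with the first item:score pair
hadoop fs -cat /data/mahout/project/output/part-r-00000 | \
  awk '{gsub(/[\[\]]/, "", $2); split($2, recs, ","); print $1, recs[1]}'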