Running recommendations with Hadoop
The glue that binds the various Mapper and Reducer components together is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the series of MapReduce jobs discussed previously.
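Because RecommenderJob extends Mahout's AbstractJob, and therefore implements Hadoop's Tool interface, it can also be launched programmatically through ToolRunner rather than via the hadoop jar command shown later in this section. A minimal sketch, assuming the Mahout and Hadoop libraries are on the classpath and reusing the same example paths; the wrapper class name is made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class RunRecommenderJob {
  public static void main(String[] args) throws Exception {
    // Same arguments you would pass on the command line; paths are examples.
    String[] jobArgs = {
        "-Dmapred.input.dir=input/input.txt",
        "-Dmapred.output.dir=output",
        "--usersFile", "input/users.txt",
        "--booleanData"
    };
    // ToolRunner parses the -D options into the Configuration
    // and hands the remaining arguments to RecommenderJob.
    int exitCode = ToolRunner.run(new Configuration(), new RecommenderJob(), jobArgs);
    System.exit(exitCode);
  }
}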
Run the example with Wikipedia data
bin/hadoop fs -put links-simple-sorted.txt input/input.txt
bin/hadoop fs -put users.txt input/users.txt
In order to run RecommenderJob and allow Hadoop to run these jobs, you need to combine all of this code into one JAR file, along with all of the code it depends upon. This can be accomplished easily by running mvn clean package from the core/ directory in the Mahout distribution, which produces a file like mahout-core-0.5-job.jar. Alternatively, you can use a precompiled job JAR from Mahout's distribution.
hadoop jar mahout-core-0.5-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData
NOTE
The above command does not run the example code from Mahout in Action, chapter 6; it uses the default implementations that ship with Mahout instead. I consider this a significant drawback of the book. To run the chapter 6 example code, you have to recreate a project based on the classes under org.apache.mahout.cf.taste.hadoop in the mahout-core project and alter some of the code.
OR
Alternatively, use the WikipediaDataConverter tool class from the example code to convert links-simple-sorted.txt into the default input format (userID,itemID), and then run the commands above. However, this approach hides everything you learned in chapter 6 behind Mahout's default implementation, so the better way is still to recreate a new project and run the example code directly.
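The actual WikipediaDataConverter source is not reproduced here. As a rough, hypothetical sketch of what such a conversion involves (the class name and argument handling below are made up, and it assumes the "userID: itemID itemID ..." line format of links-simple-sorted.txt):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

// Hypothetical converter: turns "userID: itemID itemID ..." lines into one
// "userID,itemID" line per link, the default input format RecommenderJob expects.
public class SimpleLinksToPrefsConverter {
  public static void main(String[] args) throws Exception {
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
         PrintWriter out = new PrintWriter(args[1])) {
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split(":\\s*");
        if (parts.length < 2) {
          continue; // skip lines without any outgoing links
        }
        String userID = parts[0].trim();
        for (String itemID : parts[1].trim().split("\\s+")) {
          out.println(userID + "," + itemID);
        }
      }
    }
  }
}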
------------------
How to alter the code is shown below:
org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob
//convert items to an internal index
Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX),
    TextInputFormat.class, ItemIDIndexMapper.class,
    VarIntWritable.class, VarLongWritable.class,
    ItemIDIndexReducer.class, VarIntWritable.class,
    VarLongWritable.class, SequenceFileOutputFormat.class);
itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
=====>
//convert items to an internal index
Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX),
    TextInputFormat.class, WikipediaItemIDIndexMapper.class,
    VarIntWritable.class, VarLongWritable.class,
    ItemIDIndexReducer.class, VarIntWritable.class,
    VarLongWritable.class, SequenceFileOutputFormat.class);
itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
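WikipediaItemIDIndexMapper comes from the example project, not from Mahout itself. As a hedged sketch of what it has to do (the actual listing may differ), it parses a "userID: itemID itemID ..." line and emits the same (hashed index, item ID) pairs that Mahout's stock ItemIDIndexMapper emits for its one-preference-per-line input:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
import org.apache.mahout.math.VarIntWritable;
import org.apache.mahout.math.VarLongWritable;

// Sketch only: emits (internal index, item ID) for every linked article on a line.
public final class WikipediaItemIDIndexMapper
    extends Mapper<LongWritable, Text, VarIntWritable, VarLongWritable> {

  private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Matcher m = NUMBERS.matcher(value.toString());
    if (!m.find()) {
      return; // skip lines with no article IDs
    }
    // the first number is the source article (the "user") and is not an item
    while (m.find()) {
      long itemID = Long.parseLong(m.group());
      int index = TasteHadoopUtils.idToIndex(itemID);
      context.write(new VarIntWritable(index), new VarLongWritable(itemID));
    }
  }
}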
--
//convert user preferences into a vector per user
Job toUserVectors = prepareJob(getInputPath(),
    getOutputPath(USER_VECTORS),
    TextInputFormat.class,
    ToItemPrefsMapper.class,
    VarLongWritable.class,
    booleanData ? VarLongWritable.class : EntityPrefWritable.class,
    ToUserVectorsReducer.class,
    VarLongWritable.class,
    VectorWritable.class,
    SequenceFileOutputFormat.class);
=====>
//convert user preferences into a vector per user
Job toUserVectors = prepareJob(getInputPath(),
    getOutputPath(USER_VECTORS),
    TextInputFormat.class,
    WikipediaToItemPrefsMapper.class,
    VarLongWritable.class,
    booleanData ? VarLongWritable.class : EntityPrefWritable.class,
    WikipediaToUserVectorReducer.class,
    VarLongWritable.class,
    VectorWritable.class,
    SequenceFileOutputFormat.class);
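For reference, here is a sketch of what WikipediaToItemPrefsMapper and WikipediaToUserVectorReducer look like, along the lines of the chapter 6 listings (details may differ from the book): the mapper emits one (userID, itemID) pair per outgoing link, and the reducer collects each user's item IDs into a sparse boolean-preference vector.

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VarLongWritable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Sketch of the chapter 6 mapper: one (userID, itemID) pair per link.
class WikipediaToItemPrefsMapper
    extends Mapper<LongWritable, Text, VarLongWritable, VarLongWritable> {

  private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Matcher m = NUMBERS.matcher(value.toString());
    if (!m.find()) {
      return; // no source article ID on this line
    }
    VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
    VarLongWritable itemID = new VarLongWritable();
    while (m.find()) {
      itemID.set(Long.parseLong(m.group()));
      context.write(userID, itemID);
    }
  }
}

// Sketch of the chapter 6 reducer: boolean preferences become a 1.0 entry per item.
class WikipediaToUserVectorReducer
    extends Reducer<VarLongWritable, VarLongWritable, VarLongWritable, VectorWritable> {

  @Override
  protected void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs,
      Context context) throws IOException, InterruptedException {
    Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
    for (VarLongWritable itemPref : itemPrefs) {
      userVector.set((int) itemPref.get(), 1.0f);
    }
    context.write(userID, new VectorWritable(userVector));
  }
}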
Run the samples on Hadoop
Environment: Mahout 0.9, Hadoop 2.3.0
mvn clean package -Dhadoop2.version=2.3.0 -DskipTests
mvn clean package -Dhadoop.version=2.3.0 -DskipTests