Basic usage of libFM

Reference: http://blog.csdn.net/chloezhao/article/details/53462411

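Factorization machine algorithms such as FM and FFM have proven very effective for CTR prediction. libFM, the open-source FM implementation covered here, was written by Steffen Rendle. libffm, the open-source FFM implementation, comes from National Taiwan University, the same group that developed the widely used LibSVM and LIBLINEAR, and their code is both efficient and well written. This post introduces the basic usage of libFM, which can be used for recommender systems as well as for CTR prediction in computational advertising.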


Download and build:

http://www.libfm.org/

After downloading, enter the extracted source directory and run make to build. Three executables are generated in the bin directory:

convert		libFM		transpose
 
  
convert: a tool that converts text files to the binary format
libFM: the tool used to train models
transpose: a tool that transposes a binary design matrix

Parameters:

First run ./libFM -help to see the built-in options:
----------------------------------------------------------------------------
libFM
  Version: 1.4.2
  Author:  Steffen Rendle, [email protected]
  WWW:     http://www.libfm.org/
This program comes with ABSOLUTELY NO WARRANTY; for details see license.txt.
This is free software, and you are welcome to redistribute it under certain
conditions; for details see license.txt.
----------------------------------------------------------------------------
-cache_size     cache size for data storage (only applicable if data is
                in binary format), default=infty
-dim            'k0,k1,k2': k0=use bias, k1=use 1-way interactions,
                k2=dim of 2-way interactions; default=1,1,8
-help           this screen
-init_stdev     stdev for initialization of 2-way factors; default=0.1
-iter           number of iterations; default=100
-learn_rate     learn_rate for SGD; default=0.1
-meta           filename for meta information about data set
-method         learning method (SGD, SGDA, ALS, MCMC); default=MCMC
-out            filename for output
-regular        'r0,r1,r2' for SGD and ALS: r0=bias regularization,
                r1=1-way regularization, r2=2-way regularization
-relation       BS: filenames for the relations, default=''
-rlog           write measurements within iterations to a file;
                default=''
-task           r=regression, c=binary classification [MANDATORY]
-test           filename for test data [MANDATORY]
-train          filename for training data [MANDATORY]
-validation     filename for validation data (only for SGDA)
-verbosity      how much infos to print; default=0
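
The -dim option in the help text maps onto the three parts of the factorization machine model: a global bias (k0), per-feature weights (k1), and k2-dimensional factor vectors whose dot products weight the pairwise interactions. The following is a minimal pure-Python sketch of that scoring rule, not libFM's actual code; the function name fm_predict and the dict-based parameter layout are assumptions made for illustration.

```python
def fm_predict(x, w0, w, v):
    """Score one sparse row with a second-order factorization machine.

    x  : dict {feature_index: value}, the non-zero entries of the row
    w0 : global bias (the k0 part of -dim)
    w  : dict {feature_index: weight}, the 1-way weights (k1)
    v  : dict {feature_index: list of k2 floats}, the 2-way factors (k2)
    """
    # Bias plus linear (1-way) terms.
    score = w0 + sum(w.get(i, 0.0) * xi for i, xi in x.items())

    # Pairwise terms via the identity
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    k = len(next(iter(v.values()))) if v else 0
    for f in range(k):
        s = 0.0      # sum_i v_if * x_i
        s_sq = 0.0   # sum_i (v_if * x_i)^2
        for i, xi in x.items():
            vif = v[i][f] if i in v else 0.0
            s += vif * xi
            s_sq += (vif * xi) ** 2
        score += 0.5 * (s * s - s_sq)
    return score


# Toy parameters with k2 = 2 (hypothetical values, not a trained model).
row = {0: 1.0, 2: 2.0}
print(fm_predict(row, 0.1, {0: 0.2, 2: -0.1}, {0: [1.0, 0.5], 2: [0.5, 1.0]}))
```

Because the pairwise term is rewritten as a difference of sums, the cost per row is proportional to the number of non-zero features times k2 rather than to the number of feature pairs.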

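As shown above, -task can be r or c: r is for regression, which suits scenarios such as recommender systems, and c is for binary classification, which suits scenarios such as CTR prediction. -iter is the number of iterations and defaults to 100.

-method selects the learning method: SGD, SGDA, ALS, or MCMC. ALS is also the algorithm behind the recommendation module that ships with Spark. The default is MCMC, which needs no regularization options; -regular applies only to SGD and ALS. -learn_rate is the learning rate for SGD; it must be greater than 0 and is usually less than 1.

-train is the training data file, which can be in one of two formats: a libsvm-format text file, or a binary file of the kind produced by the convert tool (I have not used the binary format myself; see the reference above for details).

-out is the file to which the predictions for the test data are written.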
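
For readers who have not met the libsvm text format: each line is a label followed by index:value pairs for the non-zero features. Below is a minimal sketch of writing such a file from Python; the helper names to_libsvm_line and write_libsvm and the (label, dict) row layout are assumptions made for illustration, not part of libFM.

```python
def to_libsvm_line(label, features):
    """Format one example as 'label idx:val idx:val ...'.

    features is a dict {feature_index: value}; zero values are omitted
    because the format is sparse. Values are written with at most
    6 significant digits (the 'g' format default).
    """
    parts = [f"{label:g}"]
    for idx in sorted(features):
        value = features[idx]
        if value != 0:
            parts.append(f"{idx}:{value:g}")
    return " ".join(parts)


def write_libsvm(path, rows):
    """Write an iterable of (label, features) pairs, one example per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for label, features in rows:
            fh.write(to_libsvm_line(label, features) + "\n")


# Two toy rows (hypothetical data).
print(to_libsvm_line(1, {3: 0.5, 0: 1}))
print(to_libsvm_line(0, {2: 0, 5: 2.0}))
```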



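I ran it on my own text classification data; the training and test sets had already been converted to libsvm format. Note that the command uses -task r, so libFM fits the 0/1 labels as a regression target:
./libFM -task r -train train.txt -test test.txt -dim '1,1,8' -iter 100 -method sgd -learn_rate 0.01 -regular '0,0,0.01' -out model.txt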
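
If you launch libFM from a script, passing the arguments as a list sidesteps shell quoting problems around the comma-separated values. The following is a rough sketch using only the flags listed in the help text; the function names libfm_args and run_libfm and the default values (copied from the command above) are assumptions for illustration.

```python
import subprocess


def libfm_args(binary, task, train, test, out, dim=(1, 1, 8), iters=100,
               method="sgd", learn_rate=0.01, regular=(0, 0, 0.01)):
    """Build the argument list for one libFM run (nothing is executed here)."""
    return [
        binary,
        "-task", task,
        "-train", train,
        "-test", test,
        "-dim", ",".join(str(d) for d in dim),
        "-iter", str(iters),
        "-method", method,
        "-learn_rate", str(learn_rate),
        "-regular", ",".join(str(r) for r in regular),
        "-out", out,
    ]


def run_libfm(args):
    """Run libFM and raise CalledProcessError if it exits with an error."""
    subprocess.run(args, check=True)


args = libfm_args("./libFM", "r", "train.txt", "test.txt", "model.txt")
print(" ".join(args))
```

Calling run_libfm(args) from the directory that contains the libFM binary and the data files would then start the same run as the command line above.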


The training output is as follows (with -task r, the Train and Test columns are RMSE):

----------------------------------------------------------------------------
libFM
  Version: 1.4.2
  Author:  Steffen Rendle, [email protected]
  WWW:     http://www.libfm.org/
This program comes with ABSOLUTELY NO WARRANTY; for details see license.txt.
This is free software, and you are welcome to redistribute it under certain
conditions; for details see license.txt.
----------------------------------------------------------------------------
Loading train...	
has x = 1
has xt = 0
num_rows=61576	num_values=3998509	num_features=209260	min_target=0	max_target=1
Loading test... 	
has x = 1
has xt = 0
num_rows=15436	num_values=1005321	num_features=209258	min_target=0	max_target=1
#relations: 0
Loading meta data...	
learnrate=0.01
learnrates=0.01,0.01,0.01
#iterations=100
SGD: DON'T FORGET TO SHUFFLE THE ROWS IN TRAINING DATA TO GET THE BEST RESULTS.
#Iter=  0	Train=0.347366	Test=0.350806
#Iter=  1	Train=0.288766	Test=0.294584
#Iter=  2	Train=0.260503	Test=0.268643
#Iter=  3	Train=0.243147	Test=0.253436
#Iter=  4	Train=0.231104	Test=0.243393
#Iter=  5	Train=0.22209	Test=0.236236
#Iter=  6	Train=0.214967	Test=0.230852
#Iter=  7	Train=0.209114	Test=0.226627
#Iter=  8	Train=0.204162	Test=0.22322
#Iter=  9	Train=0.199886	Test=0.220407
#Iter= 10	Train=0.196128	Test=0.218044
#Iter= 11	Train=0.192778	Test=0.216032
#Iter= 12	Train=0.189756	Test=0.214291
#Iter= 13	Train=0.187004	Test=0.212767
#Iter= 14	Train=0.184479	Test=0.211426
#Iter= 15	Train=0.182146	Test=0.210239
#Iter= 16	Train=0.179977	Test=0.209181
#Iter= 17	Train=0.177952	Test=0.208231
#Iter= 18	Train=0.176051	Test=0.20737
#Iter= 19	Train=0.17426	Test=0.206581
#Iter= 20	Train=0.172567	Test=0.205863
#Iter= 21	Train=0.170962	Test=0.205207
#Iter= 22	Train=0.169436	Test=0.204604
#Iter= 23	Train=0.167981	Test=0.204051
#Iter= 24	Train=0.166591	Test=0.203541
#Iter= 25	Train=0.165259	Test=0.203069
#Iter= 26	Train=0.163982	Test=0.202631
#Iter= 27	Train=0.162755	Test=0.202224
#Iter= 28	Train=0.161574	Test=0.201845
#Iter= 29	Train=0.160435	Test=0.201489
#Iter= 30	Train=0.159336	Test=0.201155
#Iter= 31	Train=0.158274	Test=0.200843
#Iter= 32	Train=0.157246	Test=0.200546
#Iter= 33	Train=0.15625	Test=0.200266
#Iter= 34	Train=0.155285	Test=0.200002
#Iter= 35	Train=0.154348	Test=0.199752
#Iter= 36	Train=0.153438	Test=0.199518
#Iter= 37	Train=0.152553	Test=0.199298
#Iter= 38	Train=0.151692	Test=0.199091
#Iter= 39	Train=0.150851	Test=0.198896
#Iter= 40	Train=0.150031	Test=0.198712
#Iter= 41	Train=0.149231	Test=0.198539
#Iter= 42	Train=0.148449	Test=0.198375
#Iter= 43	Train=0.147683	Test=0.198219
#Iter= 44	Train=0.146933	Test=0.198071
#Iter= 45	Train=0.1462	Test=0.197931
#Iter= 46	Train=0.145483	Test=0.197799
#Iter= 47	Train=0.144781	Test=0.197673
#Iter= 48	Train=0.144094	Test=0.197553
#Iter= 49	Train=0.143421	Test=0.197441
#Iter= 50	Train=0.142761	Test=0.197334
#Iter= 51	Train=0.142114	Test=0.197233
#Iter= 52	Train=0.14148	Test=0.197138
#Iter= 53	Train=0.140855	Test=0.197044
#Iter= 54	Train=0.140241	Test=0.196956
#Iter= 55	Train=0.139637	Test=0.196871
#Iter= 56	Train=0.139043	Test=0.19679
#Iter= 57	Train=0.138459	Test=0.196714
#Iter= 58	Train=0.137885	Test=0.196642
#Iter= 59	Train=0.13732	Test=0.196574
#Iter= 60	Train=0.136765	Test=0.196509
#Iter= 61	Train=0.136218	Test=0.196448
#Iter= 62	Train=0.135678	Test=0.196391
#Iter= 63	Train=0.135147	Test=0.196337
#Iter= 64	Train=0.134623	Test=0.196285
#Iter= 65	Train=0.134107	Test=0.196236
#Iter= 66	Train=0.133598	Test=0.196189
#Iter= 67	Train=0.133096	Test=0.196144
#Iter= 68	Train=0.132601	Test=0.196103
#Iter= 69	Train=0.132113	Test=0.196064
#Iter= 70	Train=0.131631	Test=0.196027
#Iter= 71	Train=0.131156	Test=0.195992
#Iter= 72	Train=0.130687	Test=0.195959
#Iter= 73	Train=0.130225	Test=0.195928
#Iter= 74	Train=0.129768	Test=0.195899
#Iter= 75	Train=0.129316	Test=0.195871
#Iter= 76	Train=0.12887	Test=0.195844
#Iter= 77	Train=0.12843	Test=0.195819
#Iter= 78	Train=0.127995	Test=0.195797
#Iter= 79	Train=0.127566	Test=0.195775
#Iter= 80	Train=0.127141	Test=0.195756
#Iter= 81	Train=0.126722	Test=0.195737
#Iter= 82	Train=0.126308	Test=0.195721
#Iter= 83	Train=0.125899	Test=0.195705
#Iter= 84	Train=0.125495	Test=0.19569
#Iter= 85	Train=0.125095	Test=0.195676
#Iter= 86	Train=0.124699	Test=0.195664
#Iter= 87	Train=0.124308	Test=0.195653
#Iter= 88	Train=0.123922	Test=0.195643
#Iter= 89	Train=0.123539	Test=0.195634
#Iter= 90	Train=0.123161	Test=0.195624
#Iter= 91	Train=0.122786	Test=0.195616
#Iter= 92	Train=0.122415	Test=0.195609
#Iter= 93	Train=0.122047	Test=0.195603
#Iter= 94	Train=0.121683	Test=0.195598
#Iter= 95	Train=0.121323	Test=0.195595
#Iter= 96	Train=0.120965	Test=0.195592
#Iter= 97	Train=0.120611	Test=0.19559
#Iter= 98	Train=0.120259	Test=0.19559
#Iter= 99	Train=0.119911	Test=0.19559
Final	Train=0.119911	Test=0.19559
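
The log reminds you to shuffle the training rows before running SGD. One simple way to do that, sketched here under the assumption that the training file fits in memory (the function name shuffle_lines and the output file name are made up for illustration):

```python
import random


def shuffle_lines(src, dst, seed=0):
    """Write the lines of src to dst in random order and return the row count.

    A fixed seed keeps the shuffle reproducible between runs.
    """
    with open(src, encoding="utf-8") as fh:
        # splitlines() drops the newlines, so a missing final newline
        # cannot glue two rows together after shuffling.
        lines = fh.read().splitlines()
    random.Random(seed).shuffle(lines)
    with open(dst, "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return len(lines)
```

For example, shuffle_lines("train.txt", "train.shuffled.txt") writes a shuffled copy that can then be passed to -train.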

Here is an excerpt of the prediction file written by -out:
0
1
1
0
1
1
0
0.260587
0
1
1
0.717071
0
1

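Each line is the predicted value for the corresponding test row. Because this run used -task r, these are regression outputs truncated to the target range [0, 1], which is why so many of them are exactly 0 or 1, rather than calibrated probabilities. With -task c, libFM writes the predicted probability of the positive class instead.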
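
The prediction file has one value per line, in the same order as the rows of the test file, so it can be scored against the test labels directly. Below is a minimal sketch, assuming 0/1 labels in a libsvm-format test file; the function names and the 0.5 threshold are illustrative choices, not something libFM prescribes.

```python
import math


def read_labels(path):
    """Return the label (first token) of every non-empty libsvm line."""
    with open(path, encoding="utf-8") as fh:
        return [float(line.split()[0]) for line in fh if line.strip()]


def read_predictions(path):
    """Return the predictions written by -out, one float per line."""
    with open(path, encoding="utf-8") as fh:
        return [float(line) for line in fh if line.strip()]


def rmse(labels, preds):
    """Root mean squared error between labels and predictions."""
    total = sum((y - p) ** 2 for y, p in zip(labels, preds))
    return math.sqrt(total / len(labels))


def accuracy(labels, preds, threshold=0.5):
    """Fraction of rows where the thresholded prediction matches the label."""
    hits = sum((p >= threshold) == (y >= threshold)
               for y, p in zip(labels, preds))
    return hits / len(labels)
```

With the files from this post, rmse(read_labels("test.txt"), read_predictions("model.txt")) should come out close to the final Test value printed in the log.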


This post is still incomplete; I will follow up with libffm later.

