Sparseness Constraints Based on the L1 and L2 Norms in Non-negative Matrix Factorization

L1 and L2 Norms

    Suppose the objective function to be minimized is:

                    E(x) = f(x) + r(x)

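    Here f(x) is the loss function, which measures the training loss of the model and can be any differentiable convex function. r(x) is the regularization term, which restricts the model. Depending on the prior distribution assumed for the model parameters, r(x) is typically an L1-norm constraint (parameters follow a Laplace distribution) or an L2-norm constraint (parameters follow a Gaussian distribution); other constraints are generally combinations of the two.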

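    The L1-norm constraint generally takes the form:

                    r(x) = λ ||x||_1 = λ Σ_i |x_i|

    The L2-norm constraint generally takes the form:

                    r(x) = λ ||x||_2^2 = λ Σ_i x_i^2

    where λ > 0 is the regularization weight.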

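    The L1 norm tends to produce sparse solutions and therefore performs a degree of feature selection, which is useful when solving problems in high-dimensional feature spaces; the L2 norm is used mainly to prevent overfitting.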
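
    One way to see why the L1 penalty yields zeros while the L2 penalty only shrinks is to look at the one-dimensional minimizers of 0.5*(z - x)^2 + r(z). The sketch below is only an illustration under that assumption; the function names prox_l1 and prox_l2 and the sample vector are made up for this example.

```python
def prox_l1(x, lam):
    """Elementwise argmin_z 0.5*(z - x)**2 + lam*|z| (soft-thresholding)."""
    out = []
    for v in x:
        shrunk = max(abs(v) - lam, 0.0)  # entries with |v| <= lam become 0
        out.append(shrunk if v > 0 else -shrunk)
    return out


def prox_l2(x, lam):
    """Elementwise argmin_z 0.5*(z - x)**2 + lam*z**2 (uniform shrinkage)."""
    return [v / (1.0 + 2.0 * lam) for v in x]


x = [3.0, -0.2, 0.05, -1.5]
print(prox_l1(x, 0.5))  # small entries are set to zero
print(prox_l2(x, 0.5))  # every entry is scaled, none becomes zero
```

    With the L1 penalty every entry whose magnitude is below the threshold is removed, whereas the L2 penalty keeps all entries and only reduces their size.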

Sparseness Constraints

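    The paper Non-negative Matrix Factorization With Sparseness Constraints combines the L1 and L2 norms into a new constraint, using a sparseness measure to express the relationship between the two norms:

                    sparseness(x) = (sqrt(n) - (Σ_i |x_i|) / sqrt(Σ_i x_i^2)) / (sqrt(n) - 1)

    When the vector x has only one nonzero component, the sparseness is 1; when all components are nonzero and equal, the sparseness is 0. Here n is the dimension of x. Vectors at different sparseness levels look like this: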

                    [Figure 1: example vectors at different sparseness levels]
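
    The sparseness measure from the paper, (sqrt(n) - L1/L2) / (sqrt(n) - 1), can be checked numerically on the two extreme cases described above. This is a minimal sketch; the function name sparseness is chosen for this example, and it assumes n > 1 and a nonzero vector.

```python
import math


def sparseness(x):
    """Sparseness of x: 1 for a single nonzero entry, 0 for all-equal entries."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


print(sparseness([0, 0, 5, 0]))  # only one nonzero component
print(sparseness([2, 2, 2, 2]))  # all components equal
print(sparseness([1, 0, 3, 0]))  # something in between
```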

    NMF with Sparseness Constraint

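    Objective function (as given in the cited paper):

                    minimize   E(W, H) = ||V - WH||^2
                    subject to W >= 0, H >= 0,
                               sparseness(w_i) = S_w for every column w_i of W,
                               sparseness(h_i) = S_h for every row h_i of H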
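
    The quantity minimized in the cited paper is the squared reconstruction error ||V - WH||^2. As a small illustration, it can be evaluated for matrices stored as plain lists of rows; the function name reconstruction_error and the toy matrices are assumptions for this example only.

```python
def reconstruction_error(V, W, H):
    """Squared Frobenius error ||V - WH||^2 for matrices given as lists of rows."""
    err = 0.0
    for i, v_row in enumerate(V):
        for j, v_ij in enumerate(v_row):
            # (WH)[i][j] = sum_k W[i][k] * H[k][j]
            approx = sum(W[i][k] * H[k][j] for k in range(len(H)))
            err += (v_ij - approx) ** 2
    return err


W = [[1.0, 0.0], [0.0, 1.0]]
H = [[1.0, 2.0], [3.0, 4.0]]
print(reconstruction_error([[1.0, 2.0], [3.0, 4.0]], W, H))  # exact factorization
print(reconstruction_error([[1.0, 2.0], [3.0, 5.0]], W, H))  # one entry off by 1
```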

    The algorithm proceeds as follows:

                    [Figure 3: algorithm flow]

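    A key step of the algorithm is the projection operator: given a vector x and target L1 and L2 norms (which together fix the sparseness), find the closest vector that has those norms. The projection algorithm is as follows:

                    [Figure 4: projection algorithm]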
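
    Because the sparseness depends only on the ratio L1/L2, fixing the L2 norm and the desired sparseness determines the L1 norm that the projection has to hit: solving the sparseness formula for L1 gives L1 = L2 * (sqrt(n) - (sqrt(n) - 1) * s). A minimal sketch, with the hypothetical helper name l1_target:

```python
import math


def l1_target(n, s, l2=1.0):
    """L1 norm that an n-dimensional vector with L2 norm l2 needs for sparseness s."""
    return l2 * (math.sqrt(n) - (math.sqrt(n) - 1.0) * s)


print(l1_target(4, 1.0))       # sparseness 1: L1 equals L2
print(l1_target(4, 0.0, 3.0))  # sparseness 0: L1 equals sqrt(n) * L2
print(l1_target(4, 0.5))       # halfway between the two extremes
```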

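    The algorithm converges in at most dim(x) iterations, because each iteration sets at least one additional component to zero, so it is fast in practice. The MATLAB code for the algorithm is available at http://www.cs.helsinki.fi/patrik.hoyer/, and Python code for the projection step is given below: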

#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import division
import math


def l1sparse(dimension, desiredsparseness):
    """L1 norm that a unit-L2-norm vector needs for the desired sparseness.

    desiredsparseness can be set to e.g. 0.1, 0.2, 0.3, 0.4, 0.5.
    """
    return math.sqrt(dimension) - (math.sqrt(dimension) - 1) * desiredsparseness


def vsum(vector):
    """Sum of the elements."""
    total = 0
    for v in vector:
        total += v
    return total


def v2sum(vector):
    """Sum of the squared elements (squared L2 norm)."""
    total = 0
    for v in vector:
        total += v * v
    return total


def vadd(vector, factor):
    """Add a scalar to every element."""
    return [v + factor for v in vector]


def vmultip(vector, factor):
    """Multiply every element by a scalar."""
    return [v * factor for v in vector]


def ones(dimension, num):
    """Vector of the given dimension filled with num."""
    return [num] * dimension


def vdec(svector, dvector):
    """Elementwise difference svector - dvector."""
    return [svector[i] - dvector[i] for i in range(len(svector))]


def vaddv(svector, dvector):
    """Elementwise sum svector + dvector."""
    return [svector[i] + dvector[i] for i in range(len(svector))]


def vmultipv(svector, dvector):
    """Inner product of svector (as a row) and dvector (as a column)."""
    total = 0
    for i in range(len(svector)):
        total += svector[i] * dvector[i]
    return total


def checknon(svector):
    """True if no element is negative."""
    for v in svector:
        if v < 0:
            return False
    return True


def findne(svector):
    """Indices of the non-positive elements (these are fixed at zero)."""
    return [i for i in range(len(svector)) if svector[i] <= 0]


def projfuc(svector, l1norm, l2norm, nn):
    """Solve the following problem: given a vector svector, find the vector k
    with sum(abs(k)) = l1norm and sum(k**2) = l2norm (l2norm is the squared
    L2 norm) that is closest to svector in Euclidean distance. If nn is set
    to 1, the elements of k are additionally restricted to be non-negative."""
    N = len(svector)
    # Without the non-negativity flag, remember the signs and work on abs values
    if not nn:
        isneg = [s < 0 for s in svector]
        svector = [abs(s) for s in svector]
    # Start by projecting onto the hyperplane sum(v) = l1norm
    factor = (l1norm - vsum(svector)) / N
    v = vadd(svector, factor)
    zerov = []  # indices currently fixed at zero
    while True:
        # Move from the midpoint through v until sum(v**2) = l2norm
        factor = l1norm / (N - len(zerov))
        midpoint = vmultip(ones(N, 1), factor)
        for vp in zerov:
            midpoint[vp] = 0
        w = vdec(v, midpoint)
        a = v2sum(w)
        b = vmultipv(w, v) * 2
        c = v2sum(v) - l2norm
        # Clamp the discriminant at 0 so that math.sqrt cannot fail
        alphap = (-b + math.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        vnew = vaddv(vmultip(w, alphap), v)
        if checknon(vnew):
            v = vnew
            break
        # Fix non-positive components at zero and redistribute the L1 mass
        zerov = findne(vnew)
        for vp in zerov:
            vnew[vp] = 0
        factor = (l1norm - vsum(vnew)) / (N - len(zerov))
        v = vadd(vnew, factor)
        for vp in zerov:
            v[vp] = 0
    # Without the non-negativity flag, restore the original signs
    if not nn:
        v = [-x if neg else x for x, neg in zip(v, isneg)]
    return v
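
    To sanity-check a projection of this kind, the following self-contained sketch re-implements the non-negative case compactly and verifies that the result meets both norm targets. The names project and sparseness, the input vector and the target sparseness 0.9 are assumptions made for this example; it is not the author's code.

```python
import math


def sparseness(x):
    """(sqrt(n) - L1/L2) / (sqrt(n) - 1), assuming n > 1 and x nonzero."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def project(s, l1, l2sq):
    """Non-negative case only: v >= 0 with sum(v) = l1 and sum(v*v) = l2sq."""
    n = len(s)
    v = [x + (l1 - sum(s)) / n for x in s]  # project onto sum(v) = l1
    zeros = set()                           # indices fixed at zero
    while True:
        mid = [0.0 if i in zeros else l1 / (n - len(zeros)) for i in range(n)]
        w = [vi - mi for vi, mi in zip(v, mid)]
        a = sum(wi * wi for wi in w)
        b = 2 * sum(wi * vi for wi, vi in zip(w, v))
        c = sum(vi * vi for vi in v) - l2sq
        alpha = (-b + math.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        v = [vi + alpha * wi for vi, wi in zip(v, w)]
        if all(vi >= 0 for vi in v):
            return v
        # Zero out non-positive entries, then restore the L1 target on the rest
        zeros = {i for i, vi in enumerate(v) if vi <= 0}
        v = [0.0 if i in zeros else vi for i, vi in enumerate(v)]
        shift = (l1 - sum(v)) / (n - len(zeros))
        v = [0.0 if i in zeros else vi + shift for i, vi in enumerate(v)]


s = [0.1, 0.5, 0.2, 0.9]
n = len(s)
l1 = math.sqrt(n) - (math.sqrt(n) - 1) * 0.9  # L1 target for unit L2 norm
v = project(s, l1, 1.0)
print(v)
print(sum(v), sum(x * x for x in v), sparseness(v))
```

    For this input the first pass produces two negative entries, so the loop runs a second time with those positions fixed at zero, which matches the description of the iteration above.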

