The test data are randomly generated:
randn('state',0);
m = 16; n = 8;
A = randn(m,n);
b = randn(m,1);
M = 2;
$$
\underset{\beta}{\text{minimize}}\ \sum_{i=1}^{m} \text{huber}(\beta^T x_i - y_i)
$$
$$
\text{huber}(u)=\begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}
$$
The Huber penalty limits the influence of large residuals, so that the regression coefficients are not pulled too far toward samples with large errors (over-fitting to outliers); $M$ is a constant.
When $|\beta^T x_i - y_i| > M$, write $|u| = |\beta^T x_i - y_i| = M + v$ with $v > 0$; then $M(2|u| - M) = M(2(M+v) - M) = M^2 + 2Mv = (M+v)^2 - v^2$.
Here $v$ is the part of the residual that exceeds $M$. To slow the growth of the penalty for large residuals, the Huber function effectively discards the quadratic term $v^2$.
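As a quick sanity check (a minimal plain-Python sketch, separate from the CVX code below; the `huber` helper and the sample values of `M` and `v` are illustrative assumptions), the identity above can be verified numerically:

```python
def huber(u, M=2.0):
    """Huber penalty: quadratic for |u| <= M, linear beyond."""
    a = abs(u)
    return a * a if a <= M else M * (2 * a - M)

M = 2.0
v = 1.5                 # amount by which |u| exceeds M
u = M + v               # so |u| > M, linear branch applies
# M(2|u| - M) equals (M + v)^2 - v^2: the v^2 term is dropped
assert abs(huber(u, M) - ((M + v) ** 2 - v ** 2)) < 1e-12
```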
disp('Computing the solution of the robust least-squares problem...');
cvx_begin
variable x1(n)
minimize( sum(huber(A*x1-b,M)) )
cvx_end
Formulation with variable weights
disp('Computing the solution of the least-squares problem with variable weights...');
cvx_begin
variable x2(n)
variable w(m)
minimize( sum(quad_over_lin(diag(A*x2-b),w'+1)) + M^2*ones(1,m)*w)
w >= 0;
cvx_end
This formulation feels somewhat abrupt and is less intuitive than the first one.
First consider the per-sample cost
$$
f(w) = \frac{u^2}{w+1} + M^2 w, \qquad u = \beta^T x_i - y_i,\ w \ge 0.
$$
Its derivative is
$$
f'(w) = M^2 - \frac{u^2}{(w+1)^2}.
$$
When $|u| \le M$, $f'(w) \ge 0$ for all $w \ge 0$, so $f$ is increasing and the minimum is attained at $w^* = 0$, giving $f(0) = u^2$. When $|u| > M$, setting $f'(w) = 0$ gives $w^* = |u|/M - 1 > 0$. Substituting back into $f(w)$ gives
$$
\min_{w \ge 0} f(w) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}
$$
which is exactly the Huber function.
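This equivalence can also be checked numerically with a plain-Python sketch (the closed-form minimizer $w^* = \max(|u|/M - 1,\ 0)$ follows from the derivative above; the sample values of `u` are arbitrary):

```python
def huber(u, M=2.0):
    """Huber penalty: quadratic for |u| <= M, linear beyond."""
    a = abs(u)
    return a * a if a <= M else M * (2 * a - M)

def f(w, u, M=2.0):
    """Per-sample weighted cost u^2/(w+1) + M^2*w."""
    return u * u / (w + 1.0) + M * M * w

M = 2.0
for u in (0.5, 1.9, 2.0, 3.7):
    w_star = max(abs(u) / M - 1.0, 0.0)   # stationary point, clipped at 0
    # the minimum of f over w >= 0 equals the Huber penalty
    assert abs(f(w_star, u, M) - huber(u, M)) < 1e-12
    # nearby feasible w give no smaller value, so w_star is a minimizer
    for dw in (0.1, 0.5):
        assert f(w_star + dw, u, M) >= f(w_star, u, M) - 1e-12
        if w_star - dw >= 0.0:
            assert f(w_star - dw, u, M) >= f(w_star, u, M) - 1e-12
```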
One point worth explaining is the meaning of `quad_over_lin(diag(A*x2-b),w'+1)`.
$A*x2 - b \in \mathbb{R}^{m \times 1}$ is a vector, and `diag` turns it into a diagonal matrix whose diagonal entries are the residuals. My understanding is that this is done purely so that `quad_over_lin(x,y)` can be applied column by column, pairing the $i$-th residual with the $i$-th entry of `w'+1`:
$$
f_{\text{quad\_over\_lin}}(x,y) = \begin{cases} x^T x / y, & y > 0 \\ +\infty, & y \le 0 \end{cases}
$$
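To see why `diag` helps: column $i$ of `diag(r)` has a single nonzero entry $r_i$, so $x^T x / y$ for that column reduces to $r_i^2/(w_i + 1)$. A plain-Python analogue of this column-wise reduction (the residuals `r` and weights `w` here are made-up sample values):

```python
# Hypothetical analogue of summing quad_over_lin over the columns of
# diag(r): each column contributes r_i^2 / (w_i + 1).
r = [1.0, -2.0, 0.5]    # sample residuals (A*x2 - b)
w = [0.0, 1.0, 3.0]     # sample weights, all >= 0

total = sum(ri * ri / (wi + 1.0) for ri, wi in zip(r, w))
assert abs(total - (1.0 / 1.0 + 4.0 / 2.0 + 0.25 / 4.0)) < 1e-12
```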
disp('Computing the solution of the quadratic program...');
cvx_begin
variable x3(n)
variable u(m)
variable v(m)
minimize( sum(square(u) + 2*M*v) )
A*x3 - b <= u + v;
A*x3 - b >= -u - v;
u >= 0;
u <= M;
v >= 0;
cvx_end
The objective is monotonically increasing in $u$ and $v$, so at the optimum $u$ and $v$ should be as small as possible.
Let $t_i = |A_i^T x_3 - b_i|$; the constraints give $0 \le t_i \le u_i + v_i$.
At the optimum we must therefore have $u_i + v_i = t_i$.
Then $v_i = t_i - u_i$, and $v_i \ge 0 \Rightarrow u_i \le t_i$.
The per-sample objective becomes $f(u_i) = u_i^2 + 2Mv_i = u_i^2 - 2Mu_i + 2Mt_i$.
Its derivative is $f'(u_i) = 2u_i - 2M$. Since $0 \le u_i \le M$, we have $f'(u_i) \le 0$, so the objective is non-increasing in $u_i$, and the minimizer depends on $t_i$:
$$
u_i^* = \begin{cases} t_i, & t_i \le M \\ M, & t_i > M \end{cases}
$$
Correspondingly,
$$
v_i^* = \begin{cases} 0, & t_i \le M \\ t_i - M, & t_i > M \end{cases}
$$
Substituting the optimal solution back into the objective gives
$$
f(u^*, v^*) = \begin{cases} t_i^2, & t_i \le M \\ M(2t_i - M), & t_i > M \end{cases}
$$
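Again as a numeric check (a plain-Python sketch with arbitrary test values of $t_i$, not the CVX code): the optimal $(u_i^*, v_i^*)$ above is feasible for the quadratic program and its objective term equals the Huber penalty:

```python
def huber(t, M=2.0):
    """Huber penalty for t >= 0."""
    return t * t if t <= M else M * (2 * t - M)

M = 2.0
for t in (0.0, 1.2, 2.0, 4.5):
    u_star = t if t <= M else M
    v_star = 0.0 if t <= M else t - M
    # feasibility of the QP variables: 0 <= u <= M, v >= 0, u + v = t
    assert 0.0 <= u_star <= M and v_star >= 0.0
    assert abs(u_star + v_star - t) < 1e-12
    # the per-sample objective u^2 + 2*M*v equals the Huber penalty
    assert abs(u_star ** 2 + 2 * M * v_star - huber(t, M)) < 1e-12
```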
http://web.cvxr.com/cvx/examples/cvxbook/Ch04_cvx_opt_probs/html/ex_4_5.html#source