For a complete introduction to trust region methods, please refer to the blog post 信赖域方法 (Trust Region Method).
Here we briefly review the underlying theory.
Consider the unconstrained optimization problem
$$\min\limits_{x\in \mathbb{R}^n} f(x),$$
where $f:\mathbb{R}^n\to \mathbb{R}$ is a real-valued function of $n$ variables.
By Taylor's theorem,
$$f(x_k+p)=f_k+g_k^Tp+\frac{1}{2}p^T\nabla^2 f(x_k+tp)\,p,$$
where $f_k=f(x_k)$, $g_k=\nabla f(x_k)$, and $t\in(0,1)$.
Approximating the second-order term above by a matrix $B_k$, we obtain the model function
$$m_k(p)=f_k+g_k^Tp+\frac{1}{2}p^TB_kp.$$
At each iteration, the trust region method solves the subproblem
$$\min\limits_{p\in \mathbb{R}^n}\ m_k(p)=f_k+g_k^Tp+\frac{1}{2}p^TB_kp \qquad \text{s.t.}\ \Vert p\Vert \leq \Delta.$$
Here we take $B_k$ to be the exact Hessian and use the Dogleg method to solve this subproblem.
For the unconstrained minimization of the model function (dropping the subscript $k$), the global minimizer is the full Newton step
$$p_B=-B^{-1}g,$$
while minimizing the model along the steepest descent direction $-\nabla f(x)$ of the original function (i.e. minimizing $m(-\alpha g)$ over $\alpha>0$) gives
$$p_U=-\frac{g^Tg}{g^TBg}\,g.$$
If $\Vert p_B\Vert\leq \Delta$, then $p=p_B$ solves the subproblem. If $\Vert p_U\Vert \geq \Delta$, take $p=\frac{\Delta}{\Vert p_U\Vert}p_U$, the steepest descent step cut back to the boundary. Otherwise, solve the scalar quadratic equation in $\tau$,
$$\Vert p_U+(\tau-1)(p_B-p_U)\Vert=\Delta, \qquad \tau\in[1,2],$$
and the dogleg step $p=p_U+(\tau-1)(p_B-p_U)$ is taken as the solution of the subproblem.
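To make the last case concrete, set $s=\tau-1$ and $d=p_B-p_U$; squaring both sides turns the condition $\Vert p_U+s\,d\Vert=\Delta$ into the scalar quadratic
$$\Vert d\Vert^2 s^2+2\,(p_U^Td)\,s+\Vert p_U\Vert^2-\Delta^2=0,$$
whose positive root $s\in(0,1)$ yields $p=p_U+s\,d$. This is the form solved in the subproblem routine below.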
We write
$$s_k =\arg\min\limits_{p\in \mathbb{R}^n}\ m_k(p)=f_k+g_k^Tp+\frac{1}{2}p^TB_kp \qquad \text{s.t.}\ \Vert p\Vert \leq \Delta,$$
and measure how well the model predicted the actual decrease of $f$ by the ratio
$$\rho_k=\frac{f(x_k)-f(x_k+s_k)}{m_k(0)-m_k(s_k)}.$$
The algorithmic framework of the trust region method is then as follows.
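In outline (matching the MATLAB implementation below): at iterate $x_k$, stop if $\Vert g_k\Vert$ falls below the tolerance; otherwise form the model with $B_k=\nabla^2 f(x_k)$, compute the Dogleg step $s_k$, and evaluate $\rho_k$. If $\rho_k<\eta_1$, the radius is shrunk, $\Delta\leftarrow\eta_1\Delta$; if $\rho_k>\eta_2$ and the step reaches the boundary $\Vert s_k\Vert=\Delta$, the radius is enlarged, $\Delta\leftarrow\min(2\Delta,\Delta_{\max})$; otherwise $\Delta$ is left unchanged. The step is accepted, $x_{k+1}=x_k+s_k$, only when $\rho_k>\eta$ (here $\eta=0$); otherwise $x_{k+1}=x_k$.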
Let us first consider a concrete example,
$$f(x) = 100\,(x_2-x_1^2)^2+(x_1-1)^2,$$
the Rosenbrock function, whose minimizer is easily seen to be $(1,1)$. The MATLAB code is:
%% Trust Region Method for a simple example
% Using Dogleg method and exact Hessian matrix
function y = TrustRegion1()
% initialization
eps = 1e-8;        % stopping tolerance on the norm of the gradient
x0 = [500;500];    % starting point
maxiter = 1000;    % maximum number of iterations
eta1 = 0.25;       % threshold (and factor) for shrinking the radius
eta2 = 0.75;       % threshold for enlarging the radius
delta = 50;        % initial trust region radius
delta_max = 100;   % maximum trust region radius
eta = 0;           % acceptance threshold for rho
x = x0;
i = 0;
while i < maxiter
g = gradient(x);
if norm(g)<= eps
disp(i);
break;
end
h = hessian(x);                            % exact Hessian, used as B_k
p = subproblem(g,h,delta);                 % Dogleg step s_k
rho = -(f(x)-f(x+p))/(g'*p+0.5*p'*h*p);    % rho_k = actual reduction / predicted reduction
if rho < eta1
delta = eta1*delta;
elseif rho>eta2 && abs(norm(p)-delta)<1e-8
delta = min(2*delta,delta_max);
else
% do nothing;
end
if rho>eta
x = x+p;
else
% do nothing;
end
i = i+1;
end
y = x;
end
% solve the subproblem m_k(p)=f_k+g_k^T*p+0.5*p^T*B_k*p, s.t. ||p|| <= Delta.
%% Dogleg Method
function p = subproblem(g,h,delta)
p_B = -(h\g);                    % full Newton step
p_U = -(g'*g)/(g'*h*g)*g;        % model minimizer along the steepest descent direction
if norm(p_B) <= delta
p = p_B;                         % Newton step already lies inside the trust region
elseif norm(p_U) >= delta
p = delta/norm(p_U)*p_U;         % steepest descent step cut back to the boundary
else
% dogleg segment: find s in (0,1) with ||p_U + s*(p_B - p_U)|| = delta
p_BU = p_B - p_U;
a = p_BU'*p_BU;
b = p_U'*p_BU;
c = p_U'*p_U - delta^2;
s = (-b + sqrt(b^2 - a*c))/a;    % positive root of a*s^2 + 2*b*s + c = 0
p = p_U + s*p_BU;
end
end
%objective function
function f = f(x)
f = 100*(x(1)^2 - x(2))^2 + (x(1) - 1)^2;
end
% The gradient of f(x)
function g = gradient(x)
dg_dx1 = 2*(x(1)-1)+400*x(1)*(x(1)^2-x(2));
dg_dx2 = 200*(x(2)-x(1)^2);
g = [dg_dx1;dg_dx2];
end
% The Hessian of f(x)
function h = hessian(x)
h_11 = 1200*x(1)^2-400*x(2)+2;
h_12 = -400*x(1);
h_21 = h_12;
h_22 = 200;
h = [h_11,h_12;h_21,h_22];
end
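Running this program as is, y = TrustRegion1(), should converge from the hard-coded starting point [500;500] to a point close to the minimizer (1,1) and return it in y; the iteration count printed by disp(i) depends on the chosen tolerances and radii.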
With minor modifications to the above program, we obtain a general-purpose routine for unconstrained optimization problems:
function y = TrustRegion3(f,x0,delta,delta_max,maxiter,eps)
% f: function handle of two variables;
% x0: initial point;
% delta: initial trust region radius;
% delta_max: maximum trust region radius;
% maxiter: maximum number of iterations;
% eps: accuracy (tolerance on the norm of the gradient of f);
if nargin < 1
% default test problem: the Rosenbrock function
eps = 1e-5;
maxiter = 1000;
delta_max = 100;
delta = 5;
x0 = [50;50];
f = @(x1,x2)100*(x2-x1^2)^2 + (1-x1)^2;
end
eta1 = 0.25;   % threshold (and factor) for shrinking the radius
eta2 = 0.75;   % threshold for enlarging the radius
eta = 0;       % acceptance threshold for rho
x = x0;
grad = mygradient(f);      % numeric gradient handle built from the symbolic gradient
syms x1 x2;
f_s = f(x1,x2);            % symbolic expression of f
hess = hessian(f_s);       % symbolic Hessian (Symbolic Math Toolbox)
i = 0;
while i < maxiter
g = grad(x(1),x(2));
if norm(g)<= eps
disp(i);
break;
end
h = subs(hess,{x1,x2},{x(1),x(2)});    % evaluate the Hessian at the current point
h = double(h);
p = subproblem(g,h,delta);             % Dogleg step s_k
rho = -(f(x(1),x(2))-f(x(1)+p(1),x(2)+p(2)))/(g'*p+0.5*p'*h*p);   % actual / predicted reduction
if rho < eta1
delta = eta1*delta;
elseif rho>eta2 && abs(norm(p)-delta)<1e-8
delta = min(2*delta,delta_max);
else
% do nothing;
end
if rho>eta
x = x+p;
else
% do nothing;
end
i = i+1;
end
y = x;
end
%% Dogleg Method
function p = subproblem(g,h,delta)
p_B = -(h\g);                    % full Newton step
p_U = -(g'*g)/(g'*h*g)*g;        % model minimizer along the steepest descent direction
if norm(p_B) <= delta
p = p_B;                         % Newton step already lies inside the trust region
elseif norm(p_U) >= delta
p = delta/norm(p_U)*p_U;         % steepest descent step cut back to the boundary
else
% dogleg segment: find s in (0,1) with ||p_U + s*(p_B - p_U)|| = delta
p_BU = p_B - p_U;
a = p_BU'*p_BU;
b = p_U'*p_BU;
c = p_U'*p_U - delta^2;
s = (-b + sqrt(b^2 - a*c))/a;    % positive root of a*s^2 + 2*b*s + c = 0
p = p_U + s*p_BU;
end
end
function df_dx = mygradient(f)
% build a numeric gradient handle from the symbolic gradient of the two-variable function f
syms x1 x2;
f_s = f(x1,x2);                        % convert the function handle to a symbolic expression
dfdx = [diff(f_s,x1); diff(f_s,x2)];
df_dx = matlabFunction(dfdx,'Vars',[x1,x2]);
end
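As a usage sketch (the test function and starting point below are only illustrative choices, not part of the original program), the general routine can be called with no arguments, which falls back to the built-in Rosenbrock default, or with an explicit problem:
f = @(x1,x2) (x1-3)^2 + 10*(x2+2)^2;            % an illustrative convex quadratic with minimizer (3,-2)
y = TrustRegion3(f, [0;0], 5, 100, 1000, 1e-6)  % expected to return approximately [3;-2]
y = TrustRegion3()                              % default: Rosenbrock function started from [50;50]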