Mini-batch Gradient Descent (Mini-Batch GD) is a compromise between Stochastic Gradient Descent (SGD) and Batch Gradient Descent (BGD). Where SGD computes the gradient of the cost function from a single sample and BGD from all samples, Mini-Batch GD computes it from a small batch of samples, balancing optimization speed against convergence stability.
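To make the idea concrete, here is a minimal mini-batch GD sketch for linear least squares. It is an illustration only, not part of the demo script below; the function name miniBatchGD and the parameters X, yTarget, batchSize, step, and nEpochs are all assumed names.

%% sketch: mini-batch GD for linear regression (illustrative assumption,
%% not part of the demo script below)
function w = miniBatchGD(X, yTarget, batchSize, step, nEpochs)
    % X: m-by-d design matrix (one sample per row); yTarget: m-by-1 targets
    [m, d] = size(X);
    w = zeros(d, 1);               % initial weights
    for epoch = 1:nEpochs
        idx = randperm(m);         % reshuffle the samples every epoch
        for s = 1:batchSize:m
            batch = idx(s : min(s+batchSize-1, m));
            Xb = X(batch, :);      % the current mini-batch
            yb = yTarget(batch);
            % gradient of the squared-error cost on this mini-batch only
            grad = Xb' * (Xb*w - yb) / numel(batch);
            w = w - step*grad;     % descent step
        end
    end
end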
Plot of the original function:
The source code is as follows:
%% function test
clear;clc;close all
%% figure of function
rangeX = linspace(-20,20,200);
rangeY = linspace(-20,20,200);
[x0,y0] = meshgrid(rangeX,rangeY);
z = y0.*sin(x0) - x0.*cos(y0);
% plot
figure(1)
surf(x0,y0,z)
shading flat
colorbar
hold on
%% initialize
% generate scattered points
m = 1000;               % number of scattered points
x = -20 + 40*rand(1,m); % random x coordinates in [-20, 20]
y = -20 + 40*rand(1,m); % random y coordinates in [-20, 20]
z = f(x,y);
% basic variables
t = 0.1;                % pause time between animation frames
n = length(x);
newx = zeros(1,n);
newy = zeros(1,n);
%% gradient descent loop
for j = 1:110
    figure(2)
    cla;
    plot3(x,y,z,'.')
    axis([-20 20 -20 20])   % fix the x/y limits so the animation does not rescale
    pause(t)
    % calculate and update every point independently
    for i = 1:n
        [newx(i), newy(i)] = GraDes(x(i), y(i));
    end
    x = newx;
    y = newy;
    z = f(x,y);
end
% plot the final points on the figure of function
figure(1)
plot3(x,y,z,'pr')
%% function: GraDes
function [px,py] = GraDes(x, y)
    d = 0.00001;            % finite-difference step size
    step = 0.01;            % learning rate
    % function values at the current point and at small offsets
    eval0 = f(x,y);
    evalx = f(x+d,y);
    evaly = f(x,y+d);
    % forward-difference approximations of the partial derivatives
    derivativex = (evalx - eval0)/d;
    derivativey = (evaly - eval0)/d;
    % move one step along the negative gradient
    px = x - step*derivativex;
    py = y - step*derivativey;
end
%% function: f
function [z] = f(x,y)
    z = y.*sin(x) - x.*cos(y);
end
Result:
The points can be seen settling into nearby local minima, so the extremum search works well.
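For this particular f, the finite-difference gradient in GraDes could also be replaced by the exact analytic gradient, since the partial derivatives of z = y*sin(x) - x*cos(y) are dz/dx = y*cos(x) - cos(y) and dz/dy = sin(x) + x*sin(y). A minimal sketch of that variant follows; the name GraDesAnalytic is an illustrative assumption, not part of the script above.

%% sketch: analytic-gradient variant of GraDes (illustrative assumption,
%% not part of the original script)
function [px,py] = GraDesAnalytic(x, y)
    step = 0.01;                 % same learning rate as GraDes
    gx = y.*cos(x) - cos(y);     % exact dz/dx
    gy = sin(x) + x.*sin(y);     % exact dz/dy
    px = x - step*gx;
    py = y - step*gy;
end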