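Single-layer neural networks include linear regression and softmax regression. This post covers linear regression first. The model is
ŷ = XW + b
where W is the weight, b is the bias, and ŷ is the model's prediction for the true label y. The loss for one sample is the squared error
l(W, b) = 1/2 * (y - ŷ)^2
and training looks for the parameters that minimize it:
(W*, b*) = argmin_{W, b} l(W, b)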
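As a quick worked example of the two formulas above, the following sketch evaluates the prediction and the squared loss for a single sample in plain Python. The sample `x`, the label `y`, and the helper names `predict` and `squared_loss` are made up for illustration and are not part of the MXNet code in this post.

```python
# Illustrative values only; x and y are made up for this example.
W = [2.0, -3.4]   # weights
b = 3.4           # bias
x = [1.0, 2.0]    # one sample with two features
y = -1.0          # its label

def predict(x, W, b):
    """Linear model: y_hat = x . W + b."""
    return sum(xi * wi for xi, wi in zip(x, W)) + b

def squared_loss(y_hat, y):
    """Squared loss: 1/2 * (y_hat - y)^2."""
    return (y_hat - y) ** 2 / 2

y_hat = predict(x, W, b)       # 2.0 - 6.8 + 3.4, about -1.4
loss = squared_loss(y_hat, y)  # 0.5 * (-0.4)^2, about 0.08
print(y_hat, loss)
```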
This post uses mini-batch stochastic gradient descent as the optimization algorithm to compute a numerical solution.
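Algorithm: first initialize the model parameters, for example randomly; this post draws them from a Gaussian (normal) distribution. Then update the parameters over many iterations to reduce the loss function. In each iteration, randomly sample a subset of the training set, the mini-batch; compute the derivative (gradient) of the average loss over the mini-batch with respect to the model parameters; finally, multiply that gradient by the learning rate and subtract the result from the parameters.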
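The update step described above can be written out explicitly. This is a sketch in notation that does not appear in the original text: $\mathcal{B}$ denotes the mini-batch, $\eta$ the learning rate, and $x^{(i)}, y^{(i)}$ the $i$-th sample and its label.

```latex
\begin{aligned}
\hat{y}^{(i)} &= x^{(i)\top} W + b, \qquad
l^{(i)}(W, b) = \tfrac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2 \\
W &\leftarrow W - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} x^{(i)}\left(\hat{y}^{(i)} - y^{(i)}\right) \\
b &\leftarrow b - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \left(\hat{y}^{(i)} - y^{(i)}\right)
\end{aligned}
```

The factors $x^{(i)}(\hat{y}^{(i)} - y^{(i)})$ and $(\hat{y}^{(i)} - y^{(i)})$ are the derivatives of $l^{(i)}$ with respect to $W$ and $b$, obtained with the chain rule.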
This post uses the autograd module of the MXNet framework to compute the gradients with respect to the parameters.
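To make concrete what kind of quantity an automatic differentiation tool returns for this loss, the sketch below compares the hand-derived gradient of the squared loss with a finite-difference approximation. It is plain Python and does not use MXNet; the sample values and the helper names `loss`, `analytic_grad`, and `numeric_grad` are hypothetical.

```python
# Plain-Python illustration (no MXNet) of the gradient of the squared loss.
# All values and helper names here are hypothetical.

def loss(W, b, x, y):
    """Squared loss 1/2 * (x . W + b - y)^2 for one sample."""
    y_hat = sum(xi * wi for xi, wi in zip(x, W)) + b
    return (y_hat - y) ** 2 / 2

def analytic_grad(W, b, x, y):
    """Chain rule: dl/dW = x * (y_hat - y), dl/db = y_hat - y."""
    err = sum(xi * wi for xi, wi in zip(x, W)) + b - y
    return [xi * err for xi in x], err

def numeric_grad(W, b, x, y, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad_W = []
    for k in range(len(W)):
        W_plus = W[:k] + [W[k] + eps] + W[k + 1:]
        W_minus = W[:k] + [W[k] - eps] + W[k + 1:]
        grad_W.append((loss(W_plus, b, x, y) - loss(W_minus, b, x, y)) / (2 * eps))
    grad_b = (loss(W, b + eps, x, y) - loss(W, b - eps, x, y)) / (2 * eps)
    return grad_W, grad_b

W, b = [0.5, -0.5], 0.1
x, y = [1.0, 2.0], 3.0

gW_a, gb_a = analytic_grad(W, b, x, y)
gW_n, gb_n = numeric_grad(W, b, x, y)
print(gW_a, gb_a)  # analytic gradient
print(gW_n, gb_n)  # finite-difference estimate, should agree closely
```

An automatic differentiation tool removes the need for either approach: the gradient is obtained from the recorded forward computation rather than derived by hand or estimated numerically.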
from mxnet import nd, autograd
import random

num_inputs = 2      # number of features per sample
num_samples = 1000  # number of samples
true_W = nd.array([2, -3.4])  # true weights
true_b = 3.4                  # true bias
# Generate features with mean 0 and standard deviation 1
features = nd.random.normal(loc=0, scale=1, shape=(num_samples, num_inputs))
labels = nd.dot(features, true_W.T) + true_b
# Add noise with standard deviation 0.01 to the labels (scale is the standard deviation, not the variance)
labels += nd.random.normal(loc=0, scale=0.01, shape=labels.shape)
# Initialize W randomly with standard deviation 0.01 and b to 0
W = nd.random.normal(loc=0, scale=0.01, shape=(num_inputs, 1))
b = nd.zeros(shape=(1,))
# Yield randomly shuffled mini-batches of the dataset
def data_set(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # The last batch may be smaller than batch_size
        j = nd.array(indices[i: min(i + batch_size, num_examples)])
        yield features.take(j), labels.take(j)
# Squared (L2) loss
def square_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2
# Allocate gradient buffers for W and b
W.attach_grad()
b.attach_grad()
# Linear regression prediction
def linreg(X, W, b):
    return nd.dot(X, W) + b
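# Mini-batch stochastic gradient descent: update the parameters in place
def sgd(params, lr, batch_size):
    for param in params:
        # backward() sums the gradients over the batch, so divide by batch_size to average
        param[:] = param - lr * param.grad / batch_size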
def main():
    batch_size = 10  # mini-batch size
    epochs = 10      # number of epochs
    loss = square_loss
    net = linreg
    lr = 0.1         # learning rate
    for epoch in range(epochs):
        for X, y in data_set(batch_size, features, labels):
            with autograd.record():
                l = loss(net(X, W, b), y)  # compute the loss
            l.backward()                   # backpropagation
            sgd([W, b], lr, batch_size)    # update the parameters
        # Report the mean loss over the whole training set after each epoch
        train_loss = loss(net(features, W, b), labels)
        print('epoch %d, loss %f' % (epoch + 1, train_loss.mean().asscalar()))
    print(W, b)

main()
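To see how the shuffled mini-batch iteration in data_set behaves, here is a minimal sketch of the same idea on plain Python lists, without MXNet. The function name `batch_indices` and the sizes used are hypothetical. It shows that every index appears exactly once per epoch and that the last batch is smaller when the batch size does not divide the number of examples.

```python
import random

def batch_indices(num_examples, batch_size):
    """Yield lists of shuffled indices, batch_size at a time."""
    indices = list(range(num_examples))
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        yield indices[i:i + batch_size]  # slicing clips at the end of the list

batches = list(batch_indices(num_examples=10, batch_size=4))
print([len(batch) for batch in batches])              # batch sizes: [4, 4, 2]
print(sorted(i for batch in batches for i in batch))  # every index appears once
```

With 1000 samples and a batch size of 10, as in the code above, every batch is full, so dividing by a fixed batch_size in sgd gives the exact average. With a batch size that does not divide the dataset size, the last batch would be scaled slightly differently.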