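Linear regression is simply linear fitting in the mathematical sense; the principle is straightforward and is not covered here. This article implements it with PyTorch. Only the linear module and the normalization function are written by hand; everything else uses components that PyTorch already provides. Note that PyTorch also ships a ready-made linear layer, which is called as nn.Linear(in_features, out_features).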
import torch
import numpy as np
import torch.utils.data as Dataset
import torch.nn as nn
# generate training data
features = torch.randn((20,2), dtype=torch.float)
true_w = torch.tensor([[2],[5]],dtype=torch.float)
true_y = torch.mm(features,true_w)
# adding noise
true_y += torch.tensor(np.random.normal(0,0.01,true_y.shape), dtype=torch.float)
# load dataset
dataset = Dataset.TensorDataset(features, true_y)
train_iter = Dataset.DataLoader(dataset, batch_size=20)
# define the net
def Linear(X):
    return torch.mm(X, w) + b
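# feature normalization (zero mean, unit standard deviation per feature)
def FeatureNormalization(X, eps = 0):
    mean = X.mean(0, keepdim=True)
    std = X.std(0, keepdim=True)
    # a small eps > 0 guards against division by zero for constant features
    return (X - mean) / (std + eps)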
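# implement training function
def train(net, train_iter, epochs, loss, optim):
    for epoch in range(epochs):
        for X, y in train_iter:
            # X = FeatureNormalization(X)
            y_hat = net(X)
            l = loss(y_hat, y).sum()
            optim.zero_grad()  # clear gradients left over from the previous step
            l.backward()       # backpropagate to get the gradients of w and b
            optim.step()       # apply the SGD update
        if (epoch+1) % 20 == 0:
            print("epoch %d : loss:%f" % (epoch+1, l.item()))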
# initialize the parameters
w = torch.randn((2,1),dtype=torch.float,requires_grad=True)
b = torch.zeros((1),dtype=torch.float,requires_grad=True)
# define the hyperparameters
net = Linear
epochs = 100
loss = nn.MSELoss()
optim = torch.optim.SGD([w,b], lr=0.1)
# train model
train(net, train_iter, epochs, loss, optim)
# show the result
print(w, b)
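The introduction notes that PyTorch already packages the linear layer. As a rough sketch of how the same fit might look with nn.Linear instead of the hand-written Linear function, consider the following. The names model, criterion and optimizer, the random seed, and the epoch count are assumptions made for this illustration and do not come from the original code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumed seed, only to make the sketch repeatable

# Synthetic data in the same shape as above: 20 samples, 2 features, no bias.
features = torch.randn(20, 2)
true_w = torch.tensor([[2.0], [5.0]])
targets = features @ true_w + 0.01 * torch.randn(20, 1)

# nn.Linear(in_features, out_features) owns its weight and bias parameters.
model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):
    optimizer.zero_grad()                       # clear old gradients
    loss_value = criterion(model(features), targets)
    loss_value.backward()                       # compute new gradients
    optimizer.step()                            # update weight and bias

learned_w = model.weight.detach()  # shape (1, 2): (out_features, in_features)
learned_b = model.bias.detach()    # shape (1,)
print(learned_w, learned_b)
```

Note that nn.Linear stores its weight as (out_features, in_features), so it is the transpose of the (2, 1) tensor w used in the hand-written version.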
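The linear regression model is $y_{hat} = XW + b$, where b is the constant term. By adding an extra feature to X whose value is always 1, together with a corresponding parameter in W, the expression simplifies to $y_{hat} = XW$. The squared loss attains its minimum, 0, when $y_{hat} = y$. The equation to solve is therefore:
$$XW = y.$$
In this equation X and y are known and W is the unknown. Note that this is a matrix equation, not an ordinary scalar equation. The instinctive move is to multiply both sides on the left by $X^{-1}$ and read off the result:
$$X^{-1}XW = X^{-1}y \newline W = X^{-1}y$$
But this does not work: X is not necessarily square, and for a non-square matrix $X^{-1}$ does not exist. At this point it helps to be clear about the goal: we need a matrix A that satisfies $AX = E$ (E being the identity matrix), and whether A is $X^{-1}$ is not our concern. Since $X^TX$ is always square, we can use it to construct the matrix we need, provided $X^TX$ is invertible (that is, X has full column rank):
$$(X^TX)^{-1}X^TX = E$$
Hence $A = (X^TX)^{-1}X^T$, and the problem is solved: $W = Ay = (X^TX)^{-1}X^Ty$. When the data are noisy and $XW = y$ has no exact solution, this same W is the least-squares solution.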
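The key step in the derivation is that $A = (X^TX)^{-1}X^T$ acts as a left inverse of a non-square X. The sketch below checks this numerically for a random tall matrix; the matrix size, the seed, and the variable names are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch repeatable

# A tall, non-square matrix: 20 samples, 2 features.
X = torch.randn(20, 2, dtype=torch.float64)

# Left inverse A = (X^T X)^{-1} X^T; this needs X^T X to be invertible.
A = torch.linalg.inv(X.T @ X) @ X.T

left_product = A @ X    # shape (2, 2), expected to be close to the identity E
right_product = X @ A   # shape (20, 20), a projection rather than the identity

identity = torch.eye(2, dtype=torch.float64)
print(left_product)
print(torch.allclose(left_product, identity))

# With full column rank, A should match the Moore-Penrose pseudoinverse.
print(torch.allclose(A, torch.linalg.pinv(X)))
```

A is only a left inverse: multiplying in the other order, X @ A, gives a 20 x 20 matrix of rank 2, which cannot be the identity. That is why A can stand in for $X^{-1}$ in this derivation without being a true inverse.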
# closed-form solution W = (X^T X)^{-1} X^T y
theta = torch.linalg.inv(torch.mm(features.t(), features))
theta = torch.mm(theta, features.t())
theta = torch.mm(theta, true_y)
print(theta)
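The derivation folds the constant term b into W by giving X an extra feature that is always 1, whereas the data generated earlier has no bias at all. As a minimal sketch of that trick, the example below uses data with a non-zero bias and compares the normal-equation result against torch.linalg.lstsq. The bias value 3.0, the sample count, the seed, and the variable names are all assumptions of this example.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the sketch repeatable

# Illustrative data with a non-zero bias.
X = torch.randn(50, 2, dtype=torch.float64)
true_w = torch.tensor([[2.0], [5.0]], dtype=torch.float64)
true_b = 3.0
y = X @ true_w + true_b + 0.01 * torch.randn(50, 1, dtype=torch.float64)

# Append a constant feature equal to 1 so that b becomes the last row of W.
X_aug = torch.cat([X, torch.ones(50, 1, dtype=torch.float64)], dim=1)

# Normal equation, solved as a linear system instead of forming the inverse.
W_normal = torch.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# Least-squares solver applied to the same problem, for comparison.
W_lstsq = torch.linalg.lstsq(X_aug, y).solution

print(W_normal.flatten())  # first two entries estimate true_w, the last estimates true_b
print(W_lstsq.flatten())
```

Solving the system with torch.linalg.solve avoids computing the inverse explicitly, which is generally the more numerically stable choice; the result is mathematically the same $W = (X^TX)^{-1}X^Ty$.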