Linear Regression: Norms, L1/L2 Regularization, and a Manual Implementation

1. The p-norm and its implementation

For a linear model $Y = XW + b$, where $X \in \mathbb{R}^{n \times d}$ ($n$ is the number of samples and $d$ the feature dimension of each sample), $W \in \mathbb{R}^{d \times 1}$, and $Y \in \mathbb{R}^{n \times 1}$, the complexity of the model can be measured by a norm of the weight vector $W = (w_1, w_2, \dots, w_d)$.
1-norm: $\|W\|_1 = |w_1| + |w_2| + \dots + |w_d|$
2-norm: $\|W\|_2 = \sqrt{|w_1|^2 + |w_2|^2 + \dots + |w_d|^2}$
p-norm: $\|W\|_p = \left(|w_1|^p + |w_2|^p + \dots + |w_d|^p\right)^{\frac{1}{p}}$

import torch

def my_pnorm(w, norm_size):
    # p-norm: (|w_1|^p + ... + |w_d|^p)^(1/p)
    w_abs = abs(w)
    w_norm = w_abs ** norm_size
    ans = w_norm.sum() ** (1 / norm_size)
    return ans

fea_dim = 20
norm_size = 2
w = torch.normal(0, 1, size=(fea_dim, 1), requires_grad=True)

my_ans = my_pnorm(w, norm_size)
torch_ans = torch.norm(w, p=norm_size)  # p-norm computed by PyTorch
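
The two results should agree up to floating-point error; a quick sanity check (a minimal sketch, the tolerance is arbitrary):

print(my_ans.item(), torch_ans.item())
assert torch.allclose(my_ans, torch_ans, atol=1e-6)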

2. Regularization

L1 regularization (Lasso): the 1-norm of the weights is added to the loss as a penalty term:
$loss = loss(xw+b, y) + \lambda \|w\|_1 = loss(xw+b, y) + \lambda \sum_{i=1}^{d} |w_i|$
L2 regularization (Ridge): the squared 2-norm $\|w\|_2^2$ is added as the penalty, scaled by $\frac{\lambda}{2}$ (the 2 in the denominator cancels the 2 that appears when differentiating the square, so the gradient of the penalty is simply $\lambda w$):
$loss = loss(xw+b, y) + \frac{\lambda}{2} \|w\|_2^2 = loss(xw+b, y) + \frac{\lambda}{2} \sum_{i=1}^{d} |w_i|^2$
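
The implementation in the next section only applies the L2 penalty. For comparison, an L1 penalty could be written analogously; a minimal sketch (the name l1_penalty is my own and is not used by the code below):

def l1_penalty(w):
    # ||w||_1: sum of absolute values of the weights
    return torch.sum(torch.abs(w))

# it would replace the L2 term in the training loop, e.g.
# loss_batch = loss_sq(predict, y) + lambd * l1_penalty(w)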

3. Manual implementation of regularization in a linear regression model

Fit the following function, with feature dimension $d = 200$: $y = 0.05 + \sum_{i=1}^{d} 0.01 x_i + noise$, where $noise \sim \mathcal{N}(0, 0.01^2)$.

import torch
import random

def generate_data(w, num_examples, dim):
    X = torch.normal(0, 1, (num_examples, dim))    # features drawn from N(0, 1)
    labels = torch.matmul(X, w) + 0.05             # y = Xw + 0.05
    labels += torch.normal(0, 0.01, labels.shape)  # additive Gaussian noise, std 0.01
    return X, labels

def data_iterater(X, y, batch_size, fea_dim):
    # pre-builds all mini-batches as two tensors; any incomplete trailing batch is dropped
    num = len(X)
    indices = list(range(num))
    random.shuffle(indices)  # shuffle the sample order
    num_batches = num // batch_size
    batch_X = torch.zeros([num_batches, batch_size, fea_dim])
    batch_y = torch.zeros([num_batches, batch_size, 1])
    for idx, i in enumerate(range(0, num_batches * batch_size, batch_size)):
        batch_indices = torch.tensor(indices[i: i + batch_size])
        batch_X[idx, :, :] = X[batch_indices]
        batch_y[idx, :, :] = y[batch_indices]
    return batch_X, batch_y

def init_params(fea_dim):
    w = torch.normal(0, 1, size=(fea_dim , 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]

def l2_penalty(w):
    # L2 penalty term: (1/2) * ||w||_2^2
    return torch.sum(w.pow(2)) / 2

def Linear_Model(X, w, b):
    return torch.matmul(X, w) + b

def loss_sq(predict, y):
    # element-wise squared error, halved so its gradient is (predict - y)
    return (predict - y) ** 2 / 2

def sgd(params, lr, batch_size):
    # mini-batch SGD step: the loss is summed over the batch, so divide by batch_size here
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

def train(lambd, lr, batch_size, epochs):
    [w, b] = init_params(fea_dim)
    batch_X, batch_y = data_iterater(train_features, train_labels, batch_size, fea_dim)
    for epoch in range(epochs):
        for (X, y) in zip(batch_X, batch_y):
            predict = Linear_Model(X, w, b)
            # squared loss plus the L2 penalty weighted by lambd
            loss_batch = loss_sq(predict, y) + lambd * l2_penalty(w)
            loss_batch.sum().backward()
            sgd([w, b], lr, batch_size)
        with torch.no_grad():
            train_l = loss_sq(Linear_Model(train_features, w, b), train_labels)
            test_l = loss_sq(Linear_Model(test_features, w, b), test_labels)
            print(f'epoch {epoch + 1}, train_loss {float(train_l.mean()):f}, test_loss {float(test_l.mean()):f}')
    print('L2 norm of w:', torch.norm(w).item())
    print('first 5 dims of w:', w[0:5])
    print('b:', b)


lambd = 0
batch_size = 50
lr = 0.03
epochs = 20
fea_dim = 200
num_examples = 1000
true_w = 0.01 * torch.ones((fea_dim,1))
train_features, train_labels = generate_data(true_w,  num_examples, fea_dim)
test_features, test_labels = generate_data(true_w,  num_examples, fea_dim)

train(lambd, lr, batch_size, epochs)

Results with lambd = 0 (no penalty):

epoch 1, train_loss 24.436758, test_loss 27.161360
epoch 2, train_loss 7.634212, test_loss 10.278726
epoch 3, train_loss 2.866694, test_loss 4.450128
epoch 4, train_loss 1.214711, test_loss 2.087337
epoch 5, train_loss 0.556567, test_loss 1.030553
epoch 6, train_loss 0.268969, test_loss 0.527635
epoch 7, train_loss 0.135160, test_loss 0.277830
epoch 8, train_loss 0.070024, test_loss 0.149674
epoch 9, train_loss 0.037195, test_loss 0.082194
epoch 10, train_loss 0.020178, test_loss 0.045883
epoch 11, train_loss 0.011149, test_loss 0.025981
epoch 12, train_loss 0.006262, test_loss 0.014898
epoch 13, train_loss 0.003571, test_loss 0.008642
epoch 14, train_loss 0.002068, test_loss 0.005068
epoch 15, train_loss 0.001217, test_loss 0.003006
epoch 16, train_loss 0.000730, test_loss 0.001805
epoch 17, train_loss 0.000448, test_loss 0.001101
epoch 18, train_loss 0.000284, test_loss 0.000685
epoch 19, train_loss 0.000188, test_loss 0.000438
epoch 20, train_loss 0.000130, test_loss 0.000291
L2 norm of w: 0.141380175948143
first 5 dims of w: tensor([[0.0096],
        [0.0095],
        [0.0096],
        [0.0097],
        [0.0105]], grad_fn=<SliceBackward>)
b: tensor([0.0515], requires_grad=True)
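
The second run below presumably re-uses the same script with the penalty enabled, e.g.:

train(3, lr, batch_size, epochs)  # lambd = 3: L2 penalty active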

Results with lambd = 3 (L2 penalty):

epoch 1, train_loss 0.462465, test_loss 0.633717
epoch 2, train_loss 0.013955, test_loss 0.019294
epoch 3, train_loss 0.007136, test_loss 0.008692
epoch 4, train_loss 0.005868, test_loss 0.007064
epoch 5, train_loss 0.005469, test_loss 0.006553
epoch 6, train_loss 0.005339, test_loss 0.006377
epoch 7, train_loss 0.005294, test_loss 0.006311
epoch 8, train_loss 0.005277, test_loss 0.006284
epoch 9, train_loss 0.005271, test_loss 0.006272
epoch 10, train_loss 0.005268, test_loss 0.006267
epoch 11, train_loss 0.005266, test_loss 0.006264
epoch 12, train_loss 0.005266, test_loss 0.006262
epoch 13, train_loss 0.005265, test_loss 0.006262
epoch 14, train_loss 0.005265, test_loss 0.006261
epoch 15, train_loss 0.005265, test_loss 0.006261
epoch 16, train_loss 0.005265, test_loss 0.006261
epoch 17, train_loss 0.005265, test_loss 0.006261
epoch 18, train_loss 0.005265, test_loss 0.006261
epoch 19, train_loss 0.005265, test_loss 0.006261
epoch 20, train_loss 0.005265, test_loss 0.006261
L2 norm of w: 0.03571242466568947
first 5 dims of w: tensor([[0.0020],
        [0.0027],
        [0.0024],
        [0.0038],
        [0.0030]], grad_fn=<SliceBackward>)
b: tensor([0.0463], requires_grad=True)
