August 3 PyTorch Notes — Regularization, Convolutional Neural Networks, Data Augmentation

Table of Contents

  • Preface
  • I. Overfitting & Underfitting
  • II. Train-Val-Test Split
  • III. Regularization
    • 1. L1-regularization
    • 2. L2-regularization
  • IV. Momentum & Learning Rate Decay
  • V. Early Stop & Dropout
  • VI. Convolutional Neural Networks
  • VII. Down/up sample
    • 1. Max pooling & Avg pooling
    • 2. F.interpolate
    • 3. ReLU
  • VIII. Batch Normalization
  • IX. Classic Convolutional Networks
  • X. nn.Module
    • 1. Container
    • 2. .parameters
    • 3. modules
    • 4. to(device)
    • 5. save and load
    • 6. train / test
  • XI. Data Augmentation


Preface

These are my PyTorch notes for August 3, organized into eleven sections:

  • Overfitting & Underfitting;
  • Train-Val-Test Split;
  • Regularization: L1-regularization, L2-regularization;
  • Momentum & Learning Rate Decay;
  • Early Stop & Dropout;
  • Convolutional Neural Networks;
  • Down/up sample: Max pooling & Avg pooling, F.interpolate, ReLU;
  • Batch Normalization;
  • Classic Convolutional Networks;
  • nn.Module: Container, .parameters, modules, to(device), save and load, train / test;
  • Data Augmentation.

I. Overfitting & Underfitting

  • Underfitting:
    [Figure 1: an underfitting example]
  • Overfitting:
    [Figure 2: an overfitting example]

II. Train-Val-Test Split

  • Train Set & Test Set:
    [Figure 3: train set vs. test set]
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])),
    batch_size=batch_size, shuffle=True)
  • Test while training:
for epoch in range(epochs):

    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)
        data, target = data.to(device), target.to(device)

        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))
  • Train Set & Val Set & Test Set:
    [Figure 4: train / val / test split]
print('train: ', len(train_db), 'test: ', len(test_db))
train_db, val_db = torch.utils.data.random_split(train_db, [50000, 10000])

print('db1: ', len(train_db), 'db2: ', len(val_db))
train_loader = torch.utils.data.DataLoader(
    train_db,
    batch_size = batch_size, shuffle=True)

val_loader = torch.utils.data.DataLoader(
    val_db,
    batch_size = batch_size, shuffle=True)
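A minimal evaluation pass over val_loader might look like the following — a sketch assuming the net, device, and criteon defined for the training loop above; the evaluate name is illustrative:

def evaluate(loader):
    # Full pass over the loader without tracking gradients.
    net.eval()
    correct, total_loss = 0, 0.
    with torch.no_grad():
        for data, target in loader:
            data = data.view(-1, 28 * 28)
            data, target = data.to(device), target.to(device)
            logits = net(data)
            total_loss += criteon(logits, target).item()
            pred = logits.argmax(dim=1)
            correct += pred.eq(target).sum().item()
    net.train()
    return total_loss / len(loader), correct / len(loader.dataset)

val_loss, val_acc = evaluate(val_loader)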

III. Regularization

1. L1-regularization

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y_i \ln \hat{y}_i + (1-y_i)\ln(1 - \hat{y}_i) \,\right] + \lambda \sum_{i=1}^{n} |\theta_i|$$

regularization_loss = 0   # accumulate the L1 penalty over all parameters
for param in model.parameters():
    regularization_loss += torch.sum(torch.abs(param))
    
classify_loss = criteon(logits, target)
loss = classify_loss + 0.01 * regularization_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()

2. L2-regularization

$$J(W; X, y) + \frac{1}{2}\lambda\,\|W\|^2$$

device = torch.device('cuda:0')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rates, weight_decay=0.01)
criteon = nn.CrossEntropyLoss().to(device)
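Note: the weight_decay argument above is what implements this penalty. SGD adds λ·W to each parameter's gradient, which is exactly the gradient of ½λ‖W‖², so no manual loop over parameters is needed, unlike the L1 case.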

IV. Momentum & Learning Rate Decay

  • Momentum:
    Plain SGD updates $w^{k+1} = w^k - \alpha \nabla f(w^k)$. With momentum, a running average of past gradients is kept and used instead:
    $$z^{k+1} = \beta z^k + \nabla f(w^k), \qquad w^{k+1} = w^k - \alpha z^{k+1}$$
optimizer = torch.optim.SGD(model.parameters(), args.lr,
                                   momentum=args.momentum,
                                   weight_decay=args.weight_decay)
scheduler = ReduceLROnPlateau(optimizer, 'min')

for epoch in range(args.start_epoch, args.epochs):
    train(train_loader, model, criterion, optimizer, epoch)
    result_avg, loss_val = validate(val_loader, model, criterion, epoch)
    scheduler.step(loss_val)
  • Learning rate decay:
# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05      if epoch < 30
# lr = 0.005     if 30 <= epoch < 60
# lr = 0.0005    if 60 <= epoch < 90
# ...
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(epochs):
    train(...)
    validate(...)
    scheduler.step()   # decay the lr once per epoch, after the training step

V. Early Stop & Dropout

  • Early stop:
    [Figure 5: early stopping at the best validation accuracy]

    • How-to:
      1. Use the validation set to select hyperparameters;
      2. Monitor validation performance during training;
      3. Stop at the highest validation performance (a code sketch follows the Dropout example below).
  • Dropout (note: in PyTorch the argument is the drop probability p, whereas TensorFlow's classic dropout takes keep_prob, the probability of keeping a unit):
    [Figure 6: a network with dropout]

net_dropped = torch.nn.Sequential(
    torch.nn.Linear(784, 200),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(200, 200),
    torch.nn.Dropout(0.5),
    torch.nn.ReLU(),
    torch.nn.Linear(200, 10)
)
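Putting the how-to list above into code, here is a minimal early-stopping sketch — assuming the net/optimizer/loaders from Section II, the evaluate helper sketched there, and an illustrative patience value and 'best.mdl' filename:

best_acc, wait, patience = 0., 0, 5          # patience: epochs to wait (illustrative)
for epoch in range(epochs):
    for data, target in train_loader:        # one training epoch
        data = data.view(-1, 28 * 28).to(device)
        target = target.to(device)
        loss = criteon(net(data), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    _, val_acc = evaluate(val_loader)        # validation accuracy
    if val_acc > best_acc:                   # new best: checkpoint and reset counter
        best_acc, wait = val_acc, 0
        torch.save(net.state_dict(), 'best.mdl')
    else:
        wait += 1
        if wait >= patience:                 # no improvement for `patience` epochs
            break
net.load_state_dict(torch.load('best.mdl')) # restore the best checkpoint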

VI. Convolutional Neural Networks

[Figure 7: a 2D convolution layer]
In the figure above:

  • Kernel size = 3 × 3;
  • Stride = 1;
  • Padding = 1;
  • Input: $(N, C_{in}, H_{in}, W_{in})$;
  • Output: $(N, C_{out}, H_{out}, W_{out})$.
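The output spatial size follows from the usual convolution arithmetic:

$$H_{out} = \left\lfloor \frac{H_{in} + 2p - k}{s} \right\rfloor + 1$$

With k = 3, s = 1, p = 1 as in the figure, $(28 + 2 - 3)/1 + 1 = 28$, so H and W are preserved.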

Multi-Kernels:
[Figure 8: multiple convolution kernels]
In the figure above:

  • x: [b, 3, 28, 28];
  • one k: [3, 3, 3];
  • multi-k: [16, 3, 3, 3];
  • bias: [16];
  • out: [b, 16, 28, 28].
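These shapes can be verified directly on a layer (a quick sketch; the in/out channel counts match the figure):

layer = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
layer.weight.shape                      # multi-k: torch.Size([16, 3, 3, 3])
layer.bias.shape                        # bias: torch.Size([16])
layer(torch.rand(4, 3, 28, 28)).shape   # out: torch.Size([4, 16, 28, 28])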

nn.Conv2d:

layer = nn.Conv2d(1, 3, kernel_size=3, stride=1, padding=0)

x = torch.rand(1, 1, 28, 28)
out = layer.forward(x)

out.shape
>>> torch.Size([1, 3, 26, 26])

layer = nn.Conv2d(1, 3, kernel_size=3, stride=2, padding=1)
out = layer.forward(x)

out.shape
>>> torch.Size([1, 3, 14, 14])

out = layer(x)   # preferred over layer.forward(x): __call__ also runs hooks
out.shape
>>> torch.Size([1, 3, 14, 14])
w = torch.rand(16, 3, 5, 5)
b = torch.rand(16)
x = torch.rand(1, 3, 28, 28)

out = F.conv2d(x, w, b, stride=1, padding=1)
out.shape
>>> torch.Size([1, 16, 26, 26])

out = F.conv2d(x, w, b, stride=2, padding=2)
out.shape
>>> torch.Size([1, 16, 14, 14])

VII. Down/up sample

1. Max pooling & Avg pooling

[Figures 9-10: max pooling and average pooling]

x.shape
>>> torch.Size([1, 16, 14, 14])

layer = nn.MaxPool2d(2, stride=2)
out = layer(x)

out.shape
>>> torch.Size([1, 16, 7, 7])

out = F.avg_pool2d(x, 2, stride=2)
out.shape
>>> torch.Size([1, 16, 7, 7])

2. F.interpolate

[Figure 11: upsampling by interpolation]

x = out
out = F.interpolate(x, scale_factor=2, mode='nearest')

out.shape
>>> torch.Size([1, 16, 14, 14])

out = F.interpolate(x, scale_factor=3, mode='nearest')
out.shape
>>> torch.Size([1, 16, 21, 21])

3. ReLU

x.shape
>>> torch.Size([1, 16, 7, 7])

layer = nn.ReLU(inplace=True)
out = layer(x)
out.shape
>>> torch.Size([1, 16, 7, 7])

VIII. Batch Normalization

[Figures 12-14: Batch Normalization]
$$\tilde{z}^i = \frac{z^i - \mu}{\sigma}, \qquad \hat{z}^i = \gamma \odot \tilde{z}^i + \beta$$

x = torch.rand(100, 16, 784)
layer = nn.BatchNorm1d(16)
out = layer(x)

layer.running_mean
>>> tensor([0.0501, 0.0501, 0.0501, 0.0501, 0.0499, 0.0500, 0.0501, 0.0501, 0.0499, 0.0502, 0.0500, 0.0501, 0.0500, 0.0498, 0.0500, 0.0501])

layer.running_var
>>> tensor([0.9084, 0.9083, 0.9083, 0.9084, 0.9083, 0.9083, 0.9083, 0.9083, 0.9083, 0.9083, 0.9084, 0.9083, 0.9083, 0.9084, 0.9083, 0.9084])
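These values are no accident: x is uniform on [0, 1], so each batch has mean ≈ 0.5 and variance ≈ 1/12 ≈ 0.083. The running buffers start at 0 and 1, and with the default momentum = 0.1 they are updated as running = (1 − 0.1) · running + 0.1 · batch, giving 0.1 × 0.5 = 0.05 and 0.9 × 1 + 0.1 × 0.083 ≈ 0.908 after one batch.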
x = torch.rand(1, 16, 7, 7)
x.shape
>>> torch.Size([1, 16, 7, 7])

layer = nn.BatchNorm2d(16)
out = layer(x)
out.shape
>>> torch.Size([1, 16, 7, 7])

layer.weight
Parameter containing:
>>> tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)

layer.weight.shape
>>> torch.Size([16])

layer.bias.shape
>>> torch.Size([16])

vars(layer)
>>> {'training': True, '_parameters': OrderedDict([('weight', Parameter containing:
>>> tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       requires_grad=True)), ('bias', Parameter containing:
>>> tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       requires_grad=True))]), '_buffers': OrderedDict([('running_mean', tensor([0.0443, 0.0511, 0.0542, 0.0462, 0.0566, 0.0538, 0.0524, 0.0507, 0.0521,
        0.0516, 0.0570, 0.0474, 0.0460, 0.0429, 0.0508, 0.0450])), ('running_var', tensor([0.9069, 0.9097, 0.9070, 0.9081, 0.9086, 0.9102, 0.9069, 0.9080, 0.9093,
        0.9101, 0.9081, 0.9092, 0.9083, 0.9079, 0.9072, 0.9071])), ('num_batches_tracked', tensor(1))]), '_non_persistent_buffers_set': set(), '_backward_hooks': OrderedDict(), '_is_full_backward_hook': None, '_forward_hooks': OrderedDict(), '_forward_pre_hooks': OrderedDict(), '_state_dict_hooks': OrderedDict(), '_load_state_dict_pre_hooks': OrderedDict(), '_load_state_dict_post_hooks': OrderedDict(), '_modules': OrderedDict(), 'num_features': 16, 'eps': 1e-05, 'momentum': 0.1, 'affine': True, 'track_running_stats': True}
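At test time the layer must normalize with these accumulated statistics rather than the current batch's, so switch modes before evaluating (see also the train/test subsection below):

layer.eval()     # use running_mean / running_var instead of batch statistics
out = layer(x)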

IX. Classic Convolutional Networks

  1. LeNet-5
    [Figure 15: LeNet-5]

  2. AlexNet
    [Figure 16: AlexNet]

  3. VGG
    [Figure 17: VGG]

  4. GoogLeNet
    [Figure 18: GoogLeNet]

  5. ResNet
    [Figure 19: ResNet]

class ResBlk(nn.Module):
    def __init__(self, ch_in, ch_out):
        super(ResBlk, self).__init__()
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)
        
        self.extra = nn.Sequential()
        if ch_out != ch_in:
            # [b, ch_in, h, w] ==> [b, ch_out, h, w]
            self.extra = nn.Sequential(
                nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=1),
                nn.BatchNorm2d(ch_out)
            )
            
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.extra(x) + out   # shortcut (skip) connection
        return out
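A quick shape check of the block (illustrative sizes):

blk = ResBlk(64, 128)
blk(torch.rand(2, 64, 32, 32)).shape   # torch.Size([2, 128, 32, 32]): the 1x1 conv shortcut matches the channels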

X. nn.Module

1. Container

self.net = nn.Sequential(
    nn.Conv2d(1, 32, 5, 1, 1),
    nn.MaxPool2d(2, 2),
    nn.ReLU(True),
    nn.BatchNorm2d(32),
    
    nn.Conv2d(32, 64, 3, 1, 1),
    nn.ReLU(True),
    nn.BatchNorm2d(64),
    
    nn.Conv2d(64, 64, 3, 1, 1),
    nn.MaxPool2d(2, 2),
    nn.ReLU(True),
    nn.BatchNorm2d(64),
    
    nn.Conv2d(64, 128, 3, 1, 1),
    nn.ReLU(True),
    nn.BatchNorm2d(128)
)

2. .parameters

net = nn.Sequential(nn.Linear(4, 2), nn.Linear(2, 2))
list(net.parameters())[0].shape
>>> torch.Size([2, 4])
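named_parameters() is often handier, since it pairs each tensor with a dotted name (a quick sketch; output shown for the two-layer net above):

for name, p in net.named_parameters():
    print(name, p.shape)
>>> 0.weight torch.Size([2, 4])
>>> 0.bias torch.Size([2])
>>> 1.weight torch.Size([2, 2])
>>> 1.bias torch.Size([2])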

3. modules

class BasicNet(nn.Module):
    def __init__(self):
        super(BasicNet, self).__init__()
        self.net = nn.Linear(4, 3)

    def forward(self, x):
        return self.net(x)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.net = nn.Sequential(BasicNet(),
                                 nn.ReLU(),
                                 nn.Linear(3, 2))

    def forward(self, x):
        return self.net(x)
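Calling .modules() walks the whole tree recursively (the module itself plus all descendants), while .children() yields only direct children. A sketch of the expected output for Net:

net = Net()
for m in net.modules():
    print(type(m).__name__)
>>> Net
>>> Sequential
>>> BasicNet
>>> Linear
>>> ReLU
>>> Linear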

4. to(device)

device = torch.device('cuda')
net = Net()
net.to(device)

5. save and load

device = torch.device('cuda')
net = Net()
net.to(device)

net.load_state_dict(torch.load('ckpt.mdl'))

# train…
torch.save(net.state_dict(), 'ckpt.mdl')

6. train / test

device = torch.device('cuda')
net = Net()
net.to(device)

net.load_state_dict(torch.load('ckpt.mdl'))

# train…
torch.save(net.state_dict(), 'ckpt.mdl')

# test
net.eval()    # evaluation mode: disables Dropout, uses BN running statistics

# back to training
net.train()

XI. Data Augmentation

  • Flip:
    [Figure 20: random flips]
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.RandomHorizontalFlip(),
                       transforms.RandomVerticalFlip(),
                       transforms.ToTensor(),
                   ])),
    batch_size=batch_size, shuffle=True
)
  • Rotate:
    [Figure 21: random rotation]
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.RandomHorizontalFlip(),
                       transforms.RandomVerticalFlip(),
                       # RandomRotation expects a (min, max) degree range, so pick
                       # one of three fixed angles via RandomChoice:
                       transforms.RandomChoice([transforms.RandomRotation((d, d))
                                                for d in (90, 180, 270)]),
                       transforms.ToTensor(),
                   ])),
    batch_size=batch_size, shuffle=True
)
  • Scale:
    [Figure 22: resize]
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.RandomHorizontalFlip(),
                       transforms.RandomVerticalFlip(),
                       transforms.RandomChoice([transforms.RandomRotation((d, d))
                                                for d in (90, 180, 270)]),
                       transforms.Resize([32, 32]),
                       transforms.ToTensor(),
                   ])),
    batch_size=batch_size, shuffle=True
)
  • Crop part:
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.RandomHorizontalFlip(),
                       transforms.RandomVerticalFlip(),
                       transforms.RandomChoice([transforms.RandomRotation((d, d))
                                                for d in (90, 180, 270)]),
                       transforms.Resize([32, 32]),
                       transforms.RandomCrop([28, 28]),
                       transforms.ToTensor(),
                   ])),
    batch_size=batch_size, shuffle=True
)
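Note that these transforms run on the fly inside the DataLoader: every time a sample is drawn, a fresh random flip / rotation / crop is generated, so across epochs the network effectively sees a much larger dataset without storing any extra images.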
