Entropy
Cross Entropy
$H(p,q) = -\sum_x p(x)\log q(x)$
$H(p,q) = H(p) + D_{KL}(p\|q)$
When $p = q$, the KL divergence is zero and the cross entropy equals the entropy.
For binary classification this reduces to $H(P,Q) = -\big(y\log p + (1-y)\log(1-p)\big)$.
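A quick numeric check of the identity above, using two small made-up distributions (the values are only for illustration):

import torch

p = torch.tensor([0.7, 0.2, 0.1])      # "true" distribution p(x), example values
q = torch.tensor([0.5, 0.3, 0.2])      # "predicted" distribution q(x), example values

H_p   = -(p * p.log()).sum()           # entropy H(p)
H_pq  = -(p * q.log()).sum()           # cross entropy H(p, q)
KL_pq = (p * (p / q).log()).sum()      # KL divergence D_KL(p || q)

print(H_pq, H_p + KL_pq)               # the two numbers agree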
Why not use MSE for classification: with a sigmoid/softmax output, the MSE gradient is scaled by the activation's derivative and nearly vanishes once the unit saturates, so cross entropy usually converges faster; a rough illustration follows.
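A small sketch of this effect (the numbers are made up; only the comparison matters):

import torch
import torch.nn.functional as F

y = torch.tensor([1.0])                        # true label
z = torch.tensor([-5.0], requires_grad=True)   # badly wrong, saturated logit

mse = F.mse_loss(torch.sigmoid(z), y)          # MSE on the sigmoid output
mse.backward()
print(z.grad)                                  # tiny gradient: scaled by sigmoid'(z) ~ 0

z.grad = None
bce = F.binary_cross_entropy_with_logits(z, y) # cross entropy on the same logit
bce.backward()
print(z.grad)                                  # large gradient: sigmoid(z) - y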
Summary
logits → softmax → cross entropy
It is generally not recommended to apply softmax and cross entropy separately by hand; use the fused operator that PyTorch provides (F.cross_entropy).
import torch
from torch.nn import functional as F

x = torch.randn(1, 784)        # one flattened 28x28 input
w = torch.randn(10, 784)       # weights for 10 output classes
logits = x @ w.t()

pred = F.softmax(logits, dim=1)
pred_log = torch.log(pred)
F.nll_loss(pred_log, torch.tensor([1]))     # manual softmax + log + nll_loss

# F.cross_entropy = softmax + log + F.nll_loss in a single call
F.cross_entropy(logits, torch.tensor([1]))
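As a quick sanity check (continuing the snippet above), the manual pipeline and the fused operator produce the same value:

loss_manual = F.nll_loss(torch.log(F.softmax(logits, dim=1)), torch.tensor([1]))
loss_fused = F.cross_entropy(logits, torch.tensor([1]))
print(torch.allclose(loss_manual, loss_fused))   # True, up to float precision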
w1, b1 = torch.randn(200, 784, requires_grad=True), torch.zeros(200, requires_grad=True)
w2, b2 = torch.randn(200, 200, requires_grad=True), torch.zeros(200, requires_grad=True)
w3, b3 = torch.randn(10, 200, requires_grad=True), torch.zeros(10, requires_grad=True)
# forward pass
def forward(x):
    x = x @ w1.t() + b1
    x = F.relu(x)
    x = x @ w2.t() + b2
    x = F.relu(x)
    x = x @ w3.t() + b3
    x = F.relu(x)
    return x
Training:
from torch import nn, optim

optimizer = optim.SGD([w1, b1, w2, b2, w3, b3], lr=learning_rate)
criteon = nn.CrossEntropyLoss()

for epoch in range(epochs):
    # training loop
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.reshape(-1, 28*28)    # torch.Size([200, 784])
        logits = forward(data)            # torch.Size([200, 10])
        loss = criteon(logits, target)

        optimizer.zero_grad()             # clear accumulated gradients
        loss.backward()                   # backpropagation
        # print(w1.grad.norm(), w2.grad.norm())
        optimizer.step()                  # update the parameters
Printing weight.grad.norm() shows that the gradients often stay tiny and training gets stuck in a local minimum. This problem can be solved with Kaiming He's initialization method.
torch.nn.init.kaiming_normal_(w1)
torch.nn.init.kaiming_normal_(w2)
torch.nn.init.kaiming_normal_(w3)
In the previous section we built the network without PyTorch's high-level API; in this section we use the API that PyTorch provides.
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 200),
            nn.ReLU(inplace=True),
            nn.Linear(200, 10),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.model(x)
        return x
PyTorch offers two styles of API: the class-style nn.* modules used here and the functional-style F.* interface used earlier.
With the nn.Module version we do not run into the initialization problem from the previous section: the parameters are not exposed to the user directly, and the layers come with their own default initialization scheme, which is usually good enough; otherwise you have to write the corresponding initialization code yourself.
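If the defaults are not enough, one possible pattern (a sketch, not code from the original notes) is to traverse the model with Module.apply:

def init_weights(m):
    # re-initialize every nn.Linear layer with Kaiming normal init
    if isinstance(m, nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

# usage: call net.apply(init_weights) right after constructing the MLP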
import time

device = torch.device('cuda')
net = MLP().to(device)
optimizer = optim.SGD(net.parameters(), lr=learning_rate)
criteon = nn.CrossEntropyLoss().to(device)

time1 = time.time()
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data = data.view(-1, 28*28)
        data, target = data.to(device), target.to(device)

        logits = net(data)
        loss = criteon(logits, target)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Neural networks have very strong expressive power and overfit easily, so they must be evaluated on a test set. Judging a model by the training loss alone is not enough; the test-set accuracy should also be monitored to make sure the model generalizes.
test_loss = 0
correct = 0
for data, target in test_loader:
    data = data.view(-1, 28*28)
    data, target = data.to(device), target.to(device)
    logits = net(data)
    test_loss += criteon(logits, target).item()

    pred = logits.argmax(dim=1)
    correct += pred.eq(target.data).sum()

test_loss /= len(test_loader.dataset)
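To make the result readable, the raw counters can be turned into an average loss and a percentage accuracy, for example:

acc = 100. * float(correct) / len(test_loader.dataset)
print('Test set: average loss {:.4f}, accuracy {:.2f}%'.format(test_loss, acc))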
After successfully installing visdom (for example via pip install visdom), start the server:
python -m visdom.server
If the MNIST images shown with viz.images() appear completely black, the cause is the normalization applied to the samples; the batch has to be de-normalized before it displays correctly.
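A possible fix (a sketch that assumes the usual MNIST normalization with mean 0.1307 and std 0.3081 was applied by the data loader):

from visdom import Visdom

viz = Visdom()
# undo Normalize((0.1307,), (0.3081,)) so pixel values fall back into [0, 1]
images = data.view(-1, 1, 28, 28) * 0.3081 + 0.1307
viz.images(images.clamp(0, 1), win='mnist', opts=dict(title='mnist images'))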