The Python deep learning ecosystem has evolved very rapidly, and a number of powerful frameworks have become widely used as the field has matured. This section introduces several common deep learning frameworks and compares their characteristics.
TensorFlow
TensorFlow is Google's open-source deep learning framework and is very widely used. It is implemented in C++ and expresses computation as a dataflow graph (a DAG) whose nodes represent mathematical operations.
TensorFlow's programming model is declarative and its computation graph is static: you first build a network structure and then reuse that same structure again and again. If you want to change the network structure, you have to rebuild it from scratch.
Its flexible architecture can be deployed on one or more CPU/GPU servers, or applied to mobile devices through a single API. It provides Python and C++ interfaces.
Because the core API is fairly low level, many third-party libraries wrap TensorFlow's functions at a higher level, for example Keras, TFLearn, TensorFlow-Slim, TensorLayer, and Estimator.
TensorFlow-Slim is the official high-level wrapper from Google and is included in the TensorFlow package.
TFLearn is an even more concise high-level wrapper than TF-Slim. It is not bundled with TensorFlow and has to be installed separately. It wraps not only the definition of network structures but also the model training process.
Keras
Keras can run on top of TensorFlow, MXNet, or Theano. Like the TFLearn API, it wraps model definition, loss functions, and the training process. A complete workflow splits into three parts: data processing, model definition, and model training. With Keras you can assemble a deep network quickly and choose training parameters flexibly. Keras has also been pulled directly into TensorFlow's core codebase as one of the officially provided high-level wrappers.
To train a model with the Keras API, you first define a Sequential instance, add layers to it with the add method, and call compile to specify the optimizer, the loss function, and the metrics to monitor during training. The Sequential instance is then trained by calling fit, as sketched below.
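As a minimal sketch of this workflow (the layer sizes, the random data, and the choice of the SGD optimizer here are illustrative assumptions, not something prescribed above):

import numpy as np
from tensorflow import keras

# illustrative data: 100 samples with 8 features each, binary labels
x = np.random.rand(100, 8)
y = np.random.randint(0, 2, size=(100, 1))

model = keras.Sequential()                    # 1. define the model
model.add(keras.layers.Dense(16, activation='relu', input_shape=(8,)))
model.add(keras.layers.Dense(1, activation='sigmoid'))

# 2. compile: optimizer, loss function, metrics to monitor during training
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

# 3. fit: run the training loop
model.fit(x, y, epochs=5, batch_size=16)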
Caffe
Caffe (Convolutional Architecture for Fast Feature Embedding) is a clean and efficient open-source deep learning framework maintained by the Berkeley Vision and Learning Center.
Like TensorFlow it is written in C++; early versions offered only a C++ interface, and a Python interface was added as the project developed.
Caffe's support for CNNs is excellent, and many of the networks used in ImageNet competitions were written in Caffe, which is a large part of why it became popular. Its drawbacks are that it is not very flexible and its memory footprint is high.
We now illustrate model training with a simple linear model $\hat{y} = a\cdot x + b$. Define the loss function $loss = (\hat{y} - y)^2 = (a\cdot x + b - y)^2$; once suitable values of $a$ and $b$ are found, the loss becomes small. The goal of this example is to find a combination of $a, b$ that minimizes $loss$. First, we locate the minimum of the loss by exhaustive search.
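Concretely, the code below averages the squared error over the $N = 3$ training samples; this first version fixes $b = 0$ and searches over $a$ alone:
$$MSE(a) = \frac{1}{N}\sum_{i=1}^{N}(a\cdot x_i - y_i)^2$$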
import numpy as np
from matplotlib import pyplot as plt
data_x = [1, 2, 3]
data_y = [2, 4, 6]
loss_list = list()
a_list = list()
def forward(x):
return a * x
def lossFunction(x, y):
y_pred = forward(x)
loss = (y_pred - y) ** 2
return loss
def predict(x,a_):
return a_*x
if __name__ == '__main__':
    for a in np.arange(0, 4, 0.1):       # enumerate candidate values of a on a 0.1 grid
        sum_loss = 0
        for i in range(3):
            sum_loss += lossFunction(data_x[i], data_y[i])
        loss_list.append(sum_loss / 3)   # mean loss over the three samples for this a
        a_list.append(a)
plt.figure()
plt.plot(a_list, loss_list)
# plt.title("")
plt.xlabel('a')
plt.ylabel('loss')
# plt.show()
min_value = min(loss_list)
index_lossMin = loss_list.index(min_value)
print(index_lossMin)
proper_a = a_list[index_lossMin]
print(proper_a)
print("Please input the desired x:")
desired_x = input()
print(f"The predict output for the linear model is {predict(float(desired_x),proper_a)}")
The exhaustive search above finds $a \approx 2.0$, where the mean loss is essentially zero. The same linear fit can also be obtained with sklearn, as shown below.
import numpy as np
from matplotlib import pyplot as plt
from sklearn import linear_model
import pandas as pd
lrm = linear_model.LinearRegression()
x_data = np.array([1, 2, 3])
y_data = np.array([2, 4, 6])
z_data = np.zeros([3, 2])
m_data = np.zeros([3, 2])
# targets: column 0 is x itself, column 1 is y
z_data[:, 0] = x_data
z_data[:, 1] = y_data
# inputs: both columns are a copy of x (a two-feature view of the same variable)
m_data[:, 0] = x_data
m_data[:, 1] = x_data
# fit the multi-output linear model and predict for x = 4
lrm.fit(m_data, z_data)
print(lrm.predict([[4, 4]]))   # expected to print values close to [[4. 8.]]
Now add a bias term. Given the training set (x=1, y=6.8), (x=2, y=9.8), (x=3, y=13.2), (x=4, y=16.2), predict the output for the test point x=5.
import numpy as np
from matplotlib import pyplot as plt
x_data = [1,2,3,4]
y_data = [6.8,9.8,13.2,16.2]
loss_list = list()
def forward(a,x,b):
return a*x+b
def lossFunction(a,x,y,b):
y_pred = forward(a,x,b)
loss = (y_pred - y)**2
return loss
a_list = list()
b_list = list()
if __name__ == '__main__':
for a in np.arange(0,6,0.1):
for b in np.arange(0,6,0.1):
sum_loss = 0
for i in range(4):
sum_loss += lossFunction(a, x_data[i], y_data[i],b)
loss_list.append(sum_loss/4)
a_list.append(a)
b_list.append(b)
plt.plot(a_list,loss_list)
plt.xlabel('a')
plt.ylabel('loss')
print(min(loss_list))
loss_min_index = loss_list.index(min(loss_list))
print(loss_min_index)
a_wanted = a_list[loss_min_index]
b_wanted = b_list[loss_min_index]
print(f'a_wanted = {a_wanted}, b_wanted ={b_wanted}')
# plt.show()
# a_wanted = a_list[loss_list.index(min(loss_list))]
# print(forward(a_wanted, 4))
print(forward(a_wanted, 5, b_wanted))
The code below gives a visual sense of what the fitted line looks like when different values of $b$ are chosen (with $a$ fixed).
def LinearFunction(x,a=3.2,b=3.4):
return a*x+b
def LinearFunction2(x,a=3.2,b=3.5):
return a*x+b
x_data = [1,2,3,4]
y_data = [6.8, 9.8, 13.2, 16.2]
z_data = [6,12,18,24]
n_data = np.arange(5)
m_data = np.zeros([5,1])
l_data = np.zeros([5,1])
for i in range(5):
m_data[i] = LinearFunction(n_data[i])
l_data[i] = LinearFunction2(n_data[i])
plt.scatter(x_data,y_data)
plt.plot(n_data,m_data,'r')
plt.plot(n_data,l_data,'g')
plt.show()
Local minima versus the global minimum: a method like the one above may well end up returning only a local minimum, and plotting the loss curve makes the problem easy to see. Next, we rewrite the example using gradient descent.
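For the model $\hat{y} = a\cdot x$ with loss $(a\cdot x - y)^2$, the gradient with respect to $a$ is $\frac{\partial loss}{\partial a} = 2x(ax - y)$, so each update in the code below performs
$$a \leftarrow a - \alpha \cdot 2x(ax - y)$$
where $\alpha$ is the learning rate (alpha = 0.01 in the code).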
import numpy as np
from matplotlib import pyplot as plt
data_x = [1, 2, 3]
data_y = [2, 4, 6]
loss_list = list()
a_list = list()
alpha = 0.01
def forward(x):
return a * x
def lossFunction(x, y):
y_pred = forward(x)
loss = (y_pred - y) ** 2
return loss
def predict(x, a_):
return a_ * x
def gradient(a, x, y):
    # one gradient-descent step: a <- a - alpha * d(loss)/da = a - alpha * 2x(ax - y)
    a = a - alpha * 2 * (a * x - y) * x
    return a
if __name__ == '__main__':
a = 0
for epoch in range(1000):
# for a in np.arange(0, 4, 0.1):
sum_loss = 0
for i in range(3):
sum_loss += lossFunction(data_x[i], data_y[i])
a = gradient(a, data_x[i], data_y[i])
loss_list.append(sum_loss / 3)
a_list.append(a)
plt.subplot(211)
plt.plot(a_list)
plt.subplot(212)
plt.plot(loss_list)
plt.show()
plt.figure()
plt.plot(a_list, loss_list)
plt.xlabel('a')
plt.ylabel('loss')
plt.show()
min_value = min(loss_list)
index_lossMin = loss_list.index(min_value)
print(index_lossMin)
proper_a = a_list[index_lossMin]
print(proper_a)
print("Please input the desired x:")
desired_x = input()
print(f"The predict output for the linear model is {predict(float(desired_x), proper_a)}")
import time
import numpy as np
from matplotlib import pyplot as plt
import random
x_data = [1,2,3]
y_data = [2,4,6]
loss_list = list()
a_b_list = list()
def forward(a, x):
return a*x
def lossFunction(a,x,y):
y_pred = forward(a,x)
loss = (y - y_pred)**2
return loss
alpha = 0.1
def gradient(x, a, y):
a = a - alpha*2*x*(x*a -y)
return a
a_list = list()
b_list = list()
if __name__ == '__main__':
a = random.randint(0, 10)
for epoch in range(100):
sum_loss = 0
for i in range(3):
sum_loss += lossFunction(a, x_data[i], y_data[i])
a = gradient(x_data[i], a, y_data[i])
loss_list.append(sum_loss/3)
a_list.append(a)
b_list.append(epoch)
print(f'epoch = {epoch}, a = {a}, loss = {sum_loss/3}')
time.sleep(0.5)
# time.sleep(0.5)
# plt.plot(a_list, loss_list)
plt.plot(b_list, loss_list)
plt.show()
In PyTorch, a tensor holds not only the data values themselves but also, when gradients are required, the gradient associated with that data, as the following example shows.
import torch
a = torch.tensor([2, 3], requires_grad=True, dtype=torch.float)
b = torch.tensor([6, 4], requires_grad=True, dtype=torch.float)
Q = 3 * a ** 3 - b ** 2
# Q is a vector, so backward() needs an explicit gradient argument (all ones here)
extern_gradient = torch.tensor([1., 1.])
Q.backward(gradient=extern_gradient)
print(a.grad)
print(b.grad)
Here we define $Q = 3a^3 - b^2$, so $\frac{\partial Q}{\partial a} = 9a^2$ and $\frac{\partial Q}{\partial b} = -2b$. Substituting $a = [2, 3]$ and $b = [6, 4]$ gives exactly the gradients printed above: a.grad = [36, 81] and b.grad = [-12, -8].
import torch
x = torch.tensor(3, dtype=torch.float32, requires_grad=True)
y = torch.tensor(4, dtype=torch.float32, requires_grad=True)
b = torch.tensor(5, dtype=torch.float32, requires_grad=True)
z = x*y + b      # z = x*y + b is a scalar, so backward() below needs no gradient argument
print(z)
z.backward()
print(z.requires_grad, x.grad, y.grad, b.grad)
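Since $z = xy + b$, we have $\frac{\partial z}{\partial x} = y = 4$, $\frac{\partial z}{\partial y} = x = 3$, and $\frac{\partial z}{\partial b} = 1$, which is exactly what the printed values of x.grad, y.grad, and b.grad show.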
In PyTorch, backward() computes gradients automatically through backpropagation; the quantities involved must be torch tensors. In the earlier linear-model example the gradient we need is the one with respect to the parameter $a$, so we modify that example as follows: first import the PyTorch package with import torch, then define the parameter as a = torch.Tensor([7.0]) and set a.requires_grad = True.
from matplotlib import pyplot as plt
import torch
data_x = [1, 2, 3]
data_y = [2, 4, 6]
loss_list = list()
a_list = list()
alpha = 0.01
def forward(x):
return a * x
def lossFunction(x, y):
y_pred = forward(x)
loss = (y_pred - y) ** 2
return loss
if __name__ == '__main__':
a = torch.Tensor([7.0])
a.requires_grad = True
for epoch in range(1000):
# for a in np.arange(0, 4, 0.1):
        sum_loss = 0
        for i in range(3):
            l = lossFunction(data_x[i], data_y[i])
            sum_loss += l.item()              # keep a plain float for plotting
            l.backward()                      # autograd computes a.grad
            a.data = a.data - alpha * a.grad  # gradient-descent update on the parameter data
            a.grad = None                     # clear the gradient before the next step
            a_list.append(a.item())
        loss_list.append(sum_loss / 3)
print(a_list)
plt.subplot(211)
plt.plot(a_list)
plt.subplot(212)
plt.plot(loss_list)
plt.show()
We now implement the same example with PyTorch's standard modules. The general training flow is:
for input, target in dataset:
output = model(input)
loss = loss_fn(output, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
Following this pattern, the earlier example can be rewritten as:
import torch
from matplotlib import pyplot as plt
x_data = torch.tensor([[1], [2], [3]], dtype=torch.float)
y_data = torch.tensor([[2], [4], [6]], dtype=torch.float)
class LinearExample(torch.nn.Module):
def __init__(self):
super(LinearExample, self).__init__()
self.linear = torch.nn.Linear(1, 1)
def forward(self, x):
y_pred = self.linear(x)
return y_pred
model = LinearExample()
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
epoch_list = list()
a_list = list()
if __name__ == '__main__':
for epoch in range(100):
y_hat = model(x_data)
loss = criterion(y_hat, y_data)
a_list.append(model.linear.weight.item())
epoch_list.append(epoch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
plt.plot(epoch_list, a_list)
plt.show()
Logistic regression: used for binary classification, where the output is either 0 or 1, i.e. there are only two possible classes.
The loss function is the cross entropy.
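For a predicted probability $\hat{y} \in (0,1)$ and a label $y \in \{0,1\}$, the binary cross-entropy used by torch.nn.BCELoss in the code below is
$$loss = -\left[y\log\hat{y} + (1-y)\log(1-\hat{y})\right]$$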
import time
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F
x_data = torch.tensor([[1],[2],[3]],dtype=torch.float32)
y_data = torch.tensor([[0],[0],[1]],dtype=torch.float32)
class Model(torch.nn.Module):
def __init__(self):
super(Model, self).__init__()
self.linear = torch.nn.Linear(1,1)
def forward(self,x):
y_pred = self.linear(x)
return torch.sigmoid(y_pred)
model = Model()
criterion = torch.nn.BCELoss(reduction='sum')     # sum the loss over the batch (size_average=False is deprecated)
optim = torch.optim.SGD(model.parameters(),lr=0.01)
for epoch in range(1000):
y_pred = model(x_data)
loss = criterion(y_pred,y_data)
print(epoch, loss.item())
# time.sleep(0.1)
optim.zero_grad()
loss.backward()
optim.step()
x = np.linspace(0,10,200)
x_t = torch.tensor(x,dtype=torch.float32).view((200,1))
y_t = model(x_t)
y = y_t.data.numpy()
plt.plot(x,y)
plt.show()
The following example addresses a diabetes prediction problem, a binary classification task with 8 input features:
import numpy as np
from matplotlib import pyplot as plt
import torch
data_xy = np.loadtxt('/home/chasing/Documents/pytorchbooklit/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(data_xy[:,:-1])
y_data = torch.from_numpy(data_xy[:,-1]).reshape(-1,1)
class LinearExample(torch.nn.Module):
def __init__(self):
super(LinearExample, self).__init__()
self.linear1 = torch.nn.Linear(8,6)
self.linear2 = torch.nn.Linear(6,4)
self.linear3 = torch.nn.Linear(4,1)
# self.linear4 = torch.nn.Linear(2,1)
self.sigmoid = torch.nn.Sigmoid()
self.relu = torch.nn.ReLU()
def forward(self,x):
x = self.relu(self.linear1(x))
x = self.relu(self.linear2(x))
        x = self.linear3(x)
        # BCELoss expects probabilities in (0, 1), so the final activation must be a sigmoid
        return self.sigmoid(x)
model = LinearExample()
criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(),lr=1e-2)
loss_list = list()
if __name__ == '__main__':
for epoch in range(300):
y_pred = model(x_data)
loss = criterion(y_pred, y_data)
loss_list.append(loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()
plt.plot(loss_list)
plt.show()
Training on mini-batches increases parallelism and speeds up computation, and updating on a group of samples rather than a single one helps avoid getting stuck at saddle points (local optima). In this example we construct the dataset with PyTorch's Dataset class and feed it through a DataLoader.
from torch.utils.data import DataLoader
from torch.utils.data import Dataset
import numpy as np
import torch
from matplotlib import pyplot as plt
class LinearExample(torch.nn.Module):
def __init__(self):
super(LinearExample, self).__init__()
self.linear1 = torch.nn.Linear(8,6)
self.linear2 = torch.nn.Linear(6,4)
self.linear3 = torch.nn.Linear(4,1)
# self.linear4 = torch.nn.Linear(2,1)
self.sigmoid = torch.nn.Sigmoid()
self.relu = torch.nn.ReLU()
def forward(self,x):
x = self.relu(self.linear1(x))
x = self.relu(self.linear2(x))
        x = self.linear3(x)
        # as before, BCELoss needs outputs in (0, 1), so finish with a sigmoid
        return self.sigmoid(x)
class DiabetesDatset(Dataset):
def __init__(self):
data_xy = np.loadtxt('/home/chasing/Documents/pytorchbooklit/diabetes.csv.gz', delimiter=',', dtype=np.float32)
self.len = data_xy.shape[0]
self.data_x = torch.from_numpy(data_xy[:,:-1])
self.data_y = torch.from_numpy(data_xy[:,-1]).reshape(-1,1)
def __getitem__(self, index):
return self.data_x[index], self.data_y[index]
def __len__(self):
return self.len
model = LinearExample()
dataset = DiabetesDatset()
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True, num_workers=2)
criterion = torch.nn.BCELoss(reduction='mean')    # mean loss over the batch (size_average is deprecated)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_list = list()
if __name__ == '__main__':
for epoch in range(100):
for i, data in enumerate(train_loader, 0):
inputs, labels = data
y_pred = model(inputs)
loss = criterion(y_pred, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
            loss_list.append(loss.item())   # store plain floats so matplotlib can plot them
plt.plot(loss_list)
plt.show()
For multi-class classification, the output layer uses the softmax function:
$$P(y=i) = \frac{e^{z_i}}{\sum_{j=0}^{k} e^{z_j}}, \quad i = 0, 1, \dots, k$$
This guarantees that every output is positive and that the outputs sum to 1; $e^x$ is positive and monotonically increasing.
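A quick numerical check of these two properties (the logits used here are simply the first row of y1 from the example that follows):

import torch
z = torch.tensor([0.1, 0.2, 0.9])
p = torch.exp(z) / torch.exp(z).sum()    # softmax computed directly from its definition
print(p, p.sum())                        # every entry is positive and the entries sum to 1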
When computing the loss, the BCE used earlier no longer applies (it is specific to binary classification), so a new loss, CrossEntropyLoss(), is introduced. For a one-hot label $y$ and a predicted distribution $\hat{y}$ it computes
$$loss = -\sum_{i} y_i \log \hat{y}_i$$
In PyTorch, CrossEntropyLoss takes the raw (pre-softmax) scores and applies log-softmax internally, and the targets are given as class indices rather than one-hot vectors.
import torch
criterion = torch.nn.CrossEntropyLoss()
y = torch.LongTensor([2, 0, 1])          # target class indices (not one-hot vectors)
y1 = torch.tensor([[0.1,0.2,0.9],
[1.1,0.1,0.2],
[0.2,2.1,0.1]])
y2 = torch.tensor([[0.8,0.2,0.3],
[0.2,0.3,0.5],
[0.2,0.2,0.5]])
loss1 = criterion(y1, y)
loss2 = criterion(y2, y)
print(f'loss1= {loss1}, loss2={loss2}')
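Here loss1 < loss2: in y1 the largest score of each row falls on the target class (indices 2, 0, 1), whereas in y2 none of them do.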
Handwritten digit recognition (MNIST) example:
import torch
from matplotlib import pyplot as plt
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision import transforms
import torch.optim as optim
import numpy as np
batch_size = 64
batch_size_test = 100
data_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
minist_tainloader = datasets.MNIST(root='./', train=True, download=True, transform=data_transform)
minist_testloader = datasets.MNIST(root='./', train=False, download=True, transform=data_transform)
trainloader = DataLoader(minist_tainloader, batch_size=batch_size, shuffle=True)
testloader = DataLoader(minist_testloader, batch_size=batch_size_test, shuffle=False)
class Model(torch.nn.Module):
def __init__(self):
super(Model, self).__init__()
self.linear1 = torch.nn.Linear(784, 512)
self.linear2 = torch.nn.Linear(512, 256)
self.linear3 = torch.nn.Linear(256, 128)
self.linear4 = torch.nn.Linear(128, 64)
self.linear5 = torch.nn.Linear(64, 10)
self.relu = torch.nn.ReLU()
def forward(self, x):
x = x.view(-1, 784)
x = self.relu(self.linear1(x))
x = self.relu(self.linear2(x))
x = self.relu(self.linear3(x))
x = self.relu(self.linear4(x))
return self.linear5(x)
model = Model()
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)
loss_list = list()
def test_accuracy():
    correct = 0
    total_num = 0
    with torch.no_grad():                             # no gradients needed for evaluation
        for data in testloader:
            images, labels = data
            pred = model(images)
            _, predicted = torch.max(pred, dim=1)     # index of the largest logit = predicted class
            total_num += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f'Accuracy = {correct / total_num}')
if __name__ == '__main__':
for epoch in range(10):
for i, data in enumerate(trainloader, 0):
inputs, label = data
outputs = model(inputs)
optimizer.zero_grad()
loss = criterion(outputs, label)
            loss_list.append(loss.item())   # keep floats, not tensors, for plotting
loss.backward()
optimizer.step()
print(f'[{epoch}]: loss = {loss}')
plt.plot(loss_list)
plt.show()
test_accuracy()
Loading and inspecting an image with PIL:
import numpy as np
from PIL import Image
a = Image.open('test.jpg')    # open an image file
c = a.convert('L')            # convert it to grayscale
c.show()
# print(c)
im = np.array(a)
im_gray = np.array(c)
print(im_gray.shape)
print(im_gray)
print(im.shape)
# print(im)
b = np.array([[[1,2,3],[2,3,3],[3,4,5]],[[2,1,2],[3,4,5],[4,5,6]]])
# print(b.shape)
# a.show()
# print(a)
Convolutional neural networks (CNNs)
First we work out how the tensor dimensions change as data flows through the convolution and pooling layers.
import torch
width, height = 28, 28
in_channel = 1        # MNIST-style input: a single channel
batch_size = 1
inputs = torch.randn(batch_size, in_channel,
                     width, height)
print(inputs.shape)
conv_lay1 = torch.nn.Conv2d(in_channels=1,
out_channels=10,
kernel_size=5)
output1 = conv_lay1(inputs)
print(output1.shape)
maxpool_lay = torch.nn.MaxPool2d(kernel_size=2)
output2 = maxpool_lay(output1)
print(output2.shape)
conv_lay2 = torch.nn.Conv2d(in_channels=10,
out_channels=20,
kernel_size=5)
output3 = conv_lay2(output2)
print(output3.shape)
output4 = maxpool_lay(output3)
print(output4.shape)
output5 = output4.view(1, -1)
linear_lay = torch.nn.Linear(320, 10)
output6 = linear_lay(output5)
print(output6.shape)
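The printed shapes follow from the fact that, with stride 1 and no padding, a $5\times 5$ convolution shrinks each spatial dimension by 4 ($W_{out} = W_{in} - K + 1$) and a $2\times 2$ max-pool halves it: $28 \rightarrow 24 \rightarrow 12 \rightarrow 8 \rightarrow 4$. The flattened feature vector therefore has $20 \times 4 \times 4 = 320$ elements, which is why the final layer is Linear(320, 10).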
Next, the handwritten-digit program is rewritten as a deep network that uses these convolution operations.
import torch
from matplotlib import pyplot as plt
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision import transforms
import torch.optim as optim
import numpy as np
batch_size = 64
data_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
minist_tainloader = datasets.MNIST(root='./', train=True, download=True, transform=data_transform)
minist_testloader = datasets.MNIST(root='./', train=False, download=True, transform=data_transform)
trainloader = DataLoader(minist_tainloader, batch_size=batch_size, shuffle=True)
testloader = DataLoader(minist_testloader, batch_size=batch_size, shuffle=False)
class Model(torch.nn.Module):
def __init__(self):
super(Model, self).__init__()
self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
self.pooling = torch.nn.MaxPool2d(kernel_size=2)
self.linear = torch.nn.Linear(320, 10)
self.relu = torch.nn.ReLU()
def forward(self, x):
batch_size = x.size(0)
x = self.relu(self.pooling(self.conv1(x)))
x = self.relu(self.pooling(self.conv2(x)))
x = x.view(batch_size, -1)
x = self.linear(x)
return x
model = Model()
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)
loss_list = list()
def test_accuracy():
    correct = 0
    total_num = 0
    with torch.no_grad():                             # evaluation only, no gradients
        for data in testloader:
            images, labels = data
            pred = model(images)
            _, predicted = torch.max(pred, dim=1)     # predicted class per sample
            total_num += labels.size(0)               # also correct for the smaller final batch
            correct += (predicted == labels).sum().item()
    print(f'Accuracy = {correct / total_num}')
if __name__ == '__main__':
for epoch in range(3):
for i, data in enumerate(trainloader, 0):
inputs, label = data
outputs = model(inputs)
optimizer.zero_grad()
loss = criterion(outputs, label)
            loss_list.append(loss.item())   # keep floats for plotting
loss.backward()
optimizer.step()
print(f'[{epoch}]: loss = {loss}')
plt.plot(loss_list)
plt.show()
test_accuracy()
RNN: recurrent neural networks
RNNs are mainly used for sequence problems, where successive inputs are correlated with one another.
The following toy example learns to map the input sequence "hello" to the output sequence "ohlol".
import torch
batch_size = 1
seq_len = 5
input_size = 4
hidden_size = 4
num_layer = 1
idx2char = ['e', 'h', 'l', 'o']             # index -> character lookup table
x_data = [1, 0, 2, 2, 3]                    # "hello" as character indices
y_data = [3, 1, 2, 3, 2]                    # "ohlol" as character indices
one_hot_lookup = [[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]]
x_one_hot = [one_hot_lookup[x] for x in x_data]                            # one-hot encode the input
inputs = torch.Tensor(x_one_hot).view(seq_len, batch_size, input_size)     # shape (seq_len, batch, input_size)
labels = torch.LongTensor(y_data)
class NLPModel(torch.nn.Module):
def __init__(self):
super(NLPModel, self).__init__()
self.rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size,
num_layers=num_layer)
def forward(self, x):
hidden = torch.zeros(num_layer, batch_size, hidden_size)
out, _ = self.rnn(x, hidden)
return out.view(-1, hidden_size)
model = NLPModel()
criterion = torch.nn.CrossEntropyLoss()
optim = torch.optim.Adam(model.parameters(), lr=0.05)
for epoch in range(35):
outputs = model(inputs)
loss = criterion(outputs, labels)
optim.zero_grad()
loss.backward()
optim.step()
_, idex = outputs.max(dim= 1)
idx = idex.data.numpy()
print('Predicted:', ''.join([idx2char[x] for x in idx]), end='')
print(f'\t epoch={epoch}, loss={loss.item()}')