Author: Horizon Max
Video link: Lecture 08 Dataset_and_Dataloader
Course materials:
//Here is the link:
Slides link: https://pan.baidu.com/s/1vZ27gKp8Pl-qICn_p2PaSw
Extraction code: cxe4
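A few common terms:
Epoch: one complete pass in which all training samples go through one forward propagation and one backward propagation;
Batch-Size: the number of samples fed to the network at a time;
Iteration: the number of training steps needed to complete one Epoch = total number of samples / Batch-Size (rounded up when the division is not exact and the last, smaller batch is kept)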
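To make the relation between these three terms concrete, here is a minimal sketch. The sizes (100 samples, batch size 32) and the zero-filled toy dataset are assumptions chosen only for illustration; they are not the diabetes data. It relies on the fact that len() of a DataLoader gives the number of batches per epoch.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical sizes, chosen only for illustration.
num_samples = 100
batch_size = 32

# Toy dataset: 8 features and 1 label per sample, all zeros.
toy_dataset = TensorDataset(torch.zeros(num_samples, 8), torch.zeros(num_samples, 1))

# Default drop_last=False keeps the final, smaller batch.
loader_keep = DataLoader(toy_dataset, batch_size=batch_size)
# drop_last=True discards the final incomplete batch.
loader_drop = DataLoader(toy_dataset, batch_size=batch_size, drop_last=True)

# len(DataLoader) is the number of iterations in one epoch.
iterations_keep = len(loader_keep)
iterations_drop = len(loader_drop)

print(iterations_keep, math.ceil(num_samples / batch_size))
print(iterations_drop, num_samples // batch_size)
```

So the division is rounded up by default and rounded down only when drop_last=True is passed.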
Basic steps for reading data with DataLoader:
1) Create a Dataset object
2) Pass the Dataset object as an argument to DataLoader
The following code illustrates these steps:
# Here is the code :
import torch
import numpy as np
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
# 1 prepare dataset
class DiabetesDataset(Dataset):  # define the Dataset subclass
    def __init__(self, filepath):
        xy = np.loadtxt(filepath, delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]  # xy.shape is (number of rows, number of columns)
        self.x_data = torch.from_numpy(xy[:, :-1])   # all columns except the last: features
        self.y_data = torch.from_numpy(xy[:, [-1]])  # last column, kept 2-D: labels

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len

dataset = DiabetesDataset('diabetes.csv')  # create the Dataset object
# pass the Dataset object to the DataLoader; num_workers is the number of worker subprocesses used for loading
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True, num_workers=2)
# 2 design model using class
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

model = Model()
# 3 construct loss and optimizer
criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# 4 training cycle (forward backward update)
if __name__ == '__main__':
    for epoch in range(100):
        for i, data in enumerate(train_loader, 0):  # train_loader shuffles first, then splits into mini-batches
            inputs, labels = data
            y_pred = model(inputs)
            loss = criterion(y_pred, labels)
            print(epoch, i, loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
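The "shuffle first, then mini-batch" behaviour of a DataLoader with shuffle=True can be seen on a toy dataset. This is only an illustrative sketch: the dataset of the ten indices 0..9, the batch size of 4, and the helper name epoch_batches are assumptions made for the demonstration, not part of the lecture code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: the samples are simply the indices 0..9.
indices = torch.arange(10)
toy_loader = DataLoader(TensorDataset(indices), batch_size=4, shuffle=True)


def epoch_batches(loader):
    """Collect the sample indices of every mini-batch in one pass over loader."""
    return [batch[0].tolist() for batch in loader]


# Each pass over the loader is one epoch; the order is reshuffled every time.
first_epoch = epoch_batches(toy_loader)
second_epoch = epoch_batches(toy_loader)
print(first_epoch)
print(second_epoch)
```

Each epoch visits every sample exactly once, in batches of 4, 4 and 2, and the order is drawn anew at the start of each epoch.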
Function reference: torch.from_numpy()
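One property of torch.from_numpy() worth knowing when building a Dataset from a NumPy array is that the resulting tensor shares memory with the array instead of copying it. The small array below is a made-up example used only to show this.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
tensor = torch.from_numpy(array)      # shares memory with `array`, no copy is made

array[0, 0] = 100.0                   # modify the NumPy array in place
shared_value = tensor[0, 0].item()    # the tensor sees the change

copied = torch.tensor(array)          # torch.tensor() copies the data
array[0, 0] = -1.0
copied_value = copied[0, 0].item()    # the copy is unaffected by the second change

print(tensor.dtype, shared_value, copied_value)
```

The dtype is carried over as well, which is why the loading code reads the file with dtype=np.float32: the tensors then come out as float32, matching the default parameter type of torch.nn.Linear.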
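Enabling multi-process loading with num_workers=2 here will normally raise an error on platforms that start worker processes with spawn (Windows, for example);
Solution: put the program inside if __name__ == '__main__': (as shown in the code above)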
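A minimal sketch of the guarded layout follows. The synthetic random data and the helper names make_loader and count_batches are assumptions for illustration only. The idea is that worker processes started with spawn re-import the script, so anything that iterates over the loader should only run under the guard.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers):
    # Synthetic stand-in for the real data: 64 samples, 8 features, 1 binary label.
    features = torch.randn(64, 8)
    labels = torch.randint(0, 2, (64, 1)).float()
    return DataLoader(TensorDataset(features, labels), batch_size=32,
                      shuffle=True, num_workers=num_workers)


def count_batches(num_workers=2):
    # Iterating over the loader is what actually starts the worker processes.
    loader = make_loader(num_workers)
    return sum(1 for _ in loader)


if __name__ == '__main__':
    # Only the process that runs this file as a script enters this block;
    # worker processes that re-import the module skip it.
    print(count_batches(num_workers=2))
```

Setting num_workers=0 loads the data in the main process, which is a simple way to check whether an error comes from the worker processes or from the dataset itself.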
After 100 epochs the model has clearly not converged yet (likely a limitation of the network structure)
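Per-batch losses printed during training are noisy, so one way to judge convergence is to compute the mean loss and the accuracy over the whole dataset after each epoch. The helper below is a sketch under stated assumptions: the name evaluate, the 0.5 decision threshold, and a criterion that averages over the batch (as BCELoss with reduction='mean' does) are choices made here, not part of the lecture code.

```python
import torch


def evaluate(model, loader, criterion):
    """Return (mean loss per sample, accuracy) over all batches in loader.

    Assumes the model outputs probabilities in [0, 1], the labels are 0/1,
    and criterion returns the mean loss of the batch.
    """
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in loader:
            y_pred = model(inputs)
            # Undo the per-batch averaging so batches of different size weigh correctly.
            total_loss += criterion(y_pred, labels).item() * labels.size(0)
            # Predict class 1 when the probability is at least 0.5.
            correct += ((y_pred >= 0.5).float() == labels).sum().item()
            count += labels.size(0)
    model.train()
    return total_loss / count, correct / count
```

A possible use is to call evaluate(model, train_loader, criterion) once at the end of every epoch and print the two numbers; if both have stopped changing, the model has effectively converged for that architecture and learning rate.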
Dataset download: diabetes.csv.gz
Extraction code: 6666
Official PyTorch documentation: PyTorch Documentation
PyTorch Chinese handbook: PyTorch Handbook
Links to the "PyTorch Deep Learning Practice" series:
Lecture01 Overview
Lecture02 Linear_Model
Lecture03 Gradient_Descent
Lecture04 Back_Propagation
Lecture05 Linear_Regression_with_PyTorch
Lecture06 Logistic_Regression
Lecture07 Multiple_Dimension_Input
Lecture08 Dataset_and_Dataloader
Lecture09 Softmax_Classifier
Lecture10 Basic_CNN
Lecture11 Advanced_CNN
Lecture12 Basic_RNN
Lecture13 RNN_Classifier