Computer vision competitions: a Kaggle beginner computer vision competition (PyTorch)

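While browsing Kaggle for competitions recently, I came across Digit Recognizer, an introductory image competition. It is aimed at people who have some experience with R or Python and machine learning basics but are new to computer vision, and it is meant to help them get familiar with the field.

(Figure: competition page)

Below I build a fairly complete demo that works through this competition:

1. Data preparation

Three files can be downloaded from the Data tab of the competition page:

1. sample_submission.csv: an example of the result file to submit

2. test.csv: the test data

3. train.csv: the training data

(Figure: data download page)

Download the data. Assuming your current directory is /digit_recognizer, put the files under /digit_recognizer/data/.

2. Inspecting the data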

import pandas as pd

train_data = pd.read_csv('/digit_recognizer/data/train.csv')
test_data = pd.read_csv('/digit_recognizer/data/test.csv')

print(f'Training data shape: {train_data.shape}')
print(f'Test data shape: {test_data.shape}')

[out]:

Training data shape: (42000, 785)
Test data shape: (28000, 784)

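The test set has one fewer column than the training set, because column 0 of the training data holds the class label (0-9).

Each handwritten digit is really a 28*28 matrix; here that matrix has been flattened into a 784-dimensional vector.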
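As a quick sanity check on the layout described above, the following sketch reproduces the column arithmetic and the mapping between a pixel's position in the 28*28 grid and its position in the flattened vector. The helper names `flat_index` and `grid_position` are made up for this illustration, and the row-major order is an assumption that matches the `reshape(28, 28)` call used in this article.

```python
SIDE = 28  # each image is SIDE x SIDE pixels

def flat_index(row, col, side=SIDE):
    """Position of pixel (row, col) in the flattened, row-major vector."""
    return row * side + col

def grid_position(index, side=SIDE):
    """Inverse mapping: flattened position -> (row, col)."""
    return divmod(index, side)

n_pixels = SIDE * SIDE          # 784 pixel columns (test.csv)
n_train_columns = 1 + n_pixels  # one label column plus 784 pixel columns (train.csv)

print(n_pixels, n_train_columns)  # 784 785
print(flat_index(1, 0))           # 28
print(grid_position(783))         # (27, 27)
```

So the first 28 values of a row form the top line of the image, the next 28 the second line, and so on.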

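(Figure: training set data)

Let's take a look at one image:

import matplotlib.pyplot as plt

# Take the first row of the test set and restore it to a 28*28 image
one_img = test_data.iloc[0,:].values.reshape(28,28)
plt.imshow(one_img, cmap="Greys")

(Figure: a handwritten digit image)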

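3. Modeling

What follows is a fairly complete PyTorch modeling workflow. Because the task is simple, essentially no feature engineering is needed.

3.1 Import the required packages, all of them fairly standard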

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torch.optim.lr_scheduler import StepLR
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader

3.2 Loading the data

# Reshape transform
class ReshapeTransform:
    def __init__(self, new_size, minmax=None):
        self.new_size = new_size
        self.minmax = minmax

    def __call__(self, img):
        if self.minmax:
            img = img/self.minmax  # scale to 0-1 here, otherwise transforms.Normalize raises an error
        img = torch.from_numpy(img)
        return torch.reshape(img, self.new_size)

# Preprocessing pipeline
transform = transforms.Compose([
    ReshapeTransform((-1,28,28), 255),  # turn the 1-D vector into a 28*28 image and scale 0-255 down to 0-1
    transforms.Normalize((0.1307,), (0.3081,))  # standardize with mean and std; (0.1307,), (0.3081,) are empirical values, no need to dwell on them
])

# Dataset class, used together with DataLoader
class myDataset(Dataset):
    def __init__(self, path, transform=None, is_train=True, seed=777):
        """
        :param path: file path
        :param transform: data preprocessing
        :param is_train: True for the training split, False for the validation split
        :param seed: random seed, so both splits come from the same shuffle
        """
        self.data = pd.read_csv(path)  # read the data
        # The labelled data is usually divided into a training set and a validation set; the ratio here is 8:2
        if is_train:
            self.data, _ = train_test_split(self.data, train_size=0.8, random_state=seed)
        else:
            _, self.data = train_test_split(self.data, train_size=0.8, random_state=seed)
        self.transform = transform  # data transform
        self.is_train = is_train

    def __len__(self):
        # return the number of rows
        return len(self.data)

    def __getitem__(self, idx):
        # return one row by index: pixels (columns 1 onward) and label (column 0)
        data, lab = self.data.iloc[idx, 1:].values, self.data.iloc[idx, 0]
        if self.transform:
            data = self.transform(data)
        return data, lab

# Load the training set (paths follow the /digit_recognizer/data/ layout from section 1)
train_data = myDataset('/digit_recognizer/data/train.csv', transform, True)
train = DataLoader(train_data, batch_size=64, shuffle=True, num_workers=4)

# Load the validation set
vail_data = myDataset('/digit_recognizer/data/train.csv', transform, False)
vail = DataLoader(vail_data, batch_size=64, shuffle=True, num_workers=4)

# Load the test set
test_data = pd.read_csv('/digit_recognizer/data/test.csv')
test_data = transform(test_data.values)

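The data is now loaded. In general, it is recommended to wrap data with the Dataset and DataLoader classes together with transforms. The callables used in transforms can be user-defined; ReshapeTransform above shows the pattern.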
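To make the preprocessing concrete, here is a small plain-Python sketch of what the pipeline does to a single pixel value: divide by 255, then subtract the mean and divide by the standard deviation. The function name `preprocess_pixel` is hypothetical; the constants are the ones passed to transforms.Normalize in this article.

```python
MEAN, STD = 0.1307, 0.3081  # constants used with transforms.Normalize above

def preprocess_pixel(raw, minmax=255, mean=MEAN, std=STD):
    """Scale a raw 0-255 pixel to 0-1, then standardize it."""
    scaled = raw / minmax
    return (scaled - mean) / std

for raw in (0, 128, 255):
    print(raw, round(preprocess_pixel(raw), 4))
```

A raw value of 0 ends up at roughly -0.4242 and a raw value of 255 at roughly 2.8215, so the network sees inputs in about that range rather than 0-255.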

3.3 Defining the network

To keep things simple, we define a network with two convolutional layers and two fully connected layers.

# Weight initialization
def _weight_init(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
    elif isinstance(m, nn.BatchNorm1d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

# Build the network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=3)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)
        self.drop2d = nn.Dropout2d(p=0.2)
        self.linr1 = nn.Linear(20*5*5, 32)
        self.linr2 = nn.Linear(32, 10)
        self.apply(_weight_init)  # initialize the weights

    # Forward pass
    def forward(self, x):
        x = F.relu(self.drop2d(self.conv1(x)))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.drop2d(self.conv2(x)))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 20*5*5)  # the conv output size must be worked out before the fully connected layer; flatten the conv output
        x = self.linr1(x)
        # F.dropout defaults to training=True, so pass self.training to switch it off in eval mode
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.linr2(x)
        return x

net = Net()
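The 20*5*5 input size of the first fully connected layer follows from the layer settings above. As a sketch, the arithmetic can be checked with two small helpers (`conv_out` and `pool_out` are names invented here, not PyTorch functions), assuming no padding, stride 1 for the convolutions, and pooling with stride equal to the window size:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output side length of max pooling whose stride equals the window size."""
    return (size - kernel) // kernel + 1

side = 28
side = conv_out(side, kernel=3)  # conv1: 28 -> 26
side = pool_out(side, kernel=2)  # pool:  26 -> 13
side = conv_out(side, kernel=3)  # conv2: 13 -> 11
side = pool_out(side, kernel=2)  # pool:  11 -> 5

channels = 20                        # output channels of conv2
flat_features = channels * side * side
print(side, flat_features)           # 5 500
```

If the kernel size, padding, or number of pooling steps changes, the value passed to nn.Linear and x.view has to be recomputed the same way.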

Here is the network graph drawn with TensorBoard:

(Figure: graph of the net network)

3.4 Optimizer and loss function

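# Optimizer and loss function
optimizer = optim.Adam(net.parameters(), lr=0.0005)  # use Adam as the optimizer
criterion = nn.CrossEntropyLoss()  # the loss is CrossEntropyLoss; CrossEntropyLoss() = log_softmax() + NLLLoss()
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)  # StepLR: halve the learning rate every 10 scheduler steps (one step per epoch in the training loop, so every 10 epochs)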
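To see what this schedule amounts to over a long run, the decay can be written as a closed-form expression. This is only a sketch of the arithmetic, assuming the scheduler is stepped exactly once at the end of each epoch; `step_lr` is a made-up helper, not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate in effect during `epoch` (0-based), with one scheduler step per epoch."""
    return base_lr * gamma ** (epoch // step_size)

base_lr = 0.0005
for epoch in (0, 9, 10, 20, 99):
    print(epoch, step_lr(base_lr, epoch))
```

With 100 epochs the rate is halved nine times, so the last ten epochs run at about 1e-6, which is worth keeping in mind when choosing the number of epochs.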

3.5 Training the model

# Move to the GPU (optional); note that is_available must be called, the bare attribute is always truthy
device = 'cuda' if torch.cuda.is_available() else 'cpu'
if torch.cuda.is_available():
    net = net.to(device)
    criterion = criterion.to(device)

epochs = 100
loss_history = []

# Train the model
for epoch in range(epochs):
    train_loss = []
    val_loss = []
    with torch.set_grad_enabled(True):
        net.train()
        for batch, (data, target) in enumerate(train):
            data = data.to(device).float()
            target = target.to(device)
            optimizer.zero_grad()
            predict = net(data)
            loss = criterion(predict, target)
            loss.backward()
            optimizer.step()
            train_loss.append(loss.item())
        scheduler.step()  # one epoch done, advance the scheduler by one step
    with torch.set_grad_enabled(False):
        net.eval()  # the network contains dropout layers, so eval mode is required
        for batch, (data, target) in enumerate(vail):
            data = data.to(device).float()
            target = target.to(device)
            predict = net(data)
            loss = criterion(predict, target)
            val_loss.append(loss.item())
    loss_history.append([np.mean(train_loss), np.mean(val_loss)])
    print('epoch:%d train_loss: %.5f val_loss: %.5f' %(epoch+1, np.mean(train_loss), np.mean(val_loss)))

[out]:

epoch:1 train_loss: 0.96523 val_loss: 0.35177

epoch:2 train_loss: 0.37922 val_loss: 0.22583

epoch:3 train_loss: 0.28509 val_loss: 0.18644

epoch:4 train_loss: 0.24072 val_loss: 0.15961

epoch:5 train_loss: 0.20989 val_loss: 0.13630

epoch:6 train_loss: 0.19612 val_loss: 0.12432

epoch:7 train_loss: 0.17479 val_loss: 0.11251

epoch:8 train_loss: 0.16251 val_loss: 0.10917

epoch:9 train_loss: 0.15625 val_loss: 0.10470

.

.

.
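Each line of the log above is the mean of the per-batch losses for one epoch. As a rough sketch of how many batches go into those means, the sizes can be worked out from the numbers used in this article (42000 labelled rows, an 8:2 split, batch size 64); the variable names are only for illustration.

```python
import math

n_rows = 42000   # rows in train.csv
batch_size = 64

n_train = n_rows * 8 // 10  # 80% training split
n_val = n_rows - n_train    # remaining 20% validation split

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
last_val_batch = n_val - (val_batches - 1) * batch_size

print(n_train, n_val)              # 33600 8400
print(train_batches, val_batches)  # 525 132
print(last_val_batch)              # 16
```

The final validation batch is smaller than the others, so averaging per-batch losses weights its samples slightly more than the rest; for a loss curve like this one the difference is negligible.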

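Now that the model is trained, we use TensorBoard to look at the training and validation losses; blue is the training loss and red is the validation loss.

(Figure: loss curves)

3.6 Uploading the results to Kaggle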

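net.eval()
with torch.no_grad():  # no gradients are needed for inference
    label = net(test_data.to(device).float().unsqueeze(1))  # add the channel dimension: (N, 28, 28) -> (N, 1, 28, 28)
label = torch.argmax(label, dim=1)  # predicted class = index of the largest logit
submission = pd.read_csv('/digit_recognizer/data/sample_submission.csv')
submission.Label = label.cpu().numpy()
submission.to_csv('/digit_recognizer/version.csv', index=False)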
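The last steps, picking the class with the largest score and writing one row per test image, can be illustrated without the model. The sketch below uses made-up scores for three images and writes to an in-memory buffer; the `ImageId` header is an assumption about the layout of sample_submission.csv, while `Label` is the column used in the code above.

```python
import csv
import io

def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)

# Made-up scores for three images, ten classes each (not real model output)
logits = [
    [0.1, 2.3, -1.0, 0.0, 0.5, 0.2, -0.3, 0.9, 0.4, 0.0],
    [1.5, 0.2, 0.1, 0.0, -0.5, 0.3, 0.2, 3.1, 0.0, 0.4],
    [-0.2, 0.0, 0.3, 4.0, 0.1, 0.2, 0.0, 0.1, 0.5, 0.6],
]
labels = [argmax(row) for row in logits]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ImageId", "Label"])  # assumed header; check it against sample_submission.csv
for image_id, label in enumerate(labels, start=1):
    writer.writerow([image_id, label])

print(labels)             # [1, 7, 3]
print(buffer.getvalue())
```

Reading sample_submission.csv and overwriting its Label column, as the article does, avoids having to build the header and ID column by hand.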

(Figure: Kaggle upload page)

Finally, we can check our score.

(Figure: scoring result)
