Recently I needed to classify fundus photographs into left and right eyes. Most of the articles I found online cover the classic ants-vs-bees example, which is close enough to my problem, so I adapted one of those. VGG was too large to consider; I tried both resnet18 and inceptionV3, and both worked very well. Below I only show the inceptionV3 code.
Of the two transfer-learning approaches, I used "finetuning the convnet", i.e. not freezing the weights of the earlier layers. With those layers frozen, accuracy topped out at about 80%.
I had just under 1,000 fundus photos in total. That isn't much data, but it's enough for a simple task like this. I held out 100 images for val and threw everything else into train. The code follows and is mostly the standard recipe, except that I also use vertical flips and color jitter. One thing to watch out for: never flip horizontally here. The task is to tell left eyes from right eyes, so if you flip horizontally, expect your accuracy to sit at 50%.
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import datasets, models, transforms

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(342),
        transforms.CenterCrop(299),  # inception_v3 expects 299x299 inputs
        transforms.RandomVerticalFlip(),  # vertical flips are safe; horizontal flips would swap left and right
        transforms.ColorJitter(brightness=0.5, contrast=0.5, hue=0.5),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(342),
        transforms.CenterCrop(299),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
data_dir = './fundus_photos'  # dataset root; adjust to your own path
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x]) for x in ['train', 'val']}
# wrap your data and label into Tensor
# I use a batch size of 128; if you run short of GPU memory, drop it to 64 or 32
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x],
                                              batch_size=128,
                                              shuffle=True,
                                              num_workers=4) for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
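As a reminder, datasets.ImageFolder infers the class labels from sub-directory names, so data_dir needs one folder per class inside each split. The directory and file names below are only an illustration of the layout I mean:

data_dir/
    train/
        left/    img001.jpg ...
        right/   img002.jpg ...
    val/
        left/    ...
        right/   ...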
A quick gripe here: I ran into a lot of pitfalls in online blog posts, possibly due to differences between pytorch versions; the specifics come up below.
Set things up for the GPU, and remember to pass pretrained=True, otherwise the pretrained weights are never loaded. Note that param.requires_grad = False freezes the pretrained weights rather than allowing them to change; since I fine-tune the whole network, the freeze loop below stays commented out.
# get the model and replace the original fc layer with your own fc layer
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
model_ft = models.inception_v3(pretrained=True)
num_aux_ftrs = model_ft.AuxLogits.fc.in_features
num_ftrs = model_ft.fc.in_features
# Uncommenting this would freeze the pretrained backbone (feature extraction);
# I leave it out because fine-tuning the whole network worked much better:
# for param in model_ft.parameters():
#     param.requires_grad = False
Since we classify left vs right eyes, the number of outputs is set to 2. Curiously, with a learning rate of 0.001 the val accuracy reaches 1.0 in less than two epochs, while at 0.0001 it never gets past 0.99 no matter how many epochs I train.
model_ft.AuxLogits.fc = nn.Linear(num_aux_ftrs, 2)
model_ft.fc = nn.Linear(num_ftrs, 2)
# model_ft.aux_logits = False  # alternatively, disable the aux head altogether
model_ft = model_ft.to(device)  # .to() also works if we fell back to the CPU
# define loss function (CrossEntropyLoss holds no parameters, so it needs no .cuda())
criterion = nn.CrossEntropyLoss()
# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
model_ft = train_model(model=model_ft,
criterion=criterion,
optimizer=optimizer_ft,
scheduler=exp_lr_scheduler,
num_epochs=25)
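For comparison, the frozen feature-extraction variant that plateaued at 80% for me looks roughly like this (a sketch, not my exact code; the names model_fe, params_to_update and optimizer_fe are mine). The key differences are that the backbone gets requires_grad = False and only the two replacement fc layers are handed to the optimizer:

model_fe = models.inception_v3(pretrained=True)
for param in model_fe.parameters():
    param.requires_grad = False  # freeze the whole pretrained backbone
# layers created after the freeze default to requires_grad=True
model_fe.AuxLogits.fc = nn.Linear(model_fe.AuxLogits.fc.in_features, 2)
model_fe.fc = nn.Linear(model_fe.fc.in_features, 2)
model_fe = model_fe.to(device)
# pass only the trainable parameters; older pytorch versions refuse
# to optimize parameters that don't require gradients
params_to_update = [p for p in model_fe.parameters() if p.requires_grad]
optimizer_fe = optim.SGD(params_to_update, lr=0.001, momentum=0.9)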
Below is the model training; I've only pasted the key part. InceptionV3 has an aux_logits head that is returned during train but not during val, so when computing the loss you have to keep track of which phase your code is in.
# get the inputs
inputs, labels = data
inputs = inputs.to(device)  # Variable wrappers are no longer needed in modern pytorch
labels = labels.to(device)
# zero the parameter gradients
optimizer.zero_grad()
# forward
with torch.set_grad_enabled(phase == 'train'):
    if phase == 'train':
        # train mode returns (outputs, aux_outputs)
        outputs, aux_outputs = model(inputs)
        loss1 = criterion(outputs, labels)
        loss2 = criterion(aux_outputs, labels)
        loss = loss1 + 0.4 * loss2  # the aux loss is conventionally weighted by 0.4
    else:
        # eval mode returns only the main output
        outputs = model(inputs)
        loss = criterion(outputs, labels)
    _, preds = torch.max(outputs, 1)
    # backward + optimize only if in training phase
    if phase == 'train':
        loss.backward()
        optimizer.step()
# statistics
running_loss += loss.item() * inputs.size(0)
# cast to float32: without it, the later accuracy division can be silently
# floored to 0 by integer division -- one of the pitfalls mentioned earlier
running_corrects += torch.sum(preds == labels.data).to(torch.float32)
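For completeness, the train_model skeleton around that batch loop is essentially the one from the standard pytorch transfer-learning tutorial; a minimal sketch of how I wrap it (variable names follow the code above) would be:

import copy

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # train mode: aux head active, dropout on
            else:
                model.eval()   # eval mode: single output, no aux head
            running_loss = 0.0
            running_corrects = 0.0
            for data in dataloaders[phase]:
                inputs, labels = data
                inputs = inputs.to(device)
                labels = labels.to(device)
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == 'train'):
                    if phase == 'train':
                        outputs, aux_outputs = model(inputs)
                        loss = criterion(outputs, labels) + 0.4 * criterion(aux_outputs, labels)
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)
                    _, preds = torch.max(outputs, 1)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data).to(torch.float32)
            if phase == 'train':
                scheduler.step()  # StepLR: decay the lr once per epoch
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = float(running_corrects) / dataset_sizes[phase]
            print('{} loss: {:.4f} acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # keep a copy of the weights from the best epoch on val
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_model_wts)
    return model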
My model trained for 460 minutes, although it had already reached its best solution by about the 230-minute mark. I've since run it on real cases many times, and the accuracy has been very solid.
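For reference, a single-image test looks roughly like this (a sketch; the image path is a placeholder, and the class-to-index mapping comes from image_datasets['train'].classes):

from PIL import Image

model_ft.eval()  # eval mode: inception returns a single output, no aux head
img = Image.open('some_fundus_photo.jpg').convert('RGB')
batch = data_transforms['val'](img).unsqueeze(0).to(device)  # add a batch dimension
with torch.no_grad():
    outputs = model_ft(batch)
    _, pred = torch.max(outputs, 1)
print(image_datasets['train'].classes[pred.item()])  # e.g. 'left' or 'right'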
Later I also trained a model with resnet18, and frankly resnet is plenty for this task: it uses less GPU memory, and because it isn't as deep it's also faster than inception. Now I'm thinking of just building my own model; perhaps a simple CNN of a few layers would be completely sufficient, without resnet at all, but that will have to wait until I have time.
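For anyone who wants the resnet18 version, only the model setup changes; resnet has no aux head, so the loss is a single criterion term in both phases, and it expects 224x224 inputs instead of 299x299. A minimal sketch (model_rn is my name for it):

model_rn = models.resnet18(pretrained=True)
model_rn.fc = nn.Linear(model_rn.fc.in_features, 2)  # two classes: left / right
model_rn = model_rn.to(device)
# in the transforms, swap Resize(342)/CenterCrop(299) for Resize(256)/CenterCrop(224)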