Fine-tuning SlowFastNet (SlowFast)

SlowFastNet on GitHub (released recently):
https://github.com/facebookresearch/SlowFast

Environment setup requirements:
https://github.com/facebookresearch/SlowFast/blob/master/INSTALL.md
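Two of the packages listed there, PyAV and fvcore, are awkward to install.
The fvcore GitHub page recommends pip install 'git+https://github.com/facebookresearch/fvcore', but git was unusable on my machine because of an encryption system. I therefore had to download the source archive, unpack it, enter the folder, and build and install it with python setup.py install.
For PyAV, the recommended conda install av -c conda-forge crashed with a segmentation fault. How I resolved the segmentation fault is described in my other post:
https://blog.csdn.net/weixin_42388228/article/details/102882607
Alternatively, PyAV can also be downloaded and installed with python setup.py install. That route fails with errors which, according to the issue list on the PyAV GitHub, are caused by missing dependencies. For details see my other post:
https://blog.csdn.net/weixin_42388228/article/details/102817959
With that, the installation is done.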

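Weight files:
https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md
I downloaded SLOWFAST_8x8_R50, the third entry from the end of the Kinetics list. The last two entries had not been released yet at the time of writing.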

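Pitfall 1:
When using the yaml file that matches the weight file, the config is built from …/SlowFast-master/configs/Kinetics/SLOWFAST_8x8_R50.yaml, but that file must also be edited by comparing it with …/SlowFast-master/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml. (There is one change to make: kernel_size.)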

Pitfall 2:
The input to SlowFast is a list. The first element has shape [batch_size,3,8,224,224] and the second element has shape [batch_size,3,32,224,224].
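To make the two-element input concrete, here is a minimal sketch that derives both pathway tensors from a single 32-frame clip by evenly spaced temporal subsampling. The function name pack_pathways and the ratio alpha=4 (32 / 8) are assumptions made for illustration; the script in this post instead prepares the two inputs with two separate data4file calls.

```python
import numpy as np


def pack_pathways(clip, alpha=4):
    """Split a clip of shape [batch, 3, T, H, W] into [slow, fast] inputs.

    `alpha` is the assumed frame-rate ratio between the fast and slow pathway.
    """
    num_frames = clip.shape[2]
    # Evenly spaced frame indices; the slow pathway keeps T // alpha frames.
    slow_idx = np.linspace(0, num_frames - 1, num_frames // alpha).astype(np.int64)
    slow = clip[:, :, slow_idx]
    fast = clip  # the fast pathway sees every frame
    return [slow, fast]


# Dummy clip: batch of 1, 3 channels, 32 frames, 224x224 pixels.
clip = np.zeros((1, 3, 32, 224, 224), dtype=np.float32)
slow, fast = pack_pathways(clip)
print(slow.shape, fast.shape)
```

The order of the list matters: the 8-frame tensor comes first and the 32-frame tensor second, matching the shapes listed above.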

Pitfall 3:
In my implementation I changed the number of GPUs in the yaml file to 1.
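Pitfalls 1 and 3 both come down to changing values in the yaml file. As an alternative to editing the file by hand, the loaded dictionary could be patched in code before it is copied into the config. The sketch below is one way to do that; merge_overrides is a hypothetical helper, and the dictionary contents are made-up stand-ins for what a yaml loader would return (only NUM_GPUS is taken from this post).

```python
import copy


def merge_overrides(base, overrides):
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Plain value (or new key): the override wins.
            merged[key] = value
    return merged


# Hypothetical stand-in for the dict loaded from the yaml file.
loaded = {"NUM_GPUS": 8, "TRAIN": {"BATCH_SIZE": 64, "DATASET": "kinetics"}}
patched = merge_overrides(loaded, {"NUM_GPUS": 1, "TRAIN": {"BATCH_SIZE": 1}})
print(patched)
```

Because the helper works on a deep copy, the dictionary that was loaded from disk stays unchanged, which makes it easy to compare the two.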

import sys
sys.path.append('.../SlowFast-master/slowfast/config/')
sys.path.append('.../SlowFast-master/slowfast/models/')
sys.path.append('.../SlowFast-master/slowfast/utils/')
import slowfast.models.optimizer as optim
import slowfast.utils.checkpoint as cu
from defaults import _C
from model_builder import _MODEL_TYPES
from slowfast.models import model_builder
from slowfast.utils.c2_model_loading import get_name_convert_func
import torch
import torch.nn as nn
import os
import cv2
import numpy as np
import pickle
import yaml
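# Select the GPU to train on (index 7 on the author's machine).
torch.cuda.set_device(7)
###########################################         data preparation         ###########################################
# data4file is the author's own data-loading helper (not shown in this post).
# data1 holds the 32-frame clips (fast pathway), data2 the 8-frame clips (slow pathway).
data1,label=data4file(batch_size=32,stride=70)
data2,_=data4file(batch_size=8,stride=70)
data1=torch.from_numpy(data1).float()
data2=torch.from_numpy(data2).float()
label=torch.from_numpy(label).long()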
###########################################     customized config file       ###########################################
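# Load the yaml file and copy its values (up to two levels deep) into the default config _C.
with open('.../SlowFast-master/configs/Kinetics/SLOWFAST_8x8_R50.yaml') as f1:
    d1=yaml.safe_load(f1)
for i in d1.keys():
    if not isinstance(d1[i],dict):
        _C[i]=d1[i]
    else:
        for j in d1[i].keys():
            _C[i][j]=d1[i][j]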
################################################     model finetune     ################################################
model=model_builder.build_model(_C)
print('Model built.')
# print(*list(model.children())[-1:])
optimizer = optim.construct_optimizer(model, _C)
cu.load_checkpoint('.../SlowFast-master/SLOWFAST_8x8_R50.pkl', model, data_parallel=False, optimizer=optimizer, inflation=False, convert_from_caffe2=True,)
print('Model loaded.')
num_pairs=len(data1)
criterion = nn.CrossEntropyLoss(reduction="mean")
for epoch in range(10):
    # Visit the samples in a new random order every epoch.
    indices = list(range(num_pairs))
    np.random.shuffle(indices)
    for j in range(num_pairs):
        idx = indices[j]
        # Input list: [slow pathway (8 frames), fast pathway (32 frames)], batch size 1.
        images = [data2[idx].reshape(1,3,8,224,224).cuda(non_blocking=True),
                  data1[idx].reshape(1,3,32,224,224).cuda(non_blocking=True)]
        labels = label[idx].reshape(1).cuda()

        # Forward pass
        preds = model(images)
        loss = criterion(preds, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print('success')
    # Save the fine-tuned weights once the first epoch has finished.
    if epoch==0:
        torch.save(model.state_dict(),'.../slowfast_weight.pkl')
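The script above stores only model.state_dict(). As a rough sketch of how such a file could be read back later, the example below saves and restores the weights of a tiny stand-in network. The nn.Linear model and the temporary file path are assumptions made so the example runs on its own; with SlowFast the restored object would be the model returned by build_model.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the fine-tuned network.
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "weights.pkl")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved parameters into it.
restored = nn.Linear(4, 2)
state = torch.load(path, map_location="cpu")
restored.load_state_dict(state)
restored.eval()  # switch to inference mode before evaluating
```

If the weights were saved from a model wrapped in DistributedDataParallel, the keys in the state dict carry a module. prefix, so they need to be loaded into a wrapped model as well or have the prefix stripped first.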
        

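----------------------------------------------------------Update 2019.11.14----------------------------------------------------------------
This update is mainly about training SlowFast on multiple GPUs. The model_builder.py file in SlowFast-master has problems when used with multiple GPUs because the authors did not finish that part, which leads to the issue described in this post of mine:
https://blog.csdn.net/weixin_42388228/article/details/103067973
The change is in the build_model function of model_builder.py, as shown below (this is my own modification, and it may be rather simplistic):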

"""Model construction functions."""

import torch

from slowfast.models.video_model_builder import ResNetModel, SlowFastModel
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '1,2'
device = torch.device('cuda:0')
_MODEL_TYPES = {
    "slowfast": SlowFastModel,
    "slowonly": ResNetModel,
    "c2d": ResNetModel,
    "i3d": ResNetModel,
}

def build_model(cfg):
    assert (
        cfg.MODEL.ARCH in _MODEL_TYPES.keys()
    ), "Model type '{}' not supported".format(cfg.MODEL.ARCH)
    assert (
        cfg.NUM_GPUS <= torch.cuda.device_count()
    ), "Cannot use more GPU devices than available"
    model = _MODEL_TYPES[cfg.MODEL.ARCH](cfg)
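    if cfg.NUM_GPUS > 1:
        # Single process (world_size=1, rank=0); the process group is set up through a shared file.
        torch.distributed.init_process_group('nccl',init_method='file:///home/.../my_file',world_size=1,rank=0)
        # Move the model to the first visible GPU and wrap it for distributed training.
        model = torch.nn.parallel.DistributedDataParallel(module=model.to(device),find_unused_parameters=True)
    return model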

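The changes are substantial, so it is best to keep a copy of the original model_builder.py for other cases, such as basic single-GPU training or multi-node, multi-GPU distributed training.
For the meaning of the DistributedDataParallel arguments, see: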
https://github.com/pytorch/examples/tree/master/imagenet
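One detail of the modified model_builder.py that is easy to misread: it sets CUDA_VISIBLE_DEVICES to '1,2' and then uses the device cuda:0. CUDA renumbers the visible devices from zero, so cuda:0 refers to physical GPU 1 in that setup. The small sketch below illustrates the mapping; physical_gpu is a hypothetical helper, and it assumes the variable holds plain numeric ids rather than GPU UUIDs.

```python
import os


def physical_gpu(logical_index, env=None):
    """Map a logical CUDA index (as in 'cuda:0') to the physical GPU id."""
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # No restriction set: logical and physical indices coincide.
        return logical_index
    ids = [s.strip() for s in visible.split(",") if s.strip()]
    return int(ids[logical_index])


print(physical_gpu(0, {"CUDA_VISIBLE_DEVICES": "1,2"}))  # prints 1
```

The environment is passed in as a plain dictionary here so the example does not change the real process environment.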
