SlowFast GitHub repository (released recently):
https://github.com/facebookresearch/SlowFast
Environment requirements:
https://github.com/facebookresearch/SlowFast/blob/master/INSTALL.md
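Two of the required packages, PyAV and fvcore, are awkward to install.
fvcore: its GitHub page recommends pip install 'git+https://github.com/facebookresearch/fvcore', but git does not work on my machine because of an encryption system. The only option was to download the source, unpack it, enter the folder, and build and install it with python setup.py install.
PyAV: the recommended conda install av -c conda-forge ended in a segmentation fault; how I resolved it is described in my other post https://blog.csdn.net/weixin_42388228/article/details/102882607. PyAV can also be downloaded and installed with python setup.py install, but that route raises errors. According to the issue list on the PyAV GitHub page, the cause is missing dependencies; see my other post https://blog.csdn.net/weixin_42388228/article/details/102817959.
With that, the installation is done.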
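A quick way to confirm that both packages ended up importable is to look them up without actually importing them. The snippet below is only a minimal sketch: `check_packages` is a hypothetical helper name, and it assumes PyAV is imported as `av` and fvcore as `fvcore`.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool} telling whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumption: PyAV installs the module "av", fvcore installs "fvcore".
for name, found in check_packages(["av", "fvcore"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If either line reports MISSING, the corresponding source install did not land in the Python environment that is currently active.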
Pretrained weights:
https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md
I downloaded SLOWFAST_8x8_R50, the third-from-last entry in the Kinetics table; the last two entries are not available yet.
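Pitfall 1:
When using the YAML file that matches the weights, the config is built from …/SlowFast-master/configs/Kinetics/SLOWFAST_8x8_R50.yaml, but that file also has to be edited to agree with …/SlowFast-master/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml. (There is one change: kernel_size.)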
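One way to find which entries differ between the two YAML files is to load both into dictionaries and compare them leaf by leaf. The sketch below assumes the files have already been parsed (for example with yaml.safe_load); `diff_configs` is a hypothetical helper, and the keys and values in the toy dictionaries are made up for illustration, not taken from the repository.

```python
def diff_configs(a, b, prefix=""):
    """Yield (dotted_key, value_in_a, value_in_b) for every leaf that differs."""
    absent = "<absent>"  # marker for a key that exists on one side only
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, absent), b.get(key, absent)
        path = f"{prefix}{key}"
        if isinstance(va, dict) and isinstance(vb, dict):
            # Both sides are sections: compare their contents recursively.
            yield from diff_configs(va, vb, prefix=path + ".")
        elif va != vb:
            yield path, va, vb


# Toy stand-ins for the two parsed YAML files (illustrative values only).
main_cfg = {"SECTION": {"KERNEL": 1, "DEPTH": 50}, "NUM_GPUS": 8}
c2_cfg = {"SECTION": {"KERNEL": 2, "DEPTH": 50}, "NUM_GPUS": 8}

for key, left, right in diff_configs(main_cfg, c2_cfg):
    print(f"{key}: {left} -> {right}")
```

Run on the real files, every printed key is a candidate for the edit described above.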
Pitfall 2:
The input to SlowFast is a list: the first element has shape [batch_size,3,8,224,224] and the second has shape [batch_size,3,32,224,224].
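The two shapes differ only in the number of frames: 8 for the first element and 32 for the second, a ratio of 4. The sketch below works out both shapes under the assumption that the 8-frame clip is a uniform subsample of the 32-frame clip; the repository may pick frames differently, and the function names and the `alpha` parameter are hypothetical.

```python
def slow_frame_indices(num_fast_frames=32, alpha=4):
    """Indices of the frames kept for the 8-frame input (every alpha-th frame)."""
    return list(range(0, num_fast_frames, alpha))


def pathway_shapes(batch_size, num_fast_frames=32, alpha=4, crop=224):
    """Return [shape of first list element, shape of second list element]."""
    slow_t = len(slow_frame_indices(num_fast_frames, alpha))
    slow = (batch_size, 3, slow_t, crop, crop)
    fast = (batch_size, 3, num_fast_frames, crop, crop)
    return [slow, fast]


print(slow_frame_indices())           # [0, 4, 8, 12, 16, 20, 24, 28]
print(pathway_shapes(batch_size=1))   # [(1, 3, 8, 224, 224), (1, 3, 32, 224, 224)]
```

With a tensor of shape [batch_size,3,32,224,224], indexing the third dimension with these indices would give the matching 8-frame tensor.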
Pitfall 3:
In my implementation I set the number of GPUs to 1 in the YAML file.
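The same kind of change can be applied in code instead of editing the YAML file by hand. The sketch below merges a dictionary of overrides into a dictionary of defaults at any nesting depth; `merge_config` is a hypothetical helper and the values are illustrative, not the repository's defaults. If `_C` is a yacs-style config node, its own merge_from_file method may well be the simpler route, but that is an assumption I have not checked against this repository.

```python
import copy


def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge their entries one by one.
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value (or new key): the override wins.
            merged[key] = value
    return merged


# Illustrative values only.
defaults = {"NUM_GPUS": 8, "TRAIN": {"BATCH_SIZE": 64, "ENABLE": True}}
overrides = {"NUM_GPUS": 1, "TRAIN": {"BATCH_SIZE": 8}}
print(merge_config(defaults, overrides))
```

Because the function works on a copy, the defaults stay untouched and can be reused for other runs.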
import sys
# Make the repository's config, models and utils folders importable by bare module name.
sys.path.append('.../SlowFast-master/slowfast/config/')
sys.path.append('.../SlowFast-master/slowfast/models/')
sys.path.append('.../SlowFast-master/slowfast/utils/')
import slowfast.models.optimizer as optim
import slowfast.utils.checkpoint as cu
from defaults import _C
from model_builder import _MODEL_TYPES
from slowfast.models import model_builder
from slowfast.utils.c2_model_loading import get_name_convert_func
import torch
import torch.nn as nn
import os
import cv2
import numpy as np
import pickle
import yaml
# Index of the GPU to run on; change it to a device that exists on your machine.
torch.cuda.set_device(7)
########################################### data preparation ###########################################
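# data4file is my own loading function (not shown here); it returns NumPy arrays.
# data1 holds the 32-frame clips and data2 the 8-frame clips.
data1, label = data4file(batch_size=32, stride=70)
data2, _ = data4file(batch_size=8, stride=70)
data1 = torch.from_numpy(data1).float()
data2 = torch.from_numpy(data2).float()
label = torch.from_numpy(label).long()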
########################################### customized config file ###########################################
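# Copy every entry of the YAML file into the default config _C (two levels deep).
with open('.../SlowFast-master/configs/Kinetics/SLOWFAST_8x8_R50.yaml') as f1:
    d1 = yaml.safe_load(f1)
for i in d1.keys():
    if not isinstance(d1[i], dict):
        _C[i] = d1[i]
    else:
        for j in d1[i].keys():
            _C[i][j] = d1[i][j]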
################################################ model finetune ################################################
model = model_builder.build_model(_C)
print('Model built.')
# print(*list(model.children())[-1:])
optimizer = optim.construct_optimizer(model, _C)
cu.load_checkpoint('.../SlowFast-master/SLOWFAST_8x8_R50.pkl', model, data_parallel=False, optimizer=optimizer, inflation=False, convert_from_caffe2=True,)
print('Model loaded.')
num_pairs = len(data1)
criterion = nn.CrossEntropyLoss(reduction="mean")
for epoch in range(10):
    # Visit the clips in a new random order every epoch.
    indices = list(range(num_pairs))
    np.random.shuffle(indices)
    for j in range(num_pairs):
        # One clip per step: [8-frame input, 32-frame input].
        images = [data2[indices[j]].reshape(1,3,8,224,224).cuda(non_blocking=True),
                  data1[indices[j]].reshape(1,3,32,224,224).cuda(non_blocking=True)]
        labels = label[indices[j]].reshape(1).cuda()
        # Forward pass
        preds = model(images)
        loss = criterion(preds, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('success')
    # Save the weights once, after the first epoch (the epoch counter is assumed to be the intended variable).
    if epoch == 0:
        torch.save(model.state_dict(), '.../slowfast_weight.pkl')
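The loop above feeds a single clip per step. If GPU memory allows, the shuffled indices could instead be grouped into mini-batches. The generator below is only a sketch of that grouping in plain Python; `minibatch_indices` and its `seed` parameter are hypothetical names, and the reshape calls in the loop would need the batch dimension adjusted accordingly.

```python
import random


def minibatch_indices(num_samples, batch_size, seed=None):
    """Yield lists of shuffled sample indices, batch_size at a time."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        # The last batch may be shorter when batch_size does not divide num_samples.
        yield order[start:start + batch_size]


for batch in minibatch_indices(num_samples=10, batch_size=4, seed=0):
    print(len(batch), sorted(batch))
```

Each yielded list can be used to index the tensors directly, for example data2[batch] and data1[batch], since PyTorch tensors accept a list of indices along the first dimension.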
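----------------------------------------------------------2019.11.14 update----------------------------------------------------------------
This update is about training SlowFast on multiple GPUs. The model_builder.py file in SlowFast-master has a problem when used with multiple GPUs; the authors did not finish that part, which causes the problem described in this post of mine:
https://blog.csdn.net/weixin_42388228/article/details/103067973
The change is in the build_model function of model_builder.py, as shown here (my own modification, probably a rather simple one):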
"""Model construction functions."""
import torch
from slowfast.models.video_model_builder import ResNetModel, SlowFastModel
import os
# Expose only physical GPUs 1 and 2; inside this process they become cuda:0 and cuda:1.
os.environ["CUDA_VISIBLE_DEVICES"] = '1,2'
device = torch.device('cuda:0')
_MODEL_TYPES = {
    "slowfast": SlowFastModel,
    "slowonly": ResNetModel,
    "c2d": ResNetModel,
    "i3d": ResNetModel,
}
def build_model(cfg):
    assert (
        cfg.MODEL.ARCH in _MODEL_TYPES.keys()
    ), "Model type '{}' not supported".format(cfg.MODEL.ARCH)
    assert (
        cfg.NUM_GPUS <= torch.cuda.device_count()
    ), "Cannot use more GPU devices than available"
    # Construct the model
    model = _MODEL_TYPES[cfg.MODEL.ARCH](cfg)
    if cfg.NUM_GPUS > 1:
        # Single process (world_size=1, rank=0) with a file-based rendezvous.
        torch.distributed.init_process_group('nccl', init_method='file:///home/.../my_file', world_size=1, rank=0)
        model = torch.nn.parallel.DistributedDataParallel(module=model.to(device), find_unused_parameters=True)
    return model
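I changed quite a lot, so it is best to keep a copy of the original model_builder.py for other cases, such as basic single-GPU training or multi-machine multi-GPU distributed training.
For the meaning of the DistributedDataParallel arguments, see: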
https://github.com/pytorch/examples/tree/master/imagenet
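The init_process_group call above uses a file:// rendezvous path. As far as I understand the PyTorch documentation, reusing a leftover file from an earlier run can cause trouble, so one option is to generate a fresh path for every run. The helper below is a sketch under that assumption; `fresh_init_method` is a hypothetical name and the temporary directory is just a default choice.

```python
import tempfile
import uuid
from pathlib import Path


def fresh_init_method(directory=None):
    """Return (file:// URI, Path) for a rendezvous file that does not exist yet."""
    base = Path(directory) if directory else Path(tempfile.gettempdir())
    base.mkdir(parents=True, exist_ok=True)
    # A random suffix keeps the name unique across runs.
    path = base.resolve() / f"ddp_init_{uuid.uuid4().hex}"
    return path.as_uri(), path


uri, path = fresh_init_method()
print(uri)
# Possible use inside build_model (not executed here):
# torch.distributed.init_process_group('nccl', init_method=uri, world_size=1, rank=0)
```

The returned Path can be used to delete the file once training has finished.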