StNet is the base network framework behind the winning entry of the ActivityNet Kinetics Challenge 2018. The open-sourced implementation here is StNet built on ResNet50; variants based on other backbones can be configured in the same way. The model introduces the concept of a "super-image": 2D convolutions are applied on super-images to model local spatial-temporal correlations in the video. Global spatial-temporal dependencies are then modeled with a temporal modeling block, and finally a temporal Xception block performs long-range temporal modeling on the extracted feature sequence. The overall StNet architecture is shown in the figure below:
For details, please refer to the AAAI 2019 paper StNet: Local and Global Spatial-Temporal Modeling for Human Action Recognition.
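To make the "super-image" idea concrete, here is a minimal, shape-only sketch in NumPy (the seg_num and seglen values are the ones used in this post's config; the real network is built in stnet.py):

import numpy as np

# Hypothetical input: N clips, seg_num=7 segments per clip, seglen=5 frames per
# segment, 3-channel 224x224 frames (values match the config used later in this post).
N, seg_num, seglen, C, H, W = 2, 7, 5, 3, 224, 224
video = np.zeros((N, seg_num, seglen, C, H, W), dtype="float32")

# Form "super-images": stack the seglen frames of a segment along the channel axis
# and fold the segment axis into the batch axis, so an ordinary 2D CNN (ResNet50
# with a widened first conv, cf. the conv1_weights log message below) can process them.
super_images = video.reshape(N * seg_num, seglen * C, H, W)
print(super_images.shape)  # (14, 15, 224, 224)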
StNet is trained on the Kinetics-400 action recognition dataset released by DeepMind. For data download and preparation, see the data preparation notes.
Once the data is ready, training can be launched in either of the following two ways:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
--config=./configs/stnet.yaml \
--log_interval=10 \
--valid_interval=1 \
--use_gpu=True \
--save_dir=./data/checkpoints \
--fix_random_seed=False \
--pretrain=$PATH_TO_PRETRAIN_MODEL
bash run.sh train STNET ./configs/stnet.yaml
To train from scratch, ResNet50 weights pretrained on ImageNet are needed as initialization. Download those model parameters and extract them, then set the pretrain argument in the command line above (or in run.sh) to the directory containing the extracted parameters. If you do not download them manually and set pretrain, the program downloads them automatically and saves them under ~/.paddle/weights/ResNet50_pretrained.
You can also download the released model and pass its weight path via --resume for fine-tuning and further development.
**Data reader notes:** the model reads mp4 data from the Kinetics-400 dataset. From each sample it extracts seg_num segments, and seg_len frames from each segment; every frame is randomly augmented and then resized to target_size.
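To make the seg_num/seg_len sampling concrete, here is a hedged sketch of the idea (not the repository's kinetics_reader, just an approximation): split the frames of a clip into seg_num chunks and take seg_len consecutive frames from a random offset inside each chunk.

import random

def sample_frame_indices(num_frames, seg_num=7, seg_len=5):
    # Approximate segment-based sampling: one run of seg_len consecutive frames
    # is drawn from a random position inside each of the seg_num equal chunks.
    indices = []
    chunk = num_frames // seg_num
    for i in range(seg_num):
        start = i * chunk + random.randint(0, max(chunk - seg_len, 0))
        indices.extend(min(start + j, num_frames - 1) for j in range(seg_len))
    return indices

print(sample_frame_indices(300))  # 7 * 5 = 35 frame indices from a 300-frame clip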
Note: the repository contains many other models; we only use the code under models/PaddleCV/video.
git clone https://github.com/PaddlePaddle/models.git
cd models/PaddleCV/video
I downloaded the released model for fine-tuning (its path can be passed via --resume): https://paddlemodels.bj.bcebos.com/video_classification/STNET.pdparams, and placed it under ~/.paddle/weights/.
Kinetics dataset download: the dataset can be prepared following the instructions at https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/video/data/dataset/README.md#Kinetics数据集
The data needs to be converted to pkl format:
# First, generate the dataset label file needed for preprocessing
python generate_label.py kinetics-400_train.csv kinetics400_label.txt
# Then run the following:
python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8  # e.g. using 8 processes
# For the train data,
Source_dir = $Code_Root/data/dataset/kinetics/data_k400/train_mp4
Target_dir = $Code_Root/data/dataset/kinetics/data_k400/train_pkl
# For the val data,
Source_dir = $Code_Root/data/dataset/kinetics/data_k400/val_mp4
Target_dir = $Code_Root/data/dataset/kinetics/data_k400/val_pkl
# This decodes the mp4 files and saves them as pkl files.
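For reference, kinetics400_label.txt simply maps each category name to an integer index; this is the file that video2pkl.py later parses back into a dict. The repository's generate_label.py is not reproduced here; the following is only a hedged approximation of what it produces:

import sys

def make_label_file(csv_path, out_path):
    # Collect the distinct category names from the csv (skipping the header row;
    # columns are label,youtube_id,time_start,time_end,split,is_cc) and write
    # one "category index" line per class.
    with open(csv_path) as f:
        rows = [line.strip().split(',') for line in f.readlines()[1:]]
    categories = sorted({row[0].strip('"') for row in rows})
    with open(out_path, 'w') as f:
        for idx, name in enumerate(categories):
            f.write("%s %d\n" % (name, idx))

if __name__ == "__main__":
    # e.g. python make_label_file.py kinetics-400_train.csv kinetics400_label.txt
    make_label_file(sys.argv[1], sys.argv[2])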
Generate the train and validation lists:
cd $Code_Root/data/dataset/kinetics
ls $Code_Root/data/dataset/kinetics/data_k400/train_pkl/* > train.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > val.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > test.list
ls $Code_Root/data/dataset/kinetics/data_k400/val_pkl/* > infer.list
# This generates the corresponding file lists. Each line of train.list and val.list is the absolute path of one pkl file, for example:
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/train_pkl/data_batch_100-097
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/train_pkl/data_batch_100-114
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/train_pkl/data_batch_100-118
# or
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/val_pkl/data_batch_102-085
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/val_pkl/data_batch_102-086
/ssd1/user/models/PaddleCV/PaddleVideo/data/dataset/kinetics/data_k400/val_pkl/data_batch_102-090
Preparation:
1. My GPU is an RTX 2070 with 8 GB of memory, so configs/stnet.yaml needs to be changed to:
TRAIN: batch_size: 64
2. Since I train on a single GPU, configs/stnet.yaml also needs:
TRAIN: num_gpus: 1
3. When I ran train.py, one line raised an error; line 170 needs to be changed to the following (see the sketch below):
assert os.path.exists(args.pretrain + ".pdparams"), \
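For context, this is a one-line change. A hedged before/after sketch (the original assertion and the message text are assumed from my copy of train.py and may differ in other releases):

# before (assumed): checks the path without a suffix, which fails because the
# released weights are a single STNET.pdparams file rather than a directory
# assert os.path.exists(args.pretrain), "pretrain weights not found: " + args.pretrain

# after: append the ".pdparams" suffix before checking
assert os.path.exists(args.pretrain + ".pdparams"), \
    "pretrain weights not found: " + args.pretrain  # message text is illustrative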
Training:
Training can be launched in either of the following two ways:
# Single-GPU training:
export CUDA_VISIBLE_DEVICES=0
# Multi-GPU training:
# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
--config=./configs/stnet.yaml \
--log_interval=10 \
--valid_interval=1 \
--use_gpu=True \
--save_dir=./data/checkpoints \
--fix_random_seed=False \
--pretrain=/home/dell/.paddle/weights/STNET
bash run.sh train STNET ./configs/stnet.yaml
Since I trained on a Kinetics pkl dataset that I got from a friend, the amount of data was small and training finished quickly. Proper testing is harder to set up, so I have skipped it for now; at least training works. If you want to train on your own data, I will look into that later.
/home/dell/miniconda3/bin/python3.7 /home/dell/PycharmProjects/stnet_train_paddle/train.py --model_name=STNET --config=./configs/stnet.yaml --log_interval=10 --valid_interval=1 --use_gpu=True --save_dir=./data/checkpoints --fix_random_seed=False --pretrain=/home/dell/.paddle/weights/STNET
DALI is not installed, you can improve performance if use DALI
[INFO: train.py: 254]: Namespace(batch_size=None, config='./configs/stnet.yaml', epoch=None, fix_random_seed=False, is_profiler=0, learning_rate=None, log_interval=10, model_name='STNET', no_memory_optimize=False, pretrain='/home/dell/.paddle/weights/STNET', profiler_path='./', resume=None, save_dir='./data/checkpoints', use_gpu=True, valid_interval=1)
[INFO: config_utils.py: 70]: ---------------- Train Arguments ----------------
[INFO: config_utils.py: 72]: MODEL:
[INFO: config_utils.py: 74]: name:STNET
[INFO: config_utils.py: 74]: format:pkl
[INFO: config_utils.py: 74]: num_classes:400
[INFO: config_utils.py: 74]: seg_num:7
[INFO: config_utils.py: 74]: seglen:5
[INFO: config_utils.py: 74]: image_mean:[0.485, 0.456, 0.406]
[INFO: config_utils.py: 74]: image_std:[0.229, 0.224, 0.225]
[INFO: config_utils.py: 74]: num_layers:50
[INFO: config_utils.py: 74]: topk:5
[INFO: config_utils.py: 72]: TRAIN:
[INFO: config_utils.py: 74]: epoch:60
[INFO: config_utils.py: 74]: short_size:256
[INFO: config_utils.py: 74]: target_size:224
[INFO: config_utils.py: 74]: num_reader_threads:12
[INFO: config_utils.py: 74]: buf_size:1024
[INFO: config_utils.py: 74]: batch_size:64
[INFO: config_utils.py: 74]: num_gpus:8
[INFO: config_utils.py: 74]: use_gpu:True
[INFO: config_utils.py: 74]: filelist:./data/dataset/kinetics/train.list
[INFO: config_utils.py: 74]: learning_rate:0.01
[INFO: config_utils.py: 74]: learning_rate_decay:0.1
[INFO: config_utils.py: 74]: l2_weight_decay:0.0001
[INFO: config_utils.py: 74]: momentum:0.9
[INFO: config_utils.py: 74]: total_videos:224684
[INFO: config_utils.py: 74]: pretrain_base:./data/dataset/pretrained/ResNet50_pretrained
[INFO: config_utils.py: 72]: VALID:
[INFO: config_utils.py: 74]: short_size:256
[INFO: config_utils.py: 74]: target_size:224
[INFO: config_utils.py: 74]: num_reader_threads:12
[INFO: config_utils.py: 74]: buf_size:1024
[INFO: config_utils.py: 74]: batch_size:128
[INFO: config_utils.py: 74]: filelist:./data/dataset/kinetics/val.list
[INFO: config_utils.py: 72]: TEST:
[INFO: config_utils.py: 74]: seg_num:25
[INFO: config_utils.py: 74]: short_size:256
[INFO: config_utils.py: 74]: target_size:256
[INFO: config_utils.py: 74]: num_reader_threads:12
[INFO: config_utils.py: 74]: buf_size:1024
[INFO: config_utils.py: 74]: batch_size:4
[INFO: config_utils.py: 74]: filelist:./data/dataset/kinetics/test.list
[INFO: config_utils.py: 72]: INFER:
[INFO: config_utils.py: 74]: seg_num:25
[INFO: config_utils.py: 74]: short_size:256
[INFO: config_utils.py: 74]: target_size:256
[INFO: config_utils.py: 74]: num_reader_threads:12
[INFO: config_utils.py: 74]: buf_size:1024
[INFO: config_utils.py: 74]: batch_size:1
[INFO: config_utils.py: 74]: filelist:./data/dataset/kinetics/infer.list
[INFO: config_utils.py: 74]: video_path:
[INFO: config_utils.py: 74]: kinetics_labels:./data/dataset/kinetics_labels.json
[INFO: config_utils.py: 75]: -------------------------------------------------
W0520 19:19:57.004699 29621 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.1, Runtime API Version: 10.0
W0520 19:19:57.007618 29621 device_context.cc:244] device: 0, cuDNN Version: 7.5.
W0520 19:19:57.007634 29621 device_context.cc:270] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
[INFO: stnet.py: 163]: Load pretrain weights from /home/dell/.paddle/weights/STNET, exclude fc, batch_norm, xception, conv3d layers.
[INFO: stnet.py: 173]: Delete conv3d_0.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete conv3d_0.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_24.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_24.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_24.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_24.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete conv3d_1.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete conv3d_1.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_44.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_44.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_44.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete batch_norm_44.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att_conv.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att_conv.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att_2.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att_2.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bndw.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bndw.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bndw.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bndw.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att1.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att1.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att1_2.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_att1_2.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_dw.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_dw.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn2.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn2.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn2.w_1 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete xception_bn2.w_2 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete fc_0.w_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 173]: Delete fc_0.b_0 from pretrained parameters. Do not load it
[INFO: stnet.py: 179]: conv1_weights is transformed from [Cout, 3, Kh, Kw] into [Cout, 3*seglen, Kh, Kw]
[INFO: accuracy_metrics.py: 34]: Resetting train metrics...
[INFO: accuracy_metrics.py: 34]: Resetting valid metrics...
[INFO: train_utils.py: 46]: ------- learning rate [0.], learning rate counter [-1] -----
reader shuffle seed 0
[INFO: kinetics_reader.py: 249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py: 253]: read images from 0, length: 756, lines length: 756, total: 756
I0520 19:20:00.296499 29621 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0520 19:20:00.323364 29621 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0520 19:20:00.353277 29621 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0520 19:20:00.369305 29621 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:20:01] Epoch 0, iter 0, time 2.6954545974731445, Loss: 6.386939, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:20:11] Epoch 0, iter 10, time 1.144268274307251, Loss: 7.805868, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:20:25] Epoch 0, iter 20, time 1.8410155773162842, Loss: 12.061253, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:20:41] Epoch 0, iter 30, time 1.1101765632629395, Loss: 9.782310, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:20:57] Epoch 0, iter 40, time 1.6426472663879395, Loss: 6.662434, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:21:12] Epoch 0, iter 50, time 1.0377476215362549, Loss: 7.927030, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:21:27] Epoch 0, iter 60, time 1.8886034488677979, Loss: 7.618662, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:21:42] Epoch 0, iter 70, time 1.4986093044281006, Loss: 10.829721, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:21:57] Epoch 0, iter 80, time 1.5821115970611572, Loss: 10.367525, top1_acc: 0.00, top5_acc: 0.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 19:22:12] Epoch 0, iter 90, time 1.1715724468231201, Loss: 10.460176, top1_acc: 0.00, top5_acc: 0.00
[INFO: train_utils.py: 122]: [TRAIN] Epoch 0 training finished, average time: 1.4606082644513858
[INFO: accuracy_metrics.py: 34]: Resetting valid metrics...
[INFO: kinetics_reader.py: 249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py: 253]: read images from 0, length: 76, lines length: 76, total: 76
share_vars_from is set, scope is ignored.
I0520 19:22:21.868980 29621 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0520 19:22:21.874469 29621 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I0520 19:22:21.879205 29621 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0520 19:22:21.883241 29621 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
[INFO: metrics_util.py: 143]: [TEST] test_iter 0 Loss: 24.683359, top1_acc: 0.00, top5_acc: 12.50
[INFO: metrics_util.py: 184]: [TEST] Finish Loss: 24.371124, top1_acc: 1.56, top5_acc: 12.50
[INFO: train_utils.py: 46]: ------- learning rate [0.01], learning rate counter [93] -----
reader shuffle seed 1
...
...(middle omitted)
...
[INFO: kinetics_reader.py: 249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py: 253]: read images from 0, length: 756, lines length: 756, total: 756
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:48:33] Epoch 58, iter 0, time 4.258568286895752, Loss: 0.576913, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:48:45] Epoch 58, iter 10, time 1.3456335067749023, Loss: 0.317392, top1_acc: 100.00, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:48:59] Epoch 58, iter 20, time 1.1674671173095703, Loss: 0.671914, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:49:14] Epoch 58, iter 30, time 1.9933085441589355, Loss: 0.784231, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:49:29] Epoch 58, iter 40, time 1.1111698150634766, Loss: 0.930491, top1_acc: 87.50, top5_acc: 87.50
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:49:42] Epoch 58, iter 50, time 1.8673505783081055, Loss: 0.543070, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:49:57] Epoch 58, iter 60, time 1.634033203125, Loss: 0.919805, top1_acc: 62.50, top5_acc: 87.50
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:50:11] Epoch 58, iter 70, time 0.9776091575622559, Loss: 0.418453, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:50:27] Epoch 58, iter 80, time 0.9865224361419678, Loss: 1.184469, top1_acc: 62.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:50:42] Epoch 58, iter 90, time 1.0359406471252441, Loss: 0.816228, top1_acc: 75.00, top5_acc: 100.00
[INFO: train_utils.py: 122]: [TRAIN] Epoch 58 training finished, average time: 1.4280521023658015
[INFO: accuracy_metrics.py: 34]: Resetting valid metrics...
[INFO: kinetics_reader.py: 249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py: 253]: read images from 0, length: 76, lines length: 76, total: 76
[INFO: metrics_util.py: 143]: [TEST] test_iter 0 Loss: 6.010135, top1_acc: 18.75, top5_acc: 37.50
[INFO: metrics_util.py: 184]: [TEST] Finish Loss: 10.055953, top1_acc: 6.25, top5_acc: 28.12
[INFO: train_utils.py: 46]: ------- learning rate [0.01], learning rate counter [5545] -----
reader shuffle seed 59
[INFO: kinetics_reader.py: 249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py: 253]: read images from 0, length: 756, lines length: 756, total: 756
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:51:02] Epoch 59, iter 0, time 2.8308911323547363, Loss: 0.166549, top1_acc: 100.00, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:51:18] Epoch 59, iter 10, time 1.7925031185150146, Loss: 0.996779, top1_acc: 75.00, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:51:31] Epoch 59, iter 20, time 1.660839319229126, Loss: 0.654242, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:51:45] Epoch 59, iter 30, time 2.043001651763916, Loss: 0.896705, top1_acc: 75.00, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:51:57] Epoch 59, iter 40, time 1.0093119144439697, Loss: 0.860474, top1_acc: 75.00, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:52:11] Epoch 59, iter 50, time 1.7284026145935059, Loss: 0.901385, top1_acc: 62.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:52:26] Epoch 59, iter 60, time 1.1205298900604248, Loss: 0.465495, top1_acc: 87.50, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:52:41] Epoch 59, iter 70, time 1.2147243022918701, Loss: 0.953047, top1_acc: 75.00, top5_acc: 100.00
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:52:56] Epoch 59, iter 80, time 1.788550853729248, Loss: 0.777208, top1_acc: 75.00, top5_acc: 87.50
[INFO: metrics_util.py: 143]: [TRAIN 2020-05-20 21:53:11] Epoch 59, iter 90, time 0.9238030910491943, Loss: 1.572572, top1_acc: 50.00, top5_acc: 87.50
[INFO: train_utils.py: 122]: [TRAIN] Epoch 59 training finished, average time: 1.4281317880076747
[INFO: accuracy_metrics.py: 34]: Resetting valid metrics...
[INFO: kinetics_reader.py: 249]: trainerid 0, trainer_count 1
[INFO: kinetics_reader.py: 253]: read images from 0, length: 76, lines length: 76, total: 76
[INFO: metrics_util.py: 143]: [TEST] test_iter 0 Loss: 4.260227, top1_acc: 18.75, top5_acc: 43.75
[INFO: metrics_util.py: 184]: [TEST] Finish Loss: 8.431045, top1_acc: 9.38, top5_acc: 26.56
Process finished with exit code 0
What if we want to train on our own data?
Then the videos have to be converted into pkl files the same way as the Kinetics dataset. To understand how the Kinetics pkl files are generated, we can read the source: data/dataset/kinetics/video2pkl.py.
Let's analyze it starting from the command used to run the script:
python video2pkl.py kinetics-400_train.csv \
data/dataset/kinetics/data_k400/train_mp4 \
data/dataset/kinetics/data_k400/train_pkl \
8
kinetics-400_train.csv: the Kinetics-400 dataset list, containing the video source, clip trimming information, labels, etc.
data/dataset/kinetics/data_k400/train_mp4: the source video directory
data/dataset/kinetics/data_k400/train_pkl: the target pkl directory
8: the number of worker processes
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import glob
try:
    import cPickle as pickle
except:
    import pickle
from multiprocessing import Pool

# example command line:
# python video2pkl.py kinetics-400_train.csv $Source_dir $Target_dir 8
#
# kinetics-400_train.csv is the training set file of the K400 official release;
# each line contains label,youtube_id,time_start,time_end,split,is_cc

assert (len(sys.argv) == 5)

# Open kinetics-400_train.csv and read the entries (skipping the header line)
f = open(sys.argv[1])
source_dir = sys.argv[2]
target_dir = sys.argv[3]
num_threads = sys.argv[4]
all_video_entries = [x.strip().split(',') for x in f.readlines()]
all_video_entries = all_video_entries[1:]
f.close()

# Read the Kinetics-400 label mapping: each line is "category label_index"
category_label_map = {}
f = open('kinetics400_label.txt')
for line in f:
    ens = line.strip().split(' ')
    category = " ".join(ens[0:-1])
    label = int(ens[-1])
    category_label_map[category] = label
f.close()


def generate_pkl(entry):
    mode = entry[4]
    category = entry[0].strip('"')
    category_dir = category
    video_path = os.path.join(
        './',
        entry[1] + "_%06d" % int(entry[2]) + "_%06d" % int(entry[3]) + ".mp4")
    video_path = os.path.join(source_dir, category_dir, video_path)
    label = category_label_map[category]

    vid = './' + video_path.split('/')[-1].split('.')[0]
    if os.path.exists(video_path):
        if not os.path.exists(vid):
            os.makedirs(vid)
        # Split the video into individual frame images
        os.system('ffmpeg -i ' + video_path + ' -q 0 ' + vid + '/%06d.jpg')
    else:
        print("File not exists {}".format(video_path))
        return

    images = sorted(glob.glob(vid + '/*.jpg'))
    ims = []
    for img in images:
        f = open(img, 'rb')
        # Append all frame images of this 10-second clip to ims
        ims.append(f.read())
        f.close()

    output_pkl = vid + ".pkl"
    output_pkl = os.path.join(target_dir, output_pkl)
    f = open(output_pkl, 'wb')
    # Note the generated pkl format: a 3-tuple of (vid, label, list of frame images)
    pickle.dump((vid, label, ims), f, protocol=2)
    f.close()

    os.system('rm -rf %s' % vid)


pool = Pool(processes=int(sys.argv[4]))
pool.map(generate_pkl, all_video_entries)
pool.close()
pool.join()
For example, I tried opening one pkl file, 0-nxKQTMo-Y_000000_000010.pkl:
import six.moves.cPickle as pickle
inf = pickle.load(open(r'0-nxKQTMo-Y_000000_000010.pkl', 'rb'))
print(inf)
# Output:
<class 'tuple'>: ('./0-nxKQTMo-Y_000000_000010', 183, [b'\xff\xd8\xff...\xe0\x00\x10]
We can see that it is a tuple:
tuple[0] is the file name.
tuple[1] is the corresponding label index (see kinetics400_label.txt).
tuple[2] is the list of frame images (for a 30 fps source, 10 s of video gives 300 images here).
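Since each element of tuple[2] is a raw JPEG byte string, a frame can be decoded back into an image in memory. A small hedged sketch (assuming Pillow is installed):

import io
import six.moves.cPickle as pickle
from PIL import Image

vid, label, ims = pickle.load(open(r'0-nxKQTMo-Y_000000_000010.pkl', 'rb'))
print(vid, label, len(ims))             # clip id, label index, number of frames
frame = Image.open(io.BytesIO(ims[0]))  # each list element is one JPEG-encoded frame
print(frame.size)                       # (width, height) of the first frame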
Because the labels of the data I later obtained differ from standard Kinetics, many videos failed to convert, so I rewrote video2pkl.py to fit my own data:
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
# Modified by Toson on May 22, 2020.
#
import os
import glob
try:
    import cPickle as pickle
except:
    import pickle
from multiprocessing import Pool
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description='Video to pkl')
    parser.add_argument('--csv_file', default='kinetics-400-mini_train.csv', help='csv file')
    parser.add_argument('--label_file', default='kinetics400_label.txt', help='label file')
    parser.add_argument('--mp4_dir', default='../data_k400_modified/mini_train/', help='dataset input directory')
    parser.add_argument('--threads_num', type=int, default=12, help='Threads number')
    args = parser.parse_args()
    return args


args = parse_args()
source_dir = args.mp4_dir
source_dir = source_dir[:-1] if source_dir[-1] == '/' else source_dir
if source_dir[-1] == '/':
    print('--mp4_dir is error:', args.mp4_dir)
    exit(-1)
target_dir = source_dir + '_pkl'
if os.path.exists(target_dir) is False:
    os.system('mkdir -p ' + target_dir)
num_threads = args.threads_num

# Read the label mapping: each line is "category label_index"
category_label_map = {}
f = open(args.label_file)
for line in f:
    ens = line.strip().split(' ')
    category = " ".join(ens[0:-1])
    label = int(ens[-1])
    category_label_map[category] = label
f.close()

# Some label names were found to differ from the folder names; debug check kept below
# list1 = os.listdir(source_dir)
# list1.sort()
# label_list.sort()
# diff = set(label_list).difference(list1)
# print(diff)
# diff = set(list1).difference(label_list)
# print(diff)
# exit(0)

errorCount_notThreadsSafe = 0


def generate_pkl(entry):
    # declared global so the assignment below does not raise UnboundLocalError
    # (as the name says, the counter is still not shared safely across worker processes)
    global errorCount_notThreadsSafe
    # entry[0]
    mode = entry[4]
    category = entry[0].strip('"')
    category_dir = category.replace(' ', '_').replace('(', '_').replace(')', '_').replace('\'', '_')
    video_path = os.path.join(
        './',
        entry[1] + "_%06d" % int(entry[2]) + "_%06d" % int(entry[3]) + ".mp4")
    video_path = os.path.join(source_dir, category_dir, video_path)
    label = category_label_map[category]

    vid = './' + video_path.split('/')[-1].split('.')[0]
    if os.path.exists(video_path):
        if not os.path.exists(vid):
            os.makedirs(vid)
        os.system('ffmpeg -i ' + video_path + ' -q 0 ' + vid + '/%06d.jpg')
    else:
        print("File not exists {}".format(video_path))
        errorCount_notThreadsSafe += 1
        exit(-1)
        # return

    images = sorted(glob.glob(vid + '/*.jpg'))
    ims = []
    for img in images:
        f = open(img, 'rb')
        ims.append(f.read())
        f.close()

    output_pkl = vid + ".pkl"
    output_pkl = os.path.join(target_dir, output_pkl)
    f = open(output_pkl, 'wb')
    pickle.dump((vid, label, ims), f, protocol=2)
    f.close()

    # guard against accidentally removing '*' or an absolute path
    if vid == '*' or vid[0] == '/':
        print('ERROR!!! you want to rm -rf', vid)
        exit(-2)
    os.system('rm -rf %s' % vid)


f = open(args.csv_file)
all_video_entries = [x.strip().split(',') for x in f.readlines()]
all_video_entries = all_video_entries[1:]
f.close()

pool = Pool(processes=int(args.threads_num))
pool.map(generate_pkl, all_video_entries)
pool.close()
pool.join()
print('errorCount_notThreadsSafe:', errorCount_notThreadsSafe)
print('end.')
That wraps up the investigation of converting video to pkl.
To recap the workflow, especially the dataset handling: I originally used the pipeline above, but later wrote a script that turns a camera video stream directly into pkl files, which avoids a lot of hassle.
Code: not uploaded for now.
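Since that script isn't shared, the following is only a minimal hedged sketch of the idea (assuming OpenCV is installed; the label value is a placeholder): grab frames from a capture device, JPEG-encode them in memory, and dump them in the same (vid, label, ims) tuple format described above.

import pickle
import cv2  # assuming OpenCV is available

def capture_to_pkl(output_pkl, vid="camera_clip", label=0, num_frames=300, device=0):
    # Grab frames from a camera, JPEG-encode each one in memory, and save them in
    # the same (vid, label, ims) tuple format that the Kinetics pkl reader expects.
    cap = cv2.VideoCapture(device)
    ims = []
    while len(ims) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode('.jpg', frame)
        if ok:
            ims.append(buf.tobytes())
    cap.release()
    with open(output_pkl, 'wb') as f:
        pickle.dump((vid, label, ims), f, protocol=2)

capture_to_pkl("camera_clip.pkl")  # label=0 is a placeholder; use your own class index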
GitHub: https://github.com/PaddlePaddle/models/blob/release/1.8/PaddleCV/video/models/stnet/README.md
Baidu AI Open Platform: https://www.paddlepaddle.org.cn/modelbasedetail/stnet
Baidu Brain: a readable STNET write-up (that project uses the HMDB51 dataset instead)
Kinetics dataset download: https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/video/data/dataset/README.md#Kinetics数据集