1. Configure the PyTorch environment
Install PyTorch 1.0.1 on CUDA 8:
conda install pytorch torchvision cudatoolkit=8.0
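A quick way to confirm what the conda command actually installed is to ask PyTorch itself. The sketch below only reads standard attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`). The helper name `describe_torch_install` is made up for illustration. If the install matches these notes, the version should start with 1.0.1 and the CUDA build should read 8.0.

```python
def describe_torch_install():
    """Report the installed PyTorch version and CUDA status (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,                  # installed PyTorch version string
        "cuda_build": torch.version.cuda,              # CUDA version PyTorch was built with (None for CPU-only builds)
        "cuda_available": torch.cuda.is_available(),   # True only if a usable GPU and driver are found
    }


print(describe_torch_install())
```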
2. Train the model
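2.1 Class setting: set the number of classes to the number of object classes + 1 (the extra class is the background; Pascal VOC, for example, has 20 object classes + background = 21) in
awesome-semantic-segmentation-pytorch/core/data/dataloader/pascal_voc.py
Otherwise a label value equal to or larger than the class count reaches the loss, and training fails with: RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/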
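The assertion means every label value in a mask must lie in [0, n_classes). The pure-Python sketch below uses a hypothetical helper `find_invalid_labels` and a toy dataset with 3 object classes to show why counting only the object classes leaves the highest label out of range, while object classes + 1 does not. It assumes label 0 is background and that ignored pixels are marked with -1; that ignore value is an assumption, so check how your own loader marks ignored pixels (VOC masks, for instance, use 255 on object boundaries).

```python
def find_invalid_labels(mask, num_classes, ignore_index=-1):
    """Return the label values in `mask` that fall outside [0, num_classes).

    Any such value would trigger the `cur_target >= 0 && cur_target < n_classes`
    assertion. Pixels equal to `ignore_index` (assumed ignore value) are skipped.
    """
    bad = set()
    for row in mask:
        for label in row:
            if label == ignore_index:
                continue
            if label < 0 or label >= num_classes:
                bad.add(label)
    return sorted(bad)


# Toy mask: 3 object classes (labels 1..3) plus background (label 0)
mask = [
    [0, 1, 2],
    [3, 3, -1],  # -1 marks an ignored pixel in this sketch
]
print(find_invalid_labels(mask, num_classes=3))  # [3]: only the object classes were counted
print(find_invalid_labels(mask, num_classes=4))  # []: object classes + 1
```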
2.2 Default model path (pretrained weights are downloaded here):
~/.torch/models
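When the training machine has no internet access, it can help to copy the weight file (for example `vgg16-397923af.pth`, the name that appears in the download log) into this directory by hand. The sketch below builds the expected path with `os.path.expanduser`. The helper name `pretrained_path` is hypothetical, and `~/.torch/models` is the PyTorch 1.0.x-era default noted above; newer PyTorch releases cache downloads in a different directory.

```python
import os

# Default download directory noted above (PyTorch 1.0.x era); newer releases cache elsewhere.
MODEL_DIR = os.path.expanduser(os.path.join("~", ".torch", "models"))


def pretrained_path(filename, model_dir=MODEL_DIR):
    """Return (expected full path of a pretrained weight file, whether it already exists)."""
    path = os.path.join(model_dir, filename)
    return path, os.path.isfile(path)


path, found = pretrained_path("vgg16-397923af.pth")
print(path)
print("already present" if found else "not present yet")
```

In PyTorch 1.0.x, a file that already exists at this path is typically loaded from disk instead of being downloaded again.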
2.3 Training
Single GPU training
# for example, train fcn32_vgg16_pascal_voc:
python train.py --model fcn32s --backbone vgg16 --dataset pascal_voc --lr 0.0001 --epochs 50
Multi-GPU training
# for example, train fcn32_vgg16_pascal_voc with 4 GPUs:
export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py --model fcn32s --backbone vgg16 --dataset pascal_voc --lr 0.0001 --epochs 50
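`torch.distributed.launch` starts NGPUS copies of the script. Each copy learns which GPU it owns from a `--local_rank=<i>` argument appended to its command line, and the launcher also exports `WORLD_SIZE` (the total number of processes). The sketch below shows one common way a script picks these up with `argparse`. It illustrates the pattern and is not this repository's code; `parse_launch_args` is a hypothetical name.

```python
import argparse
import os


def parse_launch_args(argv=None, environ=None):
    """Read what torch.distributed.launch gives each worker process.

    The launcher appends --local_rank=<i> to every process's command line and
    exports WORLD_SIZE (the total number of processes) in the environment.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # parse_known_args: the script's own flags (--model, --backbone, ...) are left alone here
    args, _ = parser.parse_known_args(argv)
    num_gpus = int(environ.get("WORLD_SIZE", 1))
    distributed = num_gpus > 1
    return args.local_rank, num_gpus, distributed


# What the third of four processes would see with --nproc_per_node=4
print(parse_launch_args(["--model", "fcn32s", "--local_rank=2"], {"WORLD_SIZE": "4"}))
# (2, 4, True)
```

The same applies to the multi-GPU `eval.py` command, which is launched the same way.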
Evaluation
Single GPU evaluating
# for example, evaluate fcn32_vgg16_pascal_voc
python eval.py --model fcn32s --backbone vgg16 --dataset pascal_voc
Multi-GPU evaluating
# for example, evaluate fcn32_vgg16_pascal_voc with 4 GPUs:
export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model fcn32s --backbone vgg16 --dataset pascal_voc
PSPNet-ResNet50
python train.py --model psp --backbone resnet50 --dataset pascal_voc --lr 0.0001 --epochs 500
python eval.py --model psp --backbone resnet50 --dataset pascal_voc
To resume training, set --resume (pass it the path of the saved checkpoint).
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/yuanyq/.torch/models/vgg16-397923af.pth
113246208it [00:13, 5563879.00it/s]
Found 78 images in the folder ../datasets/voc/VOC2012
2019-09-20 09:03:04,978 semantic_segmentation INFO: Start training, Total Epochs: 50 = Total Iterations 5400
Found 184 images in the folder ../datasets/voc/VOC2012
2019-09-20 09:09:35,498 semantic_segmentation INFO: Start training, Total Epochs: 50 = Total Iterations 12950
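Iterations: each iteration is one weight update. Each update runs a forward pass over batch_size samples to compute the loss, then updates the parameters with backpropagation. In other words, one iteration means training once on batch_size samples.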
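An epoch is one training pass, forward and backward, over all batches of the training set; that is, one epoch is a single forward and backward pass over the entire input data. Put simply, the number of epochs is how many times the whole training set is cycled through during training.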
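To make the two terms concrete, here is a toy mini-batch gradient-descent loop in plain Python that fits `y = w * x`, a made-up one-parameter model standing in for the segmentation network. Each pass of the inner loop is one iteration (forward pass on one batch, gradient, one weight update); each pass of the outer loop is one epoch. The gradient is written out by hand here, where a real network obtains it through backpropagation.

```python
import random


def train(data, batch_size, epochs, lr=0.5, seed=0):
    """Fit y = w * x with mini-batch gradient descent; return (w, iteration count)."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not modify the caller's list
    w = 0.0
    iterations = 0
    for _ in range(epochs):                          # one epoch = one pass over all samples
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]   # batch_size samples (the last batch may be smaller)
            # forward pass + gradient of the mean squared error (w*x - y)^2 with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad                           # one weight update = one iteration
            iterations += 1
    return w, iterations


# 1000 toy samples on the line y = 2x, batch_size = 10, 3 epochs
data = [(i / 1000, 2 * i / 1000) for i in range(1000)]
w, iters = train(data, batch_size=10, epochs=3)
print(iters)        # 300 = 100 iterations per epoch x 3 epochs
print(round(w, 3))  # close to 2.0
```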
For example, with a training set of 1000 samples and batch_size = 10, one pass over the whole training set takes 100 iterations, which is 1 epoch.
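The general formula is:
iterations per epoch N = number of training samples / batch_size
When the division is not exact, the last, smaller batch either counts as one extra iteration or is dropped, depending on the data loader. Total iterations = epochs × N.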
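The formula can be written as two small functions. The sketch below assumes the two ways a data loader can treat an incomplete last batch (PyTorch's `DataLoader` chooses between them with its `drop_last` argument); the function names are hypothetical.

```python
def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Iterations (weight updates) needed for one epoch, i.e. one pass over all samples."""
    if drop_last:
        # the final incomplete batch is skipped
        return num_samples // batch_size
    # the final incomplete batch still counts as one iteration (integer ceiling division)
    return (num_samples + batch_size - 1) // batch_size


def total_iterations(num_samples, batch_size, epochs, drop_last=False):
    """Total iterations for a whole training run: epochs x iterations per epoch."""
    return epochs * iterations_per_epoch(num_samples, batch_size, drop_last)


print(iterations_per_epoch(1000, 10))                  # 100, the example above
print(total_iterations(1000, 10, epochs=50))           # 5000
print(iterations_per_epoch(1005, 10))                  # 101
print(iterations_per_epoch(1005, 10, drop_last=True))  # 100
```

Read backwards, the log line `Total Epochs: 50 = Total Iterations 5400` corresponds to 5400 / 50 = 108 iterations per epoch.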