Contents
0 Preface
1 Environment setup
1.1 Installing the Python packages
1.2 Downloading detail-api
1.3 Running prepare_pcontext.py
1.4 Running prepare_ade20k.py
2 Training the model
3 Testing the model
3.1 Downloading the model
3.2 Testing encnet_jpu_res50_pcontext.pth.tar
3.2.1 test [single-scale] (pixAcc=0.7898, mIoU=0.5105)
3.2.2 test [multi-scale] (pixAcc=0.7964, mIoU=0.5210)
3.2.3 predict [single-scale]
4 Errors and fixes
4.1 detail-api compilation error
4.2 Missing model files
4.3 AttributeError: 'NoneType' object has no attribute 'run_slave'
References
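0 Preface
Full title: FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation (from a team at the Institute of Automation, Chinese Academy of Sciences)
Paper: https://arxiv.org/abs/1903.11816
GitHub: https://github.com/wuhuikai/FastFCN
My machine: RTX 3070, CUDA 11.0, torch 1.7.1+cu110, Python 3.7
Next post on FastFCN: 深度学习(8):FastFCN代码运行、测试与预测2 (Deep Learning (8): Running, Testing, and Predicting with FastFCN, Part 2), biter0088's blog on CSDN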
1 Environment setup
Environment used by the official repository:
PyTorch >= 1.1.0 (Note: the code is tested in an environment with python=3.6, cuda=9.0)
# master branch; I cloned it in March 2022, so the author may have changed it since
git clone https://github.com/wuhuikai/FastFCN.git
cd FastFCN
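1.1 Installing the Python packages
Create a requirements.txt file with the contents below to install the remaining packages.
Note: activate the Python environment first with source activate yolov5py37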
nose
tqdm
scipy
cython
requests
scikit-image
python3-dev
libevent-dev
cPython
pip install -r requirements.txt
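After pip finishes, a quick import check can confirm that the packages are visible in the active environment. This is a minimal sketch: missing_modules is a hypothetical helper (not part of FastFCN), and the import names Cython and skimage are assumed to be the ones provided by the cython and scikit-image packages.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for some of the pip packages listed in requirements.txt.
wanted = ["nose", "tqdm", "scipy", "Cython", "requests", "skimage"]
print("missing:", missing_modules(wanted))
```

An empty list means every module was found; anything listed still needs to be installed in the environment that will run the FastFCN scripts.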
1.2 Downloading detail-api
Clone it into the FastFCN directory:
git clone https://github.com/zhanghang1989/detail-api
Then comment out the clone step in /xx/FastFCN/scripts/prepare_pcontext.py as shown here (repo_url stays defined because the error message at the end of the function still uses it):
def install_pcontext_api():
    repo_url = "https://github.com/zhanghang1989/detail-api"
    # os.system("git clone " + repo_url)
    os.system("cd detail-api/PythonAPI/ && python setup.py install")
    shutil.rmtree('detail-api')
    try:
        import detail
    except Exception:
        print("Installing PASCAL Context API failed, please install it manually %s"%(repo_url))
Note: after prepare_pcontext.py runs, detail-api is installed and the cloned detail-api folder is deleted.
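The install step can also be written with subprocess, so that a failed build raises immediately and the interpreter of the active environment is the one that runs setup.py. This is only a sketch, assuming detail-api has already been cloned into the current directory; build_install_command and install_local_api are hypothetical helpers, not functions from FastFCN.

```python
import subprocess
import sys
from pathlib import Path


def build_install_command():
    """Command that runs setup.py with the interpreter of the active environment."""
    return [sys.executable, "setup.py", "install"]


def install_local_api(repo_dir="detail-api"):
    """Build and install the PythonAPI folder of an already cloned detail-api."""
    api_dir = Path(repo_dir) / "PythonAPI"
    if not api_dir.is_dir():
        raise FileNotFoundError(f"{api_dir} not found; clone detail-api first")
    # check=True raises CalledProcessError when the build fails.
    subprocess.run(build_install_command(), cwd=api_dir, check=True)


# Usage (run from the FastFCN directory after cloning detail-api):
#   install_local_api()
```

Unlike the original function, this version leaves the cloned folder in place, so a failed build can be retried without cloning again.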
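1.3 Running prepare_pcontext.py
The script is at /xx/FastFCN/scripts/prepare_pcontext.py; it prepares the VOC2010 dataset.
python -m scripts.prepare_pcontext
It downloads the VOC2010 data into the following directory layout:
# VOC2010 dataset
Official site: http://host.robots.ox.ac.uk/pascal/VOC/voc2010/index.html
.
└── VOCdevkit                  # root directory
    └── VOC2010                # dataset for one year; only 2010 is downloaded here (other years such as 2007 and 2012 also exist)
        ├── Annotations        # XML files, one per image in JPEGImages, describing the image content
        ├── ImageSets          # txt files; each line holds an image name, followed by ±1 to mark positive/negative samples
        │   ├── Action
        │   ├── Layout
        │   ├── Main
        │   └── Segmentation
        ├── JPEGImages         # source images
        ├── SegmentationClass  # images for semantic segmentation, labelling the class of every pixel
        └── SegmentationObject # images for instance segmentation, labelling which object every pixel belongs to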
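To confirm that the download produced the layout in the tree above, a small check like the following can help. It is a sketch only: missing_voc_dirs is a hypothetical helper, and the default path in the usage line is taken from the mask_file path printed later in this post (~/.encoding/data/VOCdevkit/VOC2010).

```python
from pathlib import Path

# Sub-directories shown in the tree above, relative to VOCdevkit/VOC2010.
EXPECTED_DIRS = [
    "Annotations",
    "ImageSets/Segmentation",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
]


def missing_voc_dirs(voc_root):
    """Return the expected sub-directories that are absent under voc_root."""
    root = Path(voc_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Prints an empty list when every expected folder exists.
print(missing_voc_dirs(Path.home() / ".encoding" / "data" / "VOCdevkit" / "VOC2010"))
```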
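Once the download finishes, the script builds and installs detail-api and then deletes the folder cloned in 1.2. So once the terminal reports that detail-api was installed successfully, the following lines no longer do anything useful and can be commented out: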
os.system("cd detail-api/PythonAPI/ && python setup.py install")
shutil.rmtree('detail-api')
Note: running prepare_pcontext.py again downloads the VOC2010 dataset a second time, which looks like a bug (once the packages are installed and the data is downloaded, there is normally no reason to run it again). If the first download succeeded but some other error forces you to rerun prepare_pcontext.py to finish preparing the data and packages, comment out the following lines:
if __name__ == '__main__':
    args = parse_args()
    #mkdir(os.path.expanduser('~/.encoding/data'))
    #if args.download_dir is not None:
    #    if os.path.isdir(_TARGET_DIR):
    #        os.remove(_TARGET_DIR)
    #    # make symlink
    #    os.symlink(args.download_dir, _TARGET_DIR)
    #else:
    #    download_ade(_TARGET_DIR, overwrite=False)
    install_pcontext_api()
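Rather than commenting lines in and out between runs, the download step could be guarded so that a rerun skips data that is already there. The following is a minimal sketch of that idea; should_download is a hypothetical helper, and treating a non-empty directory as "already downloaded" is an assumption, not the script's actual behaviour.

```python
import os


def should_download(target_dir, overwrite=False):
    """Return True when the dataset still has to be fetched."""
    if overwrite:
        return True
    # Assumption: a directory that exists and is not empty already holds the data.
    return not (os.path.isdir(target_dir) and os.listdir(target_dir))
```

In the script, the call download_ade(_TARGET_DIR, overwrite=False) would then sit behind if should_download(_TARGET_DIR):, leaving install_pcontext_api() untouched.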
1.4 Running prepare_ade20k.py
The script is at /xxx/FastFCN/scripts/prepare_ade20k.py; it prepares the ADEChallengeData2016 dataset.
python -m scripts.prepare_ade20k
(yolov5py37) meng@meng:~/deeplearning/FastFCN$ python -m scripts.prepare_ade20k
Downloading /home/meng/.encoding/data/downloads/ADEChallengeData2016.zip from http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip...
944710KB [05:23, 2923.61KB/s]
Downloading /home/meng/.encoding/data/downloads/release_test.zip from http://data.csail.mit.edu/places/ADEchallenge/release_test.zip...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 206856/206856 [04:29<00:00, 766.68KB/s]
(yolov5py37) meng@meng:~/deeplearning/FastFCN$
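2 Training the model
Before training, apply the fixes described in 4.2 and 4.3.
Reference: FastFCN/encnet_res50_pcontext.sh at master · wuhuikai/FastFCN · GitHub
The reference command for training the encnet_res50 model is:
# train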
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.train --dataset pcontext \
--model encnet --jpu [JPU|JPU_X] --aux --se-loss \
--backbone resnet50 --checkname encnet_res50_pcontext
The command I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.train --dataset pcontext --model encnet --jpu JPU --aux --se-loss --backbone resnet50 --checkname encnet_res50_pcontext
Training starts, but then fails with RuntimeError: CUDA out of memory.
I set training aside for now.
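3 Testing the model
3.1 Downloading the model
Download the author's pretrained model file from https://github.com/wuhuikai/FastFCN#pcontext. (The bash file listed alongside it there contains the commands for training, prediction, and FPS measurement.)
Place the downloaded file in the models folder; the commands below load it from /home/meng/.encoding/models/.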
3.2 Testing encnet_jpu_res50_pcontext.pth.tar
3.2.1 test [single-scale]
# reference command from GitHub
# test [single-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
--model encnet --jpu [JPU|JPU_X] --aux --se-loss \
--backbone resnet50 --resume {MODEL} --split val --mode testval
The command I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
--model encnet --jpu JPU --aux --se-loss \
--backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode testval
Result: pixel accuracy pixAcc=0.7898, mean intersection over union mIoU=0.5105; the test took about 10 minutes.
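For readers new to the two metrics, here is a toy computation using their textbook definitions. It is a sketch only: pix_acc_and_miou is a hypothetical helper, the labels are made up, and the repository's own evaluation accumulates counts over the whole validation set and may treat some classes differently.

```python
def pix_acc_and_miou(pred, target, num_classes):
    """Compute pixel accuracy and mean IoU for two flat lists of class labels."""
    correct = sum(1 for p, t in zip(pred, target) if p == t)
    pix_acc = correct / len(target)

    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return pix_acc, sum(ious) / len(ious)


# Made-up labels for six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
acc, miou = pix_acc_and_miou(pred, target, num_classes=3)
print(round(acc, 4), round(miou, 4))  # 0.6667 0.5
```

Four of the six pixels match, so pixAcc is 4/6; the per-class IoUs are 1/3, 2/3, and 1/2, whose mean is 0.5.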
3.2.2 test [multi-scale]
# reference command from GitHub
# test [multi-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
--model encnet --jpu [JPU|JPU_X] --aux --se-loss \
--backbone resnet50 --resume {MODEL} --split val --mode testval --ms
The command I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
--model encnet --jpu JPU --aux --se-loss \
--backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode testval --ms
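The test took 1 hour 19 minutes; pixAcc was 0.7964 and mIoU was 0.5210.
Compared with test [single-scale], test [multi-scale] adds the --ms option. In the test file, --ms first changes scales.
The scales value is then passed on to base.py,
where the following computation is carried out: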
for scale in self.scales:
    long_size = int(math.ceil(self.base_size * scale))  # math.ceil(): smallest integer not less than the float
    if h > w:
        height = long_size
        width = int(1.0 * w * long_size / h + 0.5)  # keep the original h:w ratio when setting the new height and width
        short_size = width
    else:
        width = long_size
        height = int(1.0 * h * long_size / w + 0.5)
        short_size = height
    # resize image to current size
    cur_img = resize_image(image, height, width, **self.module._up_kwargs)
    if long_size <= crop_size:  # both branches make sure pad_img is at least crop_size in height and width
        pad_img = pad_image(cur_img, self.module.mean,
                            self.module.std, crop_size)
        outputs = module_inference(self.module, pad_img, self.flip)
        outputs = crop_image(outputs, 0, height, 0, width)
    else:
        if short_size < crop_size:
            # pad if needed
            pad_img = pad_image(cur_img, self.module.mean,
                                self.module.std, crop_size)
        else:
            pad_img = cur_img
        _,_,ph,pw = pad_img.size()
        assert(ph >= height and pw >= width)
        # grid forward and normalize
        h_grids = int(math.ceil(1.0 * (ph-crop_size)/stride)) + 1
        w_grids = int(math.ceil(1.0 * (pw-crop_size)/stride)) + 1
        with torch.cuda.device_of(image):
            outputs = image.new().resize_(batch,self.nclass,ph,pw).zero_().cuda()
            count_norm = image.new().resize_(batch,1,ph,pw).zero_().cuda()
        # grid evaluation
        for idh in range(h_grids):
            for idw in range(w_grids):
                h0 = idh * stride
                w0 = idw * stride
                h1 = min(h0 + crop_size, ph)
                w1 = min(w0 + crop_size, pw)
                crop_img = crop_image(pad_img, h0, h1, w0, w1)
                # pad if needed
                pad_crop_img = pad_image(crop_img, self.module.mean,
                                         self.module.std, crop_size)
                output = module_inference(self.module, pad_crop_img, self.flip)
                outputs[:,:,h0:h1,w0:w1] += crop_image(output,
                    0, h1-h0, 0, w1-w0)
                count_norm[:,:,h0:h1,w0:w1] += 1
        assert((count_norm==0).sum()==0)
        outputs = outputs / count_norm
        outputs = outputs[:,:,:height,:width]
    score = resize_image(outputs, h, w, **self.module._up_kwargs)
    scores += score
return scores
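To make the size arithmetic in this loop concrete, the two formulas can be pulled out and run on their own. This is a sketch under stated assumptions: scaled_size and grid_counts are hypothetical helpers that copy the expressions above, base_size=520 and crop_size=480 are the values in the Namespace printed in 3.2.3, and the 500 x 375 (h x w) image, the scale list, and stride=320 are example values chosen for illustration.

```python
import math


def scaled_size(h, w, base_size, scale):
    """Resize so the longer side equals ceil(base_size * scale), keeping h:w."""
    long_size = int(math.ceil(base_size * scale))
    if h > w:
        return long_size, int(1.0 * w * long_size / h + 0.5)
    return int(1.0 * h * long_size / w + 0.5), long_size


def grid_counts(ph, pw, crop_size, stride):
    """Number of sliding-window positions along height and width."""
    h_grids = int(math.ceil(1.0 * (ph - crop_size) / stride)) + 1
    w_grids = int(math.ceil(1.0 * (pw - crop_size) / stride)) + 1
    return h_grids, w_grids


# Example image of 500 x 375 pixels (h x w) at three assumed scales.
for scale in [0.5, 1.0, 1.5]:
    print(scale, scaled_size(500, 375, base_size=520, scale=scale))
# 0.5 (260, 195)
# 1.0 (520, 390)
# 1.5 (780, 585)

# At scale 1.5 the long side exceeds crop_size, so the grid branch runs.
print(grid_counts(780, 585, crop_size=480, stride=320))  # (2, 2)
```

With these numbers, scales 0.5 and 1.0 would take the pad-and-infer branch (long side no larger than 480 only for 0.5; at 1.0 the long side is 520, so the grid branch already applies), while scale 1.5 is evaluated on a 2 x 2 grid of overlapping 480-pixel crops whose outputs are averaged by count_norm.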
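Note that base.py defines scores before this loop:
with torch.cuda.device_of(image):
    scores = image.new().resize_(batch,self.nclass,h,w).zero_().cuda()
So the MultiEvalModule class called in test.py resizes the input image to several scales, runs inference at each scale, and sums the per-scale scores into scores. It is used for evaluation, not for generating multi-scale images for training.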
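The effect of the scores += score line can be shown with a toy example for a single pixel. This is only an illustration: fuse_scores is a hypothetical helper, and the score values are made up.

```python
def fuse_scores(per_scale_scores):
    """Sum class scores from several scales and return (summed scores, best class)."""
    num_classes = len(per_scale_scores[0])
    summed = [0.0] * num_classes
    for scores in per_scale_scores:
        for c, value in enumerate(scores):
            summed[c] += value
    best = max(range(num_classes), key=lambda c: summed[c])
    return summed, best


# Made-up scores for one pixel and three classes at three scales.
per_scale = [
    [0.2, 0.5, 0.3],  # scale 0.5: class 1 wins on its own
    [0.6, 0.3, 0.1],  # scale 1.0
    [0.5, 0.4, 0.1],  # scale 1.5
]
summed, best = fuse_scores(per_scale)
print(best)  # 0
```

At the smallest scale alone the pixel would be labelled class 1, but after summing over all scales class 0 has the highest score, which is the kind of correction multi-scale testing aims for.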
A Zhihu answer on multi-scale versus single-scale is listed in the references at the end.
3.2.3 predict [single-scale]
# reference command from GitHub
# predict [single-scale]
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
--model encnet --jpu [JPU|JPU_X] --aux --se-loss \
--backbone resnet50 --resume {MODEL} --split val --mode test
The command I ran:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
--model encnet --jpu JPU --aux --se-loss \
--backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode test
The output was:
(yolov5py37) meng@meng:~/deeplearning/FastFCN$ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test --dataset pcontext \
> --model encnet --jpu JPU --aux --se-loss \
> --backbone resnet50 --resume /home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar --split val --mode test
Namespace(aux=True, aux_weight=0.2, backbone='resnet50', base_size=520, batch_size=16, checkname='default', crop_size=480, cuda=True, dataset='pcontext', dilated=False, epochs=80, ft=False, jpu='JPU', lateral=False, lr=0.001, lr_scheduler='poly', mode='test', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume='/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar', save_folder='experiments/segmentation/results', se_loss=True, se_weight=0.2, seed=1, split='val', start_epoch=0, test_batch_size=16, train_split='train', weight_decay=0.0001, workers=16)
loading annotations into memory...
JSON root keys:dict_keys(['info', 'images', 'annos_segmentation', 'annos_occlusion', 'annos_boundary', 'categories', 'parts'])
Done (t=3.22s)
creating index...
index created! (t=2.42s)
mask_file: /home/meng/.encoding/data/VOCdevkit/VOC2010/val.pth
=> loaded checkpoint '/home/meng/.encoding/models/encnet_jpu_res50_pcontext.pth.tar' (epoch 79)
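Look at save_folder in the output above to find the predictions, then compare them with the originals (the original images are in /home/meng/.encoding/data/VOCdevkit/VOC2010/JPEGImages).
Comparing image 2008_000064:
Annotation file for the image, 2008_000064.xml: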
folder: VOC2010
filename: 2008_000064.jpg
size: 375, 500, 3
segmented: 0
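Fields like these can be read from an annotation file with the standard library. The sketch below assumes the usual Pascal VOC annotation layout; SAMPLE_XML is a hand-written stand-in rather than the real file, the mapping of 375 and 500 to width and height is an assumption, and read_voc_header is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the usual Pascal VOC annotation layout (not the real file).
SAMPLE_XML = """
<annotation>
  <folder>VOC2010</folder>
  <filename>2008_000064.jpg</filename>
  <size>
    <width>375</width>
    <height>500</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
</annotation>
"""


def read_voc_header(xml_text):
    """Extract folder, file name, and image size from a VOC annotation string."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    return {
        "folder": root.findtext("folder"),
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "depth": int(size.findtext("depth")),
    }


print(read_voc_header(SAMPLE_XML))
```

For a file on disk, ET.parse(path).getroot() gives the same root element.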
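4 Errors and fixes
4.1 detail-api compilation error
error: command 'gcc' failed with exit status 1
Installing PASCAL Context API failed, please install it manually https://github.com/zhanghang1989/detail-api
The first time I ran prepare_pcontext.py, compiling detail-api failed with the log below; following the steps in 1.1 and 1.2 fixed it. The missing exc_type, exc_value, and exc_traceback members suggest that detail/_mask.c was generated by a Cython release that predates Python 3.7, which moved these fields inside PyThreadState.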
gcc -pthread -B /home/meng/anaconda3/envs/yolov5py37/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include -I../common -I/home/meng/anaconda3/envs/yolov5py37/include/python3.7m -c detail/_mask.c -o build/temp.linux-x86_64-3.7/detail/_mask.o
In file included from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1969:0,
from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from detail/_mask.c:461:
/home/meng/anaconda3/envs/yolov5py37/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it with " \
^~~~~~~
detail/_mask.c: In function ‘__Pyx_PyCFunction_FastCall’:
detail/_mask.c:12772:13: error: too many arguments to function ‘(PyObject * (*)(PyObject *, PyObject * const*, Py_ssize_t))meth’
return (*((__Pyx_PyCFunctionFast)meth)) (self, args, nargs, NULL);
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
detail/_mask.c: In function ‘__Pyx__ExceptionSave’:
detail/_mask.c:14254:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
*type = tstate->exc_type;
^~~~~~~~
curexc_type
detail/_mask.c:14255:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
*value = tstate->exc_value;
^~~~~~~~~
curexc_value
detail/_mask.c:14256:19: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
*tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
detail/_mask.c: In function ‘__Pyx__ExceptionReset’:
detail/_mask.c:14263:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
detail/_mask.c:14264:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
detail/_mask.c:14265:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
detail/_mask.c:14266:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = type;
^~~~~~~~
curexc_type
detail/_mask.c:14267:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = value;
^~~~~~~~~
curexc_value
detail/_mask.c:14268:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = tb;
^~~~~~~~~~~~~
curexc_traceback
detail/_mask.c: In function ‘__Pyx__GetException’:
detail/_mask.c:14323:24: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tmp_type = tstate->exc_type;
^~~~~~~~
curexc_type
detail/_mask.c:14324:25: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tmp_value = tstate->exc_value;
^~~~~~~~~
curexc_value
detail/_mask.c:14325:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tmp_tb = tstate->exc_traceback;
^~~~~~~~~~~~~
curexc_traceback
detail/_mask.c:14326:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
tstate->exc_type = local_type;
^~~~~~~~
curexc_type
detail/_mask.c:14327:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
tstate->exc_value = local_value;
^~~~~~~~~
curexc_value
detail/_mask.c:14328:13: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_traceback’; did you mean ‘curexc_traceback’?
tstate->exc_traceback = local_tb;
^~~~~~~~~~~~~
curexc_traceback
error: command 'gcc' failed with exit status 1
Installing PASCAL Context API failed, please install it manually https://github.com/zhanghang1989/detail-api
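4.2 Missing model files
Error: RuntimeError: Failed downloading url https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip
Opening the link from the error message directly does not work either: https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip
I tried several different network setups and none could reach it; the author has probably removed the model files.
I asked on GitHub, and the author provided a download link for three models:
https://drive.google.com/drive/folders/1YFv8JR5IYol2_kDHPMXfUkjXt4Z6t_rh
Put the downloaded files in the models folder (on my machine /home/meng/.encoding/models/, the same folder used for --resume in section 3).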
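Because the files are now fetched by hand, it can be worth checking that a download is not truncated or corrupted. The name resnet50-ebb6acbb suggests that the suffix is the first 8 hex digits of a SHA-1 hash; that naming convention, and which file (the zip or the extracted weights) the hash refers to, are assumptions here. sha1_prefix_matches is a hypothetical helper.

```python
import hashlib
import os


def sha1_prefix_matches(path):
    """Compare the hash suffix in 'name-<hex>.ext' with the file's actual SHA-1."""
    stem = os.path.basename(path).split(".")[0]   # e.g. 'resnet50-ebb6acbb'
    expected = stem.rsplit("-", 1)[-1].lower()    # e.g. 'ebb6acbb'
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large model files do not fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha1.update(chunk)
    return sha1.hexdigest().startswith(expected)
```

The check only makes sense for hash-suffixed names; a file such as encnet_jpu_res50_pcontext.pth.tar has no such suffix, so the function simply returns False for it.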
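4.3 AttributeError: 'NoneType' object has no attribute 'run_slave'
Cause of the error, as explained in the GitHub issue:
The reason is that you're not using multiple GPUs. Change SynBN to regular BN if you want to train on one GPU.
In other words, training here runs on a single GPU, so the synchronized batch norm has to be replaced with regular batch norm:
(1) Modify line 54 of /FastFCN/experiments/segmentation/train.py
(2) Modify line 111 of /FastFCN/experiments/segmentation/train.py
(3) Remove line 132 of /FastFCN/experiments/segmentation/train.py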
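The idea behind the change is simply to pick the normalization layer by GPU count. The sketch below shows that selection in isolation; choose_norm_layer is a hypothetical helper, it does not reproduce the actual edits to train.py, and in the usage comment torch.nn.SyncBatchNorm only stands in for the repository's own synchronized layer.

```python
def choose_norm_layer(num_gpus, sync_bn_cls, plain_bn_cls):
    """Use synchronized batch norm only when more than one GPU is available."""
    return sync_bn_cls if num_gpus > 1 else plain_bn_cls


# Usage sketch (assumes PyTorch is installed; not executed here):
#   import torch
#   norm_layer = choose_norm_layer(torch.cuda.device_count(),
#                                  torch.nn.SyncBatchNorm,
#                                  torch.nn.BatchNorm2d)
```

On a single-GPU machine such as the RTX 3070 used in this post, the function returns the regular batch norm class.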
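References:
A blogger's collection of some of the official PyTorch pretrained ResNet models:
https://blog.csdn.net/sgfmby1994/article/details/103876681
My question on GitHub:
RuntimeError: Failed downloading url https://hangzh.s3.amazonaws.com/encoding/models/resnet50-ebb6acbb.zip · Issue #108 · wuhuikai/FastFCN · GitHub
Switching from multiple GPUs to a single GPU: how to Change SynBN to regular BN ? · Issue #12 · wuhuikai/FastFCN · GitHub
Analysis of the Pascal VOC dataset:
Pascal Voc数据集详细分析 (A detailed analysis of the Pascal VOC dataset), 持久决心's blog on CSDN
Zhihu, on understanding multi-scale versus single-scale:
如何理解深度学习中的multi scale和single scale? (How to understand multi scale and single scale in deep learning?), Zhihu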