Source code: https://github.com/matterport/Mask_RCNN
Main problems: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR, and
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1/convolution}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
My original plan was tensorflow-gpu 1.13.1 + CUDA 10.1 + cuDNN 7.5 + Keras, all the latest versions at the time. After setting everything up I found that TensorFlow 1.13.1 does not yet support CUDA 10.1, so I switched to CUDA 10.0, but the error remained. Note that cuDNN has to be swapped along with CUDA; I installed the cuDNN 7.5 build for CUDA 10.0. To run the demo, you can either open demo.ipynb directly in Jupyter Notebook or export it as demo.py and run it from a terminal or another IDE; I prefer the terminal (the terminal and Jupyter may also differ in how much they print). One thing to watch: in the exported demo.py you must comment out the line get_ipython().run_line_magic('matplotlib', 'inline').
Several dependencies need to be installed (listed in requirements.txt), and a few files must be downloaded as well: the pretrained weights mask_rcnn_coco.h5 (https://github.com/matterport/Mask_RCNN/releases) and pycocotools (https://github.com/waleedka/coco), which is built under the PythonAPI directory. If you use Python 3, change python to python3 in the Makefile (unnecessary if python already points to python3); the point is that versions must match. Running make produces a _mask file under the pycocotools directory, which is exactly the pycocotools._mask the program needs; copy the pycocotools directory into samples/coco. If the demo then complains that this module is missing, something went wrong at this step: either the directory was not copied to the right place, make was never run, or the file make produced was built against a different Python version (more on version mismatches below).
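A quick way to tell whether the compiled extension actually matches the interpreter you run the demo with is to import it directly from that environment. A minimal check, assuming only that the pycocotools directory is on the import path:

import sys
print(sys.executable)            # which Python this environment really uses
from pycocotools import _mask    # ImportError here means make was run with a different Python
print("pycocotools._mask imported OK")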
With this environment, running the demo gave the following:
Using TensorFlow backend.
Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 93
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/hhm/桌面/Mask_RCNN-master/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-10 19:11:31.584481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-10 19:11:31.609012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496000000 Hz
2019-04-10 19:11:31.609640: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc617c450 executing computations on platform Host. Devices:
2019-04-10 19:11:31.609677: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-04-10 19:11:31.752051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-10 19:11:31.753130: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc61401b0 executing computations on platform CUDA. Devices:
2019-04-10 19:11:31.753182: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1
2019-04-10 19:11:31.753590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.65GiB
2019-04-10 19:11:31.753632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-10 19:11:31.782047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-10 19:11:31.782119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-10 19:11:31.782138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-10 19:11:31.799722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1461 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Processing 1 images
image shape: (375, 500, 3) min: 0.00000 max: 255.00000 uint8
molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 151.10000 float64
image_metas shape: (1, 93) min: 0.00000 max: 1024.00000 float64
anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32
2019-04-10 19:11:43.651131: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-04-10 19:11:49.326969: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-10 19:11:49.341018: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "demo.py", line 130, in
results = model.detect([image], verbose=1)
File "/home/hhm/桌面/Mask_RCNN-master/mrcnn/model.py", line 2540, in detect
self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training.py", line 1169, in predict
steps=steps)
File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop
batch_outs = f(ins_batch)
File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1/convolution}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
So there is the Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR error, plus tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1/convolution}}]]
[[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]
I searched for a long time. Deleting the hidden .nv cache directory in the home directory, checking for cuDNN version mismatches: none of it helped. I tried every cuDNN version the official site lists as compatible with CUDA 10.0, always with the same result. For a long time I put off downgrading TensorFlow, since a downgrade is real work (the matching CUDA, cuDNN and so on all have to be swapped) and I was afraid leftovers would cause trouble. Several days later the problem was still unsolved (honestly, a lot of time wasted for nothing). In the end I tried downgrading TensorFlow anyway. At that point I had two installs of tensorflow-gpu, one from pip and one from conda: I had started with the pip one, installed the conda one after the errors above, and both coexisted, even though other packages such as tensorboard and tensorflow-base had been replaced by the conda versions; yet inside the conda virtual environment the pip install was still the one in use, for reasons I do not know. I uninstalled the pip TensorFlow and tried to fall back to the conda one, but that did not work: importing reported no tensorflow module. Rather than experiment further, I simply uninstalled everything, including all TensorFlow-related packages installed through conda.
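For the record, one workaround commonly suggested for this exact error on cards with little memory (I cannot confirm it fixes this particular setup, since I ended up downgrading instead) is to let TensorFlow allocate GPU memory on demand rather than grabbing it all at once. A minimal sketch using the standard TF 1.x / Keras APIs, placed before the model is built:

import tensorflow as tf
import keras.backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # grow GPU memory as needed instead of reserving it all
K.set_session(tf.Session(config=config))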
Then I installed TensorFlow 1.12.0. One caveat with the install: you cannot simply run
conda install tensorflow-gpu
because that again installs the latest version (1.13.1) by default. Instead, pin the version:
conda install tensorflow-gpu=1.12
With that installed I ran the demo again. Some dependencies from requirements.txt were now missing, and Keras had to be reinstalled too; I do not know why uninstalling TensorFlow removed these packages as well. I reinstalled them, using conda wherever the package was available there and pip (or a direct install) otherwise. After that the demo failed again, this time complaining about a missing pycocotools._mask module. That puzzled me, since I had not touched that module; I re-ran make and copied the result over, and still got the same error. Then I noticed that the Python inside the conda environment had become 3.6 (it had been 3.7 to match TensorFlow 1.13.1; installing TensorFlow 1.12.0 switched it to 3.6 automatically). I changed python3 in the Makefile to python3.6 (the earlier build had used python3.7), which failed with a missing module, so I changed it back to python3. But outside conda the system has Python 2.7, 3.6 and 3.7, with python3 defaulting to 3.7, so make via python3 built against 3.7, newer than the 3.6 inside conda, and things still broke. The fix was to activate the conda TensorFlow environment first, with conda activate tensorflow_3.7 (tensorflow_3.7 is the name I gave the environment when I created it), and run make inside it; the _mask file generated that way matches the conda Python 3.6. After copying it into samples/coco, running the demo gave the following:
Using TensorFlow backend.
Configurations:
BACKBONE resnet101
BACKBONE_STRIDES [4, 8, 16, 32, 64]
BATCH_SIZE 1
BBOX_STD_DEV [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE None
DETECTION_MAX_INSTANCES 100
DETECTION_MIN_CONFIDENCE 0.7
DETECTION_NMS_THRESHOLD 0.3
FPN_CLASSIF_FC_LAYERS_SIZE 1024
GPU_COUNT 1
GRADIENT_CLIP_NORM 5.0
IMAGES_PER_GPU 1
IMAGE_CHANNEL_COUNT 3
IMAGE_MAX_DIM 1024
IMAGE_META_SIZE 93
IMAGE_MIN_DIM 800
IMAGE_MIN_SCALE 0
IMAGE_RESIZE_MODE square
IMAGE_SHAPE [1024 1024 3]
LEARNING_MOMENTUM 0.9
LEARNING_RATE 0.001
LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE 14
MASK_SHAPE [28, 28]
MAX_GT_INSTANCES 100
MEAN_PIXEL [123.7 116.8 103.9]
MINI_MASK_SHAPE (56, 56)
NAME coco
NUM_CLASSES 81
POOL_SIZE 7
POST_NMS_ROIS_INFERENCE 1000
POST_NMS_ROIS_TRAINING 2000
PRE_NMS_LIMIT 6000
ROI_POSITIVE_RATIO 0.33
RPN_ANCHOR_RATIOS [0.5, 1, 2]
RPN_ANCHOR_SCALES (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE 1
RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD 0.7
RPN_TRAIN_ANCHORS_PER_IMAGE 256
STEPS_PER_EPOCH 1000
TOP_DOWN_PYRAMID_SIZE 256
TRAIN_BN False
TRAIN_ROIS_PER_IMAGE 200
USE_MINI_MASK True
USE_RPN_ROIS True
VALIDATION_STEPS 50
WEIGHT_DECAY 0.0001
WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
2019-04-13 15:34:51.692344: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-13 15:34:52.387094: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-13 15:34:52.388045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.62GiB
2019-04-13 15:34:52.388099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-13 15:34:59.753714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-13 15:34:59.753792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-04-13 15:34:59.753815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-04-13 15:34:59.754226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Processing 1 images
image shape: (640, 480, 3) min: 0.00000 max: 255.00000 uint8
molded_images shape: (1, 1024, 1024, 3) min: -123.70000 max: 151.10000 float64
image_metas shape: (1, 93) min: 0.00000 max: 1024.00000 float64
anchors shape: (1, 261888, 4) min: -0.35390 max: 1.29134 float32
2019-04-13 15:35:11.103215: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:11.189938: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.12GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.401613: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.663684: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.697191: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.766519: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.859735: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.253020: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.627718: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.650576: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 845.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
I am not sure what to make of the allocator messages at the end; they look like the GPU running out of memory. I am only using a 2 GB GTX 1050, whose memory is very small; a better card with more memory would probably not show these, or the batch size could be reduced a bit (as sketched below). Note that the warnings themselves say this is not a failure, only a possible performance loss; and indeed the demo seems to have succeeded, with none of the earlier errors.
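On the batch size: in Mask_RCNN the effective batch size is GPU_COUNT * IMAGES_PER_GPU (computed in mrcnn/config.py), and the demo already pins both to 1 through its InferenceConfig, so for inference there is nothing left to lower; for training on a small card, IMAGES_PER_GPU is the knob. A sketch along the lines of the demo's own config, where coco is the samples/coco/coco.py module:

class InferenceConfig(coco.CocoConfig):
    # Effective batch size = GPU_COUNT * IMAGES_PER_GPU; keep it at 1 on a 2 GB card
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()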
When conda downloaded TensorFlow 1.12.0 it also automatically pulled in a matching cuda and cudnn, namely cudatoolkit 9.2 and cudnn 7.3.1; I never configured CUDA or cuDNN for TensorFlow 1.12.0 by hand, and I did not even check whether TensorFlow 1.12.0 supports CUDA 10.0. With that, the demo runs as shown above, apparently without problems. Meanwhile my manually installed CUDA 10.1 and CUDA 10.0 are untouched, and nvcc --version still reports CUDA 10.0, which left me puzzled. Presumably the conda packages ship their own CUDA runtime libraries inside the environment and TensorFlow loads those, while nvcc reports the system-wide toolkit, so the two coexist.
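To see which TensorFlow is active and whether it can actually drive the GPU, a standard TF 1.x sanity check (nothing project-specific) is:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                    # should report 1.12.0 here
print(tf.test.is_gpu_available())        # True only if the CUDA/cuDNN libraries load
print(device_lib.list_local_devices())   # lists the GTX 1050 when everything is consistent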