Getting started with Mask RCNN (running the demo) and the problems I ran into

Source code: https://github.com/matterport/Mask_RCNN

Main problems: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR, and

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
     [[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]

 

1. Environment setup

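My original plan was tensorflow-gpu 1.13.1 + CUDA 10.1 + cuDNN 7.5 + Keras, all the latest versions at the time. After setting everything up I found that TensorFlow 1.13.1 does not yet support CUDA 10.1, so I switched to CUDA 10.0. It still failed. Note that cuDNN has to be swapped as well; I installed the cuDNN 7.5 build for CUDA 10.0.

Then run the demo. You can run demo.ipynb directly in Jupyter Notebook, or export it to demo.py and run it from a terminal or an IDE. I prefer the terminal, and the terminal and Jupyter Notebook may differ in how much detail they print. Note that one line in the exported demo.py has to be commented out: get_ipython().run_line_magic('matplotlib', 'inline').

Some dependencies need to be installed; they are listed in requirements.txt. Other files have to be downloaded too: the pre-trained weights mask_rcnn_coco.h5 (https://github.com/matterport/Mask_RCNN/releases) and pycocotools (https://github.com/waleedka/coco), which lives in the PythonAPI directory. If you use Python 3, change python to python3 in the Makefile; if your default python is already Python 3, no change is needed. Either way, make sure the versions line up. Running make produces a _mask file in the pycocotools directory, which is the pycocotools._mask module the program needs. Then copy the pycocotools directory into samples/coco. If the demo complains that this module is missing, the problem is here: either the directory was not copied to the right place, or make was not run, or the file make produced was built for a different Python version. The version mismatch deserves attention (more on that below).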
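The get_ipython() line mentioned above can also be commented out with a small script instead of by hand. The following is only a minimal sketch: the helper name comment_out_ipython_magics is my own, and it assumes every line containing get_ipython() is safe to disable.

```python
def comment_out_ipython_magics(source: str) -> str:
    """Comment out every line that calls get_ipython(), keeping indentation."""
    fixed_lines = []
    for line in source.splitlines():
        stripped = line.lstrip()
        if "get_ipython()" in line and not stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            line = f"{indent}# {stripped}"
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"


# Example input resembling the top of an exported demo.py.
exported = (
    "import os\n"
    "get_ipython().run_line_magic('matplotlib', 'inline')\n"
    "print('ok')\n"
)
print(comment_out_ipython_magics(exported))
```

To apply it to a real file, read demo.py into a string, pass it through the function, and write the result back.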

 

2. Running the demo

In this environment, running the demo gave the following output:

Using TensorFlow backend.

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                93
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001


WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/hhm/桌面/Mask_RCNN-master/mrcnn/model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-10 19:11:31.584481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-10 19:11:31.609012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496000000 Hz
2019-04-10 19:11:31.609640: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc617c450 executing computations on platform Host. Devices:
2019-04-10 19:11:31.609677: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): , 
2019-04-10 19:11:31.752051: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-10 19:11:31.753130: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x559fc61401b0 executing computations on platform CUDA. Devices:
2019-04-10 19:11:31.753182: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1050, Compute Capability 6.1
2019-04-10 19:11:31.753590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.65GiB
2019-04-10 19:11:31.753632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-10 19:11:31.782047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-10 19:11:31.782119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-10 19:11:31.782138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-10 19:11:31.799722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1461 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Processing 1 images
image                    shape: (375, 500, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
2019-04-10 19:11:43.651131: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-04-10 19:11:49.326969: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-10 19:11:49.341018: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "demo.py", line 130, in <module>
    results = model.detect([image], verbose=1)
  File "/home/hhm/桌面/Mask_RCNN-master/mrcnn/model.py", line 2540, in detect
    self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 294, in predict_loop
    batch_outs = f(ins_batch)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node conv1/convolution}}]]
	 [[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]

There is the Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR error, and also tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[{{node conv1/convolution}}]]
     [[{{node mrcnn_detection/map/TensorArrayUnstack/range}}]]

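I searched for a long time. Deleting the hidden .nv cache directory in the home directory, suspecting a cuDNN version mismatch: none of it helped. I tried every cuDNN version that the official site lists as compatible with CUDA 10.0 and got the same result each time. I kept putting off downgrading TensorFlow, since that is more work (CUDA, cuDNN and so on all have to be swapped) and I was worried about leftovers interfering. Several days later the problem was still unsolved (frankly, a lot of time wasted). In the end I tried downgrading TensorFlow.

I had two TensorFlow installs, one from pip and one from conda. I started with the pip one; after hitting the problem above I installed again with conda, but the conda and pip installs coexisted, so there were two tensorflow-gpu packages. Other packages such as tensorboard and tensorflow-base had been replaced by the conda versions, yet inside the conda virtual environment the pip one was still the one being used, for reasons I don't know. I uninstalled the pip TensorFlow and tried to use the conda one, but that did not work: import reported that there was no tensorflow module. I did not dig much further and simply uninstalled everything, including the TensorFlow-related packages installed under conda.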
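When two installs coexist like this, it can help to see which file Python would actually import. This is a rough sketch using only the standard library; locate_module is a hypothetical helper name, and the path it prints shows only which site-packages directory wins, not whether pip or conda put the package there.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module is not importable in this interpreter
    return spec.origin


# Run this inside the activated conda environment.
for module_name in ("tensorflow", "keras"):
    print(module_name, "->", locate_module(module_name))
```

A result of None for tensorflow corresponds to the "no tensorflow module" import error described above.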

Then install TensorFlow 1.12.0. Be careful here; you cannot simply run:

conda install tensorflow-gpu

because that installs the latest version by default, 1.13.1. Instead, pin the version:

conda install tensorflow-gpu=1.12
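To double-check that the pin took effect, the installed version string can be compared against the pin. The sketch below approximates how I understand a single = in conda to behave (a prefix match on version components); matches_pin is a name I made up, and the version strings are passed in by hand rather than read from TensorFlow.

```python
def matches_pin(installed: str, pin: str) -> bool:
    """True if `installed` starts with the same dot-separated components as `pin`."""
    installed_parts = installed.split(".")
    pin_parts = pin.split(".")
    return installed_parts[: len(pin_parts)] == pin_parts


# In a real session the first argument would be tensorflow.__version__.
print(matches_pin("1.12.0", "1.12"))
print(matches_pin("1.13.1", "1.12"))
```

Comparing components rather than raw string prefixes avoids treating a version such as 1.120.0 as a match for 1.12.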

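With that installed I ran the demo again. Some of the dependencies from requirements.txt were missing and Keras had to be reinstalled as well; I don't know why uninstalling TensorFlow removed those packages too. So I reinstalled them, using conda where a package was available there and pip or a direct install otherwise.

After that the demo failed with a missing pycocotools._mask module. That puzzled me, since I had not replaced that module. I ran make again and copied the result over, with the same error. Then I noticed that the Python in the conda environment had become 3.6 (it was 3.7 before, matching TensorFlow 1.13.1; switching to TensorFlow 1.12.0 automatically switched it to Python 3.6). I changed python3 in the Makefile to python3.6 (the earlier build had used python3.7), but that still failed with the missing-module error, so I changed it back to python3. Outside conda the system had Python 2.7, 3.6 and 3.7, with python3 defaulting to 3.7, so running make with python3 built against Python 3.7. That still failed, because that version is newer than the Python 3.6 inside conda.

Finally I activated the TensorFlow environment with conda activate tensorflow_3.7 (tensorflow_3.7 is the name of my TensorFlow environment, which I had created earlier) and ran make inside it. The _mask file generated this way is suitable for the conda Python 3.6. I copied it to samples/coco, and running the demo gave: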
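The version mismatch described above can be checked before copying anything: a compiled extension is only importable if its file name ends in one of the suffixes the running interpreter accepts. This is a sketch under assumptions: the file names are illustrative (I have not checked what make produces on every system), and extension_matches_interpreter is a hypothetical helper.

```python
import importlib.machinery
from typing import Optional, Sequence


def extension_matches_interpreter(
    filename: str, module_name: str, suffixes: Optional[Sequence[str]] = None
) -> bool:
    """True if `filename` is a name the interpreter would load for `module_name`."""
    if suffixes is None:
        # Suffixes accepted by the interpreter running this script.
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any(filename == module_name + suffix for suffix in suffixes)


# Illustrative file name for a build made with Python 3.7 on Linux.
built_file = "_mask.cpython-37m-x86_64-linux-gnu.so"
print(extension_matches_interpreter(built_file, "_mask"))
```

Running this with the conda environment's python tells you whether the _mask file in pycocotools was built for that same interpreter.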

Using TensorFlow backend.

Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                93
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001


WARNING:tensorflow:From /home/hhm/anaconda3/envs/tensorflow_3.7/lib/python3.6/site-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
2019-04-13 15:34:51.692344: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-13 15:34:52.387094: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-13 15:34:52.388045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.62GiB
2019-04-13 15:34:52.388099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-13 15:34:59.753714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-13 15:34:59.753792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-04-13 15:34:59.753815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-04-13 15:34:59.754226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
Processing 1 images
image                    shape: (640, 480, 3)         min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
2019-04-13 15:35:11.103215: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:11.189938: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.12GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.401613: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.663684: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.697191: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.766519: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:12.859735: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.253020: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.13GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.627718: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.71GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-04-13 15:35:13.650576: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 845.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

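I don't know exactly what the memory-allocation messages at the end are. It looks like the GPU is short on memory: this is only a 2 GB GTX 1050, which is very little. A card with more memory might not show them, or one could reduce the batch size somewhat (although the configuration printed above already shows BATCH_SIZE 1). Still, the run seems to have succeeded: the errors from before are gone.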
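To see how far the requests exceed what the card offers, the allocator warnings can be parsed and compared with the 1389 MB that TensorFlow reported for the device in the log above. A minimal sketch, assuming the warning wording stays exactly as shown in the log; requested_mib is a name I chose.

```python
import re

# Matches the size in lines such as
# "... ran out of memory trying to allocate 2.05GiB. The caller indicates ..."
_ALLOC_RE = re.compile(r"ran out of memory trying to allocate ([\d.]+)(MiB|GiB)")


def requested_mib(log_text: str) -> list:
    """Return every requested allocation size found in the log, in MiB."""
    sizes = []
    for value, unit in _ALLOC_RE.findall(log_text):
        factor = 1024 if unit == "GiB" else 1
        sizes.append(float(value) * factor)
    return sizes


sample_log = (
    "W bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory "
    "trying to allocate 2.05GiB. The caller indicates that this is not a failure\n"
    "W bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory "
    "trying to allocate 845.38MiB. The caller indicates that this is not a failure\n"
)
sizes = requested_mib(sample_log)
print(sizes, "largest exceeds 1389 MB:", max(sizes) > 1389)
```

The log itself says these requests are not failures, which fits the observation that the demo still completed.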

 

3. An open question

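When conda downloaded TensorFlow 1.12.0 it also automatically downloaded matching CUDA and cuDNN packages, CUDA 9.2 and cuDNN 7.3.1. I did not manually configure CUDA or cuDNN for TensorFlow 1.12.0, and I did not check whether TensorFlow 1.12.0 supports CUDA 10.0. Running the demo this way gives the result above, apparently without problems. Meanwhile the CUDA 10.1 and CUDA 10.0 that I installed manually are unchanged, and nvcc --version still shows CUDA 10.0 in use. This is what I don't understand.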
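One possible explanation, which I have not verified on this machine, is that conda installs its own CUDA runtime libraries inside the environment, while nvcc --version reports the separately installed system toolkit found on PATH. The sketch below looks for a CUDA runtime library inside an environment; it assumes a Linux layout where conda places shared libraries under the environment's lib directory, and find_cuda_runtime_libs is a hypothetical helper.

```python
import glob
import os
import sys
from typing import Optional


def find_cuda_runtime_libs(env_prefix: Optional[str] = None) -> list:
    """List libcudart files under <env_prefix>/lib (defaults to the current env)."""
    prefix = env_prefix or sys.prefix
    pattern = os.path.join(prefix, "lib", "libcudart.so*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# Inside the activated conda environment, a version suffix in the file name
# would indicate which CUDA runtime the environment ships.
print(find_cuda_runtime_libs())
```

If this lists a runtime whose version differs from what nvcc --version prints, the two CUDA installs are simply living side by side.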
