(Repost) Mask_RCNN Translation and Notes, Part 1 (Translated README + Source Code + Usage Notes)

https://blog.csdn.net/wyx100/article/details/80544083

Original: https://github.com/matterport/Mask_RCNN

Corrections are welcome!

Glossary

Mask R-CNN (Mask Region-based Convolutional Neural Network)

RCNNs (region-based convolutional neural networks)

FPN (Feature Pyramid Network)

    Paper: Feature Pyramid Networks for Object Detection

ResNet101, the 101-layer deep residual network used as the backbone here

MS COCO (Microsoft Common Objects in Context), a dataset collected by Microsoft that can be used for image recognition, segmentation, and captioning. Official site: http://mscoco.org/.

px (pixel)

repository (abbreviated repo), a code repository

OSM (OpenStreetMap), an open, wiki-style map. Many maps that people take for granted and assume they can freely reuse are in fact subject to legal and technical restrictions, which keep geographic information such as maps from being reused creatively and efficiently. OpenStreetMap was founded to create and provide geographic data (such as street maps) that anyone is free to use, much as free software grants freedom to its users.

LBP (Local Binary Pattern), an operator that describes the local texture of an image, with notable advantages such as rotation invariance and grayscale invariance. It was first proposed by T. Ojala, M. Pietikäinen, and D. Harwood in 1994 for texture feature extraction, and the features it extracts describe the image's local texture.
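
As a quick illustration of this glossary entry (my addition, not from the original post), LBP features can be computed with scikit-image; the parameter choices below are assumptions:

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, points=8, radius=1):
    """Normalized histogram of uniform LBP codes for a grayscale image."""
    codes = local_binary_pattern(gray_image, P=points, R=radius, method="uniform")
    # The "uniform" method yields points + 2 distinct codes.
    hist, _ = np.histogram(codes, bins=points + 2,
                           range=(0, points + 2), density=True)
    return hist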

HOG (Histogram of Oriented Gradients), a feature descriptor used for object detection in computer vision and image processing. It forms features by computing histograms of gradient orientations over local regions of an image. HOG features combined with an SVM classifier have been applied widely in image recognition, with particular success in pedestrian detection. Note that HOG+SVM pedestrian detection was proposed by Dalal, a researcher in France, at CVPR 2005; although many pedestrian-detection algorithms have been proposed since, most still follow the HOG+SVM approach.
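
Likewise illustrative (not from the original post), scikit-image ships a hog function; the parameter values below are common choices, not prescriptions:

from skimage import color, data
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())  # sample grayscale image
features = hog(image,
               orientations=9,            # gradient-direction bins per cell
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm="L2-Hys")
print(features.shape)                     # flattened descriptor, ready for an SVM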

SPP (Spatial Pyramid Pooling)

SSD (Single Shot MultiBox Detector)

SS (selective search), proposed by [Uijlings J. R. R. in IJCV 2012][1]. Building on [the graph-based image segmentation published by P. F. Felzenszwalb in IJCV 2004][2], it considers similarity of color, texture, and size to output the locations of all likely objects, providing a basis for subsequent object recognition.

Haar-like features were first applied to face representation by Papageorgiou et al.; Viola and Jones built on that work, using three basic types of features in four forms. Haar features are grouped into edge features, line features, center-surround features, and diagonal features, which are combined into feature templates. Each template contains white and black rectangles, and the template's feature value is defined as the sum of the pixels under the white rectangles minus the sum of the pixels under the black rectangles. Haar feature values thus reflect local grayscale variation in the image. For example, some facial traits can be described simply by rectangle features: the eyes are darker than the cheeks, the sides of the nose bridge are darker than the bridge itself, and the mouth is darker than its surroundings. Rectangle features are only sensitive to simple structures such as edges and line segments, however, so they can only describe structures with specific orientations (horizontal, vertical, diagonal).
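
A small sketch (my own addition) of the integral-image trick that makes these white-minus-black rectangle sums cheap to evaluate:

import numpy as np

def integral_image(img):
    """Cumulative sums so any rectangle sum costs four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] via the integral image."""
    total = ii[top + height - 1, left + width - 1]
    if top > 0:
        total -= ii[top - 1, left + width - 1]
    if left > 0:
        total -= ii[top + height - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

# Two-rectangle (edge) Haar feature: white half minus black half.
img = np.random.rand(24, 24)
ii = integral_image(img)
feature = rect_sum(ii, 0, 0, 24, 12) - rect_sum(ii, 0, 12, 24, 12)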

Alfréd Haar, a Hungarian mathematician who made notable contributions to wavelet analysis. Haar functions have been in use since 1910, when he introduced them.

CVPR (IEEE Conference on Computer Vision and Pattern Recognition), the IEEE's top-tier conference in the fields of computer vision and pattern recognition.

IEEE (Institute of Electrical and Electronics Engineers), an international professional association for electrical, electronics, and information-science engineers and currently the world's largest non-profit professional technical society, with more than 400,000 members across over 160 countries. IEEE is devoted to research and development in electrical engineering, electronics, computing, and related sciences, and has issued more than 900 industry standards in fields including aerospace, computing, telecommunications, biomedicine, electric power, and consumer electronics, making it a highly influential international academic organization.

Translated README

Mask R-CNN for Object Detection and Segmentation

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.

The repository includes:

  • Source code of Mask R-CNN built on FPN and ResNet101.
  • Training code for MS COCO
  • Pre-trained weights for MS COCO
  • Jupyter notebooks to visualize the detection pipeline at every step
  • ParallelModel class for multi-GPU training
  • Evaluation on MS COCO metrics (AP)
  • Example of training on your own dataset

The code is documented and designed to be easy to extend. If you use it in your research, please consider referencing this repository. If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.

Getting Started

  • demo.ipynb is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images (a minimal sketch of this flow follows the list).

  • train_shapes.ipynb shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.

  • (model.py, utils.py, config.py): These files contain the main Mask RCNN implementation.

  • inspect_data.ipynb. This notebook visualizes the different pre-processing steps to prepare the training data.

  • inspect_model.ipynb This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.

  • inspect_weights.ipynb This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.
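
For orientation, here is a minimal sketch of the flow inside demo.ipynb, using the repository's mrcnn API; the directory and image paths are assumptions you will need to adapt:

import os
import sys
import skimage.io
import mrcnn.model as modellib

ROOT_DIR = os.path.abspath(".")                        # repo root (assumed)
sys.path.append(os.path.join(ROOT_DIR, "samples/coco"))
import coco                                            # samples/coco/coco.py

MODEL_DIR = os.path.join(ROOT_DIR, "logs")             # where training logs go
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")

class InferenceConfig(coco.CocoConfig):
    # Run detection on one image at a time.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

# Build the model in inference mode and load the COCO weights.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR,
                          config=InferenceConfig())
model.load_weights(COCO_MODEL_PATH, by_name=True)

image = skimage.io.imread("images/your_image.jpg")     # any test image (assumed path)
r = model.detect([image], verbose=1)[0]
# r["rois"], r["masks"], r["class_ids"], r["scores"] hold the detections.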

Step by Step Detection

To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:

1. Anchor sorting and filtering

Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.

2. Bounding Box Refinement

This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
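
Concretely, the refinement shifts each box's center and rescales its size by predicted deltas. The sketch below mirrors the repository's utils.apply_box_deltas; treat it as illustrative:

import numpy as np

def apply_box_deltas(boxes, deltas):
    """boxes: [N, (y1, x1, y2, x2)]; deltas: [N, (dy, dx, log(dh), log(dw))]."""
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the center, then rescale the size.
    center_y += deltas[:, 0] * height
    center_x += deltas[:, 1] * width
    height *= np.exp(deltas[:, 2])
    width *= np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)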

3. Mask Generation

Examples of generated masks. These then get scaled and placed on the image in the right location.
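
A simplified sketch of that scale-and-place step; the repository does this in utils.unmold_mask, and this version assumes a soft low-resolution mask plus a pixel-space box:

import numpy as np
import skimage.transform

def place_mask(small_mask, bbox, image_shape, threshold=0.5):
    """Resize a low-res soft mask (e.g. 28x28) to its box and paste it
    into a full-size binary mask."""
    y1, x1, y2, x2 = bbox
    resized = skimage.transform.resize(small_mask, (y2 - y1, x2 - x1))
    binary = resized >= threshold
    full_mask = np.zeros(image_shape[:2], dtype=bool)
    full_mask[y1:y2, x1:x2] = binary
    return full_mask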

4. Layer activations

Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).
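
inspect_model.ipynb relies on the repository's own helpers for this; a generic Keras equivalent, sketched under the assumption that you already have a built Keras model and a prepared input batch, might look like:

import numpy as np
from keras.models import Model

def layer_activations(keras_model, layer_name, batch):
    """Return one intermediate layer's output for a given input batch."""
    probe = Model(inputs=keras_model.inputs,
                  outputs=keras_model.get_layer(layer_name).output)
    return probe.predict(batch)

def looks_suspicious(act):
    """Flag the failure modes mentioned above: all zeros or NaNs."""
    return np.abs(act).max() < 1e-7 or np.isnan(act).any()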

5. Weight Histograms

Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.
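
A sketch of the idea with matplotlib; this is illustrative, not the notebook's exact code:

import matplotlib.pyplot as plt

def plot_weight_histograms(keras_model, max_layers=9):
    """Histogram the kernel weights of the first few layers that have any."""
    layers = [l for l in keras_model.layers if l.get_weights()][:max_layers]
    fig, axes = plt.subplots(3, 3, figsize=(12, 9))
    for layer, ax in zip(layers, axes.ravel()):
        ax.hist(layer.get_weights()[0].ravel(), bins=50)
        ax.set_title(layer.name, fontsize=8)
    plt.tight_layout()
    plt.show()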

6. Composing the different pieces into a final result

Training on MS COCO

We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in Jupyter notebook (see the provided notebooks for examples) or you can run it directly from the command line (on a Linux system) as such:

# Train a new model starting from pre-trained COCO weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco
 
# Train a new model starting from ImageNet weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet
 
# Continue training a model that you had trained earlier
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
 
# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last

You can also run the COCO evaluation code with:

# Run COCO evaluation on the last trained model
python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last

The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.

Training on Your Own Dataset

Start by reading this blog post about the balloon color splash sample. It covers the process starting from annotating images to training to using the results in a sample application.

In summary, to train the model on your own dataset you'll need to extend two classes:

Config This class contains the default configuration. Subclass it and modify the attributes you need to change.

Dataset This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.

See examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.
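
A skeleton of the two subclasses, modeled on the shapes and balloon samples; the "kitchen" name, the "cup" class, and the list_images helper are hypothetical placeholders:

from mrcnn.config import Config
from mrcnn import utils

class KitchenConfig(Config):
    NAME = "kitchen"
    NUM_CLASSES = 1 + 1      # background + one custom class
    IMAGES_PER_GPU = 2
    STEPS_PER_EPOCH = 100

class KitchenDataset(utils.Dataset):
    def load_kitchen(self, dataset_dir, subset):
        # Register classes and images with the base class.
        self.add_class("kitchen", 1, "cup")
        for i, path in enumerate(list_images(dataset_dir, subset)):  # your own helper
            self.add_image("kitchen", image_id=i, path=path)

    def load_mask(self, image_id):
        # Must return (masks [H, W, N] bool array, class_ids [N] int32 array).
        ...

    def image_reference(self, image_id):
        return self.image_info[image_id]["path"]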

Differences from the Official Paper

This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.

  • Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px. (A simplified resize-and-pad sketch follows this list.)

  • Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation. (A mask-to-box sketch follows this list.)

    To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.

  • Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high, and often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway so we go with that. (A Keras optimizer sketch follows this list.)
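
For the image-resizing bullet above, a simplified resize-and-pad sketch; the repository's utils.resize_image offers more modes, so treat this as illustrative:

import numpy as np
import skimage.transform

def resize_with_padding(image, target=1024):
    """Scale the longest side to `target`, keep the aspect ratio, pad with zeros."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_shape = (round(h * scale), round(w * scale)) + image.shape[2:]
    resized = skimage.transform.resize(image, new_shape,
                                       preserve_range=True).astype(image.dtype)
    top = (target - resized.shape[0]) // 2
    left = (target - resized.shape[1]) // 2
    padded = np.zeros((target, target) + image.shape[2:], dtype=image.dtype)
    padded[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return padded, scale, (top, left)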
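
For the bounding-boxes bullet, the tight box around a mask is a few lines of NumPy; compare the repository's utils.extract_bboxes, of which this is an illustrative simplification:

import numpy as np

def mask_to_bbox(mask):
    """Smallest (y1, x1, y2, x2) box enclosing all True pixels of a boolean mask."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    y1, y2 = np.where(rows)[0][[0, -1]]
    x1, x2 = np.where(cols)[0][[0, -1]]
    return y1, x1, y2 + 1, x2 + 1  # +1 makes the far edge exclusive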
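
For the learning-rate bullet, a small learning rate plus moderate gradient-norm clipping looks roughly like this in Keras; the values are illustrative, though close to the repository's defaults:

from keras.optimizers import SGD

# Lower learning rate than the paper's 0.02, with gradient clipping
# to keep weights from exploding at small batch sizes.
optimizer = SGD(lr=0.001, momentum=0.9, clipnorm=5.0)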

Contributing

Contributions to this repository are welcome. Examples of things you can contribute:

  • Speed Improvements. Like re-writing some Python code in TensorFlow or Cython.
  • Training on other datasets.
  • Accuracy Improvements.
  • Visualizations and examples.

You can also join our team and help us build even more projects like this one.

Requirements

Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt.

MS COCO Requirements:

To train or test on MS COCO, you'll also need:

  • pycocotools (installation instructions below)
  • MS COCO Dataset
  • Download the 5K minival and the 35K validation-minus-minival subsets. More details in the original Faster R-CNN implementation.

If you use Docker, the code has been verified to work on this Docker container.

Installation

1. Install dependencies

pip3 install -r requirements.txt

2. Clone this repository
3. Run setup from the repository root directory

python3 setup.py install

4. Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.

5. (Optional) To train or test on MS COCO install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).

1. Linux: https://github.com/waleedka/coco
2. Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details).

Projects Using this Model

If you extend this model to other datasets or build projects that use it, we'd love to hear from you.

4K Video Demo by Karol Majek.

Images to OSM: Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.

Splash of Color. A blog post explaining how to train this model from scratch and use it to implement a color splash effect.

Segmenting Nuclei in Microscopy Images. Built for the 2018 Data Science Bowl.

Code is in the samples/nucleus directory.

Mapping Challenge: Convert satellite imagery to maps for use by humanitarian organisations.

 

Usage Notes

Note: use a release version for these exercises; otherwise you will download code the author is actively changing (code edits, weight-file changes, mismatched libraries) and the walkthrough will not work.

1. Choosing a version

Use Mask R-CNN 2.1 from https://github.com/matterport/Mask_RCNN/releases (download the Source code archive).

2. Choosing a weights file

mask_rcnn_coco.h5 is available under the Mask R-CNN 2.0 release.

3. Where to put the weights and source code

Put mask_rcnn_coco.h5 into the Mask R-CNN 2.1 folder in Jupyter.

4. Which file to run

Open demo.ipynb.

5. Running it in Jupyter

Open demo.ipynb and click the Run button (or press Ctrl+Enter) to execute the code.

Recommended reading:

Faster R-CNN paper notes: a consolidated summary

Face detection with Faster R-CNN

TensorFlow notes: analyzing multi-layer LSTM code

https://blog.csdn.net/jiongnima/article/details/79094159

The instance segmentation model Mask R-CNN explained: from R-CNN and Fast R-CNN through Faster R-CNN to Mask R-CNN

RCNN, fast RCNN, faster RCNN, mask RCNN

Training Mask RCNN on your own dataset (converting the Jupyter notebook to .py)

A technical analysis of Mask-RCNN

Training your own dataset with the Keras version of Mask-RCNN: a very simple recipe

Mask R-CNN with TensorFlow/Keras: setup, code walkthrough, and a demo of training your own dataset

Training Mask_RCNN on your own data: building the COCO-style JSON training set it needs

Using labelImg to build your own dataset (VOC2007 format) for Faster-RCNN training

Systematic deep learning study (44): Mask R-CNN

Reading notes on R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, YOLO, and SSD

 

 

To be continued...
