Paper title: "ChangeNet: A Deep Learning Architecture for Visual Change Detection"
Citation: Ashley Varghese, Jayavardhana Gubbi, Akshaya Ramaswamy, and Balamuralidhar P. "ChangeNet: A Deep Learning Architecture for Visual Change Detection." European Conference on Computer Vision (ECCV), 2018.
The increasing urban population in cities necessitates the need for the development of smart cities that can offer better services to its citizens. Drone technology plays a crucial role in the smart city environment and is already involved in a number of functions in smart cities such as traffic control and construction monitoring. A major challenge in fast growing cities is the encroachment of public spaces. A robotic solution using visual change detection can be used for such purposes. For the detection of encroachment, a drone can monitor outdoor urban areas over a period of time to infer the visual changes. Visual change detection is a higher level inference task that aims at accurately identifying variations between a reference image (historical) and a new test image depicting the current scenario. In case of images, the challenges are complex considering the variations caused by environmental conditions that are actually unchanged events. Human mind interprets the change by comparing the current status with historical data at intelligence level rather than using only visual information. In this paper, we present a deep architecture called ChangeNet for detecting changes between pairs of images and express the same semantically (label the change). A parallel deep convolutional neural network (CNN) architecture for localizing and identifying the changes between image pair has been proposed in this paper. The architecture is evaluated with VL-CMU-CD street view change detection, TSUNAMI and Google Street View (GSV) datasets that resemble drone captured images. The performance of the model for different lighting and seasonal conditions are experimented quantitatively and qualitatively. The result shows that ChangeNet outperforms the state of the art by achieving 98.3% pixel accuracy, 77.35% object based Intersection over Union (IoU) and 88.9% area under Receiver Operating Characteristics (RoC) curve.
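The authors open with smart cities and note that cities change very quickly. To detect such changes automatically they propose ChangeNet, a parallel CNN architecture that detects changes between a pair of images. The experiments use three datasets: VL-CMU-CD, TSUNAMI, and Google Street View (GSV). The results show that ChangeNet reaches 98.3% pixel accuracy, 77.35% object-based IoU, and 88.9% area under the ROC curve.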
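To make the reported metrics concrete, here is a minimal sketch of pixel accuracy and IoU on a toy change mask. The helper names `pixel_accuracy` and `class_iou` and the mask values are made up for illustration, and the IoU here is a plain pixel-level IoU for one class, which may differ from the object-based IoU the paper reports.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = 0
    correct = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            correct += int(p == t)
    return correct / total


def class_iou(pred, target, cls):
    """Pixel-level Intersection over Union for a single class label."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            intersection += int(in_pred and in_target)
            union += int(in_pred or in_target)
    # If the class appears in neither mask the IoU is undefined; return None.
    return intersection / union if union else None


# Toy 3x4 masks: 0 = no change, 1 = change (hypothetical values).
target = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

acc = pixel_accuracy(pred, target)    # 10 of the 12 pixels agree
iou = class_iou(pred, target, cls=1)  # 3 shared change pixels out of 5 in the union
print(acc, iou)
```

On imbalanced masks, where most pixels are unchanged, pixel accuracy can look high even when the changed region is poorly localized, which is one reason to read it together with IoU.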
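The architecture is essentially a Siamese network: an upper and a lower CNN with shared weights. The biggest difference between the proposed network and a standard Siamese network is that the weights and the deconvolution layers are not tied, which reportedly improves the model by about 5%.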
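As an illustration of what "tied" and "not tied" mean, the toy sketch below runs the same input through two branches, first with one shared parameter set and then with separate ones. The per-pixel linear layer, the shapes, and the helper names `make_branch_params` and `extract_features` are assumptions for this example and are not the ResNet features or the layers used in the paper.

```python
import numpy as np


def make_branch_params(rng, in_dim, out_dim):
    """Random weights standing in for one branch's learned parameters."""
    return {"w": rng.normal(size=(in_dim, out_dim)), "b": np.zeros(out_dim)}


def extract_features(image, params):
    """Toy per-pixel feature extractor: linear map followed by ReLU."""
    return np.maximum(image @ params["w"] + params["b"], 0.0)


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))  # the same toy "image" is fed to both branches

# Tied (classic Siamese): both branches read the very same parameters.
shared = make_branch_params(rng, 3, 16)
tied_a = extract_features(image, shared)
tied_b = extract_features(image, shared)

# Untied: identical layer shapes, but each branch owns its parameters.
params_a = make_branch_params(rng, 3, 16)
params_b = make_branch_params(rng, 3, 16)
untied_a = extract_features(image, params_a)
untied_b = extract_features(image, params_b)

print(np.allclose(tied_a, tied_b))      # tied branches agree on identical inputs
print(np.allclose(untied_a, untied_b))  # untied branches generally do not
```

Untying doubles the number of parameters in the affected layers, so each branch can specialize at the cost of a larger model.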
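Finally, to obtain the mask, the feature maps have to be upsampled to the same spatial size as the input. Here the authors upsample with bilinear interpolation instead of a learned convolution ("Upsampling is done with bilinear interpolation filter"). The outputs of the two parallel networks are then concatenated, and the concatenated output is passed to a softmax for classification.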
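The upsample, concatenate, and softmax steps described above can be sketched in plain NumPy. This is a sketch under stated assumptions, not the authors' implementation: the feature-map shapes, the align-corners interpolation convention, the number of classes, and the random per-pixel projection `w` that maps the concatenated features to class logits are all hypothetical.

```python
import numpy as np


def bilinear_upsample(feat, out_h, out_w):
    """Bilinearly resize an (H, W, C) feature map to (out_h, out_w, C).

    Uses the align-corners convention (input and output corner pixels
    coincide), which is an assumption of this sketch.
    """
    in_h, in_w, _ = feat.shape
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights, shape (out_h, 1, 1)
    wx = (xs - x0)[None, :, None]  # horizontal blend weights, shape (1, out_w, 1)
    top = feat[y0][:, x0] * (1 - wx) + feat[y0][:, x1] * wx
    bottom = feat[y1][:, x0] * (1 - wx) + feat[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
num_classes = 3  # hypothetical number of change labels
feat_before = rng.normal(size=(4, 4, 8))  # stand-in for reference-image features
feat_after = rng.normal(size=(4, 4, 8))   # stand-in for test-image features

# Upsample both branches to the (assumed) 16x16 input resolution.
up_before = bilinear_upsample(feat_before, 16, 16)
up_after = bilinear_upsample(feat_after, 16, 16)

# Concatenate the two branches along the channel axis.
merged = np.concatenate([up_before, up_after], axis=-1)

# Per-pixel linear projection to class logits (random weights, assumption).
w = rng.normal(size=(merged.shape[-1], num_classes))
probs = softmax(merged @ w)
change_map = probs.argmax(axis=-1)  # per-pixel predicted label
print(merged.shape, probs.shape, change_map.shape)
```

Because bilinear interpolation has no trainable parameters, only the feature extractors and the final projection would be learned in a setup like this one.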
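# The code below is based on TensorFlow and only sketches the idea. A real implementation needs, besides train/val/test handling, placeholders, a loss, an optimizer, and so on, all of which have to be set up.
### Import the required libraries
# Note: tensorflow.contrib is only available in TensorFlow 1.x
import tensorflow as tf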
from tensorflow.contrib import slim
from tensorflow.contrib.slim.nets import resnet_v2
from tensorflow.contrib.slim.python.slim.nets.resnet_utils import resnet_arg_scope
### Load the data
before_info = ........
after_info = ........
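### Extract features from both inputs with ResNet
with slim.arg_scope(resnet_arg_scope()):
    net1, end_points1 = resnet_v2.resnet_v2_50(before_info)
    # reuse=True makes the second branch share the first branch's ResNet weights
    net2, end_points2 = resnet_v2.resnet_v2_50(after_info, reuse=True)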
### Take the features from different layers
b1 = end_points1[.......]
b2 = end_points1[.......]
a1 = end_points2[.......]
a2 = end_points2[.......]
### Then upsample with a transposed convolution (deconvolution)
w = tf.constant(1.0, shape=[.......])
b1_mask = tf.nn.conv2d_transpose(......)
b2_mask = tf.nn.conv2d_transpose(......)
a1_mask = tf.nn.conv2d_transpose(......)
a2_mask = tf.nn.conv2d_transpose(......)
### Then concatenate the results with tf.concat(....)
### Finally, apply a softmax