Source code:
https://github.com/xdever/RFCN-tensorflow
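Overview of the architecture:
The k^2(C+1) convolution: in the R-FCN paper, the last output of ResNet-101 is W×H×1024. Convolving it with k^2(C+1) 1×1 kernels gives k^2(C+1) position-sensitive score maps of size W×H, and this convolution is the prediction step. Here k = 3, so each ROI is divided into a 3×3 grid whose nine positions are: top-left, top-center, top-right, middle-left, center, middle-right, bottom-left, bottom-center and bottom-right.
Physical meaning of the k^2(C+1) feature maps: there are k×k = 9 colors in the figure. Each colored block (W×H×(C+1)) scores the presence of an object of each class at one relative position. The first, yellow block corresponds to the top-left position and the last, light-blue block to the bottom-right position.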
Pooling formula:
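For reference, the R-FCN paper writes position-sensitive pooling for bin (i, j) and class c as an average over the pixels of that bin. Here (x_0, y_0) is the top-left corner of the ROI, n is the number of pixels in the bin and Θ denotes the network parameters. The k^2 bin scores then vote by averaging, which the roiMean function later in this post does with tf.reduce_mean. A softmax over the C+1 classes gives the final probabilities:

$$
r_c(i,j \mid \Theta) = \frac{1}{n} \sum_{(x,y) \in \mathrm{bin}(i,j)} z_{i,j,c}(x + x_0,\; y + y_0 \mid \Theta)
$$

$$
r_c(\Theta) = \frac{1}{k^2} \sum_{i,j} r_c(i,j \mid \Theta), \qquad s_c(\Theta) = \frac{e^{r_c(\Theta)}}{\sum_{c'=0}^{C} e^{r_{c'}(\Theta)}}
$$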
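z(i,j,c) is the c-th map of block number i+k*(j-1) (1 <= i,j <= 3). (i,j) selects one of the nine positions, for example the top-left one (i=j=1), and c selects the class, for example person. Suppose a pixel at position (x,y) of the map z(i,j,c) has value v. Then v scores how likely it is that the corresponding location (x,y) of the original image belongs to a person (c='person'), and more specifically to the top-left part of that person (i=j=1).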
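The sketch below makes this indexing concrete. It maps z(i, j, c) to a channel index of the k^2(C+1)-channel tensor. It assumes the following:

- the C+1 maps of each block are stored contiguously, in the block order listed above;
- c is a 0-based class id, with 0 meaning background (the convention the code uses later, where foreground classes are > 0).

The function name ps_channel is hypothetical. The actual layout inside the repository's positionSensitiveRoiPooling may differ.

```python
def ps_channel(i, j, c, k=3, num_classes=80):
    """Map z(i, j, c) to a 0-based channel index.

    i: column of the bin (1..k), j: row of the bin (1..k), so that block
       number i + k*(j-1) follows the order top-left, top-center, ...
    c: 0-based class id in 0..num_classes, 0 = background.
    ASSUMPTION: each block's (num_classes + 1) maps are contiguous.
    """
    if not (1 <= i <= k and 1 <= j <= k):
        raise ValueError("bin indices must lie in 1..k")
    if not (0 <= c <= num_classes):
        raise ValueError("class id out of range")
    block = i + k * (j - 1)                  # 1-based block number, 1..k*k
    return (block - 1) * (num_classes + 1) + c


# With k = 3 and the 80 COCO classes there are 3*3*81 = 729 channels.
print(ps_channel(1, 1, 0))    # top-left bin, background -> 0
print(ps_channel(2, 1, 0))    # top-center bin starts at -> 81
print(ps_channel(3, 3, 80))   # bottom-right bin, last class -> 728
```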
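- Input and output of ROI pooling: for C+1 classes, the input is the part of the k^2(C+1)×W'×H' score maps that belongs to one ROI. W' and H' are the ROI's width and height, and the ROI is cropped on the feature map using the box predicted by the RPN. A new block is then assembled from it: each colored block of C+1 maps contributes only the bin at its own position, and the k×k bins together form a new block of size (C+1)×W'×H'. For example, in the figure the first (yellow) block contributes only its top-left bin, and the last (light-blue) block only its bottom-right bin. After reassembly, all bins form the thin block on the right of the figure. That block is already the pooled output, where each bin of each slice is a single pixel; before pooling, each bin covers a region of several pixels. The output of ROI pooling is therefore a (C+1)×k×k block.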
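The crop-and-reassemble step can be sketched in plain Python. This is a minimal illustration under simplifying assumptions, not the repository's positionSensitiveRoiPooling:

- the ROI is given in integer feature-map coordinates;
- bins are formed with the floor/ceil rule from the R-FCN paper;
- each bin is average-pooled from its own block only;
- the k×k pooled scores are then averaged (voted) into C+1 class scores.

The function name and the map layout (blocks in the order above, C+1 contiguous maps per block) are assumptions.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Position-sensitive ROI pooling followed by average voting.

    score_maps: k*k*(num_classes+1) maps, each a list of rows (H x W),
                stored block by block (one block per bin position).
    roi: (x0, y0, x1, y1) in feature-map pixels, x1 and y1 exclusive.
    Returns (pooled, votes): pooled[c][row][col] is the mean of bin
    (row, col) read from that bin's own block; votes[c] averages the
    k*k bins of class c.
    """
    x0, y0, x1, y1 = roi
    w, h = x1 - x0, y1 - y0
    n_cls = num_classes + 1
    pooled = [[[0.0] * k for _ in range(k)] for _ in range(n_cls)]
    for row in range(k):
        ys = range(y0 + math.floor(row * h / k), y0 + math.ceil((row + 1) * h / k))
        for col in range(k):
            xs = range(x0 + math.floor(col * w / k), x0 + math.ceil((col + 1) * w / k))
            block = row * k + col            # this bin reads only its own block
            for c in range(n_cls):
                fmap = score_maps[block * n_cls + c]
                vals = [fmap[y][x] for y in ys for x in xs]
                pooled[c][row][col] = sum(vals) / len(vals) if vals else 0.0
    votes = [sum(sum(r) for r in pooled[c]) / (k * k) for c in range(n_cls)]
    return pooled, votes


# Toy example: k = 3, one foreground class, 6x6 maps whose value encodes
# the block and the class (block * 10 + c), so the output shows which
# block each bin was read from.
k, num_classes = 3, 1
maps = [[[block * 10 + c] * 6 for _ in range(6)]
        for block in range(k * k) for c in range(num_classes + 1)]
pooled, votes = ps_roi_pool(maps, (0, 0, 6, 6), k, num_classes)
print(pooled[1][0][0], pooled[1][2][2])   # 1.0 81.0
print(votes)                               # [40.0, 41.0]
```

The top-left bin of class 1 comes from block 0 and the bottom-right bin from block 8, which is exactly the "one bin per colored block" reassembly described above.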
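That concludes the introduction; now on to the code.
Note that this is an unofficial implementation. I read it only to learn the R-FCN detection pipeline and have not run or debugged it myself.
Code layout
After searching many blog posts on Google, I found that R-FCN does not get much attention, so I decided to dig into this code.
- main.py is the training script; modify it if you want to train the model.
- testCheckpoint.py only checks the shapes of the parameters stored in a ckpt file. It is independent of the model.
- test.py runs the detection model on an input.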
test.py
Most detection code follows the same overall flow, so let's start with test.py.
parser = argparse.ArgumentParser(description="RFCN tester")
parser.add_argument('-gpu', type=str, default="0", help='Train on this GPU(s)')
parser.add_argument('-n', type=str, help='Network checkpoint file')
parser.add_argument('-i', type=str, help='Input file.')
parser.add_argument('-o', type=str, default="", help='Write output here.')
parser.add_argument('-p', type=int, default=1, help='Show preview')
parser.add_argument('-threshold', type=float, default=0.5, help='Detection threshold')
parser.add_argument('-delay', type=int, default=-1, help='Delay between frames in visualization. -1 for automatic, 0 for wait for keypress.')
The command-line options are simple.
palette = Visualize.Palette(len(categories))
image = tf.placeholder(tf.float32, [None, None, None, 3])
net = BoxInceptionResnet(image, len(categories), name="boxnet")
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
input = PreviewIO.PreviewInput(opt.i)
output = PreviewIO.PreviewOutput(opt.o, input.getFps())
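The R-FCN model is built with BoxInceptionResnet.
PreviewIO handles reading the input and writing the output.
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
net.getBoxes directly returns the output tensors of the detector (boxes, scores and classes).
Here is just the main structure of test.py: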
with tf.Session() as sess:
    .........
    # Load the checkpoint
    if not CheckpointLoader.loadCheckpoint(sess, None, opt.n, ignoreVarsInFileNotInSess=True):
        ..........
    # Preprocess the image
    img = preprocessInput(img)
    ...........
    # Run the detector on the image (np.expand_dims adds the batch dimension)
    rBoxes, rScores, rClasses = sess.run([boxes, scores, classes], feed_dict={image: np.expand_dims(img, 0)})
    ...........
    # Visualize the detections
    res = Visualize.drawBoxes(img, rBoxes, rClasses, [categories[i] for i in rClasses.tolist()], palette, scores=rScores)
main.py
main.py is the training script. To keep it short, only the important parts of the training process are shown below, with the focus on the loss.
# Data loading
dataset = BoxLoader()
dataset.add(CocoDataset(opt.dataset, randomZoom=opt.randZoom==1, set="train"+opt.cocoVariant))
if opt.mergeValidationSet==1:
    dataset.add(CocoDataset(opt.dataset, set="val"+opt.cocoVariant))
.....................
# Get the (augmented) images with their ground-truth boxes and classes
images, boxes, classes = Augment.augment(*dataset.get())
# Build the detection model
net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)
# Register the model loss
tf.losses.add_loss(net.getLoss(boxes, classes))
# Build the update op (Adam with element-wise gradient clipping)
def createUpdateOp(gradClip=1):
    with tf.name_scope("optimizer"):
        optimizer=tf.train.AdamOptimizer(learning_rate=opt.learningRate, epsilon=opt.adamEps)
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        totalLoss = tf.losses.get_total_loss()
        grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
        if gradClip is not None:
            cGrads = []
            for g, v in grads:
                if g is None:
                    print("WARNING: no grad for variable "+v.op.name)
                    continue
                cGrads.append((tf.clip_by_value(g, -float(gradClip), float(gradClip)), v))
            grads = cGrads
        update_ops.append(optimizer.apply_gradients(grads))
        return control_flow_ops.with_dependencies([tf.group(*update_ops)], totalLoss, name='train_op')
# Create the training op
trainOp=createUpdateOp()
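The clipping loop inside createUpdateOp can be mirrored in plain Python to show what it does:

- variables without a gradient are skipped with a warning;
- every remaining gradient is clipped element-wise into [-gradClip, gradClip].

This is tf.clip_by_value, not norm-based clipping such as tf.clip_by_global_norm. Here gradients are modelled as lists of floats and variables as names. clip_gradients is a hypothetical stand-in, not part of the repository.

```python
def clip_gradients(grads_and_vars, grad_clip=1.0):
    """Skip (None, var) pairs and clip each gradient element to [-grad_clip, grad_clip]."""
    clipped = []
    for g, v in grads_and_vars:
        if g is None:
            # Same behaviour as createUpdateOp: warn and leave this variable out.
            print("WARNING: no grad for variable " + v)
            continue
        clipped.append(([max(-grad_clip, min(grad_clip, x)) for x in g], v))
    return clipped


result = clip_gradients([([0.5, -3.0, 2.0], "boxnet/Box/w"), (None, "boxnet/unused")])
print(result)   # [([0.5, -1.0, 1.0], 'boxnet/Box/w')]
```

apply_gradients then only receives the clipped pairs, so a skipped variable is simply not updated.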
main.py contains an infinite while True loop, and training is managed with a RunManager:
runManager = RunManager(sess, options=runOptions, run_metadata=runMetadata)
runManager.add("train", [globalStepInc,trainOp], modRun=1)
Optimization runs inside the while loop:
......
# Training
while True:
    #run various parts of the network
    res = runManager.modRun(i)
    .....
    # Visualize the training results
    visualizer.draw(res)
Next, let's look at the detection model, BoxInceptionResnet.
test.py uses its getBoxes method:
net = BoxInceptionResnet(image, len(categories), name="boxnet")
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
main.py uses its getLoss and getVariables methods:
net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)
tf.losses.add_loss(net.getLoss(boxes, classes))
......
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
So these are the functions to focus on when reading the source.
BoxInceptionResnet turns out to be a subclass of BoxNetwork.
BoxNetwork.py
# BoxNetwork.py
class BoxNetwork:
    def __init__(self, nCategories, rpnLayer, rpnDownscale, rpnOffset, featureLayer=None, featureDownsample=None, featureOffset=None, weightDecay=1e-6, hardMining=True):
        '''
        Called from BoxInceptionResnet as:
            featureInput = slim.conv2d(net, 1536, 1)
            BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
        nCategories: number of object categories to detect
        rpnLayer: feature map fed to the RPN (rpnInput in the call above)
        rpnDownscale: downsampling factor of the RPN feature map relative to the image
        rpnOffset: offset of the RPN feature grid, in image pixels
        featureLayer: encoded feature map used for classification and box refinement (featureInput)
        featureDownsample: downsampling factor of featureLayer
        featureOffset: offset of the featureLayer grid, in image pixels
        '''
        if featureLayer is None:
            featureLayer=rpnLayer
        if featureDownsample is None:
            featureDownsample=rpnDownscale
        if featureOffset is None:
            # fall back to the RPN offset (upstream has this assignment reversed)
            featureOffset=rpnOffset
        with tf.name_scope("BoxNetwork"):
            self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
            self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)
            self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)

    # Keep only the proposals whose score exceeds the threshold
    def getProposals(self, threshold=None):
        if threshold is not None and threshold>0:
            s = tf.cast(tf.where(self.proposalScores > threshold), tf.int32)
            return tf.gather_nd(self.proposals, s), tf.gather_nd(self.proposalScores, s)
        else:
            return self.proposals, self.proposalScores

    # Use boxRefiner to predict classes and refined box coordinates on the feature map
    def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
        return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)

    # Given the ground-truth boxes and classes, compute the total RPN + detection loss
    def getLoss(self, refBoxes, refClasses):
        return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
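Read the comments carefully. Since this is the base class of the R-FCN model, the code here is fairly simple.
In short, feature maps from InceptionResnetV2 are fed to the RPN. After a few more conv2d operations, the features are also processed by BoxRefinementNetwork.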
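getProposals uses a pattern that recurs throughout this code:

- tf.where(condition) returns the indices where the condition holds (shape [N, 1] for a 1-D condition);
- tf.gather_nd then picks the matching rows.

Below is a plain-Python equivalent. Boxes are tuples, and get_proposals is a hypothetical name.

```python
def get_proposals(proposals, scores, threshold=None):
    """Plain-Python mirror of BoxNetwork.getProposals."""
    if threshold is not None and threshold > 0:
        # tf.where(scores > threshold): indices of the proposals to keep
        keep = [idx for idx, s in enumerate(scores) if s > threshold]
        # tf.gather_nd(...): select those rows from both tensors
        return [proposals[idx] for idx in keep], [scores[idx] for idx in keep]
    return proposals, scores


boxes = [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]
scores = [0.9, 0.4, 0.75]
print(get_proposals(boxes, scores, threshold=0.5))
# ([(0, 0, 10, 10), (30, 30, 40, 40)], [0.9, 0.75])
```

The same index-then-gather idea appears again in BoxRefinementNetwork.getBoxes and in the loss, where MultiGather.gather applies one index list to several tensors at once.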
Note in particular:
def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
    return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)
In test.py the outputs are obtained with
boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)
So the RPN generates boxes over the whole feature map. self.boxRefiner.getBoxes then extracts the corresponding ROIs on the feature map and produces boxes, scores and classes.
BoxInceptionResnet.py
# BoxInceptionResnet.py
class BoxInceptionResnet(BoxNetwork):
    LAYER_NAMES = ['Conv2d_1a_3x3','Conv2d_2a_3x3','Conv2d_2b_3x3','MaxPool_3a_3x3','Conv2d_3b_1x1','Conv2d_4a_3x3',
                   'MaxPool_5a_3x3','Mixed_5b','Repeat','Mixed_6a','Repeat_1','Mixed_7a','Repeat_2','Block8','Conv2d_7b_1x1']

    def __init__(self, inputs, nCategories, name="BoxNetwork", weightDecay=0.00004, freezeBatchNorm=False, reuse=False, isTraining=True, trainFrom=None, hardMining=True):
        self.boxThreshold = 0.5
        # trainFrom may be given as a layer index or as a layer name
        try:
            trainFrom = int(trainFrom)
        except (TypeError, ValueError):
            pass
        if isinstance(trainFrom, int):
            trainFrom = self.LAYER_NAMES[trainFrom]
        print("Training network from "+(trainFrom if trainFrom is not None else "end"))
        with tf.variable_scope(name, reuse=reuse) as scope:
            # Build the backbone feature extractor
            self.googleNet = InceptionResnetV2("features", inputs, trainFrom=trainFrom, freezeBatchNorm=freezeBatchNorm)
            self.scope=scope
            with tf.variable_scope("Box"):
                #Repeat_1 - last 1/16 layer, Mixed_6a - first 1/16 layer
                # Feature map of Repeat_1 (1/16 scale), cropped by one pixel on each border
                scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
                # Alternative: feature map of Mixed_6a
                #scale_16 = self.googleNet.getOutput("Mixed_6a")[:,1:-1,1:-1,:]
                scale_32 = self.googleNet.getOutput("PrePool")
                with slim.arg_scope([slim.conv2d],
                                    weights_regularizer=slim.l2_regularizer(weightDecay),
                                    biases_regularizer=slim.l2_regularizer(weightDecay),
                                    padding='SAME',
                                    activation_fn = tf.nn.relu):
                    # Merge the Repeat_1 and PrePool feature maps (PrePool is resized to the 1/16 grid)
                    # From here net splits into two outputs: rpnInput and featureInput
                    net = tf.concat([ tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)
                    rpnInput = slim.conv2d(net, 1024, 1)
                    #BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)
                    # featureInput feeds BoxRefinementNetwork
                    featureInput = slim.conv2d(net, 1536, 1)
                    BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)

    def getVariables(self, includeFeatures=False):
        if includeFeatures:
            return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name)
        else:
            vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name+"/Box/")
            vars += self.googleNet.getTrainableVars()
            print("Training variables: ", [v.op.name for v in vars])
            return vars

    def importWeights(self, sess, filename):
        self.googleNet.importWeights(sess, filename, includeTraining=True)
Note the comments above:
net splits into two branches, rpnInput and featureInput.
# Merge the Repeat_1 and PrePool feature maps (PrePool is resized to the 1/16 grid)
# From here net splits into two outputs: rpnInput and featureInput
net = tf.concat([ tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)
rpnInput = slim.conv2d(net, 1024, 1)
# BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)
# featureInput feeds BoxRefinementNetwork
featureInput = slim.conv2d(net, 1536, 1)
BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
[Figure: net splitting into the rpnInput and featureInput branches]
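A quick bit of arithmetic shows the sizes involved in the split. The channel counts are assumptions taken from the standard slim Inception-ResNet-v2:

- Repeat_1 outputs 1088 channels;
- the final Conv2d_7b_1x1 layer outputs 1536 channels. This is presumably what this repository calls PrePool.

The helper name is hypothetical. The concat stacks channels after the 1/32 map has been resized to the 1/16 grid. Each 1x1 convolution has in×out weights plus out biases.

```python
def fusion_sizes(c_scale16=1088, c_scale32=1536, rpn_out=1024, feat_out=1536):
    """Channel and parameter counts for the rpnInput / featureInput branches.

    ASSUMPTION: channel counts of slim's Inception-ResNet-v2 and biases
    enabled on both 1x1 convolutions.
    """
    concat = c_scale32 + c_scale16               # tf.concat(..., 3) adds channels
    rpn_params = concat * rpn_out + rpn_out       # slim.conv2d(net, 1024, 1)
    feat_params = concat * feat_out + feat_out    # slim.conv2d(net, 1536, 1)
    return concat, rpn_params, feat_params


print(fusion_sizes())   # (2624, 2688000, 4032000)
```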
The two key components here are the RPN and BoxRefinementNetwork.
So let's look at how the RPN is computed next.
RPN.py
To keep the RPN from getting too complicated, we only look at the functions that are actually called.
BoxNetwork.py uses the RPN as follows:
self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
......
self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)
Only rpn.getPositiveOutputs is used from outside.
# RPN.py
class RPN:
    def __init__(self, input, anchors=None, immediateSize=512, weightDecay=1e-5, inputDownscale=16, offset=[32,32]):
        self.input = input
        self.anchors = anchors
        self.inputDownscale = inputDownscale
        self.offset = offset
        self.anchors = anchors if anchors is not None else self.makeAnchors([64,128,256,512])
        print("Anchors: ", self.anchors)
        self.tfAnchors = tf.constant(self.anchors, dtype=tf.float32)
        self.hA=tf.reshape(self.tfAnchors[:,0],[-1])
        self.wA=tf.reshape(self.tfAnchors[:,1],[-1])
        self.nAnchors = len(self.anchors)
        self.positiveIouThreshold=0.7
        self.negativeIouThreshold=0.3
        self.regressionWeight=1.0
        self.nBoxLosses=256
        self.nPositiveLosses=128
        #dimensions
        with tf.name_scope('dimension_info'):
            s = tf.shape(self.input)
            self.hIn = s[1]
            self.wIn = s[2]
            self.imageH = tf.cast(self.hIn*self.inputDownscale+self.offset[0]*2, tf.float32)
            self.imageW = tf.cast(self.wIn*self.inputDownscale+self.offset[1]*2, tf.float32)
        self.define(immediateSize, weightDecay)

    def define(self, immediateSize, weightDecay):
        with tf.name_scope('RPN'):
            with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(weightDecay), padding='SAME'):
                #box prediction layers
                with tf.name_scope('NN'):
                    net = slim.conv2d(self.input, immediateSize, 3, activation_fn=tf.nn.relu)
                    scores = slim.conv2d(net, 2*self.nAnchors, 1, activation_fn=None)
                    boxRelativeCoordinates = slim.conv2d(net, 4*self.nAnchors, 1, activation_fn=None)
                #split coordinates
                x_raw, y_raw, w_raw, h_raw = tf.split(boxRelativeCoordinates, 4, axis=3)
                #Save raw box sizes for loss
                self.rawSizes = BoxUtils.mergeBoxData([w_raw, h_raw])
                #Convert NN outputs to BBox coordinates
                self.boxes = BoxUtils.nnToImageBoxes(x_raw, y_raw, w_raw, h_raw, self.wA, self.hA, self.inputDownscale, self.offset)
                #store the size of every box
                with tf.name_scope('box_sizes'):
                    boxSizes = tf.reshape(self.tfAnchors, [1,1,1,-1,2])
                    boxSizes = tf.tile(boxSizes, tf.stack([1,self.hIn,self.wIn,1,1]))
                    self.boxSizes = tf.reshape(boxSizes, [-1,2])
                #scores
                self.scores = tf.reshape(scores, [-1,2])
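define produces self.boxes and self.scores, following the same principle as the RPN in Faster R-CNN. Both are computed from rpnInput:
- self.boxes holds the box coordinates predicted for every anchor at every position of rpnInput;
- self.scores holds, for each of those boxes, the two logits of a binary classifier that decides whether the box contains an object.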
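The geometry behind these outputs can be sketched as follows:

- image_size reproduces the formula from RPN.__init__ (imageH = hIn*inputDownscale + 2*offset[0]);
- anchor_center and decode_box are assumptions. decode_box uses the usual Faster R-CNN parameterisation (the centre is shifted by t times the anchor size, and the size is scaled by exp(t)). The code shown here does not confirm that BoxUtils.nnToImageBoxes follows it exactly.

The (y0, x0, y1, x1) box order is also an assumption.

```python
import math


def image_size(h_in, w_in, downscale=16, offset=(32, 32)):
    """Image size implied by a feature map, as computed in RPN.__init__."""
    return h_in * downscale + 2 * offset[0], w_in * downscale + 2 * offset[1]


def anchor_center(y, x, downscale=16, offset=(32, 32)):
    """ASSUMPTION: feature cell (y, x) is centred at offset + index * downscale."""
    return offset[0] + y * downscale, offset[1] + x * downscale


def decode_box(cy_a, cx_a, h_a, w_a, ty, tx, th, tw):
    """ASSUMPTION: standard Faster R-CNN decoding, not read from BoxUtils.

    Returns the box as (y0, x0, y1, x1) in image pixels.
    """
    cy = cy_a + ty * h_a          # shift the centre by a fraction of the anchor size
    cx = cx_a + tx * w_a
    h = h_a * math.exp(th)        # scale the anchor size
    w = w_a * math.exp(tw)
    return cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2


print(image_size(38, 50))                 # (672, 864)
cy, cx = anchor_center(2, 3)              # (64, 80)
print(decode_box(cy, cx, 128, 128, 0.5, 0.0, 0.0, 0.0))   # (64.0, 16.0, 192.0, 144.0)
```

With all four offsets at zero, the decoded box is just the anchor itself. This is why the RPN only needs to learn small corrections.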
BoxNetwork calls rpn.getPositiveOutputs, which in turn calls filterOutputBoxes:
# RPN.py
def filterOutputBoxes(self, boxes, scores, others=[], preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7):
    with tf.name_scope("filter_output_boxes"):
        scores = tf.nn.softmax(scores)[:,1]
        scores = tf.reshape(scores,[-1])
        #Clip boxes to edge
        boxes = self.clipBoxesToEdge(boxes)
        #Remove empty boxes
        boxes, scores = BoxUtils.filterSmallBoxes(boxes, [scores])
        scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))
        #NMS filter
        nmsIndices = tf.image.non_max_suppression(boxes, scores, iou_threshold=nmsThreshold, max_output_size=maxOutSize)
        nmsIndices = tf.expand_dims(nmsIndices, axis=-1)
        return MultiGather.gather([boxes, scores]+others, nmsIndices)

def getPositiveOutputs(self, preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7):
    boxes, scores = self.filterOutputBoxes(self.boxes, self.scores, preNmsCount=preNmsCount, nmsThreshold=nmsThreshold, maxOutSize=maxOutSize)
    return boxes, scores
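filterOutputBoxes filters boxes and scores and discards unsuitable boxes:
- convert the two logits of each box into an object probability with a softmax (column 1);
- clip boxes that extend beyond the image boundary;
- remove empty boxes;
- keep at most preNmsCount (6000) top-scoring boxes, then filter them with NMS (IoU threshold 0.7, at most maxOutSize = 300 boxes).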
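These steps can be sketched in plain Python. This is an illustration under stated assumptions, not the repository code:

- boxes are (y0, x0, y1, x1) tuples;
- clipping clamps each coordinate to [0, image size];
- min_size, used to drop empty boxes, is a hypothetical parameter (the real rule in BoxUtils.filterSmallBoxes is not shown);
- NMS is the standard greedy algorithm. It discards a box when its IoU with an already kept box exceeds the threshold, which is how tf.image.non_max_suppression behaves.

```python
import math


def fg_prob(logit_bg, logit_fg):
    """2-way softmax: probability of 'object' (softmax(scores)[:, 1] in the repo)."""
    m = max(logit_bg, logit_fg)
    e_bg, e_fg = math.exp(logit_bg - m), math.exp(logit_fg - m)
    return e_fg / (e_bg + e_fg)


def clip_box(box, img_h, img_w):
    """Clamp a (y0, x0, y1, x1) box to the image."""
    y0, x0, y1, x1 = box
    return (min(max(y0, 0.0), img_h), min(max(x0, 0.0), img_w),
            min(max(y1, 0.0), img_h), min(max(x1, 0.0), img_w))


def iou(a, b):
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_output_boxes(boxes, logits, img_h, img_w, pre_nms=6000,
                        max_out=300, nms_thr=0.7, min_size=1.0):
    scores = [fg_prob(bg, fg) for bg, fg in logits]
    boxes = [clip_box(b, img_h, img_w) for b in boxes]
    # drop empty boxes (min_size is a hypothetical parameter)
    cand = [(s, b) for s, b in zip(scores, boxes)
            if b[2] - b[0] >= min_size and b[3] - b[1] >= min_size]
    # keep the pre_nms highest-scoring candidates
    cand = sorted(cand, key=lambda sb: sb[0], reverse=True)[:pre_nms]
    # greedy NMS: drop a box that overlaps an already kept one by more than nms_thr
    kept = []
    for s, b in cand:
        if len(kept) == max_out:
            break
        if all(iou(b, kb) <= nms_thr for _, kb in kept):
            kept.append((s, b))
    return [b for _, b in kept], [s for s, _ in kept]


boxes = [(0, 0, 10, 10), (0, 1, 10, 11), (20, 20, 30, 30), (5, 5, 5, 5)]
logits = [(0.0, 2.0), (0.0, 1.0), (0.0, 0.5), (0.0, 3.0)]
kept_boxes, kept_scores = filter_output_boxes(boxes, logits, img_h=100, img_w=100)
print(kept_boxes)   # [(0, 0, 10, 10), (20, 20, 30, 30)]
```

In this example:

- the second box overlaps the first with IoU 90/110 (about 0.82), so NMS removes it;
- the last box has zero height and is dropped before NMS, despite having the highest score.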
BoxRefinementNetwork.py
BoxNetwork uses BoxRefinementNetwork as follows:
self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)
................
self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)
So BoxNetwork relies on self.boxRefiner.getBoxes.
Let's look at BoxRefinementNetwork directly.
# BoxRefinementNetwork.py
class BoxRefinementNetwork:
    POOL_SIZE=3

    def __init__(self, input, nCategories, downsample=16, offset=[32,32], hardMining=True):
        self.downsample = downsample
        self.offset = offset
        # Number of object categories (background not included)
        self.nCategories = nCategories
        # (POOL_SIZE**2)*(1+nCategories) position-sensitive class score maps
        self.classMaps = slim.conv2d(input, (self.POOL_SIZE**2)*(1+nCategories), 3, activation_fn=None, scope='classMaps')
        # (POOL_SIZE**2)*4 position-sensitive box regression maps (4 per bin, shared by all classes)
        self.regressionMap = slim.conv2d(input, (self.POOL_SIZE**2)*4, 3, activation_fn=None, scope='regressionMaps')
        self.hardMining=hardMining
        #Magic parameters.
        self.posIouTheshold = 0.5
        self.negIouThesholdHi = 0.5
        self.negIouThesholdLo = 0.1
        self.nTrainBoxes = 128
        self.nTrainPositives = 32
        self.falseValue = 0.0002
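The most convoluted part is the following three functions, each of which calls the one defined above it (getBoxScores → roiMean → roiPooling).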
def roiPooling(self, layer, boxes):
    return positionSensitiveRoiPooling(layer, boxes, offset=self.offset, downsample=self.downsample, roiSize=self.POOL_SIZE)

def roiMean(self, layer, boxes):
    with tf.name_scope("roiMean"):
        return tf.reduce_mean(self.roiPooling(layer, boxes), axis=[1,2])

def getBoxScores(self, boxes):
    with tf.name_scope("getBoxScores"):
        return self.roiMean(self.classMaps, boxes)
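[Figure: the position-sensitive ROI pooling process]
The ROI in the figure corresponds to the box coordinates passed in here. Each box is used to crop self.classMaps, which gives the feature maps of that ROI. Position-sensitive ROI pooling is performed on these maps, and roiMean then averages the k×k pooled values into one score per class, from which the class is chosen. I have not analysed the details of this step closely yet.
For the detailed principles, see here.
Remember self.classMaps and self.regressionMap?
self.classMaps is used to compute the class scores. getBoxScores pools self.classMaps for each proposal, and getBoxes keeps the proposals classified as foreground with a high enough score. refineBoxes then crops self.regressionMap for these positives to obtain the final boxes.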
def getBoxes(self, proposals, proposal_scores, maxOutputs=30, nmsThreshold=0.3, scoreThreshold=0.8):
    if scoreThreshold is None:
        scoreThreshold = 0
    with tf.name_scope("getBoxes"):
        scores = tf.nn.softmax(self.getBoxScores(proposals))
        classes = tf.argmax(scores, 1)
        scores = tf.reduce_max(scores, axis=1)
        # keep foreground proposals (class > 0) whose score is above the threshold
        posIndices = tf.cast(tf.where(tf.logical_and(classes > 0, scores>scoreThreshold)), tf.int32)
        positives, scores, classes = MultiGather.gather([proposals, scores, classes], posIndices)
        positives = self.refineBoxes(positives, False)
        #Final NMS
        posIndices = tf.image.non_max_suppression(positives, scores, iou_threshold=nmsThreshold, max_output_size=maxOutputs)
        posIndices = tf.expand_dims(posIndices, axis=-1)
        positives, scores, classes = MultiGather.gather([positives, scores, classes], posIndices)
        # shift class ids back so that 0 is the first real category
        classes = tf.cast(tf.cast(classes,tf.int32) - 1, tf.uint8)
        return positives, scores, classes
At this point the detection pipeline is complete.
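The score part of getBoxes can be mirrored in plain Python to make the class bookkeeping explicit. The steps are:

- a softmax over the C+1 voted scores;
- argmax for the class and max for the score;
- rejection of background (class 0) and of low scores;
- finally the shift classes - 1, so that the returned ids start at 0 for the first real category.

Box refinement and the final NMS are left out. postprocess is a hypothetical name.

```python
import math


def postprocess(roi_votes, score_threshold=0.5):
    """roi_votes[r] holds the C+1 voted scores of ROI r (index 0 = background).

    Returns (roi_index, category_id, score) for every kept ROI, where
    category_id is 0-based over the real classes, like classes - 1.
    """
    results = []
    for r, votes in enumerate(roi_votes):
        m = max(votes)
        exps = [math.exp(v - m) for v in votes]              # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        cls = max(range(len(probs)), key=probs.__getitem__)  # tf.argmax
        score = probs[cls]                                   # tf.reduce_max
        if cls > 0 and score > score_threshold:              # drop background and weak ROIs
            results.append((r, cls - 1, score))
    return results


votes = [[0.0, 5.0, 0.0],    # confident class 1
         [5.0, 0.0, 0.0],    # background wins
         [0.0, 0.1, 0.0]]    # class 1, but below the threshold
for r, cat, s in postprocess(votes):
    print(r, cat, round(s, 3))   # 0 0 0.987
```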
loss
def loss(self, proposals, refBoxes, refClasses):
    with tf.name_scope("BoxRefinementNetworkLoss"):
        proposals = tf.stop_gradient(proposals)

        # Loss of the positive boxes: classification + box regression
        def getPosLoss(positiveBoxes, positiveRefIndices, nPositive):
            with tf.name_scope("getPosLoss"):
                positiveRefIndices = tf.reshape(positiveRefIndices,[-1,1])
                positiveClasses, positiveRefBoxes = MultiGather.gather([refClasses, refBoxes], positiveRefIndices)
                # shift class ids by one, since 0 is background
                positiveClasses = tf.cast(tf.cast(positiveClasses,tf.int8) + 1, tf.uint8)
                if not self.hardMining:
                    selected = Utils.RandomSelect.randomSelectIndex(tf.shape(positiveBoxes)[0], nPositive)
                    positiveBoxes, positiveClasses, positiveRefBoxes = MultiGather.gather([positiveBoxes, positiveClasses, positiveRefBoxes], selected)
                return tf.tuple([self.classRefinementLoss(positiveBoxes, positiveClasses) + self.boxRefinementLoss(positiveBoxes, positiveRefBoxes), tf.shape(positiveBoxes)[0]])

        # Loss of the negative boxes: classification as background (class 0) only
        def getNegLoss(negativeBoxes, nNegative):
            with tf.name_scope("getNetLoss"):
                if not self.hardMining:
                    negativeIndices = Utils.RandomSelect.randomSelectIndex(tf.shape(negativeBoxes)[0], nNegative)
                    negativeBoxes = tf.gather_nd(negativeBoxes, negativeIndices)
                return self.classRefinementLoss(negativeBoxes, tf.zeros(tf.stack([tf.shape(negativeBoxes)[0],1]), dtype=tf.uint8))

        def getRefinementLoss():
            with tf.name_scope("getRefinementLoss"):
                iou = BoxUtils.iou(proposals, refBoxes)
                maxIou = tf.reduce_max(iou, axis=1)
                bestIou = tf.expand_dims(tf.cast(tf.argmax(iou, axis=1), tf.int32), axis=-1)
                #Find positive and negative indices based on their IOU
                posBoxIndices = tf.cast(tf.where(maxIou > self.posIouTheshold), tf.int32)
                negBoxIndices = tf.cast(tf.where(tf.logical_and(maxIou < self.negIouThesholdHi, maxIou > self.negIouThesholdLo)), tf.int32)
                #Split the boxes and references
                posBoxes, posRefIndices = MultiGather.gather([proposals, bestIou], posBoxIndices)
                negBoxes = tf.gather_nd(proposals, negBoxIndices)
                #Add GT boxes
                posBoxes = tf.concat([posBoxes,refBoxes], 0)
                posRefIndices = tf.concat([posRefIndices, tf.reshape(tf.range(tf.shape(refClasses)[0]), [-1,1])], 0)
                #Call the loss if the box collection is not empty
                nPositive = tf.shape(posBoxes)[0]
                nNegative = tf.shape(negBoxes)[0]
                if self.hardMining:
                    posLoss = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, 0)[0], lambda: tf.zeros((0,), tf.float32))
                    negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, 0), lambda: tf.zeros((0,), tf.float32))
                    allLoss = tf.concat([posLoss, negLoss], 0)
                    return tf.cond(tf.shape(allLoss)[0]>0, lambda: tf.reduce_mean(Utils.MultiGather.gatherTopK(allLoss, self.nTrainBoxes)), lambda: tf.constant(0.0))
                else:
                    posLoss, posCount = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, self.nTrainPositives), lambda: tf.tuple([tf.constant(0.0), tf.constant(0,tf.int32)]))
                    negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, self.nTrainBoxes-posCount), lambda: tf.constant(0.0))
                    nPositive = tf.cast(tf.shape(posLoss)[0], tf.float32)
                    nNegative = tf.cond(nNegative > 0, lambda: tf.cast(tf.shape(negLoss)[0], tf.float32), lambda: tf.constant(0.0))
                    return (tf.reduce_mean(posLoss)*nPositive + tf.reduce_mean(negLoss)*nNegative)/(nNegative+nPositive)

        return tf.cond(tf.logical_and(tf.shape(proposals)[0] > 0, tf.shape(refBoxes)[0] > 0), lambda: getRefinementLoss(), lambda:tf.constant(0.0))
This is only the R-FCN (box refinement) part of the loss.
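The sample-selection logic of getRefinementLoss can be sketched in plain Python. The thresholds are the ones set in __init__:

- a proposal is positive if its best IoU with a ground-truth box is above 0.5;
- it is negative if that IoU lies strictly between 0.1 and 0.5;
- otherwise it is ignored.

As in the code, the ground-truth boxes themselves are appended as extra positives. The hard-mining branch concatenates the per-box losses and averages the nTrainBoxes (128) largest ones. ohem_mean assumes that this is what Utils.MultiGather.gatherTopK returns. Function names are hypothetical, and the actual classification and regression losses are not reproduced.

```python
def iou(a, b):
    """IoU of two (y0, x0, y1, x1) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, pos_thr=0.5, neg_hi=0.5, neg_lo=0.1):
    """Return (positives, negatives): positives as (box, gt_index), negatives as boxes."""
    if not proposals or not gt_boxes:
        return [], []          # the repo returns a zero loss in this case
    positives, negatives = [], []
    for p in proposals:
        ious = [iou(p, g) for g in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__)
        if ious[best] > pos_thr:
            positives.append((p, best))
        elif neg_lo < ious[best] < neg_hi:
            negatives.append(p)
    # ground-truth boxes are added as positives that match themselves
    positives += [(g, gi) for gi, g in enumerate(gt_boxes)]
    return positives, negatives


def ohem_mean(per_box_losses, n_train=128):
    """Average of the n_train largest losses (online hard example mining)."""
    hardest = sorted(per_box_losses, reverse=True)[:n_train]
    return sum(hardest) / len(hardest) if hardest else 0.0


gt = [(0, 0, 10, 10)]
props = [(0, 0, 10, 10),    # IoU 1.0  -> positive
         (0, 0, 10, 20),    # IoU 0.5  -> neither (not > 0.5, not < 0.5)
         (0, 0, 10, 40),    # IoU 0.25 -> negative
         (50, 50, 60, 60)]  # IoU 0.0  -> ignored
pos, neg = label_proposals(props, gt)
print(len(pos), neg)                                   # 2 [(0, 0, 10, 40)]
print(ohem_mean([1.0, 5.0, 3.0, 2.0], n_train=2))      # 4.0
```

With hard mining, the 128 boxes with the largest loss are used, whatever mix of positives and negatives they contain. Without it, the code randomly samples up to nTrainPositives (32) positives and fills the rest of the 128 with negatives.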
Total loss
Joint training uses the sum of the RPN loss and the R-FCN loss:
def getLoss(self, refBoxes, refClasses):
    return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
References:
RFCN paper notes
Reading the R-FCN paper