Source code:
https://github.com/xdever/RFCN-tensorflow

Brief overview of the architecture:

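The k^2(C+1) conv layer: the final feature map from ResNet-101 is W x H x 1024 (in the paper, a 1x1 layer first reduces the 2048-channel backbone output to 1024 channels). Convolving it with k^2(C+1) 1x1 kernels yields k^2(C+1) position-sensitive score maps, each of size W x H. This convolution is the prediction step. Here k=3, meaning each ROI is divided into a 3x3 grid whose 9 positions are: top-left, top-center, top-right, middle-left, center, middle-right, bottom-left, bottom-center, bottom-right.
Physical meaning of the k^2(C+1) feature maps: there are k x k = 9 colors, and each colored block (W x H x (C+1)) holds the probability that an object is present at one relative position (the first, yellow block corresponds to the top-left position and the last, light-blue block to the bottom-right position).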
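As a quick check on the sizes involved, the sketch below counts the score-map channels and lists the nine bins. The class count of 20 and the helper name `score_map_layout` are illustrative assumptions, not values taken from the repository.

```python
# Illustrative sketch: k and the bin order follow the text above;
# num_classes=20 is an assumed example value.
BIN_NAMES = [
    "top-left", "top-center", "top-right",
    "middle-left", "center", "middle-right",
    "bottom-left", "bottom-center", "bottom-right",
]

def score_map_layout(k, num_classes):
    """Return (total channel count, list of (bin name, channels per bin))."""
    per_bin = num_classes + 1          # C object classes + 1 background
    total = k * k * per_bin            # k^2 (C+1) score maps
    return total, [(name, per_bin) for name in BIN_NAMES[: k * k]]

total, layout = score_map_layout(k=3, num_classes=20)
print(total)        # 9 * 21 = 189
print(layout[0])    # ('top-left', 21)
```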
Pooling formula

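z(i,j,c) is the c-th map in the (i+k*(j-1))-th block (1 <= i,j <= 3). The pair (i,j) selects one of the 9 positions, say the top-left one (i=j=1), and c selects the class, say person. If the pixel at (x,y) on the feature map z(i,j,c) has the value `value`, then `value` is the probability that the corresponding location (x,y) in the original image belongs to a person (c='person') and is the top-left part of that person (i=j=1).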
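For reference, the position-sensitive pooling and voting formulas from the R-FCN paper can be written as below. Note that the paper indexes the bins from 0 (0 <= i,j <= k-1), whereas the description above counts from 1. Here (x_0, y_0) is the top-left corner of the ROI, n is the number of pixels in the bin, and \Theta stands for the learnable network parameters.

```latex
r_c(i,j \mid \Theta) = \sum_{(x,y)\in \mathrm{bin}(i,j)} \frac{z_{i,j,c}(x + x_0,\; y + y_0 \mid \Theta)}{n}

r_c(\Theta) = \sum_{i,j} r_c(i,j \mid \Theta), \qquad
s_c(\Theta) = \frac{e^{r_c(\Theta)}}{\sum_{c'=0}^{C} e^{r_{c'}(\Theta)}}
```

The first line averages the scores inside one bin of one class; the second line sums the k^2 bin responses into a per-class vote and applies a softmax over the C+1 classes.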

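Input and output of ROI pooling: for the C+1 classes, the input is the block that an ROI covers on the k^2(C+1) x W' x H' score maps, where W' and H' are the width and height of the ROI (the box coordinates predicted by the RPN are used to crop the feature map). This block is then rearranged: from each colored group of (C+1) maps only the bin at the matching position is taken, and these k x k bins are assembled into a new block of size (C+1) x W' x H'. For example, in the figure below the first (yellow) group contributes only its top-left bin and the last (light-blue) group only its bottom-right bin. After all bins are recombined, the result looks like the thin block on the right of the figure (the figure shows the output after pooling, where every bin on every face is already a single value; before pooling each bin is a region of several pixels). The output of ROI pooling is a block of size (C+1) x k x k.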
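The bin selection and averaging described above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's `positionSensitiveRoiPooling`: it assumes the channels are laid out bin by bin (all C+1 maps of the top-left bin first, and so on), that the ROI is given in feature-map pixels as (y0, x0, y1, x1) with exclusive end coordinates, and that each bin is average-pooled.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, n_cls):
    """Position-sensitive ROI pooling on one ROI.

    score_maps: (H, W, k*k*n_cls) array, n_cls = C + 1 (assumed bin-major layout).
    roi: (y0, x0, y1, x1) in feature-map pixels, end-exclusive.
    Returns a (k, k, n_cls) array with one averaged value per bin and class.
    """
    y0, x0, y1, x1 = roi
    ys = np.linspace(y0, y1, k + 1)   # bin edges along the height
    xs = np.linspace(x0, x1, k + 1)   # bin edges along the width
    out = np.zeros((k, k, n_cls), dtype=np.float64)
    for i in range(k):        # bin row
        for j in range(k):    # bin column
            ya, yb = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
            xa, xb = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
            first = (i * k + j) * n_cls          # first channel of this bin's group
            patch = score_maps[ya:yb, xa:xb, first:first + n_cls]
            out[i, j] = patch.mean(axis=(0, 1))  # average inside the bin
    return out

# Toy input: channel g is filled with the constant g, so the result is easy to read.
k, n_cls = 3, 2
maps = np.broadcast_to(np.arange(k * k * n_cls, dtype=np.float64), (6, 6, k * k * n_cls))
pooled = ps_roi_pool(maps, (0, 0, 6, 6), k, n_cls)
votes = pooled.mean(axis=(0, 1))   # average the k*k bins into one score per class
print(pooled.shape, votes)
```

Each output cell reads only from the channel group that belongs to its own position, which is what makes the pooling position-sensitive.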

That concludes the introduction; now on to the code.

Note that this is an unofficial implementation. I read this code only to learn the R-FCN detection pipeline, and I have not run or debugged it myself.

Code layout

After searching Google and reading many blog posts, I found that people do not pay much attention to the R-FCN model, so let's dig into this code.

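main.py is the training script; modify it if you want to train the model.
testCheckpoint.py checks the shapes of the parameters saved in a ckpt file. It is independent of the rest of the model and only inspects the checkpoint.
test.py runs the detection model for testing.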

test.py

Most detection code follows the same flow, so we start with the model code in test.py.

parser = argparse.ArgumentParser(description="RFCN tester")
parser.add_argument('-gpu', type=str, default="0", help='Train on this GPU(s)')
parser.add_argument('-n', type=str, help='Network checkpoint file')
parser.add_argument('-i', type=str, help='Input file.')
parser.add_argument('-o', type=str, default="", help='Write output here.')
parser.add_argument('-p', type=int, default=1, help='Show preview')
parser.add_argument('-threshold', type=float, default=0.5, help='Detection threshold')
parser.add_argument('-delay', type=int, default=-1, help='Delay between frames in visualization. -1 for automatic, 0 for wait for keypress.')

The command-line options here are fairly simple.
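To see how these single-dash options parse, here is a small self-contained sketch that rebuilds a subset of the parser and feeds it an argument list directly. The file names `model.ckpt` and `input.jpg` are hypothetical placeholders.

```python
import argparse

# Subset of the options listed above; the file names passed below are hypothetical.
parser = argparse.ArgumentParser(description="RFCN tester")
parser.add_argument('-gpu', type=str, default="0", help='Train on this GPU(s)')
parser.add_argument('-n', type=str, help='Network checkpoint file')
parser.add_argument('-i', type=str, help='Input file.')
parser.add_argument('-threshold', type=float, default=0.5, help='Detection threshold')

# Passing a list to parse_args avoids touching sys.argv.
opt = parser.parse_args(['-n', 'model.ckpt', '-i', 'input.jpg', '-threshold', '0.7'])
print(opt.n, opt.i, opt.threshold, opt.gpu)
```

Options that are not given, such as `-gpu` here, fall back to their defaults.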

palette = Visualize.Palette(len(categories))

image = tf.placeholder(tf.float32, [None, None, None, 3])
net = BoxInceptionResnet(image, len(categories), name="boxnet")

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)


input = PreviewIO.PreviewInput(opt.i)
output = PreviewIO.PreviewOutput(opt.o, input.getFps())

BoxInceptionResnet builds the R-FCN model here.
PreviewIO handles reading the input and writing the output files.

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)

net.getBoxes returns the graph tensors for the detection outputs. They hold no values yet; they are evaluated later with sess.run.

Only the main structure of test.py is shown below.

with tf.Session() as sess:
    .........
    # load the checkpoint into the session
    if not CheckpointLoader.loadCheckpoint(sess, None, opt.n, ignoreVarsInFileNotInSess=True):
        ..........
    # preprocess the image
    img = preprocessInput(img)
    ...........
    # feed the image to the detection model
    rBoxes, rScores, rClasses = sess.run([boxes, scores, classes], feed_dict={image: np.expand_dims(img, 0)})

    ...........
    # visualize the detection results
    res = Visualize.drawBoxes(img, rBoxes, rClasses, [categories[i] for i in rClasses.tolist()], palette, scores=rScores)
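The placeholder is declared with shape [None, None, None, 3], that is, a batch of images, which is why a single image is wrapped with np.expand_dims before it is fed. A minimal illustration with a dummy array (the 480 x 640 size is an arbitrary assumption):

```python
import numpy as np

# A dummy H x W x 3 image; the real one would come out of the preprocessing step.
img = np.zeros((480, 640, 3), dtype=np.float32)

# Add a leading batch axis so the array matches [batch, height, width, 3].
batch = np.expand_dims(img, 0)
print(batch.shape)   # (1, 480, 640, 3)
```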

main.py

main.py is the training script. To keep things simple, only the important training steps are listed below, with the focus on the loss.

# load the data
dataset = BoxLoader()
dataset.add(CocoDataset(opt.dataset, randomZoom=opt.randZoom==1, set="train"+opt.cocoVariant))
if opt.mergeValidationSet==1:
    dataset.add(CocoDataset(opt.dataset, set="val"+opt.cocoVariant))

.....................
# get the augmented images and their labels (boxes and classes)
images, boxes, classes = Augment.augment(*dataset.get())

# build the detection model
net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)

# add the model's loss
tf.losses.add_loss(net.getLoss(boxes, classes))

# build the update op (gradients, clipping, and apply)
def createUpdateOp(gradClip=1):
    with tf.name_scope("optimizer"):
        optimizer=tf.train.AdamOptimizer(learning_rate=opt.learningRate, epsilon=opt.adamEps)
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        totalLoss = tf.losses.get_total_loss()
        grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
        if gradClip is not None:
            cGrads = []
            for g, v in grads:
                if g is None:
                    print("WARNING: no grad for variable "+v.op.name)
                    continue
                cGrads.append((tf.clip_by_value(g, -float(gradClip), float(gradClip)), v))
            grads = cGrads

        update_ops.append(optimizer.apply_gradients(grads))
        return control_flow_ops.with_dependencies([tf.group(*update_ops)], totalLoss, name='train_op')

# create the training op
trainOp=createUpdateOp()
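The clipping loop inside createUpdateOp limits every gradient element to [-gradClip, gradClip] and skips variables that have no gradient. The same logic on plain NumPy arrays, as a sketch (variable names are strings here purely for illustration):

```python
import numpy as np

def clip_gradients(grads_and_vars, grad_clip=1.0):
    """Clip each gradient element-wise to [-grad_clip, grad_clip]; skip missing gradients."""
    clipped = []
    for g, v in grads_and_vars:
        if g is None:
            # Mirrors the warning in createUpdateOp: the variable gets no update.
            print("WARNING: no grad for variable " + v)
            continue
        clipped.append((np.clip(g, -float(grad_clip), float(grad_clip)), v))
    return clipped

grads = [(np.array([0.5, -3.0, 2.0]), "w"), (None, "frozen")]
result = clip_gradients(grads)
print(result)
```

This is clipping by value, not by norm: each element is limited on its own, so the direction of the gradient vector can change.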

main.py contains an infinite `while True` loop and uses RunManager to manage training, as follows:

    runManager = RunManager(sess, options=runOptions, run_metadata=runMetadata)
    runManager.add("train", [globalStepInc,trainOp], modRun=1)

The optimization steps run inside the while loop:

    ......
    # run the training loop
    while True:
        #run various parts of the network
        res = runManager.modRun(i)
        .....
        # visualize the training results
        visualizer.draw(res)

Next, let's look at the detection model, BoxInceptionResnet.
test.py uses its net.getBoxes method.

net = BoxInceptionResnet(image, len(categories), name="boxnet")

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)

main.py (the training script) uses net.getLoss and net.getVariables of BoxInceptionResnet.

net = BoxInceptionResnet(images, dataset.categoryCount(), name="boxnet", trainFrom=opt.trainFrom, hardMining=opt.hardMining==1, freezeBatchNorm=opt.freezeBatchNorm==1)
tf.losses.add_loss(net.getLoss(boxes, classes))
......
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())

So these are the functions to focus on when reading the source.
BoxInceptionResnet turns out to be a subclass of BoxNetwork.
BoxNetwork.py

# BoxNetwork.py

class BoxNetwork:
    def __init__(self, nCategories, rpnLayer, rpnDownscale, rpnOffset, featureLayer=None, featureDownsample=None, featureOffset=None, weightDecay=1e-6, hardMining=True):
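        '''
        Called from BoxInceptionResnet as:
        featureInput = slim.conv2d(net, 1536, 1)
        BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
    
        nCategories: number of object categories to detect
        rpnLayer (rpnInput above): feature map fed to the RPN
        rpnDownscale: downscale factor of the RPN feature map
        rpnOffset: offset of the RPN feature map relative to the image
        featureLayer (featureInput above): the encoded feature map fed to the box refiner
        featureDownsample: downscale factor of that feature map
        featureOffset: offset of that feature map
        '''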
        
        if featureLayer is None:
            featureLayer=rpnLayer

        if featureDownsample is None:
            featureDownsample=rpnDownscale
            
        if featureOffset is None:
            # fall back to the RPN offset, like the two defaults above
            featureOffset=rpnOffset

        with tf.name_scope("BoxNetwork"):
            self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
            self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)

            self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)

    # Filter the proposals by a score threshold to keep suitable box coordinates
    def getProposals(self, threshold=None):
        if threshold is not None and threshold>0:
            s = tf.cast(tf.where(self.proposalScores > threshold), tf.int32)
            return tf.gather_nd(self.proposals, s), tf.gather_nd(self.proposalScores, s)
        else:
            return self.proposals, self.proposalScores
    
    # Use boxRefiner to predict the classes and box coordinates from the feature map
    def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
        return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)
    
    # Given the ground-truth boxes and classes, compute the total loss of the RPN and the detector
    def getLoss(self, refBoxes, refClasses):
        return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)

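Read the comments carefully. Since this is the base class of the R-FCN model, the code here is fairly simple.

Roughly: the final feature map from InceptionResnetV2 is fed to the RPN, and after a few more conv2d operations the feature map is processed in BoxRefinementNetwork.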
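The threshold branch of getProposals amounts to a boolean mask over the proposal scores. A NumPy sketch of the same idea, using made-up boxes and scores (an illustration, not the TensorFlow graph code):

```python
import numpy as np

def filter_proposals(proposals, scores, threshold=None):
    """Keep only proposals whose score exceeds the threshold; pass everything if it is unset."""
    if threshold is not None and threshold > 0:
        keep = scores > threshold
        return proposals[keep], scores[keep]
    return proposals, scores

# Made-up proposals as [y1, x1, y2, x2] with their objectness scores.
proposals = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [30, 30, 40, 40]], dtype=np.float32)
scores = np.array([0.9, 0.2, 0.6], dtype=np.float32)

kept_boxes, kept_scores = filter_proposals(proposals, scores, threshold=0.5)
print(kept_boxes, kept_scores)
```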

Note this method:

def getBoxes(self, nmsThreshold=0.3, scoreThreshold=0.8):
        return self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)

In test.py the output is obtained with:

boxes, scores, classes = net.getBoxes(scoreThreshold=opt.threshold)

Here the RPN generates boxes over the whole feature map; self.boxRefiner.getBoxes then extracts the ROIs from the feature map and produces boxes, scores, and classes.

BoxInceptionResnet.py

# BoxInceptionResnet.py
class BoxInceptionResnet(BoxNetwork):
    LAYER_NAMES = ['Conv2d_1a_3x3','Conv2d_2a_3x3','Conv2d_2b_3x3','MaxPool_3a_3x3','Conv2d_3b_1x1','Conv2d_4a_3x3',
              'MaxPool_5a_3x3','Mixed_5b','Repeat','Mixed_6a','Repeat_1','Mixed_7a','Repeat_2','Block8','Conv2d_7b_1x1']

    def __init__(self, inputs, nCategories, name="BoxNetwork", weightDecay=0.00004, freezeBatchNorm=False, reuse=False, isTraining=True, trainFrom=None, hardMining=True):
        self.boxThreshold = 0.5

        try:
            trainFrom = int(trainFrom)
        except:
            pass

        if isinstance(trainFrom, int):
            trainFrom = self.LAYER_NAMES[trainFrom]


        print("Training network from "+(trainFrom if trainFrom is not None else "end"))

        with tf.variable_scope(name, reuse=reuse) as scope:
            # build the base feature-extraction model
            self.googleNet = InceptionResnetV2("features", inputs, trainFrom=trainFrom, freezeBatchNorm=freezeBatchNorm)
            self.scope=scope
        
            with tf.variable_scope("Box"):
                #Repeat_1 - last 1/16 layer, Mixed_6a - first 1/16 layer
                # take the feature map of Repeat_1
                scale_16 = self.googleNet.getOutput("Repeat_1")[:,1:-1,1:-1,:]
                
                # take the feature map of Mixed_6a
                #scale_16 = self.googleNet.getOutput("Mixed_6a")[:,1:-1,1:-1,:]
                scale_32 = self.googleNet.getOutput("PrePool")

                with slim.arg_scope([slim.conv2d],
                        weights_regularizer=slim.l2_regularizer(weightDecay),
                        biases_regularizer=slim.l2_regularizer(weightDecay),
                        padding='SAME',
                        activation_fn = tf.nn.relu):
                    
                    # concatenate the feature maps of Repeat_1 and PrePool
                    # from here net splits into two outputs: rpnInput and featureInput
                    net = tf.concat([ tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)
                    
                    
                    rpnInput = slim.conv2d(net, 1024, 1)
                    
                    #BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)
                    
                    # featureInput is built here
                    featureInput = slim.conv2d(net, 1536, 1)
                    BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
    
    def getVariables(self, includeFeatures=False):
        if includeFeatures:
            return tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name)
        else:
            vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=self.scope.name+"/Box/")
            vars += self.googleNet.getTrainableVars()

            print("Training variables: ", [v.op.name for v in vars])
            return vars

    def importWeights(self, sess, filename):
        self.googleNet.importWeights(sess, filename, includeTraining=True)

Note the comments in the code above:

net splits into two branches: rpnInput and featureInput.

# concatenate the feature maps of Repeat_1 and PrePool
# from here net splits into two outputs: rpnInput and featureInput
net = tf.concat([ tf.image.resize_bilinear(scale_32, tf.shape(scale_16)[1:3]), scale_16], 3)

rpnInput = slim.conv2d(net, 1024, 1)

# BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], scale_32, 32, [32,32], weightDecay=weightDecay, hardMining=hardMining)

# featureInput is built here
featureInput = slim.conv2d(net, 1536, 1)
BoxNetwork.__init__(self, nCategories, rpnInput, 16, [32,32], featureInput, 16, [32,32], weightDecay=weightDecay, hardMining=hardMining)
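The fusion step can be mimicked on small arrays to follow the shapes. The sketch below is only a stand-in under stated assumptions: it uses nearest-neighbour upsampling instead of tf.image.resize_bilinear, random weights, and tiny made-up channel counts (6, 10, 5, 7) in place of the real ones.

```python
import numpy as np

def upsample_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) map; a simple stand-in for bilinear resizing."""
    rows = np.arange(out_h) * x.shape[0] // out_h
    cols = np.arange(out_w) * x.shape[1] // out_w
    return x[rows][:, cols]

def conv1x1(x, weights):
    """A 1x1 convolution is a per-pixel matrix product over channels; ReLU as in the arg_scope."""
    return np.maximum(x @ weights, 0.0)

rng = np.random.default_rng(0)
scale_16 = rng.standard_normal((8, 8, 6))    # finer map (stride 16), 6 channels assumed
scale_32 = rng.standard_normal((4, 4, 10))   # coarser map (stride 32), 10 channels assumed

# Bring the coarse map to the fine resolution, then stack along the channel axis.
net = np.concatenate([upsample_nearest(scale_32, 8, 8), scale_16], axis=2)

# Two independent 1x1 convolutions on the same fused map give the two branches.
rpn_input = conv1x1(net, rng.standard_normal((16, 5)))       # stands in for the 1024-channel branch
feature_input = conv1x1(net, rng.standard_normal((16, 7)))   # stands in for the 1536-channel branch
print(net.shape, rpn_input.shape, feature_input.shape)
```

Both branches keep the spatial size of the stride-16 map, which matches the downscale value of 16 passed to BoxNetwork.__init__ for both inputs.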

As shown in the figure below:

The two important components here are RPN and BoxRefinementNetwork.

Next, let's see how the RPN is computed.

RPN.py

To keep the RPN from getting too complicated, we only look at the functions that are actually called.

BoxNetwork.py uses the RPN as follows:

self.rpn = RPN(rpnLayer, immediateSize=512, weightDecay=weightDecay, inputDownscale=rpnDownscale, offset=rpnOffset)
......
            self.proposals, self.proposalScores = self.rpn.getPositiveOutputs(maxOutSize=300)

Only rpn.getPositiveOutputs is used.

# RPN.py

class RPN:
    def __init__(self, input, anchors=None, immediateSize=512, weightDecay=1e-5, inputDownscale=16, offset=[32,32]):
        self.input = input
        self.anchors = anchors
        self.inputDownscale = inputDownscale
        self.offset = offset
        self.anchors = anchors if anchors is not None else self.makeAnchors([64,128,256,512])
        print("Anchors: ", self.anchors)
        self.tfAnchors = tf.constant(self.anchors, dtype=tf.float32)

        self.hA=tf.reshape(self.tfAnchors[:,0],[-1])
        self.wA=tf.reshape(self.tfAnchors[:,1],[-1])

        self.nAnchors = len(self.anchors)

        self.positiveIouThreshold=0.7
        self.negativeIouThreshold=0.3
        self.regressionWeight=1.0
        
        self.nBoxLosses=256
        self.nPositiveLosses=128

        #dimensions
        with tf.name_scope('dimension_info'):
            s = tf.shape(self.input)
            self.hIn = s[1]
            self.wIn = s[2]

        
        self.imageH = tf.cast(self.hIn*self.inputDownscale+self.offset[0]*2, tf.float32)
        self.imageW = tf.cast(self.wIn*self.inputDownscale+self.offset[1]*2, tf.float32)

        self.define(immediateSize, weightDecay)


    def define(self, immediateSize, weightDecay):
        with tf.name_scope('RPN'):
            with slim.arg_scope([slim.conv2d], weights_regularizer=slim.l2_regularizer(weightDecay), padding='SAME'):
                #box prediction layers
                with tf.name_scope('NN'):
                    net = slim.conv2d(self.input, immediateSize, 3, activation_fn=tf.nn.relu)
                    scores = slim.conv2d(net, 2*self.nAnchors, 1, activation_fn=None)
                    boxRelativeCoordinates = slim.conv2d(net, 4*self.nAnchors, 1, activation_fn=None)

                #split coordinates
                x_raw, y_raw, w_raw, h_raw = tf.split(boxRelativeCoordinates, 4, axis=3)

                #Save raw box sizes for loss
                self.rawSizes = BoxUtils.mergeBoxData([w_raw, h_raw])
                            
                #Convert NN outputs to BBox coordinates
                self.boxes = BoxUtils.nnToImageBoxes(x_raw, y_raw, w_raw, h_raw, self.wA, self.hA, self.inputDownscale, self.offset)

                #store the size of every box
                with tf.name_scope('box_sizes'):
                    boxSizes = tf.reshape(self.tfAnchors, [1,1,1,-1,2])
                    boxSizes = tf.tile(boxSizes, tf.stack([1,self.hIn,self.wIn,1,1]))
                    self.boxSizes = tf.reshape(boxSizes, [-1,2])

                #scores
                self.scores = tf.reshape(scores, [-1,2])

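define produces self.boxes and self.scores, following the same principle as Faster R-CNN: it takes rpnInput and outputs both. self.boxes holds the box coordinates predicted at every location of rpnInput, and self.scores is a two-class score that decides whether a box contains an object.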
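The shape bookkeeping in `__init__` and `define` can be checked with a few lines of arithmetic. In the sketch below the feature-map size (20 x 30) and the anchor count (12) are assumed example values; the formulas for the image size and the score reshape follow the code above.

```python
import numpy as np

def rpn_shapes(h_in, w_in, n_anchors, downscale=16, offset=(32, 32)):
    """Image size implied by the feature map, and the total number of predicted boxes."""
    image_h = h_in * downscale + offset[0] * 2
    image_w = w_in * downscale + offset[1] * 2
    n_boxes = h_in * w_in * n_anchors   # one box per location and anchor
    return image_h, image_w, n_boxes

h_in, w_in, n_anchors = 20, 30, 12      # assumed example values
shapes = rpn_shapes(h_in, w_in, n_anchors)

# The score head outputs 2 values (background, object) per anchor at each location.
scores_map = np.zeros((1, h_in, w_in, 2 * n_anchors))
scores = scores_map.reshape(-1, 2)      # same as tf.reshape(scores, [-1, 2])
print(shapes)         # (384, 544, 7200)
print(scores.shape)   # (7200, 2)
```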

BoxNetwork calls rpn.getPositiveOutputs, which in turn calls filterOutputBoxes:

# RPN.py
def filterOutputBoxes(self, boxes, scores, others=[], preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7): 
        with tf.name_scope("filter_output_boxes"):
            scores = tf.nn.softmax(scores)[:,1]
            scores = tf.reshape(scores,[-1])

            #Clip boxes to edge
            boxes = self.clipBoxesToEdge(boxes)

            #Remove empty boxes
            boxes, scores = BoxUtils.filterSmallBoxes(boxes, [scores])
            scores, boxes = tf.cond(tf.shape(scores)[0] > preNmsCount , lambda: tf.tuple(MultiGather.gatherTopK(scores, preNmsCount, [boxes])), lambda: tf.tuple([scores, boxes]))

            #NMS filter
            nmsIndices = tf.image.non_max_suppression(boxes, scores, iou_threshold=nmsThreshold, max_output_size=maxOutSize)
            nmsIndices = tf.expand_dims(nmsIndices, axis=-1)

            return MultiGather.gather([boxes, scores]+others, nmsIndices)
        
    def getPositiveOutputs(self, preNmsCount=6000, maxOutSize=300, nmsThreshold=0.7):
        boxes, scores = self.filterOutputBoxes(self.boxes, self.scores, preNmsCount=preNmsCount, nmsThreshold=nmsThreshold, maxOutSize=maxOutSize)
        return boxes, scores

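filterOutputBoxes filters boxes and scores, discarding unsuitable boxes in the following steps:

1. Clip boxes that extend beyond the image border.
2. Remove empty (too small) boxes, and keep at most preNmsCount boxes with the highest scores.
3. Filter with NMS.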
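The steps above can be sketched in NumPy. This is an illustration under stated assumptions, not the repository code: boxes are [y1, x1, y2, x2] (the layout tf.image.non_max_suppression expects), `min_size` is an assumed cutoff for "small" boxes, and the greedy NMS is a plain reimplementation.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as [y1, x1, y2, x2]."""
    y1 = np.maximum(box[0], boxes[:, 0])
    x1 = np.maximum(box[1], boxes[:, 1])
    y2 = np.minimum(box[2], boxes[:, 2])
    x2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold, max_output_size):
    """Greedy NMS: repeatedly keep the best box and drop boxes overlapping it too much."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < max_output_size:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

def filter_output_boxes(boxes, logits, image_h, image_w, pre_nms_count=6000,
                        max_out=300, nms_threshold=0.7, min_size=1.0):
    # Softmax over (background, object); keep the object probability.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    scores = (e / e.sum(axis=1, keepdims=True))[:, 1]
    # 1. Clip boxes to the image.
    boxes = np.clip(boxes, 0, np.array([image_h, image_w, image_h, image_w]))
    # 2. Drop boxes that are too small (min_size is an assumption), then keep the top scores.
    big = ((boxes[:, 2] - boxes[:, 0]) >= min_size) & ((boxes[:, 3] - boxes[:, 1]) >= min_size)
    boxes, scores = boxes[big], scores[big]
    top = np.argsort(-scores)[:pre_nms_count]
    boxes, scores = boxes[top], scores[top]
    # 3. Non-maximum suppression.
    keep = nms(boxes, scores, nms_threshold, max_out)
    return boxes[keep], scores[keep]

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60],
                  [5, 5, 5, 5], [90, 90, 120, 120]], dtype=np.float64)
logits = np.array([[0, 2], [0, 1], [0, 3], [0, 5], [0, -1]], dtype=np.float64)
out_boxes, out_scores = filter_output_boxes(boxes, logits, 100, 100, nms_threshold=0.5)
print(out_boxes)
```

In this toy input the degenerate box is dropped in step 2, the box reaching past the border is clipped to the 100 x 100 image, and the second box is suppressed because it overlaps the first with an IoU of about 0.68.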

BoxRefinementNetwork.py

BoxNetwork uses BoxRefinementNetwork as follows:

self.boxRefiner = BoxRefinementNetwork(featureLayer, nCategories, downsample=featureDownsample, offset=featureOffset, hardMining=hardMining)

................

self.boxRefiner.getBoxes(self.proposals, self.proposalScores, maxOutputs=50, nmsThreshold=nmsThreshold, scoreThreshold=scoreThreshold)

So the method used from BoxRefinementNetwork is self.boxRefiner.getBoxes.

Let's look at BoxRefinementNetwork directly.

# BoxRefinementNetwork.py

class BoxRefinementNetwork:
    POOL_SIZE=3

    def __init__(self, input, nCategories, downsample=16, offset=[32,32], hardMining=True):
        self.downsample = downsample
        self.offset = offset
        
        # number of categories
        self.nCategories = nCategories
        
        # Output (self.POOL_SIZE**2)*(1+nCategories) feature maps of class scores.
        self.classMaps = slim.conv2d(input, (self.POOL_SIZE**2)*(1+nCategories), 3, activation_fn=None, scope='classMaps')
        
        # (self.POOL_SIZE**2)*4 feature maps of box coordinates
        self.regressionMap = slim.conv2d(input, (self.POOL_SIZE**2)*4, 3, activation_fn=None, scope='regressionMaps')

        self.hardMining=hardMining

        #Magic parameters.
        self.posIouTheshold = 0.5
        self.negIouThesholdHi = 0.5
        self.negIouThesholdLo = 0.1
        self.nTrainBoxes = 128
        self.nTrainPositives = 32
        self.falseValue = 0.0002

The slightly confusing part is the following three functions, which call each other from the bottom up.

    def roiPooling(self, layer, boxes):
        return positionSensitiveRoiPooling(layer, boxes, offset=self.offset, downsample=self.downsample, roiSize=self.POOL_SIZE)

    def roiMean(self, layer, boxes):
        with tf.name_scope("roiMean"):
            return tf.reduce_mean(self.roiPooling(layer, boxes), axis=[1,2])

    def getBoxScores(self, boxes):
        with tf.name_scope("getBoxScores"):
            return self.roiMean(self.classMaps, boxes)

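The figure below shows how this works:

The ROI in the figure is the box passed in here. The box coordinates are used to crop self.classMaps, and position-sensitive ROI pooling is applied to the cropped region to pick the class. I have not analyzed this step in detail yet.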

For a detailed explanation of the principle, see the notes listed in the references.

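Remember self.classMaps and self.regressionMap?

self.classMaps is used to compute the scores. Once getBoxScores has processed self.classMaps and the suitable boxes have been selected, refineBoxes applies the same cropping to self.regressionMap to obtain the final positive boxes.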


def getBoxes(self, proposals, proposal_scores, maxOutputs=30, nmsThreshold=0.3, scoreThreshold=0.8):
        if scoreThreshold is None:
            scoreThreshold = 0

        with tf.name_scope("getBoxes"):
            scores = tf.nn.softmax(self.getBoxScores(proposals))
            
            classes = tf.argmax(scores, 1)
            scores = tf.reduce_max(scores, axis=1)
            posIndices = tf.cast(tf.where(tf.logical_and(classes > 0, scores>scoreThreshold)), tf.int32)

            positives, scores, classes = MultiGather.gather([proposals, scores, classes], posIndices)
            positives = self.refineBoxes(positives, False)

            #Final NMS
            posIndices = tf.image.non_max_suppression(positives, scores, iou_threshold=nmsThreshold, max_output_size=maxOutputs)
            posIndices = tf.expand_dims(posIndices, axis=-1)
            positives, scores, classes = MultiGather.gather([positives, scores, classes], posIndices)   
            
            classes = tf.cast(tf.cast(classes,tf.int32) - 1, tf.uint8)

            return positives, scores, classes

At this point the detection pipeline is essentially complete.
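The class selection in getBoxes (softmax, argmax, dropping background and low scores, then shifting the class index by one) can be sketched in NumPy. Box refinement and the final NMS are left out, and the logits below are made up for illustration.

```python
import numpy as np

def select_detections(class_logits, score_threshold=0.8):
    """Return (kept row indices, their scores, their 0-based object classes)."""
    e = np.exp(class_logits - class_logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)   # softmax over C+1 classes
    classes = probs.argmax(axis=1)             # index 0 is background
    scores = probs.max(axis=1)
    keep = np.where((classes > 0) & (scores > score_threshold))[0]
    # Subtract 1 so that object classes start at 0, as in the last step of getBoxes.
    return keep, scores[keep], classes[keep] - 1

# Four proposals, three columns: background, class A, class B (made-up values).
logits = np.array([[5.0, 0.0, 0.0],    # confident background: dropped
                   [0.0, 6.0, 0.0],    # confident class A: kept
                   [0.0, 0.0, 0.5],    # weak class B: below the threshold
                   [0.0, 0.0, 7.0]])   # confident class B: kept
keep, kept_scores, kept_classes = select_detections(logits)
print(keep, kept_classes)
```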

loss

def loss(self, proposals, refBoxes, refClasses):
        with tf.name_scope("BoxRefinementNetworkLoss"):
            proposals = tf.stop_gradient(proposals)
            
            # loss on positive boxes (classification + box regression)
            def getPosLoss(positiveBoxes, positiveRefIndices, nPositive):
                with tf.name_scope("getPosLoss"):
                    positiveRefIndices =  tf.reshape(positiveRefIndices,[-1,1])

                    positiveClasses, positiveRefBoxes = MultiGather.gather([refClasses, refBoxes], positiveRefIndices)
                    positiveClasses = tf.cast(tf.cast(positiveClasses,tf.int8) + 1, tf.uint8)

                    if not self.hardMining:
                        selected = Utils.RandomSelect.randomSelectIndex(tf.shape(positiveBoxes)[0], nPositive)
                        positiveBoxes, positiveClasses, positiveRefBoxes = MultiGather.gather([positiveBoxes, positiveClasses, positiveRefBoxes], selected)

                    return tf.tuple([self.classRefinementLoss(positiveBoxes, positiveClasses) + self.boxRefinementLoss(positiveBoxes, positiveRefBoxes), tf.shape(positiveBoxes)[0]])
            # loss on negative boxes (classification against the background class)
            def getNegLoss(negativeBoxes, nNegative):
                with tf.name_scope("getNetLoss"):
                    if not self.hardMining:
                        negativeIndices = Utils.RandomSelect.randomSelectIndex(tf.shape(negativeBoxes)[0], nNegative)
                        negativeBoxes = tf.gather_nd(negativeBoxes, negativeIndices)

                    return self.classRefinementLoss(negativeBoxes, tf.zeros(tf.stack([tf.shape(negativeBoxes)[0],1]), dtype=tf.uint8))
            
            def getRefinementLoss():
                with tf.name_scope("getRefinementLoss"):
                    iou = BoxUtils.iou(proposals, refBoxes)
                    
                    maxIou = tf.reduce_max(iou, axis=1)
                    bestIou = tf.expand_dims(tf.cast(tf.argmax(iou, axis=1), tf.int32), axis=-1)

                    #Find positive and negative indices based on their IOU
                    posBoxIndices = tf.cast(tf.where(maxIou > self.posIouTheshold), tf.int32)
                    negBoxIndices = tf.cast(tf.where(tf.logical_and(maxIou < self.negIouThesholdHi, maxIou > self.negIouThesholdLo)), tf.int32)

                    #Split the boxes and references
                    posBoxes, posRefIndices = MultiGather.gather([proposals, bestIou], posBoxIndices)
                    negBoxes = tf.gather_nd(proposals, negBoxIndices)

                    #Add GT boxes
                    posBoxes = tf.concat([posBoxes,refBoxes], 0)
                    posRefIndices = tf.concat([posRefIndices, tf.reshape(tf.range(tf.shape(refClasses)[0]), [-1,1])], 0)

                    #Call the loss if the box collection is not empty
                    nPositive = tf.shape(posBoxes)[0]
                    nNegative = tf.shape(negBoxes)[0]

                    if self.hardMining:
                        posLoss = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, 0)[0], lambda: tf.zeros((0,), tf.float32))
                        negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, 0), lambda: tf.zeros((0,), tf.float32))

                        allLoss = tf.concat([posLoss, negLoss], 0)
                        return tf.cond(tf.shape(allLoss)[0]>0, lambda: tf.reduce_mean(Utils.MultiGather.gatherTopK(allLoss, self.nTrainBoxes)), lambda: tf.constant(0.0))
                    else:
                        posLoss, posCount = tf.cond(nPositive > 0, lambda: getPosLoss(posBoxes, posRefIndices, self.nTrainPositives), lambda: tf.tuple([tf.constant(0.0), tf.constant(0,tf.int32)]))
                        negLoss = tf.cond(nNegative > 0, lambda: getNegLoss(negBoxes, self.nTrainBoxes-posCount), lambda: tf.constant(0.0))

                        nPositive = tf.cast(tf.shape(posLoss)[0], tf.float32)
                        nNegative = tf.cond(nNegative > 0, lambda: tf.cast(tf.shape(negLoss)[0], tf.float32), lambda: tf.constant(0.0))
                        
                        return (tf.reduce_mean(posLoss)*nPositive + tf.reduce_mean(negLoss)*nNegative)/(nNegative+nPositive)
    

        return tf.cond(tf.logical_and(tf.shape(proposals)[0] > 0, tf.shape(refBoxes)[0] > 0), lambda: getRefinementLoss(), lambda:tf.constant(0.0))

This is only the loss of the R-FCN head.
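Two ideas in this loss are easy to isolate: splitting proposals into positives and negatives by IoU, and averaging only the largest per-box losses when hard mining is on. The NumPy sketch below uses the thresholds from `__init__` (0.5, and 0.1 to 0.5). It leaves out the step that appends the ground-truth boxes to the positives, and capping k at the number of available losses is an assumption about how gatherTopK behaves.

```python
import numpy as np

def pairwise_iou(a, b):
    """IoU matrix between boxes a (N, 4) and b (M, 4), each row [y1, x1, y2, x2]."""
    y1 = np.maximum(a[:, None, 0], b[None, :, 0])
    x1 = np.maximum(a[:, None, 1], b[None, :, 1])
    y2 = np.minimum(a[:, None, 2], b[None, :, 2])
    x2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(y2 - y1, 0, None) * np.clip(x2 - x1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def split_proposals(proposals, ref_boxes, pos_thr=0.5, neg_hi=0.5, neg_lo=0.1):
    """Positives: best IoU above pos_thr. Negatives: best IoU strictly between neg_lo and neg_hi."""
    iou = pairwise_iou(proposals, ref_boxes)
    max_iou = iou.max(axis=1)
    best_ref = iou.argmax(axis=1)      # ground-truth box matched to each proposal
    pos = np.where(max_iou > pos_thr)[0]
    neg = np.where((max_iou < neg_hi) & (max_iou > neg_lo))[0]
    return pos, best_ref[pos], neg

def hard_mining_loss(per_box_losses, n_train_boxes=128):
    """Mean of the n_train_boxes largest losses (fewer if not enough boxes exist)."""
    k = min(n_train_boxes, per_box_losses.size)
    return np.sort(per_box_losses)[::-1][:k].mean()

proposals = np.array([[0, 0, 10, 10], [0, 0, 10, 4], [50, 50, 60, 60], [0, 0, 10, 8]], dtype=np.float64)
ref_boxes = np.array([[0, 0, 10, 10]], dtype=np.float64)
pos, pos_ref, neg = split_proposals(proposals, ref_boxes)
mined = hard_mining_loss(np.array([0.1, 2.0, 0.5, 1.0]), n_train_boxes=2)
print(pos, pos_ref, neg, mined)
```

Proposals whose best IoU is at most 0.1, like the third box here, are neither positive nor negative and do not contribute to the loss.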

Total loss

Joint training needs the sum of the RPN loss and the R-FCN loss.

def getLoss(self, refBoxes, refClasses):
        return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)

References:

R-FCN paper notes
R-FCN paper reading

RFCN-tensorflow source code