Analysis of the DaSiamRPN test code

The open-sourced DaSiamRPN code does not include the material from Section 3.3 of the paper (see the Zhihu comments). In addition, the released SiamRPNBIG.model doubles the channel counts relative to the model described in the paper, so the amount of computation grows to roughly 4x the original, and the speed therefore cannot be judged against the numbers in the paper. The released project is effectively SiamRPNv2.

SiamRPN and DaSiamRPN are built on AlexNet, while the RPN in Faster R-CNN is built on ZF-Net. Since ZF-Net is a modified AlexNet, DaSiamRPN can be viewed as using an RPN to perform the cross-correlation operation.

SiamRPNBIG

SiamRPNBIG enlarges the search region ($255 \Rightarrow 271$) and doubles the channel counts compared with Siamese-FC and SiamRPN.

[Architecture diagram of SiamRPNBIG: the template branch (127x127x3 → BN-AlexNet → 6x6x512) passes through cls_t / reg_t to produce 4x4x(512x2k) and 4x4x(512x4k) correlation kernels; the detection branch (271x271x3 → BN-AlexNet → 24x24x512) passes through cls_d / reg_d to 22x22x512 feature maps; the cross-correlations (*) yield 19x19x2k classification scores (361·k outputs, followed by a softmax) and 19x19x4k regression outputs, the latter followed by a 1x1 convolution (regress_adjust).]

Network computation (SiamRPNBIG)

| Type / Stride   | Filter Shape | Input Size  | GFLOPs |
|-----------------|--------------|-------------|--------|
| Conv1 / s2      | 11x11x3      | 271x271x3   | 2.23   |
| MaxPool1 / s2   | 3x3          | 131x131x192 |        |
| Conv2           | 5x5x192      | 65x65x192   | 17.03  |
| MaxPool2 / s2   | 3x3          | 61x61x512   |        |
| Conv3           | 3x3x512      | 30x30x512   | 5.17   |
| Conv4           | 3x3x768      | 28x28x768   | 4.46   |
| Conv5           | 3x3x512      | 26x26x512   | 2.53   |
| reg_d           | 3x3x512      | 24x24x512   | 2.13   |
| cls_d           | 3x3x512      | 24x24x512   | 2.13   |
| Correlation_reg | 4x4x512      | 22x22x512   | 0.11   |
| Correlation_cls | 4x4x512      | 22x22x512   | 0.06   |
| regress_adjust  | 1x1x20       | 19x19x20    | 0.0003 |
| Total           |              |             | 35.9   |
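As a rough cross-check of the table, the cost of a convolution layer can be estimated as 2·K²·C_in·C_out·H_out·W_out (counting one multiply-add as two FLOPs). Below is a minimal sketch using that convention; the table's exact figures may differ slightly depending on how output sizes and multiply-adds are counted.

```python
def conv_gflops(k, c_in, c_out, h_out, w_out):
    # one multiply-add counted as 2 FLOPs
    return 2.0 * k * k * c_in * c_out * h_out * w_out / 1e9

# Conv1: 11x11x3 filters, 192 output maps, stride 2 on a 271x271x3 input -> 131x131 output
print(conv_gflops(11, 3, 192, 131, 131))  # ~2.4, same order as the table entry
```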

torch.nn.Sequential is a sequential container: modules are added to it in the order they are passed to the constructor; alternatively, an ordered dict of modules can be passed in.

    def __init__(self, feat_in=512, feature_out=512, anchor=5):
        super(SiamRPNBIG, self).__init__()
        self.anchor = anchor
        self.feature_out = feature_out
        self.featureExtract = nn.Sequential(
            nn.Conv2d(3, 192, 11, stride=2),
            nn.BatchNorm2d(192),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(192, 512, 5),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(512, 768, 3),
            nn.BatchNorm2d(768),
            nn.ReLU(inplace=True),
            nn.Conv2d(768, 768, 3),
            nn.BatchNorm2d(768),
            nn.ReLU(inplace=True),
            nn.Conv2d(768, 512, 3),
            nn.BatchNorm2d(512),
        )
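As a quick sanity check of the feature-map sizes in the diagram and table above, one can push dummy inputs through featureExtract (a sketch assuming the class is importable from net and a PyTorch version that provides torch.no_grad):

```python
import torch
from net import SiamRPNBIG

net = SiamRPNBIG()
with torch.no_grad():
    print(net.featureExtract(torch.zeros(1, 3, 127, 127)).shape)  # torch.Size([1, 512, 6, 6])
    print(net.featureExtract(torch.zeros(1, 3, 271, 271)).shape)  # torch.Size([1, 512, 24, 24])
```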

conv_r1 is the box-regression head of the template branch and conv_cls1 its classification head; conv_r2 and conv_cls2 act on the detection branch. All of them are plain 3x3 convolutions. The initialization of self.r1_kernel deserves attention: it starts as an empty list and is later assigned a plain Tensor in temple(), i.e. it is not a learned weight loaded with the model but is computed from the template at tracking time. For comparison, see MathsXDC/DaSiamRPNWithOfflineTraining.

regress_adjust is a 1x1 convolution that is not mentioned in the paper.

        self.conv_r1 = nn.Conv2d(feat_in, feature_out*4*anchor, 3)
        self.conv_r2 = nn.Conv2d(feat_in, feature_out, 3)
        self.conv_cls1 = nn.Conv2d(feat_in, feature_out*2*anchor, 3)
        self.conv_cls2 = nn.Conv2d(feat_in, feature_out, 3)
        self.regress_adjust = nn.Conv2d(4*anchor, 4*anchor, 1)

        self.r1_kernel = []
        self.cls1_kernel = []

temple

Because the two branches of the Siamese network run asynchronously during tracking (the template is processed only once), SiamRPNBIG defines a dedicated temple function for the template branch.

Call featureExtract to extract the features of the template z.
From z_f, conv_r1 and conv_cls1 produce the regression kernel r1_kernel_raw and the classification kernel cls1_kernel_raw.
kernel_size is the spatial size of the template kernel.
torch.Tensor.view returns a new tensor with the same data as the self tensor but a different size.
Reshape r1_kernel and cls1_kernel:
$[\mathrm{feature\_out}*4*\mathrm{anchor},\ \mathrm{kernel\_size},\ \mathrm{kernel\_size}] \Rightarrow [\mathrm{anchor}*4,\ \mathrm{feature\_out},\ \mathrm{kernel\_size},\ \mathrm{kernel\_size}]$
The template branch output thus serves as the convolution kernels for the detection branch.
How would this be handled for batched training? (See the sketch after the code below.)

        z_f = self.featureExtract(z)                 # template features, 1x512x6x6
        r1_kernel_raw = self.conv_r1(z_f)            # 1x(512*4*anchor)x4x4
        cls1_kernel_raw = self.conv_cls1(z_f)        # 1x(512*2*anchor)x4x4
        kernel_size = r1_kernel_raw.data.size()[-1]  # 4
        self.r1_kernel = r1_kernel_raw.view(self.anchor*4, self.feature_out, kernel_size, kernel_size)
        self.cls1_kernel = cls1_kernel_raw.view(self.anchor*2, self.feature_out, kernel_size, kernel_size)
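To the question above: the released code only supports a single template, because F.conv2d treats the reshaped template output as ordinary convolution weights. One common way to batch the cross-correlation, used in later SiamRPN-style training code, is a grouped convolution with one group per sample. The following is a minimal sketch of that idea, not part of this repository:

```python
import torch
import torch.nn.functional as F

def batched_xcorr(x, kernel):
    """Per-sample cross-correlation via grouped convolution.

    x:      (N, C, Hx, Wx) detection-branch features
    kernel: (N, C_out, C, Hk, Wk) kernels produced from each sample's template
    """
    n, c_out, c, hk, wk = kernel.shape
    x = x.view(1, n * c, x.size(2), x.size(3))    # fold the batch into the channel dim
    kernel = kernel.view(n * c_out, c, hk, wk)    # stack the per-sample kernels
    out = F.conv2d(x, kernel, groups=n)           # one group per sample
    return out.view(n, c_out, out.size(2), out.size(3))
```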

forward

torch.nn.Module.forward defines the computation performed at every call and should be overridden by all subclasses.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of calling forward directly, since the former takes care of running the registered hooks while the latter silently ignores them.

[Flowchart of forward: x → featureExtract → conv_r2 / conv_cls2 → F.conv2d with r1_kernel / cls1_kernel → regress_adjust → delta, score]

torch.nn.functional.conv2d applies a 2D convolution over an input image composed of several input planes.
When a layer has learnable parameters, torch.nn.Module is normally used; torch.nn.functional is the stateless interface that takes the weights as an explicit tensor argument. It is used here because r1_kernel and cls1_kernel are not fixed layer parameters but tensors computed from the template at tracking time.

The regression output is then passed through the regress_adjust 1x1 convolution.

        x_f = self.featureExtract(x)
        return self.regress_adjust(F.conv2d(self.conv_r2(x_f), self.r1_kernel)), \
               F.conv2d(self.conv_cls2(x_f), self.cls1_kernel)

vot_SiamRPN.py

vot_SiamRPN.py is the main tracking entry point. Following the usual convention, it calls the two functions SiamRPN_init and SiamRPN_track.

[Flowchart of vot_SiamRPN.py: SiamRPNBIG → load_state_dict → net.eval → vot.VOT → handle.region → get_axis_aligned_bbox → handle.frame → cv2.imread → SiamRPN_init → SiamRPN_track → cxy_wh_2_rect → Rectangle → handle.report]

import vot
from vot import Rectangle
import sys
import cv2  # imread
import torch
import numpy as np
from os.path import realpath, dirname, join

from net import SiamRPNBIG
from run_SiamRPN import SiamRPN_init, SiamRPN_track
from utils import get_axis_aligned_bbox, cxy_wh_2_rect

Create the network and load the model.
eval() puts the module into evaluation mode. cuda(device=None) moves all model parameters and buffers to the GPU. torch.load loads an object saved with torch.save() from a file. load_state_dict() copies the parameters and buffers from a state_dict into this module and its descendants.

# load net
net_file = join(realpath(dirname(__file__)), 'SiamRPNBIG.model')
net = SiamRPNBIG()
net.load_state_dict(torch.load(net_file))
net.eval().cuda()

Warm up: run the template and detection branches 10 times.

# warm up
for i in range(10):
    net.temple(torch.autograd.Variable(torch.FloatTensor(1, 3, 127, 127)).cuda())
    net(torch.autograd.Variable(torch.FloatTensor(1, 3, 255, 255)).cuda())

Start tracking.
Create a VOT object; VOT is the base class for Python VOT integration.
region sends the configuration message to the client and receives the initialization region and the path of the first image; its return value is the initialization region.
get_axis_aligned_bbox converts the region into the axis-aligned [cx, cy, w, h] format expected by the RPN tracker.

# start to track
handle = vot.VOT("polygon")
Polygon = handle.region()
cx, cy, w, h = get_axis_aligned_bbox(Polygon)

The frame function obtains a frame (image path) from the client.

image_file = handle.frame()
if not image_file:
    sys.exit(0)

Read the image and initialize the tracker. SiamRPN_init builds the state dictionary and runs the template branch.

target_pos, target_sz = np.array([cx, cy]), np.array([w, h])
im = cv2.imread(image_file)  # HxWxC
state = SiamRPN_init(im, target_pos, target_sz, net)  # init tracker

Enter the tracking loop.
SiamRPN_track runs the detection branch and updates the state variables.
cxy_wh_2_rect converts the center/size coordinates into the rectangle representation.
report reports the tracking result back to the client.

while True:
    image_file = handle.frame()
    if not image_file:
        break
    im = cv2.imread(image_file)  # HxWxC
    state = SiamRPN_track(state, im)  # track
    res = cxy_wh_2_rect(state['target_pos'], state['target_sz'])

    handle.report(Rectangle(res[0], res[1], res[2], res[3]))

SiamRPN_init

[Flowchart of SiamRPN_init: im, target_pos, target_sz, net → TrackerConfig → generate_anchor → get_subwindow_tracking → temple → state]

The TrackerConfig class defines the tracker parameters.

    state = dict()
    p = TrackerConfig()
    state['im_h'] = im.shape[0]
    state['im_w'] = im.shape[1]

Adjust the search-region size according to the ratio of the target area to the image area.

    if ((target_sz[0] * target_sz[1]) / float(state['im_h'] * state['im_w'])) < 0.004:
        p.instance_size = 287  # small object big search region
    else:
        p.instance_size = 271

Compute the score-map size from the network's total stride.
generate_anchor builds an anchor matrix in [cx, cy, w, h] format with the image center as the origin.

    p.score_size = (p.instance_size - p.exemplar_size) / p.total_stride + 1

    p.anchor = generate_anchor(p.total_stride, p.scales, p.ratios, p.score_size)

    avg_chans = np.mean(im, axis=(0, 1))

p.context_amount * sum(target_sz) is the context padding. wc_z and hc_z are the padded width and height, and s_z is the equivalent side length.

    wc_z = target_sz[0] + p.context_amount * sum(target_sz)
    hc_z = target_sz[1] + p.context_amount * sum(target_sz)
    s_z = round(np.sqrt(wc_z * hc_z))
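For intuition: with the defaults, score_size = (271 - 127)/8 + 1 = 19 (or (287 - 127)/8 + 1 = 21 for the enlarged search region). For a hypothetical 100x80 target, wc_z = 100 + 0.5·180 = 190 and hc_z = 80 + 0.5·180 = 170, so s_z = round(√(190·170)) = 180: the exemplar crop is a 180x180 square around the target center, later resized to 127x127.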

Initialize the exemplar.
get_subwindow_tracking pads the image if necessary and crops out the target region.

    # initialize the exemplar
    z_crop = get_subwindow_tracking(im, target_pos, p.exemplar_size, s_z, avg_chans)

torch.autograd.Variable wraps a tensor and records the operations applied to it.
Run the temple function to compute the template kernels.

    z = Variable(z_crop.unsqueeze(0))
    net.temple(z.cuda())

Two kinds of windows are supported.

    if p.windowing == 'cosine':
        window = np.outer(np.hanning(p.score_size), np.hanning(p.score_size))
    elif p.windowing == 'uniform':
        window = np.ones((p.score_size, p.score_size))
    window = np.tile(window.flatten(), p.anchor_num)

Record and return the state.

    state['p'] = p
    state['net'] = net
    state['avg_chans'] = avg_chans
    state['window'] = window
    state['target_pos'] = target_pos
    state['target_sz'] = target_sz
    return state

TrackerConfig

The exemplar size is 127 and the search-region size is 271. There are 5 aspect ratios and only 1 scale.

    # These are the default hyper-params for DaSiamRPN 0.3827
    windowing = 'cosine'  # to penalize large displacements [cosine/uniform]
    # Params from the network architecture, have to be consistent with the training
    exemplar_size = 127  # input z size
    instance_size = 271  # input x size (search region)
    total_stride = 8
    score_size = (instance_size-exemplar_size)/total_stride+1
    context_amount = 0.5  # context amount for the exemplar
    ratios = [0.33, 0.5, 1, 2, 3]
    scales = [8, ]
    anchor_num = len(ratios) * len(scales)
    anchor = []
    penalty_k = 0.055
    window_influence = 0.42
    lr = 0.295
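With these defaults, anchor_num = 5 and score_size = (271 - 127)/8 + 1 = 19, so each frame is scored over 19 · 19 · 5 = 1805 anchors; this is the 361·k quantity shown in the architecture diagram above (361 = 19²).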

SiamRPN_track

[Flowchart of SiamRPN_track: get_subwindow_tracking → tracker_eval]

Fetch the required variables from state.

    p = state['p']
    net = state['net']
    avg_chans = state['avg_chans']
    window = state['window']
    target_pos = state['target_pos']
    target_sz = state['target_sz']

Compute the context-padded size and the search-region size. (Note that wc_z and hc_z here swap target_sz[0] and target_sz[1] relative to SiamRPN_init; since only their product enters s_z, the result is unaffected.)

    wc_z = target_sz[1] + p.context_amount * sum(target_sz)
    hc_z = target_sz[0] + p.context_amount * sum(target_sz)
    s_z = np.sqrt(wc_z * hc_z)
    scale_z = p.exemplar_size / s_z
    d_search = (p.instance_size - p.exemplar_size) / 2
    pad = d_search / scale_z
    s_x = s_z + 2 * pad
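Written out, the search-region side length in the original image is just the exemplar crop scaled by the ratio of the two network input sizes:
$s_x = s_z + 2\cdot\dfrac{(\mathrm{instance\_size}-\mathrm{exemplar\_size})/2}{\mathrm{exemplar\_size}/s_z} = s_z\cdot\dfrac{\mathrm{instance\_size}}{\mathrm{exemplar\_size}}$
so resizing the s_x crop to instance_size applies the same scale factor scale_z as the exemplar crop.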

Extract a scaled crop for the search region x at the previous target position.

    # extract scaled crops for search region x at previous target position
    x_crop = Variable(get_subwindow_tracking(im, target_pos, p.instance_size, round(s_x), avg_chans).unsqueeze(0))

tracker_eval predicts the new position and score.

    target_pos, target_sz, score = tracker_eval(net, x_crop.cuda(), target_pos, target_sz * scale_z, window, scale_z, p)
    target_pos[0] = max(0, min(state['im_w'], target_pos[0]))
    target_pos[1] = max(0, min(state['im_h'], target_pos[1]))
    target_sz[0] = max(10, min(state['im_w'], target_sz[0]))
    target_sz[1] = max(10, min(state['im_h'], target_sz[1]))
    state['target_pos'] = target_pos
    state['target_sz'] = target_sz
    state['score'] = score
    return state

tracker_eval

Run the detection branch of the network to obtain the regression deltas and classification scores.

    delta, score = net(x_crop)

torch.Tensor.permute permutes the dimensions of this tensor.
torch.Tensor.contiguous returns a contiguous tensor containing the same data as the self tensor; if the self tensor is already contiguous, it is returned directly.
torch.Tensor.numpy returns the self tensor as a NumPy ndarray. The tensor and the returned ndarray share the same underlying storage, so changes to one are reflected in the other.

Permute delta: its shape goes from N x 4k x H x W to 4 x (k·19·19) (with the default 271 input, the score map is 19x19). score has shape 2 x (k·19·19), and only the foreground row (data[1, :]) after the softmax is kept.

    delta = delta.permute(1, 2, 3, 0).contiguous().view(4, -1).data.cpu().numpy()
    score = F.softmax(score.permute(1, 2, 3, 0).contiguous().view(2, -1), dim=0).data[1, :].cpu().numpy()
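A quick shape check with dummy tensors (assuming k = 5 anchors and the 19x19 score map):

```python
import torch

delta = torch.zeros(1, 4 * 5, 19, 19)  # N x 4k x H x W
print(delta.permute(1, 2, 3, 0).contiguous().view(4, -1).shape)  # torch.Size([4, 1805])

score = torch.zeros(1, 2 * 5, 19, 19)
print(score.permute(1, 2, 3, 0).contiguous().view(2, -1).shape)  # torch.Size([2, 1805])
```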

$$\begin{aligned} x_i^{pro} &= x_i^{an} + dx_l^{reg}\ast w_l^{an}\\ y_j^{pro} &= y_j^{an} + dy_l^{reg}\ast h_l^{an}\\ w_l^{pro} &= w_l^{an}\ast e^{dw_l}\\ h_l^{pro} &= h_l^{an}\ast e^{dh_l} \end{aligned}$$
Since p.anchor[:, 0] and p.anchor[:, 1] are coordinates relative to the image center, delta[0, :] and delta[1, :] give the offset of the proposal center relative to the previous frame's target center, while delta[2, :] and delta[3, :] are the predicted width and height. (The naming is a little confusing, since delta is overwritten in place with the decoded proposals.)

    delta[0, :] = delta[0, :] * p.anchor[:, 2] + p.anchor[:, 0]
    delta[1, :] = delta[1, :] * p.anchor[:, 3] + p.anchor[:, 1]
    delta[2, :] = np.exp(delta[2, :]) * p.anchor[:, 2]
    delta[3, :] = np.exp(delta[3, :]) * p.anchor[:, 3]

$$\mathrm{penalty} = e^{k\ast \max\left(\frac{r}{r'},\frac{r'}{r}\right)\ast \max\left(\frac{s}{s'},\frac{s'}{s}\right)}$$
$$(w + p)\times(h + p) = s^2,\qquad p = \frac{w + h}{2}$$

Size penalty. sz and sz_wh compute the equivalent side length for the two input formats (separate w/h arrays versus a single [w, h] pair). Note that the code below realizes the penalty as np.exp(-(r_c * s_c - 1.) * p.penalty_k), so proposals whose size or aspect ratio deviate from the previous frame are down-weighted.

    def change(r):
        return np.maximum(r, 1./r)

    def sz(w, h):
        pad = (w + h) * 0.5
        sz2 = (w + pad) * (h + pad)
        return np.sqrt(sz2)

    def sz_wh(wh):
        pad = (wh[0] + wh[1]) * 0.5
        sz2 = (wh[0] + pad) * (wh[1] + pad)
        return np.sqrt(sz2)
        
    # size penalty
    s_c = change(sz(delta[2, :], delta[3, :]) / (sz_wh(target_sz)))  # scale penalty
    r_c = change((target_sz[0] / target_sz[1]) / (delta[2, :] / delta[3, :]))  # ratio penalty
    
    penalty = np.exp(-(r_c * s_c - 1.) * p.penalty_k)
    pscore = penalty * score

pscore is blended with the window prior using the weight window_influence, and the index of the best score is taken.

    # window float
    pscore = pscore * (1 - p.window_influence) + window * p.window_influence
    best_pscore_id = np.argmax(pscore)

Obtain the target's position and size.
delta is divided by scale_z to map back to the original image coordinates.

    target = delta[:, best_pscore_id] / scale_z
    target_sz = target_sz / scale_z
    lr = penalty[best_pscore_id] * score[best_pscore_id] * p.lr

The new target center is the previous position plus the predicted offset; the width and height are updated with an exponential moving average whose rate lr depends on the penalty and score.

    res_x = target[0] + target_pos[0]
    res_y = target[1] + target_pos[1]

    res_w = target_sz[0] * (1 - lr) + target[2] * lr
    res_h = target_sz[1] * (1 - lr) + target[3] * lr
    target_pos = np.array([res_x, res_y])
    target_sz = np.array([res_w, res_h])
    return target_pos, target_sz, score[best_pscore_id]

VOT

Base class for the Python VOT integration. Two region formats are supported. If trax is not available, the image paths are read line by line from 'images.txt' and the first-frame region from the first line of 'region.txt'. parse_region parses the coordinates from the input string.

    """ Base class for Python VOT integration """
    def __init__(self, region_format):
        """ Constructor
        Args:
            region_format: Region format options
        """
        assert(region_format in ['rectangle', 'polygon'])
        if TRAX:
            options = trax.server.ServerOptions(region_format, trax.image.PATH)
            self._trax = trax.server.Server(options)

            request = self._trax.wait()
            assert(request.type == 'initialize')
            if request.region.type == 'polygon':
                self._region = Polygon([Point(x[0], x[1]) for x in request.region.points])
            else:
                self._region = Rectangle(request.region.x, request.region.y, request.region.width, request.region.height)
            self._image = str(request.image)
            self._trax.status(request.region)
        else:
            self._files = [x.strip('\n') for x in open('images.txt', 'r').readlines()]
            self._frame = 0
            self._region = convert_region(parse_region(open('region.txt', 'r').readline()), region_format)
            self._result = []

region

Return the initialization region.


        return self._region

report

Record the result.

        assert(isinstance(region, Rectangle) or isinstance(region, Polygon))
        if TRAX:
            if isinstance(region, Polygon):
                tregion = trax.region.Polygon([(x.x, x.y) for x in region.points])
            else:
                tregion = trax.region.Rectangle(region.x, region.y, region.width, region.height)
            self._trax.status(tregion, {"confidence" : confidence})
        else:
            self._result.append(region)
            self._frame += 1

frame

Returns None if the frame index is beyond the file list, otherwise the path of the corresponding frame.

        if TRAX:
            if hasattr(self, "_image"):
                image = str(self._image)
                del self._image
                return image

            request = self._trax.wait()

            if request.type == 'frame':
                return str(request.image)
            else:
                return None

        else:
            if self._frame >= len(self._files):
                return None
            return self._files[self._frame]

namedtuple() is a factory function for tuples with named fields. Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of by position index.

parse_region

The coordinate string is split on commas and parsed into floats (cf. the Stack Overflow question "Split Strings with Multiple Delimiters?"). Note that this is Python 2 code: map returns a list and xrange is used.

    tokens = map(float, string.split(','))
    if len(tokens) == 4:
        return Rectangle(tokens[0], tokens[1], tokens[2], tokens[3])
    elif len(tokens) % 2 == 0 and len(tokens) > 4:
        return Polygon([Point(tokens[i],tokens[i+1]) for i in xrange(0,len(tokens),2)])
    return None
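Under Python 3 this snippet would break, since map returns an iterator and xrange no longer exists. A minimal Python 3 equivalent could look as follows (assuming the same Rectangle, Point and Polygon namedtuples from vot.py):

```python
def parse_region(string):
    tokens = list(map(float, string.split(',')))
    if len(tokens) == 4:
        return Rectangle(tokens[0], tokens[1], tokens[2], tokens[3])
    elif len(tokens) % 2 == 0 and len(tokens) > 4:
        return Polygon([Point(tokens[i], tokens[i + 1]) for i in range(0, len(tokens), 2)])
    return None
```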

get_subwindow_tracking

Get the image and target information.

    if isinstance(pos, float):
        pos = [pos, pos]
    sz = original_sz
    im_sz = im.shape

original_sz and sz are used interchangeably here?
The +1 in (original_sz + 1) / 2 implements rounding (round half up) of the half size.
pos is the absolute coordinate of the target center, and c is the distance from the center to the border of the context-expanded crop.
left_pad = int(max(0., 0 - context_xmin)) checks whether the crop extends beyond the image and measures the overshoot (likewise for the other three sides).

    c = (original_sz+1) / 2
    context_xmin = round(pos[0] - c)  # floor(pos(2) - sz(2) / 2);
    context_xmax = context_xmin + sz - 1
    context_ymin = round(pos[1] - c)  # floor(pos(1) - sz(1) / 2);
    context_ymax = context_ymin + sz - 1
    left_pad = int(max(0., -context_xmin))
    top_pad = int(max(0., -context_ymin))
    right_pad = int(max(0., context_xmax - im_sz[1] + 1))
    bottom_pad = int(max(0., context_ymax - im_sz[0] + 1))
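A worked example with hypothetical numbers: for a 640x480 image, pos = (40, 240) and original_sz = 181, we get c = 91, context_xmin = round(40 - 91) = -51 and context_xmax = -51 + 181 - 1 = 129, so left_pad = 51 while the other three pads are 0; the crop is then taken from the mean-padded te_im built below.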

Padding shifts the origin, so the crop coordinates are re-expressed in the padded image.

    context_xmin = context_xmin + left_pad
    context_xmax = context_xmax + left_pad
    context_ymin = context_ymin + top_pad
    context_ymax = context_ymax + top_pad

If padding is required, te_im is allocated first, the image and the channel means are written into the corresponding regions, and the crop is finally assigned to im_patch_original.

    # zzp: a more easy speed version
    r, c, k = im.shape
    if any([top_pad, bottom_pad, left_pad, right_pad]):
        te_im = np.zeros((r + top_pad + bottom_pad, c + left_pad + right_pad, k), np.uint8)  # 0 is better than 1 initialization
        te_im[top_pad:top_pad + r, left_pad:left_pad + c, :] = im
        if top_pad:
            te_im[0:top_pad, left_pad:left_pad + c, :] = avg_chans
        if bottom_pad:
            te_im[r + top_pad:, left_pad:left_pad + c, :] = avg_chans
        if left_pad:
            te_im[:, 0:left_pad, :] = avg_chans
        if right_pad:
            te_im[:, c + left_pad:, :] = avg_chans
        im_patch_original = te_im[int(context_ymin):int(context_ymax + 1), int(context_xmin):int(context_xmax + 1), :]
    else:
        im_patch_original = im[int(context_ymin):int(context_ymax + 1), int(context_xmin):int(context_xmax + 1), :]

If the size of the original patch differs from the model input size, OpenCV's resize is called.

    if not np.array_equal(model_sz, original_sz):
        im_patch = cv2.resize(im_patch_original, (model_sz, model_sz))  # zzp: use cv to get a better speed
    else:
        im_patch = im_patch_original

im_to_torch only rearranges the axes; the BGR channel order and the value range (0-255) are unchanged.

    return im_to_torch(im_patch) if out_mode in 'torch' else im_patch

generate_anchor

Build the anchor array.
size (total_stride², i.e. 64) would perhaps be easier to understand if it were named after the receptive field / base anchor area. The single scale of 8 has to be designed carefully to match the input sizes.

    anchor_num = len(ratios) * len(scales)
    anchor = np.zeros((anchor_num, 4),  dtype=np.float32)
    size = total_stride * total_stride
    count = 0
    for ratio in ratios:
        ws = int(np.sqrt(size / ratio))
        hs = int(ws * ratio)
        for scale in scales:
            wws = ws * scale
            hhs = hs * scale
            anchor[count, 0] = 0
            anchor[count, 1] = 0
            anchor[count, 2] = wws
            anchor[count, 3] = hhs
            count += 1
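With the defaults (total_stride = 8, scales = [8], ratios = [0.33, 0.5, 1, 2, 3]), size = 64 and the loop above produces five base anchors of 104x32, 88x40, 64x64, 40x80 and 32x96 pixels (w x h), all centered at the origin.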

Broadcast the anchor set over all score-map positions and set their center coordinates.
After adding the ori offset, xx and yy are expressed with the image center as the origin.

    anchor = np.tile(anchor, score_size * score_size).reshape((-1, 4))
    ori = - (score_size / 2) * total_stride
    xx, yy = np.meshgrid([ori + total_stride * dx for dx in range(score_size)],
                         [ori + total_stride * dy for dy in range(score_size)])
    xx, yy = np.tile(xx.flatten(), (anchor_num, 1)).flatten(), \
             np.tile(yy.flatten(), (anchor_num, 1)).flatten()
    anchor[:, 0], anchor[:, 1] = xx.astype(np.float32), yy.astype(np.float32)
    return anchor

References:

  • ECCV视觉目标跟踪之DaSiamRPN
  • 2018CVPR之siameseRPN
  • [深度学习] [目标跟踪] Siamese-RPN论文阅读笔记
  • 【目标跟踪】SiameseRPN:High Performance Visual Tracking with Siamese Region Proposal Network
  • zkisthebest/Siamese-RPN
  • Understanding AlexNet
  • 深度学习经典卷积神经网络之AlexNet
  • 经典CNN网络 - AlexNet总结
  • 卷积神经网络模型解读汇总——LeNet5,AlexNet、ZFNet、VGG16、GoogLeNet和ResNet
  • faster-rcnn 之 RPN网络的结构解析
  • [CVPR2017] CFNet 论文解读
  • cnn模型所需的计算力(flops)是怎么计算的? - chen liu的回答
  • dongfangduoshou123/DaSiamRPN-Caffe2
  • MathsXDC/DaSiamRPNWithOfflineTraining
