A Collection of ICCV 2019 Visual Object Tracking Algorithm Pipelines

Table of Contents

  • 1. ARCF: "Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking"
  • 2. MLT: "Deep Meta Learning for Real-Time Target-Aware Visual Tracking"
  • 3. "Joint Monocular 3D Vehicle Detection and Tracking"
  • 4. "'Skimming-Perusal' Tracking: A Framework for Real-Time and Robust Long-term Tracking"
  • 5. DiMP: "Learning Discriminative Model Prediction for Tracking"
  • 6. "Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking"
  • 7. GradNet: "GradNet: Gradient-Guided Network for Visual Object Tracking"
  • 8. "Bridging the Gap Between Detection and Tracking: A Unified Approach"
  • 9. "Physical Adversarial Textures That Fool Visual Object Tracking"
  • 10. CDTB: "CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark"


1. ARCF: “Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking”

[Read online]
Figure 1. Comparison between the background-aware correlation filter (BACF) and the proposed ARCF tracker. The central figure demonstrates the differences between the previous and current response maps on the group1_1 sequence from UAV123@10fps. Sudden changes in the response maps indicate aberrances. When aberrances take place, BACF tends to lose track of the object, while the proposed ARCF can repress aberrances so that this kind of drifting is avoided.

Figure 2. Main workflow of the proposed ARCF tracker. It learns from both positive samples (green) of the object and negative samples (red) extracted from the background, and a response-map restriction is integrated into the learning process so that aberrances in response maps can be repressed. $\left[\psi_{p, q}\right]$ serves to shift the generated response map so that the peak position in the previous frame coincides with that of the current frame, and thus the position of the detected object does not affect the restriction.
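
To make the restriction concrete, here is a minimal sketch (not the authors' code) of how the aberrance term can be computed between consecutive response maps, with `np.roll` standing in for the cyclic shift $\left[\psi_{p, q}\right]$:

```python
import numpy as np

def aberrance_penalty(prev_resp, curr_resp):
    """Aberrance term between two response maps (illustrative sketch)."""
    # Peak locations of the previous and current response maps.
    p_prev = np.unravel_index(np.argmax(prev_resp), prev_resp.shape)
    p_curr = np.unravel_index(np.argmax(curr_resp), curr_resp.shape)
    # [psi_{p,q}]: cyclic shift aligning the previous peak with the current one,
    # so the detected object's position does not affect the restriction.
    dy, dx = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    shifted_prev = np.roll(prev_resp, shift=(dy, dx), axis=(0, 1))
    # Squared L2 distance between the aligned maps; large values flag aberrances.
    return float(np.sum((shifted_prev - curr_resp) ** 2))
```

In the full ARCF objective this term is weighted and solved jointly with the filter via ADMM; the snippet only illustrates the restriction itself.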




2. MLT: “Deep Meta Learning for Real-Time Target-Aware Visual Tracking”

[Read online]
Figure 1: Motivation of the proposed visual tracker. Our framework incorporates a meta-learner network along with a matching network. The meta-learner network receives meta information from the matching network and provides the matching network with the adaptive target-specific feature space needed for robust matching and tracking.

Figure 2: Overview of the proposed visual tracking framework. The matching network provides the meta-learner network with meta-information in the form of loss gradients obtained using the training samples. Then the meta-learner network provides the matching network with target-specific information in the form of convolutional kernels and channel-wise attention.
Figure 3: Training scheme of the meta-learner network. The meta-learner network uses the loss gradients $\delta$ in (2) as meta information, derived from the matching network, which describes its own status in the current feature space [35]. Then the function $g(\cdot)$ in (3) learns the mapping from this loss gradient to adaptive weights $w_{target}$, which describe the target-specific feature space. The meta-learner network can be trained by minimizing the loss function in (7), which measures how accurately the adaptive weights $w_{target}$ fit new examples $\{z_1, \dots, z_{M'}\}$.




3. “Joint Monocular 3D Vehicle Detection and Tracking”

[Read online] [code]
Figure 1: Joint online detection and tracking in 3D. Our dynamic 3D tracking pipeline predicts 3D bounding box association of observed vehicles in image sequences captured by a monocular camera with an ego-motion sensor.

Figure 2: Overview of our monocular 3D tracking framework. Our online approach processes monocular frames to estimate and track regions of interest (RoIs) in 3D (a). For each RoI, we learn 3D layout estimation (i.e., depth, orientation, dimension, and a projection of the 3D center) (b). With the 3D layout, our LSTM tracker produces robust linking across frames, leveraging occlusion-aware association and depth-ordering matching (c). With the help of 3D tracking, the model further refines its 3D estimation by fusing object motion features from previous frames (d).

Figure 3: Illustration of depth-ordering matching. Given the tracklets and detections, we sort them into a list by depth order. For each detection of interest (DOI), we calculate the IOU between the DOI and the non-occluded regions of each tracklet. The depth order naturally assigns higher probabilities to tracklets near the DOI.
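
The following sketch illustrates the idea on axis-aligned boxes rasterized onto a coarse pixel grid; the data layout and hard visibility masks are assumptions for illustration only:

```python
import numpy as np

def depth_ordering_overlap(doi_box, tracklets, canvas=(480, 640)):
    """For each tracklet, IOU between the DOI and the tracklet's
    non-occluded region, where occlusion comes from nearer tracklets.

    doi_box: (x1, y1, x2, y2) in pixels; tracklets: list of (box, depth).
    """
    def box_mask(box):
        m = np.zeros(canvas, dtype=bool)
        x1, y1, x2, y2 = (int(v) for v in box)
        m[y1:y2, x1:x2] = True
        return m

    def mask_iou(a, b):
        union = np.logical_or(a, b).sum()
        return float(np.logical_and(a, b).sum() / union) if union else 0.0

    doi_mask = box_mask(doi_box)
    occupied = np.zeros(canvas, dtype=bool)  # pixels claimed by nearer objects
    scores = {}
    # Walk the depth-ordered list from nearest to farthest.
    for idx, (box, depth) in sorted(enumerate(tracklets), key=lambda t: t[1][1]):
        visible = np.logical_and(box_mask(box), ~occupied)
        scores[idx] = mask_iou(doi_mask, visible)
        occupied |= box_mask(box)
    return scores
```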

Figure 4: Illustration of occlusion-aware association. A tracked tracklet (yellow) is visible all the time, while another tracklet (red) is occluded by a third (blue) at frame $T-1$. During occlusion, the tracklet does not update its state but keeps inferring its motion until reappearance. A truncated or disappeared tracklet (blue at frame $T$) is left as lost.
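
A minimal state-machine sketch of this association logic (the state names and constant-velocity model are assumptions; the paper uses an LSTM for motion):

```python
from dataclasses import dataclass

TRACKED, OCCLUDED, LOST = "tracked", "occluded", "lost"

@dataclass
class Tracklet:
    center: tuple                 # (x, y) in pixels
    velocity: tuple = (0.0, 0.0)
    state: str = TRACKED

    def step(self, matched_center=None, occluded=False, truncated=False):
        if truncated:
            # Blue tracklet at frame T in Figure 4: leave it as lost.
            self.state = LOST
        elif occluded:
            # Red tracklet at frame T-1: keep inferring motion,
            # but do not update the stored velocity/appearance state.
            self.state = OCCLUDED
            self.center = (self.center[0] + self.velocity[0],
                           self.center[1] + self.velocity[1])
        elif matched_center is not None:
            # Visible and matched to a detection: update the state normally.
            self.velocity = (matched_center[0] - self.center[0],
                             matched_center[1] - self.center[1])
            self.center = matched_center
            self.state = TRACKED
```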




4. “‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking”

[Read online] [Source code] [sinat_31184961's paper notes]
Figure 1. Our ‘Skimming-Perusal’ long-term tracking framework. Better viewed in color with zoom-in.

Figure 2. The adopted SiameseRPN module in our framework. Better viewed in color with zoom-in.

Figure 4. The network architecture of our skimming module. Better viewed in color with zoom-in.




5. DiMP: “Learning Discriminative Model Prediction for Tracking”

[Read online]

Figure 1. Confidence maps of the target object (red box) provided by the target model obtained using (i) a Siamese approach (middle) and (ii) our approach (right). The model predicted in a Siamese fashion, using only target appearance, struggles to distinguish the target from distractor objects in the background. In contrast, our model prediction architecture also integrates background appearance, providing superior discriminative power.

Figure 2. An overview of the target classification branch in our tracking architecture. Given an annotated training set (top left), we extract deep feature maps using a backbone network followed by an additional convolutional block (Cls Feat). The feature maps are then input to the model predictor D, consisting of the initializer and the recurrent optimizer module. The model predictor outputs the weights of the convolutional layer which performs target classification on the feature map extracted from the test frame.
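
A simplified sketch of the model predictor's optimizer: starting from the initializer's filter, a few gradient steps are taken on a discriminative loss over the training set, and the resulting filter is convolved with the test-frame features. The plain least-squares loss and fixed step size below are simplifications of the paper's learned, steepest-descent-based optimizer:

```python
import torch
import torch.nn.functional as F

def predict_filter(train_feat, train_label, f0, n_iter=5, step=1.0):
    """Refine the initializer's filter f0 with a few gradient steps on a
    least-squares classification loss (a simplification of DiMP's learned
    optimizer).  Shapes: train_feat (N, C, H, W), train_label (N, 1, H, W),
    f0 (1, C, k, k) with odd k."""
    f = f0.clone().requires_grad_(True)
    for _ in range(n_iter):
        scores = F.conv2d(train_feat, f, padding=f.shape[-1] // 2)
        loss = ((scores - train_label) ** 2).mean()
        (g,) = torch.autograd.grad(loss, f)
        f = (f - step * g).detach().requires_grad_(True)
    return f.detach()

def classify_test_frame(test_feat, f):
    # The predicted filter is simply the weight of a conv layer
    # applied to the test-frame feature map.
    return F.conv2d(test_feat, f, padding=f.shape[-1] // 2)
```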




6. “Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking”

[Read online]
Figure 1. In contrast to the classical DCF paradigm, our GFSDCF performs channel and spatial group feature selection for the learning of correlation filters. Group sparsity is enforced in the channel and spatial dimensions to highlight relevant features with enhanced discrimination and interpretability. Additionally, a low-rank temporal smoothness constraint is employed across temporal frames to improve the stability of the learned filters.
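
As a rough illustration of channel group selection, the sketch below scores each channel group by its L2 norm and keeps only the strongest groups; this hard top-k pruning is a stand-in for the group-sparsity regularizer actually optimized in GFSDCF:

```python
import numpy as np

def channel_group_selection(filters, keep_ratio=0.2):
    """Score each channel group by its L2 norm and keep the strongest ones.
    filters: (C, H, W) correlation-filter coefficients; the hard top-k
    pruning here is a stand-in for the paper's group-sparsity regularizer."""
    c = filters.shape[0]
    group_norms = np.sqrt((filters ** 2).sum(axis=(1, 2)))  # ||w_c||_2 per channel
    keep = max(1, int(round(keep_ratio * c)))
    strongest = np.argsort(group_norms)[::-1][:keep]
    mask = np.zeros(c, dtype=bool)
    mask[strongest] = True
    return filters * mask[:, None, None], mask
```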




7. GradNet: “GradNet: Gradient-Guided Network for Visual Object Tracking”

[Read online]
Figure 1. The motivation of our algorithm. Images in the first and third columns are target patches in SiameseFC. The other images show the absolute values of their gradients, where red regions have large gradients. As we can see, the gradient values can reflect target variations and background clutter.

Figure 3. The pipeline of the proposed algorithm, which consists of two branches. The bottom branch extracts features of the search region X, and the top branch (named the update branch) is responsible for template generation. The two purple trapezoids in the figure represent sub-nets with shared parameters; the solid and dotted lines represent forward and backward propagation, respectively.
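
A sketch of the gradient-guided update, assuming a SiameseFC-style correlation and an L2 loss (`update_net` is a placeholder for the learned update branch):

```python
import torch
import torch.nn.functional as F

def gradient_guided_template(template, search_feat, label, update_net):
    """One update-branch pass: a forward correlation with the initial
    template yields a loss, whose gradient w.r.t. the template (the dotted
    backward path) is fed to `update_net` to produce an adapted template.
    Shapes: template (1, C, k, k), search_feat (1, C, H, W),
    label (1, 1, H-k+1, W-k+1); `update_net` is a placeholder callable."""
    template = template.clone().requires_grad_(True)
    response = F.conv2d(search_feat, template)   # SiameseFC-style correlation
    loss = ((response - label) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, template)
    # The gradient magnitude reflects target variation and background clutter.
    return update_net(template.detach(), grad)
```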




8. “Bridging the Gap Between Detection and Tracking: A Unified Approach”

[Read online]

Figure 1: (a) The overall architecture of our tracking-by-detection framework. The architecture consists of two branches: one generates target features as guidance, while the other is an ordinary object detector. The two branches are bridged through a Target-Guidance Module (TGM). The blue dotted line represents the traditional object detection process, while the red arrow denotes the procedure of the proposed guided object detection. (b) The outline of the TGM. The inputs to the module are the exemplar and search image features, and it outputs a modulated feature map with target information incorporated. The follow-up detection process remains intact. Note that the detection model in (a) can be replaced by almost any modern object detector.
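
A sketch of what such a module could look like; the pooling-and-modulation design below is an assumption for illustration, not the paper's exact TGM:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetGuidanceModule(nn.Module):
    """Pool the exemplar feature into a channel-wise vector, modulate the
    search feature with it, and map the result back with a 1x1 conv so the
    follow-up detector sees a feature map of the usual shape.  This design
    is an assumption for illustration, not the paper's exact TGM."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, exemplar_feat, search_feat):
        # exemplar_feat: (1, C, h, w); search_feat: (1, C, H, W)
        z = F.adaptive_avg_pool2d(exemplar_feat, 1)  # (1, C, 1, 1) target code
        modulated = search_feat * z                  # inject target information
        return self.fuse(modulated)                  # detector-ready feature map
```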

Figure 2: Overview of our training procedure. In the training stage, we sample triplets of exemplar, support and query images from video frames. Each triplet is chronologically sampled from the same video. We take the exemplar image as the guidance and perform detection on the support and query images. The losses calculated on the support image are used to finetune the meta-layers (i.e., the detector's heads) of our model, and we expect the updated model to generalize and perform well on the query image, which is realized by backpropagating through all parameters of our model based on the losses on the query image. The red arrows represent the backpropagation path during optimization. The inner optimization loop only updates the head layer parameters, while the outer optimization loop updates all parameters in the architecture.
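
This two-loop scheme is MAML-style meta-learning. A condensed sketch, where the `model(exemplar, image, head_params)` signature and `loss_fn` are assumed interfaces:

```python
import torch

def meta_train_step(model, head_params, triplet, loss_fn, inner_lr, outer_opt):
    """One two-loop step.  `model(exemplar, image, head_params)` returning
    detections and `loss_fn` are assumed interfaces; `head_params` are the
    detector's head (meta) layers; `outer_opt` covers all parameters."""
    exemplar, (support, support_y), (query, query_y) = triplet

    # Inner loop: adapt only the head layers on the support image.
    support_loss = loss_fn(model(exemplar, support, head_params), support_y)
    grads = torch.autograd.grad(support_loss, head_params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(head_params, grads)]

    # Outer loop: the adapted model should generalize to the query image;
    # backpropagating through the inner step updates ALL parameters.
    query_loss = loss_fn(model(exemplar, query, adapted), query_y)
    outer_opt.zero_grad()
    query_loss.backward()
    outer_opt.step()
    return query_loss.item()
```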

Figure 3: Instantiation of our framework on SSD [40]. We employ SSD with VGG-16 [50] as the backbone. The original SSD performs object detection on 6 different convolutional layers with increasing receptive fields, each being responsible for objects of a specific size. In our work, we only use its first 3 backbone layers, denoted as L1, L2 and L3 in the figure. The target-guidance modules are appended to each layer, with increasing guidance image resolutions consistent with the receptive fields. Operators $\phi_1$, $\phi_2$ and $\phi_3$ represent feature extraction at the L1, L2 and L3 layers.

Figure 4: Instantiation of our framework on Faster R-CNN [15]. The exemplar and query images are fed into the backbone and bridged using the target-guidance module, while the subsequent region proposal and RoI classification and regression procedures are kept unchanged. We evaluate the model with either VGG [50] or ResNet [20] as the backbone.




9. “Physical Adversarial Textures That Fool Visual Object Tracking”

[Read online]

Figure 1: A poster of a Physical Adversarial Texture resembling a photograph causes a tracker's bounding-box predictions to lose track as the target person moves over it.

Figure 2: The Physical Adversarial Texture (PAT) Attack creates adversaries to fool the GOTURN tracker via minibatch gradient descent to optimize various losses, using randomized scenes following Expectation Over Transformation (EOT).
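
A sketch of one such optimization step (`render_scene` and `attack_loss` are hypothetical callables standing in for the differentiable scene rendering and the tracking-failure objective):

```python
import torch

def pat_attack_step(texture, render_scene, attack_loss, optimizer, n_samples=8):
    """One minibatch step: average the attack loss over randomly rendered
    scenes (Expectation Over Transformation) so the texture stays
    adversarial under varying pose, lighting and viewpoint, then descend
    on the texture pixels.  Lower attack_loss means the tracker is fooled
    more; `texture` is a leaf tensor with requires_grad=True."""
    optimizer.zero_grad()
    loss = sum(attack_loss(render_scene(texture)) for _ in range(n_samples))
    (loss / n_samples).backward()
    optimizer.step()
    with torch.no_grad():
        texture.clamp_(0.0, 1.0)  # keep the texture printable
```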




10. CDTB: “CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark”

Figure 1. RGB and depth sequences from CDTB. Depth offers complementary information to color: two identical objects are easier to distinguish in depth (a), low-illumination scenes (b) are less challenging for trackers if depth information is available, tracking a deformable object in depth simplifies the problem (c), and a sudden significant change in depth is a strong clue for occlusion (d). Sequences (a, b) are captured by a ToF-RGB pair of cameras, (c) by a stereo-camera sensor, and (d) by a Kinect sensor.
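
As an example of the occlusion clue in (d), a tracker could monitor the median depth inside the target region and flag a sudden drop; the window and ratio below are illustrative assumptions, not part of the benchmark:

```python
import numpy as np

def occlusion_from_depth(depth_history, window=5, ratio=0.6):
    """Flag a likely occlusion when the latest median target-region depth
    drops suddenly below a fraction of its recent baseline.

    depth_history: per-frame median depths of the target region (meters).
    """
    if len(depth_history) <= window:
        return False
    baseline = np.median(depth_history[-window - 1:-1])  # recent frames
    return depth_history[-1] < ratio * baseline          # sudden nearer surface
```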





Finally, thanks to Sophia-11 for maintaining the ICCV 2019 accepted-papers project: https://github.com/Sophia-11/Awesome-ICCV2019
