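SORT is a practical multi-object tracking algorithm, but because real-world targets move erratically and are frequently occluded, it produces a relatively high number of identity switches. By integrating appearance information, DeepSORT reduces the number of identity switches by 45%. The approach is as follows.
DeepSORT is a conventional single-hypothesis tracking method that uses recursive Kalman filtering and frame-by-frame data association.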
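A common scenario in multi-object tracking is that the camera is uncalibrated and no ego-motion information is available. This is also the most common setup in multi-object tracking benchmarks (MOT16). Therefore, DeepSORT tracks directly in image space, using the eight-dimensional state [x, y, a, h, vx, vy, va, vh] (bounding-box center, aspect ratio, height, and their velocities) that is described with the KalmanFilter class later in this post.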
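Each track $k$ keeps a count $a_k$ of the number of frames since its last successful measurement association. This counter is incremented during Kalman filter prediction and reset to 0 when the track is associated with a measurement.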
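The conventional way to associate predicted Kalman states with newly arrived measurements is to treat it as an assignment problem and solve it with the Hungarian algorithm. DeepSORT integrates motion and appearance information by combining two metrics: the Mahalanobis distance between bounding boxes and the cosine distance between appearance features. On the one hand, the Mahalanobis distance provides information about likely object locations based on motion, which is particularly useful for short-term prediction. On the other hand, the cosine distance takes appearance into account, which is particularly useful for recovering identities after long-term occlusion, when motion is less discriminative.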
The (squared) Mahalanobis distance between the predicted Kalman state and a newly arrived measurement is:
$$d^{(1)}(i,j) = (d_j - y_i)^\top S_i^{-1} (d_j - y_i),$$
where $(y_i, S_i)$ is the projection of the $i$-th track distribution into measurement space and $d_j$ is the $j$-th detected bounding box.
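The Mahalanobis distance accounts for the uncertainty of the state estimate by measuring how many standard deviations the detection lies from the mean track location. In addition, unlikely associations can be excluded by thresholding the distance at the 95% confidence interval computed from the inverse $\chi^2$ distribution. For the four-dimensional measurement space the threshold is $t^{(1)} = 9.4877$. The following indicator evaluates to 1 if the association between the $i$-th track and the $j$-th detection is admissible:
$$b_{i,j}^{(1)} = \mathbb{1}[d^{(1)}(i, j) \leq t^{(1)}]$$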
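The gate above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the example track, and the diagonal covariance are made up for the demonstration and are not taken from the DeepSORT code.

```python
import numpy as np

CHI2INV95_4DOF = 9.4877  # t^(1) from the text (4-dimensional measurement space)


def squared_mahalanobis(y, S, d):
    """Return (d - y)^T S^{-1} (d - y) without forming S^{-1} explicitly."""
    diff = np.asarray(d, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ np.linalg.solve(S, diff))


def is_admissible(y, S, d, threshold=CHI2INV95_4DOF):
    """Indicator b^(1): True if the detection lies inside the gate."""
    return squared_mahalanobis(y, S, d) <= threshold


# Hypothetical projected track (x, y, a, h) with independent uncertainties.
y = np.array([100.0, 50.0, 0.5, 80.0])
S = np.diag([4.0, 4.0, 0.01, 16.0])        # variances, i.e. std devs 2, 2, 0.1, 4

near = np.array([102.0, 51.0, 0.5, 84.0])  # 1, 0.5, 0 and 1 std devs away
far = np.array([110.0, 50.0, 0.5, 80.0])   # 5 std devs away in x

print(squared_mahalanobis(y, S, near), is_admissible(y, S, near))
print(squared_mahalanobis(y, S, far), is_admissible(y, S, far))
```

The first detection has a squared distance of 1 + 0.25 + 0 + 1 = 2.25 and passes the gate; the second has 25 and is rejected.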
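The Mahalanobis distance is a suitable association metric when motion uncertainty is low. In the image-space formulation of the tracking problem, however, the Kalman filtering framework provides only a rough estimate of the object location. In particular, unaccounted camera motion can introduce rapid displacements in the image plane, which makes the Mahalanobis distance a rather imprecise metric for tracking through occlusions. Therefore, DeepSORT adds a second, appearance-based metric $d^{(2)}$ to the assignment problem: the smallest cosine distance between the appearance descriptor of a detection and the descriptors stored for a track (its formula is given with _match later in this post). The algorithm finds a suitable threshold for this metric on a separate training dataset. In practice, DeepSORT applies a pre-trained CNN to compute the bounding-box appearance descriptors.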
The cost function of the association problem is a weighted sum of the two metrics:
$$c_{i,j} = \lambda \, d^{(1)}(i, j) + (1 - \lambda) \, d^{(2)}(i, j)$$
The hyperparameter $\lambda$ controls the influence of each metric on the combined association cost. In their experiments, the authors found that $\lambda = 0$ is a reasonable choice when there is substantial camera motion. In that case only appearance information is used in the association cost. The association is nevertheless still constrained by both metrics: it is called admissible only if it lies within the gating region of both:
$$b_{i,j} = \prod_{m=1}^{2} b_{i,j}^{(m)}.$$
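As a small worked example of the two formulas above, the following sketch builds the combined cost and the joint gate for a 2x2 toy problem. The function name and all numbers are hypothetical; the thresholds 9.4877 and 0.2 are the values that appear elsewhere in this post.

```python
import numpy as np


def combined_cost(d1, d2, lam, t1, t2):
    """Return (c, b): weighted cost c_ij and joint admissibility b_ij."""
    d1 = np.asarray(d1, dtype=float)   # squared Mahalanobis distances
    d2 = np.asarray(d2, dtype=float)   # appearance (cosine) distances
    cost = lam * d1 + (1.0 - lam) * d2
    gate = (d1 <= t1) & (d2 <= t2)     # product of the two indicators
    return cost, gate


# Rows are tracks, columns are detections (made-up values).
d1 = np.array([[2.0, 30.0],
               [12.0, 3.0]])
d2 = np.array([[0.10, 0.05],
               [0.15, 0.40]])

cost, gate = combined_cost(d1, d2, lam=0.0, t1=9.4877, t2=0.2)
print(cost)
print(gate)
```

With lam=0 the cost equals the appearance distance, yet the pair (track 0, detection 1) is still rejected: it has the lowest appearance cost but fails the motion gate.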
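When a target is occluded for a longer period, subsequent Kalman filter predictions increase the uncertainty associated with the target's location. Probability mass therefore spreads out in state space and the observation likelihood becomes less peaked. Intuitively, the association metric should account for this spread by increasing the measurement-to-track distance. Counterintuitively, when two tracks compete for the same detection, the Mahalanobis distance favors the track with the larger uncertainty, because a larger uncertainty effectively reduces the distance, measured in standard deviations, of any detection to the projected track mean. This is undesirable because it can lead to more track fragmentation and unstable tracks. Therefore, DeepSORT introduces a matching cascade that gives priority to more frequently seen targets, which encodes the notion of probability spread in the association likelihood.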
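In a final matching stage, the IoU metric proposed in the SORT algorithm is used to try to associate unconfirmed tracks and unmatched tracks of age $n = 1$. This helps to handle sudden appearance changes, for example due to partial occlusion by static scene geometry, and increases robustness against erroneous initialization.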
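Applying this method successfully requires a discriminative feature embedding that is trained offline beforehand. To this end, DeepSORT uses a CNN trained on a large-scale person re-identification dataset (MARS), which contains over 1.1 million images of 1,261 pedestrians and is therefore well suited to deep metric learning for pedestrian tracking.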
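According to the network architecture table, the model is a wide residual network (WRN) with two convolutional layers followed by six residual blocks. The global feature map of dimension $128$ is computed in the "Dense 10" layer. A final batch normalization and $\ell_2$ normalization project the features onto the unit hypersphere so that they are compatible with the cosine appearance metric. The network has about 2.8 million parameters (2.67M when counted in units of $2^{20}$), and one forward pass of 32 bounding boxes takes approximately $30\,\mathrm{ms}$ on an Nvidia GeForce GTX 1050 mobile GPU. The network is therefore well suited to online tracking, provided a modern GPU is available. The authors provide the pre-trained model in their GitHub repository, together with a script that can be used to generate features.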
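The program consists of two parts: the application framework (application_util) and the algorithm (deep_sort). At run time the program is driven by a Visualization or NoVisualization object. The main entities of the algorithm are Tracker, KalmanFilter, Track, NearestNeighborDistanceMetric, and Detection. KalmanFilter defines its own Mahalanobis distance computation, and NearestNeighborDistanceMetric computes feature similarity. linear_assignment.py defines the gating and matching functions.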
    args = parse_args()
    run(
        args.sequence_dir, args.detection_file, args.output_file,
        args.min_confidence, args.nms_max_overlap, args.min_detection_height,
        args.max_cosine_distance, args.nn_budget, args.display)
parse_args parses the command-line arguments.
    parser = argparse.ArgumentParser(description="Deep SORT")
    parser.add_argument(
        "--sequence_dir", help="Path to MOTChallenge sequence directory",
        default=None, required=True)
    parser.add_argument(
        "--detection_file", help="Path to custom detections.", default=None,
        required=True)
    parser.add_argument(
        "--output_file", help="Path to the tracking output file. This file will"
        " contain the tracking results on completion.",
        default="/tmp/hypotheses.txt")
    parser.add_argument(
        "--min_confidence", help="Detection confidence threshold. Disregard "
        "all detections that have a confidence lower than this value.",
        default=0.8, type=float)
    parser.add_argument(
        "--min_detection_height", help="Threshold on the detection bounding "
        "box height. Detections with height smaller than this value are "
        "disregarded", default=0, type=int)
    parser.add_argument(
        "--nms_max_overlap", help="Non-maxima suppression threshold: Maximum "
        "detection overlap.", default=1.0, type=float)
    parser.add_argument(
        "--max_cosine_distance", help="Gating threshold for cosine distance "
        "metric (object appearance).", type=float, default=0.2)
    parser.add_argument(
        "--nn_budget", help="Maximum size of the appearance descriptors "
        "gallery. If None, no budget is enforced.", type=int, default=None)
    parser.add_argument(
        "--display", help="Show intermediate tracking results",
        default=True, type=bool_string)
    return parser.parse_args()
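The --display option uses a helper called bool_string whose definition is not shown in this excerpt. The sketch below is an assumption about what such a helper has to do, and it also shows why plain type=bool would not work: argparse calls bool() on the raw string, and any non-empty string is truthy.

```python
import argparse


def bool_string(value):
    """Parse the literal strings 'True'/'False' into a bool (assumed behaviour)."""
    if value not in {"True", "False"}:
        raise argparse.ArgumentTypeError(
            "expected 'True' or 'False', got %r" % value)
    return value == "True"


parser = argparse.ArgumentParser()
parser.add_argument("--display", default=True, type=bool_string)
# Hypothetical option, only here to demonstrate the pitfall of type=bool.
parser.add_argument("--naive", default=True, type=bool)

args = parser.parse_args(["--display", "False", "--naive", "False"])
print(args.display)  # parsed by bool_string
print(args.naive)    # bool("False") is True, because the string is non-empty
```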
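gather_sequence_info collects sequence information such as image filenames, detections, and ground-truth annotations (if available).
NearestNeighborDistanceMetric is a nearest-neighbor distance metric: for each target it returns the smallest distance (Euclidean or cosine) to any sample observed so far.
A Tracker is then constructed from the distance metric.
    seq_info = gather_sequence_info(sequence_dir, detection_file)
    metric = nn_matching.NearestNeighborDistanceMetric(
        "cosine", max_cosine_distance, nn_budget)
    tracker = Tracker(metric)
    results = []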
A nested callback function is defined that filters the detections, runs the tracker prediction, and updates the tracks.
create_detections creates the detections for a given frame index from the raw detection matrix.
non_max_suppression suppresses overlapping detections.
    def frame_callback(vis, frame_idx):
        print("Processing frame %05d" % frame_idx)

        # Load image and generate detections.
        detections = create_detections(
            seq_info["detections"], frame_idx, min_detection_height)
        detections = [d for d in detections if d.confidence >= min_confidence]

        # Run non-maxima suppression.
        boxes = np.array([d.tlwh for d in detections])
        scores = np.array([d.confidence for d in detections])
        indices = preprocessing.non_max_suppression(
            boxes, nms_max_overlap, scores)
        detections = [detections[i] for i in indices]
Tracker.predict propagates the track state distributions one time step forward.
Tracker.update performs the measurement update and track management.
        # Update tracker.
        tracker.predict()
        tracker.update(detections)
vis is either a Visualization or a NoVisualization object.
Visualization.set_image sets the image shown by the ImageViewer.
Visualization.draw_detections draws the detection boxes.
Visualization.draw_trackers draws the track boxes.
        # Update visualization.
        if display:
            image = cv2.imread(
                seq_info["image_filenames"][frame_idx], cv2.IMREAD_COLOR)
            vis.set_image(image.copy())
            vis.draw_detections(detections)
            vis.draw_trackers(tracker.tracks)
Track.is_confirmed checks whether the track has been confirmed.
Track.to_tlwh returns the current position in the bounding-box format [top-left x, top-left y, width, height].
        # Store results.
        for track in tracker.tracks:
            if not track.is_confirmed() or track.time_since_update > 1:
                continue
            bbox = track.to_tlwh()
            results.append([
                frame_idx, track.track_id, bbox[0], bbox[1], bbox[2], bbox[3]])
A Visualization or NoVisualization object is created from the sequence information, and this object runs the tracker.
update_ms is the minimum interval at which the ImageViewer refreshes the display (it includes the tracking processing time).
    # Run tracker.
    if display:
        visualizer = visualization.Visualization(seq_info, update_ms=5)
    else:
        visualizer = visualization.NoVisualization(seq_info)
    visualizer.run(frame_callback)

    # Store results. The context manager closes the file even if a write fails.
    with open(output_file, 'w') as f:
        for row in results:
            print('%d,%d,%.2f,%.2f,%.2f,%.2f,1,-1,-1,-1' % (
                row[0], row[1], row[2], row[3], row[4], row[5]), file=f)
Visualization shows the tracking output in an OpenCV image viewer.
seq_info mainly provides the image size and the first and last frame indices.
    def __init__(self, seq_info, update_ms):
        image_shape = seq_info["image_size"][::-1]
        aspect_ratio = float(image_shape[1]) / image_shape[0]
        image_shape = 1024, int(aspect_ratio * 1024)
        self.viewer = ImageViewer(
            update_ms, image_shape, "Figure %s" % seq_info["sequence_name"])
        self.viewer.thickness = 2
        self.frame_idx = seq_info["min_frame_idx"]
        self.last_idx = seq_info["max_frame_idx"]

    def run(self, frame_callback):
        self.viewer.run(lambda: self._update_fun(frame_callback))
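_update_fun wraps frame_callback. It decides from the frame index whether to terminate, and otherwise calls frame_callback to process the frame. set_image simply hands the image to the viewer.
    def _update_fun(self, frame_callback):
        if self.frame_idx > self.last_idx:
            return False  # Terminate
        frame_callback(self, self.frame_idx)
        self.frame_idx += 1
        return True

    def set_image(self, image):
        self.viewer.image = image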
create_unique_color_uchar creates a unique RGB color code for a given track ID (label).
    self.viewer.thickness = 2
    for track_id, box in zip(track_ids, boxes):
        self.viewer.color = create_unique_color_uchar(track_id)
        # np.int was removed in NumPy 1.24; the builtin int is equivalent here.
        self.viewer.rectangle(*box.astype(int), label=str(track_id))
draw_detections draws the detection boxes in red (OpenCV uses BGR channel order, so (0, 0, 255) is red).
    self.viewer.thickness = 2
    self.viewer.color = 0, 0, 255
    for i, detection in enumerate(detections):
        self.viewer.rectangle(*detection.tlwh)
draw_trackers draws the tracked targets, skipping tracks that are unconfirmed or were not detected in the current frame.
    self.viewer.thickness = 2
    for track in tracks:
        if not track.is_confirmed() or track.time_since_update > 0:
            continue
        self.viewer.color = create_unique_color_uchar(track.track_id)
        # np.int was removed in NumPy 1.24; the builtin int is equivalent here.
        self.viewer.rectangle(
            *track.to_tlwh().astype(int), label=str(track.track_id))
        # self.viewer.gaussian(track.mean[:2], track.covariance[:2, :2],
        #                      label="%d" % track.track_id)
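Tracker parameters:
metric: NearestNeighborDistanceMetric, the distance metric used for measurement-to-track association.
max_age: int, the maximum number of consecutive misses $A_{\mathrm{max}}$ before a track is deleted.
n_init: int, the number of consecutive detections before a track is confirmed. If a miss occurs within the first n_init frames, the track state is set to Deleted.
    def __init__(self, metric, max_iou_distance=0.7, max_age=30, n_init=3):
        self.metric = metric
        self.max_iou_distance = max_iou_distance
        self.max_age = max_age
        self.n_init = n_init

        self.kf = kalman_filter.KalmanFilter()
        self.tracks = []
        self._next_id = 1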
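In predict, a single shared KalmanFilter predicts the state distribution of every track. Each track stores its own mean and covariance, which serve as the filter input.
    for track in self.tracks:
        track.predict(self.kf)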
update first calls _match to run the matching cascade.
    """Perform measurement update and track management.

    Parameters
    ----------
    detections : List[deep_sort.detection.Detection]
        A list of detections at the current time step.

    """
    # Run matching cascade.
    matches, unmatched_tracks, unmatched_detections = \
        self._match(detections)
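The track set is then updated according to the matching result: matched tracks are updated, unmatched tracks are marked as missed, unmatched detections start new tracks, and deleted tracks are removed.
    # Update track set.
    for track_idx, detection_idx in matches:
        self.tracks[track_idx].update(
            self.kf, detections[detection_idx])
    for track_idx in unmatched_tracks:
        self.tracks[track_idx].mark_missed()
    for detection_idx in unmatched_detections:
        self._initiate_track(detections[detection_idx])
    self.tracks = [t for t in self.tracks if not t.is_deleted()]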
Finally, the feature list and the corresponding track IDs are passed to NearestNeighborDistanceMetric.partial_fit, which maintains a dictionary of features for the active targets.
    # Update distance metric.
    active_targets = [t.track_id for t in self.tracks if t.is_confirmed()]
    features, targets = [], []
    for track in self.tracks:
        if not track.is_confirmed():
            continue
        features += track.features
        targets += [track.track_id for _ in track.features]
        track.features = []
    self.metric.partial_fit(
        np.asarray(features), np.asarray(targets), active_targets)
_match implements Section 2.3, Matching Cascade, of the paper.
It defines a nested function gated_metric, which computes the appearance cost matrix and then applies the Mahalanobis gate to it.
$$\begin{aligned} d^{(2)}(i, j) &= \min\{1 - r_j^\top r_k^{(i)} \mid r_k^{(i)} \in \mathcal{R}_i\}\\ b_{i,j}^{(2)} &= \mathbb{1}[d^{(2)}(i, j) \leq t^{(2)}] \end{aligned}$$
Here $r_j$ is the unit-length appearance descriptor of detection $j$ and $\mathcal{R}_i$ is the gallery of descriptors stored for track $i$. NearestNeighborDistanceMetric.distance computes $d^{(2)}(i, j)$.
    def gated_metric(tracks, dets, track_indices, detection_indices):
        features = np.array([dets[i].feature for i in detection_indices])
        targets = np.array([tracks[i].track_id for i in track_indices])
        cost_matrix = self.metric.distance(features, targets)
        cost_matrix = linear_assignment.gate_cost_matrix(
            self.kf, cost_matrix, tracks, dets, track_indices,
            detection_indices)

        return cost_matrix
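The appearance distance $d^{(2)}$ can be illustrated with a few lines of NumPy. This is a minimal sketch, not the repository's implementation: the function name and the two-dimensional toy descriptors are assumptions, and the descriptors are normalized explicitly so the example also works for inputs that are not unit length.

```python
import numpy as np


def nn_cosine_distance(gallery, queries):
    """Smallest cosine distance from each query to any gallery sample."""
    gallery = np.asarray(gallery, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # L2-normalise so that the dot product equals the cosine similarity.
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    distances = 1.0 - gallery @ queries.T  # shape (len(gallery), len(queries))
    return distances.min(axis=0)


gallery = [[1.0, 0.0], [0.0, 1.0]]               # hypothetical descriptors R_i of one track
queries = [[2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]  # hypothetical detection descriptors r_j

d2 = nn_cosine_distance(gallery, queries)
print(d2)
```

The first query points in the same direction as a gallery sample (distance 0), the second lies at 45 degrees to both samples (distance 1 - 1/sqrt(2)), and the third is at best orthogonal to a sample (distance 1).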
The track set is split into confirmed and unconfirmed tracks, giving the indices of both sets.
Track.is_confirmed queries the state of a track.
    # Split track set into confirmed and unconfirmed tracks.
    confirmed_tracks = [
        i for i, t in enumerate(self.tracks) if t.is_confirmed()]
    unconfirmed_tracks = [
        i for i, t in enumerate(self.tracks) if not t.is_confirmed()]
matching_cascade matches detections to the confirmed tracks based on appearance features.
Note that what is passed in is gated_metric, a cost function in which entries that fail the Mahalanobis gate are already set to a large value, rather than a plain cost matrix $C = [c_{i,j}]$ together with a separate gate matrix $B = [b_{i,j}]$.
    # Associate confirmed tracks using appearance features.
    matches_a, unmatched_tracks_a, unmatched_detections = \
        linear_assignment.matching_cascade(
            gated_metric, self.metric.matching_threshold, self.max_age,
            self.tracks, detections, confirmed_tracks)
min_cost_matching solves the linear assignment problem with the Hungarian algorithm.
Here it is called with iou_cost to try to associate the remaining detections with the unconfirmed tracks and with those confirmed tracks that went unmatched for only one frame.
    # Associate remaining tracks together with unconfirmed tracks using IOU.
    iou_track_candidates = unconfirmed_tracks + [
        k for k in unmatched_tracks_a if
        self.tracks[k].time_since_update == 1]
    unmatched_tracks_a = [
        k for k in unmatched_tracks_a if
        self.tracks[k].time_since_update != 1]
    matches_b, unmatched_tracks_b, unmatched_detections = \
        linear_assignment.min_cost_matching(
            iou_matching.iou_cost, self.max_iou_distance, self.tracks,
            detections, iou_track_candidates, unmatched_detections)

    matches = matches_a + matches_b
    unmatched_tracks = list(set(unmatched_tracks_a + unmatched_tracks_b))
    return matches, unmatched_tracks, unmatched_detections
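The IoU stage above needs a cost of the form 1 - IoU between track boxes and detection boxes. The following pure-Python sketch shows the idea for boxes in (top-left x, top-left y, width, height) format; the function names and boxes are illustrative and this is not the code of iou_matching.iou_cost.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def iou_cost_sketch(track_boxes, detection_boxes):
    """Cost matrix with entries 1 - IoU (rows: tracks, columns: detections)."""
    return [[1.0 - iou(t, d) for d in detection_boxes] for t in track_boxes]


track_boxes = [(0, 0, 10, 10)]
detection_boxes = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 5, 5)]

costs = iou_cost_sketch(track_boxes, detection_boxes)
print(costs)
```

The half-shifted box overlaps by 50 of 150 units of union area, so its cost is 2/3, which is just inside the default max_iou_distance of 0.7; the disjoint box has cost 1 and can never be matched.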
_initiate_track starts a new track. KalmanFilter.initiate builds the mean vector and covariance matrix from the detection.
    mean, covariance = self.kf.initiate(detection.to_xyah())
    self.tracks.append(Track(
        mean, covariance, self._next_id, self.n_init, self.max_age,
        detection.feature))
    self._next_id += 1
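min_cost_matching solves the linear assignment problem.
Parameters:
distance_metric: Callable[[List[Track], List[Detection], List[int], List[int]], ndarray]. The distance metric is given a list of tracks and detections as well as a list of N track indices and M detection indices. It should return the NxM cost matrix, where element (i, j) is the association cost between the i-th track in the given track indices and the j-th detection in the given detection indices.
max_distance: float, the gating threshold. Associations with a cost larger than this value are disregarded.
tracks: List[track.Track], the predicted tracks at the current time step.
detections: List[detection.Detection], the detections at the current time step.
track_indices: List[int], maps rows of cost_matrix to tracks in tracks (see description above).
detection_indices: List[int], maps columns of cost_matrix to detections in detections (see description above).
Return value: (List[(int, int)], List[int], List[int]), a tuple with three entries: the list of matched (track index, detection index) pairs, the list of unmatched track indices, and the list of unmatched detection indices.
    if track_indices is None:
        track_indices = np.arange(len(tracks))
    if detection_indices is None:
        detection_indices = np.arange(len(detections))

    if len(detection_indices) == 0 or len(track_indices) == 0:
        return [], track_indices, detection_indices  # Nothing to match.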
The cost matrix is computed from the distance metric. Costs above the threshold max_distance are set to a fixed value slightly above the threshold, so that differences among inadmissible entries no longer matter.
linear_assignment then solves the assignment between tracks and detections.
    cost_matrix = distance_metric(
        tracks, detections, track_indices, detection_indices)
    cost_matrix[cost_matrix > max_distance] = max_distance + 1e-5
    # In the original repository linear_assignment is imported from
    # sklearn.utils.linear_assignment_, which was removed in scikit-learn 0.23.
    indices = linear_assignment(cost_matrix)

    matches, unmatched_tracks, unmatched_detections = [], [], []
    for col, detection_idx in enumerate(detection_indices):
        if col not in indices[:, 1]:
            unmatched_detections.append(detection_idx)
    for row, track_idx in enumerate(track_indices):
        if row not in indices[:, 0]:
            unmatched_tracks.append(track_idx)
    for row, col in indices:
        track_idx = track_indices[row]
        detection_idx = detection_indices[col]
        if cost_matrix[row, col] > max_distance:
            unmatched_tracks.append(track_idx)
            unmatched_detections.append(detection_idx)
        else:
            matches.append((track_idx, detection_idx))
    return matches, unmatched_tracks, unmatched_detections
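The assignment step above can also be written with scipy.optimize.linear_sum_assignment, which is one way to run this logic on current library versions. The sketch below is a simplified stand-in, not a drop-in replacement: the function name is made up, and it works directly on row and column positions of a given cost matrix instead of track and detection index lists.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def min_cost_matching_sketch(cost_matrix, max_distance):
    """Match rows (tracks) to columns (detections) under a cost threshold."""
    cost = np.array(cost_matrix, dtype=float)
    # Clip inadmissible entries so they all look equally bad to the solver.
    cost[cost > max_distance] = max_distance + 1e-5
    rows, cols = linear_sum_assignment(cost)

    # Keep only assignments whose cost is within the threshold.
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if cost[r, c] <= max_distance]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    unmatched_rows = [r for r in range(cost.shape[0]) if r not in matched_rows]
    unmatched_cols = [c for c in range(cost.shape[1]) if c not in matched_cols]
    return matches, unmatched_rows, unmatched_cols


cost_matrix = [[0.10, 0.90, 0.80],
               [0.70, 0.15, 0.90]]
matches, unmatched_tracks, unmatched_detections = min_cost_matching_sketch(
    cost_matrix, max_distance=0.2)
print(matches, unmatched_tracks, unmatched_detections)
```

Both tracks find a detection below the threshold, and the third detection is left over; in the tracker such a leftover detection would start a new track.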
    if track_indices is None:
        track_indices = list(range(len(tracks)))
    if detection_indices is None:
        detection_indices = list(range(len(detections)))
Initialize the set of matches (matches), $\mathcal{M} \gets \emptyset$, and the set of unmatched detections (unmatched_detections), $\mathcal{U} \gets \mathcal{D}$.
    unmatched_detections = detection_indices
    matches = []
$$\begin{aligned} \mathbf{for}\ & n \in \{1, \dots, A_{\mathrm{max}}\}\ \mathbf{do}\\ & \text{Select tracks by age } \mathcal{T}_n \gets \{i \in \mathcal{T} \mid a_i = n\} \end{aligned}$$
    for level in range(cascade_depth):
        if len(unmatched_detections) == 0:  # No detections left
            break

        track_indices_l = [
            k for k in track_indices
            if tracks[k].time_since_update == 1 + level
        ]
        if len(track_indices_l) == 0:  # Nothing to match at this level
            continue
$$\begin{aligned} [x_{i,j}] &\gets \text{min\_cost\_matching}(C, \mathcal{T}_n, \mathcal{U})\\ \mathcal{M} &\gets \mathcal{M} \cup \{(i, j) \mid b_{i,j} \cdot x_{i,j} > 0\}\\ \mathcal{U} &\gets \mathcal{U} \setminus \{j \mid \textstyle\sum_i b_{i,j} \cdot x_{i,j} > 0\} \end{aligned}$$
The matches returned by min_cost_matching already satisfy $b_{i,j} > 0$, because the gate is applied inside the cost function and assignments whose cost exceeds max_distance are rejected.
        matches_l, _, unmatched_detections = \
            min_cost_matching(
                distance_metric, max_distance, tracks, detections,
                track_indices_l, unmatched_detections)
        matches += matches_l
    unmatched_tracks = list(set(track_indices) - set(k for k, _ in matches))
    return matches, unmatched_tracks, unmatched_detections
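The effect of the cascade is easiest to see on a toy example in which two tracks compete for the same detection. The sketch below keeps the loop over ages from the listing above but, to stay short and dependency-free, swaps in a greedy matcher; that matcher is a stand-in and is NOT the Hungarian algorithm used by min_cost_matching. All names and numbers are hypothetical.

```python
def greedy_match(cost, track_ids, detection_ids, max_distance):
    """Stand-in matcher: take admissible pairs in order of increasing cost."""
    pairs = sorted(
        (cost[t][d], t, d) for t in track_ids for d in detection_ids
        if cost[t][d] <= max_distance)
    matches, used_tracks, used_detections = [], set(), set()
    for _, t, d in pairs:
        if t not in used_tracks and d not in used_detections:
            matches.append((t, d))
            used_tracks.add(t)
            used_detections.add(d)
    remaining = [d for d in detection_ids if d not in used_detections]
    return matches, remaining


def cascade(cost, time_since_update, num_detections, max_age, max_distance):
    """Match tracks level by level, most recently updated tracks first."""
    unmatched_detections = list(range(num_detections))
    matches = []
    for age in range(1, max_age + 1):
        if not unmatched_detections:
            break
        level = [t for t, a in enumerate(time_since_update) if a == age]
        if not level:
            continue
        level_matches, unmatched_detections = greedy_match(
            cost, level, unmatched_detections, max_distance)
        matches += level_matches
    matched = {t for t, _ in matches}
    unmatched_tracks = [
        t for t in range(len(time_since_update)) if t not in matched]
    return matches, unmatched_tracks, unmatched_detections


cost = [[0.05, 0.90],   # track 0, last updated 3 frames ago
        [0.10, 0.90]]   # track 1, last updated 1 frame ago
matches, unmatched_tracks, unmatched_detections = cascade(
    cost, time_since_update=[3, 1], num_detections=2,
    max_age=30, max_distance=0.2)
print(matches, unmatched_tracks, unmatched_detections)
```

Track 0 has the lower cost for detection 0, but track 1 was seen more recently and is served first, so detection 0 goes to track 1. A single global assignment over both tracks would have chosen track 0 instead.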
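gate_cost_matrix invalidates infeasible entries in the cost matrix based on the state distributions obtained by Kalman filtering.
Parameters:
kf: the Kalman filter.
cost_matrix: ndarray, the NxM cost matrix, where N is the number of track indices and M is the number of detection indices, such that entry (i, j) is the association cost between tracks[track_indices[i]] and detections[detection_indices[j]].
tracks: List[track.Track], the predicted tracks at the current time step.
detections: List[detection.Detection], the detections at the current time step.
track_indices: List[int], maps rows of cost_matrix to tracks in tracks (see description above).
detection_indices: List[int], maps columns of cost_matrix to detections in detections (see description above).
gated_cost: Optional[float], the value assigned to entries of the cost matrix that correspond to infeasible associations. Defaults to a very large value.
only_position: Optional[bool], if True, only the x, y position of the state distribution is considered during gating. Defaults to False.
Return value: the modified cost matrix.
chi2inv95 is a table of the 0.95 quantile of the chi-square distribution with N degrees of freedom (it contains the values for N = 1, ..., 9). It is taken from the chi2inv function of MATLAB/Octave and is used as the Mahalanobis gating threshold.
KalmanFilter.gating_distance computes the gating distance between a state distribution and the measurements.
    gating_dim = 2 if only_position else 4
    gating_threshold = kalman_filter.chi2inv95[gating_dim]
    measurements = np.asarray(
        [detections[i].to_xyah() for i in detection_indices])
    for row, track_idx in enumerate(track_indices):
        track = tracks[track_idx]
        gating_distance = kf.gating_distance(
            track.mean, track.covariance, measurements, only_position)
        cost_matrix[row, gating_distance > gating_threshold] = gated_cost
    return cost_matrix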
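The two gating thresholds used in this post, 9.4877 for four degrees of freedom and 5.9915 for two, can be reproduced without any statistics library, because the chi-square CDF has a simple closed form for these two cases. The following sketch inverts it by bisection; the function names are illustrative and the code deliberately covers only 2 and 4 degrees of freedom.

```python
import math


def chi2_cdf(x, dof):
    """Closed-form chi-square CDF, valid for dof = 2 and dof = 4 only."""
    if dof == 2:
        return 1.0 - math.exp(-x / 2.0)
    if dof == 4:
        return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("this sketch only covers dof in {2, 4}")


def chi2_quantile(p, dof, lo=0.0, hi=100.0):
    """Invert the CDF by bisection on the interval [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q2 = chi2_quantile(0.95, 2)
q4 = chi2_quantile(0.95, 4)
print(round(q2, 4), round(q4, 4))
```

For two degrees of freedom the quantile is simply -2 ln(0.05), which gives a quick sanity check of the bisection.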
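NearestNeighborDistanceMetric is a nearest-neighbor distance metric. For each target, it returns the smallest distance to any sample observed so far.
Parameters:
metric: str, either "euclidean" or "cosine".
matching_threshold: float, the matching threshold. Samples with a larger distance are considered an invalid match.
budget: int (optional). If not None, the number of samples kept per class is capped at this number. When the budget is reached, the oldest samples are removed.
Attributes:
samples: Dict[int -> List[ndarray]], a dictionary that maps each target identity to the list of samples observed so far.
    def __init__(self, metric, matching_threshold, budget=None):

        if metric == "euclidean":
            self._metric = _nn_euclidean_distance
        elif metric == "cosine":
            self._metric = _nn_cosine_distance
        else:
            raise ValueError(
                "Invalid metric; must be either 'euclidean' or 'cosine'")
        self.matching_threshold = matching_threshold
        self.budget = budget
        self.samples = {}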
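partial_fit updates the distance metric with new data.
Parameters:
features: ndarray, an NxM matrix of N features of dimensionality M.
targets: ndarray, an integer array of the associated target identities.
active_targets: List[int], the targets that are currently present in the scene.
dict.setdefault returns the value of key if it is in the dictionary. Otherwise it inserts key with the value default and returns default, which defaults to None.
The sample dictionary self.samples is built from the targets and their features, and inactive targets are then dropped from it.
    for feature, target in zip(features, targets):
        self.samples.setdefault(target, []).append(feature)
        if self.budget is not None:
            self.samples[target] = self.samples[target][-self.budget:]
    self.samples = {k: self.samples[k] for k in active_targets}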
ImageViewer is an image viewer with drawing routines and video capture capabilities.
    def __init__(self, update_ms, window_shape=(640, 480), caption="Figure 1"):
        self._window_shape = window_shape
        self._caption = caption
        self._update_ms = update_ms
        self._video_writer = None
        self._user_fun = lambda: None
        self._terminate = False

        self.image = np.zeros(self._window_shape + (3, ), dtype=np.uint8)
        self._color = (0, 0, 0)
        self.text_color = (255, 255, 255)
        self.thickness = 1

    @property
    def color(self):
        return self._color

    @color.setter
    def color(self, value):
        if len(value) != 3:
            raise ValueError("color must be tuple of 3")
        self._color = tuple(int(c) for c in value)
rectangle draws a rectangle. The rectangle is given as [x, y, w, h], and an optional text label is placed at its top-left corner.
    pt1 = int(x), int(y)
    pt2 = int(x + w), int(y + h)
    cv2.rectangle(self.image, pt1, pt2, self._color, self.thickness)
    if label is not None:
        text_size = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_PLAIN, 1, self.thickness)

        center = pt1[0] + 5, pt1[1] + 5 + text_size[0][1]
        pt2 = pt1[0] + 10 + text_size[0][0], pt1[1] + 10 + \
            text_size[0][1]
        cv2.rectangle(self.image, pt1, pt2, self._color, -1)
        cv2.putText(self.image, label, center, cv2.FONT_HERSHEY_PLAIN,
                    1, (255, 255, 255), self.thickness)
circle draws a circle.
    image_size = int(radius + self.thickness + 1.5)  # actually half size
    roi = int(x - image_size), int(y - image_size), \
        int(2 * image_size), int(2 * image_size)
    if not is_in_bounds(self.image, roi):
        return

    image = view_roi(self.image, roi)
    center = image.shape[1] // 2, image.shape[0] // 2
    cv2.circle(
        image, center, int(radius + .5), self._color, self.thickness)
    if label is not None:
        cv2.putText(
            self.image, label, center, cv2.FONT_HERSHEY_PLAIN,
            2, self.text_color, 2)
gaussian draws the 95% confidence ellipse of a two-dimensional Gaussian distribution.
    # chi2inv(0.95, 2) = 5.9915
    vals, vecs = np.linalg.eigh(5.9915 * covariance)
    indices = vals.argsort()[::-1]
    vals, vecs = np.sqrt(vals[indices]), vecs[:, indices]

    center = int(mean[0] + .5), int(mean[1] + .5)
    axes = int(vals[0] + .5), int(vals[1] + .5)
    angle = int(180. * np.arctan2(vecs[1, 0], vecs[0, 0]) / np.pi)
    cv2.ellipse(
        self.image, center, axes, angle, 0, 360, self._color, 2)
    if label is not None:
        cv2.putText(self.image, label, center, cv2.FONT_HERSHEY_PLAIN,
                    2, self.text_color, 2)
    # Draw a text annotation at (x, y).
    cv2.putText(self.image, text, (int(x), int(y)), cv2.FONT_HERSHEY_PLAIN,
                2, self.text_color, 2)

    # Draw a set of points, dropping points outside the hard-coded bounds
    # (x < 480, y < 640) unless skip_index_check is set.
    if not skip_index_check:
        cond1, cond2 = points[:, 0] >= 0, points[:, 0] < 480
        cond3, cond4 = points[:, 1] >= 0, points[:, 1] < 640
        indices = np.logical_and.reduce((cond1, cond2, cond3, cond4))
        points = points[indices, :]
    if colors is None:
        colors = np.repeat(
            self._color, len(points)).reshape(3, len(points)).T
    # np.int was removed in NumPy 1.24; the builtin int is equivalent here.
    indices = (points + .5).astype(int)
    self.image[indices[:, 1], indices[:, 0], :] = colors

    # Create the VideoWriter; the frame rate defaults to the refresh rate.
    fourcc = cv2.VideoWriter_fourcc(*fourcc_string)
    if fps is None:
        fps = int(1000. / self._update_ms)
    self._video_writer = cv2.VideoWriter(
        output_filename, fourcc, fps, self._window_shape)

    def disable_videowriter(self):
        """ Disable writing videos.
        """
        self._video_writer = None
run starts the image viewer. This method blocks until the user requests to close the window.
It runs the function passed in, writes the video if a writer is enabled, and displays the image. The VideoWriter itself is created by the enable_videowriter function.
    if update_fun is not None:
        self._user_fun = update_fun

    self._terminate, is_paused = False, False
    # print("ImageViewer is paused, press space to start.")
    while not self._terminate:
        t0 = time.time()
        if not is_paused:
            self._terminate = not self._user_fun()
            if self._video_writer is not None:
                self._video_writer.write(
                    cv2.resize(self.image, self._window_shape))
        t1 = time.time()
        remaining_time = max(1, int(self._update_ms - 1e3*(t1-t0)))
        cv2.imshow(
            self._caption, cv2.resize(self.image, self._window_shape[:2]))
        key = cv2.waitKey(remaining_time)
        if key & 255 == 27:  # ESC
            print("terminating")
            self._terminate = True
        elif key & 255 == 32:  # ' '
            print("toggling pause: " + str(not is_paused))
            is_paused = not is_paused
        elif key & 255 == 115:  # 's'
            print("stepping")
            self._terminate = not self._user_fun()
            is_paused = True
After the window has been destroyed, imshow is called once more.
    # Due to a bug in OpenCV we must call imshow after destroying the
    # window. This will make the window appear again as soon as waitKey
    # is called.
    #
    # see https://github.com/Itseez/opencv/issues/4535
    self.image[:] = 0
    cv2.destroyWindow(self._caption)
    cv2.waitKey(1)
    cv2.imshow(self._caption, self.image)

    self._terminate = True
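KalmanFilter is a simple Kalman filter for tracking bounding boxes in image space. The 8-dimensional state space [x, y, a, h, vx, vy, va, vh] contains the bounding-box center position (x, y), the aspect ratio a, the height h, and their respective velocities. Object motion follows a constant-velocity model. The bounding-box location (x, y, a, h) is taken as a direct observation of the state space (linear observation model).
The constructor creates the Kalman filter model matrices self._motion_mat and self._update_mat.
    def __init__(self):
        ndim, dt = 4, 1.

        # Create Kalman filter model matrices.
        self._motion_mat = np.eye(2 * ndim, 2 * ndim)
        for i in range(ndim):
            self._motion_mat[i, ndim + i] = dt
        self._update_mat = np.eye(ndim, 2 * ndim)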
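The motion and observation uncertainties are chosen relative to the current state estimate (its height). These weights control the amount of uncertainty in the model. As the source comment admits, this is a bit hacky.
        # Motion and observation uncertainty are chosen relative to the current
        # state estimate. These weights control the amount of uncertainty in
        # the model. This is a bit hacky.
        self._std_weight_position = 1. / 20
        self._std_weight_velocity = 1. / 160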
initiate initializes the mean vector (8-dimensional) and the covariance matrix (8x8) from a measurement.
numpy.r_ concatenates its arguments along the first axis.
    mean_pos = measurement
    mean_vel = np.zeros_like(mean_pos)
    mean = np.r_[mean_pos, mean_vel]

    std = [
        2 * self._std_weight_position * measurement[3],
        2 * self._std_weight_position * measurement[3],
        1e-2,
        2 * self._std_weight_position * measurement[3],
        10 * self._std_weight_velocity * measurement[3],
        10 * self._std_weight_velocity * measurement[3],
        1e-5,
        10 * self._std_weight_velocity * measurement[3]]
    covariance = np.diag(np.square(std))
    return mean, covariance
In predict, the Kalman filter predicts from the mean and covariance of the target at the previous time step.
motion_cov is the covariance matrix $Q_k$ of the process noise $w_k$.
    std_pos = [
        self._std_weight_position * mean[3],
        self._std_weight_position * mean[3],
        1e-2,
        self._std_weight_position * mean[3]]
    std_vel = [
        self._std_weight_velocity * mean[3],
        self._std_weight_velocity * mean[3],
        1e-5,
        self._std_weight_velocity * mean[3]]
    motion_cov = np.diag(np.square(np.r_[std_pos, std_vel]))
$$\begin{aligned} \hat{\mathrm{x}}_{k|k-1} &= F_k \hat{\mathrm{x}}_{k-1|k-1} + B_k u_k\\ P_{k|k-1} &= F_k P_{k-1|k-1} F_k^\top + Q_k \end{aligned}$$
self._motion_mat is $F_k$, the state-transition model (matrix) applied to the previous state $\mathrm{x}_{k-1}$.
$B_k$ is the control-input model applied to the control vector $u_k$. This filter has no control input, so the term does not appear in the code.
covariance is passed in as the posterior error covariance $P_{k-1|k-1}$ of the previous step, which measures how accurate the estimate is, and is returned as the prior covariance $P_{k|k-1}$.
    mean = np.dot(self._motion_mat, mean)
    covariance = np.linalg.multi_dot((
        self._motion_mat, covariance, self._motion_mat.T)) + motion_cov
    return mean, covariance
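The prediction step can be checked numerically with a small standalone example. The sketch below rebuilds the constant-velocity matrix from the constructor shown earlier and applies the two prediction equations; the state values and the process noise are made up, and a fixed $Q$ is used here instead of the height-dependent motion_cov.

```python
import numpy as np

ndim, dt = 4, 1.0
F = np.eye(2 * ndim)
for i in range(ndim):
    F[i, ndim + i] = dt  # position += velocity * dt


def predict(mean, covariance, Q):
    """One Kalman prediction step without control input."""
    mean = F @ mean
    covariance = F @ covariance @ F.T + Q
    return mean, covariance


# Hypothetical state [x, y, a, h, vx, vy, va, vh].
mean = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.5])
covariance = np.eye(8)
Q = 0.01 * np.eye(8)  # illustrative process noise

new_mean, new_cov = predict(mean, covariance, Q)
print(new_mean)
print(np.diag(new_cov))
```

Each position component moves by its velocity, and the position variances grow faster than the velocity variances (2.01 versus 1.01 here) because velocity uncertainty leaks into position. This is the growth in uncertainty during occlusion that motivates the matching cascade.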
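project projects the state distribution into measurement space.
Parameters:
mean: ndarray, the mean vector of the state (8-dimensional array).
covariance: ndarray, the covariance matrix of the state (8x8).
Return value: (ndarray, ndarray), the projected mean and covariance matrix of the given state estimate.
numpy.linalg.multi_dot computes the dot product of two or more arrays in a single function call, while automatically selecting the fastest evaluation order.
    std = [
        self._std_weight_position * mean[3],
        self._std_weight_position * mean[3],
        1e-1,
        self._std_weight_position * mean[3]]
    innovation_cov = np.diag(np.square(std))

    mean = np.dot(self._update_mat, mean)
    covariance = np.linalg.multi_dot((
        self._update_mat, covariance, self._update_mat.T))
    return mean, covariance + innovation_cov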
    projected_mean, projected_cov = self.project(mean, covariance)

    chol_factor, lower = scipy.linalg.cho_factor(
        projected_cov, lower=True, check_finite=False)
    kalman_gain = scipy.linalg.cho_solve(
        (chol_factor, lower), np.dot(covariance, self._update_mat.T).T,
        check_finite=False).T
    innovation = measurement - projected_mean
$$\begin{aligned} \hat{\mathrm{x}}_{k|k} &= \hat{\mathrm{x}}_{k|k-1} + K_k \tilde{\mathrm{y}}_k\\ P_{k|k} &= (I - K_k H_k) P_{k|k-1} \end{aligned}$$
Here $\tilde{\mathrm{y}}_k$ is the innovation and $H_k$ is self._update_mat. The code computes the covariance as $P_{k|k-1} - K_k S_k K_k^\top$, where $S_k$ is projected_cov; this is algebraically equal to $(I - K_k H_k) P_{k|k-1}$.
    new_mean = mean + np.dot(innovation, kalman_gain.T)
    new_covariance = covariance - np.linalg.multi_dot((
        kalman_gain, projected_cov, kalman_gain.T))
    return new_mean, new_covariance
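The covariance update in the code, $P - K S K^\top$, looks different from the textbook form $(I - K H) P$. A quick numerical check on a random symmetric positive-definite matrix shows that the two agree when $K$ is the optimal Kalman gain. The matrices below are arbitrary test data, and np.linalg.solve is used here in place of the Cholesky-based solver from the listing.

```python
import numpy as np

rng = np.random.default_rng(0)

H = np.eye(4, 8)                        # observation model, as in _update_mat
A = rng.normal(size=(8, 8))
P = A @ A.T + 8.0 * np.eye(8)           # random SPD prior covariance
R = np.diag([1.0, 1.0, 0.01, 1.0])      # illustrative measurement noise

S = H @ P @ H.T + R                     # innovation covariance (projected_cov)
K = np.linalg.solve(S, H @ P).T         # K = P H^T S^{-1}, using symmetry of S and P

textbook = (np.eye(8) - K @ H) @ P
as_in_code = P - K @ S @ K.T

print(np.max(np.abs(textbook - as_in_code)))
```

The identity follows from $K S K^\top = P H^\top S^{-1} S S^{-1} H P = K H P$.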
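gating_distance computes the gating distance between a state distribution and the measurements. A suitable distance threshold can be obtained from chi2inv95. If only_position is False, the chi-square distribution has 4 degrees of freedom, otherwise 2.
Parameters:
mean: ndarray, the mean vector of the state distribution (8-dimensional).
covariance: ndarray, the covariance of the state distribution (8x8).
measurements: ndarray, an Nx4 matrix of N measurements, each row in the format (x, y, a, h), where (x, y) is the bounding-box center position, a the aspect ratio, and h the height.
only_position: Optional[bool], if True, the distance is computed with respect to the bounding-box center position only.
Return value: ndarray, an array of length N whose i-th element is the squared Mahalanobis distance between (mean, covariance) and measurements[i].
numpy.linalg.cholesky computes the Cholesky decomposition. It returns the factor $L$ of the decomposition $L \cdot L^{H}$ of the square matrix a, where $L$ is lower triangular and $^{H}$ is the conjugate transpose operator (the ordinary transpose if a is real-valued). a must be Hermitian (symmetric if real-valued) and positive definite. Only $L$ is actually returned.
scipy.linalg.solve_triangular solves the equation a x = b for x, assuming a is a triangular matrix.
    mean, covariance = self.project(mean, covariance)
    if only_position:
        mean, covariance = mean[:2], covariance[:2, :2]
        measurements = measurements[:, :2]

    cholesky_factor = np.linalg.cholesky(covariance)
    d = measurements - mean
    z = scipy.linalg.solve_triangular(
        cholesky_factor, d.T, lower=True, check_finite=False,
        overwrite_b=True)
    squared_maha = np.sum(z * z, axis=0)
    return squared_maha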
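Why does summing the squares of $z = L^{-1} d$ give the squared Mahalanobis distance? With $S = L L^\top$ we have $d^\top S^{-1} d = (L^{-1} d)^\top (L^{-1} d)$. The sketch below compares this route with the explicit inverse on a small example. It uses the generic np.linalg.solve instead of scipy.linalg.solve_triangular to stay NumPy-only, and the function names and numbers are illustrative.

```python
import numpy as np


def squared_maha_cholesky(mean, covariance, measurements):
    """Squared Mahalanobis distances via the Cholesky factor."""
    L = np.linalg.cholesky(covariance)
    d = measurements - mean
    z = np.linalg.solve(L, d.T)  # generic solver standing in for solve_triangular
    return np.sum(z * z, axis=0)


def squared_maha_inverse(mean, covariance, measurements):
    """Reference implementation with an explicit matrix inverse."""
    d = measurements - mean
    inv = np.linalg.inv(covariance)
    return np.einsum("ni,ij,nj->n", d, inv, d)


mean = np.array([0.0, 0.0])
covariance = np.array([[4.0, 1.0],
                       [1.0, 2.0]])
measurements = np.array([[2.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0]])

via_cholesky = squared_maha_cholesky(mean, covariance, measurements)
via_inverse = squared_maha_inverse(mean, covariance, measurements)
print(via_cholesky)
print(via_inverse)
```

Both routes give the same distances; the triangular solve is preferred in practice because it avoids forming the inverse.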
The tracker in "Challenges on Large Scale Surveillance Video Analysis" is similar to DeepSORT, but uses the method from "Re-ranking Person Re-identification with k-reciprocal Encoding".