Object Tracking (7): Simple Object Tracking with OpenCV

1. Overview

The object tracking process is:

  • 1. Take an initial set of object detections (e.g., an input set of bounding box coordinates)
  • 2. Create a unique ID for each of those initial detections
  • 3. Then track each object as it moves around the frames of the video, maintaining the assignment of unique IDs

Furthermore, object tracking allows us to apply a unique ID to each tracked object, which in turn lets us count the unique objects in a video. Object tracking is essential for building a people counter.

An ideal object tracking algorithm will:

  • 1. Only require the object detection phase once (i.e., when the object is initially detected)
  • 2. Be extremely fast -- much faster than running the actual object detector itself
  • 3. Be able to handle the case where the tracked object "disappears" or moves outside the boundaries of the video frame
  • 4. Be robust to occlusion
  • 5. Be able to pick up objects it has "lost" between frames

This is a tall order for any computer vision or image processing algorithm, and there are a variety of tricks we can use to help improve our object tracker.

In today's post, you will learn how to implement centroid tracking with OpenCV -- an easy-to-understand yet highly effective tracking algorithm.

Centroid tracking relies on the Euclidean distance between (1) existing object centroids (i.e., objects the centroid tracker has already seen before) and (2) new object centroids in subsequent frames of the video.

We will review the centroid tracking algorithm in more depth in the next section. From there, we will implement a Python class to contain our centroid tracking algorithm and then create a Python script to actually run the object tracker and apply it to input video.

Finally, we will run our object tracker and examine the results, noting both the strengths and weaknesses of the algorithm.

2. The Centroid Tracking Algorithm

The centroid tracking algorithm is a multi-step process. We will review each of the tracking steps in this section.

Step #1: Accept bounding box coordinates and compute centroids

[Figure 1]
To build a simple object tracking algorithm using centroid tracking, the first step is to accept bounding box coordinates from an object detector and use them to compute centroids.

The centroid tracking algorithm assumes that we are passing in a set of bounding box (x, y)-coordinates for each detected object in every single frame.

These bounding boxes can be produced by any type of object detector you would like (color thresholding + contour extraction, Haar cascades, HOG + Linear SVM, SSD, Faster R-CNN, etc.).

Once we have the bounding box coordinates, we must compute the "centroid", or more simply, the center (x, y)-coordinates of the bounding box. Figure 1 above demonstrates accepting a set of bounding box coordinates and computing the centroids.

Since these are the first initial set of bounding boxes presented to our algorithm, we assign each of them a unique ID.
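
To make Step #1 concrete, here is a minimal sketch of the centroid computation (the helper name bbox_centroid and the sample coordinates are purely illustrative; it assumes the (startX, startY, endX, endY) box convention used later in this post):

# minimal sketch: derive a centroid from a bounding box,
# assuming boxes are given as (startX, startY, endX, endY)
def bbox_centroid(startX, startY, endX, endY):
	cX = int((startX + endX) / 2.0)  # x-coordinate of the box center
	cY = int((startY + endY) / 2.0)  # y-coordinate of the box center
	return (cX, cY)

print(bbox_centroid(10, 20, 110, 220))  # -> (60, 120)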

Step #2: Compute the Euclidean distance between new bounding boxes and existing objects

[Figure 2]
Three objects are present in this image. We need to compute the Euclidean distance between each pair of original centroids (purple) and new centroids (yellow).

For every subsequent frame in the video stream, we apply Step #1 to compute the object centroids; however, instead of assigning a new unique ID to each detected object (which would defeat the purpose of object tracking), we first need to determine whether we can associate the new object centroids (yellow) with the old object centroids (purple). To accomplish this process, we compute the Euclidean distance (highlighted with green or red arrows) between each pair of existing object centroids and input object centroids.

We then compute the Euclidean distance between each pair of original centroids (purple) and new centroids (yellow). But how do we use the Euclidean distances between these points to actually match them and associate them?

The answer is in Step #3.

Step #3: Update the (x, y)-coordinates of existing objects

[Figure 3]
Our simple centroid object tracking method associates objects based on the smallest distance between centroids. But how do we handle the object in the lower left of the image above?

The primary assumption of the centroid tracking algorithm is that a given object will potentially move in between subsequent frames, but the distance between its centroids across those frames will be smaller than all other distances between objects.

Therefore, if we choose to associate centroids by minimum distance between subsequent frames, we can build our object tracker.
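
To illustrate, here is a small toy sketch of this minimum-distance association (the coordinates below are made up; the full implementation in Section 4 additionally guards against two objects claiming the same input centroid):

import numpy as np
from scipy.spatial import distance as dist

# toy sketch of Step #3: each existing centroid claims the nearest new centroid
existing = np.array([[10, 10], [50, 50]])             # centroids already being tracked
incoming = np.array([[12, 11], [48, 53], [90, 90]])   # centroids detected in the new frame

D = dist.cdist(existing, incoming)   # pairwise Euclidean distances, shape (2, 3)
matches = D.argmin(axis=1)           # index of the closest new centroid for each object
print(matches)                       # -> [0 1]; column 2 is left unmatched (a new object)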

But what about the lonely point in the lower left?

Step #4: Register new objects

[Figure 4]
In our example of object tracking with Python and OpenCV, we have a new object that does not match any existing objects, so it is registered as object ID #3.

If there are more input detections than existing objects being tracked, we need to register the new objects. "Registering" simply means that we add the new object to our list of tracked objects by:

  • Assigning it a new object ID
  • Storing the centroid of that object's bounding box coordinates

We can then go back to Step #2 and repeat the pipeline of steps for every frame in the video stream.

Step #5: Deregister old objects

Any reasonable object tracking algorithm needs to be able to handle the case where an object has been lost, has disappeared, or has left the field of view.

Exactly how you handle these situations really depends on where your object tracker is deployed, but for this implementation, we will deregister old objects when they cannot be matched to any existing objects for a total of N consecutive frames.

3. Object Tracking Project Structure

To view today's project structure in your terminal, simply use the tree command:

$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── centroidtracker.py
├── object_tracker.py
├── deploy.prototxt
└── res10_300x300_ssd_iter_140000.caffemodel

4. Implementing Centroid Tracking with OpenCV

Before we can apply object tracking to an input video stream, we first need to implement the centroid tracking algorithm. As you digest this centroid tracker script, keep Steps 1-5 above in mind and review the steps as needed.

As you will see, translating the steps into code requires quite a bit of thought, and while we perform all of the steps, they are not linear due to the nature of our various data structures and code constructs.

I would suggest that you:

  • Read the steps above
  • Read the code explanation for the centroid tracker
  • Finally, read the steps above once more

Once you feel confident that you understand the steps of the centroid tracking algorithm, open centroidtracker.py inside the pyimagesearch module and let's review the code:

# import the necessary packages
from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np
class CentroidTracker():
	def __init__(self, maxDisappeared=50):
		# initialize the next unique object ID along with two ordered
		# dictionaries used to keep track of mapping a given object
		# ID to its centroid and number of consecutive frames it has
		# been marked as "disappeared", respectively
		self.nextObjectID = 0
		self.objects = OrderedDict()
		self.disappeared = OrderedDict()
		# store the number of maximum consecutive frames a given
		# object is allowed to be marked as "disappeared" until we
		# need to deregister the object from tracking
		self.maxDisappeared = maxDisappeared

We import our required packages and modules -- distance, OrderedDict, and numpy.

First we define the CentroidTracker class. The constructor accepts a single parameter: the maximum number of consecutive frames a given object is allowed to be lost/disappeared before the tracker deregisters it.

Our constructor builds four class variables (a short usage sketch follows this list):

  • nextObjectID: a counter used to assign unique IDs to each object. If an object leaves the frame and does not come back within maxDisappeared frames, a new (next) object ID will be assigned.
  • objects: a dictionary that uses object IDs as keys and centroid (x, y)-coordinates as values
  • disappeared: holds the number of consecutive frames (value) a particular object ID (key) has been marked as "lost"
  • maxDisappeared: the number of consecutive frames an object is allowed to be marked as "lost/disappeared" before we deregister it
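
As a quick usage sketch (the value 20 below is purely illustrative, not a recommendation from this post):

from pyimagesearch.centroidtracker import CentroidTracker

# default: an object may stay "disappeared" for up to 50 consecutive frames
ct = CentroidTracker()

# a hypothetical stricter tracker that deregisters objects after 20 missed frames
ct_strict = CentroidTracker(maxDisappeared=20)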

Let's define the register method, which is responsible for adding new objects to our tracker:

	def register(self, centroid):
		# when registering an object we use the next available object
		# ID to store the centroid
		self.objects[self.nextObjectID] = centroid
		self.disappeared[self.nextObjectID] = 0
		self.nextObjectID += 1

The register method accepts a centroid and adds it to the objects dictionary using the next available object ID. The number of times the object has disappeared is initialized to 0 in the disappeared dictionary. Finally, we increment nextObjectID so that if a new object comes into view, it will be associated with a unique ID. Similar to our register method, we also need a deregister method:

	def deregister(self, objectID):
		# to deregister an object ID we delete the object ID from
		# both of our respective dictionaries
		del self.objects[objectID]
		del self.disappeared[objectID]

Just as we can add new objects to our tracker, we also need the ability to remove old objects that have been lost or have disappeared from our input frames.

The deregister method simply deletes the objectID from both the objects and disappeared dictionaries, respectively.

The heart of our centroid tracker implementation lives inside the update method:

	def update(self, rects):
		# check to see if the list of input bounding box rectangles
		# is empty
		if len(rects) == 0:
			# loop over any existing tracked objects and mark them
			# as disappeared
			for objectID in list(self.disappeared.keys()):
				self.disappeared[objectID] += 1
				# if we have reached a maximum number of consecutive
				# frames where a given object has been marked as
				# missing, deregister it
				if self.disappeared[objectID] > self.maxDisappeared:
					self.deregister(objectID)
			# return early as there are no centroids or tracking info
			# to update
			return self.objects

The update method accepts a list of bounding box rectangles, presumably from an object detector (Haar cascade, HOG + Linear SVM, SSD, Faster R-CNN, etc.). The rects parameter is assumed to be a list of tuples with the structure (startX, startY, endX, endY).

If there are no detections, we loop over all object IDs and increment their disappeared count. We also check whether we have reached the maximum number of consecutive frames a given object has been allowed to be marked as missing; if so, we remove it from our tracking system. Since there is no tracking information to update, we go ahead and return early.

Otherwise, we have quite a bit of work to do in the next seven code blocks of the update method:

		# initialize an array of input centroids for the current frame
		inputCentroids = np.zeros((len(rects), 2), dtype="int")
		# loop over the bounding box rectangles
		for (i, (startX, startY, endX, endY)) in enumerate(rects):
			# use the bounding box coordinates to derive the centroid
			cX = int((startX + endX) / 2.0)
			cY = int((startY + endY) / 2.0)
			inputCentroids[i] = (cX, cY)

We initialize a NumPy array, inputCentroids, to store the centroid of each rect.

We then loop over the bounding box rectangles, compute each centroid, and store it in the inputCentroids array.

If there are currently no objects we are tracking, we register each of the new objects:

		# if we are currently not tracking any objects take the input
		# centroids and register each of them
		if len(self.objects) == 0:
			for i in range(0, len(inputCentroids)):
				self.register(inputCentroids[i])

Otherwise, we need to update the (x, y)-coordinates of any existing objects based on the centroid locations that minimize the Euclidean distance between them:

		# otherwise, we are currently tracking objects so we need to
		# try to match the input centroids to existing object
		# centroids
		else:
			# grab the set of object IDs and corresponding centroids
			objectIDs = list(self.objects.keys())
			objectCentroids = list(self.objects.values())
			# compute the distance between each pair of object
			# centroids and input centroids, respectively -- our
			# goal will be to match an input centroid to an existing
			# object centroid
			D = dist.cdist(np.array(objectCentroids), inputCentroids)
			# in order to perform this matching we must (1) find the
			# smallest value in each row and then (2) sort the row
			# indexes based on their minimum values so that the row
			# with the smallest value is at the *front* of the index
			# list
			rows = D.min(axis=1).argsort()
			# next, we perform a similar process on the columns by
			# finding the smallest value in each column and then
			# sorting using the previously computed row index list
			cols = D.argmin(axis=1)[rows]

The update of existing tracked objects begins in the else block. The goal is to track our objects and to maintain correct object IDs -- this process is accomplished by computing the Euclidean distances between all pairs of objectCentroids and inputCentroids, and then associating the object IDs that minimize the Euclidean distance.

Inside the else block, we:

  • Grab the objectIDs and objectCentroids values
  • Compute the distance between each pair of existing object centroids and new input centroids. The output shape of our distance map D will be (# of object centroids, # of input centroids).
  • To perform the matching, we must (1) find the smallest value in each row and (2) sort the row indexes based on those minimum values. We perform a very similar process on the columns, finding the smallest value in each column and then ordering them according to the sorted rows. Our goal is to have the index values with the smallest corresponding distances at the front of the lists.

The next step is to use the distances to see if we can associate object IDs:

			# in order to determine if we need to update, register,
			# or deregister an object we need to keep track of which
			# of the rows and column indexes we have already examined
			usedRows = set()
			usedCols = set()
			# loop over the combination of the (row, column) index
			# tuples
			for (row, col) in zip(rows, cols):
				# if we have already examined either the row or
				# column value before, ignore it
				# val
				if row in usedRows or col in usedCols:
					continue
				# otherwise, grab the object ID for the current row,
				# set its new centroid, and reset the disappeared
				# counter
				objectID = objectIDs[row]
				self.objects[objectID] = inputCentroids[col]
				self.disappeared[objectID] = 0
				# indicate that we have examined each of the row and
				# column indexes, respectively
				usedRows.add(row)
				usedCols.add(col)

In the code block above, we:

  • Initialize two sets to determine which row and column indexes we have already used. Keep in mind that a set is similar to a list, but it contains only unique values.
  • Then we loop over the combinations of (row, col) index tuples in order to update our object centroids:
    • If we have already used either this row or column index, ignore it and continue looping.
    • Otherwise, we have found an input centroid that:
      • 1. Has the smallest Euclidean distance to an existing centroid
      • 2. And has not been matched with any other object
      • In that case, we update the object centroid and make sure to add the row and col to their respective usedRows and usedCols sets

There may still be indexes that we have not examined yet (i.e., indexes missing from our usedRows and usedCols sets):

			# compute both the row and column index we have NOT yet
			# examined
			unusedRows = set(range(0, D.shape[0])).difference(usedRows)
			unusedCols = set(range(0, D.shape[1])).difference(usedCols)

So we must determine which centroid indexes we have not examined yet and store them in two new sets (unusedRows and unusedCols).

Our final check handles any objects that have become lost or might have disappeared:

			# in the event that the number of object centroids is
			# equal or greater than the number of input centroids
			# we need to check and see if some of these objects have
			# potentially disappeared
			if D.shape[0] >= D.shape[1]:
				# loop over the unused row indexes
				for row in unusedRows:
					# grab the object ID for the corresponding row
					# index and increment the disappeared counter
					objectID = objectIDs[row]
					self.disappeared[objectID] += 1
					# check to see if the number of consecutive
					# frames the object has been marked "disappeared"
					# for warrants deregistering the object
					if self.disappeared[objectID] > self.maxDisappeared:
						self.deregister(objectID)

To finish up:

  • If the number of object centroids is greater than or equal to the number of input centroids:
    • We need to verify whether any of these objects are lost or have disappeared by looping over the unused row indexes (if any).
    • In the loop, we:
      • 1. Increment their disappeared count in the dictionary.
      • 2. Check whether the disappeared count exceeds the maxDisappeared threshold, and if so, deregister the object.

Otherwise, the number of input centroids is greater than the number of existing object centroids, so we have new objects to register and track:

			# otherwise, if the number of input centroids is greater
			# than the number of existing object centroids we need to
			# register each new input centroid as a trackable object
			else:
				for col in unusedCols:
					self.register(inputCentroids[col])
		# return the set of trackable objects
		return self.objects

We loop over the unusedCols indexes and register each new centroid. Finally, we return the set of trackable objects to the calling method.

5. Understanding the Centroid Tracking Distance Relationship

Our centroid tracking implementation was quite long, and admittedly, it is the most confusing aspect of the algorithm.

If you are having trouble following along with what that code is doing, you should consider opening a Python shell and performing the following experiment:

>>> from scipy.spatial import distance as dist
>>> import numpy as np
>>> np.random.seed(42)
>>> objectCentroids = np.random.uniform(size=(2, 2))
>>> centroids = np.random.uniform(size=(3, 2))
>>> D = dist.cdist(objectCentroids, centroids)
>>> D
array([[0.82421549, 0.32755369, 0.33198071],
       [0.72642889, 0.72506609, 0.17058938]])

The result is a distance matrix D with two rows (# of existing object centroids) and three columns (# of new input centroids).

Just like we did earlier in the script, let's find the minimum distance in each row and sort the indexes based on those values:

>>> D.min(axis=1)
array([0.32755369, 0.17058938])
>>> rows = D.min(axis=1).argsort()
>>> rows
array([1, 0])

First we find the minimum value in each row, allowing us to figure out which existing object is closest to the new input centroids. Sorting these values then gives us the indexes of those rows.

We apply a similar process to the columns:

>>> D.argmin(axis=1)
array([1, 2])
>>> cols = D.argmin(axis=1)[rows]
>>> cols
array([2, 1])

Here we find, for each row, the index of the column containing the smallest value, and then order those column indexes using the previously computed rows.

Let's print the results and analyze them:

>>> print(list(zip(rows, cols)))
[(1, 2), (0, 1)]

Analyzing the results (which we will extend in a moment), we find that:

  • D[1, 2] has the smallest Euclidean distance, implying that the second existing object will be matched with the third input centroid.
  • D[0, 1] has the next smallest Euclidean distance, which implies that the first existing object will be matched with the second input centroid.
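
Continuing the same toy example, we can also see how Steps #4 and #5 fall out of the leftover indexes (the usedRows and usedCols values below simply restate what the matching loop would have produced for this D):

>>> usedRows = {0, 1}   # both existing objects were matched
>>> usedCols = {1, 2}   # input centroids 1 and 2 were claimed
>>> unusedRows = set(range(0, D.shape[0])).difference(usedRows)
>>> unusedCols = set(range(0, D.shape[1])).difference(usedCols)
>>> unusedRows, unusedCols
(set(), {0})

Since there are more input centroids (three) than existing object centroids (two), the leftover input centroid in column 0 would be registered as a brand new object (Step #4); had there been more existing objects than input centroids, any object left over in unusedRows would instead have had its disappeared counter incremented (Step #5).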

6. Implementing the Object Tracking Driver Script

Now that we have implemented the CentroidTracker class, let's put it to work in a driver script for object tracking.

In the driver script, you can use any object detector you like, provided it produces a set of bounding boxes. This could be a Haar cascade, HOG + Linear SVM, YOLO, SSD, Faster R-CNN, etc. For this example script, I will use OpenCV's deep learning face detector, but feel free to make your own version of the script that implements a different detector; a minimal sketch of such a swap follows this paragraph.
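
For example, here is a minimal sketch of what such a swap might look like with a Haar cascade (the detect_faces helper is hypothetical; the only contract is that it returns (startX, startY, endX, endY) boxes that can be fed to ct.update):

import cv2

# hypothetical drop-in detector: a Haar cascade face detector
# (cv2.data.haarcascades ships with the opencv-python package)
detector = cv2.CascadeClassifier(
	cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	# detectMultiScale returns boxes as (x, y, w, h)
	boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
	# convert to the (startX, startY, endX, endY) format CentroidTracker expects
	return [(x, y, x + w, y + h) for (x, y, w, h) in boxes]

# per frame: rects = detect_faces(frame); objects = ct.update(rects)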

In this script, we will:

  • Use a live VideoStream object to grab frames from your webcam
  • Load and use OpenCV's deep learning face detector
  • Instantiate our CentroidTracker and use it to track face objects in the video stream
  • And display our results, which include bounding boxes and object ID annotations overlaid on the frames

When you are ready, open object_tracker.py and follow along:

# import the necessary packages
from pyimagesearch.centroidtracker import CentroidTracker
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

First, we specify our imports. Most notably, we are using the CentroidTracker class that we just reviewed. We will also use VideoStream from imutils and OpenCV.

We have three command line arguments, all of which are related to our deep learning face detector:

  • --prototxt: the path to the Caffe "deploy" prototxt file.
  • --model: the path to the pre-trained Caffe model.
  • --confidence: our probability threshold to filter weak detections. I found that the default value of 0.5 is sufficient.

Next, let's perform our initializations:

# initialize our centroid tracker and frame dimensions
ct = CentroidTracker()
(H, W) = (None, None)
# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

In the block above, we:

  • Instantiate our CentroidTracker, ct. Recall from the explanation in the previous section that this object has three methods: (1) register, (2) deregister, and (3) update. We will only be using the update method, as it registers and deregisters objects automatically. We also initialize H and W (our frame dimensions) to None.
  • Load our serialized deep learning face detector model from disk using OpenCV's DNN module
  • Start our VideoStream, vs. With vs, we will be able to capture frames from our camera in the upcoming while loop. We allow our camera 2.0 seconds to warm up.

Now let's begin our while loop and start tracking face objects:

# loop over the frames from the video stream
while True:
	# read the next frame from the video stream and resize it
	frame = vs.read()
	frame = imutils.resize(frame, width=400)
	# if the frame dimensions are None, grab them
	if W is None or H is None:
		(H, W) = frame.shape[:2]
	# construct a blob from the frame, pass it through the network,
	# obtain our output predictions, and initialize the list of
	# bounding box rectangles
	blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H),
		(104.0, 177.0, 123.0))
	net.setInput(blob)
	detections = net.forward()
	rects = []

We loop over frames and resize them to a fixed width (while preserving the aspect ratio). Our frame dimensions are grabbed as needed.
We then pass the frame through our CNN object detector to obtain predictions and object locations, and we initialize a list, rects, to hold our bounding box rectangles.

	# loop over the detections
	for i in range(0, detections.shape[2]):
		# filter out weak detections by ensuring the predicted
		# probability is greater than a minimum threshold
		if detections[0, 0, i, 2] > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the object, then update the bounding box rectangles list
			box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
			rects.append(box.astype("int"))
			# draw a bounding box surrounding the object so we can
			# visualize it
			(startX, startY, endX, endY) = box.astype("int")
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				(0, 255, 0), 2)

We begin looping over the detections. If a detection exceeds our confidence threshold, indicating a valid detection, we:

  • Compute the bounding box coordinates and append them to the rects list
  • Draw a bounding box around the object

Finally, let's call update on our centroid tracker object, ct:

	# update our centroid tracker using the computed set of bounding
	# box rectangles
	objects = ct.update(rects)
	# loop over the tracked objects
	for (objectID, centroid) in objects.items():
		# draw both the ID of the object and the centroid of the
		# object on the output frame
		text = "ID {}".format(objectID)
		cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
		cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)
	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF
	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

The ct.update call handles the heavy lifting in our simple object tracker implementation with Python and OpenCV. If we did not care about visualization, we would be done here and ready to loop back to the next frame.

We display each centroid as a filled circle together with the unique object ID number as text. Now we will be able to visualize the results and check whether our CentroidTracker correctly keeps track of our objects by associating the correct IDs with them in the video stream.

We display the frame until the quit key ("q") is pressed; at that point we simply break out of the loop and perform cleanup.

7. Complete Code

centroidtracker.py

# import the necessary packages
from scipy.spatial import distance as dist
from collections import OrderedDict
import numpy as np

class CentroidTracker():
    def __init__(self, maxDisappeared=50):
        # initialize the next unique object ID along with two ordered
        # dictionaries used to keep track of mapping a given object
        # ID to its centroid and number of consecutive frames it has
        # been marked as "disappeared", respectively
        self.nextObjectID = 0
        self.objects = OrderedDict()
        self.disappeared = OrderedDict()

        # store the number of maximum consecutive frames a given
        # object is allowed to be marked as "disappeared" until we
        # need to deregister the object from tracking
        self.maxDisappeared = maxDisappeared

    def register(self, centroid):
        # when registering an object we use the next available object
        # ID to store the centroid
        self.objects[self.nextObjectID] = centroid
        self.disappeared[self.nextObjectID] = 0
        self.nextObjectID += 1

    def deregister(self, objectID):
        # to deregister an object ID we delete the object ID from
        # both of our respective dictionaries
        del self.objects[objectID]
        del self.disappeared[objectID]

    def update(self, rects):
        # check to see if the list of input bounding box rectangles
        # is empty
        if len(rects) == 0:
            # loop over any existing tracked objects and mark them
            # as disappeared
            for objectID in list(self.disappeared.keys()):
                self.disappeared[objectID] += 1

                # if we have reached a maximum number of consecutive
                # frames where a given object has been marked as
                # missing, deregister it
                if self.disappeared[objectID] > self.maxDisappeared:
                    self.deregister(objectID)

            # return early as there are no centroids or tracking info
            # to update
            return self.objects

        # initialize an array of input centroids for the current frame
        inputCentroids = np.zeros((len(rects), 2), dtype="int")

        # loop over the bounding box rectangles
        for (i, (startX, startY, endX, endY)) in enumerate(rects):
            # use the bounding box coordinates to derive the centroid
            cX = int((startX + endX) / 2.0)
            cY = int((startY + endY) / 2.0)
            inputCentroids[i] = (cX, cY)

        # if we are currently not tracking any objects take the input
        # centroids and register each of them
        if len(self.objects) == 0:
            for i in range(0, len(inputCentroids)):
                self.register(inputCentroids[i])

        # otherwise, we are currently tracking objects so we need to
        # try to match the input centroids to existing object
        # centroids
        else:
            # grab the set of object IDs and corresponding centroids
            objectIDs = list(self.objects.keys())
            objectCentroids = list(self.objects.values())

            # compute the distance between each pair of object
            # centroids and input centroids, respectively -- our
            # goal will be to match an input centroid to an existing
            # object centroid
            D = dist.cdist(np.array(objectCentroids), inputCentroids)

            # in order to perform this matching we must (1) find the
            # smallest value in each row and then (2) sort the row
            # indexes based on their minimum values so that the row
            # with the smallest value is at the *front* of the index
            # list
            rows = D.min(axis=1).argsort()

            # next, we perform a similar process on the columns by
            # finding the smallest value in each column and then
            # sorting using the previously computed row index list
            cols = D.argmin(axis=1)[rows]

            # in order to determine if we need to update, register,
            # or deregister an object we need to keep track of which
            # of the rows and column indexes we have already examined
            usedRows = set()
            usedCols = set()

            # loop over the combination of the (row, column) index
            # tuples
            for (row, col) in zip(rows, cols):
                # if we have already examined either the row or
                # column value before, ignore it
                # val
                if row in usedRows or col in usedCols:
                    continue

                # otherwise, grab the object ID for the current row,
                # set its new centroid, and reset the disappeared
                # counter
                objectID = objectIDs[row]
                self.objects[objectID] = inputCentroids[col]
                self.disappeared[objectID] = 0

                # indicate that we have examined each of the row and
                # column indexes, respectively
                usedRows.add(row)
                usedCols.add(col)

            # compute both the row and column index we have NOT yet
            # examined
            unusedRows = set(range(0, D.shape[0])).difference(usedRows)
            unusedCols = set(range(0, D.shape[1])).difference(usedCols)

            # in the event that the number of object centroids is
            # equal or greater than the number of input centroids
            # we need to check and see if some of these objects have
            # potentially disappeared
            if D.shape[0] >= D.shape[1]:
                # loop over the unused row indexes
                for row in unusedRows:
                    # grab the object ID for the corresponding row
                    # index and increment the disappeared counter
                    objectID = objectIDs[row]
                    self.disappeared[objectID] += 1

                    # check to see if the number of consecutive
                    # frames the object has been marked "disappeared"
                    # for warrants deregistering the object
                    if self.disappeared[objectID] > self.maxDisappeared:
                        self.deregister(objectID)

            # otherwise, if the number of input centroids is greater
            # than the number of existing object centroids we need to
            # register each new input centroid as a trackable object
            else:
                for col in unusedCols:
                    self.register(inputCentroids[col])

        # return the set of trackable objects
        return self.objects

object_tracker.py

# USAGE
# python object_tracker.py --prototxt deploy.prototxt --model res10_300x300_ssd_iter_140000.caffemodel

# import the necessary packages
from pyimagesearch.centroidtracker import CentroidTracker
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

# initialize our centroid tracker and frame dimensions
ct = CentroidTracker()
(H, W) = (None, None)

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

# loop over the frames from the video stream
while True:
    # read the next frame from the video stream and resize it
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # if the frame dimensions are None, grab them
    if W is None or H is None:
        (H, W) = frame.shape[:2]

    # construct a blob from the frame, pass it through the network,
    # obtain our output predictions, and initialize the list of
    # bounding box rectangles
    blob = cv2.dnn.blobFromImage(frame, 1.0, (W, H),
        (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()
    rects = []

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # filter out weak detections by ensuring the predicted
        # probability is greater than a minimum threshold
        if detections[0, 0, i, 2] > args["confidence"]:
            # compute the (x, y)-coordinates of the bounding box for
            # the object, then update the bounding box rectangles list
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            rects.append(box.astype("int"))

            # draw a bounding box surrounding the object so we can
            # visualize it
            (startX, startY, endX, endY) = box.astype("int")
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                (0, 255, 0), 2)

    # update our centroid tracker using the computed set of bounding
    # box rectangles
    objects = ct.update(rects)

    # loop over the tracked objects
    for (objectID, centroid) in objects.items():
        # draw both the ID of the object and the centroid of the
        # object on the output frame
        text = "ID {}".format(objectID)
        cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        cv2.circle(frame, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

8. Centroid Object Tracking Results

Open a terminal and execute the following command:

$ python object_tracker.py --prototxt deploy.prototxt \
	--model res10_300x300_ssd_iter_140000.caffemodel
[INFO] loading model...
[INFO] starting video stream...

Notice that even when the second face is "lost" as I move the book cover outside the camera's field of view, our object tracker is able to pick the face back up again when it comes back into view. If the face had stayed out of view for more than 50 frames, the object would have been deregistered.

9. Limitations and Drawbacks

While our centroid tracker worked great in this example, there are two primary drawbacks to this object tracking algorithm.

The first is that it requires the object detection step to be run on every frame of the input video.

  • For very fast object detectors (i.e., color thresholding and Haar cascades), having to run the detector on every input frame is likely not an issue.
  • But if you are using a significantly more computationally expensive object detector, such as HOG + Linear SVM or a deep learning-based detector, on a resource-constrained device, your frame processing pipeline will slow down tremendously, since you will be spending the entire pipeline running a very slow detector.

The second drawback is related to the underlying assumptions of the centroid tracking algorithm itself -- the centroids must lie close together between subsequent frames.

  • This assumption typically holds, but keep in mind that we are representing our 3D world with 2D frames -- what happens when an object overlaps with another one?
  • The answer is that object ID switching can occur.
  • If two or more objects overlap each other to the point where their centroids intersect and instead have the minimum distance to the other respective object, the algorithm may (unknowingly) swap the object IDs.
  • It is important to understand that the overlapping/occluded object problem is not specific to centroid tracking -- it happens with many other object trackers as well, including advanced ones. However, the problem is more pronounced with centroid tracking, since we rely strictly on the Euclidean distances between centroids and have no additional metrics, heuristics, or learned patterns.

As long as you keep these assumptions and limitations in mind when using centroid tracking, the algorithm will work wonderfully for you.

BONUS

The following implements multi-object tracking based on YOLOv3 and the centroid tracking algorithm:

# import the necessary packages
from CentroidTracking.centroidtracker import CentroidTracker
from imutils.video import VideoStream
import numpy as np
import argparse
import imutils
import time
import cv2
import os

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()

ap.add_argument("-i", "--input", required=True,
	help="path to input video")
# ap.add_argument("-o", "--output", required=True,
	# help="path to output video")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
    help="minimum probability to filter weak detections")
ap.add_argument("-t", "--threshold", type=float, default=0.3,
    help="threshold when applying non-maxima suppression")
args = vars(ap.parse_args())

ct = CentroidTracker()
# load the COCO class labels, our YOLO model was trained on
labelsPath = os.path.sep.join(["yolo-coco", "coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")
# initialize a list of colors to represent each possible class label
np.random.seed(42)
COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),dtype="uint8")
# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join(["yolo-coco", "yolov3.weights"])
configPath = os.path.sep.join(["yolo-coco", "yolov3.cfg"])
# load our YOLO object detector trained on COCO dataset (80 classes)
print("[INFO] loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)

writer = None

if args["input"] == 'camera':
    cap = cv2.VideoCapture(0)
else:
    cap = cv2.VideoCapture(args["input"])

# try to determine the total number of frames in the video file
try:
	prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
		else cv2.CAP_PROP_FRAME_COUNT
	total = int(cap.get(prop))
	print("[INFO] {} total frames in video".format(total))

# an error occurred while trying to determine the total
# number of frames in the video file
except:
	print("[INFO] could not determine # of frames in video")
	print("[INFO] no approx. completion time can be provided")
	total = -1


print(cap.isOpened())
print("starting-----------------------------------------------------------")
begin = time.time()
while (cap.isOpened()):
    ret, image = cap.read()

    # load our input image and grab its spatial dimension

    if ret == True:
        (H, W) = image.shape[:2]

        # determine only the *output* layer names that we need from YOLO
        ln = net.getLayerNames()
        # getUnconnectedOutLayers() returns Nx1 indices on older OpenCV builds
        # and a flat array on newer ones; flatten() handles both cases
        ln = [ln[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]

        # construct a blob from the input image and then perform a forward
        # pass of the YOLO object detector, giving us our bounding boxes and
        # associated probabilities
        blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
        	swapRB=True, crop=False)
        net.setInput(blob)
        start = time.time()
        layerOutputs = net.forward(ln)
        end = time.time()

        # show timing information on YOLO
        print("[INFO] YOLO took {:.6f} seconds".format(end - start))

        # initialize our lists of detected bounding boxes, confidences, and
        # class IDs, respectively
        boxes = []
        boxes_c = []
        confidences = []
        classIDs = []
        rects = []
        # loop over each of the layer outputs
        for output in layerOutputs:
        	# loop over each of the detections
            for detection in output:
        		# extract the class ID and confidence (i.e., probability) of
        		# the current object detection
                scores = detection[5:]
                classID = np.argmax(scores)
                confidence = scores[classID]
                # filter out weak predictions by ensuring the detected
                # probability is greater than the minimum probability
                if confidence > args["confidence"]:
                    # scale the bounding box coordinates back relative to the
                    # size of the image, keeping in mind that YOLO actually
                    # returns the center (x, y)-coordinates of the bounding
                    # box followed by the boxes' width and height
                    box = detection[0:4] * np.array([W, H, W, H])
                    (centerX, centerY, width, height) = box.astype("int")
        			# use the center (x, y)-coordinates to derive the top and
        			# and left corner of the bounding box
                    x = int(centerX - (width / 2))
                    y = int(centerY - (height / 2))
        			# update our list of bounding box coordinates, confidences,
        			# and class IDs
                    boxes.append([x, y, int(width), int(height)])
                    boxes_c.append([centerX - int(width/2), centerY - int(height/2), centerX + int(width/2), centerY + int(height/2)])
                    confidences.append(float(confidence))
                    classIDs.append(classID)
        # apply non-maxima suppression to suppress weak, overlapping bounding
        # boxes
        idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"],args["threshold"])

        if len(idxs) > 0:
            for i in idxs.flatten():
                rects.append(boxes_c[i])
        # update our centroid tracker using the computed set of bounding
    	# box rectangles
        objects = ct.update(rects)
    	# loop over the tracked objects
        for (objectID, centroid) in objects.items():
    		# draw both the ID of the object and the centroid of the
    		# object on the output frame
            text = "ID {}".format(objectID)
            cv2.putText(image, text, (centroid[0] - 10, centroid[1] - 10),cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            cv2.circle(image, (centroid[0], centroid[1]), 4, (0, 255, 0), -1)
        # ensure at least one detection exists
        if len(idxs) > 0:
        	# loop over the indexes we are keeping
        	for i in idxs.flatten():
        		# extract the bounding box coordinates
        		(x, y) = (boxes[i][0], boxes[i][1])
        		(w, h) = (boxes[i][2], boxes[i][3])

        		# draw a bounding box rectangle and label on the image
        		color = [int(c) for c in COLORS[classIDs[i]]]
        		cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        		text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
        		cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
        			0.5, color, 2)
        # writer.write(image)
        # # show the output image
        # cv2.imshow("Image", image)
        # check if the video writer is None
        if writer is None:
            # initialize our video writer
            fourcc = cv2.VideoWriter_fourcc(*"MJPG")
            writer = cv2.VideoWriter("output.avi", fourcc, 30,(image.shape[1], image.shape[0]), True)
            # some information on processing single frame
            if total > 0:
                elap = (end - start)
                print("[INFO] single frame took {:.4f} seconds".format(elap))
                print("[INFO] estimated total time to finish: {:.4f}".format(elap * total))
        cv2.imshow("Live", image)
        # write the output frame to disk
        writer.write(image)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break
# release the file pointers
print("[INFO] cleaning up...")
if writer is not None:
    writer.release()
cap.release()
cv2.destroyAllWindows()
finish = time.time()

print(f"Total time taken : {finish - begin}")

Link: https://pan.baidu.com/s/1UX_HmwwJLtHJ9e5tx6hwOg?pwd=123a
Extraction code: 123a

References

https://pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/
