



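argparse is a module in Python's standard library for parsing command-line options and arguments. You declare the arguments your program needs, and argparse parses them out of sys.argv and automatically generates help and usage messages.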


  • Create an ArgumentParser() object
  • Call add_argument() to register each argument
  • Call parse_args() to parse the arguments that were passed in
# A small example
import argparse

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-n", "--name", required=True,
                help="name of the user")
args = vars(ap.parse_args())

# display a friendly message to the user
print("hello {}, nice to meet you!".format(args["name"]))



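Only one command-line argument is defined here, --name, and it can be given as either -n or --name. Because required is set to True, the option cannot be omitted; argparse exits with a usage error if it is missing.

help: a short description shown in the help output, telling the user what to supply.

Run the script with either --help or -h to see it.


Calling vars() on the object returned by parse_args() converts the parsed command-line arguments into a dictionary.

key: the name of the command-line argument
value: the value supplied for that argument on the command line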
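
As a quick illustration of the key/value mapping described above, the sketch below passes an explicit argument list to parse_args() instead of reading sys.argv. The sample value "Alice" is only an assumption for the demo.

```python
import argparse

# build the same one-argument parser as in the small example
ap = argparse.ArgumentParser()
ap.add_argument("-n", "--name", required=True, help="name of the user")

# pass an explicit list so the snippet runs without real command-line input
namespace = ap.parse_args(["--name", "Alice"])
args = vars(namespace)  # Namespace -> plain dict

print(args)  # {'name': 'Alice'}
```

Passing a list to parse_args() is also a convenient way to try out a parser from an interactive session.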



  • input/ : the input videos on which object tracking is run

  • output/ : the processed videos, with the tracked object marked by a bounding rectangle

  • mobilenet_ssd/
    The Caffe CNN model files



from imutils.video import FPS
import numpy as np
import argparse
import imutils
import dlib
import cv2


### Parse the command-line arguments ###
ap = argparse.ArgumentParser()

ap.add_argument("-p", "--prototxt", required=True,
    help="path to Caffe 'deploy' prototxt file")

ap.add_argument("-m", "--model", required=True,
    help="path to Caffe pre-trained model")

ap.add_argument("-v", "--video", required=True,
    help="path to input video file")

ap.add_argument("-l", "--label", required=True,
    help="class label we are interested in detecting + tracking")

ap.add_argument("-o", "--output", type=str,
    help="path to optional output video file")

ap.add_argument("-c", "--confidence", type=float, default=0.2,
    help="minimum probability to filter weak detections")

args = vars(ap.parse_args())
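
  • --prototxt: path to the Caffe deploy prototxt file.

    e.g.: -p ./mobilenet_ssd/MobileNetSSD_deploy.prototxt

  • --model: path to the Caffe pre-trained model.

    e.g.: -m ./mobilenet_ssd/MobileNetSSD_deploy.caffemodel

  • --video: path to the input video file (webcams are not supported).

    e.g.: -v ./input/xxx.mp4

  • --label: the class label of the object to detect and track; the full class list is given below.

    e.g.: -l person

  • --output: optional path where the tracking result video is saved.

    e.g.: -o ./output/object_result.avi

  • --confidence: minimum probability used to filter out weak detections (default 0.2).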
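
To see how these flags end up in the args dictionary, here is a minimal sketch that rebuilds the parser and parses a sample argument list. The helper name build_parser and the sample_argv list are assumptions made for this illustration; the paths are the example values listed above.

```python
import argparse

def build_parser():
    """Hypothetical helper: recreate the tracker script's argument parser."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--prototxt", required=True,
                    help="path to Caffe 'deploy' prototxt file")
    ap.add_argument("-m", "--model", required=True,
                    help="path to Caffe pre-trained model")
    ap.add_argument("-v", "--video", required=True,
                    help="path to input video file")
    ap.add_argument("-l", "--label", required=True,
                    help="class label we are interested in detecting + tracking")
    ap.add_argument("-o", "--output", type=str,
                    help="path to optional output video file")
    ap.add_argument("-c", "--confidence", type=float, default=0.2,
                    help="minimum probability to filter weak detections")
    return ap

# only the four required options are supplied here
sample_argv = [
    "-p", "./mobilenet_ssd/MobileNetSSD_deploy.prototxt",
    "-m", "./mobilenet_ssd/MobileNetSSD_deploy.caffemodel",
    "-v", "./input/xxx.mp4",
    "-l", "person",
]
args = vars(build_parser().parse_args(sample_argv))

# options that were not supplied fall back to their defaults
print(args["output"])      # None
print(args["confidence"])  # 0.2
```

Since --output defaults to None, the script can later test args["output"] is not None to decide whether a video writer is needed.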




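The MobileNet SSD model has already been trained to detect all of these classes.

We run the pre-trained MobileNet SSD model on a single frame to detect the object, then hand the object's location to dlib's correlation tracker, which follows it through the remaining frames of the video.

The model supports 20 object classes (plus one background class).

Note: if you use a different Caffe model, you need to redefine the class list below; if you use the downloaded model as-is, no change is needed.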

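# initialize the list of class labels MobileNet SSD was trained to
# detect (the entries after "boat" are an assumption: the standard
# 20 PASCAL VOC classes of the MobileNetSSD_deploy model)
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
    "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
    "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
    "sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])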
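
Because --label is later compared against entries of the class list, a misspelled label would simply never match. One possible safeguard, which is not part of the original script, is to validate the label up front. The function name validate_label and the shortened class list are assumptions for this sketch.

```python
# shortened stand-in for the full class list (assumption for the demo)
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat"]

def validate_label(label, classes):
    """Return the index of label in classes, or raise a readable error."""
    if label not in classes or label == "background":
        raise ValueError(
            "unknown label {!r}; choose one of {}".format(label, classes[1:]))
    return classes.index(label)

print(validate_label("bird", CLASSES))  # 3
```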


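Before looping over the frames, we need to load the model into memory (the last line above). Loading requires only the paths to the prototxt and model files, both supplied as command-line arguments.

Next we initialize the video stream, the dlib correlation tracker, the video writer object, and the class label of the detected object.

On the last line we instantiate and start the FPS throughput estimator, so the frame rate can be reported at the end.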

# initialize the video stream, dlib correlation tracker,
# output video writer, and predicted class label
print("[INFO] starting video stream...")
vs = cv2.VideoCapture(args["video"])
tracker = None
writer = None
label = ""

# start the frames per second throughput estimator
fps = FPS().start()


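Each frame is resized to a smaller size so it can be processed faster.

The color channels are converted from BGR (the order OpenCV stores them in) to RGB, which dlib requires.

The output video path can be supplied as a command-line argument at run time, in which case a video writer is instantiated here.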

# loop over frames from the video file stream
while True:
    # grab the next frame from the video file
    (grabbed, frame) = vs.read()
    # check to see if we have reached the end of the video file
    if frame is None:
        break
    # resize the frame for faster processing and then convert the
    # frame from BGR to RGB ordering (dlib needs RGB ordering)
    frame = imutils.resize(frame, width=600)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # if we are supposed to be writing a video to disk, initialize
    # the writer
    if args["output"] is not None and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)
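
When only a width is given, imutils.resize keeps the aspect ratio, so the height is scaled by the same factor. The arithmetic can be sketched without OpenCV. Here resized_dims is a hypothetical helper, and the 1920x1080 input size is just an assumed example.

```python
def resized_dims(orig_w, orig_h, target_w=600):
    """Return (width, height) after scaling to target_w, keeping the aspect ratio."""
    ratio = target_w / float(orig_w)
    return target_w, int(orig_h * ratio)

print(resized_dims(1920, 1080))  # (600, 337)
```

The resized width and height are also the frame size handed to the video writer, which is why the writer is created only after the first frame has been resized.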


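If tracker is None, we first have to detect an object in the input frame.

We take the frame and convert it into a blob. In cv2.dnn a blob is the preprocessed array fed to the network (the scaled, mean-subtracted image in the layout the network expects), not a connected region of similar pixels as in blob detection.

We then pass the blob through the network to obtain the detections and predictions.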

    # if our correlation object tracker is None we first need to
    # apply an object detector to seed the tracker with something
    # to actually track
    if tracker is None:
        # grab the frame dimensions and convert the frame to a blob
        (h, w) = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 0.007843, (w, h), 127.5)
        # pass the blob through the network and obtain the detections
        # and predictions
        net.setInput(blob)
        detections = net.forward()
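
The two numeric arguments to blobFromImage, 0.007843 and 127.5, are a scale factor and a mean: each pixel has the mean subtracted and is then multiplied by the scale factor. A minimal sketch of that arithmetic on single pixel values (normalize_pixel is a hypothetical helper, not an OpenCV function) shows that the 0..255 range is mapped to roughly -1..1.

```python
SCALE = 0.007843  # roughly 1 / 127.5
MEAN = 127.5

def normalize_pixel(value):
    """Apply the same mean subtraction and scaling to one pixel value."""
    return (value - MEAN) * SCALE

# darkest, middle, and brightest 8-bit pixel values
for v in (0, 127.5, 255):
    print(v, round(normalize_pixel(v), 4))
```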


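If the detector finds objects, we grab the one with the highest probability.

This article only demonstrates single-object tracking with dlib, so we keep just the most probable detection.

Future articles will show how to detect and extract specific objects.

We then read the confidence and the class label associated with that object.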

        # ensure at least one detection is made
        if len(detections) > 0:
            # find the index of the detection with the largest
            # probability -- out of convenience we are only going
            # to track the first object we find with the largest
            # probability; future examples will demonstrate how to
            # detect and extract *specific* objects
            i = np.argmax(detections[0, 0, :, 2])
            # grab the probability associated with the object along with its class label
            conf = detections[0, 0, i, 2]
            label = CLASSES[int(detections[0, 0, i, 1])]
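
The indexing above can be easier to follow on a small, made-up table. The sketch below assumes that each detection row holds an image id, a class id, a confidence, and four normalized box coordinates, which matches how columns 1, 2 and 3:7 are used in this script. It picks the most confident row in plain Python.

```python
# each row: [image_id, class_id, confidence, startX, startY, endX, endY]
# (made-up values; box coordinates are normalized to the 0..1 range)
rows = [
    [0, 15, 0.42, 0.10, 0.20, 0.30, 0.60],
    [0, 8, 0.91, 0.50, 0.40, 0.80, 0.90],
    [0, 15, 0.67, 0.05, 0.05, 0.25, 0.45],
]

# pure-Python equivalent of np.argmax over the confidence column
i = max(range(len(rows)), key=lambda k: rows[k][2])
conf = rows[i][2]
class_id = int(rows[i][1])

print(i, conf, class_id)  # 1 0.91 8
```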


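First we make sure we have the right kind of object: its confidence must exceed the minimum threshold and its label must match the one requested. The examples here use person or cat, so you can see the effect of the filter.

We then compute the coordinates of the object's bounding box.

Finally we construct a dlib rectangle from the box, start the correlation tracker on it, and draw the box and label on the frame.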

        # filter out weak detections by requiring a minimum confidence
        if conf > args["confidence"] and label == args["label"]:
            # compute the (x, y)-coordinates of the bounding box for the object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # construct a dlib rectangle object from the bounding box coordinates and then start the dlib correlation tracker
            tracker = dlib.correlation_tracker()
            rect = dlib.rectangle(startX, startY, endX, endY)
            tracker.start_track(rgb, rect)
            # draw the bounding box and text for the object
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                          (0, 255, 0), 2)
            cv2.putText(frame, label, (startX, startY - 15),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
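
The multiplication by np.array([w, h, w, h]) converts the normalized box coordinates into pixels. A plain-Python sketch of the same step follows; scale_box is a hypothetical helper, and the box and frame size are made-up values.

```python
def scale_box(norm_box, w, h):
    """Convert a normalized (startX, startY, endX, endY) box to integer pixels."""
    sx, sy, ex, ey = norm_box
    return int(sx * w), int(sy * h), int(ex * w), int(ey * h)

print(scale_box((0.5, 0.25, 0.75, 0.5), w=600, h=400))  # (300, 100, 450, 200)
```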


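Otherwise, we update the tracker; the update method does the heavy lifting behind the scenes.

We then read the object's position from the tracker. A robot, for example, could use the position returned by the tracker to follow the object; in this article we only draw a box around the tracked object and label it.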

    # otherwise, we've already performed detection so let's track
    # the object
    else:
        # update the tracker and grab the position of the tracked
        # object
        tracker.update(rgb)
        pos = tracker.get_position()
        # unpack the position object
        startX = int(pos.left())
        startY = int(pos.top())
        endX = int(pos.right())
        endY = int(pos.bottom())
        # draw the bounding box from the correlation object tracker
        cv2.rectangle(frame, (startX, startY), (endX, endY),
                      (0, 255, 0), 2)
        cv2.putText(frame, label, (startX, startY - 15),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)


Write the frame to the output video, if one was requested.

Display the processed frame (with its box and label).

Update the FPS counter.

    # check to see if we should write the frame to disk
    if writer is not None:
        writer.write(frame)
    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
    # update the FPS counter
    fps.update()

Print the FPS throughput and release the resources.

# stop the timer and display FPS information
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
# check to see if we need to release the video writer pointer
if writer is not None:
    writer.release()
# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()

