TensorFlow recently released its object detection API for TensorFlow 2, which comes with a very large model zoo. However, they have only provided one MobileNet v1 SSD model with TensorFlow Lite, which is described here. In that blog post, they provide code to run it on Android and iOS devices, but not on edge devices. With the increase in popularity of edge devices and the release of spatial AI kits by OpenCV, I wanted to fill that missing gap. So that's exactly what we are going to see here.
Requirements
This does not require the installation of the TensorFlow Object Detection API to run. A simple TensorFlow installation, along with OpenCV for image processing, is enough to run it.
pip install tensorflow
pip install opencv-python
I have used some code provided in the Object Detection API to make the work easier, but there is no need to worry about it, as it does not require explicit installation, and you can find the whole code here.
The pre-trained model can be downloaded from TensorFlow's blog here, or it is provided with the code as well.
Labels
Our first step is to convert the labels into the format required later on: a nested dictionary with an id and a name inside. We will use the label map text file provided with the model and convert it to that format.
Note: the labels marked “???” need to be ignored, and their indices are simply skipped, except for the first one, which shifts every label back one place. For example, the label “person” is on the first row, but it will be assigned the id 0, and so on.
This behavior is noted in a comment in its Java implementation:
// SSD Mobilenet V1 Model assumes class 0 is background class
// in label file and class labels start from 1 to number_of_classes+1,
// while outputClasses correspond to class index from 0 to number_of_classes
This can be done with the following code:
def create_category_index(label_path='path_to/labelmap.txt'):
    f = open(label_path)
    category_index = {}
    for i, val in enumerate(f):
        if i != 0:                 # skip the first line, the "???" background class
            val = val[:-1]         # strip the trailing newline
            if val != '???':       # skip placeholder labels but keep the index gap
                category_index.update({(i-1): {'id': (i-1), 'name': val}})
    f.close()
    return category_index
Here, to ignore the first line, I am using an if statement and storing i-1. This creates a dictionary as shown below. It will have 80 entries, with keys going up to 89; the gaps correspond to the skipped “???” labels.
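A quick way to check the result is to print the first few entries. Assuming the standard COCO labelmap shipped with the model, the output should look like this:

category_index = create_category_index('path_to/labelmap.txt')
for key in list(category_index)[:3]:
    print(key, category_index[key])
# 0 {'id': 0, 'name': 'person'}
# 1 {'id': 1, 'name': 'bicycle'}
# 2 {'id': 2, 'name': 'car'}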
TfLite Interpreter
With labels done, let’s understand TfLite’s interpreter and how we can get the results.
Initialize the Interpreter
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="path/detect.tflite")
interpreter.allocate_tensors()
Just load the correct path to your tflite model and allocate the tensors.
Input and Output Details
To get input and output details, write:
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
Now, let’s study them to see what type of inputs to give and the outputs we will get.
The input details are a list with only one element, which is a dictionary as shown below.
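Printing its fields is the easiest way to inspect it. A minimal sketch of what to look for (the tensor name in the comment is what this model typically reports; verify it on your copy):

print(input_details[0]['shape'])  # e.g. [  1 300 300   3]
print(input_details[0]['dtype'])  # e.g. <class 'numpy.uint8'>
print(input_details[0]['name'])   # e.g. 'normalized_input_image_tensor'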
Here, we can see the input shape is [1, 300, 300, 3]. Other than this, it requires the input image to have a datatype of np.uint8.
The output details are a list of 4 elements, each containing a dictionary like the input details. Each invocation returns 10 results: the first tensor stores the bounding-box rectangles, the second the detection classes, the third the detection scores, and the last one the number of detections returned. The bounding boxes returned are normalized, so they can be scaled to any input dimension.
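A quick way to confirm this ordering is to print each output's name and shape. The names and shapes below are what this model typically reports (a sketch; verify on your copy):

for d in output_details:
    print(d['name'], d['shape'])
# TFLite_Detection_PostProcess    [ 1 10  4]  -> bounding boxes
# TFLite_Detection_PostProcess:1  [ 1 10]     -> class indices
# TFLite_Detection_PostProcess:2  [ 1 10]     -> scores
# TFLite_Detection_PostProcess:3  [1]         -> number of detections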
To get the outputs, we need to read the image, convert it to RGB if OpenCV is used, resize it appropriately, and invoke the interpreter after setting the tensor with the input frame. Then the required values can be retrieved using the interpreter's get_tensor function.
import cv2

img = cv2.imread('image.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_rgb = cv2.resize(img_rgb, (300, 300), cv2.INTER_AREA)
img_rgb = img_rgb.reshape([1, 300, 300, 3])

interpreter.set_tensor(input_details[0]['index'], img_rgb)
interpreter.invoke()

det_boxes = interpreter.get_tensor(output_details[0]['index'])[0]
det_classes = interpreter.get_tensor(output_details[1]['index'])[0]
det_scores = interpreter.get_tensor(output_details[2]['index'])[0]
num_det = interpreter.get_tensor(output_details[3]['index'])[0]
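Since the boxes come back normalized in [ymin, xmin, ymax, xmax] order, converting one to pixel coordinates on the original image is just a multiplication. A small sketch, assuming at least one detection was returned:

h, w, _ = img.shape
ymin, xmin, ymax, xmax = det_boxes[0]  # top-scoring detection
top_left = (int(xmin * w), int(ymin * h))
bottom_right = (int(xmax * w), int(ymax * h))
cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)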
Using Object Detection Code for Drawing
For visualization, I used the Python code available here, which can be used to draw not only bounding boxes but also keypoints and instance masks if required. We need to pass the image to draw on, the bounding boxes, the detected classes, the detection scores, and the labels dictionary. Other than that, we also set use_normalized_coordinates to True, as we receive normalized bounding-box coordinates from the interpreter.
from object_detection.utils import visualization_utils as vis_util

vis_util.visualize_boxes_and_labels_on_image_array(
    img,
    det_boxes,
    det_classes.astype(int),  # class indices must be integers to look up labels
    det_scores,
    category_index,
    use_normalized_coordinates=True,
    min_score_thresh=0.6,
    line_thickness=3)
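visualize_boxes_and_labels_on_image_array draws in place on img, so showing or saving the annotated frame afterwards is straightforward:

cv2.imshow('detections', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
# or write it to disk instead:
# cv2.imwrite('detections.jpg', img)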
Results
It will give results like those shown below.
However, there is still one problem to address: the model often returns several overlapping boxes for the same object. This can be fixed with non-maximum suppression.
Non-Maximum Suppression
I am not going to explain it, as it has already been covered in depth across various articles on the internet. One such example is this article. To implement it, I am going to use combined_non_max_suppression from TensorFlow's image module to perform this task, as it allows us to work with multiple classes at once. It takes the outputs and returns the predictions left after thresholding.
import numpy as np

def apply_nms(output_dict, iou_thresh=0.5, score_thresh=0.6):
    q = 90  # number of classes
    num = int(output_dict['num_detections'])
    boxes = np.zeros([1, num, q, 4])
    scores = np.zeros([1, num, q])
    for i in range(num):
        # Place each box and score at the index of its predicted class.
        boxes[0, i, output_dict['detection_classes'][i], :] = output_dict['detection_boxes'][i]
        scores[0, i, output_dict['detection_classes'][i]] = output_dict['detection_scores'][i]
    nmsd = tf.image.combined_non_max_suppression(
        boxes=boxes,
        scores=scores,
        max_output_size_per_class=num,
        max_total_size=num,
        iou_threshold=iou_thresh,
        score_threshold=score_thresh,
        pad_per_class=False,
        clip_boxes=False)
    valid = nmsd.valid_detections[0].numpy()
    output_dict = {
        'detection_boxes': nmsd.nmsed_boxes[0].numpy()[:valid],
        'detection_classes': nmsd.nmsed_classes[0].numpy().astype(np.int64)[:valid],
        'detection_scores': nmsd.nmsed_scores[0].numpy()[:valid],
    }
    return output_dict
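Wiring it up to the interpreter outputs we pulled earlier looks like this (a short sketch reusing det_boxes, det_classes, det_scores, and num_det from above):

output_dict = {
    'detection_boxes': det_boxes,
    'detection_classes': det_classes.astype(np.int64),  # must be integer class indices
    'detection_scores': det_scores,
    'num_detections': num_det,
}
output_dict = apply_nms(output_dict, iou_thresh=0.5, score_thresh=0.6)
print(len(output_dict['detection_scores']), 'detections survived NMS')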
The full code is given below, or you can visit my GitHub repo, which also contains visualization_utils.py and the models.
import numpy as np
import tensorflow as tf
import cv2
import visualization_utils as vis_util

def create_category_index(label_path='coco_ssd_mobilenet/labelmap.txt'):
    """
    To create dictionary of label map

    Parameters
    ----------
    label_path : string, optional
        Path to labelmap.txt. The default is 'coco_ssd_mobilenet/labelmap.txt'.

    Returns
    -------
    category_index : dict
        nested dictionary of labels.
    """
    f = open(label_path)
    category_index = {}
    for i, val in enumerate(f):
        if i != 0:
            val = val[:-1]
            if val != '???':
                category_index.update({(i-1): {'id': (i-1), 'name': val}})
    f.close()
    return category_index

def get_output_dict(image, interpreter, output_details, nms=True, iou_thresh=0.5, score_thresh=0.6):
    """
    Function to make predictions and generate dictionary of output

    Parameters
    ----------
    image : Array of uint8
        Preprocessed Image to perform prediction on
    interpreter : tensorflow.lite.python.interpreter.Interpreter
        tflite model interpreter
    output_details : list
        output details of interpreter
    nms : bool, optional
        To perform non-maximum suppression or not. The default is True.
    iou_thresh : float, optional
        Intersection Over Union Threshold. The default is 0.5.
    score_thresh : float, optional
        Score above which a predicted class is accepted. The default is 0.6.

    Returns
    -------
    output_dict : dict
        Dictionary containing bounding boxes, classes and scores.
    """
    output_dict = {
        'detection_boxes': interpreter.get_tensor(output_details[0]['index'])[0],
        'detection_classes': interpreter.get_tensor(output_details[1]['index'])[0],
        'detection_scores': interpreter.get_tensor(output_details[2]['index'])[0],
        'num_detections': interpreter.get_tensor(output_details[3]['index'])[0]
    }
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    if nms:
        output_dict = apply_nms(output_dict, iou_thresh, score_thresh)
    return output_dict

def apply_nms(output_dict, iou_thresh=0.5, score_thresh=0.6):
    """
    Function to apply non-maximum suppression on different classes

    Parameters
    ----------
    output_dict : dictionary
        dictionary containing:
            'detection_boxes' : Bounding box coordinates. Shape (N, 4)
            'detection_classes' : Class indices detected. Shape (N)
            'detection_scores' : Shape (N)
            'num_detections' : Total number of detections, i.e. N. Shape (1)
    iou_thresh : float, optional
        Intersection Over Union threshold value. The default is 0.5.
    score_thresh : float, optional
        Score threshold value below which to ignore. The default is 0.6.

    Returns
    -------
    output_dict : dictionary
        dictionary containing only predictions with score and IOU above threshold:
            'detection_boxes' : Bounding box coordinates. Shape (N2, 4)
            'detection_classes' : Class indices detected. Shape (N2)
            'detection_scores' : Shape (N2)
        where N2 is the number of valid predictions after those conditions.
    """
    q = 90  # number of classes
    num = int(output_dict['num_detections'])
    boxes = np.zeros([1, num, q, 4])
    scores = np.zeros([1, num, q])
    for i in range(num):
        boxes[0, i, output_dict['detection_classes'][i], :] = output_dict['detection_boxes'][i]
        scores[0, i, output_dict['detection_classes'][i]] = output_dict['detection_scores'][i]
    nmsd = tf.image.combined_non_max_suppression(
        boxes=boxes,
        scores=scores,
        max_output_size_per_class=num,
        max_total_size=num,
        iou_threshold=iou_thresh,
        score_threshold=score_thresh,
        pad_per_class=False,
        clip_boxes=False)
    valid = nmsd.valid_detections[0].numpy()
    output_dict = {
        'detection_boxes': nmsd.nmsed_boxes[0].numpy()[:valid],
        'detection_classes': nmsd.nmsed_classes[0].numpy().astype(np.int64)[:valid],
        'detection_scores': nmsd.nmsed_scores[0].numpy()[:valid],
    }
    return output_dict

def make_and_show_inference(img, interpreter, input_details, output_details, category_index, nms=True, score_thresh=0.6, iou_thresh=0.5):
    """
    Generate and draw inference on image

    Parameters
    ----------
    img : Array of uint8
        Original Image to find predictions on.
    interpreter : tensorflow.lite.python.interpreter.Interpreter
        tflite model interpreter
    input_details : list
        input details of interpreter
    output_details : list
        output details of interpreter
    category_index : dict
        dictionary of labels
    nms : bool, optional
        To perform non-maximum suppression or not. The default is True.
    score_thresh : float, optional
        Score above which a predicted class is accepted. The default is 0.6.
    iou_thresh : float, optional
        Intersection Over Union Threshold. The default is 0.5.

    Returns
    -------
    None
    """
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_rgb = cv2.resize(img_rgb, (300, 300), cv2.INTER_AREA)
    img_rgb = img_rgb.reshape([1, 300, 300, 3])
    interpreter.set_tensor(input_details[0]['index'], img_rgb)
    interpreter.invoke()
    output_dict = get_output_dict(img_rgb, interpreter, output_details, nms, iou_thresh, score_thresh)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        img,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=score_thresh,
        line_thickness=3)

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path="coco_ssd_mobilenet/detect.tflite")
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

category_index = create_category_index()
input_shape = input_details[0]['shape']

cap = cv2.VideoCapture(0)
while True:
    ret, img = cap.read()
    if ret:
        make_and_show_inference(img, interpreter, input_details, output_details, category_index)
        cv2.imshow("image", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break
cap.release()
cv2.destroyAllWindows()
Before ending, I would like to clear up one thing: if you try to run this on Windows with an Intel processor, you will get terrible fps. I got ~2 fps on an i5; for comparison, the same TensorFlow model without tflite gave me ~8 fps. This is explained here. However, on edge devices that won't be a problem, and its considerably smaller memory footprint will benefit their memory limitations.
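If you still want to squeeze a little more out of a desktop CPU, one knob worth trying is the interpreter's num_threads argument. This parameter is available in recent TensorFlow releases; whether it helps, and by how much, depends on your hardware, so treat this as an experiment rather than a guaranteed fix:

# Optional tweak: let the TFLite interpreter use multiple CPU threads.
interpreter = tf.lite.Interpreter(
    model_path="coco_ssd_mobilenet/detect.tflite",
    num_threads=4)
interpreter.allocate_tensors()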
Although this model is not very accurate, I hope this has given you a boilerplate to make your task easier when using an object detector with TfLite.
Originally published at: https://towardsdatascience.com/using-tensorflow-lite-for-object-detection-2a0283f94aed