Real-Time Object Detection on Jetson Nano with YOLO-V4 and TensorRT

I. Background
1. NVIDIA SoC: the Jetson Nano B01, released in 2020, is priced affordably ($99). It offers GPU support, outperforms the Raspberry Pi, and has good ecosystem compatibility. Embedded platforms like this are well suited for probing an algorithm's performance limits.

2. YOLO-V4 is the latest release in the YOLO object-detection series, improving on YOLO-V3 in both accuracy and speed; as a one-stage architecture, it delivers solid real-time inference performance. By contrast, the still-in-development YOLO-V5 has not been endorsed by the original authors and brings little algorithmic novelty; it is more of a YOLO-V4.5.

3. TensorRT is a crucial step in deploying deep-learning algorithms: by optimizing GPU inference it can multiply the achievable FPS.

II. Resources
JetPack-4.4 for NVIDIA Jetson Nano (also works on TX2 / AGX Xavier / Xavier NX, or an x86_64 PC).
The original C/C++ YOLO-V4 model in the Darknet framework:
https://github.com/AlexeyAB/darknet

TensorRT inference optimization:
https://github.com/jkjung-avt/tensorrt_demos

III. Walkthrough
1. YOLO to ONNX
Parsing the Darknet-based YOLO model:

class DarkNetParser(object):
    """Definition of a parser for DarkNet-based YOLO model."""

    def __init__(self, supported_layers):
        """Initializes a DarkNetParser object.

        Keyword argument:
        supported_layers -- a string list of supported layers in DarkNet naming convention,
        parameters are only added to the class dictionary if a parsed layer is included.
        """

        # A list of YOLO layers containing dictionaries with all layer
        # parameters:
        self.layer_configs = OrderedDict()
        self.supported_layers = supported_layers
        self.layer_counter = 0

    def parse_cfg_file(self, cfg_file_path):
        """Takes the yolov?.cfg file and parses it layer by layer,
        appending each layer's parameters as a dictionary to layer_configs.

        Keyword argument:
        cfg_file_path -- path to the yolov?.cfg file
        """
        with open(cfg_file_path, 'r') as cfg_file:
            remainder = cfg_file.read()
            while remainder is not None: # Read the cfg one layer at a time.
                layer_dict, layer_name, remainder = self._next_layer(remainder)
                if layer_dict is not None:
                    self.layer_configs[layer_name] = layer_dict # Store each layer's parameters into layer_configs as a dict.
        return self.layer_configs

    def _next_layer(self, remainder):
        """Takes in a string and segments it by looking for DarkNet delimiters.
        Returns the layer parameters and the remaining string after the last delimiter.
        Example for the first Conv layer in yolo.cfg ...

        [convolutional]
        batch_normalize=1
        filters=32
        size=3
        stride=1
        pad=1
        activation=leaky

        ... becomes the following layer_dict return value:
        {'activation': 'leaky', 'stride': 1, 'pad': 1, 'filters': 32,
        'batch_normalize': 1, 'type': 'convolutional', 'size': 3}.

        '001_convolutional' is returned as layer_name, and all lines that follow in yolo.cfg
        are returned as the next remainder.

        Keyword argument:
        remainder -- a string with all raw text after the previously parsed layer
        """
        remainder = remainder.split('[', 1) # Split into two parts at the first '['.
        if len(remainder) == 2: # ==2 means a [layer] header exists.
            remainder = remainder[1]
        else:
            return None, None, None
        remainder = remainder.split(']', 1) # Split into two parts at the first ']'.
        if len(remainder) == 2: # ==2 means content follows the [layer] header.
            layer_type, remainder = remainder
        else:
            return None, None, None
        if remainder.replace(' ', '')[0] == '#': # First character after stripping spaces is '#', i.e. a comment.
            remainder = remainder.split('\n', 1)[1] # Skip past the comment line.

        out = remainder.split('\n\n', 1) # Layers in the .cfg file are separated by a blank line (two newlines).
        if len(out) == 2: # More layers follow.
            layer_param_block, remainder = out[0], out[1]
        else:
            layer_param_block, remainder = out[0], '' # Last layer.
        if layer_type == 'yolo':
            layer_param_lines = []
        else:
            layer_param_lines = layer_param_block.split('\n')[1:] # non-yolo layers
        layer_name = str(self.layer_counter).zfill(3) + '_' + layer_type
        layer_dict = dict(type=layer_type)
        if layer_type in self.supported_layers:
            for param_line in layer_param_lines:
                if param_line[0] == '#':
                    continue
                param_type, param_value = self._parse_params(param_line)
                layer_dict[param_type] = param_value
        self.layer_counter += 1
        return layer_dict, layer_name, remainder

    def _parse_params(self, param_line): # Parse one parameter line.
        """Identifies the parameters contained in one line of the cfg file and returns
        them in the required format for each parameter type, e.g. as a list, an int or a float.

        Keyword argument:
        param_line -- one parsed line within a layer block
        """
        param_line = param_line.replace(' ', '')
        param_type, param_value_raw = param_line.split('=')
        param_value = None
        if param_type == 'layers':
            layer_indexes = list()
            for index in param_value_raw.split(','):
                layer_indexes.append(int(index))
            param_value = layer_indexes
        elif isinstance(param_value_raw, str) and not param_value_raw.isalpha():
            condition_param_value_positive = param_value_raw.isdigit()
            condition_param_value_negative = param_value_raw[0] == '-' and \
                param_value_raw[1:].isdigit()
            if condition_param_value_positive or condition_param_value_negative:
                param_value = int(param_value_raw)
            else:
                param_value = float(param_value_raw)
        else:
            param_value = str(param_value_raw)
        return param_type, param_value
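
As a quick usage sketch (the supported-layer list mirrors what the converter handles; the cfg file name is just an example):

supported_layers = ['net', 'convolutional', 'maxpool',
                    'shortcut', 'route', 'upsample', 'yolo']
parser = DarkNetParser(supported_layers)
layer_configs = parser.parse_cfg_file('yolov4-tiny-416.cfg')
print(list(layer_configs.keys())[:3])
# e.g. ['000_net', '001_convolutional', '002_convolutional']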

Building the ONNX graph:

class GraphBuilderONNX(object):
    """Class for creating an ONNX graph from a previously generated list of layer dictionaries."""

    def __init__(self, model_name, output_tensors):
        """Initialize with all DarkNet default parameters used when creating
        YOLO, and specify the output tensors as an OrderedDict with their
        names as keys and their output dimensions as values.

        Keyword argument:
        output_tensors -- an OrderedDict mapping output tensor names to their
        output dimensions
        """
        self.model_name = model_name
        self.output_tensors = output_tensors
        self._nodes = list()
        self.graph_def = None
        self.input_tensor = None
        self.epsilon_bn = 1e-5
        self.momentum_bn = 0.99
        self.alpha_lrelu = 0.1
        self.param_dict = OrderedDict()
        self.major_node_specs = list()
        self.batch_size = 1
        self.route_spec = 0  # keeping track of the current active 'route'

    def build_onnx_graph(
            self,
            layer_configs,
            weights_file_path,
            verbose=True):
        """Iterate over all layer configs (parsed from the DarkNet
        representation of YOLO), create an ONNX graph, populate it with
        weights from the weights file and return the graph definition.

        Keyword arguments:
        layer_configs -- an OrderedDict object with all parsed layers' configurations
        weights_file_path -- location of the weights file
        verbose -- toggles if the graph is printed after creation (default: True)
        """
        for layer_name in layer_configs.keys():
            layer_dict = layer_configs[layer_name]
            major_node_specs = self._make_onnx_node(layer_name, layer_dict)
            if major_node_specs.name is not None:
                self.major_node_specs.append(major_node_specs)
        # remove dummy 'route' and 'yolo' nodes
        self.major_node_specs = [node for node in self.major_node_specs
                                      if 'dummy' not in node.name]
        outputs = list()
        for tensor_name in self.output_tensors.keys():
            output_dims = [self.batch_size, ] + \
                self.output_tensors[tensor_name]
            output_tensor = helper.make_tensor_value_info(
                tensor_name, TensorProto.FLOAT, output_dims)
            outputs.append(output_tensor)
        inputs = [self.input_tensor]
        weight_loader = WeightLoader(weights_file_path)
        initializer = list()
        # If a layer has parameters, add them to the initializer and input lists.
        for layer_name in self.param_dict.keys():
            _, layer_type = layer_name.split('_', 1)
            params = self.param_dict[layer_name]
            if layer_type == 'convolutional':
                #print('%s  ' % layer_name, end='')
                initializer_layer, inputs_layer = weight_loader.load_conv_weights(
                    params)
                initializer.extend(initializer_layer)
                inputs.extend(inputs_layer)
            elif layer_type == 'upsample':
                initializer_layer, inputs_layer = weight_loader.load_upsample_scales(
                    params)
                initializer.extend(initializer_layer)
                inputs.extend(inputs_layer)
        del weight_loader
        self.graph_def = helper.make_graph(
            nodes=self._nodes,
            name=self.model_name,
            inputs=inputs,
            outputs=outputs,
            initializer=initializer
        )
        if verbose:
            print(helper.printable_graph(self.graph_def))
        model_def = helper.make_model(self.graph_def,
                                      producer_name='NVIDIA TensorRT sample')
        return model_def

    def _make_onnx_node(self, layer_name, layer_dict):
        """Take in a layer parameter dictionary, choose the correct function for
        creating an ONNX node and store the information important to graph creation
        as a MajorNodeSpec object.

        Keyword arguments:
        layer_name -- the layer's name (also the corresponding key in layer_configs)
        layer_dict -- a layer parameter dictionary (one element of layer_configs)
        """
        layer_type = layer_dict['type']
        if self.input_tensor is None:
            if layer_type == 'net':
                major_node_output_name, major_node_output_channels = self._make_input_tensor(
                    layer_name, layer_dict)
                major_node_specs = MajorNodeSpecs(major_node_output_name,
                                                  major_node_output_channels)
            else:
                raise ValueError('The first node has to be of type "net".')
        else:
            node_creators = dict()
            node_creators['convolutional'] = self._make_conv_node
            node_creators['maxpool'] = self._make_maxpool_node
            node_creators['shortcut'] = self._make_shortcut_node
            node_creators['route'] = self._make_route_node
            node_creators['upsample'] = self._make_upsample_node
            node_creators['yolo'] = self._make_yolo_node

            if layer_type in node_creators.keys():
                major_node_output_name, major_node_output_channels = \
                    node_creators[layer_type](layer_name, layer_dict)
                major_node_specs = MajorNodeSpecs(major_node_output_name,
                                                  major_node_output_channels)
            else:
                print(
                    'Layer of type %s not supported, skipping ONNX node generation.' %
                    layer_type)
                major_node_specs = MajorNodeSpecs(layer_name,
                                                  None)
        return major_node_specs

    def _make_input_tensor(self, layer_name, layer_dict):
        """Create an ONNX input tensor from a 'net' layer and store the batch size.

        Keyword arguments:
        layer_name -- the layer's name (also the corresponding key in layer_configs)
        layer_dict -- a layer parameter dictionary (one element of layer_configs)
        """
        batch_size = layer_dict['batch']
        channels = layer_dict['channels']
        height = layer_dict['height']
        width = layer_dict['width']
        self.batch_size = batch_size
        input_tensor = helper.make_tensor_value_info(
            str(layer_name), TensorProto.FLOAT, [
                batch_size, channels, height, width])
        self.input_tensor = input_tensor
        return layer_name, channels

    def _get_previous_node_specs(self, target_index=0):
        """Get a previously generated ONNX node.

        Target index can be passed for jumping to a specific index.

        Keyword arguments:
        target_index -- optional for jumping to a specific index,
                        default: 0 for the previous element, while
                        taking 'route' spec into account
        """
        if target_index == 0:
            if self.route_spec != 0:
                previous_node = self.major_node_specs[self.route_spec]
                assert 'dummy' not in previous_node.name
                self.route_spec = 0
            else:
                previous_node = self.major_node_specs[-1]
        else:
            previous_node = self.major_node_specs[target_index]
        assert previous_node.created_onnx_node
        return previous_node

    def _make_conv_node(self, layer_name, layer_dict):
        ...
    def _make_shortcut_node(self, layer_name, layer_dict):
        ...
    def _make_route_node(self, layer_name, layer_dict):
        ...
    def _make_upsample_node(self, layer_name, layer_dict):
        ...
    def _make_maxpool_node(self, layer_name, layer_dict):
        ...
    def _make_yolo_node(self, layer_name, layer_dict):
        ...
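
Continuing the parsing sketch above, the builder turns layer_configs plus the .weights file into an ONNX model, roughly as main() in yolo_to_onnx.py does. The output tensor names and shapes below are illustrative values for a hypothetical yolov4-tiny-416 (255 = 3 anchors x 85 channels for COCO); the real script derives them from the parsed cfg:

import onnx

output_tensor_dims = OrderedDict()
output_tensor_dims['030_convolutional'] = [255, 13, 13]  # hypothetical output names/shapes
output_tensor_dims['037_convolutional'] = [255, 26, 26]

builder = GraphBuilderONNX('yolov4-tiny-416', output_tensor_dims)
model_def = builder.build_onnx_graph(
    layer_configs, 'yolov4-tiny-416.weights', verbose=False)
onnx.checker.check_model(model_def)  # sanity-check the generated graph
onnx.save(model_def, 'yolov4-tiny-416.onnx')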

2. ONNX to TensorRT
Creating a TensorRT engine from the ONNX file:

def build_engine(onnx_file_path, category_num=80, verbose=False):
    """Build a TensorRT engine from an ONNX file."""
    TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE) if verbose else trt.Logger()
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28 # This determines the amount of memory available to the builder when building an optimized engine and should generally be set as high as possible.
        builder.max_batch_size = 1 # The largest batch size TensorRT may optimize for; the batch size chosen at runtime must not exceed this value.
        builder.fp16_mode = True
        #builder.strict_type_constraints = True

        # Parse model file
        print('Loading ONNX file from path {}...'.format(onnx_file_path))
        with open(onnx_file_path, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        if trt.__version__[0] >= '7':
            # The actual yolo*.onnx is generated with batch size 64.
            # Reshape input to batch size 1
            shape = list(network.get_input(0).shape)
            shape[0] = 1
            network.get_input(0).shape = shape

        print('Adding yolo_layer plugins...')
        model_name = onnx_file_path[:-5]
        network = add_yolo_plugins( # Add yolo_layer plugins to the TensorRT network.
            network, model_name, category_num, TRT_LOGGER)

        print('Building an engine.  This would take a while...')
        print('(Use "--verbose" to enable verbose logging.)')
        engine = builder.build_cuda_engine(network) # Builds an ICudaEngine from an INetworkDefinition.
        print('Completed creating engine.')
        return engine

Engine serialization: engine.serialize().
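
A minimal sketch of persisting the built engine (assuming the build_engine() above; the file name is just an example), so later runs can skip the slow build step:

engine = build_engine('yolov4-tiny-416.onnx', category_num=80)
if engine is None:
    raise SystemExit('ERROR: failed to build the engine')
with open('yolov4-tiny-416.trt', 'wb') as f:
    f.write(engine.serialize())  # bytes that Runtime.deserialize_cuda_engine() can reload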

3. TensorRT Inference
Run real-time detection on the onboard camera with: python3 trt_yolo.py --onboard 1 -m yolov4-tiny-416
trt_yolo.py:
The main function:

def main():
    args = parse_args() # Parse command-line arguments.
    if args.category_num <= 0: # Invalid number of categories.
        raise SystemExit('ERROR: bad category_num (%d)!' % args.category_num)
    if not os.path.isfile('yolo/%s.trt' % args.model): # The .trt file does not exist.
        raise SystemExit('ERROR: file (yolo/%s.trt) not found!' % args.model)

    cam = Camera(args) # Return a cam instance that reads images from the input source given in args.
    if not cam.isOpened(): # is_opened should be True after a normal start;
        raise SystemExit('ERROR: failed to open camera!') # otherwise the camera failed to open.

    cls_dict = get_cls_dict(args.category_num) # Return the class-name dictionary for the given category count.
    yolo_dim = args.model.split('-')[-1] # Extract the model input dimensions from the model name.
    if 'x' in yolo_dim: # The dimension is given as WxH (e.g. 416x256).
        dim_split = yolo_dim.split('x') # Split into w and h.
        if len(dim_split) != 2: # Error out unless there are exactly two dimensions.
            raise SystemExit('ERROR: bad yolo_dim (%s)!' % yolo_dim)
        w, h = int(dim_split[0]), int(dim_split[1]) # Parse w and h.
    else: # Square input: W and H are identical.
        h = w = int(yolo_dim)
    if h % 32 != 0 or w % 32 != 0: # Both h and w must be multiples of 32.
        raise SystemExit('ERROR: bad yolo_dim (%s)!' % yolo_dim)

    trt_yolo = TrtYOLO(args.model, (h, w), args.category_num) # Wrap everything needed to run TensorRT and return a TRT YOLO instance.

    open_window(
        WINDOW_NAME, 'Camera TensorRT YOLO Demo',
        cam.img_width, cam.img_height) # Open and name the display window and set its size (default 640x480).
    vis = BBoxVisualization(cls_dict) # Draws the bounding boxes.
    loop_and_detect(cam, trt_yolo, conf_th=0.3, vis=vis) # Continuously capture images and run detection.

    cam.release() # Stop; sets thread_running and is_opened to False.
    cv2.destroyAllWindows() # Close all OpenCV windows.

The real-time detection loop:

def loop_and_detect(cam, trt_yolo, conf_th, vis):
    """Continuously capture images from camera and do object detection.

    # Arguments
      cam: the camera instance (video source).
      trt_yolo: the TRT YOLO object detector instance.
      conf_th: confidence/score threshold for object detection.
      vis: for visualization.
    """
    full_scrn = False # Not fullscreen by default.
    fps = 0.0 # Frame rate.
    tic = time.time() # Timestamp of the current time.
    while True:
        if cv2.getWindowProperty(WINDOW_NAME, 0) < 0: # Check the window property; exit once the window is closed.
            break
        img = cam.read() # Read one frame from the camera.
        if img is None: # Camera ran out of images or hit an error.
            break
        boxes, confs, clss = trt_yolo.detect(img, conf_th) # Detect objects, including preprocessing and postprocessing.
        img = vis.draw_bboxes(img, boxes, confs, clss)
        img = show_fps(img, fps) # Draw the FPS number at the top-left corner of the image.
        cv2.imshow(WINDOW_NAME, img) # Display the frame.
        toc = time.time() # Timestamp after processing this frame.
        curr_fps = 1.0 / (toc - tic) # Instantaneous frame rate.
        # calculate an exponentially decaying average of fps number
        fps = curr_fps if fps == 0.0 else (fps*0.95 + curr_fps*0.05)
        tic = toc
        key = cv2.waitKey(1) # Poll for a key press.
        if key == 27:  # ESC key: quit program
            break
        elif key == ord('F') or key == ord('f'):  # Toggle fullscreen
            full_scrn = not full_scrn
            set_display(WINDOW_NAME, full_scrn) # Switch fullscreen display on or off.

TrtYOLO inference:
Serialization and deserialization

Serializing means converting the engine into a storable format that can later be used for inference; at inference time you only need to deserialize the stored engine. The point of doing this is that building an engine is time-consuming, so saving a built engine and reloading it later greatly shortens the setup time before inference.

Host & device

The CUDA memory model divides memory into two systems: device and host.

Registers: registers are the fastest memory on the GPU, and automatic variables declared in a kernel without any special qualifier live in registers. Registers are a scarce resource, and register variables are private to each thread; once a thread finishes executing, its register variables become invalid. If a kernel needs more registers than the hardware allows, the excess is backed by local memory instead, known as register spilling, which should be avoided as much as possible.

Global, constant, and texture memory share the same lifetime; constant and texture memory are read-only.
[Figure 1: the CUDA memory hierarchy]

In GPU programming we inevitably have to move data from the CPU to the GPU, and this step is relatively expensive. Host memory allocations are pageable by default (and thus subject to page faults), and the GPU cannot fetch data directly from pageable host memory. So when a transfer from pageable host memory to device memory starts, the CUDA driver first allocates a temporary page-locked ("pinned") host array, stages the data there, and then transfers it from that pinned array to device memory.
[Figure 2: pageable vs. pinned host-to-device transfer]
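
This is why the demo allocates its host buffers as pinned memory up front. Below is a minimal sketch of the allocate_buffers step, written against the pycuda and TensorRT Python APIs in the style of NVIDIA's samples (the demo's actual version also takes the yolo grid sizes into account; treat this as an illustration):

import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

class HostDeviceMem(object):
    """Pairs a page-locked host buffer with its device buffer."""
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

def allocate_buffers(engine):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding))
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)  # pinned host memory
        device_mem = cuda.mem_alloc(host_mem.nbytes)   # device memory
        bindings.append(int(device_mem))               # device address, passed to execute_async_v2
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream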

Accelerating YOLO with TensorRT

class TrtYOLO(object):
    """TrtYOLO class encapsulates things needed to run TRT YOLO."""

    def _load_engine(self):
        TRTbin = 'yolo/%s.trt' % self.model # File name of the serialized engine for this model.
        with open(TRTbin, 'rb') as f, trt.Runtime(self.trt_logger) as runtime: # Read the engine from the .trt file and deserialize it for inference (a Runtime object is required).
            return runtime.deserialize_cuda_engine(f.read()) # Return the deserialized result: an optimized ICudaEngine for executing inference on a built network.

    def __init__(self, model, input_shape, category_num=80, cuda_ctx=None):
        """Initialize TensorRT plugins, engine and context."""
        self.model = model
        self.input_shape = input_shape
        self.category_num = category_num
        self.cuda_ctx = cuda_ctx # By default a CUDA context is only accessible from the CPU thread that created it; any other thread must push/pop it. Once popped from its creating thread, the context can be pushed onto any other CPU thread's current-context stack, and subsequent CUDA calls in that thread will refer to it.
        if self.cuda_ctx:
            self.cuda_ctx.push()

        self.inference_fn = do_inference if trt.__version__[0] < '7' \
                                         else do_inference_v2 # Pick the inference function for the TensorRT version.
        self.trt_logger = trt.Logger(trt.Logger.INFO) # Logger at INFO severity: reports informational messages as well as warnings and errors.
        self.engine = self._load_engine() # Load the TRT engine and deserialize it.

        try: # Create an execution context to hold intermediate activations; the engine holds the network definition and trained parameters, so additional space is needed.
            self.context = self.engine.create_execution_context() # create_execution_context is a closed-source method on ICudaEngine that creates an IExecutionContext object.
            grid_sizes = get_yolo_grid_sizes(
                self.model, self.input_shape[0], self.input_shape[1]) # Get the yolo grid sizes; tiny models have only two scales, 1/32 and 1/16.
            self.inputs, self.outputs, self.bindings, self.stream = \
                allocate_buffers(self.engine, grid_sizes) # Allocate host and device buffers for the inputs and outputs.
        except Exception as e:
            raise RuntimeError('fail to allocate CUDA resources') from e
        finally:
            if self.cuda_ctx:
                self.cuda_ctx.pop()

    def __del__(self):
        """Free CUDA memories."""
        del self.outputs
        del self.inputs
        del self.stream

    def detect(self, img, conf_th=0.3):
        """Detect objects in the input image."""
        img_resized = _preprocess_yolo(img, self.input_shape) # Preprocess: the raw image (numpy array) is converted from uint8 (h, w, 3) to float32 (3, H, W) according to input_shape.

        # Set host input to the image. The do_inference() function
        # will copy the input to the GPU before executing.
        self.inputs[0].host = np.ascontiguousarray(img_resized) # Convert the image to a contiguous array (faster to copy) and assign it as the input (sets the host side, host_mem, of inputs[0]).
        if self.cuda_ctx: # None by default.
            self.cuda_ctx.push()
        trt_outputs = self.inference_fn( # Run inference; on TensorRT 7+ this calls do_inference_v2.
            context=self.context, # context is the object that executes inference, created at init time via engine.create_execution_context().
            bindings=self.bindings, # bindings holds the int device addresses of each input/output buffer.
            inputs=self.inputs, # A list of HostDeviceMem objects; e.g. inputs[0] was assigned the preprocessed image in the step above.
            outputs=self.outputs, # A list of HostDeviceMem objects, zero-filled until inference runs; one HostDeviceMem per yolo output scale.
            stream=self.stream) # stream was created by cuda.Stream() in allocate_buffers; it comes from pycuda rather than TensorRT and is an indispensable part of using CUDA.
        if self.cuda_ctx:
            self.cuda_ctx.pop()

        boxes, scores, classes = _postprocess_yolo(
            trt_outputs, img.shape[1], img.shape[0], conf_th) # Postprocess. trt_outputs: a list of 2 or 3 tensors, where each tensor contains a multiple of 7 float32 numbers in the order [x, y, w, h, box_confidence, class_id, class_prob].

        # clip x1, y1, x2, y2 within original image
        boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, img.shape[1]-1)
        boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, img.shape[0]-1)
        return boxes, scores, classes

Running the low-level GPU inference

def do_inference_v2(context, bindings, inputs, outputs, stream):
    """do_inference_v2 (for TensorRT 7.0+)

    This function is generalized for multiple inputs/outputs for full
    dimension networks.
    Inputs and outputs are expected to be lists of HostDeviceMem objects.
    """
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs] # Copy the input data from host memory to device memory (hand it to the GPU); the elements of inputs are exactly the HostDeviceMem type this call expects.
    # Run inference.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle) # Execute inference on the GPU, asynchronously.
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs] # Copy the finished results from the device (GPU) back into host memory.
    # Synchronize the stream
    stream.synchronize() # Wait for the async operations on the stream to complete.
    # Return only the host outputs.
    return [out.host for out in outputs] # Take the host-side data out of the HostDeviceMem outputs and return it as a list.

IV. Results
Setup: new Jetson Nano B01 (with a 15 W power supply), pretrained yolov4-tiny-416 (FP16), Raspberry Pi IMX219 camera (77° field of view), 64 GB SD card, JetPack-4.4, TensorRT 7.x+, onnx==1.4.1.

FPS: 26.5, fast enough for real-time detection from the camera.

Shortcomings: the board runs hot and needs a fan, and the camera resolution is limited.

[On bounding box prediction]
The network's final output is a feature map of S×S grid cells (YOLO-V4 has 3 scales: 19×19 / 38×38 / 76×76); each cell produces 3 bounding boxes, and each bounding box's output consists of the box center coordinates (x, y), the box width and height (w, h), a confidence score (C_ij), and one probability per class.
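
In tensor terms (the standard YOLO formulation), each scale's output therefore has

$$ S \times S \times \bigl[\, 3 \times (4 + 1 + N_{\text{classes}}) \,\bigr] $$

values: 4 box coordinates, 1 confidence, and $N_{\text{classes}}$ class probabilities per anchor. For COCO's 80 classes that is $3 \times 85 = 255$ channels per grid cell.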

  1. Anchor boxes
    bounding box: k-means clustering over all ground-truth boxes in the training set yields the widths and heights of the 9 most common boxes; baking these statistical priors into the model helps it converge quickly. In YOLO-V3/4, each grid cell has three bounding boxes responsible for predicting objects, each corresponding to a different anchor box and ground truth. During training, the bounding box whose anchor box has the largest IOU with the ground-truth box is the one chosen to make the prediction. (A clustering sketch follows below.)
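
A minimal sketch of that k-means step, assuming boxes is an (N, 2) numpy array of ground-truth widths and heights and using the usual 1 − IOU distance (illustrative only; Darknet's own implementation differs in details):

import numpy as np

def iou_wh(boxes, anchors):
    """IOU between (N, 2) boxes and (k, 2) anchors, all centered at the origin."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, None].prod(2) + anchors[None, :].prod(2) - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100):
    boxes = np.asarray(boxes, dtype=np.float32)
    anchors = boxes[np.random.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest anchor under 1 - IOU distance
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)  # recenter each cluster
    return anchors[np.argsort(anchors.prod(1))]  # sorted by area, small to large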

Width and height: the bounding box does not predict the actual box width and height (w, h) directly; instead, the predicted values are exponentiated and then multiplied by the anchor box's width and height to give the transformed prediction (see the decode equations after the training note below). After repeated penalties during training, each bounding box learns what scale of box it is supposed to predict.

Center coordinates: the coordinates the bounding box predicts directly are relative to the grid cell. A sigmoid squashes (x, y) into the interval [0, 1], which guarantees that the box center always falls inside its grid cell; the cell's top-left corner coordinates then convert this into the final absolute coordinates.

Training: the inverse operation; the ground-truth box parameters are transformed into the same output domain as the bounding box predictions before the error is computed.
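
Written out as the standard YOLO decode equations (raw predictions $t_x, t_y, t_w, t_h$, grid-cell offset $(c_x, c_y)$, anchor dimensions $p_w, p_h$):

$$ b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w\, e^{t_w}, \qquad b_h = p_h\, e^{t_h} $$

Training inverts them, e.g. $t_w = \ln(w_{gt} / p_w)$, so the loss is computed in the prediction domain.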

  2. Confidence
    The confidence C_ij of the j-th bounding box in the i-th grid cell depends on whether the current box contains an object (rather than background) and on whether the anchor box this bounding box relies on has the largest IOU with the object's ground-truth box among all 9 anchor boxes. When both conditions are met the confidence is 1; otherwise it is 0.

  3. Conditional class probabilities
    An array of probabilities whose length equals the number of classes the current model detects. When a bounding box decides its box contains an object, it produces a probability for the object being each class. These are probabilities conditioned on the confidence (i.e., on the box not being background); each class probability is computed independently with a logistic (sigmoid) function, so the classes need not be mutually exclusive, and one object can be predicted as multiple classes. (The final score is given below.)
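
So the final per-class detection score, in the usual YOLO-V3/V4 formulation, is the product

$$ \text{score}^{c}_{ij} = C_{ij} \cdot \sigma(l^{c}_{ij}) $$

where $l^{c}_{ij}$ is the raw logit of class $c$; a detection is kept when this score exceeds the threshold (conf_th = 0.3 in loop_and_detect above).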
