yolov8-pose keypoint inference pipeline

Table of Contents

1. Keypoint Prediction

2. Image Preprocessing

3. Inference

4. Postprocessing and Visualization

4.1 Postprocessing

4.2 Keypoint Visualization

5. Complete PyTorch Code



1. Keypoint Prediction

Note: this article only walks through the inference pipeline; the TensorRT implementation will follow in a later update.

The yolov8-pose TensorRT deployment code will be posted later, still in this repository: GitHub - FeiYull/TensorRT-Alpha: TensorRT-Alpha supports YOLOv8, YOLOv7, YOLOv6, YOLOv5, YOLOv4, v3, YOLOX, YOLOR... CUDA IS ALL YOU NEED. It also supports end2end CUDA C acceleration and multi-batch inference.

You can also follow the TensorRT tutorial series on CSDN.

Here is the official prediction code:

from ultralytics import YOLO
model = YOLO(model='yolov8n-pose.pt')
model.predict(source="d:/Data/1.jpg", save=True)

The inference process boils down to three key steps: image preprocessing -> inference -> postprocessing + visualization. They live around line 247 of D:\CodePython\ultralytics\ultralytics\engine\predictor.py, as shown below:

# Preprocess
with profilers[0]:
	im = self.preprocess(im0s)  # image preprocessing

# Inference
with profilers[1]:
	preds = self.inference(im, *args, **kwargs)  # inference

# Postprocess
with profilers[2]:
	self.results = self.postprocess(preds, im, im0s)  # postprocessing

2. Image Preprocessing

Stepping into self.preprocess with the debugger, the implementation looks like the code below. The pipeline is roughly: letterbox padding (to satisfy rectangular inference), channel conversion from BGR to RGB, a check that the image data is contiguous in memory, a layout change from HWC to CHW, and finally normalization. Note that in the original PyTorch pipeline the image is resized and padded to H x W where H and W are multiples of 32, whereas when exporting to TensorRT, H and W are fixed at 640x640 for efficient inference.

def preprocess(self, im):
	"""Prepares input image before inference.

	Args:
		im (torch.Tensor | List(np.ndarray)): BCHW for tensor, [(HWC) x B] for list.
	"""
	not_tensor = not isinstance(im, torch.Tensor)
	if not_tensor:
		im = np.stack(self.pre_transform(im))
		im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
		im = np.ascontiguousarray(im)  # contiguous
		im = torch.from_numpy(im)

	img = im.to(self.device)
	img = img.half() if self.model.fp16 else img.float()  # uint8 to fp16/32
	if not_tensor:
		img /= 255  # 0 - 255 to 0.0 - 1.0
	return img
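
For intuition, here is a minimal, self-contained sketch of the letterbox step described above (resize preserving aspect ratio, then pad each side so H and W become multiples of the stride). It is an illustrative approximation, not the exact LetterBox implementation in ultralytics:

import cv2
import numpy as np

def letterbox_sketch(img, new_shape=640, stride=32, pad_value=114):
	"""Resize keeping aspect ratio, then pad H/W up to a multiple of `stride`."""
	h, w = img.shape[:2]
	r = min(new_shape / h, new_shape / w)  # scale ratio
	new_h, new_w = round(h * r), round(w * r)  # size after resize, before padding
	img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
	# pad so the final H and W are multiples of `stride` (rectangular inference)
	pad_h = (stride - new_h % stride) % stride
	pad_w = (stride - new_w % stride) % stride
	top, bottom = pad_h // 2, pad_h - pad_h // 2
	left, right = pad_w // 2, pad_w - pad_w // 2
	return cv2.copyMakeBorder(img, top, bottom, left, right,
							  cv2.BORDER_CONSTANT, value=(pad_value,) * 3)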

3. Inference

After image preprocessing, you can run inference directly; here it is PyTorch-based inference.

def inference(self, im, *args, **kwargs):
	visualize = increment_path(self.save_dir / Path(self.batch[0][0]).stem,
							   mkdir=True) if self.args.visualize and (not self.source_type.tensor) else False
	return self.model(im, augment=self.args.augment, visualize=visualize)
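
As a quick sanity check, you can inspect the raw output shape directly. This is a minimal sketch, assuming a loaded pose model `model` and a preprocessed tensor `img` as in the complete code of Section 5, and that the model in eval mode returns a tuple whose first element is the prediction tensor:

import torch

with torch.no_grad():
	preds = model(img)
	pred = preds[0] if isinstance(preds, (list, tuple)) else preds
	print(pred.shape)  # expected: torch.Size([1, 56, 8400]) for yolov8n-pose at 640x640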

4. Postprocessing and Visualization

4.1 Postprocessing

The network's output feature map has shape 56x8400, where:

  • 8400 is the number of candidate detections,
  • 56 = xywh + conf + 17 * points, where each point is 3 values (x, y, conf), i.e. the keypoint coordinates and the keypoint confidence (see the sketch after this list).
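
A minimal sketch of how those 56 channels break down, assuming a raw output tensor of shape (1, 56, 8400):

import torch

pred = torch.randn(1, 56, 8400)  # stand-in for the raw pose head output
pred = pred.permute(0, 2, 1)  # (1, 8400, 56): one row per candidate
boxes_xywh = pred[..., :4]  # box center/size in the letterboxed input coordinates
scores = pred[..., 4]  # person confidence (single class)
kpts = pred[..., 5:].reshape(1, 8400, 17, 3)  # 17 keypoints x (x, y, conf)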

Although each row of the output feature map contains both a bbox and keypoints, NMS still operates only on the bboxes. The code below runs NMS and then rescales the bbox and keypoint coordinates of the surviving detections back to the original image coordinate system.

def postprocess(self, preds, img, orig_imgs):
	"""Return detection results for a given input image or list of images."""
	preds = ops.non_max_suppression(preds,
									self.args.conf,
									self.args.iou,
									agnostic=self.args.agnostic_nms,
									max_det=self.args.max_det,
									classes=self.args.classes,
									nc=len(self.model.names))

	results = []
	for i, pred in enumerate(preds):
		orig_img = orig_imgs[i] if isinstance(orig_imgs, list) else orig_imgs
		shape = orig_img.shape
		pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], shape).round()
		pred_kpts = pred[:, 6:].view(len(pred), *self.model.kpt_shape) if len(pred) else pred[:, 6:]
		pred_kpts = ops.scale_coords(img.shape[2:], pred_kpts, shape)
		path = self.batch[0]
		img_path = path[i] if isinstance(path, list) else path
		results.append(
			Results(orig_img=orig_img,
					path=img_path,
					names=self.model.names,
					boxes=pred[:, :6],
					keypoints=pred_kpts))
	return results
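
For intuition, the rescaling done by ops.scale_boxes / ops.scale_coords is essentially the inverse of the letterbox transform. A simplified sketch (ignoring clipping and other edge cases handled by the real functions):

def scale_coords_sketch(input_shape, coords, orig_shape):
	"""Map (x, y) coords from the letterboxed input back to the original image.

	input_shape: (h, w) of the network input, e.g. (640, 640)
	orig_shape: (h, w, c) of the original image
	"""
	gain = min(input_shape[0] / orig_shape[0], input_shape[1] / orig_shape[1])
	pad_x = (input_shape[1] - orig_shape[1] * gain) / 2  # horizontal padding
	pad_y = (input_shape[0] - orig_shape[0] * gain) / 2  # vertical padding
	coords[..., 0] = (coords[..., 0] - pad_x) / gain
	coords[..., 1] = (coords[..., 1] - pad_y) / gain
	return coords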

4.2 Keypoint Visualization

There is not much to say about bbox visualization, so let's look at drawing the 17 keypoints, around line 171 of D:\CodePython\ultralytics_fire_smoke\ultralytics\utils\plotting.py. Note that the keypoints must be drawn in the predefined order, and a keypoint is only drawn when its confidence is high enough (above 0.5 in the code below).

def kpts(self, kpts, shape=(640, 640), radius=5, kpt_line=True):
	"""
	Plot keypoints on the image.

	Args:
		kpts (tensor): Predicted keypoints with shape [17, 3]. Each keypoint has (x, y, confidence).
		shape (tuple): Image shape as a tuple (h, w), where h is the height and w is the width.
		radius (int, optional): Radius of the drawn keypoints. Default is 5.
		kpt_line (bool, optional): If True, the function will draw lines connecting keypoints
								   for human pose. Default is True.

	Note: `kpt_line=True` currently only supports human pose plotting.
	"""
	if self.pil:
		# Convert to numpy first
		self.im = np.asarray(self.im).copy()
	nkpt, ndim = kpts.shape
	is_pose = nkpt == 17 and ndim == 3
	kpt_line &= is_pose  # `kpt_line=True` for now only supports human pose plotting
	# draw keypoints
	for i, k in enumerate(kpts):
		color_k = [int(x) for x in self.kpt_color[i]] if is_pose else colors(i)
		x_coord, y_coord = k[0], k[1]
		if x_coord % shape[1] != 0 and y_coord % shape[0] != 0:
			if len(k) == 3:
				conf = k[2]
				if conf < 0.5:
					continue
			cv2.circle(self.im, (int(x_coord), int(y_coord)), radius, color_k, -1, lineType=cv2.LINE_AA)
	# draw skeleton lines
	if kpt_line:
		ndim = kpts.shape[-1]
		for i, sk in enumerate(self.skeleton):
			pos1 = (int(kpts[(sk[0] - 1), 0]), int(kpts[(sk[0] - 1), 1]))
			pos2 = (int(kpts[(sk[1] - 1), 0]), int(kpts[(sk[1] - 1), 1]))
			if ndim == 3:
				conf1 = kpts[(sk[0] - 1), 2]
				conf2 = kpts[(sk[1] - 1), 2]
				if conf1 < 0.5 or conf2 < 0.5:
					continue
			if pos1[0] % shape[1] == 0 or pos1[1] % shape[0] == 0 or pos1[0] < 0 or pos1[1] < 0:
				continue
			if pos2[0] % shape[1] == 0 or pos2[1] % shape[0] == 0 or pos2[0] < 0 or pos2[1] < 0:
				continue
			cv2.line(self.im, pos1, pos2, [int(x) for x in self.limb_color[i]], thickness=2, lineType=cv2.LINE_AA)
	if self.pil:
		# Convert im back to PIL and update draw
		self.fromarray(self.im)
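
A minimal usage sketch of this method through the Annotator class (the image path and the random keypoint tensor are illustrative stand-ins; in practice you would pass keypoints produced by the postprocessing step):

import cv2
import torch
from ultralytics.utils.plotting import Annotator

im = cv2.imread('d:/Data/1.jpg')  # illustrative path
annotator = Annotator(im)
# stand-in [17, 3] tensor of (x, y, conf) scaled into the image
kpts = torch.rand(17, 3) * torch.tensor([im.shape[1], im.shape[0], 1.0])
annotator.kpts(kpts, shape=im.shape[:2])
cv2.imshow('kpts', annotator.result())
cv2.waitKey(0)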

Here is a diagram of the keypoint order:

[Figure 1: yolov8-pose keypoint order diagram]
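
For reference, the 17 keypoints follow the standard COCO ordering:

# Standard COCO keypoint order (indices 0-16), as used by yolov8-pose
COCO_KEYPOINTS = [
	'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
	'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
	'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
	'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]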

5. Complete PyTorch Code

Merging the steps above with a few modifications, the complete code is as follows:

import torch
import cv2 as cv
import numpy as np
from ultralytics.data.augment import LetterBox
from ultralytics.utils import ops
from ultralytics.engine.results import Results
import copy

# path = 'd:/Data/1.jpg'
path = 'd:/Data/6406402.jpg'
device = 'cuda:0'
conf = 0.25
iou = 0.7

# preprocess
im = cv.imread(path)
# letterbox
im = [im]
orig_imgs = copy.deepcopy(im)
im = [LetterBox([640, 640], auto=True, stride=32)(image=x) for x in im]
im = im[0][None] # im = np.stack(im)
im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
im = np.ascontiguousarray(im)  # contiguous
im = torch.from_numpy(im)
img = im.to(device)
img = img.float()
img /= 255
# load model pt
ckpt = torch.load('yolov8n-pose.pt', map_location='cpu')
model = ckpt['model'].to(device).float()  # FP32 model
model.eval()

# inference
preds = model(img)
prediction = ops.non_max_suppression(preds, conf, iou, agnostic=False, max_det=300, classes=None, nc=len(model.names))

results = []
for i, pred in enumerate(prediction):
    orig_img = orig_imgs[i] if isinstance(orig_imgs, list) else orig_imgs
    shape = orig_img.shape
    pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], shape).round()
    pred_kpts = pred[:, 6:].view(len(pred), *model.kpt_shape) if len(pred) else pred[:, 6:]
    pred_kpts = ops.scale_coords(img.shape[2:], pred_kpts, shape)
   
    img_path = path
    results.append(
        Results(orig_img=orig_img,
                path=img_path,
                names=model.names,
                boxes=pred[:, :6],
                keypoints=pred_kpts))

# show
plot_args = {'line_width': None, 'boxes': True, 'conf': True, 'labels': True}
plot_args['im_gpu'] = img[0]
result = results[0]
plotted_img = result.plot(**plot_args)
cv.imshow('plotted_img', plotted_img)
cv.waitKey(0)
cv.destroyAllWindows()
