RetinaFace PyTorch: training and testing, plus PyTorch-to-ONNX-to-ncnn conversion for C++ inference

First of all, thanks to the authors for open-sourcing their work:
RetinaFace PyTorch: https://github.com/biubug6/Pytorch_Retinaface
PyTorch-to-ONNX-to-ncnn C++ inference: https://github.com/biubug6/Face-Detector-1MB-with-landmark

RetinaFace PyTorch

Training and evaluation can be done directly from the GitHub project. It provides RetinaFace (with ResNet and MobileNet backbones) plus Slim and RFB network structures (version-slim: a simplified backbone, slightly faster; version-RFB: with a modified RFB module, higher precision). The basic idea is similar to YOLOv3: boxes are predicted on feature maps at multiple scales.
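To make the multi-scale idea concrete, here is a minimal sketch of SSD-style prior (anchor) generation over feature maps of several strides. The steps and min_sizes values below are illustrative defaults, not necessarily the exact ones in the repo's data/config.py:

from itertools import product
from math import ceil

def make_priors(im_height, im_width,
                steps=(8, 16, 32),
                min_sizes=((16, 32), (64, 128), (256, 512))):
    # One prior per (cell, size): (cx, cy, w, h), normalized to [0, 1].
    priors = []
    for step, sizes in zip(steps, min_sizes):
        fm_h, fm_w = ceil(im_height / step), ceil(im_width / step)
        for i, j in product(range(fm_h), range(fm_w)):
            for s in sizes:
                cx = (j + 0.5) * step / im_width
                cy = (i + 0.5) * step / im_height
                priors.append((cx, cy, s / im_width, s / im_height))
    return priors

The network then predicts, for each prior, class scores plus box and landmark offsets relative to that prior.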
Below is inference code for single images and for video; on a GTX 1050, a forward pass takes around 30 ms.

import os
import sys
import argparse
import torch
import torch.backends.cudnn as cudnn
import numpy as np
from data.config import cfg_mnet, cfg_re50
from layers.functions.prior_box import PriorBox
from utils.nms.py_cpu_nms import py_cpu_nms
import cv2
from models.retinaface import RetinaFace
from utils.box_utils import decode
import glob
import time


# Select the inference device; fall back to CPU when CUDA is unavailable
# or when force_cpu is set.
force_cpu = False
if force_cpu or not torch.cuda.is_available():
    device = torch.device('cpu')
else:
    device = torch.device('cuda')

parser = argparse.ArgumentParser(description='Retinaface')
parser.add_argument('-m', '--trained_model', default='./weights/Resnet50_Final.pth',
                    type=str, help='Trained state_dict file path to open')
parser.add_argument('--origin_size', default=True, type=str,
                    help='Whether use origin image size to evaluate')
parser.add_argument('--img_folder', default='./images/',
                    type=str, help='dataset path')
parser.add_argument('--confidence_threshold', default=0.02,
                    type=float, help='confidence_threshold')
parser.add_argument('--top_k', default=5000, type=int, help='top_k')
parser.add_argument('--nms_threshold', default=0.3,
                    type=float, help='nms_threshold')
parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k')
parser.add_argument('-s', '--show_image', action="store_true",
                    default=True, help='show detection results')
parser.add_argument('--vis_thres', default=0.3, type=float,
                    help='visualization_threshold')
args = parser.parse_args()


def check_keys(model, pretrained_state_dict):
    ckpt_keys = set(pretrained_state_dict.keys())
    model_keys = set(model.state_dict().keys())
    used_pretrained_keys = model_keys & ckpt_keys
    unused_pretrained_keys = ckpt_keys - model_keys
    missing_keys = model_keys - ckpt_keys
    print('Missing keys:{}'.format(len(missing_keys)))
    print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
    print('Used keys:{}'.format(len(used_pretrained_keys)))
    assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
    return True


def remove_prefix(state_dict, prefix):
    ''' Old style model is stored with all names of parameters sharing common prefix 'module.' '''
    print('remove prefix \'{}\''.format(prefix))
    def f(x): return x.split(prefix, 1)[-1] if x.startswith(prefix) else x
    return {f(key): value for key, value in state_dict.items()}


def load_model(model, pretrained_path):
    print('Loading pretrained model from {}'.format(pretrained_path))
    pretrained_dict = torch.load(pretrained_path, map_location=device)
    if "state_dict" in pretrained_dict.keys():
        pretrained_dict = remove_prefix(
            pretrained_dict['state_dict'], 'module.')
    else:
        pretrained_dict = remove_prefix(pretrained_dict, 'module.')
    check_keys(model, pretrained_dict)
    model.load_state_dict(pretrained_dict, strict=False)
    return model

def detect_vis(net, img_raw):
    # Runs detection on a single frame and draws the results on img_raw in place.
    img = np.float32(img_raw)
    # testing scale
    target_size = 1600
    max_size = 2150

    im_shape = img.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    resize = float(target_size) / float(im_size_min)
    # prevent bigger axis from being more than max_size:
    if np.round(resize * im_size_max) > max_size:
        resize = float(max_size) / float(im_size_max)
    if args.origin_size:
        resize = 1

    if resize != 1:
        img = cv2.resize(img, None, None, fx=resize,
                         fy=resize, interpolation=cv2.INTER_LINEAR)
    im_height, im_width, _ = img.shape
    scale = torch.Tensor(
        [img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
    img -= (104, 117, 123)
    img = img.transpose(2, 0, 1)
    img = torch.from_numpy(img).unsqueeze(0)
    img = img.to(device)
    scale = scale.to(device)

    tic = time.time()
    loc, conf, landms = net(img)  # forward pass
    print('net forward time: {}'.format(time.time() - tic))
    # Priors depend only on the input size; for a fixed camera resolution
    # they could be computed once and reused across frames.
    priorbox = PriorBox(cfg_re50, image_size=(im_height, im_width))
    priors = priorbox.forward()
    priors = priors.to(device)
    prior_data = priors.data
    boxes = decode(loc.data.squeeze(0), prior_data, cfg_re50['variance'])
    boxes = boxes * scale / resize
    boxes = boxes.cpu().numpy()
    scores = conf.squeeze(0).data.cpu().numpy()[:, 1]

    # ignore low scores
    inds = np.where(scores > args.confidence_threshold)[0]
    boxes = boxes[inds]
    scores = scores[inds]

    # keep top-K before NMS
    order = scores.argsort()[::-1][:args.top_k]
    boxes = boxes[order]
    scores = scores[order]

    # do NMS
    dets = np.hstack((boxes, scores[:, np.newaxis])).astype(
        np.float32, copy=False)
    keep = py_cpu_nms(dets, args.nms_threshold)
    dets = dets[keep, :]

    # keep top-K faster NMS
    dets = dets[:args.keep_top_k, :]
    # show image
    if args.show_image:
        for b in dets:
            if b[4] < args.vis_thres:
                continue
            text = "{:.4f}".format(b[4])
            b = list(map(int, b))
            cv2.rectangle(img_raw, (b[0], b[1]),
                          (b[2], b[3]), (0, 0, 255), 2)
            cx = b[0]
            cy = b[1] + 12
            cv2.putText(img_raw, text, (cx, cy),
                        cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255))
        cv2.imshow("res", img_raw)

def test_video(net):
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, img = cap.read()
        if not ret:
            # Frame grab failed (camera unplugged or stream ended).
            break
        detect_vis(net, img)
        k = cv2.waitKey(1)
        if k == ord('a') or k == ord('A'):
            cv2.imwrite('test.jpg', img)
        if k == ord('q') or k == ord('Q'):
            cap.release()
    cv2.destroyAllWindows()

def test_pic(net):
    img_folder = args.img_folder
    all_imgs = glob.glob(os.path.join(img_folder, '*.jpg'))

    # testing begin
    for i, img_f in enumerate(all_imgs):
        img_raw = cv2.imread(img_f, cv2.IMREAD_COLOR)
        img = np.float32(img_raw)

        # testing scale
        target_size = 1600
        max_size = 2150

        im_shape = img.shape
        im_size_min = np.min(im_shape[0:2])
        im_size_max = np.max(im_shape[0:2])
        resize = float(target_size) / float(im_size_min)
        # prevent bigger axis from being more than max_size:
        if np.round(resize * im_size_max) > max_size:
            resize = float(max_size) / float(im_size_max)
        if args.origin_size:
            resize = 1

        if resize != 1:
            img = cv2.resize(img, None, None, fx=resize,
                             fy=resize, interpolation=cv2.INTER_LINEAR)
        im_height, im_width, _ = img.shape
        scale = torch.Tensor(
            [img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
        img -= (104, 117, 123)
        img = img.transpose(2, 0, 1)
        img = torch.from_numpy(img).unsqueeze(0)
        img = img.to(device)
        scale = scale.to(device)

        print('input tensor shape: {}'.format(img.size()))
        tic = time.time()
        loc, conf, landms = net(img)  # forward pass
        print('net forward time: {}'.format(time.time() - tic))
        priorbox = PriorBox(cfg_re50, image_size=(im_height, im_width))
        priors = priorbox.forward()
        priors = priors.to(device)
        prior_data = priors.data
        boxes = decode(loc.data.squeeze(0), prior_data, cfg_re50['variance'])
        boxes = boxes * scale / resize
        boxes = boxes.cpu().numpy()
        scores = conf.squeeze(0).data.cpu().numpy()[:, 1]

        # ignore low scores
        inds = np.where(scores > args.confidence_threshold)[0]
        boxes = boxes[inds]
        scores = scores[inds]

        # keep top-K before NMS
        order = scores.argsort()[::-1][:args.top_k]
        boxes = boxes[order]
        scores = scores[order]

        # do NMS
        dets = np.hstack((boxes, scores[:, np.newaxis])).astype(
            np.float32, copy=False)
        keep = py_cpu_nms(dets, args.nms_threshold)
        dets = dets[keep, :]

        # keep top-K faster NMS
        dets = dets[:args.keep_top_k, :]
        # show image
        if args.show_image:
            for b in dets:
                if b[4] < args.vis_thres:
                    continue
                text = "{:.4f}".format(b[4])
                b = list(map(int, b))
                cv2.rectangle(img_raw, (b[0], b[1]),
                              (b[2], b[3]), (0, 0, 255), 2)
                cx = b[0]
                cy = b[1] + 12
                cv2.putText(img_raw, text, (cx, cy),
                            cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255))
            save_name = os.path.basename(img_f)
            print(save_name)
            # Make sure the output directory exists; cv2.imwrite fails
            # silently when the directory is missing.
            os.makedirs("./result", exist_ok=True)
            cv2.imwrite(os.path.join("./result", save_name), img_raw)

if __name__ == '__main__':
    torch.set_grad_enabled(False)
    # net and model
    net = RetinaFace(cfg=cfg_re50, phase='test')
    net = load_model(net, args.trained_model)
    net.eval()
    print('Finished loading model!')
    cudnn.benchmark = True
    net = net.to(device)
    test_video(net)
    # test_pic(net)
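For reference, the decode call above follows the standard SSD box parameterization: predicted offsets are applied to the priors, scaled by the variance values from the config. A sketch of that logic, written out here for clarity (this mirrors the usual utils/box_utils.py implementation, so treat it as illustrative):

import torch

def decode_boxes(loc, priors, variances):
    # loc: (N, 4) predicted offsets; priors: (N, 4) as (cx, cy, w, h).
    boxes = torch.cat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],  # centers
        priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)   # sizes
    boxes[:, :2] -= boxes[:, 2:] / 2   # (cx, cy) -> (xmin, ymin)
    boxes[:, 2:] += boxes[:, :2]       # (w, h)  -> (xmax, ymax)
    return boxes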

PyTorch to ONNX to ncnn: C++ inference

Environment: Ubuntu 16.04 or Windows 10; PyTorch 1.2 (the author says 1.1.0 or later works, but ONNX export in 1.1.0 has a bug); protobuf; OpenCV; ncnn.

Ubuntu:

Installing protobuf, OpenCV and ncnn on Ubuntu is fairly straightforward; here are some reliable guides:
protobuf installation: https://blog.csdn.net/u010918487/article/details/82947157
OpenCV installation: https://www.jianshu.com/p/f646448da265
ncnn installation: https://yyingbiu.github.io/2019/08/21/linux-xia-bian-yi-an-zhuang-ncnn/
Run the following command to convert the PyTorch model to ONNX:

python convert_to_onnx.py --trained_model <weight_file> --network <mobile0.25|slim|RFB>
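Under the hood, convert_to_onnx.py boils down to a torch.onnx.export call. A minimal sketch of such an export, assuming the model is already loaded; the input size and tensor names below are illustrative assumptions, check the repo's script for its actual settings:

import torch

def export_to_onnx(net, onnx_path="face.onnx", input_hw=(240, 320)):
    # Input size and tensor names are assumptions for illustration only;
    # see convert_to_onnx.py for the repo's real values.
    net.eval()
    dummy = torch.randn(1, 3, *input_hw)  # NCHW dummy input
    torch.onnx.export(net, dummy, onnx_path,
                      input_names=["input"],
                      output_names=["boxes", "scores", "landmarks"])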

The ONNX graph exported by PyTorch contains some redundant ops. Simplify it with onnx-simplifier using the commands below; the resulting face_sim.onnx is the final ONNX file and converts cleanly afterwards.

pip install onnx-simplifier
python -m onnxsim face.onnx face_sim.onnx 
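The same simplification can also be done from Python; a small sketch using onnx-simplifier's API (file names as in the commands above):

import onnx
from onnxsim import simplify

model = onnx.load("face.onnx")
model_simp, check = simplify(model)  # returns (simplified model, validity flag)
assert check, "simplified ONNX model failed validation"
onnx.save(model_simp, "face_sim.onnx")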

From here, just follow the steps in the GitHub repo; the key conversion step is sketched below.
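For reference, the ncnn conversion itself uses the onnx2ncnn tool built alongside ncnn (the output file names here are placeholders):

onnx2ncnn face_sim.onnx face.param face.bin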

Windows 10:

I had already installed CMake, OpenCV and VS2015, so I won't cover those here.
A reliable guide for installing protobuf and ncnn:
https://blog.csdn.net/heiheiya/article/details/100519584
A few things to note:
1. Use protobuf 3.4.0. I first installed 3.6.1, and the conversion tools (onnx2caffe and friends) failed while building ncnn.

2. I built protobuf and ncnn from the command line. Don't use a plain cmd window; use the Visual C++ 2015 x64 Native Build Tools Command Prompt, otherwise the environment is wrong and compilation fails.

3. Configure OpenCV before building ncnn. CMake may fail to find the OpenCV path; add the following to the relevant CMakeLists.txt so it can be located:

set(OpenCV_DIR D:/mysoftware/opencv/build)
include_directories(${OpenCV_DIR}/include)

4. You can either edit the relevant paths in CMakeLists.txt in advance, or run the commands below and then open the generated VS2015 .sln project under build/ and configure the include and lib paths there. Make sure the ncnn include path points to your own build rather than the one bundled in the GitHub repo, otherwise you will hit redefinition errors!

mkdir build
cd build
cmake ..
