卓玛cug

dlib Face Recognition: Installation and Usage Tutorial

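Contents

- 1 Installing and Compiling dlib Locally
  - 1.1 Downloading the dlib Source
  - 1.2 Compiling the dlib C++ Examples
    - 1.2.1 Compiling the dlib Library
    - 1.2.2 Configuring and Running a C++ Example
  - 1.3 Compiling the dlib Python API
- 2 Main Features of dlib and Accuracy Evaluation
  - 2.1 Overview of the Example Code
  - 2.2 Face Detection and Facial Landmarks
    - 2.2.1 Dataset and Code Preparation
    - 2.2.2 Sample Results
    - 2.2.3 Accuracy
  - 2.3 Face Recognition
    - 2.3.1 Dataset and Code Preparation
    - 2.3.2 Face Recognition Steps
    - 2.3.3 Accuracy
  - 2.4 Face Detection and Face Recognition in Video
    - 2.4.1 Detection Timing with Webcam Input
    - 2.4.2 Detection Timing with MP4 File Input
    - 2.4.3 Face Recognition in Video
    - 2.4.4 Optimizations
- 3 Training Your Own Models in Python
  - 3.1 Dataset Annotation
    - 3.1.1 Introduction to imglab
    - 3.1.2 Using imglab
  - 3.2 Training Your Own Facial Landmark Detector
    - 3.2.1 Dataset
    - 3.2.2 Training
    - 3.2.3 Testing
    - 3.2.4 Optimizations
  - 3.3 Training Your Own Face Detector
    - 3.3.1 Dataset
    - 3.3.2 Training
    - 3.3.3 Testing
    - 3.3.4 Optimizations
  - 3.4 Summary
- 4 Training Your Own Models in C++
  - 4.1 Training Your Own Facial Landmark Detector
    - 4.1.1 Dataset
    - 4.1.2 Training
    - 4.1.3 Testing
    - 4.1.4 Optimizations
  - 4.2 Training Your Own Face Detector
    - 4.2.1 Dataset
    - 4.2.2 Training
    - 4.2.3 Testing
    - 4.2.4 Optimizations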

1 Installing and Compiling dlib Locally

1.1 Downloading the dlib Source

Download: https://github.com/davisking/dlib (the latest release at the time of writing was dlib 19.15)

To keep versions apart, the download directory is named dlib-19-15, as shown in the figure above.

1.2 Compiling the dlib C++ Examples

1.2.1 Compiling the dlib Library

Compiling requires Visual Studio; the version installed here is Visual Studio 15 2017, the latest at the time of writing.

Enter the dlib-master/dlib-19-15 directory and run:

# mkdir build
# cd build
# cmake .. 
# cmake --build .

To specify the generator and target platform explicitly, run cmake with these options instead:

# cmake .. -G "Visual Studio 15 2017 Win64" -T host=x64

If the generated .lib file appears in the directory shown in the figure above, the dlib library was compiled successfully.

1.2.2 Configuring and Running a C++ Example

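Take examples/train_shape_predictor_ex.cpp as an example; the other examples are set up the same way.
1. Create ConsoleApplication1.cpp and add source.cpp
First, open VS and create a new C++ console project. Copy the code of train_shape_predictor_ex.cpp into ConsoleApplication1.cpp, then add source.cpp to the project with Add Existing Item. source.cpp is located in the dlib-master/dlib-19-15/dlib/all directory.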

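2. Disable precompiled headers (stdafx)
Open Project > Properties and make the following change to avoid errors caused by precompiled headers.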

3. Add the include directories

4. Add the path of the generated .lib dependency

5. Configure image format support
Add the preprocessor definitions DLIB_JPG_SUPPORT, DLIB_JPEG_SUPPORT, and DLIB_JPEG_STATIC.

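After the project is configured, click Build > Build Solution; ConsoleApplication1.exe is generated in the project directory. Run ConsoleApplication1.exe from the command line, or click Debug > Start Without Debugging in VS. Programs that take arguments need their command-line arguments supplied.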

1.3 Compiling the dlib Python API

Method 1:
Enter the dlib source directory and run:

# python setup.py install

After that, the Python examples in python_examples can be run.

Method 2:

# pip3 install dlib

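With this method, dlib 19.15 could not be installed locally at the time of writing; only an older dlib version could be installed, so some function calls in the Python examples may not work.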

2 Main Features of dlib and Accuracy Evaluation

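dlib's main features are face detection, facial landmark detection, and face recognition. This part studies the example code in python_examples; the C++ examples are similar. The evaluation code here is mainly built on the example code introduced in Section 2.1.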

2.1 Overview of the Example Code

The main code is in the python_examples directory of dlib; the model files it needs can be downloaded from http://dlib.net/files:

face_detector.py
Frontal face detector, mainly using dlib.get_frontal_face_detector().
cnn_face_detector.py
CNN-based face detector, mainly using dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat'); the official documentation states that it is more accurate than dlib.get_frontal_face_detector().
face_landmark.py
Facial landmark detection, mainly using dlib.get_frontal_face_detector() and dlib.shape_predictor('shape_predictor_68_face_landmarks.dat').
face_recognition.py
Face recognition, mainly using dlib.get_frontal_face_detector(), dlib.shape_predictor('shape_predictor_5_face_landmarks.dat'), and dlib.face_recognition_model_v1('dlib_face_recognition_resnet_model_v1.dat').
opencv_webcam_face_detection.py
Face detection on video, mainly using dlib.get_frontal_face_detector() and cv2.VideoCapture().
train_object_detector.py
Training a frontal face detector; training produces the file detector.svm.
train_shape_predictor.py
Training a facial landmark detector; training produces the file predictor.dat.

2.2 Face Detection and Facial Landmarks

2.2.1 Dataset and Code Preparation

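Reference code: python_examples/face_landmark_detection.py. To measure face detection accuracy, it was modified and renamed face_landmark.py. It can currently only measure accuracy on images that contain a single face (with several faces per image, an overall accuracy is hard to compute).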

Required model file: shape_predictor_68_face_landmarks.dat, a pretrained facial landmark detector.

Test image dataset: the LFW dataset.

The code of face_landmark.py is as follows:

import os
import dlib
from skimage import io

# Test face dataset
faces_folder_path = "lfwdata"

# Step 1: load the face detector and the facial landmark detector
# Face detector
detector = dlib.get_frontal_face_detector()
# Facial landmark detector
predictor = dlib.shape_predictor("../shape_predictor_68_face_landmarks.dat")

# Step 2: iterate over the images, run both detectors, and display the results
# Display window
win = dlib.image_window()
# tol: images processed; ans: images in which at least one face was detected
tol = ans = 0
# Walk the folder and process every .jpg/.png image
for (path, dirnames, filenames) in os.walk(faces_folder_path):
    for filename in filenames:
        if filename.endswith('.jpg') or filename.endswith('.png'):
            tol += 1
            img_path = os.path.join(path, filename)
            print("Processing file: {}".format(img_path))
            # Read the image
            img = io.imread(img_path)

            win.clear_overlay()
            win.set_image(img)

            # Run the face detector (upsampling the image once)
            dets = detector(img, 1)
            # Count the image as a success if at least one face was detected
            face_num = len(dets)
            print("Number of faces detected: {}".format(face_num))
            if face_num > 0:
                ans += 1
            else:
                print("fail")
            for k, d in enumerate(dets):
                print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
                    k, d.left(), d.top(), d.right(), d.bottom()))
                # Run the facial landmark detector on this face
                shape = predictor(img, d)
                print("Part 0: {}, Part 1: {} ...".format(shape.part(0), shape.part(1)))

                win.add_overlay(shape)

            win.add_overlay(dets)
            # Uncomment to wait for Enter before showing the next image
            # dlib.hit_enter_to_continue()

# Step 3: compute and print the detection rate
print("correct: {}, total: {}".format(ans, tol))
print("accuracy: {:.2%}".format(ans / tol))

2.2.2 Sample Results

2.2.3 Accuracy

A total of 13234 images were tested, and a face was detected in 13172 of them, giving an accuracy of 13172/13234 = 99.53%.
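The 99.53% value is a simple ratio. A minimal sketch of the calculation follows; detection_rate is a hypothetical helper name. Like face_landmark.py, it counts an image as a success whenever at least one face is found, without checking the box position or extra false detections, so the number is closer to a detection rate than to accuracy in the strict sense.

```python
def detection_rate(detected, total):
    """Share of images in which at least one face was detected."""
    if total == 0:
        raise ValueError("no images were processed")
    return detected / total

# Numbers reported above for LFW
rate = detection_rate(13172, 13234)
print("{:.2%}".format(rate))  # 99.53%
print("missed images:", 13234 - 13172)  # 62
```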

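Most of the failed images show half faces, profile faces, overexposure, or occlusion. This is largely because the code uses the frontal face detector dlib.get_frontal_face_detector(). Some of the failed images are shown below: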

2.3 Face Recognition

2.3.1 Dataset and Code Preparation

Reference code: face_recognition.py. To measure accuracy, it was modified and renamed face_recog.py.

Required model files: shape_predictor_68_face_landmarks.dat, a pretrained facial landmark detector, and dlib_face_recognition_resnet_model_v1.dat, a pretrained ResNet face recognition model.

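Dataset: from LFW, 398 frontal faces were selected as candidates (one image per person), and 525 frontal faces as test images (a person may have several images).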
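LFW names every image Person_Name_NNNN.jpg, and the recognition script uses the part of the file name before the image number as the person's label. A portable way to extract that label, relying on os.path rather than a hard-coded backslash, is sketched below; lfw_label is a hypothetical helper, not part of dlib or the original scripts.

```python
import os

def lfw_label(path):
    """Person name from an LFW-style file name, e.g. 'Aaron_Peirsol_0001.jpg'.

    Splitting on the last underscore keeps names that contain underscores
    (such as 'George_W_Bush') intact.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.rsplit("_", 1)[0]

print(lfw_label("recog_train/Aaron_Peirsol_0001.jpg"))  # Aaron_Peirsol
```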

The code of face_recog.py is as follows:

import os
import dlib
import glob
import numpy
from skimage import io

# Folder with the candidate (enrolled) faces
faces_folder_path = "recog_train"
# Folder with the faces to be recognized
img_folder_path = "recog_test"


# Step 2: build the candidate labels and descriptors used for recognition.
# For every face image in the folder:
# 1. face detection
# 2. landmark detection
# 3. descriptor extraction
# Returns the lists of candidate labels and descriptors
def train(faces_folder_path):
    trainlabel = []
    train_descriptors = []
    for file in glob.glob(os.path.join(faces_folder_path, "*.jpg")):
        # LFW file names look like Name_0001.jpg; the part before "_0" is the name
        labelName = os.path.basename(file).split('_0')[0]
        # print("Processing file: {}".format(labelName))

        face = io.imread(file)
        # 1. Face detection
        dets = detector(face, 1)
        # print("Number of faces detected: {}".format(len(dets)))
        # Skip images without a face so that labels and descriptors stay aligned
        if len(dets) == 0:
            continue

        # Each candidate image holds one person, so only the first face is used
        # 2. Landmark detection
        shape = predictor(face, dets[0])
        # 3. Descriptor extraction: a 128-D vector, converted to a numpy array
        face_descriptor = facerec.compute_face_descriptor(face, shape)
        trainlabel.append(labelName)
        train_descriptors.append(numpy.array(face_descriptor))
    return trainlabel, train_descriptors

# Step 3: decide which person each test face belongs to
def recognition(trainlabel, train_descriptors):
    ans_right = 0
    ans_wrong = 0
    # Process the test faces the same way
    for file in glob.glob(os.path.join(img_folder_path, "*.jpg")):
        img = io.imread(file)
        nametest = os.path.basename(file).split('_0')[0]
        # Face detection
        dets = detector(img, 1)

        # No face or no close enough candidate means "Unknown"
        namepredict = "Unknown"
        if len(dets) > 0:
            # Landmark detection on the first face
            shape = predictor(img, dets[0])
            # Descriptor extraction
            test_descriptor = facerec.compute_face_descriptor(img, shape)
            d_test = numpy.array(test_descriptor)

            # Euclidean distance between the test face and every candidate face
            dists = [numpy.linalg.norm(d_train - d_test) for d_train in train_descriptors]
            # Pair each candidate label with its distance and sort by distance
            cd_sorted = sorted(zip(trainlabel, dists), key=lambda d: d[1])
            print(cd_sorted[0][1])
            # The threshold decides whether the nearest candidate is the same person
            if cd_sorted[0][1] < 0.6:
                namepredict = cd_sorted[0][0]
        print(nametest, namepredict)

        # Check whether the recognition is correct
        if (namepredict == nametest) or (namepredict == "Unknown" and nametest not in trainlabel):
            print("right")
            ans_right += 1
        else:
            print("wrong")
            ans_wrong += 1
        # dlib.hit_enter_to_continue()
    print("total:", ans_right + ans_wrong, "\nright:", ans_right, "\nwrong:", ans_wrong)

if __name__ == '__main__':
    # Step 1: load the three models
    # 1. Frontal face detector
    detector = dlib.get_frontal_face_detector()
    # 2. Facial landmark detector
    predictor = dlib.shape_predictor("../shape_predictor_68_face_landmarks.dat")
    # 3. Face recognition model
    facerec = dlib.face_recognition_model_v1("../dlib_face_recognition_resnet_model_v1.dat")

    # Step 2: build the candidate labels and descriptors used for recognition
    trainlabel, train_descriptors = train(faces_folder_path)

    # Step 3: recognize each test face and compute the accuracy
    recognition(trainlabel, train_descriptors)

2.3.2 Face Recognition Steps

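First, each face in the candidate folder goes through:
1. Face detection
2. Landmark detection (the face region and landmarks can be drawn)
3. Descriptor extraction: a 128-D vector, converted to a numpy array
4. Extracting the name from the candidate image's file name to build the candidate list

Then each face to be recognized is processed the same way:
1. Face detection, landmark detection, and descriptor extraction
2. Computing the Euclidean distance between the test descriptor and every candidate descriptor
3. Pairing each candidate's name with its distance to the test descriptor
4. Sorting by distance
5. Taking the nearest candidate as the same person if its distance is below the threshold 0.6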
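The matching rule in the steps above is a nearest-neighbour search with a distance threshold. The pure-Python sketch below shows only that rule, with short toy vectors standing in for dlib's 128-D descriptors; identify and euclidean are hypothetical helper names, and 0.6 is the threshold used in this article.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(test_descriptor, labels, descriptors, threshold=0.6):
    """Return (label, distance) of the nearest candidate, or ("Unknown", distance)
    when even the nearest candidate is not closer than `threshold`."""
    if not descriptors:
        return "Unknown", float("inf")
    dists = [euclidean(test_descriptor, d) for d in descriptors]
    best = min(range(len(dists)), key=dists.__getitem__)
    if dists[best] < threshold:
        return labels[best], dists[best]
    return "Unknown", dists[best]

# Toy 3-D descriptors standing in for dlib's 128-D vectors (made-up values)
labels = ["Alice", "Bob"]
descriptors = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
print(identify([0.1, 0.0, 0.0], labels, descriptors))  # nearest is Alice, well under 0.6
print(identify([0.6, 0.6, 0.6], labels, descriptors))  # nearest is Bob at about 0.69, so "Unknown"
```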

2.3.3 Accuracy

Of the 525 test images, 503 were recognized correctly, an accuracy of 503/525 = 95.81%.
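The right/wrong rule used by face_recog.py also counts a correct "Unknown" answer (for someone who is not among the candidates) as a success. A sketch of that rule follows; is_correct and accuracy are hypothetical helper names, and the example names are made up.

```python
def is_correct(predicted, truth, candidate_names):
    """A prediction is right if it names the true person, or if it answers
    "Unknown" for someone who is not among the candidates."""
    return predicted == truth or (predicted == "Unknown" and truth not in candidate_names)

def accuracy(results, candidate_names):
    """results: list of (predicted, truth) pairs."""
    right = sum(is_correct(p, t, candidate_names) for p, t in results)
    return right / len(results)

# Made-up example: two right (Alice, and Unknown for non-candidate Carol),
# two wrong (Alice predicted as Bob, candidate Bob rejected as Unknown)
candidates = {"Alice", "Bob"}
results = [("Alice", "Alice"), ("Unknown", "Carol"), ("Bob", "Alice"), ("Unknown", "Bob")]
print(accuracy(results, candidates))  # 0.5
# The figure reported above
print("{:.2%}".format(503 / 525))  # 95.81%
```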

2.4 Face Detection and Face Recognition in Video

2.4.1 Detection Timing with Webcam Input

The code is named face_detector_video.py and is as follows:

import cv2
import dlib
import time

# Initialize the dlib face detector
detector = dlib.get_frontal_face_detector()

# Initialize the display window
win = dlib.image_window()
# To load a video file with OpenCV instead:
# cap = cv2.VideoCapture(r'../test.mp4')
cap = cv2.VideoCapture(0)  # open the webcam

while True:
    start = time.time()
    ret, cv_img = cap.read()
    if cv_img is None:
        break

    # Shrink the image to 1/4 of its width and height
    cv_img = cv2.resize(cv_img, (0, 0), fx=0.25, fy=0.25)

    # OpenCV reads frames as BGR, while dlib expects RGB, so this conversion is required
    img = cv2.cvtColor(cv_img, cv2.COLOR_BGR2RGB)

    # Detect faces
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))

    for i, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            i, d.left(), d.top(), d.right(), d.bottom()))

    print(time.time() - start)
    win.clear_overlay()
    win.set_image(img)
    win.add_overlay(dets)

cap.release()

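dlib's face detector is considerably more accurate than the one that ships with OpenCV, so this article uses dlib's detector. Frames are read from the webcam, OpenCV splits the video stream into individual images, and each frame is checked with the frontal face detector dlib.get_frontal_face_detector().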

Sample result:

Console output during the test:

Detection speed:
0.09 s to 0.11 s per frame.
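A per-frame time of 0.09 s to 0.11 s corresponds to roughly 9 to 11 frames per second (1/0.11 ≈ 9.1, 1/0.09 ≈ 11.1). The sketch below shows one way to collect such timings; time_per_frame and summarize are hypothetical helpers, and process stands in for the detector call. It uses time.perf_counter, which is intended for measuring short intervals, instead of time.time.

```python
import time

def time_per_frame(process, frames):
    """Run `process` on each frame and return the elapsed time per frame in seconds."""
    times = []
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Min, max and mean time per frame, plus the frames per second implied by the mean."""
    mean = sum(times) / len(times)
    return {"min": min(times), "max": max(times), "mean": mean, "fps": 1.0 / mean}

# Dummy workload in place of detector(img, 1)
times = time_per_frame(lambda frame: sum(frame), [list(range(100000))] * 5)
print(summarize(times))
# Frames per second for the range reported above (0.11 s and 0.09 s per frame)
print(1 / 0.11, 1 / 0.09)
```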

2.4.2 Detection Timing with MP4 File Input

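In the code from Section 2.4.1, replace the statement that opens the webcam with one that opens the MP4 file. The video is then split into frames in the same way, and each frame is checked with the frontal face detector dlib.get_frontal_face_detector().
Sample result: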

Console output during the test:

Detection speed:
0.09 s to 0.11 s per frame.

Note: the face detection speed on a video file depends heavily on the frame size (frame height and width).
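One way to see why frame size matters: the HOG detector slides a window over the (upsampled) image, so its work grows roughly with the number of pixels. The sketch below compares pixel counts; the 1920x1080 frame size is a hypothetical example, pixel_count is a hypothetical helper, and the assumption that each upsampling step doubles width and height reflects the detector(img, 1) call used in this article.

```python
def pixel_count(width, height, scale=1.0, upsample_times=0):
    """Pixels in the image the detector scans after resizing by `scale` and
    upsampling `upsample_times` times (each step assumed to double both sides)."""
    factor = scale * (2 ** upsample_times)
    return int(width * factor) * int(height * factor)

# Hypothetical 1920x1080 video, detector called with one upsampling step
full = pixel_count(1920, 1080, upsample_times=1)
quarter = pixel_count(1920, 1080, scale=0.25, upsample_times=1)
print(full, quarter, full // quarter)  # 8294400 518400 16
```

Shrinking each side to 1/4, as the code in this section does, leaves about 1/16 of the pixels to scan.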

2.4.3 Face Recognition in Video

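Two versions were written: one using dlib's own face recognition, named face_recogn_video.py, and one using the face recognition of face_recognition, a third-party package built on dlib, named face_recognition_video.py.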

The code of face_recogn_video.py is as follows:

import dlib
import numpy as np
import cv2
import json
import os
import glob

# Candidate (known person) dataset
faces_folder_path = r'../train_person'
video_path = r'../test.mp4'

# Build the candidate labels and face recognition descriptors
def train(faces_folder_path):
    trainlabel = []
    train_descriptors = []
    for file in glob.glob(os.path.join(faces_folder_path, "*.jpg")):
        # The file name without extension is the person's name
        labelName = os.path.splitext(os.path.basename(file))[0]
        print("Processing file: {}".format(labelName))

        # OpenCV loads BGR; dlib expects RGB
        face = cv2.cvtColor(cv2.imread(file), cv2.COLOR_BGR2RGB)
        # 1. Face detection
        dets = detector(face, 1)
        # print("Number of faces detected: {}".format(len(dets)))
        # Skip images without a face so that labels and descriptors stay aligned
        if len(dets) == 0:
            continue

        # 2. Landmark detection (one person per candidate image)
        shape = predictor(face, dets[0])
        # 3. Descriptor extraction: a 128-D vector, converted to a numpy array
        face_descriptor = facerec.compute_face_descriptor(face, shape)
        trainlabel.append(labelName)
        train_descriptors.append(np.array(face_descriptor))
    return trainlabel, train_descriptors

# Decide which person a descriptor belongs to
def findNearestClassForImage(face_descriptor, trainlabel, train_descriptors):
    train_descriptors = np.array(train_descriptors)
    # Distance from the test descriptor to every candidate descriptor
    dist = np.linalg.norm(np.array(face_descriptor) - train_descriptors, axis=1)
    min_distance = dist.min()
    print('distance: ', min_distance)
    if min_distance > threshold:
        return 'Unknown'
    index = np.argmin(dist)
    return trainlabel[index]

# Face recognition on one frame
def recognition(img, trainlabel, train_descriptors):
    # dlib works on RGB; boxes and names are drawn on the original BGR frame
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Face detection
    dets = detector(rgb, 1)
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))
        # Facial landmark detection
        shape = predictor(rgb, d)
        # Face recognition descriptor
        face_descriptor = facerec.compute_face_descriptor(rgb, shape)

        # Decide which person it is
        class_pre = findNearestClassForImage(face_descriptor, trainlabel, train_descriptors)
        print(class_pre)
        cv2.rectangle(img, (d.left(), d.top() + 10), (d.right(), d.bottom()), (0, 255, 0), 2)
        cv2.putText(img, class_pre, (d.left(), d.top()), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2, cv2.LINE_AA)

    cv2.imshow('image', img)

if __name__ == '__main__':
    # Load the models
    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('../shape_predictor_68_face_landmarks.dat')
    facerec = dlib.face_recognition_model_v1('../dlib_face_recognition_resnet_model_v1.dat')
    # Recognition threshold
    threshold = 0.6

    # Candidate labels and face recognition descriptors
    trainlabel, train_descriptors = train(faces_folder_path)
    # cap = cv2.VideoCapture(0)
    cap = cv2.VideoCapture(video_path)
    # Save the output video (optional)
    # fps = 10
    # size = (640, 480)
    # fourcc = cv2.VideoWriter_fourcc(*'XVID')
    # videoWriter = cv2.VideoWriter('video.MP4', fourcc, fps, size)

    while True:
        ret, frame = cap.read()
        # Stop at the end of the video or on a read error
        if not ret:
            break
        # Shrink the frame to 1/4 of its width and height
        frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)

        # Face recognition
        recognition(frame, trainlabel, train_descriptors)
        # videoWriter.write(frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    # videoWriter.release()
    cv2.destroyAllWindows()

The code of face_recognition_video.py is as follows:
import face_recognition
import cv2
import os
import glob

# Video path and folder of known faces
video_path = r'../test.mp4'
faces_folder_path = '../train_person'


# Read the names and face encodings of the known people
def train(faces_folder_path):
    known_face_names = []
    known_face_encodings = []
    for file in glob.glob(os.path.join(faces_folder_path, "*.jpg")):
        # The file name without extension is the person's name
        labelName = os.path.splitext(os.path.basename(file))[0]

        image = face_recognition.load_image_file(file)
        encodings = face_recognition.face_encodings(image)
        # Skip images in which no face was found
        if not encodings:
            continue
        known_face_names.append(labelName)
        known_face_encodings.append(encodings[0])
    return known_face_names, known_face_encodings

def recognition(rgb_small_frame, known_face_names, known_face_encodings):
    # Locate the faces in the frame and compute their encodings
    face_locations = face_recognition.face_locations(rgb_small_frame)
    face_encodings = face_recognition.face_encodings(rgb_small_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        # compare_faces returns True for each known face that matches, False otherwise
        matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
        # Default to "Unknown"
        name = "Unknown"

        # Use the first known face that matches
        if True in matches:
            first_match_index = matches.index(True)
            name = known_face_names[first_match_index]
        face_names.append(name)
    return face_locations, face_names

def main():
    face_locations = []
    face_names = []

    # Set up the display window
    wnd = 'OpenCV Video'
    cv2.namedWindow(wnd, flags=0)
    cv2.resizeWindow(wnd, 1920, 1080)

    known_face_names, known_face_encodings = train(faces_folder_path)

    # Open the video
    # video_capture = cv2.VideoCapture(0)
    video_capture = cv2.VideoCapture(video_path)
    # Frame counter: recognition runs only on every 5th frame
    process_this_frame = 0
    while True:
        # Read one frame
        ret, frame = video_capture.read()
        # Stop at the end of the video or on a read error
        if not ret:
            break
        # Shrink the frame: a smaller image means less computation
        small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
        # OpenCV frames are BGR, but face_recognition expects RGB, so convert
        rgb_small_frame = cv2.cvtColor(small_frame, cv2.COLOR_BGR2RGB)

        process_this_frame += 1
        if process_this_frame % 5 == 0:
            # Face locations and names
            face_locations, face_names = recognition(rgb_small_frame, known_face_names, known_face_encodings)

        # Draw the detected faces
        for (top, right, bottom, left), name in zip(face_locations, face_names):
            # Scale back up to the original frame size
            top *= 4
            right *= 4
            bottom *= 4
            left *= 4

            # Bounding box
            cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

            # Name label
            cv2.rectangle(frame, (left, bottom - 35), (right, bottom), (0, 0, 255), cv2.FILLED)
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, name, (left + 6, bottom - 6), font, 1.0, (255, 255, 255), 1)

        # Show the frame
        cv2.imshow(wnd, frame)

        # Press q to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    video_capture.release()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

Sample result:

Result:
The face_recognition-based version recognized faces faster than the version that uses dlib's face recognition directly.

2.4.4 Optimizations

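Two optimizations were added to the face_recognition-based code:
1. Frames are shrunk to 1/4 of their width and height for recognition, and the results are scaled back to the original size for display;
2. Face recognition runs only once every 5 frames.
With these changes, recognition runs at close to real time (close to the normal playback speed of the video).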
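Both optimizations are independent of OpenCV and can be isolated into a small pure-Python sketch; SCALE, PROCESS_EVERY, should_process, and scale_box are hypothetical names, not part of face_recognition.

```python
SCALE = 0.25        # recognition runs on frames shrunk to 1/4 of width and height
PROCESS_EVERY = 5   # recognition runs on every 5th frame only

def should_process(frame_index, every=PROCESS_EVERY):
    """frame_index counts from 1, like the counter in face_recognition_video.py."""
    return frame_index % every == 0

def scale_box(box, scale=SCALE):
    """Map a (top, right, bottom, left) box found on the small frame back to
    full-size frame coordinates."""
    factor = round(1 / scale)
    return tuple(v * factor for v in box)

print([i for i in range(1, 16) if should_process(i)])  # [5, 10, 15]
print(scale_box((10, 40, 50, 5)))  # (40, 160, 200, 20)
```

Between processed frames, the last result is reused, so boxes can lag slightly behind fast-moving faces; a smaller PROCESS_EVERY trades speed for responsiveness.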

3 Training Your Own Models in Python

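Training your own models with the example code in python_examples is fairly simple. To speed up training, the python_examples code is run on a Linux server. For installing dlib on the Linux server, see Section 1.3; the method used here is pip3 install dlib.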

3.1 Dataset Annotation

3.1.1 Introduction to imglab

imglab is a tool provided by dlib for building datasets: you label the images, and it produces an XML file.

3.1.2 Using imglab

The tool ships with the official dlib source code under tools/imglab.
CMake must be installed before using it.

Steps:

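Open cmd
Enter the tools/imglab directory
Create a build folder and enter it
Run: cmake ..
Run: cmake --build . --config Release
Enter the Release folder
Create an image folder and copy all training images into it
In the Release directory, run imglab -c mydataset.xml image; this creates the file mydataset.xml
Run: imglab mydataset.xml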

The imglab annotation tool now opens, and you can start labeling.

Annotation controls:

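Hold Shift and drag with the left mouse button to draw a box. Releasing the left button first keeps the box; releasing Shift first cancels it.
Double-click a box to select it, then press Delete to remove it.
Double-click a box to select it, then press i to mark it as ignore, i.e. an unclear object that is skipped.
Press e to increase the image exposure; the effect is shown below.
Hold Ctrl and scroll the mouse wheel to zoom the image while labeling.
After double-clicking to select a box, Shift+left-click adds landmark points.
When all face boxes and landmarks are drawn, choose File > Save, then Exit. The face detection dataset is now stored in mydataset.xml.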
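Before training, it can help to check what imglab actually saved, for example that every box carries the expected number of landmarks. The sketch below reads an imglab-style file with Python's standard xml.etree.ElementTree. The element and attribute names (image with file; box with top, left, width, height and an optional ignore; part with name, x, y) are assumed to match the layout imglab writes, the sample content is made up, and load_imglab is a hypothetical helper; dlib's own training functions read these files directly.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed imglab layout (made-up file name and coordinates)
SAMPLE = """<dataset>
<images>
  <image file='image/face1.jpg'>
    <box top='50' left='40' width='100' height='110'>
      <part name='00' x='70' y='90'/>
      <part name='01' x='110' y='90'/>
    </box>
    <box top='5' left='5' width='20' height='20' ignore='1'/>
  </image>
</images>
</dataset>"""

def load_imglab(xml_text):
    """Return {image file: [box, ...]}, skipping boxes marked ignore='1'."""
    root = ET.fromstring(xml_text)
    dataset = {}
    for image in root.iter("image"):
        boxes = []
        for box in image.findall("box"):
            if box.get("ignore") == "1":
                continue
            parts = {p.get("name"): (int(p.get("x")), int(p.get("y")))
                     for p in box.findall("part")}
            boxes.append({
                "left": int(box.get("left")), "top": int(box.get("top")),
                "width": int(box.get("width")), "height": int(box.get("height")),
                "parts": parts,
            })
        dataset[image.get("file")] = boxes
    return dataset

for file, boxes in load_imglab(SAMPLE).items():
    print(file, "boxes:", len(boxes), "landmarks per box:", [len(b["parts"]) for b in boxes])
```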

3.2 Training Your Own Facial Landmark Detector

3.2.1 Dataset

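Using imglab, the training and test images were annotated with face boxes and landmarks (5 landmarks: eyes, nose, mouth): 7 training images and 5 test images. This produces the annotation files train_landmarks.xml and test_landmarks.xml. The directory layout is shown below; the train and test folders hold the training and test images.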

3.2.2 Training

The training code is adapted from python_examples/train_shape_predictor.py:

import os
import sys
import glob
import dlib

options = dlib.shape_predictor_training_options()
# Now make the object responsible for training the model.
# This algorithm has a bunch of parameters you can mess with.  The
# documentation for the shape_predictor_trainer explains all of them.
# You should also read Kazemi's paper which explains all the parameters
# in great detail.  However, here I'm just setting three of them
# differently than their default values.  I'm doing this because we
# have a very small dataset.  In particular, setting the oversampling
# to a high amount (300) effectively boosts the training set size, so
# that helps this example.
options.oversampling_amount = 300
# I'm also reducing the capacity of the model by explicitly increasing
# the regularization (making nu smaller) and by using trees with
# smaller depths.
options.nu = 0.05
options.tree_depth = 2
options.be_verbose = True

# dlib.train_shape_predictor() does the actual training.  It will save the
# final predictor to predictor.dat.  The input is an XML file that lists the
# images in the training dataset and also contains the positions of the face
# parts.
training_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/mytest/train_landmarks.xml'
dlib.train_shape_predictor(training_xml_path, "predictor.dat", options)

# Now that we have a model we can test it.  dlib.test_shape_predictor()
# measures the average distance between a face landmark output by the
# shape_predictor and where it should be according to the truth data.
print("\nTraining accuracy: {}".format(
    dlib.test_shape_predictor(training_xml_path, "predictor.dat")))
# The real test is to see how well it does on data it wasn't trained on.  We
# trained it on a very small dataset so the accuracy is not extremely high, but
# it's still doing quite good.  Moreover, if you train it on one of the large
# face landmarking datasets you will obtain state-of-the-art results, as shown
# in the Kazemi paper.
testing_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/mytest/test_landmarks.xml'
print("Testing accuracy: {}".format(
    dlib.test_shape_predictor(testing_xml_path, "predictor.dat")))

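Save the code above as shape_predictor_train.py, change training_xml_path and testing_xml_path to the paths of your own dataset XML files, enter the directory containing the .py file, and run: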

# python3 shape_predictor_train.py
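The Training accuracy and Testing accuracy values printed by this script are average landmark distances, so lower is better. To make the quantity concrete, the sketch below computes a plain mean pixel distance between predicted and annotated points; mean_landmark_error is a hypothetical helper, and dlib's own function may normalize the error differently, so treat this as an illustration of the idea rather than a re-implementation.

```python
import math

def mean_landmark_error(predicted, truth):
    """Average Euclidean distance in pixels between predicted and annotated
    landmarks. Each argument is a list of faces; each face is a list of (x, y)."""
    total, count = 0.0, 0
    for pred_face, true_face in zip(predicted, truth):
        if len(pred_face) != len(true_face):
            raise ValueError("each face needs the same number of landmarks")
        for (px, py), (tx, ty) in zip(pred_face, true_face):
            total += math.hypot(px - tx, py - ty)
            count += 1
    return total / count if count else 0.0

# One face with two landmarks: one is off by (3, 4), i.e. 5 px; the other is exact
print(mean_landmark_error([[(3, 4), (10, 10)]], [[(0, 0), (10, 10)]]))  # 2.5
```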

3.2.3 Testing

The test code is adapted from the testing part of python_examples/train_shape_predictor.py:

import os
import sys
import glob
import cv2
import dlib

if len(sys.argv) != 2:
    print(
        "Give the path to the examples/faces directory as the argument to this "
        "program. For example, execute this program by running:\n"
        "    python3 shape_predictor_test.py ../examples/faces")
    sys.exit()
faces_folder = sys.argv[1]

# Now let's use it as you would in a normal application.  First we will load it
# from disk. We also need to load a face detector to provide the initial
# estimate of the facial location.
predictor = dlib.shape_predictor("predictor.dat")
detector = dlib.get_frontal_face_detector()

# Now let's run the detector and shape_predictor over the images in the faces
# folder and display the results.
print("Showing detections and predictions on the images in the faces folder...")
win = dlib.image_window()
for f in glob.glob(os.path.join(faces_folder, "*.jpg")):
    print("Processing file: {}".format(f))
    # img = dlib.load_rgb_image(f)
    # OpenCV loads BGR; convert to the RGB order dlib expects
    img = cv2.cvtColor(cv2.imread(f), cv2.COLOR_BGR2RGB)

    win.clear_overlay()
    win.set_image(img)

    # Ask the detector to find the bounding boxes of each face. The 1 in the
    # second argument indicates that we should upsample the image 1 time. This
    # will make everything bigger and allow us to detect more faces.
    dets = detector(img, 1)
    print("Number of faces detected: {}".format(len(dets)))
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))
        # Get the landmarks/parts for the face in box d.
        shape = predictor(img, d)
        print("Part 0: {}, Part 1: {} ...".format(shape.part(0),
                                                  shape.part(1)))
        # Draw the face landmarks on the screen.
        win.add_overlay(shape)

    win.add_overlay(dets)
    dlib.hit_enter_to_continue()

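Save the code above as shape_predictor_test.py, open a VNC client (the result windows need a graphical desktop on the server), enter the directory containing the .py file, and run: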

# python3 shape_predictor_test.py /home/users/chenzhuo/program/dlib-19-15/examples/faces

3.2.4 Optimizations

Training data with several poses can be used, for example annotated datasets of frontal, left-profile, and right-profile faces.

3.3 Training Your Own Face Detector

3.3.1 Dataset

# wget http://dlib.net/files/data/dlib_face_detector_training_data.tar.gz

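This is the dataset dlib used for training and contains annotations for thousands of faces; only frontal_faces.xml is used here, as shown in the figure below.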

3.3.2 Training

The training code is adapted from python_examples/train_object_detector.py:

import os
import sys
import glob
import dlib

# Now let's do the training.  The train_simple_object_detector() function has a
# bunch of options, all of which come with reasonable default values.  The next
# few lines goes over some of these options.
# Training hyperparameters
options = dlib.simple_object_detector_training_options()
# Since faces are left/right symmetric we can tell the trainer to train a
# symmetric detector.  This helps it get the most value out of the training
# data.
# Train a left/right symmetric detector
options.add_left_right_image_flips = True
# The trainer is a kind of support vector machine and therefore has the usual
# SVM C parameter.  In general, a bigger C encourages it to fit the training
# data better but might lead to overfitting.  You must find the best C value
# empirically by checking how well the trained detector works on a test set of
# images you haven't trained on.  Don't just leave the value set at 5.  Try a
# few different C values and see what works best for your data.
options.C = 5
# Tell the code how many CPU cores your computer has for the fastest training.
options.num_threads = 4
options.be_verbose = True

training_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/dlib_face_detector_training_data/frontal_faces.xml'
# testing_xml_path = '/home/users/chenzhuo/program/dlib-19-15/python_test/cats/cats_test/cat_test.xml'
# This function does the actual training.  It will save the final detector to
# detector.svm.  The input is an XML file that lists the images in the training
# dataset and also contains the positions of the face boxes.  To create your
# own XML files you can use the imglab tool which can be found in the
# tools/imglab folder.  It is a simple graphical tool for labeling objects in
# images with boxes.  To see how to use it read the tools/imglab/README.txt
# file.  But for this example, we just use the training.xml file included with
# dlib.
dlib.train_simple_object_detector(training_xml_path, "detector.svm", options)

# Now that we have a face detector we can test it.  The first statement tests
# it on the training data.  It will print the precision, recall, and then
# average precision.
print("")  # Print blank line to create gap from previous output
print("Training accuracy: {}".format(
    dlib.test_simple_object_detector(training_xml_path, "detector.svm")))
# However, to get an idea if it really worked without overfitting we need to
# run it on images it wasn't trained on.  The next line does this.  Happily, we
# see that the object detector works perfectly on the testing images.
# print("Testing accuracy: {}".format(
#    dlib.test_simple_object_detector(testing_xml_path, "detector.svm")))

Save the code above as object_detection_train.py, change training_xml_path to the path of your own dataset XML file, enter the directory containing the .py file, and run:

# python3 object_detection_train.py
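dlib.test_simple_object_detector reports precision, recall, and average precision. To make the first two concrete, the sketch below scores one image with a greedy matching rule that counts a detection as correct when its intersection over union (IoU) with an unmatched ground-truth box is at least 0.5. That IoU rule is an assumption, a common convention rather than dlib's documented criterion; iou and precision_recall are hypothetical helpers.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall(detections, truths, min_iou=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes on one image."""
    unmatched = list(truths)
    true_positives = 0
    for det in detections:
        best = max(unmatched, key=lambda t: iou(det, t), default=None)
        if best is not None and iou(det, best) >= min_iou:
            unmatched.remove(best)
            true_positives += 1
    # Convention used here: no detections gives precision 1.0, no faces gives recall 1.0
    precision = true_positives / len(detections) if detections else 1.0
    recall = true_positives / len(truths) if truths else 1.0
    return precision, recall

truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 1, 11, 11), (50, 50, 60, 60)]  # one good hit, one false alarm
print(precision_recall(detections, truths))  # (0.5, 0.5)
```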

3.3.3 Testing

The test code is adapted from the testing part of python_examples/train_object_detector.py:

import os
import sys
import glob
import dlib
import cv2

if len(sys.argv) != 2:
    print(
        "Give the path to the examples/faces directory as the argument to this "
        "program. For example, execute this program by running:\n"
        "    python3 object_detection_test.py ../examples/faces")
    sys.exit()
faces_folder = sys.argv[1]

# Now let's use the detector as you would in a normal application.  First we
# will load it from disk.
detector = dlib.simple_object_detector("detector.svm")

# We can look at the HOG filter we learned.  It should look like a face.  Neat!
win_det = dlib.image_window()
win_det.set_image(detector)

# Now let's run the detector over the images in the faces folder and display the
# results.
print("Showing detections on the images in the faces folder...")
win = dlib.image_window()
for f in glob.glob(os.path.join(faces_folder, "*.jpg")):
    print("Processing file: {}".format(f))
    # img = dlib.load_rgb_image(f)
    # OpenCV loads BGR; convert to the RGB order dlib expects
    img = cv2.cvtColor(cv2.imread(f), cv2.COLOR_BGR2RGB)
    dets = detector(img)
    print("Number of faces detected: {}".format(len(dets)))
    for k, d in enumerate(dets):
        print("Detection {}: Left: {} Top: {} Right: {} Bottom: {}".format(
            k, d.left(), d.top(), d.right(), d.bottom()))

    win.clear_overlay()
    win.set_image(img)
    win.add_overlay(dets)
    dlib.hit_enter_to_continue()

Save the code above as object_detection_test.py, open a VNC client, enter the directory containing the .py file, and run:

# python3 object_detection_test.py /home/users/chenzhuo/program/dlib-19-15/examples/faces

3.3.4 Optimizations

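The detector trained so far is a frontal face detector and performs poorly on profile faces.
To improve detection accuracy, several face detectors can be trained and combined, for example frontal, left-profile, and right-profile detectors. The key operations are: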

image = dlib.load_rgb_image(faces_folder + '/2008_002506.jpg')
# One detector per pose; in practice each should be loaded from a different model file
detector1 = dlib.fhog_object_detector("detector.svm")
detector2 = dlib.fhog_object_detector("detector.svm")
detectors = [detector1, detector2]
# Run all detectors at once; detector_idxs tells which detector found each box
[boxes, confidences, detector_idxs] = dlib.fhog_object_detector.run_multiple(detectors, image, upsample_num_times=1, adjust_threshold=0.0)
for i in range(len(boxes)):
    print("detector {} found box {} with confidence {}.".format(detector_idxs[i], boxes[i], confidences[i]))
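run_multiple returns one combined list. If you instead call each detector separately and concatenate the results, the same face can be reported once per detector. A greedy non-maximum suppression pass, sketched below in pure Python, is one common way to remove such duplicates; non_max_suppression and the 0.5 overlap limit are illustrative assumptions, not dlib API.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, max_iou=0.5):
    """Return indices of boxes to keep, highest score first; a box is dropped when
    it overlaps an already kept box with IoU above max_iou."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= max_iou for j in kept):
            kept.append(i)
    return kept

# Two detectors reporting the same face, plus one separate face (made-up values)
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (30, 30, 40, 40)]
scores = [0.9, 1.2, 0.5]
print(non_max_suppression(boxes, scores))  # [1, 2]
```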

3.4 Summary

As the training workflow above shows, dlib can do more than detect and recognize faces; it can also detect and recognize other kinds of objects.

4 Training Your Own Models in C++

4.1 Training Your Own Facial Landmark Detector

For how to configure each program, see Section 1.2.2. Configure and run the projects in Release mode for faster execution.

4.1.1 Dataset

4.1.2 Training

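After configuring the project with examples/train_shape_predictor_ex.cpp, enter the directory containing the annotation XML files as the command-line argument, then click Debug > Start Without Debugging to train the model; this produces the model file sp.dat.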

load_image_dataset(images_train, faces_train, faces_directory+"\\***.xml");
load_image_dataset(images_test, faces_test, faces_directory+"\\***.xml");

Test error:

4.1.3 Testing

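After configuring the project with examples/face_landmark_detection_ex.cpp, enter the path of the generated model file sp.dat and the paths of the images to check as command-line arguments, then click Debug > Start Without Debugging. The test results are as follows: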

4.1.4 Optimizations

Training data with several poses can be used, for example annotated datasets of frontal, left-profile, and right-profile faces.

4.2 Training Your Own Face Detector

4.2.1 Dataset

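Using imglab, the training and test images were annotated with face boxes: 7 training images and 5 test images. This produces the annotation files train.xml and test.xml.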

4.2.2 Training

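After configuring the project with examples/fhog_object_detector_ex.cpp, modify the following statements in the code, putting the names of your own annotation XML files in the corresponding places.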

load_image_dataset(images_train, face_boxes_train,faces_directory+"\\***.xml");
load_image_dataset(images_test,face_boxes_test, faces_directory+"\\***.xml");

Click Debug > Start Without Debugging. The training output is shown below; the run produces the model file face_predictor.svm:

4.2.3 Testing

The dlib examples include no test code for this, so the following is custom code, named face_object_detection:

/*
Face detector test
*/

#include <dlib/svm_threaded.h>
#include <dlib/gui_widgets.h>
#include <dlib/image_processing.h>
#include <dlib/data_io.h>
#include <dlib/image_io.h>

#include <iostream>
#include <fstream>
#include <cstdlib>


using namespace std;
using namespace dlib;

// ----------------------------------------------------------------------------------------

int main(int argc, char** argv)
{

	try
	{
		// argv[1] is the trained detector (e.g. face_predictor.svm);
		// argv[2] onwards are the images to test.
		if (argc < 3)
		{
			cout << "Call this program like this:" << endl;
			cout << "./face_object_detection face_predictor.svm faces/*.jpg" << endl;
			return 0;
		}

		// Scanner type that scans the image and extracts HOG features;
		// it must match the type used when the detector was trained
		typedef scan_fhog_pyramid<pyramid_down<6> > image_scanner_type;
		// Load the model
		object_detector<image_scanner_type> detector;
		deserialize(argv[1]) >> detector;

		// Display the learned HOG filter
		image_window hogwin(draw_fhog(detector), "Learned fHOG detector");

		// Display the face detection results on the test images
		image_window win;
		// Loop over all the images provided on the command line.
		for (int i = 2; i < argc; ++i)
		{
			cout << "processing image " << argv[i] << endl;
			array2d<unsigned char> img;
			// Read the image
			load_image(img, argv[i]);
			// Make the image larger so we can detect small faces.
			pyramid_up(img);

			// Now tell the face detector to give us a list of bounding boxes
			// around all the faces in the image.
			std::vector<rectangle> dets = detector(img);
			cout << "Number of faces detected: " << dets.size() << endl;
			win.clear_overlay();
			win.set_image(img);
			win.add_overlay(dets, rgb_pixel(255, 0, 0));
			cout << "Hit enter to process the next image..." << endl;
			cin.get();
		}

	}
	catch (exception& e)
	{
		cout << "\nexception thrown!" << endl;
		cout << e.what() << endl;
	}
	// Keep the console window open (Windows only)
	system("pause");
}

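Enter the path of the generated model file face_predictor.svm and the paths of the images to check as command-line arguments, then click Debug > Start Without Debugging. The test results are as follows: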

4.2.4 Optimizations

The detector trained so far is a frontal face detector and performs poorly on profile faces:

frontal_face_detector detector = get_frontal_face_detector();

Several face detectors can be trained and combined, for example frontal, left-profile, and right-profile detectors. The key operations are:

std::vector<object_detector<image_scanner_type> > my_detectors;
my_detectors.push_back(detector);
std::vector<rectangle> dets = evaluate_detectors(my_detectors, image);

DCMNet一种用于目标检测的轻量级骨干结构模型详解及代码复现清风AI 深度学习算法详解及代码复现深度学习机器学习计算机视觉人工智能算法目标检测
模型背景在深度学习技术快速发展的背景下，目标检测领域取得了显著进展。早期的手工特征提取方法如Viola-Jones和HOG逐渐被卷积神经网络（CNN）取代，其中AlexNet在2012年的ILSVRC比赛中表现突出，推动了CNN在计算机视觉中的广泛应用。然而，这些早期模型在精度和效率方面仍存在不足，尤其是在处理复杂场景和小目标时表现不佳。这为DCMNet等新型轻量化目标检测模型的出现提供了契机，旨
注意力机制（Attention Mechanism）详细分类与介绍 Jason_Orton 分类数据挖掘人工智能
注意力机制（AttentionMechanism）是近年来在深度学习中非常流行的一种技术，特别是在自然语言处理（NLP）、计算机视觉等任务中，具有显著的效果。它的核心思想是模仿人类在处理信息时的注意力分配方式，根据不同部分的重要性给予不同的关注程度。1.注意力机制的背景与动机在传统的深度学习模型（如RNN、CNN等）中，信息处理通常是按照固定的规则和结构进行的，模型对输入的各个部分给予相同的关注。
图神经网络：拓扑数据分析的新时代 Jason_Orton 神经网络数据分析人工智能
随着图数据的广泛应用，图神经网络（GraphNeuralNetwork,GNN）作为一种强大的深度学习工具，逐渐成为机器学习领域中的一颗新星。图数据在许多现实世界问题中无处不在，诸如社交网络、交通网络、分子结构、推荐系统等都可以被建模为图结构。图神经网络通过直接处理图结构数据，能够更好地捕捉节点之间的关系信息，从而在众多任务中展现出了优异的性能。本文将深入探讨图神经网络的基本原理、常见的算法、应用
智算中心的核心硬件是什么？ Imagination官方博客
本文来源：游方AI智算中心，作为人工智能时代的关键基础设施，其核心硬件的构成与性能直接影响着智能计算的效率与质量。以下是对智算中心核心硬件的详细阐述：一、AI芯片AI芯片是专门为加速人工智能计算而设计的硬件，能够与各种AI算法协同工作，满足对算力的极高需求。当前主流的AI加速计算芯片包括：1、GPU（图形处理器）GPU是智算中心的算力担当，其强大的并行计算能力使其在深度学习领域大放异彩。GPU芯片
AI之DeepSeek james二次元 AI 人工智能 AI DeepSeek
DeepSeek是一个开源的基于深度学习的搜索引擎，用于在大规模数据中进行高效的内容检索和相似度搜索。它利用深度学习技术，特别是嵌入（embedding）技术，以改进传统搜索引擎中基于关键词的匹配方式，能够对复杂的查询和内容进行更精确和智能的理解。DeepSeek主要侧重于基于语义的搜索，通过将数据（例如文本、图像、音频等）转换为向量表示，来实现更为精准的相似度搜索。它的应用场景包括但不限于自然语
[AI] [ComfyUI]理解ComyUI的基本原理及其图像生成技术技术小甜甜 AI探索者人工智能 AI作画
ComyUI作为一种图像生成框架，其背后的核心技术基于潜在空间的概念，并通过各种深度学习模块实现高效的图像生成与本地部署。本文将详细探讨ComyUI的基本原理，涵盖其在图像生成中的关键概念，包括潜在空间、VAE模块、噪声处理以及CLIP编码器节点的作用。1.潜在空间的存在与生成效率什么是潜在空间？潜在空间（LatentSpace）是指数据压缩后的低维空间。在图像生成中，潜在空间的引入极大地提高了生
深度学习重要论文阅读笔记 ResNet （2025.2.26）北岛寒沫逐界星辰2025 计算机科研深度学习论文阅读笔记
文章目录问题背景数据预处理神经网络模型模型性能知识点积累英语单词积累问题背景随着神经网络变得更深（层数变多），模型的训练过程也会变得更加困难。当神经网络的深度增加，就会出现梯度消失和梯度下降现象，妨碍模型的收敛。不过，这种情况可以通过归一化的模型初始化和中间的归一化层基本解决。但是，尽管在增加了归一化技术的情况下很深的神经网络可以收敛，又出现了另外一个问题，即随着模型深度的增加，模型的准确率反而下
说话人识别系统原理醉心编码人工智能基础编程基础技术类人工智能说话人识别语音识别
在当今数字化和智能化飞速发展的时代，说话人识别系统作为一项关键的生物识别技术，正逐渐融入我们生活的方方面面。简单来说，说话人识别系统就像是一位“语音侦探”，能够通过分析语音中的独特特征，精准地判断出说话者的身份。它与指纹识别、人脸识别等技术一样，都属于生物识别技术的范畴，但有着独特的优势——仅通过声音即可实现身份识别，无需额外的物理接触或视觉捕捉。与语音识别不同，语音识别关注的是语音内容的转写，比
大模型最新面试题系列：深度学习基础（二）人肉推土机大模型最新面试题集锦大全 AI编程人工智能 pytorch python 面试
21.解释模型容量与过拟合的关系，如何在理论上平衡两者？模型容量与过拟合的关系模型容量指的是模型能够学习的复杂模式的能力，通常与模型的参数数量、网络结构的复杂度等相关。过拟合是指模型在训练数据上表现很好，但在未见过的测试数据上表现不佳。当模型容量较低时，模型可能无法学习到数据中的复杂模式，导致欠拟合，即在训练集和测试集上的表现都较差。随着模型容量的增加，模型能够学习到更复杂的模式，在训练集上的表现
深度学习-自学手册谁用了尧哥这个昵称 AI 深度学习
人工智能机器学习神经网络前馈神经网络：没有回路的反馈神经网络：有回路的DNN深度神经网络CNN卷积神经网络RNN循环神经网络LSTM是RNN的一种，长短期记忆网络自然语言处理神经网络神经元-分类器Hebb学习方法，随机–类似SGD一篇神经网络入门BP反向传播，表示很复杂的函数/空间分布从最后一层往前调整参数，反复循环该操作y=a(wx+b)x输入y输出a激活函
53、深度学习-自学之路-自己搭建深度学习框架-14、使用自己的架构搭建一个通过学习模仿莎士比亚风格的2000次的文章。并且在关键层配有详细解释。小宇爱深度学习-自学之路深度学习人工智能神经网络自然语言处理 rnn
importnumpyasnpclassTensor(object):def__init__(self,data,autograd=False,creators=None,creation_op=None,id=None):self.data=np.array(data)self.autograd=autogradself.grad=Noneif(idisNone):self.id=np.rand
自然语言处理（Natural Language Processing, NLP）的主要应用及核心技术彬彬侠自然语言处理 NLP 自然语言处理
自然语言处理（NLP）是人工智能（AI）的一个重要分支，旨在让计算机能够理解、生成和处理人类语言。NLP在多个领域有着广泛的应用，并结合了多种先进的技术，包括机器学习（ML）、深度学习（DL）、统计模型以及规则方法。1.自然语言处理的主要应用1.1机器翻译（MachineTranslation,MT）应用场景：在线翻译：GoogleTranslate、DeepL、BaiduTranslate。跨语
DeepSeek 高阶应用技术详解（4） Evaporator Core #DeepSeek快速入门 DeepSeek进阶开发与应用 deepseek
1.引言在前三篇中，我们探讨了DeepSeek的基础功能、分布式训练、模型优化、模型解释性、超参数优化以及AutoML的应用。本篇将深入探讨DeepSeek在时间序列分析、图神经网络（GNN）和推荐系统中的应用。这些领域是深度学习的前沿方向，具有广泛的实际应用价值。2.DeepSeek在时间序列分析中的应用2.1时间序列分析简介时间序列分析是处理时间相关数据的重要技术，广泛应用于金融、气象、医疗等
用PyTorch玩转数据：从整理到“看图说话开心快乐幸福一家人 pytorch 人工智能 python
最近在实验室鼓捣深度学习项目，发现PyTorch的数据处理流程简直像搭乐高——每个模块都精准卡位。今天就把这套"厨房级"工具链拆解给大家看看，连我这种手残党都能轻松上手。01数据收纳术：你的专属AI管家想象你有一堆杂乱照片需要整理。PyTorch的Dataset类就像智能相册，只要定义好__getitem__（怎么找照片）和__len__（总共有多少张），它就能瞬间把你的数据码得整整齐齐。而Dat
java调用百度云人脸识别API 一抓掉一大把 java
packagecom.sike.controller;importcom.baidu.aip.face.AipFace;importcom.sike.entity.User;importcom.sike.service.UserService;importjakarta.servlet.http.HttpServletRequest;importorg.json.JSONObject;import
解读Servlet原理篇二---GenericServlet与HttpServlet 周凡杨 java HttpServlet 源理 GenericService 源码
在上一篇《解读Servlet原理篇一》中提到，要实现javax.servlet.Servlet接口（即写自己的Servlet应用），你可以写一个继承自javax.servlet.GenericServletr的generic Servlet ，也可以写一个继承自java.servlet.http.HttpServlet的HTTP Servlet（这就是为什么我们自定义的Servlet通常是exte
MySQL性能优化 bijian1013 数据库 mysql
性能优化是通过某些有效的方法来提高MySQL的运行速度，减少占用的磁盘空间。性能优化包含很多方面，例如优化查询速度，优化更新速度和优化MySQL服务器等。本文介绍方法的主要有： a.优化查询 b.优化数据库结构
ThreadPool定时重试 dai_lm java ThreadPool thread timer timertask
项目需要当某事件触发时，执行http请求任务，失败时需要有重试机制，并根据失败次数的增加，重试间隔也相应增加，任务可能并发。由于是耗时任务，首先考虑的就是用线程来实现，并且为了节约资源，因而选择线程池。为了解决不定间隔的重试，选择Timer和TimerTask来完成 package threadpool; public class ThreadPoolTest {
Oracle 查看数据库的连接情况周凡杨 sql oracle 连接
首先要说的是，不同版本数据库提供的系统表会有不同，你可以根据数据字典查看该版本数据库所提供的表。 select * from dict where table_name like '%SESSION%'; 就可以查出一些表，然后根据这些表就可以获得会话信息 select sid,serial#,status,username,schemaname,osuser,terminal,ma
类的继承朱辉辉33 java
类的继承可以提高代码的重用行，减少冗余代码；还能提高代码的扩展性。Java继承的关键字是extends 格式:public class 类名（子类）extends 类名（父类）{ } 子类可以继承到父类所有的属性和普通方法，但不能继承构造方法。且子类可以直接使用父类的public和 protected属性，但要使用private属性仍需通过调用。子类的方法可以重写，但必须和父类的返回值类
android 悬浮窗特效肆无忌惮_ android
最近在开发项目的时候需要做一个悬浮层的动画，类似于支付宝掉钱动画。但是区别在于，需求是浮出一个窗口，之后边缩放边位移至屏幕右下角标签处。效果图如下：一开始考虑用自定义View来做。后来发现开线程让其移动很卡，ListView+动画也没法精确定位到目标点。后来想利用Dialog的dismiss动画来完成。自定义一个Dialog后，在styl
hadoop伪分布式搭建林鹤霄 hadoop
要修改4个文件 1: vim hadoop-env.sh 第九行 2: vim core-site.xml <configuration> &n
gdb调试命令 aigo gdb
原文：http://blog.csdn.net/hanchaoman/article/details/5517362 一、GDB常用命令简介 r run 运行.程序还没有运行前使用 c cuntinue
Socket编程的HelloWorld实例 alleni123 socket
public class Client { public static void main(String[] args) { Client c=new Client(); c.receiveMessage(); } public void receiveMessage(){ Socket s=null; BufferedRea
线程同步和异步百合不是茶线程同步异步
多线程和同步 : 如进程、线程同步，可理解为进程或线程A和B一块配合，A执行到一定程度时要依靠B的某个结果，于是停下来，示意B运行；B依言执行，再将结果给A；A再继续操作。所谓同步，就是在发出一个功能调用时，在没有得到结果之前，该调用就不返回，同时其它线程也不能调用这个方法多线程和异步:多线程可以做不同的事情,涉及到线程通知 &
JSP中文乱码分析 bijian1013 java jsp 中文乱码
在JSP的开发过程中，经常出现中文乱码的问题。首先了解一下Java中文问题的由来： Java的内核和class文件是基于unicode的，这使Java程序具有良好的跨平台性，但也带来了一些中文乱码问题的麻烦。原因主要有两方面，
js实现页面跳转重定向的几种方式 bijian1013 JavaScript 重定向
js实现页面跳转重定向有如下几种方式：一.window.location.href <script language="javascript"type="text/javascript"> window.location.href="http://www.baidu.c
【Struts2三】Struts2 Action转发类型 bit1129 struts2
在【Struts2一】 Struts Hello World http://bit1129.iteye.com/blog/2109365中配置了一个简单的Action，配置如下 <!DOCTYPE struts PUBLIC "-//Apache Software Foundation//DTD Struts Configurat
【HBase十一】Java API操作HBase bit1129 hbase
Admin类的主要方法注释： 1. 创建表 /** * Creates a new table. Synchronous operation. * * @param desc table descriptor for table * @throws IllegalArgumentException if the table name is res
nginx gzip ronin47 nginx gzip
Nginx GZip 压缩 Nginx GZip 模块文档详见：http://wiki.nginx.org/HttpGzipModule 常用配置片段如下： gzip on; gzip_comp_level 2; # 压缩比例，比例越大，压缩时间越长。默认是1 gzip_types text/css text/javascript; # 哪些文件可以被压缩 gzip_disable &q
java-7.微软亚院之编程判断俩个链表是否相交给出俩个单向链表的头指针，比如 h1 ， h2 ，判断这俩个链表是否相交 bylijinnan java
public class LinkListTest { /** * we deal with two main missions: * * A. * 1.we create two joined-List(both have no loop) * 2.whether list1 and list2 join * 3.print the join
Spring源码学习-JdbcTemplate batchUpdate批量操作 bylijinnan java spring
Spring JdbcTemplate的batch操作最后还是利用了JDBC提供的方法，Spring只是做了一下改造和封装 JDBC的batch操作： String sql = "INSERT INTO CUSTOMER " + "(CUST_ID, NAME, AGE) VALUES (?, ?, ?)";
[JWFD开源工作流]大规模拓扑矩阵存储结构最新进展 comsci 工作流
生成和创建类已经完成,构造一个100万个元素的矩阵模型,存储空间只有11M大,请大家参考我在博客园上面的文档"构造下一代工作流存储结构的尝试",更加相信的设计和代码将陆续推出......... 竞争对手的能力也很强.......,我相信..你们一定能够先于我们推出大规模拓扑扫描和分析系统的....
base64编码和url编码 cuityang base64 url
import java.io.BufferedReader; import java.io.IOException; import java.io.InputStreamReader; import java.io.PrintWriter; import java.io.StringWriter; import java.io.UnsupportedEncodingException;
web应用集群Session保持 dalan_123 session
关于使用 memcached 或redis 存储 session ，以及使用 terracotta 服务器共享。建议使用 redis，不仅仅因为它可以将缓存的内容持久化，还因为它支持的单个对象比较大，而且数据类型丰富，不只是缓存 session，还可以做其他用途，一举几得啊。1、使用 filter 方法存储这种方法比较推荐，因为它的服务器使用范围比较多，不仅限于tomcat ，而且实现的原理比较简
Yii 框架里数据库操作详解-[增加、查询、更新、删除的方法 'AR模式'] dcj3sjt126com 数据库
public function getMinLimit () { $sql = "..."; $result = yii::app()->db->createCo
solr StatsComponent（聚合统计） eksliang solr聚合查询 solr stats
StatsComponent 转载请出自出处：http://eksliang.iteye.com/blog/2169134 http://eksliang.iteye.com/ 一、概述 Solr可以利用StatsComponent 实现数据库的聚合统计查询，也就是min、max、avg、count、sum的功能二、参数
百度一道面试题 greemranqq 位运算百度面试寻找奇数算法 bitmap 算法
那天看朋友提了一个百度面试的题目：怎么找出{1,1,2,3,3,4,4,4,5,5,5,5} 找出出现次数为奇数的数字. 我这里复制的是原话，当然顺序是不一定的，很多拿到题目第一反应就是用map,当然可以解决，但是效率不高。还有人觉得应该用算法xxx,我是没想到用啥算法好...！还有觉得应该先排序... 还有觉
Spring之在开发中使用SpringJDBC ihuning spring
在实际开发中使用SpringJDBC有两种方式： 1. 在Dao中添加属性JdbcTemplate并用Spring注入； JdbcTemplate类被设计成为线程安全的，所以可以在IOC 容器中声明它的单个实例，并将这个实例注入到所有的 DAO 实例中。JdbcTemplate也利用了Java 1.5 的特定(自动装箱，泛型，可变长度
JSON API 1.0 核心开发者自述 | 你所不知道的那些技术细节 justjavac json
2013年5月，Yehuda Katz 完成了JSON API(英文，中文) 技术规范的初稿。事情就发生在 RailsConf 之后，在那次会议上他和 Steve Klabnik 就 JSON 雏形的技术细节相聊甚欢。在沟通单一 Rails 服务器库—— ActiveModel::Serializers 和单一 JavaScript 客户端库——&
网站项目建设流程概述 macroli 工作
一.概念网站项目管理就是根据特定的规范、在预算范围内、按时完成的网站开发任务。二.需求分析项目立项　　我们接到客户的业务咨询，经过双方不断的接洽和了解，并通过基本的可行性讨论够，初步达成制作协议，这时就需要将项目立项。较好的做法是成立一个专门的项目小组，小组成员包括：项目经理，网页设计，程序员，测试员，编辑/文档等必须人员。项目实行项目经理制。客户的需求说明书　　第一步是需
AngularJs 三目运算表达式判断 qiaolevip 每天进步一点点学习永无止境众观千象 AngularJS
事件回顾：由于需要修改同一个模板，里面包含2个不同的内容，第一个里面使用的时间差和第二个里面名称不一样，其他过滤器，内容都大同小异。希望杜绝If这样比较傻的来判断if-show or not，继续追究其源码。 var b = "{{", a = "}}"; this.startSymbol = function(a) {
Spark算子：统计RDD分区中的元素及数量 superlxw1234 spark spark算子 Spark RDD分区元素
关键字：Spark算子、Spark RDD分区、Spark RDD分区元素数量 Spark RDD是被分区的，在生成RDD时候，一般可以指定分区的数量，如果不指定分区数量，当RDD从集合创建时候，则默认为该程序所分配到的资源的CPU核数，如果是从HDFS文件创建，默认为文件的Block数。可以利用RDD的mapPartitionsWithInd
Spring 3.2.x将于2016年12月31日停止支持 wiselyman Spring 3
Spring 团队公布在2016年12月31日停止对Spring Framework 3.2.x（包含tomcat 6.x）的支持。在此之前spring团队将持续发布3.2.x的维护版本。请大家及时准备及时升级到Spring
fis纯前端解决方案fis-pure zccst JavaScript
作者：zccst FIS通过插件扩展可以完美的支持模块化的前端开发方案，我们通过FIS的二次封装能力，封装了一个功能完备的纯前端模块化方案pure。 1，fis-pure的安装 $ fis install -g fis-pure $ pure -v 0.1.4 2，下载demo到本地 git clone https://github.com/hefangshi/f