Research Log: Extracting Faces from Video with OpenCV

A research task requires extracting, by timestamp, the video frames that correspond to each piece of text and audio, and then cropping the faces out of those frames. The tool used is OpenCV; the steps are as follows:

1. Extracting the Video Frames

What the sentence IDs mean

The dataset assigns an ID to every sentence in each video. Take Ses01F_impro01_F001.png as an example (a small parsing sketch follows the list):

  1. Ses01 means the video belongs to Session1
  2. The first F means the female person in the video is the one wearing the sensor equipment
  3. impro01 means the video is the first one of the impro type, out of the two performance styles (impro and script)
  4. F001 means this is the first sentence spoken by the female person in the video
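
For illustration only, the fields can be pulled out of an id with plain string splitting; the helper below is mine, not part of the project code:

def parse_sentence_id(sent_id):
    # e.g. 'Ses01F_impro01_F001' -> ('Ses01', 'F', 'impro01', 'F', 1)
    dialog, sent = sent_id.rsplit('_', 1)        # 'Ses01F_impro01', 'F001'
    session_part, perf = dialog.split('_', 1)    # 'Ses01F', 'impro01'
    return session_part[:5], session_part[5], perf, sent[0], int(sent[1:])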

Based on what the IDs encode and on the order of the sentences in the preprocessed processed_ids.txt file, the sentence timestamps contained under IEMOCAP_full_release/Session1/dialog/lab in the IEMOCAP dataset are extracted in the correct order and stored in a processed_timepoints.npy file.
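
Here is a minimal sketch of that collection step. It assumes a hypothetical dialog-level lab file in which each line reads "<start> <end> <sentence_id>"; the actual IEMOCAP lab layout may differ, so the parsing would need adapting:

import numpy as np

# hypothetical lab format: one line per sentence, '<start> <end> <sentence_id>'
def collect_timepoints(ids_file, lab_file, out_file):
    start_times = {}
    with open(lab_file, 'r') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3:
                start_times[parts[2]] = float(parts[0])
    with open(ids_file, 'r') as f:
        order = [x.split()[0] for x in f]
    # save the start times in the same order as processed_ids.txt
    np.save(out_file, np.array([start_times[s] for s in order]))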

Extracting the frames

The Python code is as follows (only Session1 is extracted):

import cv2
import numpy as np

with open('/home/jekin/PycharmProjects/Speech/code/data/processed/IEMOCAP/processed_ids.txt', 'r') as f:
    lines = f.readlines()
    ids = [x.split() for x in lines]   # each line starts with a sentence id

time_points = np.load('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/processed_timepoints.npy')

for i in range(len(ids)):
    # get the video (dialog) name, e.g. 'Ses01F_impro01' from 'Ses01F_impro01_F001'
    cur_id = ids[i][0]
    typename = '_'.join(cur_id.split('_')[:-1])

    cap = cv2.VideoCapture('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/avi/DivX/'
                           + typename + '.avi')
    fps = cap.get(cv2.CAP_PROP_FPS)
    # convert the timestamp (in seconds) into a frame index
    num_frame = int(fps * time_points[i])
    cap.set(cv2.CAP_PROP_POS_FRAMES, num_frame)
    success, frame = cap.read()
    cap.release()
    if success:
        # keep the half of the frame containing the current speaker: the left
        # half when the sensor-wearer letter matches the speaker letter in the
        # id, otherwise the right half
        half = frame.shape[1] // 2   # integer division: float indices break slicing
        if cur_id.split('_')[0][5] == cur_id.split('_')[-1][0]:
            cut_frame = frame[:, :half]
        else:
            cut_frame = frame[:, half:]
        cv2.imwrite('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/pictures/'
                    + cur_id + '.png', cut_frame)
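
Since CAP_PROP_POS_FRAMES takes a frame index rather than a time, the fps conversion above matters. A quick sanity check (not part of the original script) can confirm that a requested index is within range for a given video:

import cv2

cap = cv2.VideoCapture('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/avi/DivX/Ses01F_impro01.avi')
fps = cap.get(cv2.CAP_PROP_FPS)
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print('fps={:.2f}, frames={}, duration={:.1f}s'.format(fps, total, total / fps))
cap.release()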

2. Cropping the Faces

The face-cropping code is adapted from a reference post on cropping faces from still images.
The code as applied to this project is as follows:

import cv2
import numpy as np

def create_empty_picture(width):
    # build a width x width placeholder image (written when no face is found)
    img = np.ones((width, width), dtype=np.uint8)
    bgr_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    return bgr_img

with open('/home/jekin/PycharmProjects/Speech/code/data/processed/IEMOCAP/processed_ids.txt', 'r') as f:
    lines = f.readlines()
    ids = [x.split() for x in lines]

# record the number of unsuccessfully processed pictures
numerror = 0
# record the maximum width/height among the cropped faces
maxwidth = 0

# load the Haar profile-face cascade once, outside the loop
classifier = cv2.CascadeClassifier(
    "/home/jekin/MERpy/face_recognition/data/haarcascades/haarcascade_profileface.xml")

for i in range(len(ids)):
    cur_id = ids[i][0]
    # The code below processes the data in Session1
    file_name = cur_id + '.png'
    filepath = '/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/pictures/' + file_name
    img = cv2.imread(filepath)
    # frames whose leading and trailing F/M match were cut from the left half
    # in step 1; mirror them so the single-orientation profile cascade can
    # detect the face
    if cur_id.split('_')[0][5] == cur_id.split('_')[-1][0]:
        img = cv2.flip(img, 1)

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faceRects = classifier.detectMultiScale(
        gray, scaleFactor=1.02, minNeighbors=3, minSize=(60, 60), maxSize=(70, 70))

    if len(faceRects):
        # keep only the first detected face
        x, y, w, h = faceRects[0]
        cut_face = img[y:y+h, x:x+w]
        cv2.imwrite('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/face_cut/ses1/'
                    + cur_id + '.png', cut_face)
        # track the largest crop dimension seen so far
        maxwidth = max(maxwidth, w, h)
    else:
        # no face detected: log the id and write a blank placeholder instead
        print(cur_id)
        img = create_empty_picture(50)
        cv2.imwrite('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/face_cut/ses1/'
                    + cur_id + '.png', img)
        numerror += 1
print('The number of unsuccessfully processed pictures is: {}'.format(numerror))
print('The maximum width of the cropped faces is: {}'.format(maxwidth))

Regarding the snippet

faceRects = classifier.detectMultiScale(
        gray, scaleFactor=1.02, minNeighbors=3, minSize=(60, 60), maxSize=(70, 70))

the parameter scaleFactor defaults to 1.1; the larger its value, the faster the computation runs, at the cost of lower detection accuracy. minSize and maxSize bound the size of the detected region: they constrain the allowed size range of the face bounding box, which avoids cases where several regions are detected in a single image and only one of them is actually a face.
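
To see the effect of these bounds, one can compare detections under the default parameters against the tuned ones on a single extracted frame (an illustrative check, not part of the pipeline; the example id is reused from above):

import cv2

img = cv2.imread('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/pictures/Ses01F_impro01_F001.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
clf = cv2.CascadeClassifier(
    '/home/jekin/MERpy/face_recognition/data/haarcascades/haarcascade_profileface.xml')

# defaults: scaleFactor=1.1, no size bounds -> may return several boxes
loose = clf.detectMultiScale(gray)
# tuned: finer scale sweep plus a tight size window -> at most a few boxes
tight = clf.detectMultiScale(gray, scaleFactor=1.02, minNeighbors=3,
                             minSize=(60, 60), maxSize=(70, 70))
print('default: {} boxes, tuned: {} boxes'.format(len(loose), len(tight)))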
