For a research task, the video frames corresponding to the text and audio need to be extracted from each video according to their timestamps, and the faces then cropped out of those frames. The tool used is OpenCV; the steps are as follows:
The dataset assigns an id to every sentence in each video; take Ses01F_impro01_F001.png as an example:
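The pieces of such an id can be read off with a simple split; the interpretation of each field below is my reading of the naming convention as it is used by the code in this post:

utt_id = 'Ses01F_impro01_F001'
parts = utt_id.split('_')        # ['Ses01F', 'impro01', 'F001']
dialog = '_'.join(parts[:-1])    # 'Ses01F_impro01': name of the source .avi
session_gender = parts[0][5]     # 'F': gender letter in the session prefix
speaker = parts[-1][0]           # 'F': gender letter of this sentence's speaker
utt_index = parts[-1][1:]        # '001': sentence number within the dialog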
Based on what the ids encode and on the order of the sentences in the already-prepared processed_ids.txt file, extract the timestamps of the sentences (contained under IEMOCAP_full_release/Session1/dialog/lab in the IEMOCAP dataset) in the correct time order, and store them in a processed_timepoints.npy file.
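A possible sketch of this step is shown below. The .lab line format ("<start> <end> <utterance_id>") and the choice of the start time as the representative time point are assumptions about the lab files, not something fixed by the code that follows:

import os
import numpy as np

LAB_DIR = '/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/'

# map utterance id -> start time in seconds; the assumed .lab line format is
# "<start> <end> <utterance_id>"
starts = {}
for name in os.listdir(LAB_DIR):
    if name.endswith('.lab'):
        with open(os.path.join(LAB_DIR, name), 'r') as f:
            for line in f:
                parts = line.split()
                if len(parts) == 3:
                    starts[parts[2]] = float(parts[0])

# order the timestamps to match the sentence order in processed_ids.txt
# (ids from other sessions, if any, fall back to 0.0 here)
with open('/home/jekin/PycharmProjects/Speech/code/data/processed/IEMOCAP/processed_ids.txt', 'r') as f:
    ids = [line.split()[0] for line in f]
time_points = np.array([starts.get(i, 0.0) for i in ids])
np.save(os.path.join(LAB_DIR, 'processed_timepoints.npy'), time_points)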
The Python code is as follows (only Session1 is extracted):
import cv2
import numpy as np

with open('/home/jekin/PycharmProjects/Speech/code/data/processed/IEMOCAP/processed_ids.txt', 'r') as f:
    lines = f.readlines()
ids = [x.split() for x in lines]
time_points = np.load('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/processed_timepoints.npy')

for i in range(len(ids)):
    cur_id = ids[i][0]
    # get the video name: drop the utterance suffix of the id,
    # e.g. Ses01F_impro01_F001 -> Ses01F_impro01
    typename = '_'.join(cur_id.split('_')[:-1])
    cap = cv2.VideoCapture('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/avi/DivX/'
                           + typename + '.avi')
    # convert the timestamp (seconds) into a frame index and seek to it
    fps = cap.get(cv2.CAP_PROP_FPS)
    num_frame = int(fps * time_points[i])
    cap.set(cv2.CAP_PROP_POS_FRAMES, num_frame)
    success, frame = cap.read()
    cap.release()
    if success:
        # keep the half of the frame showing the current speaker: the left half
        # when the gender letter in the session prefix (the F in Ses01F) matches
        # the speaker letter of the utterance (the F in F001), the right half otherwise
        if cur_id.split('_')[0][5] == cur_id.split('_')[-1][0]:
            cut_frame = frame[:, 0:frame.shape[1] // 2]
        else:
            cut_frame = frame[:, frame.shape[1] // 2:frame.shape[1]]
        cv2.imwrite('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/pictures/'
                    + cur_id + '.png', cut_frame)
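The half-frame selection above boils down to one rule: compare the gender letter in the session prefix with the speaker letter of the utterance id. A small helper (the function name is mine, not from the dataset tools) makes the rule explicit:

def speaker_half(utt_id):
    # 'left' when the session's gender letter (the F in Ses01F) matches the
    # utterance speaker's letter (the F in F001), otherwise 'right'
    return 'left' if utt_id.split('_')[0][5] == utt_id.split('_')[-1][0] else 'right'

assert speaker_half('Ses01F_impro01_F001') == 'left'
assert speaker_half('Ses01F_impro01_M002') == 'right'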
The face-cropping code is adapted from a reference on cropping faces from still images.
The code as applied in this project is as follows:
import cv2
import numpy as np

def create_empty_picture(width):
    # blank width x width placeholder for utterances where no face is detected
    img = np.ones((width, width), dtype=np.uint8)
    bgr_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    return bgr_img

with open('/home/jekin/PycharmProjects/Speech/code/data/processed/IEMOCAP/processed_ids.txt', 'r') as f:
    lines = f.readlines()
ids = [x.split() for x in lines]

# load the profile-face cascade once, outside the loop
classifier = cv2.CascadeClassifier(
    "/home/jekin/MERpy/face_recognition/data/haarcascades/haarcascade_profileface.xml")

# record the number of unsuccessfully processed pictures
numerror = 0
# record the maximum side length among the cropped faces
maxwidth = 0

# the code below processes the data in Session1
for i in range(len(ids)):
    cur_id = ids[i][0]
    file_name = cur_id + '.png'
    filepath = '/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/pictures/' + file_name
    img = cv2.imread(filepath)
    # mirror the left-half crops (session letter == speaker letter), likely
    # because the profile cascade only detects faces turned to one side
    if cur_id.split('_')[0][5] == cur_id.split('_')[-1][0]:
        img = cv2.flip(img, 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faceRects = classifier.detectMultiScale(
        gray, scaleFactor=1.02, minNeighbors=3, minSize=(60, 60), maxSize=(70, 70))
    if len(faceRects):
        # keep only the first detected face
        x, y, w, h = faceRects[0]
        cut_face = img[y:y + h, x:x + w]
        cv2.imwrite('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/face_cut/ses1/'
                    + cur_id + '.png', cut_face)
        maxwidth = max(maxwidth, w, h)
    else:
        # no face found: log the id and write a placeholder image instead
        print(cur_id)
        img = create_empty_picture(50)
        cv2.imwrite('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/face_cut/ses1/'
                    + cur_id + '.png', img)
        numerror += 1

print('The number of unsuccessfully processed pictures is: {}'.format(numerror))
print('The maximum width of the cropped faces is: {}'.format(maxwidth))
In the snippet

faceRects = classifier.detectMultiScale(
    gray, scaleFactor=1.02, minNeighbors=3, minSize=(60, 60), maxSize=(70, 70))

the scaleFactor parameter defaults to 1.1; the larger its value, the faster the computation runs, but the lower the face-detection accuracy becomes. minSize and maxSize bound the size of the detected region: they restrict the size range of the rectangle the algorithm may report as a face, which avoids detecting several candidate regions in a single picture when only one of them is actually a face.
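To see the speed/accuracy trade-off concretely, one can compare detection counts under a few scaleFactor values on any of the half-frames produced above (the image path here is just an example):

import cv2

img = cv2.imread('/home/jekin/SPACE/IEMOCAP_full_release/Session1/dialog/lab/pictures/Ses01F_impro01_F001.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
classifier = cv2.CascadeClassifier(
    "/home/jekin/MERpy/face_recognition/data/haarcascades/haarcascade_profileface.xml")

# a larger scaleFactor coarsens the image pyramid: faster, but faces that fall
# between pyramid levels can be missed; the size bounds keep only face-sized boxes
for sf in (1.02, 1.1, 1.3):
    rects = classifier.detectMultiScale(gray, scaleFactor=sf, minNeighbors=3,
                                        minSize=(60, 60), maxSize=(70, 70))
    print('scaleFactor={}: {} detection(s)'.format(sf, len(rects)))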