Using PaddleGAN's precise lip-sync synthesis to make a gunman say "the phone is switched off"
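Just last week, the independent artist 大谷Spitzer, who has a million followers, used deep learning to bring the Song dynasty poet Su Shi back to life, letting him cross a thousand years to recite his famous poems in person for viewers. The video drew nearly a million views and sparked heated discussion among netizens: what technology could be this impressive?
Ta-da! PaddlePaddle's PaddleGAN is here to reveal the secret and walk you through lip-shape transfer step by step. Once you have worked through this project, you can make not only Su Shi recite poetry, but also Mona Lisa read the news and a news anchor rap. If you can imagine it, PaddleGAN can probably do it!
This tutorial is based on Wav2lip, the video lip-sync model implemented in PaddleGAN. It synchronizes a person's mouth movements with the input speech, commonly known as "lip-syncing".
Wav2lip does more than make still images "talk": it can also re-sync the lips in a moving video directly and output a video that matches the target speech. Dubbing your own videos is no longer a dream!
This tutorial has four parts:
If you like this tutorial, please give PaddleGAN a star on its GitHub page! Now let's get hands-on.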
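The key to Wav2lip's breakthrough in accurately synchronizing lips with speech is its lip-sync discriminator, which forces the generator to keep producing accurate and realistic lip movements.
In addition, the work improves visual quality by feeding the discriminator several consecutive frames rather than a single frame, and by using a visual quality loss (not just a contrastive loss) to account for temporal correlation.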
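To make the role of the lip-sync discriminator more concrete, here is a minimal sketch of a sync loss in plain Python. It assumes the formulation commonly attributed to the Wav2lip paper: the discriminator embeds a short window of consecutive mouth frames and the matching audio segment, the cosine similarity of the two (non-negative) embeddings is treated as the probability that they are in sync, and the loss is the negative log of that probability. The function names and the toy vectors are illustrative assumptions, not PaddleGAN's actual code.

```python
import math

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)

def sync_loss(video_emb, audio_emb, eps=1e-8):
    """Negative log of the in-sync probability (hypothetical helper).

    Assumes both embeddings are non-negative (e.g. ReLU outputs), so the
    cosine similarity lies in [0, 1] and can be read as a probability.
    """
    p_sync = cosine_similarity(video_emb, audio_emb)
    p_sync = min(max(p_sync, eps), 1.0)  # clamp to keep the log finite
    return -math.log(p_sync)

# Toy embeddings: matching pair vs. mismatched pair.
in_sync = sync_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
out_of_sync = sync_loss([1.0, 0.0], [0.0, 1.0])
print(in_sync < out_of_sync)
```

A generator trained against such a loss is penalized whenever the generated mouth region and the audio drift apart, which is the "forcing" effect described above.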
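The Wav2lip model is close to universal: it works with any face, any voice, and any language, reaches high accuracy on arbitrary videos, and blends seamlessly with the original footage. It can also be used on animated faces, and synthetic speech works as input as well.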
# Switch to the working directory
%cd /home/aistudio/work
/home/aistudio/work
# Clone the PaddleGAN code (if GitHub is too slow, use the Gitee mirror)
!git clone https://gitee.com/PaddlePaddle/PaddleGAN
#!git clone https://github.com/PaddlePaddle/PaddleGAN
# Install the required dependencies (build libsndfile from source, then install the Python requirements)
!mkdir sndfile
%cd sndfile
!wget http://www.mega-nerd.com/libsndfile/files/libsndfile-1.0.28.tar.gz
!tar xzvf libsndfile-1.0.28.tar.gz
%cd libsndfile-1.0.28
!./configure --prefix=/home/aistudio/build_libs CFLAGS=-fPIC --enable-shared
!make
!make install
%cd /home/aistudio/work/PaddleGAN
!pip install -r requirements.txt
%cd applications/
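Here comes the key part! This project lets you upload your own video and audio and synthesize any realistic dubbed video you like.
Just replace the face and audio arguments in the command below with the paths to your own video and audio, then run it to generate a video that is synchronized with the audio.
When the program finishes, it writes a video file with the name given by the outfile argument into the current folder; that file is the video synchronized with the audio. The project includes the video and audio files used in the demo. The parameters are used as follows: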
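If you want to run several face/audio combinations, it can help to assemble the command in Python instead of editing the shell line by hand. The sketch below only builds the argument list from the three parameters described above; `build_wav2lip_command` is a hypothetical helper name, and the script path `tools/wav2lip.py` is taken from the commands in this tutorial.

```python
import shlex

def build_wav2lip_command(face, audio, outfile, script="tools/wav2lip.py"):
    """Return the argument list for one Wav2lip run (hypothetical helper)."""
    return ["python", script,
            "--face", face,        # input video (or image) containing the face
            "--audio", audio,      # speech the lips should follow
            "--outfile", outfile]  # name of the synchronized output video

cmd = build_wav2lip_command(
    face="/home/aistudio/work/meinv.mp4",
    audio="/home/aistudio/work/guanji.wav",
    outfile="guanji.mp4",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the `applications` directory, with `PYTHONPATH` extended to include the PaddleGAN checkout as in the shell commands.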
# !export PYTHONPATH=$PYTHONPATH:/home/aistudio/work/PaddleGAN && python tools/wav2lip.py --face /home/aistudio/work/mona46s.mp4 --audio /home/aistudio/work/guangquan.m4a --outfile pp_guangquan_mona46s.mp4
!export PYTHONPATH=$PYTHONPATH:/home/aistudio/work/PaddleGAN && python tools/wav2lip.py --face /home/aistudio/work/meinv.mp4 --audio /home/aistudio/work/guanji.wav --outfile guanji.mp4
/home/aistudio/work/PaddleGAN/ppgan/models/base_model.py:52: DeprecationWarning: invalid escape sequence \/
"""
/home/aistudio/work/PaddleGAN/ppgan/modules/init.py:70: DeprecationWarning: invalid escape sequence \s
"""
/home/aistudio/work/PaddleGAN/ppgan/modules/init.py:134: DeprecationWarning: invalid escape sequence \m
"""
/home/aistudio/work/PaddleGAN/ppgan/modules/init.py:159: DeprecationWarning: invalid escape sequence \m
"""
/home/aistudio/work/PaddleGAN/ppgan/modules/init.py:190: DeprecationWarning: invalid escape sequence \m
"""
/home/aistudio/work/PaddleGAN/ppgan/modules/init.py:227: DeprecationWarning: invalid escape sequence \m
"""
/home/aistudio/work/PaddleGAN/ppgan/modules/dense_motion.py:116: DeprecationWarning: invalid escape sequence \h
"""
Reading video frames...
Number of frames available for inference: 320
(80, 985)
Length of mel chunks: 305
W0130 11:44:24.694782 9610 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
W0130 11:44:24.700759 9610 device_context.cc:372] device: 0, cuDNN Version: 7.6.
100%|████████████████████████████████| 141910/141910 [00:02<00:00, 60080.20it/s]
Model loaded
100%|████████████████████████████████| 109119/109119 [00:01<00:00, 64286.51it/s]
100%|███████████████████████████████████████████| 20/20 [00:36<00:00, 1.83s/it]
100%|█████████████████████████████████████████████| 3/3 [00:41<00:00, 13.73s/it]
ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.10) 20160609
configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
libavutil 54. 31.100 / 54. 31.100
libavcodec 56. 60.100 / 56. 60.100
libavformat 56. 40.101 / 56. 40.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 40.101 / 5. 40.101
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.101 / 1. 2.101
libpostproc 53. 3.100 / 53. 3.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from '/home/aistudio/work/guanji.wav':
Duration: 00:00:12.30, bitrate: 3072 kb/s
Stream #0:0: Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, 2 channels, flt, 3072 kb/s
Input #1, avi, from 'temp/result.avi':
Metadata:
encoder : Lavf58.31.101
Duration: 00:00:12.20, start: 0.000000, bitrate: 1369 kb/s
Stream #1:0: Video: mpeg4 (Simple Profile) (DIVX / 0x58564944), yuv420p, 1094x614 [SAR 1:1 DAR 547:307], 1364 kb/s, 25 fps, 25 tbr, 25 tbn, 25 tbc
[libx264 @ 0x1c25ea0] -qscale is ignored, -crf is recommended.
[libx264 @ 0x1c25ea0] using SAR=1/1
[libx264 @ 0x1c25ea0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 AVX2 LZCNT BMI2
[libx264 @ 0x1c25ea0] profile High, level 3.1
[libx264 @ 0x1c25ea0] 264 - core 148 r2643 5c65704 - H.264/MPEG-4 AVC codec - Copyleft 2003-2015 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=19 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'guanji.mp4':
Metadata:
encoder : Lavf56.40.101
Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1094x614 [SAR 1:1 DAR 547:307], q=-1--1, 25 fps, 12800 tbn, 25 tbc
Metadata:
encoder : Lavc56.60.100 libx264
Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 48000 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc56.60.100 aac
Stream mapping:
Stream #1:0 -> #0:0 (mpeg4 (native) -> h264 (libx264))
Stream #0:0 -> #0:1 (pcm_f32le (native) -> aac (native))
Press [q] to stop, [?] for help
frame= 305 fps=161 q=-1.0 Lsize= 579kB time=00:00:12.30 bitrate= 385.1kbits/s
video:373kB audio:195kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.829867%
[libx264 @ 0x1c25ea0] frame I:2 Avg QP:17.00 size: 65208
[libx264 @ 0x1c25ea0] frame P:77 Avg QP:17.98 size: 2572
[libx264 @ 0x1c25ea0] frame B:226 Avg QP:24.89 size: 233
[libx264 @ 0x1c25ea0] consecutive B-frames: 1.0% 0.7% 0.0% 98.4%
[libx264 @ 0x1c25ea0] mb I I16..4: 9.4% 86.0% 4.6%
[libx264 @ 0x1c25ea0] mb P I16..4: 0.1% 1.1% 0.0% P16..4: 10.5% 2.2% 2.1% 0.0% 0.0% skip:84.1%
[libx264 @ 0x1c25ea0] mb B I16..4: 0.0% 0.1% 0.0% B16..8: 5.9% 0.1% 0.0% direct: 0.0% skip:93.8% L0:49.8% L1:49.1% BI: 1.2%
[libx264 @ 0x1c25ea0] 8x8 transform intra:86.9% inter:85.1%
[libx264 @ 0x1c25ea0] coded y,uvDC,uvAC intra: 80.0% 88.4% 48.8% inter: 1.4% 2.1% 0.2%
[libx264 @ 0x1c25ea0] i16 v,h,dc,p: 40% 31% 20% 9%
[libx264 @ 0x1c25ea0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 35% 19% 27% 3% 2% 3% 2% 3% 6%
[libx264 @ 0x1c25ea0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 49% 27% 9% 3% 2% 3% 2% 3% 3%
[libx264 @ 0x1c25ea0] i8c dc,h,v,p: 32% 24% 34% 10%
[libx264 @ 0x1c25ea0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x1c25ea0] ref P L0: 73.2% 5.9% 12.9% 8.0%
[libx264 @ 0x1c25ea0] ref B L0: 79.9% 16.3% 3.8%
[libx264 @ 0x1c25ea0] ref B L1: 92.7% 7.3%
[libx264 @ 0x1c25ea0] kb/s:249.97
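The log above reports a mel spectrogram of shape (80, 985) and "Length of mel chunks: 305", and the encoded video has 305 frames at 25 fps. The following sketch shows how those numbers can relate, assuming PaddleGAN's port follows the reference Wav2Lip inference script: roughly 80 mel frames per second of audio, a window of 16 mel frames per video frame, and one final window taken from the end of the spectrogram. The constants 80 and 16 and the function name are assumptions that do not appear in this tutorial.

```python
def count_mel_chunks(num_mel_frames, fps, mel_step_size=16,
                     mel_frames_per_second=80.0):
    """Count the audio windows, one per output video frame (hypothetical helper)."""
    mel_idx_multiplier = mel_frames_per_second / fps  # mel frames per video frame
    count = 0
    i = 0
    while True:
        start = int(i * mel_idx_multiplier)
        if start + mel_step_size > num_mel_frames:
            count += 1  # last window is taken from the end of the spectrogram
            break
        count += 1
        i += 1
    return count

# Values taken from the log above: 985 mel frames, 25 fps video.
print(count_mel_chunks(985, 25))
```

Under these assumptions the output length is driven by the audio: the audio yields 305 windows, so 305 of the 320 available video frames are used.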
If the video is long, the next cell takes a while to run; consider downloading the video and previewing it locally.
# Display the output video
import cv2
import imageio
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML
import warnings

# Video display function: turns a list of frames into a matplotlib animation
def display(driving, fps, size=(8, 6)):
    fig = plt.figure(figsize=size)
    ims = []
    for i in range(len(driving)):
        cols = []
        cols.append(driving[i])
        im = plt.imshow(np.concatenate(cols, axis=1), animated=True)
        plt.axis('off')
        ims.append([im])
    video = animation.ArtistAnimation(fig, ims, interval=1000.0/fps, repeat_delay=1000)
    plt.close()
    return video

# Display the output video
# For a long video this takes a while; consider downloading it and previewing locally.
# The video is saved under '/home/aistudio/work/PaddleGAN/applications'
video_path = 'guanji.mp4'
video_frames = imageio.mimread(video_path, memtest=False)

# Read the frame rate of the output video
cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()

HTML(display(video_frames, fps).to_html5_video())
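Because rendering every frame through matplotlib is what makes long videos slow to preview, one option is to show only every n-th frame. The helper below is a hypothetical sketch, not part of PaddleGAN: it thins out a frame list and returns the matching lower frame rate so that playback speed stays the same.

```python
def subsample_frames(frames, fps, target_fps=5.0):
    """Keep every n-th frame so the preview runs at about target_fps."""
    step = max(1, round(fps / target_fps))  # never drop below one frame per step
    return frames[::step], fps / step

# Stand-in for the decoded frames: 305 frames at 25 fps, as in the run above.
frames = list(range(305))
preview_frames, preview_fps = subsample_frames(frames, 25.0, target_fps=5.0)
print(len(preview_frames), preview_fps)
```

In the notebook, the result could be passed to the display function defined earlier, for example `display(preview_frames, preview_fps)`, at the cost of a choppier preview.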
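First, a quick recap: Wav2lip, the magic that makes pictures talk and dubs videos in all sorts of ways, takes only three steps to use:
Here is the project link: PaddleGAN. Remember to give it a Star!
Is lip-syncing the only thing PaddleGAN can do? No, of course not!
Next, here are some of PaddleGAN's other applications, such as its abilities to generate and process many kinds of images and video.
Face attribute editing builds on face recognition and face generation to manipulate one or more attributes of a face image, such as changing makeup, aging, rejuvenating, or changing gender or hair color, making one-click face swapping possible.
Motion transfer can change body movements, transfer facial expressions, and more.
We strongly encourage you to play with it and unlock PaddleGAN's potential!
You are welcome to join the official QQ group (1058398620) and exchange ideas with other developers.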