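I have recently been reproducing the code for the paper "Speech Enhancement Generative Adversarial Network". Code: https://github.com/santi-pdp/segan. (I am writing this down mainly so that I don't forget it later; I am still at the experimenting stage.)
There are currently two versions, PyTorch and TensorFlow. This time I chose the original version, which uses TensorFlow 0.12.1.
First, the environment configuration:
Ubuntu 18.04 + CUDA 8.0 + cuDNN 5.1 + Python 2.7
It is best to set up the environment according to the requirements given by the author; otherwise you may run into errors that are hard to trace.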
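Since version mismatches are the main risk here, a quick check before training can save time. The following is a minimal sketch, not part of the SEGAN repository: the function names `version_matches` and `check_segan_env` are made up for illustration, and it only checks Python and TensorFlow (CUDA and cuDNN are not checked).

```shell
#!/bin/bash
# Hypothetical helper: succeed if version string $1 equals $2
# or starts with "$2." (so 2.7.15 matches 2.7, but 2.70 does not).
version_matches() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: warn when the active Python or TensorFlow
# differs from the versions listed in this post.
check_segan_env() {
    # Python 2 prints its version to stderr, hence 2>&1.
    py_ver=$(python --version 2>&1 | awk '{print $2}')
    # Empty if TensorFlow cannot be imported.
    tf_ver=$(python -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null)
    version_matches "$py_ver" "2.7" || echo "WARNING: Python is '$py_ver', expected 2.7.x"
    version_matches "$tf_ver" "0.12.1" || echo "WARNING: TensorFlow is '$tf_ver', expected 0.12.1"
}
```

Running `check_segan_env` inside the environment you plan to train in prints nothing when both versions match.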
Training stage: train_segan.sh
#!/bin/bash
# Place the CUDA_VISIBLE_DEVICES="xxxx" required before the python call
# e.g. to specify the first two GPUs in your system: CUDA_VISIBLE_DEVICES="0,1" python ...
# SEGAN with no pre-emph and no bias in conv layers (just filters to downconv + deconv)
#CUDA_VISIBLE_DEVICES="2,3" python main.py --init_noise_std 0. --save_path segan_vanilla \
# --init_l1_weight 100. --batch_size 100 --g_nl prelu \
# --save_freq 50 --epoch 50
# SEGAN with pre-emphasis to try to discriminate more high freq (better disc of high freqs)
#CUDA_VISIBLE_DEVICES="1,2,3" python main.py --init_noise_std 0. --save_path segan_preemph \
# --init_l1_weight 100. --batch_size 100 --g_nl prelu \
# --save_freq 50 --preemph 0.95 --epoch 86
# Apply pre-emphasis AND apply biases to all conv layers (best SEGAN atm)
CUDA_VISIBLE_DEVICES="0,1" python main.py --init_noise_std 0. --save_path segan_allbiased_preemph \
--init_l1_weight 100. --batch_size 100 --g_nl prelu \
--save_freq 50 --preemph 0.95 --epoch 86 --bias_deconv True \
--bias_downconv True --bias_D_conv True
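Testing stage: the original clean_wav.sh seems to process only one speech file per run. I don't know why, and I never managed to change it so that it handles more than one.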
#!/bin/bash
# guia file containing pointers to files to clean up
#if [ $# -lt 1 ]; then
#echo 'ERROR: at least wavname must be provided!'
#echo "Usage: $0
#echo "If no save_path is specified, clean file is saved in current dir"
#exit 1
#fi
#NOISY_WAVNAME='/home/zyf/SEGAN/segan-master1/noisy_testset_wav_16k/p232_022.wav'
NOISY_WAVNAME="/home/zyf/SEGAN/segan-master1/mytest/p232_023.wav"
SAVE_PATH='test_clean_results'
#if [ $# -gt 1 ]; then
# SAVE_PATH="$2"
#fi
#echo "INPUT NOISY WAV: $NOISY_WAVNAME"
#echo "SAVE PATH: $SAVE_PATH"
mkdir -p "$SAVE_PATH"
python main.py --init_noise_std 0. --save_path segan_v1.1 \
--batch_size 100 --g_nl prelu --weights SEGAN-41700 \
--preemph 0.95 --bias_deconv True \
--bias_downconv True --bias_D_conv True \
--test_wav "$NOISY_WAVNAME" --save_clean_path "$SAVE_PATH"
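As posted, the script reads a hard-coded NOISY_WAVNAME and its argument handling is commented out, so each run cleans that one file regardless of what is passed on the command line. A minimal sketch of a variant that takes the file and output directory as positional arguments might look like the following. The function names `resolve_save_path` and `clean_one` are hypothetical; the python flags are copied unchanged from the script above.

```shell
#!/bin/bash
# Sketch of clean_wav.sh with positional arguments restored.
# Usage: clean_wav.sh <noisy_wav> [save_path]

# Print the output directory: the given argument if non-empty,
# otherwise the default used in this post.
resolve_save_path() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "test_clean_results"
    fi
}

# Enhance one file with the same flags as the script in this post.
clean_one() {
    noisy_wav="$1"
    save_path=$(resolve_save_path "$2")
    mkdir -p "$save_path"
    python main.py --init_noise_std 0. --save_path segan_v1.1 \
        --batch_size 100 --g_nl prelu --weights SEGAN-41700 \
        --preemph 0.95 --bias_deconv True \
        --bias_downconv True --bias_D_conv True \
        --test_wav "$noisy_wav" --save_clean_path "$save_path"
}

# Run only when a wav name is supplied, so nothing happens otherwise.
if [ "$#" -ge 1 ]; then
    clean_one "$1" "$2"
fi
```

With this shape, a wrapper script can call `sh clean_wav.sh some_file.wav some_dir` once per file and each call acts on the file it was given.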
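Later I found a GitHub repository in which someone else had modified SEGAN and added test_all.sh, a script that can process speech files in batch: https://github.com/lordet01/segan/blob/master/test_all.sh
I modified it as follows: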
#!/bin/bash
# Set folder paths for noisy and enhanced files
#if [ $# -lt 2 ]; then
#echo 'ERROR: Noisy and Enhanced paths should be provided!'
#echo "Usage: $0
#exit 1
#fi
NOISY_PATH="/home/zyf/SEGAN/segan-master1/mytest"
ENHANCED_PATH="test_clean_results"
mkdir -p "$ENHANCED_PATH"
# Loop over every .wav file in the noisy folder; quoting keeps paths with spaces intact
for f in "$NOISY_PATH"/*.wav
do
sh clean_wav.sh "$f" "$ENHANCED_PATH"
done
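I ran a test that denoised 8 noisy speech signals in one go and played the results back in MATLAB. On a first listen, there is a clear noise-reduction effect.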
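Before listening, it can help to confirm that the batch run produced one output per input. The sketch below only compares the number of .wav files in two folders; it does not compare file names or audio content. The function names `count_wavs` and `same_wav_count` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print how many .wav files sit directly inside
# directory $1 (0 if there are none or the directory does not exist).
count_wavs() {
    n=0
    for f in "$1"/*.wav; do
        # An unmatched glob stays literal, so confirm the entry is a real file.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Hypothetical helper: succeed only when both directories hold
# the same number of .wav files.
same_wav_count() {
    [ "$(count_wavs "$1")" -eq "$(count_wavs "$2")" ]
}
```

For the setup in this post, that would be something like `same_wav_count /home/zyf/SEGAN/segan-master1/mytest test_clean_results && echo "counts match"`.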