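Speech Signal Processing: Time-Domain Analysis of Volume

Background

Decibels

The decibel (dB) is a logarithmic unit, commonly used to describe the level of a sound.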


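Suppose there are two sound sources A and B, and the power P2 of source B is twice the power P1 of source A, i.e. P2/P1 = 2.

With everything else equal (frequency of the sound, listening distance), the difference in level between the two sounds is

10 log (P2/P1) = 10 log 2 ≈ 3 dB  // doubling the power raises the level by about 3 dB

Sometimes, however, the formula is written with 20 log() instead. Why?

The 20 log form is normally used with sound pressure (lowercase p). Power is proportional to the square of the sound pressure, so the two forms are equivalent: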

20 log (p2/p1) dB   =  10 log (p2^2/p1^2) dB   = 10 log (P2/P1) dB
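
The equivalence of the two forms can be checked numerically. The following is a minimal sketch; the function names level_from_power and level_from_pressure are made up for this illustration.

```python
import math

def level_from_power(power2, power1):
    """Level difference in dB computed from a power ratio."""
    return 10 * math.log10(power2 / power1)

def level_from_pressure(pressure2, pressure1):
    """Level difference in dB computed from a sound pressure ratio."""
    return 20 * math.log10(pressure2 / pressure1)

# Doubling the power raises the level by about 3 dB.
print(round(level_from_power(2.0, 1.0), 2))

# Power scales with pressure squared, so a pressure ratio of sqrt(2)
# is the same power ratio of 2 and gives the same level difference.
print(round(level_from_pressure(math.sqrt(2.0), 1.0), 2))
```

Both lines print the same value, which is why the factor 10 goes with power ratios and the factor 20 with pressure ratios.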

Standard sound level


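Reference sound pressure: 20 μPa, taken as the threshold of human hearing.

What does 0 dB mean?

sound level = 20 log (pmeasured/pref) = 20 log 1 = 0 dB

It only means that the measured sound pressure is exactly equal to the reference pressure of 20 μPa. It does not mean that there is no sound: the pressure is at the limit of what the human ear can perceive, but the vibration is still there. Likewise, -20 dB indicates an even weaker vibration, with only 1/10 of the reference sound pressure.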
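
As a small sketch of this definition, the level of a measured pressure relative to 20 μPa can be computed as below. The name sound_pressure_level and the constant P_REF are chosen here for illustration only.

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (20 μPa)

def sound_pressure_level(p_measured, p_ref=P_REF):
    """Sound level in dB of a measured pressure relative to p_ref."""
    return 20 * math.log10(p_measured / p_ref)

# A pressure equal to the reference gives 0 dB.
print(round(sound_pressure_level(20e-6), 2))

# One tenth of the reference pressure gives -20 dB.
print(round(sound_pressure_level(2e-6), 2))
```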

Sound level and distance


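Suppose the source radiates a total power P uniformly in all directions, and let I be the power received per unit area (the intensity) at distance r:

I = P/(4πr^2)
So I is inversely proportional to the square of the distance:
I2/I1 = (r1^2)/(r2^2)

In other words, if we double the distance, the sound pressure falls by a factor of 2, the intensity falls by a factor of 4, and the sound level drops by about 6 dB (10 log 4 ≈ 6).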
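
The 6 dB figure follows directly from the inverse-square relation. Here is a minimal sketch, assuming an ideal point source radiating uniformly with no reflections; the function name level_change_db is hypothetical.

```python
import math

def level_change_db(r1, r2):
    """Change in sound level (dB) when moving from distance r1 to r2."""
    intensity_ratio = (r1 / r2) ** 2  # I2/I1 = r1^2 / r2^2
    return 10 * math.log10(intensity_ratio)

# Doubling the distance lowers the level by about 6 dB.
print(round(level_change_db(1.0, 2.0), 2))  # -6.02
```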

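Volume

Volume represents the intensity of a sound and can be measured from the amplitude of the signal within one window (frame). Two measures are in common use:
(1) The sum of the absolute values of the samples in each frame:

volume = Σ_{i=1..n} |s_i|

where s_i is the i-th sample of the frame and n is the total number of samples in the frame. This measure is cheap to compute, but it does not match human loudness perception very well.
(2) Ten times the base-10 logarithm of the sum of squared samples:

volume = 10 log10( Σ_{i=1..n} s_i^2 )

Its unit is the decibel (dB). Because it is a logarithmic intensity value, it matches the way the human ear perceives loudness more closely, at a slightly higher computational cost.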
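
To see how the two measures behave, they can be evaluated on a single small frame. This is only a sketch: the four sample values are made up, and no offset removal is applied.

```python
import numpy as np

# A made-up frame of four samples, for illustration only.
frame = np.array([0.5, -0.25, 0.25, -0.5])

abs_sum = np.sum(np.abs(frame))               # measure (1)
volume_db = 10 * np.log10(np.sum(frame**2))   # measure (2)
print(abs_sum, round(float(volume_db), 2))

# Doubling the amplitude doubles measure (1) but adds a fixed
# offset of about 6 dB to measure (2).
louder = 2 * frame
abs_sum_louder = np.sum(np.abs(louder))
volume_db_louder = 10 * np.log10(np.sum(louder**2))
print(abs_sum_louder, round(float(volume_db_louder - volume_db), 2))
```

The first measure scales linearly with amplitude, while the second moves by a constant number of decibels for each doubling.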

A Python implementation of both volume measures follows:

#--------volume.py-------
import math
import numpy as np

# method 1: sum of absolute values (absSum)
def calVolume(waveData, frameSize, overLap):
    wlen = len(waveData)
    step = frameSize - overLap
    frameNum = int(math.ceil(wlen*1.0/step))
    volume = np.zeros((frameNum,1))
    for i in range(frameNum):
        curFrame = waveData[np.arange(i*step,min(i*step+frameSize,wlen))]
        curFrame = curFrame - np.median(curFrame) # zero-justified
        volume[i] = np.sum(np.abs(curFrame))
    return volume

# method 2: 10 times log10 of square sum
def calVolumeDB(waveData, frameSize, overLap):
    wlen = len(waveData)
    step = frameSize - overLap
    frameNum = int(math.ceil(wlen*1.0/step))
    volume = np.zeros((frameNum,1))
    for i in range(frameNum):
        curFrame = waveData[np.arange(i*step,min(i*step+frameSize,wlen))]
        curFrame = curFrame - np.mean(curFrame) # zero-justified
        # the tiny offset keeps log10 finite for an all-zero (silent) frame
        volume[i] = 10*np.log10(np.sum(curFrame*curFrame) + np.finfo(float).eps)
    return volume


#--------main.py-------
import wave
import pylab as pl
import numpy as np
import volume as vp

# ============ test the algorithm =============
# read wave file and get parameters.
fw = wave.open('aeiou.wav','r')
params = fw.getparams()
print(params)
nchannels, sampwidth, framerate, nframes = params[:4]
strData = fw.readframes(nframes)
waveData = np.frombuffer(strData, dtype=np.int16).astype(np.float64)  # np.fromstring is deprecated for binary data
waveData = waveData/np.max(np.abs(waveData))  # normalization to [-1, 1]
fw.close()

# calculate volume
frameSize = 256
overLap = 128
volume11 = vp.calVolume(waveData,frameSize,overLap)
volume12 = vp.calVolumeDB(waveData,frameSize,overLap)

# plot the wave
# build the time axes for the waveform and for the per-frame volume
time = np.arange(0, nframes)*(1.0/framerate)
time2 = np.arange(0, len(volume11))*(frameSize-overLap)*1.0/framerate
pl.subplot(311)
pl.plot(time, waveData)
pl.ylabel("Amplitude")
pl.subplot(312)
pl.plot(time2, volume11)
pl.ylabel("absSum")
pl.subplot(313)
pl.plot(time2, volume12, c="g")
pl.ylabel("Decibel(dB)")
pl.xlabel("time (seconds)")
pl.show()
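
If aeiou.wav is not at hand, the decibel measure can be tried on a synthetic signal. The sketch below rests on several assumptions: a 440 Hz tone sampled at 8000 Hz whose amplitude jumps from 0.1 to 1.0 halfway through, and a hypothetical helper frame_volume_db that, unlike calVolumeDB above, keeps only full frames.

```python
import numpy as np

def frame_volume_db(signal, frame_size, overlap):
    """Per-frame volume in dB, using full frames only (illustrative helper)."""
    step = frame_size - overlap
    n_frames = 1 + (len(signal) - frame_size) // step
    eps = np.finfo(float).eps          # keeps log10 finite for silent frames
    volume = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * step : i * step + frame_size]
        frame = frame - np.mean(frame)  # remove the DC offset of the frame
        volume[i] = 10 * np.log10(np.sum(frame * frame) + eps)
    return volume

rate = 8000                              # assumed sample rate in Hz
t = np.arange(rate) / rate               # one second of samples
tone = np.sin(2 * np.pi * 440 * t)       # 440 Hz test tone
signal = np.where(t < 0.5, 0.1, 1.0) * tone

volume = frame_volume_db(signal, frame_size=256, overlap=128)
quiet = volume[:20].mean()               # frames fully inside the quiet half
loud = volume[-20:].mean()               # frames fully inside the loud half
print(len(volume), round(float(loud - quiet), 1))
```

A tenfold increase in amplitude should show up as a difference of roughly 20 dB between the loud and quiet frames, in line with the 20 log relation from the background section.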

References

  1. https://www.animations.physics.unsw.edu.au/jw/dB.htm

  2. http://mirlab.org/jang/books/audioSignalProcessing/basicFeatureVolume.asp?title=5-2%20Volume%20(%AD%B5%B6q)&language=english

  3. http://ibillxia.github.io/blog/2013/05/15/audio-signal-process-time-domain-volume-python-realization/
