Spectrograms of Speech Signals


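I have recently been reading about deep learning for speech recognition and was puzzled about what exactly the spectrogram of a speech signal is, so I looked it up. These notes are for reference only: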

1 Definition:

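The graphical display of the Fourier analysis of a speech signal is called a spectrogram (or sonogram). A spectrogram is a three-dimensional spectrum: it shows how the speech spectrum changes over time, with frequency on the vertical axis and time on the horizontal axis. The strength of any given frequency component at a given moment is represented by the gray level or color intensity of the corresponding point. Analyzing speech with spectrograms is called spectrographic analysis. A spectrogram displays a large amount of information related to the characteristics of the spoken utterance. It combines the features of a spectrum plot and a time-domain waveform and clearly shows how the speech spectrum evolves over time; in other words, it is a dynamic spectrum. Such plots can be recorded with a sound spectrograph.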

2 Computation:

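For a speech signal x(t), first split it into frames to obtain x(m,n), where m indexes the frames and n indexes the samples within a frame (so n runs up to the frame length and m up to the number of frames). Apply the FFT to each frame to obtain X(m,n), then form the periodogram Y(m,n) = X(m,n) * X(m,n)', where ' denotes the complex conjugate, so that Y(m,n) = |X(m,n)|^2. Take 10*log10(Y(m,n)), convert the frame index m to a time scale M and the frequency-bin index n to a frequency scale N, and plot (M, N, 10*log10(Y(m,n))) as a two-dimensional image. That image is the spectrogram (it can also be drawn as a three-dimensional surface).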
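The steps above can be sketched in a few lines of MATLAB. This is only an illustration, not the toolbox implementation listed in section 4: the sampling rate, frame length, frame shift, and the synthetic 1000 Hz tone standing in for a recording are all assumed values, and the Hann window is written out by hand so that no toolbox is needed.

```matlab
% Minimal spectrogram sketch: framing -> FFT -> periodogram -> dB -> image.
% Assumed values (not from the original post): fs, frameLen, hop, test tone.
fs        = 8000;                      % sampling rate in Hz
t         = (0:fs-1)'/fs;              % one second of samples
x         = sin(2*pi*1000*t);          % 1000 Hz test tone

frameLen  = 256;                       % samples per frame
hop       = 128;                       % frame shift (50% overlap)
numFrames = fix((length(x) - frameLen)/hop) + 1;

% Hann window computed explicitly (symmetric, zero at both ends)
w = 0.5 - 0.5*cos(2*pi*(0:frameLen-1)'/(frameLen-1));

% Step 1: framing, one windowed frame per column
frames = zeros(frameLen, numFrames);
for m = 1:numFrames
    start = (m-1)*hop;
    frames(:,m) = x(start+1 : start+frameLen) .* w;
end

% Step 2: FFT of every column; keep the non-negative frequencies only
X = fft(frames);
X = X(1:frameLen/2+1, :);

% Step 3: periodogram Y = X .* conj(X), then convert to dB
Y   = real(X .* conj(X));
YdB = 10*log10(Y + eps);               % eps avoids log10(0)

% Step 4: rescale the indices to physical units
timeAxis = ((0:numFrames-1)*hop + frameLen/2)/fs;   % frame centers in seconds
freqAxis = (0:frameLen/2)'*fs/frameLen;             % bin frequencies in Hz

% Step 5: draw as a 2-D image (time horizontal, frequency vertical)
imagesc(timeAxis, freqAxis, YdB); axis xy;
xlabel('Time (s)'); ylabel('Frequency (Hz)');
```

For this test tone the image should show a single horizontal line at 1000 Hz. With a real recording, x and fs would come from the audio file instead.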


3 Reading the plot (describing what we see in the picture, haha):

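A spectrogram lets us observe how the signal strength in different frequency bands changes over time. Because real signals are rich in frequency content, patterns are not always easy to spot, so it helps to look at the spectrogram of clean speech (see the figure above). The figure shows distinct horizontal stripes, which we call a voiceprint (I am not sure how accurate that name is; the individual stripes are usually described as harmonics of the pitch), and which have many applications. A stripe is a place where dark points cluster; as time goes on, they extend into a line. It means that the energy at the frequency given by that point's vertical coordinate is strong and accounts for a large share of the speech, so it has a much stronger effect on what a listener perceives. Voiced speech is roughly periodic, so its strong points are regularly spaced in frequency: if there is a strong point at 300 Hz, there are generally strong points at n*300 Hz as well. That is why the spectrograms we see are striped.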
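The link between periodicity and evenly spaced stripes can be checked with a synthetic signal. The sketch below builds a periodic signal with a 300 Hz fundamental (the value used in the text) and lists the frequencies that carry significant energy. The sampling rate, signal length, number of harmonics, and their 1/k amplitudes are assumptions chosen so that every harmonic falls exactly on an FFT bin.

```matlab
% Periodic signal -> energy only at multiples of the fundamental frequency.
% Assumed values (not from the original post): fs, N, 5 harmonics, 1/k amplitudes.
fs = 8000;                         % sampling rate in Hz
N  = 800;                          % 0.1 s of signal, giving 10 Hz bin spacing
t  = (0:N-1)'/fs;
f0 = 300;                          % fundamental frequency in Hz

x = zeros(N,1);
for k = 1:5                        % harmonics at 300, 600, ..., 1500 Hz
    x = x + (1/k)*sin(2*pi*k*f0*t);
end

X    = fft(x);
mag  = abs(X(1:N/2+1));            % magnitude at non-negative frequencies
freq = (0:N/2)'*fs/N;              % frequency of each bin in Hz

% Bins holding significant energy are exactly the multiples of f0
peaks = freq(mag > 0.1*max(mag));
disp(peaks');                      % 300 600 900 1200 1500
```

In a spectrogram each of these peaks persists from frame to frame for as long as the pitch stays constant, which is what draws the horizontal stripes.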

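The range of the human voice is physically limited: most of the energy in ordinary speech lies below about 4000 Hz, whereas musical instruments cover a much wider range and percussion can reach 20 kHz. When we analyze frequency digitally, however, we use an algorithm, usually the FFT, and the frequency range of its output is set by the sampling rate: it runs from 0 to half the sampling rate. So for speech analyzed at a 16 kHz sampling rate, the spectrogram extends to 8000 Hz, and some content will still show up above 4000 Hz. Part of that is an artifact of the analysis (spectral leakage from windowing) or noise rather than voiced speech, although unvoiced sounds such as fricatives do carry real energy in that band.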
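How the sampling rate fixes the frequency axis can be seen from a short calculation. The sketch uses the 16 kHz rate from the paragraph above and the 512-point FFT from the program in section 4; the variable names are illustrative only.

```matlab
% The FFT frequency axis depends only on the sampling rate and the FFT length.
fs   = 16000;                      % sampling rate in Hz
nfft = 512;                        % FFT length

freqAxis   = (0:nfft/2)*fs/nfft;   % bin frequencies for a real-valued signal
binSpacing = fs/nfft;              % 31.25 Hz between neighboring bins
nyquist    = freqAxis(end);        % 8000 Hz, half the sampling rate
numAbove4k = sum(freqAxis > 4000); % 128 bins lie above 4000 Hz

fprintf('bin spacing %.2f Hz, top bin %g Hz, %d bins above 4 kHz\n', ...
        binSpacing, nyquist, numAbove4k);
```

Half of the plotted rows therefore sit above 4000 Hz no matter what the recording contains, so whatever energy falls in that band will be drawn.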

 

4 MATLAB program (debugged and working):

Main

% wavread was removed in newer MATLAB releases; there, use [x,fs]=audioread('keshi.wav');
[x,fs,nbits]=wavread('keshi.wav');

specgram(x,512,fs,100) % spectrogram function: 512-point FFT, 100-sample Hanning window

xlabel('Time (s)')

ylabel('Frequency (Hz)')

title('Spectrogram of "概率" (probability)')

 

function [yo,fo,to] = specgram(varargin)

%SPECGRAM Spectrogram using a Short-Time Fourier Transform (STFT).

%  SPECGRAM has been replaced by SPECTROGRAM.  SPECGRAM still works but

%  may be removed in the future. Use SPECTROGRAM instead. Type help

%  SPECTROGRAM for details.

%

%  See also PERIODOGRAM, SPECTRUM/PERIODOGRAM, PWELCH, SPECTRUM/WELCH, GOERTZEL.

 

%  Author(s): L. Shure, 1-1-91

%              T. Krauss, 4-2-93, updated

%  Copyright 1988-2010 The MathWorks, Inc.

%  $Revision: 1.8.4.6 $  $Date: 2010/02/17 19:00:23 $

 

error(nargchk(1,5,nargin,'struct'))

[msg,x,nfft,Fs,window,noverlap]=specgramchk(varargin);

if ~isempty(msg), error(generatemsgid('SigErr'),msg); end

   

nx = length(x);

nwind = length(window);

if nx < nwind    % zero-pad x if it has length less than the window length

   x(nwind)=0;  nx=nwind;

end

x = x(:); % make a column vector for ease later

window = window(:); % be consistent with data set

 

ncol = fix((nx-noverlap)/(nwind-noverlap));

colindex = 1 + (0:(ncol-1))*(nwind-noverlap);

rowindex = (1:nwind)';

if length(x)<(nwind+colindex(ncol)-1)

   x(nwind+colindex(ncol)-1) = 0;   % zero-pad x

end

 

if length(nfft)>1

   df = diff(nfft);

   evenly_spaced = all(abs(df-df(1))/Fs<1e-12);  % evenly spaced flag (boolean)

   use_chirp = evenly_spaced & (length(nfft)>20);

else

   use_chirp = 0;

end

 

if (length(nfft)==1) || use_chirp

   y = zeros(nwind,ncol);

 

    % put x into columns of y with the proper offset

    % should be able to do this with fancy indexing!

   y(:) = x(rowindex(:,ones(1,ncol))+colindex(ones(nwind,1),:)-1);

 

    % Apply the window to the array of offset signal segments.

   y = window(:,ones(1,ncol)).*y;

 

    if ~use_chirp     % USE FFT

       % now fft y which does the columns

       y = fft(y,nfft);

       if ~any(any(imag(x)))    % x purely real

           if rem(nfft,2),    % nfft odd

                select = 1:(nfft+1)/2;

           else

                select = 1:nfft/2+1;

           end

           y = y(select,:);

       else

           select = 1:nfft;

       end

       f = (select - 1)'*Fs/nfft;

    else % USE CHIRP Z TRANSFORM

       f = nfft(:);

       f1 = f(1);

       f2 = f(end);

       m = length(f);

       w = exp(-1i*2*pi*(f2-f1)/(m*Fs));

       a = exp(1i*2*pi*f1/Fs);

       y = czt(y,m,w,a);

    end

else  % evaluate DFT on given set of frequencies

   f = nfft(:);

   q = nwind - noverlap;

   extras = floor(nwind/q);

   x = [zeros(q-rem(nwind,q)+1,1); x];

    % create windowed DTFT matrix (filter bank)

   D = window(:,ones(1,length(f))).*exp((-1i*2*pi/Fs*((nwind-1):-1:0)).'*f');

   y = upfirdn(x,D,1,q).';

   y(:,[1:extras+1 end-extras+1:end]) = [];

end

 

t = (colindex-1)'/Fs;

 

% take abs, and use image to display results

if nargout == 0

   newplot;

    if length(t)==1

       imagesc([0 1/f(2)],f,20*log10(abs(y)+eps));axis xy; colormap(jet)

    else

       % Shift time vector by half window length; the overlap factor has

       % already been accounted for in the colindex variable.

       t = ((colindex-1)+((nwind)/2)')/Fs;

       imagesc(t,f,20*log10(abs(y)+eps));axis xy; colormap(jet)

    end

   xlabel('Time')

   ylabel('Frequency')

elseif nargout == 1,

   yo = y;

elseif nargout == 2,

   yo = y;

   fo = f;

elseif nargout == 3,

   yo = y;

   fo = f;

   to = t;

end

 

function [msg,x,nfft,Fs,window,noverlap] = specgramchk(P)

%SPECGRAMCHK Helper function for SPECGRAM.

%  SPECGRAMCHK(P) takes the cell array P and uses each cell as

%  an input argument.  Assumes P has between 1 and 5 elements.

 

msg = [];

 

x = P{1};

if (length(P) > 1) && ~isempty(P{2})

   nfft = P{2};

else

   nfft = min(length(x),256);

end

if (length(P) > 2) && ~isempty(P{3})

   Fs = P{3};

else

   Fs = 2;

end

if length(P) > 3 && ~isempty(P{4})

   window = P{4};

else

    if length(nfft) == 1

       window = hanning(nfft);

    else

       msg = 'You must specify a window function.';

    end

end

if length(window) == 1, window = hanning(window); end

if (length(P) > 4) && ~isempty(P{5})

   noverlap = P{5};

else

   noverlap = ceil(length(window)/2);

end

% NOW do error checking

if (length(nfft)==1) && (nfft<length(window)),

   msg = 'Requires window''s length to be no greater than the FFT length.';

end

if (noverlap >= length(window)),

   msg = 'Requires NOVERLAP to be strictly less than the window length.';

end

if (length(nfft)==1) && (nfft ~= abs(round(nfft)))

   msg = 'Requires positive integer values for NFFT.';

end

if (noverlap ~= abs(round(noverlap))),

   msg = 'Requires positive integer value for NOVERLAP.';

end

if min(size(x))~=1,

   msg = 'Requires vector (either row or column) input.';

end
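The least obvious part of the specgram listing is the single indexing expression that cuts the signal into overlapping frames. The sketch below replays those lines on a toy signal whose sample values equal their own indices, so the resulting matrix shows directly which samples land in which frame. The variable names follow the listing; the signal and the values of nwind and noverlap are assumptions chosen to keep the output small.

```matlab
% Small-scale replay of the frame-building lines from specgram.
x        = (1:10)';     % toy signal: each value equals its sample index
nwind    = 4;           % window (frame) length
noverlap = 2;           % samples shared by neighboring frames
nx       = length(x);

ncol     = fix((nx-noverlap)/(nwind-noverlap));   % number of frames: 4
colindex = 1 + (0:(ncol-1))*(nwind-noverlap);     % frame start indices: 1 3 5 7
rowindex = (1:nwind)';                            % offsets inside one frame

% Entry (r,c) of idx is the sample index that goes to row r of frame c
idx = rowindex(:,ones(1,ncol)) + colindex(ones(nwind,1),:) - 1;
y   = x(idx);

disp(y)
%  1  3  5  7
%  2  4  6  8
%  3  5  7  9
%  4  6  8 10
```

Each column is one frame, and neighboring columns share noverlap samples. In specgram the same matrix is then multiplied by the window and passed to fft, which transforms every column at once.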

 

Acknowledgments:

1 http://blog.csdn.net/jiangyangbo/article/details/5899264

2 Baidu
