Fundamentals of Music Processing, Chapter 1: Music Representations

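Contents

  • Sheet music representation
  • Symbolic representation
    • MIDI representation
    • Score representation
    • Optical music recognition
  • Audio representation
    • Waves and waveforms
    • Frequency and pitch
    • Dynamics, intensity, and loudness
    • Timbre
  • Summary
  • Exercises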

Study notes on Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, by Meinard Müller.

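Sheet music representation

Symbolic representation

MIDI representation

MIDI models the state of a keyboard being played: each keystroke can be described by a note number, a key velocity, a channel, and a timestamp.

  • MIDI note number (pitch): an integer from 0 to 127, 128 pitches in total, covering $\mathrm{C{-}1}$ to $\mathrm{G9}$.
  • Key velocity: an integer from 0 to 127 that controls the volume (for a note-on event) or how fast the sound decays (for a note-off event).
  • Channel: an integer from 0 to 15 that selects the voice channel.
  • Timestamp: an integer giving the number of clock pulses (ticks) to wait before the event.

MIDI subdivides a quarter note into clock pulses (ticks); their number is called PPQN (pulses per quarter note; also TPQN, ticks per quarter note, or PPQ, TPQ). Every MIDI file sets the PPQN in its header, and all timestamps in the file are measured against it. A common value is 120 PPQN, that is, 120 pulses per quarter note.
MIDI can also assign an absolute duration to the quarter note. For example, at 0.6 seconds per quarter note and 120 PPQN, one tick lasts 5 milliseconds. Tempo can also be stated in BPM (beats per minute): 0.6 seconds per quarter note corresponds to 100 BPM (100 quarter-note beats per minute).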
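The tick arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original notes: the function names `tick_seconds`, `bpm`, and `ticks_to_seconds` are hypothetical, and a single fixed tempo is assumed for the whole file.

```python
def tick_seconds(seconds_per_quarter, ppqn):
    """Duration of one MIDI tick in seconds."""
    return seconds_per_quarter / ppqn


def bpm(seconds_per_quarter):
    """Tempo in quarter-note beats per minute."""
    return 60.0 / seconds_per_quarter


def ticks_to_seconds(ticks, seconds_per_quarter, ppqn):
    """Convert a tick count (a relative timestamp) into seconds."""
    return ticks * tick_seconds(seconds_per_quarter, ppqn)


# The example from the text: 0.6 s per quarter note at 120 PPQN.
print(tick_seconds(0.6, 120))           # about 0.005 s, i.e. 5 ms per tick
print(bpm(0.6))                         # about 100 BPM
print(ticks_to_seconds(240, 0.6, 120))  # two quarter notes, about 1.2 s
```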

Score representation

MusicXML represents each attribute of a note with its own tag. For example, the pitch $\mathrm{E^\flat 4}$ is written as:

<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
</note>
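To connect the two symbolic formats, the following sketch reads such a `<note>` element with Python's standard `xml.etree.ElementTree` module and derives the MIDI note number. The helper name `pitch_to_midi` and the table `STEP_TO_SEMITONE` are hypothetical, and the sketch assumes `<alter>` holds a whole number of semitones.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
</note>"""

# Semitone offset of each natural note above C within one octave.
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(note_xml):
    """Return the MIDI note number encoded by a MusicXML <note> fragment."""
    pitch = ET.fromstring(note_xml).find("pitch")
    step = pitch.findtext("step")
    alter = int(pitch.findtext("alter", default="0"))  # assumed integral
    octave = int(pitch.findtext("octave"))
    # MIDI places C-1 at note number 0, hence the (octave + 1).
    return (octave + 1) * 12 + STEP_TO_SEMITONE[step] + alter


print(pitch_to_midi(NOTE_XML))  # 63
```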

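Optical music recognition

Optical music recognition (OMR) scans digital images of sheet music and recognizes the notation in them.

Audio representation

Waves and waveforms

Sound is, at its core, an oscillation of air pressure. A waveform plots the deviation of the air pressure from its average value as the sound propagates: a peak is a point of maximum pressure and a trough a point of minimum pressure. Pressure reflects how densely the air molecules are packed; the denser the molecules, the higher the pressure.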

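Frequency and pitch

  • Period
    A wave is a periodic motion. In a waveform, the time from one peak to the next is one period.
  • Frequency
    • Frequency f = 1 / T, where T is the period; the unit is Hz.
    • Human hearing covers roughly 20 Hz to 20 kHz.
    • The higher the frequency, the higher the pitch.
  • Amplitude
    The difference between a peak and the mean value (not the difference between peak and trough).
  • Phase
    Where in its cycle the wave is at time 0.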
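The four quantities above fully determine a sinusoid. As a minimal sketch (the function name `sinusoid` is hypothetical, and the phase is assumed to be given in radians), one sample of such a wave can be computed as:

```python
import math


def sinusoid(t, amplitude=1.0, frequency=440.0, phase=0.0):
    """Value at time t (seconds) of a sinusoid; phase is in radians."""
    return amplitude * math.sin(2 * math.pi * frequency * t + phase)


period = 1 / 440.0  # T = 1 / f

# A quarter of a period after t = 0 the wave reaches its peak,
# so the value equals the amplitude.
peak = sinusoid(period / 4, amplitude=0.5)

# Shifting time by one full period leaves the value unchanged.
same = sinusoid(0.001) - sinusoid(0.001 + period)
```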

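The sinusoid can be regarded as the most basic sound wave; the sound it produces is called a harmonic sound or pure tone. By international standard, a 440 Hz sinusoid is the pitch A4.
Perceptually, two pitches whose frequencies differ by a factor of 2 sound similar. For example, A3 (220 Hz), A4 (440 Hz), and A5 (880 Hz) sound very much alike. Moreover, the perceived distance from A3 to A4 equals the perceived distance from A4 to A5, so human pitch perception is logarithmic in frequency.
Combining MIDI note numbers with twelve-tone equal temperament gives the frequency of every pitch (A4 has MIDI note number 69):
$$F_{\mathrm{pitch}}(p) = 2^{(p-69)/12} \cdot 440\,\mathrm{Hz}$$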
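The formula translates directly into code. The sketch below is illustrative only: the names `f_pitch` and `nearest_midi` are hypothetical, and `nearest_midi` is simply the formula solved for p and rounded to the nearest integer.

```python
import math


def f_pitch(p):
    """Equal-tempered frequency in Hz of MIDI note number p (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((p - 69) / 12)


def nearest_midi(freq):
    """MIDI note number whose equal-tempered frequency is closest to freq (Hz)."""
    return round(69 + 12 * math.log2(freq / 440.0))


print(f_pitch(69))          # 440.0 (A4)
print(f_pitch(57))          # 220.0 (A3, one octave below)
print(nearest_midi(262.0))  # 60 (C4)
```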
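The frequency ratio between two adjacent semitones is a constant:
$$\frac{F_{\mathrm{pitch}}(p+1)}{F_{\mathrm{pitch}}(p)} = \sqrt[12]{2} \approx 1.0595$$
More generally, the cent serves as a basic unit for measuring intervals: an octave is divided into 1200 cents, so each semitone spans 100 cents. A difference of one cent is too small to hear; empirically, adults can reliably recognize pitch differences of 25 cents, and trained listeners can even recognize differences of 10 cents.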
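Since an octave (a frequency ratio of 2) spans 1200 cents, the distance in cents between two frequencies is 1200 times the base-2 logarithm of their ratio. A minimal sketch, with the hypothetical function name `cents`:

```python
import math


def cents(f1, f2):
    """Signed distance in cents from frequency f2 up to frequency f1."""
    return 1200 * math.log2(f1 / f2)


print(cents(880.0, 440.0))            # 1200.0, one octave
print(round(cents(660.0, 440.0), 2))  # 701.96, a pure fifth (3:2)
```

The pure fifth comes out about 2 cents wider than the equal-tempered fifth of 700 cents.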
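Real-world tones, in contrast, are described in terms of partials and harmonics.

  • Partial
    The vibration of the whole string or air column produces the fundamental, called the first partial. Dividing the string or air column into integer fractions gives the higher partials: half the length yields the second partial, one third the third partial, and so on.
  • Harmonic
    A partial whose frequency is an integer multiple of the fundamental frequency.
  • Overtone
    Any partial other than the fundamental.
  • Inharmonicity
    The deviation of an instrument's partial frequencies from the exact integer multiples of the fundamental frequency.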

For example, if the fundamental $\omega$ has a frequency of 65.4 Hz (C2), its harmonic series is $\omega, 2\omega, 3\omega, 4\omega, \ldots$. Harmonics at power-of-two multiples lie whole octaves above the fundamental: $\omega$ is C2, $2\omega$ is C3, and $4\omega$ is C4. $3\omega$ is close to G3 (a perfect fifth above C3). The table lists the first 16 harmonics, the nearest equal-tempered pitch, and the deviation from that pitch in cents:

Harmonic            1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
Nearest pitch       C2   C3   G3   C4   E4   G4   Bb4  C5   D5   E5   F#5  G5   Ab5  Bb5  B5   C6
Deviation (cents)   0    0    +2   0    -14  +2   -31  0    +4   -14  -49  +2   +41  -31  -12  0

  • Note: the harmonic $3\omega$ lies 2 cents above G3.
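As a worked check of the +2 cents entry for the third harmonic, assuming equal temperament and measuring every interval from the fundamental C2:

```latex
\begin{aligned}
1200 \cdot \log_2 \frac{3\omega}{\omega} &= 1200 \cdot \log_2 3 \approx 1901.96 \text{ cents} \\
\text{G3} - \text{C2} &= 19 \text{ semitones} = 1900 \text{ cents} \\
1901.96 - 1900 &\approx +2 \text{ cents}
\end{aligned}
```

The other entries follow the same way with $\log_2 n$ for the $n$-th harmonic.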

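Dynamics, intensity, and loudness

Loudness is essentially the perceived sound intensity, and dynamics refers to the range of loudness (the volume range).
Sound power is the energy per unit time that a sound source transfers to the air, and sound intensity is the sound power per unit area, measured in $\mathrm{W/m^2}$. The lowest intensity a human can perceive is the threshold of hearing (TOH):
$$I_{\mathrm{TOH}} = 10^{-12}\,\mathrm{W/m^2}$$
The highest intensity a human can bear is the threshold of pain (TOP):
$$I_{\mathrm{TOP}} = 10\,\mathrm{W/m^2}$$
In practice, sound intensity is measured in decibels:
$$\mathrm{dB}(I) = 10 \cdot \log_{10}\left(\frac{I}{I_{\mathrm{TOH}}}\right), \qquad \mathrm{dB}(I_{\mathrm{TOH}}) = 0, \qquad \mathrm{dB}(I_{\mathrm{TOP}}) = 130, \qquad \mathrm{dB}(2I) - \mathrm{dB}(I) \approx 3$$

  • By the decibel formula, doubling the sound intensity raises the level by about 3 dB.
  • Perception also depends on frequency: TOH and TOP change with the frequency of the sound, generally decreasing as the frequency rises.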
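The decibel relations above are easy to verify numerically. In this sketch the function name `intensity_to_db` is hypothetical; the two constants are the TOH and TOP values given in the text.

```python
import math

I_TOH = 1e-12  # threshold of hearing in W/m^2
I_TOP = 10.0   # threshold of pain in W/m^2


def intensity_to_db(intensity, reference=I_TOH):
    """Sound intensity level in dB relative to the reference intensity."""
    return 10 * math.log10(intensity / reference)


print(intensity_to_db(I_TOH))         # 0.0
print(round(intensity_to_db(I_TOP)))  # 130
# Doubling the intensity adds 10 * log10(2), about 3 dB.
print(round(intensity_to_db(2e-6) - intensity_to_db(1e-6), 2))  # 3.01
```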

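Timbre

  • ADSR model
    Connecting the peaks (or troughs) of the waveform over the duration of a sound gives a curve, the envelope, whose rise and fall can be divided into four phases: A (attack), D (decay), S (sustain), and R (release), much like the course of pressing a single piano key. Tones of the same pitch but different timbre have different ADSR curves, even though they can share the same frequency and amplitude.
  • Tremolo/vibrato
    Both are periodic fluctuations of a tone: tremolo corresponds to amplitude modulation and vibrato to frequency modulation. A modulation has two essential parameters: the modulation rate and the modulation depth.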
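Both ideas can be illustrated with a small sketch. It assumes a piecewise-linear envelope (real instruments have smoother curves) and sinusoidal modulation; every function name, parameter name, and default value below is hypothetical.

```python
import math


def adsr(t, attack=0.05, decay=0.1, sustain_level=0.6, sustain=0.5, release=0.2):
    """Piecewise-linear ADSR envelope value (0 to 1) at time t in seconds."""
    if t < 0:
        return 0.0
    if t < attack:                      # A: rise from 0 to 1
        return t / attack
    t -= attack
    if t < decay:                       # D: fall from 1 to the sustain level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain:                     # S: hold the sustain level
        return sustain_level
    t -= sustain
    if t < release:                     # R: fade from the sustain level to 0
        return sustain_level * (1.0 - t / release)
    return 0.0


def tremolo(t, freq=440.0, rate=5.0, depth=0.3):
    """Amplitude modulation: the amplitude oscillates `rate` times per second."""
    amplitude = 1.0 + depth * math.sin(2 * math.pi * rate * t)
    return amplitude * math.sin(2 * math.pi * freq * t)


def vibrato(t, freq=440.0, rate=5.0, depth=6.0):
    """Frequency modulation: instantaneous frequency swings by +/- depth Hz."""
    phase = 2 * math.pi * freq * t + (depth / rate) * math.sin(2 * math.pi * rate * t)
    return math.sin(phase)
```

Multiplying a tone by `adsr(t)` shapes its loudness over time, while `rate` and `depth` are the two modulation parameters named in the text.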

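Summary

[Figure: the three representations (sheet music, symbolic, audio) and the conversions between them: recognition, translation, synthesis, transcription.]
  • Real-world sheet music is first translated into a symbolic representation, which the computer then converts into audio for playback.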

Exercises

  • The frequency ratio across any number of semitones, and the distance in semitones between any two frequencies:
    $$\frac{F_{\mathrm{pitch}}(p+k)}{F_{\mathrm{pitch}}(p)} = 2^{k/12}, \qquad \mathrm{distance}(\omega_1, \omega_2) = 12 \cdot \log_2 \frac{\omega_1}{\omega_2}$$
  • Suppose an octave has 17 pitches, and pitch number p = 100 has a frequency of 1000 Hz. There are 256 pitch numbers, 0 to 255. In this model, what are the frequencies for p = 83, p = 66, and p = 49, and how many cents separate adjacent pitches?
    The frequency ratio between any two pitches is now
    $$\frac{F_{\mathrm{pitch}}(p+k)}{F_{\mathrm{pitch}}(p)} = 2^{k/17}$$
    Substituting $F_{\mathrm{pitch}}(100) = 1000\,\mathrm{Hz}$ gives
    $$F_{\mathrm{pitch}}(p) = 2^{(p-100)/17} \cdot 1000\,\mathrm{Hz}$$
    so $F_{\mathrm{pitch}}(83) = 500\,\mathrm{Hz}$, $F_{\mathrm{pitch}}(66) = 250\,\mathrm{Hz}$, and $F_{\mathrm{pitch}}(49) = 125\,\mathrm{Hz}$. An octave has 1200 cents, so adjacent pitches are $1200/17 \approx 71$ cents apart.
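The 17-tone result can be checked with a short sketch that generalizes the equal-temperament formula to any number of steps per octave. The function name `f_pitch_edo` and its parameter names are hypothetical.

```python
def f_pitch_edo(p, steps_per_octave=17, ref_pitch=100, ref_freq=1000.0):
    """Frequency of pitch number p when the octave has equal steps."""
    return ref_freq * 2 ** ((p - ref_pitch) / steps_per_octave)


for p in (83, 66, 49):
    # Each of these lies a whole number of octaves below p = 100.
    print(p, f_pitch_edo(p))

cents_per_step = 1200 / 17
print(round(cents_per_step, 2))  # 70.59
```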
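  • Write a simple program that converts between pitch names and MIDI note numbers.
  • Write a simple program that computes the first 16 harmonics of C2 and finds the pitch nearest to each. Do the same for $\mathrm{B^\flat 4}$.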
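import math


def pitch_sharp():
    # Pitch-class names spelled with sharps, indexed by semitones above C.
    return 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'


def pitch_flat():
    # The same pitch classes spelled with flats.
    return 'C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B'


def to_pitch(num):
    # MIDI note number -> set of enharmonic names, e.g. 70 -> {'A#4', 'Bb4'}.
    index, bias = num % 12, str(num // 12 - 1)
    result_s, result_f = pitch_sharp()[index] + bias, pitch_flat()[index] + bias
    return {result_s, result_f}


def check_pitch(pitch):
    # Split a name such as 'Bb4' into (pitch-class index, octave string).
    if pitch[1] == '#':
        return pitch_sharp().index(pitch[:2]), pitch[2:]
    elif pitch[1] == 'b':
        return pitch_flat().index(pitch[:2]), pitch[2:]
    else:
        return pitch_sharp().index(pitch[:1]), pitch[1:]


def to_num(pitch):
    # Pitch name -> MIDI note number, e.g. 'A4' -> 69.
    index, bias = check_pitch(pitch)
    return (int(bias) + 1) * 12 + index


def cent_round(cent):
    # Signed deviation from the nearest semitone, rounded to whole cents.
    return round(cent % 100 if cent % 100 < 50 else cent % 100 - 100)


def gen_harmonic(pitch, n):
    # Yield (deviation in cents, nearest pitch names) for harmonics 1..n of pitch.
    for i_ in range(1, n + 1):
        diff_cent_ = math.log2(i_) * 1200
        yield cent_round(diff_cent_), to_pitch(round(diff_cent_ / 100) + to_num(pitch))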
# gen_harmonic('Bb4', 16) output:
# diff: 0 , pitch: {'A#4', 'Bb4'}
# diff: 0 , pitch: {'Bb5', 'A#5'}
# diff: 2 , pitch: {'F6'}
# diff: 0 , pitch: {'A#6', 'Bb6'}
# diff: -14 , pitch: {'D7'}
# diff: 2 , pitch: {'F7'}
# diff: -31 , pitch: {'G#7', 'Ab7'}
# diff: 0 , pitch: {'A#7', 'Bb7'}
# diff: 4 , pitch: {'C8'}
# diff: -14 , pitch: {'D8'}
# diff: -49 , pitch: {'E8'}
# diff: 2 , pitch: {'F8'}
# diff: 41 , pitch: {'Gb8', 'F#8'}
# diff: -31 , pitch: {'G#8', 'Ab8'}
# diff: -12 , pitch: {'A8'}
# diff: 0 , pitch: {'Bb8', 'A#8'}
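  • Pythagorean tuning, attributed to Pythagoras, generates pitch frequencies using only the ratio 3:2. The Pythagorean scale is built solely from perfect fifths (3:2) and octaves (2:1). Starting from C2, multiply the frequency by 3/2 repeatedly, and whenever the result rises above the frequency of C3, divide it by 2. This yields 13 frequency values (including the initial C2). The last value is the one closest to C2, and its distance from C2 is called the Pythagorean comma. Simulate the process in a program, and compute for each value the nearest twelve-tone equal-tempered pitch and the deviation from it.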
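import math


def to_freq(pitch):
    # Not shown in the original notes; reconstructed here from the F_pitch
    # formula (A4 = MIDI 69 = 440 Hz). Relies on to_num defined earlier.
    return 440 * 2 ** ((to_num(pitch) - 69) / 12)


def diff_cent(w1, w2):
    # Distance in cents from frequency w2 up to frequency w1.
    return 1200 * math.log2(w1 / w2)


def pythagorean(new_freq, freq, idx):
    # Go up a perfect fifth; drop an octave if that leaves the octave above freq.
    return 1.5 * new_freq if 1.5 * new_freq / freq < 2 else 0.75 * new_freq


def gen_tuning(pitch, func):
    # Apply func 12 times; yield (ratio, nearest pitch, its ratio, deviation in cents).
    n, freq_ = 12, to_freq(pitch)
    new_freq_ = freq_
    for i_ in range(1, n + 1):
        new_freq_ = func(new_freq_, freq_, i_)
        diff_cent_ = diff_cent(new_freq_, freq_)
        new_pitch_ = to_pitch(round(diff_cent_ / 100) + to_num(pitch))
        yield new_freq_ / freq_, new_pitch_, to_freq(tuple(new_pitch_)[0]) / freq_, cent_round(diff_cent_)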
# gen_tuning('C2', pythagorean) output:
# pythagorean ratio: 1.5  pitch: {'G2'}  frequency ratio: 1.4983070768766817  diff cent: 2
# pythagorean ratio: 1.125  pitch: {'D2'}  frequency ratio: 1.1224620483093728  diff cent: 4
# pythagorean ratio: 1.6875000000000002  pitch: {'A2'}  frequency ratio: 1.681792830507429  diff cent: 6
# pythagorean ratio: 1.265625  pitch: {'E2'}  frequency ratio: 1.2599210498948734  diff cent: 8
# pythagorean ratio: 1.8984375  pitch: {'B2'}  frequency ratio: 1.887748625363387  diff cent: 10
# pythagorean ratio: 1.423828125  pitch: {'Gb2', 'F#2'}  frequency ratio: 1.414213562373095  diff cent: 12
# pythagorean ratio: 1.06787109375  pitch: {'C#2', 'Db2'}  frequency ratio: 1.0594630943592953  diff cent: 14
# pythagorean ratio: 1.6018066406250002  pitch: {'G#2', 'Ab2'}  frequency ratio: 1.5874010519681994  diff cent: 16
# pythagorean ratio: 1.2013549804687502  pitch: {'Eb2', 'D#2'}  frequency ratio: 1.189207115002721  diff cent: 18
# pythagorean ratio: 1.8020324707031254  pitch: {'Bb2', 'A#2'}  frequency ratio: 1.7817974362806788  diff cent: 20
# pythagorean ratio: 1.3515243530273442  pitch: {'F2'}  frequency ratio: 1.3348398541700344  diff cent: 22
# pythagorean ratio: 1.013643264770508  pitch: {'C2'}  frequency ratio: 1.0  diff cent: 23

The Pythagorean comma is therefore about 23 cents.

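  • Sanfen Sunyi method (三分损益法): alternate between removing one third of the string length (frequency × 3/2) and adding one third (frequency × 3/4); after the sixth step the order is reversed, adding one third first and then removing one third.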
def chinese_harmonic(new_freq, freq, idx):
    return 1.5 * new_freq if (idx % 2 != 0) ^ (idx > 6) else 0.75 * new_freq
# gen_tuning('C2', chinese_harmonic) output:
# chinese ratio: 1.5  pitch: {'G2'}  frequency ratio: 1.4983070768766817  diff cent: 2
# chinese ratio: 1.125  pitch: {'D2'}  frequency ratio: 1.1224620483093728  diff cent: 4
# chinese ratio: 1.6875000000000002  pitch: {'A2'}  frequency ratio: 1.681792830507429  diff cent: 6
# chinese ratio: 1.265625  pitch: {'E2'}  frequency ratio: 1.2599210498948734  diff cent: 8
# chinese ratio: 1.8984375  pitch: {'B2'}  frequency ratio: 1.887748625363387  diff cent: 10
# chinese ratio: 1.423828125  pitch: {'F#2', 'Gb2'}  frequency ratio: 1.414213562373095  diff cent: 12
# chinese ratio: 1.06787109375  pitch: {'Db2', 'C#2'}  frequency ratio: 1.0594630943592953  diff cent: 14
# chinese ratio: 1.6018066406250002  pitch: {'Ab2', 'G#2'}  frequency ratio: 1.5874010519681994  diff cent: 16
# chinese ratio: 1.2013549804687502  pitch: {'Eb2', 'D#2'}  frequency ratio: 1.189207115002721  diff cent: 18
# chinese ratio: 1.8020324707031254  pitch: {'A#2', 'Bb2'}  frequency ratio: 1.7817974362806788  diff cent: 20
# chinese ratio: 1.3515243530273442  pitch: {'F2'}  frequency ratio: 1.3348398541700344  diff cent: 22
# chinese ratio: 2.027286529541016  pitch: {'C3'}  frequency ratio: 2.0  diff cent: 23

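The results show that Pythagorean tuning and the Sanfen Sunyi method yield the same tuning; the only difference is that the last value lands an octave higher (C3 instead of C2).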
