Study notes on Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, by Meinard Müller
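MIDI models the act of playing an electronic keyboard: each key press is described by a note number, a key velocity, a channel, and a timestamp.
MIDI divides a quarter note into a number of clock pulses, or ticks. This number is called PPQN (pulses per quarter note; also written TPQN, ticks per quarter note, or PPQ, TPQ). The header of every MIDI file sets the PPQN, which is then the reference for computing the timestamps of the MIDI sequence that follows. A common PPQN value is 120, meaning that one quarter note lasts 120 ticks.
MIDI can also fix the absolute duration of a quarter note. For example, setting a quarter note to 0.6 seconds makes one tick last 5 milliseconds at 120 PPQN. Tempo can also be given in BPM (beats per minute): 0.6 seconds per quarter note corresponds to 100 BPM, that is, 100 quarter-note beats per minute.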
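The tick arithmetic above can be sketched in a few lines of Python. The function names are illustrative and not part of any MIDI library, and the sketch assumes a single tempo that never changes within the file.

```python
def seconds_per_tick(quarter_note_seconds, ppqn=120):
    """Duration of one tick, given the quarter-note duration and the PPQN."""
    return quarter_note_seconds / ppqn


def bpm(quarter_note_seconds):
    """Tempo in quarter-note beats per minute."""
    return 60.0 / quarter_note_seconds


def ticks_to_seconds(ticks, quarter_note_seconds, ppqn=120):
    """Convert a tick count to seconds (assumes a constant tempo)."""
    return ticks * seconds_per_tick(quarter_note_seconds, ppqn)


# The example from the text: 0.6 s per quarter note at 120 PPQN.
print(seconds_per_tick(0.6))       # seconds per tick
print(bpm(0.6))                    # beats per minute
print(ticks_to_seconds(240, 0.6))  # two quarter notes, in seconds
```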
In MusicXML, every attribute of a note is represented by its own tag. For example, the pitch $\mathrm{E^b4}$ is encoded as:
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
</note>
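As a rough illustration of how such a note element can be read programmatically, the following sketch parses it with Python's standard `xml.etree.ElementTree` module and converts it to a MIDI note number. The names `NOTE_XML`, `STEP_TO_SEMITONE` and `pitch_to_midi` are made up for this example, and the sketch assumes whole-semitone `<alter>` values and the convention C4 = 60.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """
<note>
  <pitch>
    <step>E</step>
    <alter>-1</alter>
    <octave>4</octave>
  </pitch>
</note>
"""

# Semitone offset of each natural note name within an octave (C = 0).
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(note_xml):
    """Return the MIDI note number encoded by a MusicXML <note> element."""
    pitch = ET.fromstring(note_xml.strip()).find("pitch")
    step = pitch.findtext("step")
    # <alter> is optional; this sketch assumes whole-semitone alterations.
    alter = int(pitch.findtext("alter", default="0"))
    octave = int(pitch.findtext("octave"))
    # With C4 = 60, octave n starts at MIDI number 12 * (n + 1).
    return 12 * (octave + 1) + STEP_TO_SEMITONE[step] + alter


print(pitch_to_midi(NOTE_XML))  # 63
```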
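Sheet music can also be digitized by scanning score images and recognizing their content (optical music recognition).
Sound is, in essence, a vibration of air pressure. A waveform shows how the air pressure deviates from the average pressure as the sound propagates: a peak is the point of highest pressure and a trough the point of lowest pressure. Pressure reflects how densely the air molecules are packed: the denser the molecules, the higher the pressure.
The sine wave is regarded as the most basic sound wave. The sound produced by a sine wave is called a harmonic sound or pure tone. By international standard, a 440 Hz sine wave is assigned the pitch A4.
Perceptually, two pitches whose frequencies differ by a factor of two (or a power of two) sound similar. For example, A3 (220 Hz), A4 (440 Hz) and A5 (880 Hz) sound very much alike. Moreover, the perceived distance from A3 to A4 is the same as the perceived distance from A4 to A5, so human pitch perception is essentially logarithmic in frequency.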
Combining MIDI note numbers with twelve-tone equal temperament gives the frequency of every pitch (the MIDI number of A4 is 69):
$$F_{pitch}(p) = 2^{(p-69)/12} \cdot 440\ \mathrm{Hz}$$
The frequency ratio between two adjacent semitones is a constant:
$$\frac{F_{pitch}(p+1)}{F_{pitch}(p)} = \sqrt[12]{2} \approx 1.0595$$
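The formula translates directly into code. The function name `f_pitch` and the `a4_hz` parameter are chosen for this sketch only; the reference of 440 Hz for MIDI number 69 comes from the text above.

```python
def f_pitch(p, a4_hz=440.0):
    """Equal-tempered center frequency in Hz of MIDI note number p."""
    return a4_hz * 2.0 ** ((p - 69) / 12)


print(f_pitch(69))                # A4
print(f_pitch(57))                # A3, one octave below A4
print(f_pitch(60))                # C4 (middle C)
print(f_pitch(70) / f_pitch(69))  # semitone ratio, the twelfth root of 2
```

Keeping the reference frequency as a parameter makes it easy to experiment with other concert pitches, for example 442 Hz.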
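More generally, the cent can be used as the basic unit for measuring intervals: an octave is divided into 1200 cents, so each semitone spans 100 cents. A pitch change of a single cent is too small to hear. Empirically, adults can reliably recognize pitch differences of 25 cents, and trained listeners can even recognize differences of 10 cents.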
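Since an octave (a frequency ratio of 2) spans 1200 cents, the interval between two frequencies $\omega_1$ and $\omega_2$ is $1200 \cdot \log_2(\omega_1/\omega_2)$ cents. A minimal sketch, with the helper name `cents` chosen only for this example:

```python
import math


def cents(f1, f2):
    """Signed interval from frequency f2 up to frequency f1, in cents."""
    return 1200.0 * math.log2(f1 / f2)


print(cents(880.0, 440.0))          # one octave
print(cents(2 ** (1 / 12), 1.0))    # one equal-tempered semitone
# A pure fifth (ratio 3:2) against the equal-tempered fifth (7 semitones):
print(cents(1.5, 2 ** (7 / 12)))    # roughly 2 cents, well below the 10-cent limit
```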
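Real-world tones, in contrast, are described by their partials and overtones (harmonics).
For example, if the fundamental frequency $\omega$ is 65.4 Hz (C2), its harmonic series has the frequencies $\omega, 2\omega, 3\omega, 4\omega, \ldots$ Harmonics at power-of-two multiples lie whole octaves above the fundamental: $\omega$ is C2, $2\omega$ is C3, and $4\omega$ is C4. $3\omega$ is close to G3, a perfect fifth above C3. The table below lists the first 16 harmonics with the nearest equal-tempered pitch and the deviation from it in cents: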
Harmonic | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nearest pitch | $\mathrm{C2}$ | $\mathrm{C3}$ | $\mathrm{G3}$ | $\mathrm{C4}$ | $\mathrm{E4}$ | $\mathrm{G4}$ | $\mathrm{B^b4}$ | $\mathrm{C5}$ | $\mathrm{D5}$ | $\mathrm{E5}$ | $\mathrm{F^\#5}$ | $\mathrm{G5}$ | $\mathrm{A^b5}$ | $\mathrm{B^b5}$ | $\mathrm{B5}$ | $\mathrm{C6}$ |
Deviation (cents) | 0 | 0 | +2 | 0 | -14 | +2 | -31 | 0 | +4 | -14 | -49 | +2 | +41 | -31 | -12 | 0 |
Loudness is the perceptual counterpart of sound **intensity**; the range of loudness is referred to as dynamics (the volume range).
Sound power is the energy per unit time that a sound source emits into the air, and sound intensity is the sound power per unit area, measured in $W/m^2$. The lowest sound intensity that humans can perceive is called the threshold of hearing (TOH):
$$I_{TOH} = 10^{-12}\ W/m^2$$
The highest sound intensity that humans can tolerate is called the threshold of pain (TOP):
$$I_{TOP} = 10\ W/m^2$$
In practice, sound intensity is measured in decibels:
$$dB(I) = 10 \cdot \log_{10}\left(\frac{I}{I_{TOH}}\right)$$
On this scale $I_{TOH}$ corresponds to 0 dB and $I_{TOP}$ to 130 dB, and doubling the intensity adds about 3 dB: $dB(2I) - dB(I) \approx 3$.
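The decibel values quoted above can be checked numerically. This is a small sketch using the two thresholds from the text; the names `I_TOH`, `I_TOP` and `intensity_to_db` are chosen for the example.

```python
import math

I_TOH = 1e-12  # threshold of hearing in W/m^2
I_TOP = 10.0   # threshold of pain in W/m^2


def intensity_to_db(intensity):
    """Sound intensity in W/m^2 -> decibels relative to the threshold of hearing."""
    return 10.0 * math.log10(intensity / I_TOH)


print(intensity_to_db(I_TOH))                           # about 0 dB
print(intensity_to_db(I_TOP))                           # about 130 dB
print(intensity_to_db(2e-6) - intensity_to_db(1e-6))    # about 3 dB per doubling
```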
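import math

def pitch_sharp():
    """Pitch-class names spelled with sharps, indexed by semitones above C."""
    return 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'

def pitch_flat():
    """Pitch-class names spelled with flats, indexed by semitones above C."""
    return 'C', 'Db', 'D', 'Eb', 'E', 'F', 'Gb', 'G', 'Ab', 'A', 'Bb', 'B'

def to_pitch(num):
    """MIDI note number -> set of enharmonic names, e.g. 70 -> {'A#4', 'Bb4'}."""
    index, octave = num % 12, str(num // 12 - 1)
    return {pitch_sharp()[index] + octave, pitch_flat()[index] + octave}

def check_pitch(pitch):
    """Split a name such as 'Bb4' into (pitch-class index, octave string)."""
    if pitch[1] == '#':
        return pitch_sharp().index(pitch[:2]), pitch[2:]
    elif pitch[1] == 'b':
        return pitch_flat().index(pitch[:2]), pitch[2:]
    else:
        return pitch_sharp().index(pitch[:1]), pitch[1:]

def to_num(pitch):
    """Pitch name -> MIDI note number, e.g. 'A4' -> 69."""
    index, octave = check_pitch(pitch)
    return (int(octave) + 1) * 12 + index

def cent_round(cent):
    """Deviation in cents from the nearest equal-tempered semitone."""
    return round(cent % 100 if cent % 100 < 50 else cent % 100 - 100)

def gen_harmonic(pitch, n):
    """Yield (cent deviation, nearest pitch names) for harmonics 1..n of pitch."""
    for i in range(1, n + 1):
        diff_cent_ = math.log2(i) * 1200  # interval above the fundamental in cents
        yield cent_round(diff_cent_), to_pitch(round(diff_cent_ / 100) + to_num(pitch))

for diff, names in gen_harmonic('Bb4', 16):
    print('diff:', diff, ', pitch:', names)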
# gen_harmonic('Bb4', 16) output:
# diff: 0 , pitch: {'A#4', 'Bb4'}
# diff: 0 , pitch: {'Bb5', 'A#5'}
# diff: 2 , pitch: {'F6'}
# diff: 0 , pitch: {'A#6', 'Bb6'}
# diff: -14 , pitch: {'D7'}
# diff: 2 , pitch: {'F7'}
# diff: -31 , pitch: {'G#7', 'Ab7'}
# diff: 0 , pitch: {'A#7', 'Bb7'}
# diff: 4 , pitch: {'C8'}
# diff: -14 , pitch: {'D8'}
# diff: -49 , pitch: {'E8'}
# diff: 2 , pitch: {'F8'}
# diff: 41 , pitch: {'Gb8', 'F#8'}
# diff: -31 , pitch: {'G#8', 'Ab8'}
# diff: -12 , pitch: {'A8'}
# diff: 0 , pitch: {'Bb8', 'A#8'}
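import math

def to_freq(pitch):
    """Assumed helper (missing from the notes): equal-tempered frequency in Hz, A4 = 440."""
    return 440 * 2 ** ((to_num(pitch) - 69) / 12)

def diff_cent(w1, w2):
    """Interval from frequency w2 up to frequency w1, in cents."""
    return 1200 * math.log2(w1 / w2)

def pythagorean(new_freq, freq, idx):
    """Go up a pure fifth (x3/2); use x3/4 instead if that would leave the octave."""
    return 1.5 * new_freq if 1.5 * new_freq / freq < 2 else 0.75 * new_freq

def gen_tuning(pitch, func):
    """Apply func 12 times starting at pitch; yield (tuned ratio, nearest pitch names,
    equal-tempered ratio, cent deviation) after each step."""
    freq_ = to_freq(pitch)
    new_freq_ = freq_
    for i in range(1, 13):
        new_freq_ = func(new_freq_, freq_, i)
        diff_cent_ = diff_cent(new_freq_, freq_)
        new_pitch_ = to_pitch(round(diff_cent_ / 100) + to_num(pitch))
        yield new_freq_ / freq_, new_pitch_, to_freq(tuple(new_pitch_)[0]) / freq_, cent_round(diff_cent_)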
# gen_tuning('C2', pythagorean) output:
# pythagorean ratio: 1.5 pitch: {'G2'} frequency ratio: 1.4983070768766817 diff cent: 2
# pythagorean ratio: 1.125 pitch: {'D2'} frequency ratio: 1.1224620483093728 diff cent: 4
# pythagorean ratio: 1.6875000000000002 pitch: {'A2'} frequency ratio: 1.681792830507429 diff cent: 6
# pythagorean ratio: 1.265625 pitch: {'E2'} frequency ratio: 1.2599210498948734 diff cent: 8
# pythagorean ratio: 1.8984375 pitch: {'B2'} frequency ratio: 1.887748625363387 diff cent: 10
# pythagorean ratio: 1.423828125 pitch: {'Gb2', 'F#2'} frequency ratio: 1.414213562373095 diff cent: 12
# pythagorean ratio: 1.06787109375 pitch: {'C#2', 'Db2'} frequency ratio: 1.0594630943592953 diff cent: 14
# pythagorean ratio: 1.6018066406250002 pitch: {'G#2', 'Ab2'} frequency ratio: 1.5874010519681994 diff cent: 16
# pythagorean ratio: 1.2013549804687502 pitch: {'Eb2', 'D#2'} frequency ratio: 1.189207115002721 diff cent: 18
# pythagorean ratio: 1.8020324707031254 pitch: {'Bb2', 'A#2'} frequency ratio: 1.7817974362806788 diff cent: 20
# pythagorean ratio: 1.3515243530273442 pitch: {'F2'} frequency ratio: 1.3348398541700344 diff cent: 22
# pythagorean ratio: 1.013643264770508 pitch: {'C2'} frequency ratio: 1.0 diff cent: 23
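The Pythagorean comma is therefore about 23 cents: after twelve pure fifths the sequence returns to C, but about 23 cents sharp.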
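The comma can also be computed exactly as the ratio between twelve pure fifths and seven octaves. The following sketch uses Python's `fractions` module to avoid rounding; the variable names are chosen for this example.

```python
from fractions import Fraction
import math

# Twelve pure fifths (3/2 each) compared with seven octaves (2 each).
comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
comma_cents = 1200 * math.log2(float(comma))

print(comma)        # 531441/524288
print(comma_cents)  # about 23.46 cents
```

The exact ratio $3^{12}/2^{19}$ matches the final ratio 1.0136... in the output above.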
def chinese_harmonic(new_freq, freq, idx):
    """Sanfen Sunyi step: alternate x3/2 and x3/4, with the order flipped after step 6."""
    return 1.5 * new_freq if (idx % 2 != 0) ^ (idx > 6) else 0.75 * new_freq
# gen_tuning('C2', chinese_harmonic) output:
# chinese ratio: 1.5 pitch: {'G2'} frequency ratio: 1.4983070768766817 diff cent: 2
# chinese ratio: 1.125 pitch: {'D2'} frequency ratio: 1.1224620483093728 diff cent: 4
# chinese ratio: 1.6875000000000002 pitch: {'A2'} frequency ratio: 1.681792830507429 diff cent: 6
# chinese ratio: 1.265625 pitch: {'E2'} frequency ratio: 1.2599210498948734 diff cent: 8
# chinese ratio: 1.8984375 pitch: {'B2'} frequency ratio: 1.887748625363387 diff cent: 10
# chinese ratio: 1.423828125 pitch: {'F#2', 'Gb2'} frequency ratio: 1.414213562373095 diff cent: 12
# chinese ratio: 1.06787109375 pitch: {'Db2', 'C#2'} frequency ratio: 1.0594630943592953 diff cent: 14
# chinese ratio: 1.6018066406250002 pitch: {'Ab2', 'G#2'} frequency ratio: 1.5874010519681994 diff cent: 16
# chinese ratio: 1.2013549804687502 pitch: {'Eb2', 'D#2'} frequency ratio: 1.189207115002721 diff cent: 18
# chinese ratio: 1.8020324707031254 pitch: {'A#2', 'Bb2'} frequency ratio: 1.7817974362806788 diff cent: 20
# chinese ratio: 1.3515243530273442 pitch: {'F2'} frequency ratio: 1.3348398541700344 diff cent: 22
# chinese ratio: 2.027286529541016 pitch: {'C3'} frequency ratio: 2.0 diff cent: 23
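The results show that Pythagorean tuning (generating pitches by stacking fifths) and the Chinese Sanfen Sunyi method (三分损益法) produce the same tuning; only the final note lands an octave higher (C3 instead of C2).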