WAV file format reference 1
WAV file format reference 2
Summary of the key points to understand
Wave Files Store Audio Data
....
Currently, the WaveFile gem supports these sample formats:
Integer PCM at 8, 16, 24, or 32 bits per sample (format code 1)
Floating point PCM at 32 or 64 bits per sample (format code 3)
The formats above inside a WAVE_FORMAT_EXTENSIBLE container (format code 65534)
Format Code
Indicates how the sample data for the wave file is stored. The most common format is integer PCM, which has a code of 1. Other formats include floating point PCM (3), ADPCM (2), A-law (6), μ-law (7), and WaveFormatExtensible (65534).
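The format codes listed above can be kept in one place in code. The following is a minimal sketch; the names `WavFormatCode` and `formatCodeName` are made up for illustration and do not come from any library.

```cpp
#include <cstdint>
#include <string>

// Format codes from the list above. The enum name is illustrative only.
enum class WavFormatCode : std::uint16_t {
    IntegerPcm = 1,
    Adpcm      = 2,
    FloatPcm   = 3,
    ALaw       = 6,
    MuLaw      = 7,
    Extensible = 65534,
};

// Returns a readable name for a raw format code read from a fmt chunk.
std::string formatCodeName(std::uint16_t code) {
    switch (static_cast<WavFormatCode>(code)) {
        case WavFormatCode::IntegerPcm: return "integer PCM";
        case WavFormatCode::Adpcm:      return "ADPCM";
        case WavFormatCode::FloatPcm:   return "floating point PCM";
        case WavFormatCode::ALaw:       return "A-law";
        case WavFormatCode::MuLaw:      return "mu-law";
        case WavFormatCode::Extensible: return "WaveFormatExtensible";
    }
    return "unknown";  // any code not listed above
}
```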
Number of channels
Typically a file will have 1 channel (mono) or 2 channels (stereo). A 5.1 surround sound file will have 6 channels.
Sample rate
The number of sample frames that occur each second. A typical value would be 44,100, which is the same as an audio CD.
Bytes per second (a.k.a. byte rate)
The spec calls this byte rate, which means the number of bytes required for one second of audio data. This is equal to the bytes per sample frame times the sample rate. So with a bytes per sample frame of 4, and a sample rate of 44,100, this should equal 176,400.
Bytes per sample frame (a.k.a. block align)
Called block align by the spec, this is the number of bytes required to store a single sample frame, i.e. a single sample for each channel. (Sometimes a sample frame is also referred to as a block). Each sample, and the sample frame as a whole, must be a whole number of bytes (i.e. it must be a multiple of 8 bits). For example, a sample frame size of 16 bits is valid, but 12 bits is not.
To calculate this value, round the bits per sample up to the next multiple of 8 (if necessary), divide by 8, then multiply by the number of channels. For example, 16-bit stereo gives 16 / 8 * 2 = 4 bytes per sample frame.
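The two calculations described above (block align and byte rate) can be sketched as follows. The function names are chosen here for illustration only.

```cpp
#include <cstdint>

// Bytes per sample frame: round bits per sample up to a whole number of
// bytes, then multiply by the number of channels.
std::uint16_t blockAlign(std::uint16_t bitsPerSample, std::uint16_t channels) {
    const std::uint16_t bytesPerSample =
        static_cast<std::uint16_t>((bitsPerSample + 7) / 8);
    return static_cast<std::uint16_t>(bytesPerSample * channels);
}

// Bytes per second: bytes per sample frame times the sample rate.
std::uint32_t byteRate(std::uint32_t sampleRate,
                       std::uint16_t bitsPerSample,
                       std::uint16_t channels) {
    return sampleRate * blockAlign(bitsPerSample, channels);
}
```

For 44,100 Hz 16-bit stereo this gives 176,400 bytes per second, i.e. 1,411,200 bits per second, which is the "1411 kb/s" that ffprobe reports for such a file.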
Bits per sample
For integer PCM data, typical values will be 8, 16, or 32. If the sample format doesn’t require this field, it should be set to 0.
Integer PCM Data Chunk
Format code: 1
This is the most common format, and consists of raw PCM samples as integers. The bits per sample field will indicate the range of the samples:
8 bits: 0 to 255 (unsigned)
16 bits: -32,768 to 32,767
24 bits: -8,388,608 to 8,388,607
32 bits: -2,147,483,648 to 2,147,483,647
Important! Notice that 8-bit samples are unsigned, while larger bit depths are signed.
Samples in a multi-channel PCM wave file are interleaved. That is, in a stereo file, one sample for the left channel will be followed by one sample for the right channel, followed by another sample for the left channel, then right channel, and so forth.
One set of interleaved samples is called a sample frame (also called a block). A sample frame will contain one sample for each channel. In a monophonic file, a sample frame will consist of 1 sample. In a stereo file, a sample frame has 2 samples (one for the left channel, one for the right channel). In a 5-channel file, a sample frame has 5 samples. The bytes per sample frame field in the format chunk gives the size in bytes of each sample frame. This can be useful when seeking to a particular sample frame in the file.
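For example, for a 2 channel file with 16-bit PCM samples, each sample frame is 4 bytes: a 2-byte sample for the left channel followed by a 2-byte sample for the right channel.
In short: samples in a multi-channel PCM wave file are interleaved. Each sample frame holds one left-channel sample and one right-channel sample, and the sequence of sample frames makes up the PCM data.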
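To make the interleaving concrete, here is a small sketch for a 16-bit stereo file. `interleaveStereo` and `frameByteOffset` are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaves two channels into L R L R ... order, one sample frame per pair.
// If the channels differ in length, the extra samples are dropped.
std::vector<std::int16_t> interleaveStereo(const std::vector<std::int16_t>& left,
                                           const std::vector<std::int16_t>& right) {
    const std::size_t frames = left.size() < right.size() ? left.size() : right.size();
    std::vector<std::int16_t> out;
    out.reserve(frames * 2);
    for (std::size_t i = 0; i < frames; ++i) {
        out.push_back(left[i]);   // left-channel sample of frame i
        out.push_back(right[i]);  // right-channel sample of frame i
    }
    return out;
}

// Byte offset of a sample frame, counted from the start of the audio data.
// bytesPerFrame is the "bytes per sample frame" (block align) field.
std::size_t frameByteOffset(std::size_t frameIndex, std::size_t bytesPerFrame) {
    return frameIndex * bytesPerFrame;
}
```

With 4 bytes per sample frame, frame 10 starts 40 bytes into the audio data, which is why the block align field is handy for seeking.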
Floating Point PCM Data Chunk
Format code: 3
Alternatively, PCM samples can be stored as floating point values. This is essentially the same as integer PCM format (i.e. format code 1), except that samples are in the range -1.0 to 1.0. The bits per sample field should be set to 32 or 64 to indicate the precision of the values. Sample frames should be laid out in the same way as described in the “Integer PCM Data Chunk” section above.
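Going from floating point samples to 16-bit integer samples means mapping the range -1.0 to 1.0 onto the 16-bit integer range. The sketch below uses one common convention (clamp, then scale by 32767); this is an assumption made for illustration, and other tools may scale or round differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Converts one float sample in [-1.0, 1.0] to a signed 16-bit sample.
// Out-of-range input is clamped first so it cannot overflow.
std::int16_t floatToInt16(float sample) {
    const float clamped = std::max(-1.0f, std::min(1.0f, sample));
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}
```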
Worth noting:
Inspect any WAV file with ffprobe, for example the file produced by the PCM-to-WAV conversion in the steps below:
ffprobe record_to_pcm.wav
Input #0, wav, from 'record_to_pcm.wav':
Metadata:
encoder : Lavf58.45.100
Duration: 00:00:04.06, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
The pcm_s16le ([1][0][0][0] / 0x0001) shown here corresponds to format code 1. On the Mac the recording default is pcm_f32le, a floating point format, so the matching format code is 3.
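WAV file layout:
The file starts with a 44-byte header, followed immediately by the audio data (e.g. PCM data).
- NumChannels: number of channels
- SampleRate: sample rate (Hz)
- ByteRate: bytes per second (Byte/s)
- BitsPerSample: bits per sample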
Every chunk consists of 3 parts:
id: the chunk identifier
data size: the size of the chunk's data, in bytes
data: the chunk's data
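The whole WAV file is one RIFF chunk, whose data consists of 3 parts:
- format: the file type (WAVE)
- fmt chunk: the chunk for the audio parameters; its data holds the sample rate, number of channels, bit depth, and so on
- data chunk: the chunk for the audio itself; its data is the actual audio data (e.g. PCM data)
Everything in the RIFF chunk other than the data chunk's data (the audio data) can be called the WAV file header, which is usually 44 bytes.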
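The chunk layout described above (id, data size, data) can be walked with a few lines of code. This is a minimal sketch that works on a file already loaded into memory; `ChunkInfo`, `readLE32`, and `listChunks` are names invented here. Sizes are stored little-endian, and RIFF pads each chunk's data to an even length.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ChunkInfo {
    std::string id;          // four-character chunk id, e.g. "fmt " or "data"
    std::uint32_t size;      // size of the chunk's data in bytes
    std::size_t dataOffset;  // offset of the chunk's data from the start of the file
};

// Reads a little-endian 32-bit value starting at pos.
std::uint32_t readLE32(const std::vector<std::uint8_t>& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

// Lists the chunks inside the RIFF chunk. Returns an empty list if the
// buffer does not start with "RIFF" <size> "WAVE".
std::vector<ChunkInfo> listChunks(const std::vector<std::uint8_t>& file) {
    std::vector<ChunkInfo> chunks;
    if (file.size() < 12 ||
        std::string(file.begin(), file.begin() + 4) != "RIFF" ||
        std::string(file.begin() + 8, file.begin() + 12) != "WAVE") {
        return chunks;
    }
    std::size_t pos = 12;  // skip "RIFF", the RIFF size, and "WAVE"
    while (pos + 8 <= file.size()) {
        ChunkInfo c;
        c.id = std::string(file.begin() + pos, file.begin() + pos + 4);
        c.size = readLE32(file, pos + 4);
        c.dataOffset = pos + 8;
        chunks.push_back(c);
        pos = c.dataOffset + c.size + (c.size & 1);  // data is padded to an even length
    }
    return chunks;
}
```

In a file with the plain 44-byte header, the data chunk's `dataOffset` comes out as 44. A file that carries extra chunks, such as metadata, has its audio data starting later, so looking up the data chunk is safer than assuming a fixed offset.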
Converting PCM to WAV with the ffmpeg command line
ffmpeg -ar 44100 -ac 2 -f f32le -i record_to_pcm.pcm record_to_pcm.wav
[f32le @ 0x7fedce80de00] Estimating duration from bitrate, this may be inaccurate
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, f32le, from 'record_to_pcm.pcm':
Duration: 00:00:04.06, bitrate: 2822 kb/s
Stream #0:0: Audio: pcm_f32le, 44100 Hz, stereo, flt, 2822 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'record_to_pcm.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 700kB time=00:00:04.06 bitrate=1411.4kbits/s speed= 722x
video:0kB audio:700kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.010882%
Inspecting the file with the ffprobe command line
ffprobe record_to_pcm.wav
Input #0, wav, from 'record_to_pcm.wav':
Metadata:
encoder : Lavf58.45.100
Duration: 00:00:04.06, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
Detailed file information
Size: 716878 bytes
Binary contents
Playback (normal)
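Writing the WAV header in code
/*
 * Review (typical sizes; only char is fixed by the language standard):
 * short occupies 2 bytes;
 * int occupies 4 bytes;
 * long occupies 4 bytes on Windows, but 8 bytes on 64-bit macOS and Linux;
 * float occupies 4 bytes;
 * double occupies 8 bytes;
 * char occupies 1 byte.
 *
 * uint8_t   unsigned 1-byte integer
 * uint16_t  unsigned 2-byte integer
 * uint32_t  unsigned 4-byte integer
 * uint64_t  unsigned 8-byte integer
 */
#include <cstdint>

#define AUDIO_FORMAT_PCM 1
#define AUDIO_FORMAT_FLOAT 3
// Note: the WAV file that ffmpeg produced above is encoded as
//       Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
// Note: if the machine records f32le data, that data will not play back correctly under an
//       s16le header. Either resample the data, or describe it as f32le (format code 3).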
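// The WAV header as a struct.
// A named struct is used because default member initializers are not allowed
// in an unnamed "typedef struct { ... }" in standard C++.
struct WAVHeader {
    // RIFF chunk
    uint8_t riffChunkID[4] = {'R','I','F','F'};
    uint32_t riffChunkSize;
    // RIFF data: the file type
    uint8_t format[4] = {'W','A','V','E'};
    // fmt chunk
    uint8_t fmtChunkID[4] = {'f','m','t',' '};
    uint32_t fmtChunkSize = 16;
    // Audio format code (1 = integer PCM, 3 = floating point PCM)
    uint16_t audioFormat = AUDIO_FORMAT_FLOAT;
    // Number of channels
    uint16_t numChannel;
    // Sample rate
    uint32_t sampleRate;
    // Byte rate (bytes per second)
    uint32_t byteRate;
    // Bytes per sample frame
    uint16_t blockAlign;
    // Bit depth
    uint16_t bitsPerSample;
    // data chunk
    uint8_t dataChunkID[4] = {'d','a','t','a'};
    uint32_t dataChunkSize;
};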
class WAVhander
{
public:
    static void pcm_To_wav(WAVHeader &header,
                           char *pcmFile,
                           char *wavFile);
};
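Conversion
#include <QDebug>
#include <QFile>

// For the WAV format, see the Microsoft documentation: https://docs.microsoft.com/en-us/previous-versions/windows/desktop/bb280526(v=vs.85)
void WAVhander::pcm_To_wav(WAVHeader &header,
                           char *pcmFileName,
                           char *wavFileName) {
    // blockAlign is the smallest atomic unit of data for this format, in bytes.
    // It must equal numChannel * bitsPerSample / 8 (8 bits per byte); the formula is fixed.
    // Software must process a whole number of blockAlign-sized blocks at a time, and data
    // written to or read from a device must always start at the beginning of a block.
    header.blockAlign = (header.bitsPerSample * header.numChannel) >> 3;
    // Byte rate (bytes/s) = sample rate * blockAlign
    header.byteRate = header.sampleRate * header.blockAlign;

    // Open the PCM file
    QFile pcmFile(pcmFileName);
    if (!pcmFile.open(QFile::ReadOnly)) {
        qDebug() << "Failed to open the PCM file" << pcmFileName;
        return;
    }

    // Audio data size = PCM file size (the 32-bit field limits this to under 4 GiB)
    header.dataChunkSize = static_cast<uint32_t>(pcmFile.size());
    // RIFF chunk size = audio data size + header size - riffChunkID - riffChunkSize
    header.riffChunkSize = header.dataChunkSize
                           + sizeof(WAVHeader)
                           - sizeof(header.riffChunkID)
                           - sizeof(header.riffChunkSize);

    // Open the WAV file
    QFile wavFile(wavFileName);
    if (!wavFile.open(QFile::WriteOnly)) {
        qDebug() << "Failed to open the WAV file" << wavFileName;
        pcmFile.close();
        return;
    }

    // Write the header first. The struct is written as-is, which relies on WAVHeader
    // having no padding (sizeof is 44) and on a little-endian host.
    wavFile.write((const char *)&header, sizeof(WAVHeader));

    // Then copy the audio data
    char buffer[1024];
    qint64 size;
    while ((size = pcmFile.read(buffer, sizeof(buffer))) > 0) {
        wavFile.write(buffer, size);
    }
    qDebug() << "Conversion finished";
    pcmFile.close();
    wavFile.close();
}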
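Writing the struct straight to the file works when the struct has no padding and the machine is little-endian. As an alternative, the 44 header bytes can be built one field at a time, which does not depend on either. The sketch below is only an illustration in standard C++ (no Qt); `buildWavHeader` is a hypothetical helper, not part of the code above, and it assumes bitsPerSample is a multiple of 8.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Builds the 44-byte header (RIFF + fmt + data chunk headers) field by field,
// writing every number in little-endian byte order.
std::array<std::uint8_t, 44> buildWavHeader(std::uint16_t audioFormat,
                                            std::uint16_t numChannels,
                                            std::uint32_t sampleRate,
                                            std::uint16_t bitsPerSample,
                                            std::uint32_t dataSize) {
    std::array<std::uint8_t, 44> h{};
    std::size_t pos = 0;
    auto putTag = [&](const char* tag) {
        for (int i = 0; i < 4; ++i) h[pos++] = static_cast<std::uint8_t>(tag[i]);
    };
    auto put16 = [&](std::uint16_t v) {
        h[pos++] = static_cast<std::uint8_t>(v & 0xFF);
        h[pos++] = static_cast<std::uint8_t>((v >> 8) & 0xFF);
    };
    auto put32 = [&](std::uint32_t v) {
        for (int shift = 0; shift < 32; shift += 8)
            h[pos++] = static_cast<std::uint8_t>((v >> shift) & 0xFF);
    };

    const std::uint16_t blockAlign =
        static_cast<std::uint16_t>(numChannels * (bitsPerSample / 8));

    putTag("RIFF");
    put32(36 + dataSize);           // everything after the first 8 bytes
    putTag("WAVE");
    putTag("fmt ");
    put32(16);                      // size of the fmt chunk's data
    put16(audioFormat);             // 1 = integer PCM, 3 = floating point PCM
    put16(numChannels);
    put32(sampleRate);
    put32(sampleRate * blockAlign); // byte rate
    put16(blockAlign);
    put16(bitsPerSample);
    putTag("data");
    put32(dataSize);                // size of the audio data that follows
    return h;
}
```

The returned array can be written to the output file in place of the struct, followed by the PCM data as before.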
Calling the function
WAVHeader header;
header.numChannel = 2;
header.sampleRate = 44100;
header.bitsPerSample = 32; // 32-bit float samples (f32le), matching audioFormat = AUDIO_FORMAT_FLOAT
WAVhander::pcm_To_wav(header,
                      (char *)"/Users/lumi/Desktop/record_to_pcm.pcm",
                      (char *)"/Users/lumi/Desktop/pcm_To_wav.wav" );
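The converted file
Size: 1433644 bytes
That is far from the 716878 bytes that ffmpeg produced.
Even doubling the ffmpeg size leaves a gap: 716878 * 2 = 1433756, and 1433756 - 1433644 = 112 bytes. The gap is non-audio overhead. The ffmpeg log reports 700kB (716800 bytes) of audio, so its file carries 716878 - 716800 = 78 bytes of header and metadata (it also stores the ISFT encoder tag), while the file written here has only the 44-byte header on top of 1433600 bytes of audio: 2 * 78 - 44 = 112.
Comparison
The file is about twice as large because the bit depth went from 16 to 32: the data recorded on the Mac is f32le, so it is written as-is with format code 3 rather than converted to s16le.
Writing the same data with format code 1 on the Mac produces noise; try it to hear the difference.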
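A likely reason for the noise (an inference, not something verified in this post): with format code 1 the player reads each 4-byte float as a 32-bit integer, and the bit pattern of a float has little to do with its value. The sketch below shows this, assuming the usual 32-bit IEEE 754 float; `floatBitsAsInt32` is a name made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw bits of a float reinterpreted as a signed 32-bit integer,
// which is what a player would see if float data were labeled as integer PCM.
std::int32_t floatBitsAsInt32(float sample) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "assumes a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &sample, sizeof bits);  // well-defined way to copy the bits
    return bits;
}
```

Under this assumption a sample of 0.5 turns into 0x3F000000, close to half of the 32-bit full scale, and even a very quiet sample such as 0.001 maps to a similarly large integer, so the waveform is no longer recognizable.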
demo: 3_record_pcm_to_wav