An MP3 file is stored frame by frame, and decoding also proceeds one frame at a time, so we should first understand how the data inside an MP3 frame is laid out. The frame structure is shown in the figure below:
In the figure, sync is the frame sync word, with which every frame begins; side info is the frame side information; main_data_end is a member of the side-information structure.
Each frame stores, in order, the header, the side information, and the main information. The main information consists of the scale factors and the main data.
Each frame's header is 4 bytes long. If protection_bit in the header is zero, the header is followed by 2 bytes of cyclic redundancy check (CRC) data. The header was covered in an earlier installment, so it is not repeated here.
The computation of the side-information length is shown at line 207 of Header.java in (2) Writing an MP3 Decoder in Java: Decoding the Frame Header.
The computation of the main-information length is shown at line 218 of Header.java in (2) Writing an MP3 Decoder in Java: Decoding the Frame Header.
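Both computations follow directly from the standard: the side information occupies 17 bytes (mono) or 32 bytes (stereo) for MPEG 1, and 9 or 17 bytes for MPEG 2/2.5; the main information is whatever remains of the frame after the header, the optional CRC, and the side information. A minimal sketch of the arithmetic (the class and method names here are illustrative, not the ones used in Header.java):

```java
class FrameLayout {
    /** Side-info size in bytes: MPEG1 mono=17, stereo=32; MPEG2/2.5 mono=9, stereo=17. */
    static int sideInfoSize(boolean isMPEG1, int channels) {
        if (isMPEG1)
            return (channels == 1) ? 17 : 32;
        return (channels == 1) ? 9 : 17;
    }

    /** Main-info bytes = frame length - 4-byte header - optional 2-byte CRC - side info. */
    static int mainDataSize(int frameSize, boolean isMPEG1, int channels,
                            boolean crcPresent) {
        return frameSize - 4 - (crcPresent ? 2 : 0) - sideInfoSize(isMPEG1, channels);
    }
}
```

For example, a 417-byte MPEG 1 stereo frame without CRC leaves 417 - 4 - 32 = 381 bytes for the main information.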
As the figure shows, a frame's main information is not necessarily contained within that frame itself: part of it may be stored in the previous frame, and part may be stored in the following frame. For example, the main information of frame 3 in the figure is spread across three adjacent frames, while frame 1 holds main data belonging to three adjacent frames. This way, frames whose own main information is very small (such as frame 1 in the figure) need not be made overly short, which improves storage efficiency.
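The mechanism behind this is commonly called the bit reservoir: each frame's side information carries a main_data_begin field, a backward byte offset telling the decoder how far before the current frame's own main-data bytes its main information actually starts. A toy model of the bookkeeping, assuming a simple append-only buffer (the class and method names are made up for illustration):

```java
class BitReservoir {
    private final byte[] buf = new byte[4096]; // holds recent main-data bytes
    private int writePos = 0;                  // next free position in buf

    /**
     * Append the main-data bytes carried inside the current frame and return
     * the offset in buf where THIS frame's main information begins: it starts
     * mainDataBegin bytes before the bytes we are about to append.
     */
    int feed(byte[] frameMainData, int mainDataBegin) {
        int decodeStart = writePos - mainDataBegin;
        System.arraycopy(frameMainData, 0, buf, writePos, frameMainData.length);
        writePos += frameMainData.length;
        return decodeStart;
    }
}
```

With main_data_begin = 0 the frame's main information starts at its own bytes; a nonzero value points back into bytes carried by earlier frames.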
For a constant-bit-rate (CBR) MP3, every frame is almost the same length; a frame is one byte longer when its padding bit is set. For an average-bit-rate (ABR) MP3, frame lengths are also nearly equal. For a variable-bit-rate (VBR) MP3, frame lengths differ, sometimes considerably, but the longest frame never exceeds 1732 bytes.
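The frame length follows the standard Layer III formula: 144 × bitrate ÷ sampling rate for MPEG 1 (the factor is 72 for MPEG 2/2.5), plus one byte when the padding bit is set. A quick sketch:

```java
class FrameSize {
    /**
     * Layer III frame length in bytes. bitrate in bit/s, sampleRate in Hz,
     * padding is 0 or 1 (the header's padding bit).
     */
    static int frameSize(boolean isMPEG1, int bitrate, int sampleRate, int padding) {
        int factor = isMPEG1 ? 144 : 72;
        return factor * bitrate / sampleRate + padding;
    }
}
```

For instance, 128 kbit/s at 44.1 kHz gives 144 × 128000 / 44100 = 417 bytes (418 with padding), which is why the frames of a CBR file alternate between two nearly equal sizes.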
The first frame of a VBR-encoded MP3 carries no Huffman-coded main data; instead it stores VBR information. Its TOC field records the correspondence between playing-time percentage and position within the file. Although the frames of a VBR MP3 vary in length, each frame decodes into the same amount of PCM data, i.e., every frame has the same playback duration. The TOC data therefore makes it possible to locate the correct file position when the user drags the playback point along the time axis.
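For reference, the widely used Xing VBR header stores a 100-entry TOC in which entry i gives the file position, scaled to 0..255, of the frame at i percent of total playing time. A hedged sketch of how a player might map a seek percentage to a byte offset, interpolating between adjacent entries as most players do (the class and method names are invented for this example):

```java
class VbrSeek {
    /**
     * Map a seek position (fraction 0.0..1.0 of total playing time) to a byte
     * offset, given the 100-entry TOC (values 0..255) and the file length.
     */
    static long seekOffset(int[] toc, long fileLength, double fraction) {
        double percent = fraction * 100.0;
        int i = Math.min((int) percent, 99);
        double fa = toc[i];
        double fb = (i < 99) ? toc[i + 1] : 256.0;
        double fx = fa + (fb - fa) * (percent - i); // linear interpolation
        return (long) (fx / 256.0 * fileLength);
    }
}
```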
For frames encoded in standard stereo under MPEG 1.0, each frame's main information is divided into two granules, and each granule into two channels. Once the layout of MP3 frame data is clear, the for(gr...) and for(ch...) loops that appear in many places later on are easy to follow.
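That loop structure can be sketched as follows: MPEG 1.0 decodes two granules per frame, MPEG 2.0/2.5 one; the actual per-(granule, channel) work (requantization, stereo processing, IMDCT, ...) is elided, and the method name is made up for illustration:

```java
class GranuleLoop {
    /** Count the (granule, channel) passes one frame needs. MPEG 1 stereo: 2 x 2 = 4. */
    static int decodePasses(boolean isMPEG1, int channels) {
        int maxGr = isMPEG1 ? 2 : 1;
        int passes = 0;
        for (int gr = 0; gr < maxGr; gr++)        // the for(gr...) loop
            for (int ch = 0; ch < channels; ch++) // the for(ch...) loop
                passes++;                         // decode one granule of one channel
        return passes;
    }
}
```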
Layer3.java decodes one frame of MP3 (MPEG 1.0 Audio Layer III, MP3 for short). The data structures it defines and the many constants used to initialize the decoder are as follows:
/*
 * Layer3.java -- decodes Layer III
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */
package decoder;

public final class Layer3 implements ILayer123 {
    private static Header objHeader;
    private static Synthesis objFilter;
    private static BitStream objInBitStream;
    private static int intWhichChannel;
    private static int intMaxGr;
    private static int intChannels;
    private static int intFirstChannel;
    private static int intLastChannel;
    private static int[] intSfbIdxLong;
    private static int[] intSfbIdxShort;
    private static boolean boolIntensityStereo; // true: intensity stereo

    public Layer3(BitStream bs, Header h, Synthesis filter, int wch) {
        objInBitStream = bs;
        objHeader = h;
        intChannels = objHeader.getChannels();
        intWhichChannel = wch;
        intMaxGr = (objHeader.getVersion() == Header.MPEG1) ? 2 : 1;
        objFilter = filter;
        objSI = new SideInfo();
        objHuffBits = new HuffmanBits(objInBitStream);
        objSideBS = new BitStream(36);
        scfL = new int[2][23];
        scfS = new int[2][3][13];
        is = new int[32 * 18 + 4];
        xr = new float[2][32][18];
        intWidthLong = new int[22];
        intWidthShort = new int[13];
        floatRawOut = new float[36];
        floatPrevBlck = new float[2][32][18];
        cs = new float[] { 0.857492925712f, 0.881741997318f, 0.949628649103f,
                0.983314592492f, 0.995517816065f, 0.999160558175f,
                0.999899195243f, 0.999993155067f };
        ca = new float[] { -0.5144957554270f, -0.4717319685650f,
                -0.3133774542040f, -0.1819131996110f, -0.0945741925262f,
                -0.0409655828852f, -0.0141985685725f, -0.00369997467375f };
        int i;

        /*
         * floatPowIS[]: lookup table for v^(4/3), where v is a (positive)
         * Huffman-decoded value in the range 0..8191
         */
        floatPowIS = new float[8207];
        for (i = 0; i < 8207; i++)
            floatPowIS[i] = (float) Math.pow(i, 4.0 / 3.0);

        /*
         * pow_2[] -- lookup table for 2^exp, where exp is the exponent used
         * in long-block requantization
         */
        floatPow2 = new float[256 + 118 + 4];
        for (i = -256; i < 118 + 4; i++)
            floatPow2[i + 256] = (float) Math.pow(2.0, -0.25 * (i + 210));
        // initMDCT();

        if (intChannels == 2)
            switch (intWhichChannel) {
            case Decoder.CH_LEFT:
                intFirstChannel = intLastChannel = 0;
                break;
            case Decoder.CH_RIGHT:
                intFirstChannel = intLastChannel = 1;
                break;
            case Decoder.CH_BOTH:
            default:
                intFirstChannel = 0;
                intLastChannel = 1;
                break;
            }
        else
            intFirstChannel = intLastChannel = 0;

        //---------------------------------------------------------------------
        // Different features of the file being decoded need different
        // variables. Initialization:
        //---------------------------------------------------------------------
        int intSfreq = objHeader.getSampleFrequency();
        intSfreq += (objHeader.getVersion() == Header.MPEG1) ? 0
                : ((objHeader.getVersion() == Header.MPEG2) ? 3 : 6);

        /*
         * ANNEX B, Table 3-B.8. Layer III scalefactor bands
         */
        switch (intSfreq) {
        case 0: /* MPEG 1, sampling_frequency=0, 44.1kHz */
            intSfbIdxLong = new int[] { 0, 4, 8, 12, 16, 20, 24, 30, 36, 44,
                    52, 62, 74, 90, 110, 134, 162, 196, 238, 288, 342, 418,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 16, 22, 30, 40, 52, 66,
                    84, 106, 136, 192 };
            break;
        case 1: /* MPEG 1, sampling_frequency=1, 48kHz */
            intSfbIdxLong = new int[] { 0, 4, 8, 12, 16, 20, 24, 30, 36, 42,
                    50, 60, 72, 88, 106, 128, 156, 190, 230, 276, 330, 384,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 16, 22, 28, 38, 50, 64,
                    80, 100, 126, 192 };
            break;
        case 2: /* MPEG 1, sampling_frequency=2, 32kHz */
            intSfbIdxLong = new int[] { 0, 4, 8, 12, 16, 20, 24, 30, 36, 44,
                    54, 66, 82, 102, 126, 156, 194, 240, 296, 364, 448, 550,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 16, 22, 30, 42, 58, 78,
                    104, 138, 180, 192 };
            break;
        case 3: /* MPEG 2, sampling_frequency=0, 22.05kHz */
            intSfbIdxLong = new int[] { 0, 6, 12, 18, 24, 30, 36, 44, 54, 66,
                    80, 96, 116, 140, 168, 200, 238, 284, 336, 396, 464, 522,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 18, 24, 32, 42, 56, 74,
                    100, 132, 174, 192 };
            break;
        case 4: /* MPEG 2, sampling_frequency=1, 24kHz */
            intSfbIdxLong = new int[] { 0, 6, 12, 18, 24, 30, 36, 44, 54, 66,
                    80, 96, 114, 136, 162, 194, 232, 278, 330, 394, 464, 540,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 18, 26, 36, 48, 62, 80,
                    104, 136, 180, 192 };
            break;
        case 5: /* MPEG 2, sampling_frequency=2, 16kHz */
            intSfbIdxLong = new int[] { 0, 6, 12, 18, 24, 30, 36, 44, 54, 66,
                    80, 96, 116, 140, 168, 200, 238, 284, 336, 396, 464, 522,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 18, 26, 36, 48, 62, 80,
                    104, 134, 174, 192 };
            break;
        case 6: /* MPEG 2.5, sampling_frequency=0, 11.025kHz */
            intSfbIdxLong = new int[] { 0, 6, 12, 18, 24, 30, 36, 44, 54, 66,
                    80, 96, 116, 140, 168, 200, 238, 284, 336, 396, 464, 522,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 18, 26, 36, 48, 62, 80,
                    104, 134, 174, 192 };
            break;
        case 7: /* MPEG 2.5, sampling_frequency=1, 12kHz */
            intSfbIdxLong = new int[] { 0, 6, 12, 18, 24, 30, 36, 44, 54, 66,
                    80, 96, 116, 140, 168, 200, 238, 284, 336, 396, 464, 522,
                    576 };
            intSfbIdxShort = new int[] { 0, 4, 8, 12, 18, 26, 36, 48, 62, 80,
                    104, 134, 174, 192 };
            break;
        case 8: /* MPEG 2.5, sampling_frequency=2, 8kHz */
            intSfbIdxLong = new int[] { 0, 12, 24, 36, 48, 60, 72, 88, 108,
                    132, 160, 192, 232, 280, 336, 400, 476, 566, 568, 570,
                    572, 574, 576 };
            intSfbIdxShort = new int[] { 0, 8, 16, 24, 36, 52, 72, 96, 124,
                    160, 162, 164, 166, 192 };
            break;
        }
        for (i = 0; i < 22; i++)
            intWidthLong[i] = intSfbIdxLong[i + 1] - intSfbIdxLong[i];
        for (i = 0; i < 13; i++)
            intWidthShort[i] = intSfbIdxShort[i + 1] - intSfbIdxShort[i];

        //---------------------------------------------------------------------
        // Intensity stereo:
        boolIntensityStereo = objHeader.isIStereo();
        if (boolIntensityStereo) {
            if (objHeader.getVersion() == Header.MPEG1) // MPEG 1.0
                is_coef = new float[] { 0.0f, 0.211324865f, 0.366025404f,
                        0.5f, 0.633974596f, 0.788675135f, 1.0f };
            else // MPEG 2.0/2.5
                lsf_is_coef = new float[][] {
                        { 0.840896415f, 0.707106781f, 0.594603558f, 0.5f,
                                0.420448208f, 0.353553391f, 0.297301779f,
                                0.25f, 0.210224104f, 0.176776695f,
                                0.148650889f, 0.125f, 0.105112052f,
                                0.088388348f, 0.074325445f },
                        { 0.707106781f, 0.5f, 0.353553391f, 0.25f,
                                0.176776695f, 0.125f, 0.088388348f, 0.0625f,
                                0.044194174f, 0.03125f, 0.022097087f,
                                0.015625f, 0.011048543f, 0.0078125f,
                                0.005524272f } };
        }

        //-----------------------------------------------------------------
        // MPEG 2.0/2.5
        if (objHeader.getVersion() != Header.MPEG1) {
            i_slen2 = new int[256]; // MPEG 2.0 slen for intensity stereo
            n_slen2 = new int[512]; // MPEG 2.0 slen for 'normal' mode
            slen_tab2 = new byte[][][] {
                    { { 6, 5, 5, 5 }, { 6, 5, 7, 3 }, { 11, 10, 0, 0 },
                            { 7, 7, 7, 0 }, { 6, 6, 6, 3 }, { 8, 8, 5, 0 } },
                    { { 9, 9, 9, 9 }, { 9, 9, 12, 6 }, { 18, 18, 0, 0 },
                            { 12, 12, 12, 0 }, { 12, 9, 9, 6 },
                            { 15, 12, 9, 0 } },
                    { { 6, 9, 9, 9 }, { 6, 9, 12, 6 }, { 15, 18, 0, 0 },
                            { 6, 15, 12, 0 }, { 6, 12, 9, 6 },
                            { 6, 18, 9, 0 } } };

            int j, k, l, n;
            for (i = 0; i < 5; i++)
                for (j = 0; j < 6; j++)
                    for (k = 0; k < 6; k++) {
                        n = k + j * 6 + i * 36;
                        i_slen2[n] = i | (j << 3) | (k << 6) | (3 << 12);
                    }
            for (i = 0; i < 4; i++)
                for (j = 0; j < 4; j++)
                    for (k = 0; k < 4; k++) {
                        n = k + j * 4 + i * 16;
                        i_slen2[n + 180] = i | (j << 3) | (k << 6) | (4 << 12);
                    }
            for (i = 0; i < 4; i++)
                for (j = 0; j < 3; j++) {
                    n = j + i * 3;
                    i_slen2[n + 244] = i | (j << 3) | (5 << 12);
                    n_slen2[n + 500] = i | (j << 3) | (2 << 12) | (1 << 15);
                }
            for (i = 0; i < 5; i++)
                for (j = 0; j < 5; j++)
                    for (k = 0; k < 4; k++)
                        for (l = 0; l < 4; l++) {
                            n = l + k * 4 + j * 16 + i * 80;
                            n_slen2[n] = i | (j << 3) | (k << 6) | (l << 9);
                        }
            for (i = 0; i < 5; i++)
                for (j = 0; j < 5; j++)
                    for (k = 0; k < 4; k++) {
                        n = k + j * 4 + i * 20;
                        n_slen2[n + 400] = i | (j << 3) | (k << 6) | (1 << 12);
                    }
        }
    }

    // Code for decoding the side information, etc. ...
}
The parts that must be initialized to different constants depending on the characteristics of the encoded MP3 are set up in the constructor; the values and meanings of these constants can all be looked up in the standard. Many constants and variables are involved, which can be daunting at first sight. Three of the constructor's parameters are instances of class BitStream, class Header, and class Synthesis; these three objects are created before the class Layer3 instance. The decoder reads its input from the bit-stream object (BitStream) and writes its output to the PCM buffer (defined inside class Synthesis); the parameter int wch selects which channel to output.
Decoding turns the encoded MP3 file back into PCM data, which is then sent to the audio hardware for playback. I divide the decoding of one MP3 frame into 10 steps, which will be presented one by one in the following installments; each step also discusses the variables defined in class Layer3. Putting the code for decoding the side information and the other steps into class Layer3 completes a class that encapsulates the decoding of one MP3 frame.
Previous: (5) Writing an MP3 Decoder in Java: Parsing File Information
Next: (7) Writing an MP3 Decoder in Java: Decoding the Side Information
[Program download] http://jmp123.sourceforge.net/