A Brief Overview of MPEG Audio Encoding

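Digital audio needs to be compressed because uncompressed digital audio carries a great deal of redundant information. That redundancy comes from two sources:
1. Redundancy inherent in the audio signal itself. Small-amplitude sample values occur far more often than large-amplitude ones, and neighboring samples are correlated with one another.
2. Redundancy due to the properties of human hearing. Because of masking and related effects, some components are inaudible under certain conditions and do not need to be encoded.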

[Figure 1: block diagram of the MPEG-1 Audio Layer II encoder]

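Start with the block diagram of the MPEG-1 Audio Layer II encoder. The top row is the subband coding path: each frame is split into 32 subbands, and in Layer II each subband carries 36 samples per frame, which are then quantized and coded. The lower path is what drives the compression: a psychoacoustic model and dynamic bit allocation decide how many bits each subband receives.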

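The audio signal passes through a polyphase filter bank that splits each frame into 32 equal-width subbands, and all later coding operates on these subbands. This makes compression possible, and because there are only 32 fairly wide bands, each subband sample spans just 32 input samples, so the time resolution stays relatively high, which helps preserve quality.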
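The frame geometry described above can be checked with a little arithmetic. The sketch below is not taken from the encoder source: the function names and the `BLOCKS` constant are hypothetical, and it ignores the padding slot that some sampling rates require, so treat the bit count as approximate in those cases.

```c
/* Frame geometry of MPEG-1 Audio Layer II (illustrative sketch). */
#define SBLIMIT     32   /* subbands produced by the polyphase filter bank */
#define SCALE_BLOCK 12   /* samples covered by one scale factor */
#define BLOCKS       3   /* scale-factor blocks per subband per frame (hypothetical name) */

/* Samples per channel in one frame: 32 subbands x 36 samples = 1152. */
int samples_per_frame(void)
{
    return SBLIMIT * SCALE_BLOCK * BLOCKS;
}

/* Duration of one frame in milliseconds at the given sampling rate. */
double frame_duration_ms(int sample_rate_hz)
{
    return 1000.0 * samples_per_frame() / sample_rate_hz;
}

/* Bits available per frame at a constant bitrate, ignoring padding. */
long frame_bits(long bitrate_bps, int sample_rate_hz)
{
    return bitrate_bps * samples_per_frame() / sample_rate_hz;
}
```

At 48 kHz, for example, a frame lasts 24 ms, and at 192 kbit/s it holds 4608 bits (576 bytes).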

Two concepts need to be introduced first: the masking effect and the critical band.

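A critical band is defined as follows: when a pure tone is masked by continuous noise of a certain bandwidth centered on the tone's frequency, and the power of the tone at the point where it is just audible equals the noise power within that band, that bandwidth is the critical bandwidth.
The human auditory system behaves roughly like a bank of 25 band-pass filters. In other words, the ear listens to a sound through whichever filter covers the frequency range the sound falls in:
[Figure 2]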
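One common way to put numbers on the critical-band scale is the Bark scale. The expression below is the Zwicker and Terhardt analytic approximation; it does not appear in the original text and is given only as an illustration, with $f$ in Hz and $z$ in Bark.

```latex
z(f) = 13 \arctan\!\left(0.00076\, f\right) + 3.5 \arctan\!\left[\left(\frac{f}{7500}\right)^{2}\right]
```

Over the audible range this runs from 0 to roughly 24 or 25 Bark, which is where the picture of about 25 band-pass filters comes from.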
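Before discussing masking, we need the concept of the threshold of hearing (the minimum audible threshold). In the figure below, the ear perceives the same loudness everywhere along any one curve; that is, perceived loudness depends on both sound intensity level and frequency. Above the red line the sound is painful, and below the blue line it cannot be heard, so the blue line is the threshold of hearing.
[Figure 3: equal-loudness contours, with the pain threshold in red and the threshold of hearing in blue]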

It follows from the figure that any component we are about to encode that falls below the threshold of hearing does not need to be encoded at all.
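For a rough feel of the shape of this curve, Terhardt's widely used approximation of the threshold in quiet can be written as below. It is brought in here purely as an illustration and is not something the original text relies on; $f$ is in Hz and the result is in dB SPL.

```latex
T_q(f) = 3.64 \left(\frac{f}{1000}\right)^{-0.8}
       - 6.5\, e^{-0.6 \left(\frac{f}{1000} - 3.3\right)^{2}}
       + 10^{-3} \left(\frac{f}{1000}\right)^{4}
```

The curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both ends of the audible range. A component whose level is below $T_q(f)$ at its frequency can be dropped.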

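Returning to masking: a strong pure tone masks sounds at nearby frequencies, which means it raises the threshold of hearing around its own frequency. Taking masking into account therefore removes even more of what has to be encoded, as the figure below shows:
[Figure 4: threshold of hearing raised around a masking tone]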

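This is exactly what the psychoacoustic model accounts for. It also has to account for the fact that tonal sounds and noise differ in masking ability: noise is a strong masker, while tonal sounds are weaker maskers:
[Figure 5: masking thresholds for the two kinds of masker, left and right panels]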

As the figure shows, the threshold in the left panel is higher than in the right panel, so more of the signal is masked.

The global masking threshold of each subband is computed as follows:
[Figure 6: computation of the global masking threshold]
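For reference, psychoacoustic model 1 of ISO/IEC 11172-3 forms the global masking threshold at each frequency index $i$ by adding the threshold in quiet and the individual masking thresholds in the power domain. The notation below follows the standard and may differ from the one used in the figure.

```latex
LT_g(i) = 10 \log_{10}\!\left( 10^{LT_q(i)/10}
        + \sum_{j=1}^{m} 10^{LT_{tm}(j,i)/10}
        + \sum_{j=1}^{n} 10^{LT_{nm}(j,i)/10} \right)

LT_{min}(n) = \min_{i \,\in\, \text{subband } n} LT_g(i),
\qquad
SMR(n) = L_{sb}(n) - LT_{min}(n)
```

Here $LT_q$ is the threshold in quiet, $LT_{tm}$ and $LT_{nm}$ are the individual masking thresholds of the $m$ tonal and $n$ non-tonal maskers, and $L_{sb}(n)$ is the sound pressure level in subband $n$, all in dB. The resulting signal-to-mask ratio is what the bit allocation step consumes.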

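With masking accounted for, the next step is quantization and coding. For each subband the encoder computes the mask-to-noise ratio, MNR = SNR - SMR, where SNR is the signal-to-noise ratio and SMR is the signal-to-mask ratio. The encoder then iterates: it finds the subband with the lowest MNR, assigns more bits to it, recomputes the MNR, and repeats until the available bits are used up.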
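The loop described above can be sketched as a simple greedy allocator. This is a simplified illustration, not the encoder's `main_bit_allocation`: the names are hypothetical, the SNR is assumed to grow by about 6.02 dB per bit (the real encoder looks SNR up in a table indexed by quantizer step count), one extra bit per sample is assumed to cost 36 bits, and the bits spent on scale factors and allocation indices are ignored.

```c
#define NUM_SUBBANDS         32
#define SAMPLES_PER_SUBBAND  36
#define MAX_BITS_PER_SAMPLE  16

/* Assumed SNR model: about 6.02 dB per quantizer bit (illustration only). */
static double snr_db(int bits)
{
    return 6.02 * bits;
}

/*
 * Greedy allocation: repeatedly give one more bit per sample to the
 * subband with the lowest MNR = SNR - SMR until the budget runs out.
 * Returns the number of bits left over.
 */
int allocate_bits(const double smr_db[NUM_SUBBANDS],
                  int bits_out[NUM_SUBBANDS],
                  int available_bits)
{
    for (int sb = 0; sb < NUM_SUBBANDS; sb++)
        bits_out[sb] = 0;

    while (available_bits >= SAMPLES_PER_SUBBAND) {
        int worst = -1;
        double worst_mnr = 0.0;

        /* Find the subband with the lowest MNR that can still grow. */
        for (int sb = 0; sb < NUM_SUBBANDS; sb++) {
            if (bits_out[sb] >= MAX_BITS_PER_SAMPLE)
                continue;
            double mnr = snr_db(bits_out[sb]) - smr_db[sb];
            if (worst < 0 || mnr < worst_mnr) {
                worst = sb;
                worst_mnr = mnr;
            }
        }
        if (worst < 0)
            break;  /* every subband is already at the maximum */

        bits_out[worst] += 1;
        available_bits -= SAMPLES_PER_SUBBAND;
    }
    return available_bits;
}
```

Subbands with a high SMR (signal well above the masking threshold) attract bits first, while subbands whose signal is already masked stay at zero bits.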

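Next come the scale factors. Each subband has 36 samples per frame, and one scale factor is computed for every 12 samples. Every sample in the subband is divided by its scale factor to normalize it before quantization.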
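A minimal sketch of this step for one block of 12 samples follows. The function names are hypothetical; the scale factor values are assumed to follow the Layer II table, 2.0 * 2^(-i/3) for indices 0 to 62, and are generated here from three constants rather than copied from the encoder's table.

```c
#define SCALE_BLOCK 12   /* samples covered by one scale factor */
#define SCALE_RANGE 63   /* usable scale factor indices: 0..62 */

/* Scale factor for a given index: 2.0 * 2^(-index/3), computed without libm. */
double scalefactor_value(int index)
{
    static const double third_steps[3] = {
        1.0,                 /* 2^0      */
        0.7937005259840998,  /* 2^(-1/3) */
        0.6299605249474366   /* 2^(-2/3) */
    };
    return 2.0 * third_steps[index % 3] / (double)(1UL << (index / 3));
}

/* Pick the smallest scale factor that still covers the block's peak value. */
int pick_scalefactor_index(const double samples[SCALE_BLOCK])
{
    double peak = 0.0;
    for (int i = 0; i < SCALE_BLOCK; i++) {
        double mag = samples[i] < 0.0 ? -samples[i] : samples[i];
        if (mag > peak)
            peak = mag;
    }
    /* Higher index means smaller scale factor, so scan from the top down. */
    for (int idx = SCALE_RANGE - 1; idx >= 0; idx--)
        if (peak <= scalefactor_value(idx))
            return idx;
    return 0;  /* peak exceeds the largest scale factor: clamp to index 0 */
}

/* Divide every sample by the chosen scale factor so values fall in [-1, 1]. */
void normalize_block(const double in[SCALE_BLOCK], double out[SCALE_BLOCK],
                     int index)
{
    double sf = scalefactor_value(index);
    for (int i = 0; i < SCALE_BLOCK; i++)
        out[i] = in[i] / sf;
}
```

Because a higher index means a smaller scale factor, quiet subbands end up with large indices.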

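In addition, when the quantizer has 3, 5, or 9 levels, grouping is applied: three consecutive samples are coded as a single codeword, which lowers the number of bits needed.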
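The saving from grouping is easy to see in code. The sketch below uses hypothetical function names; it packs three quantized samples x, y, z (each in the range 0 to levels - 1) into one value, x + levels * y + levels^2 * z, which is the base-`levels` encoding that Layer II uses for its 3-, 5-, and 9-level quantizers.

```c
/* Bits for one grouped codeword of three samples; 0 if this level count is not grouped. */
int grouped_codeword_bits(unsigned levels)
{
    switch (levels) {
    case 3:  return 5;   /* 3^3 = 27  values fit in 5 bits  (vs 3 x 2 = 6)  */
    case 5:  return 7;   /* 5^3 = 125 values fit in 7 bits  (vs 3 x 3 = 9)  */
    case 9:  return 10;  /* 9^3 = 729 values fit in 10 bits (vs 3 x 4 = 12) */
    default: return 0;
    }
}

/* Combine three consecutive quantized samples into one codeword. */
unsigned group_samples(unsigned levels, unsigned x, unsigned y, unsigned z)
{
    return x + levels * y + levels * levels * z;
}

/* Inverse operation, as a decoder would perform it. */
void ungroup_samples(unsigned levels, unsigned code, unsigned out[3])
{
    out[0] = code % levels;
    code /= levels;
    out[1] = code % levels;
    out[2] = code / levels;
}
```

For a 3-level quantizer, for instance, the three samples would need 2 bits each (6 in total) when coded separately, but the grouped codeword needs only 5.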

Finally, the data is packed into a frame, as shown below:
[Figure 7: frame structure]

The full code is too long to include; part of the main function is shown below:

int main (int argc, char **argv)
{
  typedef double SBS[2][3][SCALE_BLOCK][SBLIMIT];
  SBS *sb_sample;
  typedef double JSBS[3][SCALE_BLOCK][SBLIMIT];
  JSBS *j_sample;
  typedef double IN[2][HAN_SIZE];
  IN *win_que;
  typedef unsigned int SUB[2][3][SCALE_BLOCK][SBLIMIT];
  SUB *subband;
  frame_info frame;
  frame_header header;
  char original_file_name[MAX_NAME_SIZE];
  char encoded_file_name[MAX_NAME_SIZE];
  short **win_buf;
  static short buffer[2][1152];
  static unsigned int bit_alloc[2][SBLIMIT], scfsi[2][SBLIMIT];
  static unsigned int scalar[2][3][SBLIMIT], j_scale[3][SBLIMIT];
  static double smr[2][SBLIMIT], lgmin[2][SBLIMIT], max_sc[2][SBLIMIT];
  // FLOAT snr32[32];
  short sam[2][1344];  /* was [1056]; */
  int model, nch, error_protection;
  static unsigned int crc;
  int sb, ch, adb;
  unsigned long frameBits, sentBits = 0;
  unsigned long num_samples;
  int lg_frame;
  int i;
  /* Used to keep the SNR values for the fast/quick psy models */
  static FLOAT smrdef[2][32];
  static int psycount = 0;
  extern int minimum;
  time_t start_time, end_time;
  int total_time;
  sb_sample = (SBS *) mem_alloc (sizeof (SBS), "sb_sample");
  j_sample = (JSBS *) mem_alloc (sizeof (JSBS), "j_sample");
  win_que = (IN *) mem_alloc (sizeof (IN), "Win_que");
  subband = (SUB *) mem_alloc (sizeof (SUB), "subband");
  win_buf = (short **) mem_alloc (sizeof (short *) * 2, "win_buf");
  /* clear buffers */
  memset ((char *) buffer, 0, sizeof (buffer));
  memset ((char *) bit_alloc, 0, sizeof (bit_alloc));
  memset ((char *) scalar, 0, sizeof (scalar));
  memset ((char *) j_scale, 0, sizeof (j_scale));
  memset ((char *) scfsi, 0, sizeof (scfsi));
  memset ((char *) smr, 0, sizeof (smr));
  memset ((char *) lgmin, 0, sizeof (lgmin));
  memset ((char *) max_sc, 0, sizeof (max_sc));
  //memset ((char *) snr32, 0, sizeof (snr32));
  memset ((char *) sam, 0, sizeof (sam));
  global_init ();  
  header.extension = 0;
  frame.header = &header;
  frame.tab_num = -1;  /* no table loaded */
  frame.alloc = NULL;
  header.version = MPEG_AUDIO_ID; /* Default: MPEG-1 */
  total_time = 0;
  time(&start_time);     
  programName = argv[0];
  if (argc == 1)  /* no command-line args */
    short_usage ();
  else
    parse_args (argc, argv, &frame, &model, &num_samples, original_file_name,
                encoded_file_name);
  print_config (&frame, &model, original_file_name, encoded_file_name);
  /* this will load the alloc tables and do some other stuff */
  hdr_to_frps (&frame);
  nch = frame.nch;
  error_protection = header.error_protection;
  while (get_audio (musicin, buffer, num_samples, nch, &header) > 0) {
    if (glopts.verbosity > 1)
      if (++frameNum % 10 == 0)
        fprintf (stderr, "[%4u]\r", frameNum);
    fflush (stderr);
    win_buf[0] = &buffer[0][0];
    win_buf[1] = &buffer[1][0];
    adb = available_bits (&header, &glopts);
    lg_frame = adb / 8;
    if (header.dab_extension) {
      /* in 24 kHz we always have 4 bytes */
      if (header.sampling_frequency == 1)
        header.dab_extension = 4;
      /* You must have one frame in memory if you are in DAB mode */
      /* in conformity of the norme ETS 300 401 http://www.etsi.org */
      /* see bitstream.c */
      if (frameNum == 1)
        minimum = lg_frame + MINIMUM;
      adb -= header.dab_extension * 8 + header.dab_length * 8 + 16;
    }
    {
      int gr, bl, ch;
      /* New polyphase filter
         Combines windowing and filtering. Ricardo Feb'03 */
      for (gr = 0; gr < 3; gr++)
        for (bl = 0; bl < 12; bl++)
          for (ch = 0; ch < nch; ch++)
            WindowFilterSubband (&buffer[ch][gr * 12 * 32 + 32 * bl], ch,
                                 &(*sb_sample)[ch][gr][bl][0]);
    }
#ifdef REFERENCECODE
    {
      /* Old code. left here for reference */
      int gr, bl, ch;
      for (gr = 0; gr < 3; gr++)
        for (bl = 0; bl < SCALE_BLOCK; bl++)
          for (ch = 0; ch < nch; ch++) {
            window_subband (&win_buf[ch], &(*win_que)[ch][0], ch);
            filter_subband (&(*win_que)[ch][0], &(*sb_sample)[ch][gr][bl][0]);
          }
    }
#endif
#ifdef NEWENCODE
    scalefactor_calc_new(*sb_sample, scalar, nch, frame.sblimit);
    find_sf_max (scalar, &frame, max_sc);
    if (frame.actual_mode == MPG_MD_JOINT_STEREO) {
      /* this way we calculate more mono than we need */
      /* but it is cheap */
      combine_LR_new (*sb_sample, *j_sample, frame.sblimit);
      scalefactor_calc_new (j_sample, &j_scale, 1, frame.sblimit);
    }
#else
    scale_factor_calc (*sb_sample, scalar, nch, frame.sblimit);
    pick_scale (scalar, &frame, max_sc);
    if (frame.actual_mode == MPG_MD_JOINT_STEREO) {
      /* this way we calculate more mono than we need */
      /* but it is cheap */
      combine_LR (*sb_sample, *j_sample, frame.sblimit);
      scale_factor_calc (j_sample, &j_scale, 1, frame.sblimit);
    }
#endif
    if ((glopts.quickmode == TRUE) && (++psycount % glopts.quickcount != 0)) {
      /* We're using quick mode, so we're only calculating the model every
         'quickcount' frames. Otherwise, just copy the old ones across */
      for (ch = 0; ch < nch; ch++) {
        for (sb = 0; sb < SBLIMIT; sb++)
          smr[ch][sb] = smrdef[ch][sb];
      }
    } else {
      /* calculate the psymodel */
      switch (model) {
      case -1:
        psycho_n1 (smr, nch);
        break;
      case 0: /* Psy Model A */
        psycho_0 (smr, nch, scalar,
                  (FLOAT) s_freq[header.version][header.sampling_frequency] * 1000);
        break;
      case 1:
        psycho_1 (buffer, max_sc, smr, &frame);
        break;
      case 2:
        for (ch = 0; ch < nch; ch++) {
          psycho_2 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        }
        break;
      case 3:
        /* Modified psy model 1 */
        psycho_3 (buffer, max_sc, smr, &frame, &glopts);
        break;
      case 4:
        /* Modified Psycho Model 2 */
        for (ch = 0; ch < nch; ch++) {
          psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        }
        break;
      case 5:
        /* Model 5 compares model 1 and 3 */
        psycho_1 (buffer, max_sc, smr, &frame);
        fprintf(stdout,"1 ");
        smr_dump(smr,nch);
        psycho_3 (buffer, max_sc, smr, &frame, &glopts);
        fprintf(stdout,"3 ");
        smr_dump(smr,nch);
        break;
      case 6:
        /* Model 6 compares model 2 and 4 */
        for (ch = 0; ch < nch; ch++)
          psycho_2 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        fprintf(stdout,"2 ");
        smr_dump(smr,nch);
        for (ch = 0; ch < nch; ch++)
          psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        fprintf(stdout,"4 ");
        smr_dump(smr,nch);
        break;
      case 7:
        fprintf(stdout,"Frame: %i\n",frameNum);
        /* Dump the SMRs for all models */
        psycho_1 (buffer, max_sc, smr, &frame);
        fprintf(stdout,"1");
        smr_dump(smr, nch);
        psycho_3 (buffer, max_sc, smr, &frame, &glopts);
        fprintf(stdout,"3");
        smr_dump(smr,nch);
        for (ch = 0; ch < nch; ch++)
          psycho_2 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], //snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        fprintf(stdout,"2");
        smr_dump(smr,nch);
        for (ch = 0; ch < nch; ch++)
          psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        fprintf(stdout,"4");
        smr_dump(smr,nch);
        break;
      case 8:
        /* Compare 0 and 4 */
        psycho_n1 (smr, nch);
        fprintf(stdout,"0");
        smr_dump(smr,nch);
        for (ch = 0; ch < nch; ch++)
          psycho_4 (&buffer[ch][0], &sam[ch][0], ch, &smr[ch][0], // snr32,
                    (FLOAT) s_freq[header.version][header.sampling_frequency] *
                    1000, &glopts);
        fprintf(stdout,"4");
        smr_dump(smr,nch);
        break;
      default:
        fprintf (stderr, "Invalid psy model specification: %i\n", model);
        exit (0);
      }
      if (glopts.quickmode == TRUE)
        /* copy the smr values and reuse them later */
        for (ch = 0; ch < nch; ch++) {
          for (sb = 0; sb < SBLIMIT; sb++)
            smrdef[ch][sb] = smr[ch][sb];
        }
      if (glopts.verbosity > 4)
        smr_dump(smr, nch);
    }
#ifdef NEWENCODE
    sf_transmission_pattern (scalar, scfsi, &frame);
    main_bit_allocation_new (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
    //main_bit_allocation (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
    if (error_protection)
      CRC_calc (&frame, bit_alloc, scfsi, &crc);
    write_header (&frame, &bs);
    //encode_info (&frame, &bs);
    if (error_protection)
      putbits (&bs, crc, 16);
    write_bit_alloc (bit_alloc, &frame, &bs);
    //encode_bit_alloc (bit_alloc, &frame, &bs);
    write_scalefactors(bit_alloc, scfsi, scalar, &frame, &bs);
    //encode_scale (bit_alloc, scfsi, scalar, &frame, &bs);
    subband_quantization_new (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
         *subband, &frame);
    //subband_quantization (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
    //   *subband, &frame);
    write_samples_new(*subband, bit_alloc, &frame, &bs);
    //sample_encoding (*subband, bit_alloc, &frame, &bs);
#else
    transmission_pattern (scalar, scfsi, &frame);
    main_bit_allocation (smr, scfsi, bit_alloc, &adb, &frame, &glopts);
    if (error_protection)
      CRC_calc (&frame, bit_alloc, scfsi, &crc);
    encode_info (&frame, &bs);
    if (error_protection)
      encode_CRC (crc, &bs);
    encode_bit_alloc (bit_alloc, &frame, &bs);
    encode_scale (bit_alloc, scfsi, scalar, &frame, &bs);
    subband_quantization (scalar, *sb_sample, j_scale, *j_sample, bit_alloc,
     *subband, &frame);
    sample_encoding (*subband, bit_alloc, &frame, &bs);
#endif
    /* If not all the bits were used, write out a stack of zeros */
    for (i = 0; i < adb; i++)
      put1bit (&bs, 0);
    if (header.dab_extension) {
      /* Reserve some bytes for X-PAD in DAB mode */
      putbits (&bs, 0, header.dab_length * 8);     
      for (i = header.dab_extension - 1; i >= 0; i--) {
        CRC_calcDAB (&frame, bit_alloc, scfsi, scalar, &crc, i);
        /* this crc is for the previous frame in DAB mode  */
        if (bs.buf_byte_idx + lg_frame < bs.buf_size)
          bs.buf[bs.buf_byte_idx + lg_frame] = crc;
        /* reserved 2 bytes for F-PAD in DAB mode  */
        putbits (&bs, crc, 8);
      }
      putbits (&bs, 0, 16);
    }
    frameBits = sstell (&bs) - sentBits;
    if (frameBits % 8) { /* a program failure */
      fprintf (stderr, "Sent %ld bits = %ld slots plus %ld\n", frameBits,
        frameBits / 8, frameBits % 8);
      fprintf (stderr, "If you are reading this, the program is broken\n");
      fprintf (stderr, "email [mfc at NOTplanckenerg.com] without the NOT\n");
      fprintf (stderr, "with the command line arguments and other info\n");
      exit (0);
    }
    sentBits += frameBits;
  }
  close_bit_stream_w (&bs);
  if ((glopts.verbosity > 1) && (glopts.vbr == TRUE)) {
    int i;
#ifdef NEWENCODE
    extern int vbrstats_new[15];
#else
    extern int vbrstats[15];
#endif
    fprintf (stdout, "VBR stats:\n");
    for (i = 1; i < 15; i++)
      fprintf (stdout, "%4i ", bitrate[header.version][i]);
    fprintf (stdout, "\n");
    for (i = 1; i < 15; i++)
#ifdef NEWENCODE
      fprintf (stdout,"%4i ",vbrstats_new[i]);
#else
      fprintf (stdout, "%4i ", vbrstats[i]);
#endif
    fprintf (stdout, "\n");
  }
  fprintf (stderr,
    "Avg slots/frame = %.3f; b/smp = %.2f; bitrate = %.3f kbps\n",
    (FLOAT) sentBits / (frameNum * 8),
    (FLOAT) sentBits / (frameNum * 1152),
    (FLOAT) sentBits / (frameNum * 1152) *
    s_freq[header.version][header.sampling_frequency]);
  if (fclose (musicin) != 0) {
    fprintf (stderr, "Could not close \"%s\".\n", original_file_name);
    exit (2);
  }
  fprintf (stderr, "\nDone\n");
  time(&end_time);
  total_time = end_time - start_time;
  printf("total time is %d\n", total_time);  
  exit (0);
}

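To observe the parameters during encoding, trace output is written to a txt file.
Global declaration:
[Screenshot: global declaration]
Declaration in the main function:
[Screenshot: declaration in main]
Code added to the main function:
[Figure 8: trace code added to main]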
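The screenshots are not available as text, so the exact trace code is unknown. As a minimal sketch of what such a dump could look like, assuming the array shapes of `scalar` and `bit_alloc` from the main listing, the function below writes both tables to an open file. `dump_trace` and its parameters are hypothetical names, and `SBLIMIT` is assumed to be 32.

```c
#include <stdio.h>

#define SBLIMIT 32  /* assumed to match the encoder's subband limit */

/* Write scale factor indices and bit allocation for every channel and subband. */
void dump_trace(FILE *out,
                unsigned int scalar[2][3][SBLIMIT],
                unsigned int bit_alloc[2][SBLIMIT],
                int nch, int sblimit)
{
    fprintf(out, "Scale factors:\n");
    for (int ch = 0; ch < nch; ch++) {
        fprintf(out, "Channel %d:\n", ch + 1);
        for (int sb = 0; sb < sblimit; sb++)
            /* one scale factor index per block of 12 samples, three per frame */
            fprintf(out, "Subband[%d]: %u %u %u\n", sb + 1,
                    scalar[ch][0][sb], scalar[ch][1][sb], scalar[ch][2][sb]);
    }

    fprintf(out, "\nBit allocation:\n");
    for (int ch = 0; ch < nch; ch++) {
        fprintf(out, "Channel %d:\n", ch + 1);
        for (int sb = 0; sb < sblimit; sb++)
            fprintf(out, "Subband[%d]: %u\n", sb + 1, bit_alloc[ch][sb]);
    }
}
```

It could be called once per frame after the bit allocation step, for example `dump_trace(trace_fp, scalar, bit_alloc, nch, frame.sblimit)`, where `trace_fp` is a hypothetical `FILE *` opened with `fopen` at the start of `main`.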
Generated file:

Scale factors:
Channel 1:
Subband[1]: 9 11 10
Subband[2]: 14 14 14
Subband[3]: 19 17 19
Subband[4]: 25 23 24
Subband[5]: 29 28 28
Subband[6]: 23 25 23
Subband[7]: 22 22 24
Subband[8]: 22 22 21
Subband[9]: 29 27 27
Subband[10]: 30 30 32
Subband[11]: 30 29 28
Subband[12]: 27 28 28
Subband[13]: 25 24 28
Subband[14]: 26 27 23
Subband[15]: 23 23 21
Subband[16]: 26 23 25
Subband[17]: 28 31 30
Subband[18]: 30 33 31
Subband[19]: 30 30 29
Subband[20]: 28 29 29
Subband[21]: 29 31 28
Subband[22]: 32 30 30
Subband[23]: 40 39 41
Subband[24]: 54 50 50
Subband[25]: 53 52 55
Subband[26]: 55 53 54
Subband[27]: 51 53 55
Subband[28]: 53 53 52
Subband[29]: 53 53 54
Subband[30]: 53 52 52

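Bit allocation:
Channel 1:
Subband[1]: 8
Subband[2]: 8
Subband[3]: 6
Subband[4]: 8
Subband[5]: 7
Subband[6]: 8
Subband[7]: 8
Subband[8]: 6
Subband[9]: 5
Subband[10]: 6
Subband[11]: 6
Subband[12]: 7
Subband[13]: 6
Subband[14]: 6
Subband[15]: 6
Subband[16]: 5
Subband[17]: 5
Subband[18]: 5
Subband[19]: 4
Subband[20]: 6
Subband[21]: 3
Subband[22]: 3
Subband[23]: 0
Subband[24]: 0
Subband[25]: 0
Subband[26]: 0
Subband[27]: 0
Subband[28]: 0
Subband[29]: 0
Subband[30]: 0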
