iOS Audio and Video Capture and Encoding

Anyone who has worked with audio and video knows that the details fade quickly when you don't use them, so these are my notes on what I learned earlier. (This is not a beginner tutorial.)

The notes are organized into the following topics:

  • Video capture and encoding
  • Audio capture and encoding
(Figure: video.png)

Video Capture and Encoding

Video Capture

  • Get the input device
    NSError *deviceError;
    AVCaptureDeviceInput *inputDevice;
    // Use the front-facing camera as the capture input
    for (AVCaptureDevice *device in [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo])
    {
        if ([device position] == AVCaptureDevicePositionFront)
        {
            inputDevice = [AVCaptureDeviceInput deviceInputWithDevice:device error:&deviceError];
        }
    }
  • Output device
    AVCaptureVideoDataOutput *outputDevice = [[AVCaptureVideoDataOutput alloc] init];
    NSString *key = (NSString *)kCVPixelBufferPixelFormatTypeKey;
    
    NSNumber *val = [NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange];
    NSDictionary *videoSettings = [NSDictionary dictionaryWithObject:val forKey:key];
    outputDevice.videoSettings = videoSettings;
    [outputDevice setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
  • Start the camera
    AVCaptureSession *captureSession = [[AVCaptureSession alloc]init];
    [captureSession addInput:inputDevice];
    [captureSession addOutput:outputDevice];
    
    [captureSession beginConfiguration];
    captureSession.sessionPreset = AVCaptureSessionPreset640x480;
    [outputDevice connectionWithMediaType:AVMediaTypeVideo];
    [captureSession commitConfiguration];
    
    AVCaptureVideoPreviewLayer *previewLayer = [[AVCaptureVideoPreviewLayer alloc]initWithSession:captureSession];
    previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill;
    [self.view.layer addSublayer:previewLayer];
    previewLayer.frame = self.view.bounds;
    [captureSession startRunning];
  • Get the output CMSampleBufferRef
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    //TODO : Send to server
}
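
At this point each raw frame is available as a CVImageBufferRef (which, as noted further down, is the same thing as a CVPixelBufferRef). As a minimal sketch of what the TODO might become, here is how the frame could be handed to the hardware encoder covered later in this post; self.videoEncoder and the millisecond timestamp convention are illustrative, not part of the original code:

// Inside captureOutput:didOutputSampleBuffer:fromConnection: (sketch)
// Derive a millisecond timestamp from the sample buffer's presentation time
CMTime pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
uint64_t timeStampMs = (uint64_t)(CMTimeGetSeconds(pts) * 1000);
// imageBuffer is the CVImageBufferRef obtained above; the encoder takes it as a CVPixelBufferRef
[self.videoEncoder encodeVideoData:imageBuffer timeStamp:timeStampMs];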

Since we live in an appearance-first era, beauty filters and the like are a must, so the video needs to be processed while it is being captured. GPUImage is the obvious choice. The demo code below creates a camera and displays the filtered image on screen.

//Camera configuration
GPUImageVideoCamera *videoCamera = [[GPUImageVideoCamera alloc] initWithSessionPreset:AVCaptureSessionPreset640x480 cameraPosition:AVCaptureDevicePositionBack];
videoCamera.outputImageOrientation = UIInterfaceOrientationPortrait;
videoCamera.frameRate = 24;

//Filter
GPUImageFilter *brightnessFilter = [[GPUImageBrightnessFilter alloc] init];
GPUImageWhiteBalanceFilter *whiteBalanceFilter = [[GPUImageWhiteBalanceFilter alloc] init];
GPUImageFilter *output = [[GPUImageFilter alloc] init]; // pass-through filter used only to read back the rendered frame

//Display
GPUImageView *filteredVideoView = [[GPUImageView alloc] initWithFrame:CGRectMake(0.0, 0.0, viewWidth, viewHeight)];

//Targets
[videoCamera addTarget: brightnessFilter];
[brightnessFilter addTarget: whiteBalanceFilter];
[whiteBalanceFilter addTarget: output];
[whiteBalanceFilter addTarget:filteredVideoView];

//Get the render output
[output setFrameProcessingCompletionBlock:^(GPUImageOutput *output, CMTime time) {
        GPUImageFramebuffer *imageFramebuffer = output.framebufferForOutput;
        CVPixelBufferRef pixelBuffer = [imageFramebuffer pixelBuffer];
        //TODO : Send to server
    }];

[videoCamera startCameraCapture];

In GPUImage, filters are the central concept, and they are easy to understand: the source image goes in, passes through a chain of filters (each adding its own effect), and the result comes out.

(Figure: filters.png)

As the demo code shows, the system framework hands you a CVImageBufferRef while GPUImage hands you a CVPixelBufferRef. They are in fact the same type:

typedef CVImageBufferRef CVPixelBufferRef;

With the video data in hand, the next step is encoding.

Video Encoding

Video encoding can be done in software or in hardware.

  • Difference:
    Software encoding: performed on the CPU.
    Hardware encoding: performed on dedicated hardware rather than the CPU, such as the GPU, a DSP, an FPGA, or an ASIC.
  • Comparison:
    Software encoding: straightforward to implement, easy to tune and upgrade, but CPU-heavy and slower than hardware encoding; at low bitrates the quality is usually slightly better than hardware encoding.
    Hardware encoding: high performance; at low bitrates the quality is usually below that of a software encoder, although some products port excellent software encoders (such as x264) to their GPU platform and achieve essentially the same quality as software encoding.

The rest of this section focuses on iOS hardware encoding. First, configure the encoder; look up any unfamiliar properties in the VideoToolbox documentation.

    VTCompressionSessionRef compressionSession = NULL;
    OSStatus status = VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, VideoCompressonOutputCallback, (__bridge void *)self, &compressionSession);
    
    CGFloat videoBitRate = 800*1024;
    CGFloat videoFrameRate = 24;
    CGFloat videoMaxKeyframeInterval = 48;

    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, (__bridge CFTypeRef)@(videoMaxKeyframeInterval));
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration, (__bridge CFTypeRef)@(videoMaxKeyframeInterval/videoFrameRate));
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ExpectedFrameRate, (__bridge CFTypeRef)@(videoFrameRate));
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AverageBitRate, (__bridge CFTypeRef)@(videoBitRate));
    NSArray *limit = @[@(videoBitRate * 1.5/8), @(1)];
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_DataRateLimits, (__bridge CFArrayRef)limit);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Main_AutoLevel);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanTrue);
    VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_H264EntropyMode, kVTH264EntropyMode_CABAC);
    VTCompressionSessionPrepareToEncodeFrames(compressionSession);

A few important concepts:

  • Bitrate
    Roughly, the parameter you give the encoder to tell it how large the compressed video should be. It is measured in bits per second (bps), i.e. the average number of bits used per second of video.
  • Frame rate
    A measure of how many frames are displayed per second, expressed in frames per second (FPS) or hertz (Hz).

The code above configures a bitrate of 800 * 1024 bit/s and a frame rate of 24. There is also the maximum keyframe interval, which can be set to twice the frame rate: 48 frames at 24 fps, i.e. a keyframe at least every 2 seconds.

VideoCompressonOutputCallback is the callback that receives the hardware-encoded output:

static void VideoCompressonOutputCallback(void *VTref, void *VTFrameRef, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer){
    if (!sampleBuffer) return;
    CFArrayRef array = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
    if (!array) return;
    CFDictionaryRef dic = (CFDictionaryRef)CFArrayGetValueAtIndex(array, 0);
    if (!dic) return;

    BOOL keyframe = !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync);
    uint64_t timeStamp = [((__bridge_transfer NSNumber *)VTFrameRef) longLongValue];

    HardwareVideoEncoder *videoEncoder = (__bridge HardwareVideoEncoder *)VTref;
    if (status != noErr) {
        return;
    }

    if (keyframe && !videoEncoder->sps) {
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);

        size_t sparameterSetSize, sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        if (statusCode == noErr) {
            size_t pparameterSetSize, pparameterSetCount;
            const uint8_t *pparameterSet;
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
            if (statusCode == noErr) {
                videoEncoder->sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                videoEncoder->pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
            }
        }
    }


    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length, totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4;
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            uint32_t NALUnitLength = 0;
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);

            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);

            VideoFrame *videoFrame = [VideoFrame new];
            videoFrame.timestamp = timeStamp;
            videoFrame.data = [[NSData alloc] initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            videoFrame.isKeyFrame = keyframe;
            videoFrame.sps = videoEncoder->sps;
            videoFrame.pps = videoEncoder->pps;

            if (videoEncoder.h264Delegate && [videoEncoder.h264Delegate respondsToSelector:@selector(videoEncoder:videoFrame:)]) {
                [videoEncoder.h264Delegate videoEncoder:videoEncoder videoFrame:videoFrame];
            }

            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}

The program calls encodeVideoData:timeStamp: to perform the hardware encoding; once a frame has been processed, it is delivered through VideoCompressonOutputCallback.

- (void)encodeVideoData:(CVPixelBufferRef)pixelBuffer timeStamp:(uint64_t)timeStamp {
    if(_isBackGround) return;
    frameCount++;
    CMTime presentationTimeStamp = CMTimeMake(frameCount, (int32_t)videoFrameRate);
    VTEncodeInfoFlags flags;
    CMTime duration = CMTimeMake(1, (int32_t)videoFrameRate);
    NSDictionary *properties = nil;
    if (frameCount % (int32_t)_configuration.videoMaxKeyframeInterval == 0) {
        properties = @{(__bridge NSString *)kVTEncodeFrameOptionKey_ForceKeyFrame: @YES};
    }
    NSNumber *timeNumber = @(timeStamp);

    OSStatus status = VTCompressionSessionEncodeFrame(compressionSession, pixelBuffer, presentationTimeStamp, duration, (__bridge CFDictionaryRef)properties, (__bridge_retained void *)timeNumber, &flags);
    if(status != noErr){
        [self resetCompressionSession];
    }
}
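
resetCompressionSession is called above when encoding fails, but it is not shown in the original. A minimal sketch of what it might do, assuming a hypothetical setupCompressionSession helper that re-runs the configuration code shown at the top of this section:

- (void)resetCompressionSession {
    if (compressionSession) {
        // Flush any pending frames, then invalidate and release the session
        VTCompressionSessionCompleteFrames(compressionSession, kCMTimeInvalid);
        VTCompressionSessionInvalidate(compressionSession);
        CFRelease(compressionSession);
        compressionSession = NULL;
    }
    // Rebuild the session with the same configuration as before (hypothetical helper)
    [self setupCompressionSession];
}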

The whole pipeline is shown below: video frames from the camera are fed into the compression session for hardware encoding, which finally outputs VideoFrame objects ready for transmission.

(Figure: videoEncode.png)

One thing to watch out for during hardware encoding: when the phone is low on battery or is charging, the battery can heat up significantly. Under these conditions the iPhone's H.264 hardware encoder is quite likely to throttle, and the encoded output frame rate drops sharply.

You can measure this: the encoder's actual output frame rate falls below the expected rate. For example, the camera captures at 30 fps and the H.264 hardware encoder is expected to output 20 fps, yet the measured output is below 15 fps. Once the device heats up and the encoder slows down, continuing to feed it 30 fps only increases its load.

Solution: actively and smoothly drop frames, lowering the input frame rate to just above the encoder's measured output rate (for example, if the actual output is 15 fps, reduce the input to 18 fps) to relieve the encoder. Once the measured output rate climbs back up to 18 fps, gradually raise the input frame rate again until the output matches the design target. A rough sketch of this logic follows.
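
A minimal sketch of such a smooth frame-dropping strategy. The instance variables (_targetInputFrameRate, _configuredFrameRate, _encodedFrameCount incremented in the encode callback, _windowStart, _lastAcceptedTime) are illustrative and not part of the original code; CACurrentMediaTime() requires QuartzCore.

// Call this from the capture path; when it returns YES, skip the frame
// instead of sending it to the encoder.
- (BOOL)shouldDropFrame {
    NSTimeInterval now = CACurrentMediaTime();

    // Once per second, compare the encoder's measured output rate
    // against the current input target and adapt.
    if (now - _windowStart >= 1.0) {
        CGFloat measuredOutputFps = _encodedFrameCount;
        if (measuredOutputFps + 3 < _targetInputFrameRate) {
            // Encoder is falling behind (e.g. thermal throttling):
            // lower the input rate to a few fps above the real output rate.
            _targetInputFrameRate = MAX(measuredOutputFps + 3, 10);
        } else if (_targetInputFrameRate < _configuredFrameRate) {
            // Encoder has caught up: raise the input rate back gradually.
            _targetInputFrameRate = MIN(_targetInputFrameRate + 2, _configuredFrameRate);
        }
        _encodedFrameCount = 0;
        _windowStart = now;
    }

    // Drop the frame if it arrives sooner than the target frame interval.
    if (now - _lastAcceptedTime < 1.0 / _targetInputFrameRate) {
        return YES;
    }
    _lastAcceptedTime = now;
    return NO;
}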

Audio Capture and Encoding

Audio Capture

  • Get the input device
AudioComponentDescription acd;
acd.componentType = kAudioUnitType_Output;
acd.componentSubType = kAudioUnitSubType_RemoteIO;
acd.componentManufacturer = kAudioUnitManufacturer_Apple;
acd.componentFlags = 0;
acd.componentFlagsMask = 0;

AudioComponent component = AudioComponentFindNext(NULL, &acd);
  • Create the AudioUnit
AudioComponentInstance componetInstance;
OSStatus status = noErr;
status = AudioComponentInstanceNew(component, &componetInstance);

if (noErr != status) {
    [self handleAudioInitializeError];
}

UInt32 flagOne = 1;

// Enable input (the microphone) on bus 1 of the RemoteIO unit
AudioUnitSetProperty(componetInstance, kAudioOutputUnitProperty_EnableIO, kAudioUnitScope_Input, 1, &flagOne, sizeof(flagOne));

AudioStreamBasicDescription desc = {0};
desc.mSampleRate = 44100;
desc.mFormatID = kAudioFormatLinearPCM;
desc.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
desc.mChannelsPerFrame = 2;
desc.mFramesPerPacket = 1;
desc.mBitsPerChannel = 16;
desc.mBytesPerFrame = desc.mBitsPerChannel / 8 * desc.mChannelsPerFrame;
desc.mBytesPerPacket = desc.mBytesPerFrame * desc.mFramesPerPacket;

AURenderCallbackStruct cb;
cb.inputProcRefCon = (__bridge void *)(self);
cb.inputProc = handleInputBuffer;
AudioUnitSetProperty(componetInstance, kAudioUnitProperty_StreamFormat, kAudioUnitScope_Output, 1, &desc, sizeof(desc));
AudioUnitSetProperty(componetInstance, kAudioOutputUnitProperty_SetInputCallback, kAudioUnitScope_Global, 1, &cb, sizeof(cb));

status = AudioUnitInitialize(componetInstance);

if (noErr != status) {
    [self handleAudioInitializeError];
}
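
The original snippet stops at initialization. A minimal sketch of configuring the audio session and starting capture, assuming the componetInstance created above (error handling kept minimal):

// Route audio for simultaneous recording and playback, then start the unit
NSError *sessionError = nil;
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession setCategory:AVAudioSessionCategoryPlayAndRecord
              withOptions:AVAudioSessionCategoryOptionDefaultToSpeaker
                    error:&sessionError];
[audioSession setActive:YES error:&sessionError];

status = AudioOutputUnitStart(componetInstance);
if (noErr != status) {
    [self handleAudioInitializeError];
}

// When capture is no longer needed:
// AudioOutputUnitStop(componetInstance);
// AudioComponentInstanceDispose(componetInstance);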

The audio data arrives in handleInputBuffer, the AudioUnit's input callback:

static OSStatus handleInputBuffer(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData) {
    @autoreleasepool {
        AudioCapture *source = (__bridge AudioCapture *)inRefCon;
        if (!source) return -1;

        AudioBuffer buffer;
        buffer.mData = NULL;
        buffer.mDataByteSize = 0;
        buffer.mNumberChannels = 1;

        AudioBufferList buffers;
        buffers.mNumberBuffers = 1;
        buffers.mBuffers[0] = buffer;

        OSStatus status = AudioUnitRender(source.componetInstance,
                                          ioActionFlags,
                                          inTimeStamp,
                                          inBusNumber,
                                          inNumberFrames,
                                          &buffers);

        if (source.muted) {
            for (int i = 0; i < buffers.mNumberBuffers; i++) {
                AudioBuffer ab = buffers.mBuffers[i];
                memset(ab.mData, 0, ab.mDataByteSize);
            }
        }

        if (!status) {
            if (source.delegate && [source.delegate respondsToSelector:@selector(captureOutput:audioData:)]) {
                [source.delegate captureOutput:source audioData:[NSData dataWithBytes:buffers.mBuffers[0].mData length:buffers.mBuffers[0].mDataByteSize]];
            }
        }
        return status;
    }
}

As for audio processing, WebRTC does most of the heavy lifting; a later article dedicated to WebRTC will cover topics such as echo cancellation.

Audio Encoding

  • Create the encoder
AudioConverterRef m_converter;
AudioStreamBasicDescription inputFormat = {0};
inputFormat.mSampleRate = 44100;
inputFormat.mFormatID = kAudioFormatLinearPCM;
inputFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
inputFormat.mChannelsPerFrame = (UInt32)2;
inputFormat.mFramesPerPacket = 1;
inputFormat.mBitsPerChannel = 16;
inputFormat.mBytesPerFrame = inputFormat.mBitsPerChannel / 8 * inputFormat.mChannelsPerFrame;
inputFormat.mBytesPerPacket = inputFormat.mBytesPerFrame * inputFormat.mFramesPerPacket;

AudioStreamBasicDescription outputFormat; // output (encoded) audio format
memset(&outputFormat, 0, sizeof(outputFormat));
outputFormat.mSampleRate = 44100;                         // keep the same sample rate as the input
outputFormat.mFormatID = kAudioFormatMPEG4AAC;            // AAC encoding (kAudioFormatMPEG4AAC or kAudioFormatMPEG4AAC_HE_V2)
outputFormat.mChannelsPerFrame = 2;
outputFormat.mFramesPerPacket = 1024;                     // one AAC packet holds 1024 PCM frames

const OSType subtype = kAudioFormatMPEG4AAC;
AudioClassDescription requestedCodecs[2] = {
    {
        kAudioEncoderComponentType,
        subtype,
        kAppleSoftwareAudioCodecManufacturer
    },
    {
        kAudioEncoderComponentType,
        subtype,
        kAppleHardwareAudioCodecManufacturer
    }
};

OSStatus result = AudioConverterNewSpecific(&inputFormat, &outputFormat, 2, requestedCodecs, &m_converter);
UInt32 outputBitrate = 96000;
UInt32 propSize = sizeof(outputBitrate);


if(result == noErr) {
    result = AudioConverterSetProperty(m_converter, kAudioConverterEncodeBitRate, propSize, &outputBitrate);
}

With the encoder ready, the captured PCM data can be encoded. Two buffers are involved: audioBuffer (PCM) and aacBuffer (AAC output), both of size audioBufferSize. Incoming PCM is accumulated in audioBuffer, and only when the accumulated data plus the new PCM reaches audioBufferSize is a full chunk sent to the AAC encoder. A sketch of how these buffers might be set up follows.
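
audioBufferSize and the two buffers are never defined in the snippets; a minimal sketch consistent with the converter configuration above (1024 frames per AAC packet, 2 channels, 16-bit samples), with names matching how they are used below:

// One AAC packet consumes 1024 PCM frames; with 16-bit stereo that is
// 1024 * 4 = 4096 bytes per encode call.
NSUInteger audioBufferSize = 1024 * inputFormat.mBytesPerFrame;   // 4096
char *audioBuffer = malloc(audioBufferSize);   // PCM accumulation buffer
char *aacBuffer = malloc(audioBufferSize);     // AAC output buffer
NSUInteger leftLength = 0;                     // bytes currently held in audioBuffer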

(Figure: AudioEncode.png)

- (void)encodeAudioData:(nullable NSData*)audioData timeStamp:(uint64_t)timeStamp {
    if(leftLength + audioData.length >= audioBufferSize){
        NSInteger totalSize = leftLength + audioData.length;
        NSInteger encodeCount = totalSize / audioBufferSize;
        char *totalBuf = malloc(totalSize);
        char *p = totalBuf;
        
        memset(totalBuf, 0, totalSize);
        memcpy(totalBuf, audioBuffer, leftLength);
        memcpy(totalBuf + leftLength, audioData.bytes, audioData.length);
        
        for(NSInteger index = 0;index < encodeCount;index++){
            [self encodeBuffer:p  timeStamp:timeStamp];
            p += audioBufferSize;
        }
        
        leftLength = totalSize % audioBufferSize;
        memset(audioBuffer, 0, audioBufferSize);
        memcpy(audioBuffer, totalBuf + (totalSize -leftLength), leftLength);
        free(totalBuf);
    }else{
        memcpy(audioBuffer+leftLength, audioData.bytes, audioData.length);
        leftLength = leftLength + audioData.length;
    }
}

Next, each full chunk handed over from audioBuffer is encoded. inputDataProc is the callback the converter uses to pull input data:


- (void)encodeBuffer:(char*)buf timeStamp:(uint64_t)timeStamp{
    
    AudioBuffer inBuffer;
    inBuffer.mNumberChannels = 1;
    inBuffer.mData = buf;
    inBuffer.mDataByteSize = audioBufferSize;
    
    AudioBufferList buffers;
    buffers.mNumberBuffers = 1;
    buffers.mBuffers[0] = inBuffer;
    
    
    // Prepare the output buffer list
    AudioBufferList outBufferList;
    outBufferList.mNumberBuffers = 1;
    outBufferList.mBuffers[0].mNumberChannels = inBuffer.mNumberChannels;
    outBufferList.mBuffers[0].mDataByteSize = inBuffer.mDataByteSize;   // capacity of the output buffer
    outBufferList.mBuffers[0].mData = aacBuffer;                        // AAC output buffer
    UInt32 outputDataPacketSize = 1;
    if (AudioConverterFillComplexBuffer(m_converter, inputDataProc, &buffers, &outputDataPacketSize, &outBufferList, NULL) != noErr) {
        return;
    }
    
    AudioFrame *audioFrame = [AudioFrame new];
    audioFrame.timestamp = timeStamp;
    audioFrame.data = [NSData dataWithBytes:aacBuffer length:outBufferList.mBuffers[0].mDataByteSize];
    
    char exeData[2];
    // AAC AudioSpecificConfig carried with the audio frame (FLV audio header);
    // for 44100 Hz stereo AAC-LC it is 0x12 0x10
    exeData[0] = 0x12;
    exeData[1] = 0x10;
    audioFrame.audioInfo = [NSData dataWithBytes:exeData length:2];
    if (self.aacDelegate && [self.aacDelegate respondsToSelector:@selector(audioEncoder:audioFrame:)]) {
        [self.aacDelegate audioEncoder:self audioFrame:audioFrame];
    }
    }
    
}
OSStatus inputDataProc(AudioConverterRef inConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription * *outDataPacketDescription, void *inUserData)
{
    AudioBufferList bufferList = *(AudioBufferList *)inUserData;
    ioData->mBuffers[0].mNumberChannels = 1;
    ioData->mBuffers[0].mData = bufferList.mBuffers[0].mData;
    ioData->mBuffers[0].mDataByteSize = bufferList.mBuffers[0].mDataByteSize;
    return noErr;
}

So how are the bytes 0x12 0x10 for 44100 Hz derived?

  • Get the sampleRateIndex
//https://wiki.multimedia.cx/index.php?title=MPEG-4_Audio
- (NSInteger)sampleRateIndex:(NSInteger)frequencyInHz {
    NSInteger sampleRateIndex = 0;
    switch (frequencyInHz) {
    case 96000:
        sampleRateIndex = 0;
        break;
    case 88200:
        sampleRateIndex = 1;
        break;
    case 64000:
        sampleRateIndex = 2;
        break;
    case 48000:
        sampleRateIndex = 3;
        break;
    case 44100:
        sampleRateIndex = 4;
        break;
    case 32000:
        sampleRateIndex = 5;
        break;
    case 24000:
        sampleRateIndex = 6;
        break;
    case 22050:
        sampleRateIndex = 7;
        break;
    case 16000:
        sampleRateIndex = 8;
        break;
    case 12000:
        sampleRateIndex = 9;
        break;
    case 11025:
        sampleRateIndex = 10;
        break;
    case 8000:
        sampleRateIndex = 11;
        break;
    case 7350:
        sampleRateIndex = 12;
        break;
    default:
        sampleRateIndex = 15;
    }
    return sampleRateIndex;
}

Plugging sampleRateIndex = 4 (for 44100 Hz) and 2 channels into the formula gives the ASC bytes:

asc[0] = 0x10 | ((sampleRateIndex>>1) & 0x7);
asc[1] = ((sampleRateIndex & 0x1)<<7) | ((numberOfChannels & 0xF) << 3);

asc[0] = 0x10 | ((4>>1) & 0x7) = 0x12
asc[1] = ((4 & 0x1)<<7) | ((2 & 0xF) << 3) = 0x10
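
Putting the two pieces together, a small helper (a sketch, not from the original code) that builds the 2-byte AudioSpecificConfig for AAC-LC; the leading 0x10 is the audio object type 2 (AAC-LC) shifted into the top five bits:

- (NSData *)audioSpecificConfigForSampleRate:(NSInteger)frequencyInHz
                                    channels:(NSInteger)numberOfChannels {
    NSInteger sampleRateIndex = [self sampleRateIndex:frequencyInHz];
    char asc[2];
    asc[0] = 0x10 | ((sampleRateIndex >> 1) & 0x7);                            // object type (5 bits) + high 3 bits of the index
    asc[1] = ((sampleRateIndex & 0x1) << 7) | ((numberOfChannels & 0xF) << 3); // low bit of the index + channel configuration
    return [NSData dataWithBytes:asc length:2];
}

For 44100 Hz stereo this returns 0x12 0x10, matching the exeData bytes hard-coded in encodeBuffer: above.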

Conclusion

After audio and video encoding we end up with VideoFrame and AudioFrame objects, each carrying the encoded data and its timestamp; a minimal sketch of these model classes is shown below.
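
The frame models are not defined in the original post; declarations consistent with how they are used above might look like this (a sketch):

@interface VideoFrame : NSObject
@property (nonatomic, assign) uint64_t timestamp;   // capture timestamp (ms)
@property (nonatomic, strong) NSData *data;         // one H.264 NAL unit
@property (nonatomic, assign) BOOL isKeyFrame;
@property (nonatomic, strong) NSData *sps;
@property (nonatomic, strong) NSData *pps;
@end

@interface AudioFrame : NSObject
@property (nonatomic, assign) uint64_t timestamp;   // capture timestamp (ms)
@property (nonatomic, strong) NSData *data;         // one raw AAC frame
@property (nonatomic, strong) NSData *audioInfo;    // AudioSpecificConfig bytes
@end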

The next post will cover sending this data over RTMP.
