In previous posts, I showed you how to create a custom camera using AVFoundation and how to process an image with the Accelerate framework. Let’s now combine both results to build a (quasi-) real-time video processing pipeline (I’ll explain later what I mean by quasi).
To appreciate what we are going to do, we need to build a custom camera preview. If we want to process a video buffer and show the result in real-time, we cannot use the AVCaptureVideoPreviewLayer as shown in this post, because that preview layer renders the signal directly and does not offer any way to process it before rendering. To make this possible, you need to grab the video buffer, process it and then render it on a custom CALayer. Let’s see how to do that.
As I already demonstrated here, setting up the AVFoundation stack is quite straightforward (thank you, Apple): you need to create a capture session (AVCaptureSession), then a capture device (AVCaptureDevice) and add it to the session as a device input (AVCaptureDeviceInput). Translated into source code, this becomes:
// Create the capture session
AVCaptureSession *captureSession = [AVCaptureSession new];
[captureSession setSessionPreset:AVCaptureSessionPresetLow];
// Capture device
AVCaptureDevice *captureDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
// Device input
AVCaptureDeviceInput *deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:captureDevice error:nil];
if ( [captureSession canAddInput:deviceInput] )
    [captureSession addInput:deviceInput];
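In the snippet above, I pass nil as the error argument; in a real app you would probably want to check it. A possible variant (just a sketch, not part of the original post):
NSError *error = nil;
AVCaptureDeviceInput *deviceInput = [AVCaptureDeviceInput deviceInputWithDevice:captureDevice error:&error];
if (!deviceInput) {
    NSLog(@"Could not create the device input: %@", error);
    return; // bail out: without an input there is nothing to capture
}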
Up to here, nothing is new with respect to the previous post. Here, instead, is where the new stuff comes into play. First of all, we need to define a video data output (AVCaptureVideoDataOutput) and add it to the session:
AVCaptureVideoDataOutput *dataOutput = [AVCaptureVideoDataOutput new];
dataOutput.videoSettings = [NSDictionary dictionaryWithObject:[NSNumber numberWithUnsignedInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange] forKey:(NSString *)kCVPixelBufferPixelFormatTypeKey];
[dataOutput setAlwaysDiscardsLateVideoFrames:YES];
if ( [captureSession canAddOutput:dataOutput] )
    [captureSession addOutput:dataOutput];
Here, I defined the output format as YUV (YpCbCr 4:2:0, bi-planar). If you don’t know what I am talking about, I suggest you take a look at this article. YUV or, more correctly, YCbCr is a very common video format, and I use it here because, unless the color carries some useful information, you usually work with gray-level images for image processing. The YUV format provides a signal with an intensity component (the Y, or luma) and two chromatic components (the U and the V, or Cb and Cr).
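To make the plane layout concrete, here is a minimal sketch (assuming a CVPixelBufferRef called imageBuffer, which we will actually obtain from the sample buffer later in this post) of how the two planes of this bi-planar format can be reached:
// The buffer must be locked with CVPixelBufferLockBaseAddress before reading it
size_t planeCount = CVPixelBufferGetPlaneCount(imageBuffer); // 2 for this bi-planar format
// Plane 0: full-resolution luma (Y) samples
Pixel_8 *yPlane = (Pixel_8 *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
// Plane 1: interleaved Cb/Cr samples, subsampled by 2 in both dimensions
Pixel_8 *cbcrPlane = (Pixel_8 *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 1);
In this post, we only touch plane 0.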
Additionally, we need to create a new layer and use it as our rendering destination:
CALayer *customPreviewLayer = [CALayer layer];
// Swap width and height and rotate the layer by 90 degrees,
// since the camera delivers frames in landscape orientation while the UI is in portrait
customPreviewLayer.bounds = CGRectMake(0, 0, self.view.frame.size.height, self.view.frame.size.width);
customPreviewLayer.position = CGPointMake(self.view.frame.size.width/2., self.view.frame.size.height/2.);
customPreviewLayer.affineTransform = CGAffineTransformMakeRotation(M_PI/2);
We can add this layer to any other layer. I’m going to add it to my view controller view layer:
[self.view.layer addSublayer:customPreviewLayer];
The last step of the initial configuration is to create a GCD queue that is going to manage the video buffer and to set our class as the sample buffer delegate of the video data output (it must conform to AVCaptureVideoDataOutputSampleBufferDelegate):
dispatch_queue_t queue = dispatch_queue_create("VideoQueue", DISPATCH_QUEUE_SERIAL);
[dataOutput setSampleBufferDelegate:self queue:queue];
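Two things the snippets don’t show explicitly: the view controller has to declare that it conforms to the delegate protocol, and the session has to be started once everything is configured. A minimal sketch (the class name is just a placeholder):
// In the class extension of the view controller
@interface ViewController () <AVCaptureVideoDataOutputSampleBufferDelegate>
@end

// Once input, output, custom layer and delegate are in place, start the session
[captureSession startRunning];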
Now, remember to add the required frameworks to your project: AVFoundation, CoreMedia, CoreVideo, QuartzCore and Accelerate.
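In code, this corresponds to the following imports (a sketch; these are the frameworks used by the snippets in this post):
#import <AVFoundation/AVFoundation.h> // capture session, device, input and output
#import <CoreMedia/CoreMedia.h>       // CMSampleBufferRef
#import <CoreVideo/CoreVideo.h>       // CVPixelBuffer functions
#import <QuartzCore/QuartzCore.h>     // CALayer
#import <Accelerate/Accelerate.h>     // vImage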
Since the view controller is now the delegate of the capture video data output, you can implement the following callback:
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
fromConnection:(AVCaptureConnection *)connection
AVFoundation fires this delegate method as soon as it has a data buffer available. So, you can use it to collect the video buffer frames, process them and render them on the layer we previously created. For the moment, let’s just collect the video frames and render them on the layer. Later, we’ll look at the image processing.
The delegate method hands us the sampleBuffer argument, of type CMSampleBufferRef. This is a Core Media object we can bring into Core Video:
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
Let’s lock the buffer base address:
CVPixelBufferLockBaseAddress(imageBuffer, 0);
Then, let’s extract some useful image information:
size_t width = CVPixelBufferGetWidthOfPlane(imageBuffer, 0);
size_t height = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);
Remember the video buffer is in YUV format, so I extract the luma component from the buffer in this way:
Pixel_8 *lumaBuffer = (Pixel_8 *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
Now, let’s render this buffer on the layer. To do so, we need Core Graphics: create a gray color space, create a bitmap graphics context backed by the luma buffer, and then create an image from that context:
CGColorSpaceRef grayColorSpace = CGColorSpaceCreateDeviceGray();
CGContextRef context = CGBitmapContextCreate(lumaBuffer, width, height, 8, bytesPerRow, grayColorSpace, kCGImageAlphaNone);
CGImageRef dstImage = CGBitmapContextCreateImage(context);
So, dstImage is a Core Graphics image (CGImage) created from the captured buffer. Finally, we render this image on the layer by changing its contents. We do that on the main queue:
dispatch_sync(dispatch_get_main_queue(), ^{
customPreviewLayer.contents = (__bridge id)dstImage;
});
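Since customPreviewLayer is a stand-alone CALayer, assigning its contents triggers an implicit animation. If you notice a slight cross-fade between frames, you can disable it with a CATransaction (an optional tweak, not part of the original code):
dispatch_sync(dispatch_get_main_queue(), ^{
    [CATransaction begin];
    [CATransaction setDisableActions:YES]; // no implicit animation on the contents change
    customPreviewLayer.contents = (__bridge id)dstImage;
    [CATransaction commit];
});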
Now, let’s do some clean-up (we are good citizens, right?). Besides releasing the Core Graphics objects, we also need to balance the earlier CVPixelBufferLockBaseAddress with an unlock:
CGImageRelease(dstImage);
CGContextRelease(context);
CGColorSpaceRelease(grayColorSpace);
CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
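Putting all the pieces together, the whole delegate callback looks roughly like this (a sketch assembled from the snippets above; it assumes customPreviewLayer is stored in a property so the callback can reach it):
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // Grab the pixel buffer and lock it so we can access its memory directly
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, 0);

    size_t width       = CVPixelBufferGetWidthOfPlane(imageBuffer, 0);
    size_t height      = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
    size_t bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0);

    // Plane 0 of the bi-planar YUV buffer is the luma (grayscale) image
    Pixel_8 *lumaBuffer = (Pixel_8 *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);

    // Wrap the luma plane in a grayscale Core Graphics image
    CGColorSpaceRef grayColorSpace = CGColorSpaceCreateDeviceGray();
    CGContextRef context = CGBitmapContextCreate(lumaBuffer, width, height, 8, bytesPerRow, grayColorSpace, kCGImageAlphaNone);
    CGImageRef dstImage = CGBitmapContextCreateImage(context);

    // Update the layer contents on the main queue
    dispatch_sync(dispatch_get_main_queue(), ^{
        self.customPreviewLayer.contents = (__bridge id)dstImage;
    });

    // Clean-up
    CGImageRelease(dstImage);
    CGContextRelease(context);
    CGColorSpaceRelease(grayColorSpace);
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
}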
If you build and run, you’ll see the live camera signal rendered on your custom preview layer.
Now, let’s move on to the fun part: processing the buffer before rendering it. For this, I am going to use the Accelerate framework. The Pixel_8 *lumaBuffer is the input of my algorithm. I need to wrap this buffer in a vImage_Buffer and prepare another vImage_Buffer for the output of the image processing algorithm. Add this code after the line generating the lumaBuffer:
...
Pixel_8 *lumaBuffer = (Pixel_8 *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
const vImage_Buffer inImage = { lumaBuffer, height, width, bytesPerRow };
// Allocate using bytesPerRow, since rows may be padded beyond the image width
Pixel_8 *outBuffer = (Pixel_8 *)calloc(height * bytesPerRow, sizeof(Pixel_8));
const vImage_Buffer outImage = { outBuffer, height, width, bytesPerRow };
[self maxFromImage:inImage toImage:outImage];
...
The -maxFromImage:toImage: method does all the work. Just for fun, I process the input image with a morphological min filter (an erosion): each output pixel is the minimum value found in a kernel-sized neighborhood of the corresponding input pixel. Here it is:
- (void)maxFromImage:(const vImage_Buffer)src toImage:(const vImage_Buffer)dst
{
    // 7x7 morphological min (erosion) on the planar 8-bit luma image
    int kernelSize = 7;
    vImageMin_Planar8(&src, &dst, NULL, 0, 0, kernelSize, kernelSize, kvImageDoNotTile);
}
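vImageMin_Planar8 returns a vImage_Error that the method above ignores. A slightly more defensive variant could check it; and if you prefer a dilation instead of an erosion, vImageMax_Planar8 has the same signature:
- (void)maxFromImage:(const vImage_Buffer)src toImage:(const vImage_Buffer)dst
{
    // 7x7 morphological min (erosion); swap in vImageMax_Planar8 for a dilation
    vImage_Error error = vImageMin_Planar8(&src, &dst, NULL, 0, 0, 7, 7, kvImageDoNotTile);
    if (error != kvImageNoError) {
        NSLog(@"vImageMin_Planar8 failed with error %ld", (long)error);
    }
}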
If you now run it, rendering the outImage on the custom preview instead of the raw luma buffer, you should see the min-filtered (eroded) version of the camera feed.
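The exact change is not shown in the snippets above; assuming the rendering code from earlier in this post, it boils down to pointing the bitmap context at outBuffer instead of lumaBuffer and freeing the per-frame output buffer once you are done with it:
// Render the processed buffer instead of the raw luma plane
CGContextRef context = CGBitmapContextCreate(outBuffer, width, height, 8, bytesPerRow, grayColorSpace, kCGImageAlphaNone);
CGImageRef dstImage = CGBitmapContextCreateImage(context);
// ... update the layer contents and clean up exactly as before, then:
free(outBuffer); // the output buffer is allocated per frame, so release it here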
You can download the example project from here.
As I mentioned at the beginning of this post, this processing happens in quasi-real-time. The limitation comes from the Accelerate framework: it is optimized for the CPU, which is still a limited resource. Depending on the final application, this limitation may not matter. However, if you start to add more processing before the rendering, you will see what I mean. Again, the result really depends on the application, but if you want to process and display the results in true real-time, you should probably think about using the GPU… but that is something for a future post.
Geppy