MID Multimedia frameworks
As an application vendor, you may want to take advantage of this hardware video decode capability so that your application gets the best out of the MID platform. You may not need to use the VA API in your code yourself: several multimedia frameworks have already been optimized to use this capability, and any application built on top of them can get the benefits of the platform without requiring any VA API knowledge.
Two frameworks are available on the Moblin* platform: Helix* and GStreamer*.
The Helix framework is capable of using the hardware video decode acceleration. As a result, every player built on top of the Helix framework, such as RealPlayer* for MID (http://www.helixcommunity.org), benefits from this feature.
GStreamer is a very popular multimedia framework in the open source community, and many open source media players in the Linux world are based on it: Totem*, Rhythmbox*, Banshee*, and others. The company Fluendo* (http://www.fluendo.com) provides codecs for the GStreamer framework optimized for the Intel® GMA 500 chipset. By using the Fluendo optimized codecs, all these applications can benefit seamlessly from the hardware video acceleration.
An implementation of the FFmpeg codecs using the VA API has been developed by Splitted-Desktop Systems* (http://www.splitted-desktop.com), which resulted in dramatic performance improvements for video playback in MPlayer* on the current Intel® processor-based MIDs using the Intel® GMA 500 chipset. For reference, the sources are available at http://www.splitted-desktop.com/en/libva/
Typical code structure
Code implementing video decoding with the VA API must follow a certain structure.
After an initialization phase, the client negotiates a mutually acceptable configuration with the server. It locks down the profile, entry point, and other attributes that do not vary over the course of the stream. Once the configuration is set and accepted by the server, the client creates a decode context. This decode context can be seen as a virtualized hardware decode pipeline, which must be configured by passing it a number of datasets.
The program is now ready to start decoding the stream. The client gets and fills decode buffers with slice and macroblock level data, and sends them to the server until the server is able to decode and render the frame. The client then repeats the operation with new decode buffers until the whole bitstream is decoded. The typical flow of a decoder using the VA API is outlined below; we will detail the different phases of the algorithm in the coming chapters.
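As a rough orientation before diving into the details, here is the sequence of VA API calls a decoder typically issues, shown schematically (an outline only, not runnable code; the exact parameters appear in the samples below):

/*
 * Schematic call sequence of a VA API decoder:
 *
 *   vaInitialize()              -- connect to the library (initialization)
 *   vaQueryConfigEntrypoints()  -- negotiate profile and entry point
 *   vaCreateConfig()            -- lock down the configuration
 *   vaCreateSurfaces()          -- create the render target(s)
 *   vaCreateContext()           -- create the virtual decode pipeline
 *   for each frame:
 *     vaCreateBuffer()/vaMapBuffer() -- fill parameter and bitstream buffers
 *     vaBeginPicture()
 *     vaRenderPicture()         -- send the buffers to the pipeline
 *     vaEndPicture()            -- non-blocking end of frame
 *     vaSyncSurface()           -- wait for the decode to complete
 *     vaPutSurface()            -- display the decoded frame
 */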

Initialization Phase
Setting up the display
/* Open the X display, get the corresponding VA display and
   initialize the VA API library */
x11_display = XOpenDisplay(NULL);
vaDisplay = vaGetDisplay(x11_display);
vaStatus = vaInitialize(vaDisplay, &major_ver, &minor_ver);
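All VA API entry points return a VAStatus, and a real application should check it after each call. A minimal sketch of such a check, assuming vaErrorStr and vaQueryVendorString are available in your libva version (both are part of the public header):

if (vaStatus != VA_STATUS_SUCCESS) {
    /* vaErrorStr() turns the status code into a readable message */
    fprintf(stderr, "vaInitialize failed: %s\n", vaErrorStr(vaStatus));
    exit(1);
}
/* The vendor string identifies the driver actually loaded */
printf("VA API %d.%d, driver: %s\n", major_ver, minor_ver,
       vaQueryVendorString(vaDisplay));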
Negotiating and creating configuration
In order to determine the level of hardware acceleration supported on a particular platform, the client needs to make sure the hardware supports the desired video profile (format) and to discover the entry points available for that profile. For this, the client queries the driver's capabilities using vaQueryConfigEntrypoints. Depending on the driver's answer, the client can take the appropriate action. The following code sample shows a configuration negotiation phase.
VAEntrypoint entrypoints[5];
int num_entrypoints, vld_entrypoint;
VAConfigAttrib attrib;
VAConfigID config_id;

vaQueryConfigEntrypoints(vaDisplay, VAProfileMPEG2Main, entrypoints,
                         &num_entrypoints);

/* Look for the VLD (slice-level) entry point */
for (vld_entrypoint = 0; vld_entrypoint < num_entrypoints; vld_entrypoint++) {
    if (entrypoints[vld_entrypoint] == VAEntrypointVLD)
        break;
}
if (vld_entrypoint == num_entrypoints) {
    /* no VLD entry point for MPEG-2 Main: no hardware acceleration */
    exit(-1);
}

/* Make sure the render target format we need is supported */
attrib.type = VAConfigAttribRTFormat;
vaGetConfigAttributes(vaDisplay, VAProfileMPEG2Main, VAEntrypointVLD,
                      &attrib, 1);
if ((attrib.value & VA_RT_FORMAT_YUV420) == 0) {
    /* the desired YUV420 render target format is not supported */
    exit(-1);
}

vaStatus = vaCreateConfig(vaDisplay, VAProfileMPEG2Main, VAEntrypointVLD,
                          &attrib, 1, &config_id);
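The sample above targets MPEG-2 Main directly. A player supporting several formats would first enumerate the profiles the driver exposes; here is a sketch using vaMaxNumProfiles and vaQueryConfigProfiles (both part of the VA API):

/* Sketch: list every profile supported by the driver */
int max_profiles = vaMaxNumProfiles(vaDisplay);
VAProfile *profile_list = malloc(max_profiles * sizeof(VAProfile));
int num_profiles, i;

vaQueryConfigProfiles(vaDisplay, profile_list, &num_profiles);
for (i = 0; i < num_profiles; i++) {
    /* pick the entry matching the content to be played,
       e.g. VAProfileMPEG2Main or VAProfileH264High */
    printf("profile %d is supported\n", profile_list[i]);
}
free(profile_list);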
Decode context
Once a decode configuration has been created, the next step is to create a decode context, which represents a virtual hardware decode pipeline. This virtual decode pipeline outputs decoded pixels to a render target called a "Surface". The decoded frames are stored in Surfaces and can subsequently be rendered to X drawables on the display opened in the first phase.
The client creates two objects. It first creates a Surface object, which gathers the parameters of the render target to be created by the driver, such as picture width, height, and format. The second object is a "Context" object. The Context object is bound to a Surface object when it is created; once a Surface is bound to a given Context, it cannot be used to create another Context, and the association is only removed when the Context is destroyed. Both Contexts and Surfaces are identified by unique IDs, and their implementation-specific internals are kept opaque to the client. Every operation, whether a data transfer or a frame decode, takes this context ID as a parameter to determine which virtual decode pipeline is used. The code sample below shows how to set up the decode context.
VASurfaceID vaSurface;
VAContextID vaContext;

/* Create the render target, then bind it to the decode context */
vaStatus = vaCreateSurfaces(vaDisplay, CLIP_WIDTH, CLIP_HEIGHT,
                            VA_RT_FORMAT_YUV420, 1, &vaSurface);
vaStatus = vaCreateContext(vaDisplay, config_id,
                           ((CLIP_WIDTH+15)/16)*16,
                           ((CLIP_HEIGHT+15)/16)*16,
                           VA_PROGRESSIVE, &vaSurface, 1, &vaContext);
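The clip decoded in this paper is a single progressive frame, so one Surface is enough. For a real stream with reference pictures, a decoder would typically allocate several Surfaces and bind them all to the context; a possible sketch, using the same calls and a hypothetical NUM_SURFACES pool size:

#define NUM_SURFACES 4   /* hypothetical pool size for reference frames */
VASurfaceID surfaces[NUM_SURFACES];

vaStatus = vaCreateSurfaces(vaDisplay, CLIP_WIDTH, CLIP_HEIGHT,
                            VA_RT_FORMAT_YUV420, NUM_SURFACES, surfaces);
vaStatus = vaCreateContext(vaDisplay, config_id,
                           ((CLIP_WIDTH+15)/16)*16,
                           ((CLIP_HEIGHT+15)/16)*16,
                           VA_PROGRESSIVE, surfaces, NUM_SURFACES, &vaContext);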
Decoding frames
To decode frames, we need to feed the virtual pipeline with parameter and bitstream data so that it can decode the compressed video frames. There are several types of data to send:
- Configuration data such as the inverse quantization matrix buffer, the picture parameter buffer, the slice parameter buffer, or other data structures required by the different supported formats. These data parameterize the virtual pipeline before the actual data stream is sent for decode.
- The bitstream data itself, which needs to be sent in a structured way so that the driver can interpret and decode it correctly.
A single data transfer mechanism allows the client to pass both types of data to the driver.
Creating buffers
Parameter and bitstream data are sent to the driver through "Buffers". The buffer data store is managed by the library, while the client identifies each buffer with a unique ID assigned by the driver.
There are two methods to set the contents of the buffers that hold either parameters or bitstream data. The first one actually copies the data into the driver's data store: you invoke vaCreateBuffer with a non-NULL "data" parameter. In that case, a memory space is allocated in the data store on the server side and the data is copied into it. This is the method used in the sample code provided:
static VAPictureParameterBufferMPEG2 pic_param = {
    /* ... */
    forward_reference_picture: 0xffffffff,
    backward_reference_picture: 0xffffffff,
    picture_coding_type: 1,
    /* ... */
    frame_pred_frame_dct: 1,
    concealment_motion_vectors: 0,
    /* other fields omitted; see the complete sample at the end of this paper */
};

vaStatus = vaCreateBuffer(vaDisplay, vaContext,
                          VAPictureParameterBufferType,
                          sizeof (VAPictureParameterBufferMPEG2),
                          1, &pic_param, &vaPicParamBuf);
If you call vaCreateBuffer with a NULL "data" parameter, the buffer object is created but no memory space is assigned in the data store yet. By invoking vaMapBuffer(), the client gets access to the buffer address space in the data store and can fill it with data directly; this avoids copying the data from the client to the server address space. After the buffer is filled and before it is actually transferred to the virtual pipeline, it must be unmapped by calling vaUnmapBuffer(). Here is a code example:
VABufferID picture_buf;
VAPictureParameterBufferMPEG2 *picture_param;

/* Create the buffer with no data, then map it and fill it in place */
vaCreateBuffer(dpy, context, VAPictureParameterBufferType,
               sizeof (VAPictureParameterBufferMPEG2), 1, NULL, &picture_buf);
vaMapBuffer(dpy, picture_buf, (void **)&picture_param);
picture_param->horizontal_size = 720;
picture_param->vertical_size = 480;
picture_param->picture_coding_type = 1;
vaUnmapBuffer(dpy, picture_buf);
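The bitstream travels through exactly the same mechanism. For instance, the hardcoded clip from the sample at the end of this paper could be wrapped into a slice data buffer with the copy method (vaSliceDataBuf being the buffer ID later passed to vaRenderPicture):

/* Copy the compressed bits of the clip into a slice data buffer */
vaStatus = vaCreateBuffer(vaDisplay, vaContext,
                          VASliceDataBufferType,
                          sizeof (mpeg2_clip),
                          1, mpeg2_clip, &vaSliceDataBuf);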
Sending the parameters and bitstream for decode
To decode frames, the stream parameters must be sent first: the inverse quantization matrix buffer, the picture parameter buffer, the slice parameter buffer, or other data structures required by the given format. Then the data stream can be sent to the virtual pipeline. All of this data is passed using the data transfer mechanism described in the previous chapter; the transfer is invoked through vaRenderPicture calls.
For each frame to render, you go through a vaBeginPicture/vaRenderPicture/vaEndPicture sequence. Once the necessary parameters, such as the inverse quantization matrix, the picture parameter buffer, or any other parameter needed by the format, are set, the data stream can be sent to the driver for decoding. The decode buffers are sent to the virtual pipeline through vaRenderPicture calls. When all the data related to the frame has been sent, the vaEndPicture() call marks the end of rendering for the picture. This is a non-blocking call, so the client can start another vaBeginPicture/vaRenderPicture/vaEndPicture sequence while the hardware is decoding the frame that has just been submitted. The vaPutSurface call sends the decode output surface to the X drawable; it performs de-interlacing (if needed), color space conversion, and scaling to the destination rectangle. The following code sample shows the decode sequence.
vaBeginPicture(vaDisplay, vaContext, vaSurface);
vaStatus = vaRenderPicture(vaDisplay, vaContext, &vaPicParamBuf, 1);
vaStatus = vaRenderPicture(vaDisplay, vaContext, &vaIQMatrixBuf, 1);
vaStatus = vaRenderPicture(vaDisplay, vaContext, &vaSliceParamBuf, 1);
vaStatus = vaRenderPicture(vaDisplay, vaContext, &vaSliceDataBuf, 1);
vaEndPicture(vaDisplay, vaContext);

/* Block until the surface has been decoded */
vaStatus = vaSyncSurface(vaDisplay, vaContext, vaSurface);

/* Create a window and display the decoded surface */
win = XCreateSimpleWindow(x11_display, RootWindow(x11_display, 0), 0, 0,
                          win_width, win_height, 0, 0, WhitePixel(x11_display, 0));
XMapWindow(x11_display, win);
XSync(x11_display, True);

vaStatus = vaPutSurface(vaDisplay, vaSurface, win,
                        0, 0, surf_width, surf_height,
                        0, 0, win_width, win_height,
                        NULL, 0, 0);   /* no clip rectangles, no flags */
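Since vaEndPicture() is non-blocking, a player does not have to sit in vaSyncSurface(); it can instead poll the surface with vaQuerySurfaceStatus() (part of the VA API) and prepare the next frame in the meantime. A minimal sketch:

/* Sketch: overlap host work with hardware decode instead of blocking */
VASurfaceStatus surf_status;

do {
    vaQuerySurfaceStatus(vaDisplay, vaSurface, &surf_status);
    /* e.g., parse and prepare the next frame's buffers here */
} while (surf_status != VASurfaceReady);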
Additional capabilities
The VA API also provides capabilities beyond decode acceleration. It provides functions for
- client and library synchronization
- subpicture blending in the decoded video stream
- host-based post-processing, by retrieving image data from decoded surfaces (see the sketch below this list).
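As an illustration of the last point, here is a minimal sketch of retrieving decoded pixels into client memory, using vaMaxNumImageFormats, vaQueryImageFormats, vaCreateImage, vaGetImage, and vaDestroyImage (all part of the VA API; the surface and size variables come from the earlier samples):

int max_formats = vaMaxNumImageFormats(vaDisplay);
VAImageFormat *formats = malloc(max_formats * sizeof(VAImageFormat));
int num_formats;
VAImage image;
unsigned char *data;

vaQueryImageFormats(vaDisplay, formats, &num_formats);
/* A real player would select a format it can process; we take the first */
vaCreateImage(vaDisplay, &formats[0], surf_width, surf_height, &image);
vaGetImage(vaDisplay, vaSurface, 0, 0, surf_width, surf_height, image.image_id);
vaMapBuffer(vaDisplay, image.buf, (void **)&data);
/* ... host-based post-processing of the planes described by
   image.offsets[] / image.pitches[] ... */
vaUnmapBuffer(vaDisplay, image.buf);
vaDestroyImage(vaDisplay, image.image_id);
free(formats);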
You can get more details on these capabilities by going through the VA API specifications. The API, currently at version 0.29, will evolve over time, adding incremental functionality supported by future versions of the chipsets.
Performance
Let's compare the performance of video playback on current MID platforms. The first test compares playback performance with the Totem player on a Compal Jax* 10 MID platform. The Intel® GMA 500 chipset used in this MID is the UL11L, and the Intel® Atom™ processor is the Z500 at 800 MHz. In this test we limit ourselves to SD content, as the UL11L chipset does not support HD content decode.

The measurements below report the CPU usage of a full video playback, including audio decode. The first measurement shows the system CPU usage during playback with the Totem player using the software FFmpeg codecs (no hardware acceleration). The second shows the system CPU usage during playback with RealPlayer for MID using the hardware-accelerated codecs.
Video format | Resolution | fps | Max CPU usage, Totem + FFmpeg codecs | Max CPU usage, RealPlayer for MID with hardware-accelerated codecs
MPEG-2       | 720x480    | 30  | 72%                                  | 39%
MPEG-4       | 720x480    | 22  | 50%                                  | 31%
H.264        | 640x360    | 60  | 100%                                 | 27.5%
VC-1         | 720x480    | 25  | 100%                                 | 31%
Using the VA API makes the CPU usage drop significantly when hardware video decode is engaged, which in turn reduces the power drain on the battery. Note that when the CPU reaches 100%, the system is no longer capable of matching the targeted frame rate: the frame rate drops to a few frames per second, giving a severely degraded experience.
The second test uses a platform with an Intel® Atom™ processor Z530 at 1.6 GHz and a US15W chipset with the Intel® GMA 500. Unlike the chipset in the previous test, this version is capable of decoding HD content. The first playback uses the regular FFmpeg codecs (without hardware acceleration), the second the Fluendo codecs using hardware acceleration through the VA API. Note that here we measure pure video decode only; no audio decode is happening.
Video format | Resolution | fps | Max CPU usage, FFmpeg codecs | Max CPU usage, Fluendo video codecs with hardware acceleration
MPEG-2       | 480x576    | 25  | 22.8%                        | 18%
MPEG-4       | 640x272    | 24  | 22.4%                        | 10%
H.264        | 1280x544   | 30  | 100%                         | 13%
VC-1         | 1280x720   | 25  | 100%                         | 15.5%
The playback was driven with the gst-launch-0.10 tool using the following command line: gst-launch-0.10 filesrc location= ! decodebin ! queue ! xvimagesink. The system had Intel® Hyper-Threading Technology disabled. As before, when usage reaches 100% the playback experience is significantly degraded: the system cannot deliver the encoded fps and drops to a few frames per second.
Summary
As MIDs become more and more widespread, video playback on these devices is seen as one of the major usage models, especially as mobile TV and Video on Demand grow in popularity. To experience video playback in optimal conditions and to extend the battery life of the device, it is essential that video players use the hardware video decode capability provided by the platform.
Independent software vendors (ISVs) have the choice of building their players on top of multimedia frameworks optimized for such platforms, like Helix or GStreamer, or of implementing the decoding themselves using the standard public API: the VA API. It is a tremendous opportunity to get into this new, growing segment and bring outstanding video support to the handheld world.
Additional Resources
ISVs that are considering using hardware acceleration will benefit from the following resources:
The VA API specifications are published on the freedesktop.org site. For more information, please visit: http://www.freedesktop.org/wiki/Software/vaapi.
For information on the Fluendo codecs, go to http://www.fluendo.com
Information on RealPlayer for MID can be found at https://helixcommunity.org/licenses/realplayer_for_mid_faq.html
The sources of MPlayer using the VA API provided by Splitted-Desktop Systems are available at http://www.splitted-desktop.com/en/libva/
For software development on MIDs, the Intel® Developer Zone offers technical resources at http://software.intel.com/en-us/appup/
About the Author
Philippe Michelon has a long history of software optimization on numerous Intel® architectures. Philippe works as an Application Engineer in the Intel® Software and Services Group in Grenoble, France, on ISV and service enabling for Intel's new mobile client platforms. His current focus is on MIDs.
Philippe holds an M.S. in Computational and Mathematical Engineering and can be reached at [email protected]
Acknowledgments
Special thanks to Jonathan Bian and Sengquan Yuan for their contribution to this paper.
Sample code
Sample code decoding a hardcoded MPEG-2 stream with the VA API
static unsigned char mpeg2_clip[] = {
    0x00,0x00,0x01,0xb3,0x01,0x00,0x10,0x13,0xff,0xff,0xe0,0x18,0x00,0x00,0x01,0xb5,
    0x14,0x8a,0x00,0x01,0x00,0x00,0x00,0x00,0x01,0xb8,0x00,0x08,0x00,0x00,0x00,0x00,
    0x01,0x00,0x00,0x0f,0xff,0xf8,0x00,0x00,0x01,0xb5,0x8f,0xff,0xf3,0x41,0x80,0x00,
    0x00,0x01,0x01,0x13,0xe1,0x00,0x15,0x81,0x54,0xe0,0x2a,0x05,0x43,0x00,0x2d,0x60,
    0x18,0x01,0x4e,0x82,0xb9,0x58,0xb1,0x83,0x49,0xa4,0xa0,0x2e,0x05,0x80,0x4b,0x7a,
    0x00,0x01,0x38,0x20,0x80,0xe8,0x05,0xff,0x60,0x18,0xe0,0x1d,0x80,0x98,0x01,0xf8,
    0x06,0x00,0x54,0x02,0xc0,0x18,0x14,0x03,0xb2,0x92,0x80,0xc0,0x18,0x94,0x42,0x2c,
    0xb2,0x11,0x64,0xa0,0x12,0x5e,0x78,0x03,0x3c,0x01,0x80,0x0e,0x80,0x18,0x80,0x6b,
    0xca,0x4e,0x01,0x0f,0xe4,0x32,0xc9,0xbf,0x01,0x42,0x69,0x43,0x50,0x4b,0x01,0xc9,
    0x45,0x80,0x50,0x01,0x38,0x65,0xe8,0x01,0x03,0xf3,0xc0,0x76,0x00,0xe0,0x03,0x20,
    0x28,0x18,0x01,0xa9,0x34,0x04,0xc5,0xe0,0x0b,0x0b,0x04,0x20,0x06,0xc0,0x89,0xff,
    0x60,0x12,0x12,0x8a,0x2c,0x34,0x11,0xff,0xf6,0xe2,0x40,0xc0,0x30,0x1b,0x7a,0x01,
    0xa9,0x0d,0x00,0xac,0x64
};

static VAPictureParameterBufferMPEG2 pic_param = {
    /* ... */
    forward_reference_picture: 0xffffffff,
    backward_reference_picture: 0xffffffff,
    picture_coding_type: 1,
    /* ... */
    intra_dc_precision: 0,
    /* ... */
    frame_pred_frame_dct: 1,
    concealment_motion_vectors: 0,
    /* ... */
    repeat_first_field: 0,
    progressive_frame: 1,
    /* ... */
};

static VAIQMatrixBufferMPEG2 iq_matrix = {
    load_intra_quantiser_matrix: 1,
    load_non_intra_quantiser_matrix: 1,
    load_chroma_intra_quantiser_matrix: 0,
    /* ... */