http://www.songho.ca/opengl/gl_pbo.html
Related Topics: Vertex Buffer Object (VBO), Frame Buffer Object (FBO)
Download: pboUnpack.zip, pboPack.zip
Update: Pixel buffer object extension is promoted as a core feature of OpenGL version 2.1, and removed ARB suffix.
OpenGL ARB_pixel_buffer_object extension is very close to ARB_vertex_buffer_object. It simply expands ARB_vertex_buffer_object extension in order to store not only vertex data but also pixel data into the buffer objects. This buffer object storing pixel data is called Pixel Buffer Object (PBO). ARB_pixel_buffer_object extension borrows all VBO framework and APIs, plus, adds 2 additional "target" tokens. These tokens assist the PBO memory manger (OpenGL driver) to determine the best location of the buffer object; system memory, shared memory or video memory. Also, the target tokens clearly specify that the bound PBO will be used in one of 2 different operations; GL_PIXEL_PACK_BUFFER to transfer pixel data to a PBO, or GL_PIXEL_UNPACK_BUFFER to transfer pixel data from PBO.
For example, glReadPixels() and glGetTexImage() are "pack" pixel operations, and glDrawPixels(), glTexImage2D() and glTexSubImage2D() are "unpack" operations. When a PBO is bound with GL_PIXEL_PACK_BUFFER token, glReadPixels() reads pixel data from a OpenGL framebuffer and write (pack) the data into the PBO. When a PBO is bound with GL_PIXEL_UNPACK_BUFFER token, glDrawPixels() reads (unpack) pixel data from the PBO and copy them to OpenGL framebuffer.
The main advantage of PBO is fast pixel data transfer to and from a graphics card through DMA (Direct Memory Access) without involing CPU cycles. And, the other advantage of PBO is asynchronous DMA transfer. Let's compare a conventional texture transfer method with using a Pixel Buffer Object. The left side of the following diagram is a conventional way to load texture data from an image source (image file or video stream). The source is first loaded into the system memory, and then, copied from the system memory to an OpenGL texture object with glTexImage2D(). These 2 transfer processes (load and copy) are all performed by CPU.
On the contrary in the right side diagram, the image source can be directly loaded into a PBO, which is controlled by OpenGL. CPU still involves to load the source to the PBO, but, not for transferring the pixel data from a PBO to a texture object. Instead, GPU (OpenGL driver) manages copying data from a PBO to a texture object. This means OpenGL performs a DMA transfer operation without wasting CPU cycles. Further, OpenGL can schedule an asynchronous DMA transfer for later execution. Therefore, glTexImage2D() returns immediately, and CPU can perform something else without waiting the pixel transfer is done.
There are 2 major PBO approaches to improve the performance of the pixel data transfer: streaming texture update and asynchronous read-back from the framebuffer.
As mentioned earlier, Pixel Buffer Object borrows all APIs from Vertex Buffer Object. The only difference is there are 2 additional tokens for PBOs: GL_PIXEL_PACK_BUFFER and GL_PIXEL_UNPACK_BUFFER. GL_PIXEL_PACK_BUFFER is for transferring pixel data from OpenGL to your application, and GL_PIXEL_UNPACK_BUFFER means transferring pixel data from an application to OpenGL. OpenGL refers to these tokens to determine the best memory space of a PBO, for example, a video memory for uploading (unpacking) textures, or system memory for reading (packing) the framebuffer. However, these target tokens are solely hint. OpenGL driver decides the appropriate location for you.
Creating a PBO requires 3 steps;
If you specify a NULL pointer to the source array in glBufferData(), then PBO allocates only a memory space with the given data size. The last parameter of glBufferData() is another performance hint for PBO to provide how the buffer object will be used. GL_STREAM_DRAW is for streaming texture upload and GL_STREAM_READ is for asynchronous framebuffer read-back.
Please check VBO for more details.
PBO provides a memory mapping mechanism to map the OpenGL controlled buffer object to the client's memory address space. So, the client can modify a portion of the buffer object or the entire buffer by using glMapBuffer()and glUnmapBuffer().
void* glMapBuffer(GLenum target, GLenum access)
GLboolean glUnmapBuffer(GLenum target)
glMapBuffer() returns the pointer to the buffer object if success. Otherwise it returns NULL. The target parameter is either GL_PIXEL_PACK_BUFFER or GL_PIXEL_UNPACK_BUFFER. The second parameter, accessspecifies what to do with the mapped buffer; read data from the PBO (GL_READ_ONLY), write data to the PBO (GL_WRITE_ONLY), or both (GL_READ_WRITE).
Note that if GPU is still working with the buffer object, glMapBuffer() will not return until GPU finishes its job with the corresponding buffer object. To avoid this stall(wait), call glBufferData() with NULL pointer right before glMapBuffer(). Then, OpenGL will discard the old buffer, and allocate new memory space for the buffer object.
The buffer object must be unmapped with glUnmapBuffer() after use of the PBO. glUnmapBuffer() returns GL_TRUE if success. Otherwise, it returns GL_FALSE.
Download the source and binary: pboUnpack.zip (Updated: 2018-09-05).
This demo application uploads (unpack) streaming textures to an OpenGL texture object using PBO. You can switch to the different transfer modes (single PBO, double PBOs and without PBO) by pressing the space key, and compare the performance differences.
The texture sources are written directly on the mapped pixel buffer every frame in the PBO modes. Then, these data are transferred from the PBO to a texture object using glTexSubImage2D(). By using PBO, OpenGL can perform asynchronous DMA transfer between a PBO and a texture object. It significantly increases the texture upload performance. If asynchronous DMA transfer is supported, glTexSubImage2D() should return immediately, and CPU can process other jobs without waiting the actual texture copy.
Streaming texture uploads with 2 PBOs
To maximize the streaming transfer performance, you may use multiple pixel buffer objects. The diagram shows that 2 PBOs are used simultaneously; glTexSubImage2D() copies the pixel data from a PBO while the texture source is being written to the other PBO.
For nth frame, PBO 1 is used for glTexSubImage2D() and PBO 2 is used to get new texture source. For n+1th frame, 2 pixel buffers are switching the roles and continue to update the texture. Because of asynchronous DMA transfer, the update and copy processes can be performed simultaneously. CPU updates the texture source to a PBO while GPU copies texture from the other PBO.
// "index" is used to copy pixels from a PBO to a texture object
// "nextIndex" is used to update pixels in the other PBO
index = (index + 1) % 2;
nextIndex = (index + 1) % 2;
// bind the texture and PBO
glBindTexture(GL_TEXTURE_2D, textureId);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboIds[index]);
// copy pixels from PBO to texture object
// Use offset instead of ponter.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0);
// bind PBO to update texture source
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboIds[nextIndex]);
// Note that glMapBuffer() causes sync issue.
// If GPU is working with this buffer, glMapBuffer() will wait(stall)
// until GPU to finish its job. To avoid waiting (idle), you can call
// first glBufferData() with NULL pointer before glMapBuffer().
// If you do that, the previous data in PBO will be discarded and
// glMapBuffer() returns a new allocated pointer immediately
// even if GPU is still working with the previous data.
glBufferData(GL_PIXEL_UNPACK_BUFFER, DATA_SIZE, 0, GL_STREAM_DRAW);
// map the buffer object into client's memory
GLubyte* ptr = (GLubyte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
if(ptr)
{
// update data directly on the mapped buffer
updatePixels(ptr, DATA_SIZE);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER); // release the mapped buffer
}
// it is good idea to release PBOs with ID 0 after use.
// Once bound with 0, all pixel operations are back to normal ways.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
Download the source and binary: pboPack.zip (Updated: 2018-09-05).
This demo application reads (pack) the pixel data from the framebuffer (left-side) to a PBO, then, draws it back to the right side of the window after modifying the brightness of the image. You can toggle PBO on/off by pressing the space key, and measure the performance of glReadPixels().
Conventional glReadPixels() blocks the pipeline and waits until all pixel data are transferred. Then, it returns control to the application. On the contrary, glReadPixels() with PBO can schedule asynchronous DMA transfer and returns immediately without stall. Therefore, the application (CPU) can execute other process right away, while transferring data with DMA by OpenGL (GPU).
Asynchronous glReadPixels() with 2 PBOs
This demo uses 2 pixel buffers. At frame n, the application reads the pixel data from OpenGL framebuffer to PBO 1using glReadPixels(), and processes the pixel data in PBO 2. These read and process can be performed simultaneously, because glReadPixels() to PBO 1 returns immediately and CPU starts to process data in PBO 2 without delay. And, we alternate between PBO 1 and PBO 2 on every frame.
// "index" is used to read pixels from framebuffer to a PBO
// "nextIndex" is used to update pixels in the other PBO
index = (index + 1) % 2;
nextIndex = (index + 1) % 2;
// set the target framebuffer to read
glReadBuffer(GL_FRONT);
// read pixels from framebuffer to PBO
// glReadPixels() should return immediately.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboIds[index]);
glReadPixels(0, 0, WIDTH, HEIGHT, GL_BGRA, GL_UNSIGNED_BYTE, 0);
// map the PBO to process its data by CPU
glBindBuffer(GL_PIXEL_PACK_BUFFER, pboIds[nextIndex]);
GLubyte* ptr = (GLubyte*)glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
if(ptr)
{
processPixels(ptr, ...);
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
// back to conventional pixel operation
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);