Real-Time Rendering Outline

FPS: frames per second

1 A coarse division of the real-time rendering pipeline into four main stages: application, geometry processing, rasterization, and pixel processing.



1.1 application

The application stage is driven by the application and is therefore typically implemented in software running on general-purpose CPUs.
Typical tasks include collision detection, global acceleration algorithms, animation, physics simulation, and many others, depending on the type of application.

1.2 geometry processing

Deals with transforms, projections, and all other types of geometry handling.
Typically performed on a graphics processing unit (GPU) that contains many programmable cores as well as fixed-operation hardware.
This stage is further divided into the following functional stages: vertex shading, projection, clipping, and screen mapping.


1.2.1 Vertex Shading

The vertex shader is now a more general unit dedicated to setting up the data associated with each vertex.
There are two main tasks of vertex shading, namely, to compute the position for a vertex and to evaluate whatever the programmer may like to have as vertex output data, such as a normal and texture coordinates.
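The two tasks of vertex shading can be illustrated with a minimal sketch. The function names, the dictionary output, and the use of a single combined model-view-projection matrix (`mvp`) are assumptions made for illustration; they are not taken from the notes.

```python
def mat4_mul_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def vertex_shader(position, normal, uv, mvp):
    """Hypothetical vertex shader: one vertex in, one vertex out.

    position, normal: 3-element lists in model space.
    uv: 2-element texture coordinate.
    mvp: 4x4 matrix (assumed combined model-view-projection).
    """
    # Task 1: compute the vertex position (here, in clip space).
    clip_pos = mat4_mul_vec4(mvp, [position[0], position[1], position[2], 1.0])
    # Task 2: produce whatever other per-vertex outputs are wanted.
    # This sketch simply passes the normal and texture coordinates through.
    return {"clip_pos": clip_pos, "normal": list(normal), "uv": list(uv)}


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
out = vertex_shader([1.0, 2.0, 3.0], [0.0, 0.0, 1.0], [0.5, 0.5], identity)
```

With the identity matrix the position comes back unchanged apart from the added w = 1 component.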

1.2.2 orthographic projection and perspective projection


1.2.3 Optional Vertex Processing: tessellation, geometry shading, and stream output

Tessellation: consists of the hull shader, tessellator, and domain shader.
Geometry shader: takes in primitives of various sorts and can produce new vertices.
Stream output: lets us use the GPU as a geometry engine.
The geometry shader and stream output are both typically used for particle simulations.

1.2.4 clipping

It is the primitives that are partially inside the view volume that require clipping.
The advantage of performing the view transformation and projection before clipping is that it makes the clipping problem consistent; primitives are always clipped against the unit cube.

perspective division and normalized device coordinates

Finally, perspective division is performed, which places the resulting triangles’ positions into three-dimensional normalized device coordinates.
The last step in the geometry stage is to convert from this space to window coordinates.
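The perspective division mentioned above amounts to dividing the clip-space x, y, and z by w. A minimal sketch follows; the function name and the error handling for w = 0 are assumptions.

```python
def perspective_divide(clip):
    """Convert a clip-space point (x, y, z, w) to normalized device coordinates."""
    x, y, z, w = clip
    if w == 0.0:
        # Assumption: such points are expected to be removed by clipping first.
        raise ValueError("w must be non-zero")
    return (x / w, y / w, z / w)


ndc = perspective_divide((2.0, -1.0, 0.5, 2.0))
```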

1.2.5 Screen Mapping

Screen mapping is a translation followed by a scaling operation. The new x- and y-coordinates are said to be screen coordinates.
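As a rough sketch of the translation and scaling, assume normalized device coordinates in the range [-1, 1] and a window with corners (x0, y0) and (x1, y1). The function name and parameter names are hypothetical, and the direction of the y-axis differs between APIs, which this sketch ignores.

```python
def screen_mapping(x_ndc, y_ndc, x0, y0, x1, y1):
    """Map NDC x, y in [-1, 1] to a window with corners (x0, y0) and (x1, y1)."""
    # Translate [-1, 1] to [0, 2], then scale to the window size and offset.
    x_screen = x0 + (x_ndc + 1.0) * 0.5 * (x1 - x0)
    y_screen = y0 + (y_ndc + 1.0) * 0.5 * (y1 - y0)
    return x_screen, y_screen


corner = screen_mapping(-1.0, -1.0, 0, 0, 1920, 1080)
center = screen_mapping(0.0, 0.0, 0, 0, 1920, 1080)
```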

1.3 rasterization stage

The rasterization stage typically takes as input three vertices, forming a triangle, and finds all pixels that are considered inside that triangle.
The window coordinates along with this remapped z-value are passed on to the rasterizer stage.

1.3.1 triangle setup (also called primitive assembly) and triangle traversal

Triangle setup: in this stage the differentials, edge equations, and other data for the triangle are computed.
Triangle traversal: finding which samples or pixels are inside a triangle is often called triangle traversal. Each triangle fragment’s properties are generated using data interpolated among the three triangle vertices.
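Setup and traversal can be sketched with edge functions evaluated at pixel centers. This is a brute-force illustration under stated assumptions, not how hardware does it: it visits every pixel, uses no fill rule for shared edges, and expects vertices ordered so that `edge(v0, v1, v2)` is positive. All names are hypothetical.

```python
def edge(a, b, p):
    """Edge function: positive when p lies on the inner side of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, attrs, width, height):
    """Return {(x, y): interpolated attribute} for pixels whose centers are covered.

    v0, v1, v2: 2D screen positions, ordered so that edge(v0, v1, v2) > 0.
    attrs: one scalar attribute per vertex.
    """
    area = edge(v0, v1, v2)  # triangle setup: computed once per triangle
    fragments = {}
    if area <= 0:
        return fragments  # degenerate or oppositely wound: skipped in this sketch
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # inside all three edges
                # Barycentric weights interpolate the vertex attributes.
                b0, b1, b2 = w0 / area, w1 / area, w2 / area
                fragments[(x, y)] = b0 * attrs[0] + b1 * attrs[1] + b2 * attrs[2]
    return fragments


frags = rasterize((0, 0), (4, 0), (0, 4), (0.0, 4.0, 0.0), 4, 4)
```

The attribute values are chosen so that the interpolated result equals the x-coordinate of each pixel center, which makes the interpolation easy to check by hand.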

1.4 pixel processing stage

Executes a program per pixel to determine its color and may perform depth testing to see whether it is visible or not. It may also perform per-pixel operations such as blending the newly computed color with a previous color.
The rasterization and pixel processing stages are also processed entirely on the GPU.
The pixel processing stage is divided into pixel shading and merging.

1.4.1 Pixel Shading

Any per-pixel shading computations are performed here, using the interpolated shading data as input. The end result is one or more colors to be passed on to the next stage.

1.4.2 Merging

It is the responsibility of the merging stage to combine the fragment color produced by the pixel shading stage with the color currently stored in the buffer.
This stage is also responsible for resolving visibility.
Transparency is one of the major weaknesses of the basic z-buffer.
The buffers involved include the color buffer, z-buffer, alpha channel, and stencil buffer.
Some APIs support rasterizer order views, also called pixel shader ordering, which enable programmable blending capabilities.
The framebuffer generally consists of all the buffers on a system.
Double buffering: rendering takes place in a back buffer, which is then swapped with the front buffer being displayed. The swapping often occurs during vertical retrace.
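The visibility resolution done by the merging stage can be sketched as a z-buffer test. The sketch assumes that a smaller z means closer to the camera and that the z-buffer starts at infinity; the function and variable names are hypothetical.

```python
def merge_fragment(color_buffer, z_buffer, x, y, frag_color, frag_z):
    """Write the fragment only if it is closer than what is already stored.

    Assumes smaller z means closer to the camera.
    """
    if frag_z < z_buffer[y][x]:
        z_buffer[y][x] = frag_z
        color_buffer[y][x] = frag_color
        return True
    return False  # fragment is hidden; buffers stay unchanged


W, H = 2, 2
color = [[(0, 0, 0)] * W for _ in range(H)]
depth = [[float("inf")] * W for _ in range(H)]

merge_fragment(color, depth, 0, 0, (255, 0, 0), 0.8)  # red, far
merge_fragment(color, depth, 0, 0, (0, 255, 0), 0.3)  # green, nearer: wins
merge_fragment(color, depth, 0, 0, (0, 0, 255), 0.5)  # blue, behind green: rejected
```

For opaque fragments the result does not depend on submission order. A partially transparent fragment would need the colors behind it to be drawn first, which is the weakness of the basic z-buffer noted above.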

2 The Graphics Processing Unit

single instruction, multiple data (SIMD)
Threads that use the same shader program are bundled into groups, called warps by NVIDIA and wavefronts by AMD.


instruction set architecture (ISA)
A processor that implements this model is called a common-shader core in DirectX, and a GPU with such cores is said to have a unified shader architecture.

Shaders are programmed using C-like shading languages such as DirectX’s High-Level Shading Language (HLSL) and the OpenGL Shading Language (GLSL). DirectX’s HLSL can be compiled to virtual machine bytecode, also called the intermediate language (IL or DXIL), to provide hardware independence.

The basic data types are 32-bit single-precision floating point scalars and vectors.

A draw call invokes the graphics API to draw a group of primitives, so causing the graphics pipeline to execute and run its shaders.
Each programmable shader stage has two types of inputs: uniform inputs, with values that remain constant throughout a draw call (but can be changed between draw calls), and varying inputs, data that come from the triangle’s vertices or from rasterization.

The number of available constant registers for uniforms is much larger than those registers available for varying inputs or outputs.
The virtual machine also has general-purpose temporary registers, which are used for scratch space.

The term flow control refers to the use of branching instructions to change the flow of code execution.
Static flow control branches are based on the values of uniform inputs.
Dynamic flow control is based on the values of varying inputs.

2.1. vertex shader

Some data manipulation happens before this stage. In what DirectX calls the input assembler [175, 530, 1208], several streams of data can be woven together to form the sets of vertices and primitives sent down the pipeline.

Uses of the vertex shader include:
**Vertex blending for animating joints.
**Silhouette rendering.
**Object generation, by creating a mesh only once and having it be deformed by the vertex shader.
**Animating character’s bodies and faces using skinning and morphing techniques.
**Procedural deformations, such as the movement of flags, cloth, or water.
**Particle creation, by sending degenerate (no area) meshes down the pipeline and having these be given an area as needed.
**Lens distortion, heat haze, water ripples, page curls, and other effects, by using the entire framebuffer’s contents as a texture on a screen-aligned mesh undergoing procedural deformation.
**Applying terrain height fields by using vertex texture fetch.

2.2 tessellation stage

Beyond memory savings, this feature can keep the bus between CPU and GPU from becoming the bottleneck for an animated character or object whose shape is changing each frame.
The tessellation stage consists of three elements: the hull shader, tessellator, and domain shader.
In OpenGL the hull shader is the tessellation control shader and the domain shader is the tessellation evaluation shader.
The input to the hull shader is a special patch primitive.
The hull shader has two functions. First, it tells the tessellator how many triangles should be generated, and in what configuration. Second, it performs processing on each of the control points.
Also, optionally, the hull shader can modify the incoming patch description, adding or removing control points as desired.
The hull shader outputs its set of control points, along with the tessellation control data, to the domain shader.


The tessellator is a fixed-function stage in the pipeline.
It has the task of adding several new vertices for the domain shader to process.
The hull shader sends the tessellator information about what type of tessellation surface is desired: triangle, quadrilateral, or isoline. Isolines are sets of line strips, sometimes used for hair rendering.
The other important values sent by the hull shader are the tessellation factors (tessellation levels in OpenGL). These are of two types: inner and outer edge. The two inner factors determine how much tessellation occurs inside the triangle or quadrilateral. The outer factors determine how much each exterior edge is split.
The hull shader always outputs a patch, a set of control point locations, and sends it to the domain shader.
The control points for the curved surface from the hull shader are used by each invocation of the domain shader to compute the output values for each vertex.
The domain shader has a data flow pattern like that of a vertex shader, with each input vertex from the tessellator being processed and generating a corresponding output vertex.

The tessellator performs an involved but fixed-function process of generating the vertices, giving them positions, and specifying what triangles or lines they form.
The domain shader takes the barycentric coordinates generated for each point and uses these in the patch’s evaluation equation to generate the position, normal, texture coordinates, and other vertex information desired.
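A minimal sketch of this evaluation step, assuming the simplest possible patch: a flat triangle with three control points, where the evaluation equation is a barycentric weighted sum. Curved patches would use a higher-order equation; the function name is hypothetical.

```python
def domain_shader_flat(control_points, bary):
    """Evaluate a flat triangular patch at barycentric coordinates (u, v, w).

    control_points: three 3D points, as passed along by the hull shader.
    bary: (u, v, w) with u + v + w == 1, as generated by the tessellator.
    """
    u, v, w = bary
    p0, p1, p2 = control_points
    return tuple(u * a + v * b + w * c for a, b, c in zip(p0, p1, p2))


patch = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
corner = domain_shader_flat(patch, (1.0, 0.0, 0.0))
midpoint = domain_shader_flat(patch, (0.0, 0.5, 0.5))
```

Each call corresponds to one domain shader invocation: the same control points are shared, and only the barycentric coordinates change per generated vertex.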

2.3. The geometry shader

The geometry shader is designed for modifying incoming data or making a limited number of copies.

2.4. Stream Output

After vertices are processed by the vertex shader (and, optionally, the tessellation and geometry shaders), these can be output in a stream, i.e., an ordered array, in addition to being sent on to the rasterization stage.
the focus of much of its use is transforming vertices and returning them for further processing.
Primitives are guaranteed to be sent to the stream output target in the order that they were input, meaning the vertex order will be maintained

2.5. the pixel shader

This piece of a triangle partially or fully overlapping the pixel is called a fragment.
Normally we use perspective-correct interpolation, so that the world-space distances between pixel surface locations increase as an object recedes in the distance.
Other interpolation options are available, such as screen-space interpolation, where perspective projection is not taken into account.
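The difference between the two interpolation modes can be sketched along a single edge. The sketch assumes the usual formulation in which the attribute divided by w and the reciprocal 1/w are interpolated linearly in screen space and then divided; function names and the sample values are hypothetical.

```python
def interpolate_screen_space(a0, a1, t):
    """Linear interpolation in screen space; ignores perspective."""
    return (1.0 - t) * a0 + t * a1


def interpolate_perspective_correct(a0, w0, a1, w1, t):
    """Interpolate a/w and 1/w linearly in screen space, then divide.

    w0, w1: clip-space w of the two endpoints (assumed positive).
    t: screen-space interpolation parameter in [0, 1].
    """
    num = (1.0 - t) * (a0 / w0) + t * (a1 / w1)
    den = (1.0 - t) * (1.0 / w0) + t * (1.0 / w1)
    return num / den


# Attribute goes from 0 at a near endpoint (w = 1) to 1 at a far endpoint (w = 3).
linear = interpolate_screen_space(0.0, 1.0, 0.5)
correct = interpolate_perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5)
```

Halfway across the edge on screen, the perspective-correct value is 0.25 rather than 0.5: the screen midpoint corresponds to a surface point only a quarter of the way toward the far endpoint.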
A pixel shader also has the unique ability to discard an incoming fragment, i.e., generate no output.
Multiple render targets (MRT): the number of instructions a pixel shader can execute has grown considerably over time. This increase gave rise to the idea of multiple render targets (MRT). Instead of sending results of a pixel shader’s program to just the color and z-buffer, multiple sets of values could be generated for each fragment and saved to different buffers, each called a render target.
Deferred shading: a single rendering pass could generate a color image in one target, object identifiers in another, and world-space distances in a third. This ability has also given rise to a different type of rendering pipeline, called deferred shading, where visibility and shading are done in separate passes.
UAV: DirectX 11 introduced a buffer type that allows write access to any location, the unordered access view (UAV). OpenGL 4.3 calls this a shader storage buffer object (SSBO).
In the standard pipeline, the fragment results are sorted in the merger stage before being processed.
ROVs: rasterizer order views (ROVs) were introduced in DirectX 11.3 to enforce an order of execution.

2.6 the merging stage

The merging stage is where the depths and colors of the individual fragments (generated in the pixel shader) are combined with the framebuffer. DirectX calls this stage the output merger; OpenGL refers to it as per-sample operations.
On most traditional pipeline diagrams (including our own), this stage is where stencil-buffer and z-buffer operations occur.
If the fragment is visible, another operation that takes place in this stage is color blending.
Early-z: to avoid waste, many GPUs perform some merge testing before the pixel shader is executed. The fragment’s z-depth (and whatever else is in use, such as the stencil buffer or scissoring) is used for testing visibility. This functionality is called early-z.
Color blending in particular can be set up to perform a large number of different operations. The most common are combinations of multiplication, addition, and subtraction involving the color and alpha values, but other operations are possible, such as minimum and maximum, as well as bitwise logic operations.
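Two of the blend operations mentioned above can be sketched per channel. The first is the common source-alpha blend, a combination of multiplication and addition; the second is a per-channel maximum. Function names are hypothetical, and colors are assumed to be floating-point RGB tuples in [0, 1].

```python
def blend_alpha(src_rgb, src_alpha, dst_rgb):
    """result = src * alpha + dst * (1 - alpha), per channel."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha)
                 for s, d in zip(src_rgb, dst_rgb))


def blend_max(src_rgb, dst_rgb):
    """Per-channel maximum of the fragment color and the stored color."""
    return tuple(max(s, d) for s, d in zip(src_rgb, dst_rgb))


half_red_over_blue = blend_alpha((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
brightest = blend_max((0.2, 0.9, 0.1), (0.5, 0.3, 0.4))
```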
Dual source-color blending: DirectX 10 added the capability to blend two colors from the pixel shader with the framebuffer color. This capability is called dual source-color blending and cannot be used in conjunction with multiple render targets.

2.7 the Compute Shader

One important advantage of compute shaders is that they can access data generated on the GPU. Sending data from the GPU to the CPU incurs a delay, so performance can be improved if processing and results can be kept resident on the GPU [1403]. Post-processing, where a rendered image is modified in some way, is a common use of compute shaders.
Compute shaders are also useful for particle systems, mesh processing such as facial animation [134], culling [1883, 1884], image filtering [1102, 1710], improving depth precision [991], shadows [865], depth of field [764], and any other tasks where a set of GPU processors can be brought to bear.

Chapter 4 Transforms
A transform is an operation that takes entities such as points, vectors, or colors and converts them in some way.
A linear transform is one that preserves vector addition and scalar multiplication.
Scaling and rotation transforms, in fact all linear transforms for three-element vectors, can be represented using a 3 × 3 matrix.


To prove that this is linear, the two conditions (Equations 4.1 and 4.2) need to be fulfilled.
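The two conditions are presumably the ones from the definition of a linear transform given above: f(x) + f(y) = f(x + y) and k f(x) = f(kx). The notes do not say which transform is being tested, so the sketch below assumes a uniform scaling f(x) = 5x as the example and uses a translation as a counterexample. It checks the conditions numerically for sample vectors, which illustrates them but does not prove them.

```python
def add(a, b):
    """Component-wise vector addition."""
    return tuple(p + q for p, q in zip(a, b))


def mul(k, a):
    """Scalar multiplication of a vector."""
    return tuple(k * p for p in a)


def scale5(v):
    """Assumed example transform f(x) = 5x."""
    return mul(5.0, v)


def translate(v):
    """Counterexample: f(x) = x + (1, 0, 0)."""
    return add(v, (1.0, 0.0, 0.0))


x = (1.0, 2.0, 3.0)
y = (-4.0, 0.5, 2.0)
k = 3.0

# Condition 1: f(x) + f(y) = f(x + y)
scale_additive = add(scale5(x), scale5(y)) == scale5(add(x, y))
# Condition 2: k f(x) = f(k x)
scale_homogeneous = mul(k, scale5(x)) == scale5(mul(k, x))
# Translation fails the first condition, so it is not linear.
translate_additive = add(translate(x), translate(y)) == translate(add(x, y))
```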


