from humus's web
Shader programming tips #1
float3 lightVec = normalize(In.lightVec);
float3 normal = normalize(In.normal);
float diffuse = saturate(dot(lightVec, normal));
float lightVecRSQ = rsqrt(dot(In.lightVec, In.lightVec));
float normalRSQ = rsqrt(dot(In.normal, In.normal));
float diffuse = saturate(dot(In.lightVec, In.normal) * lightVecRSQ * normalRSQ);
float lightVecSQ = dot(In.lightVec, In.lightVec);
float normalSQ = dot(In.normal, In.normal);
float diffuse = saturate(dot(In.lightVec, In.normal) * rsqrt(lightVecSQ * normalSQ));
Shader programming tips #2
Wednesday, February 4, 2009 | Permalink
Closely related to what I mentioned in tips #1, it's of great importance to use parantheses properly. HLSL and GLSL evaluate expressions left to right, just like C/C++. If you're multiplying vectors and scalars together the number of operations generated may differ a lot. Consider this code:
float4 result = In.color.rgba * In.intensity * 1.7;
float4 result = In.color.rgba * (In.intensity * 1.7);
Shader programming tips #3
Monday, February 9, 2009 | Permalink
Multiplying a vector by a matrix is one of the most common tasks you do in graphics programming. A full matrix multiplication is generally four float4 vector instructions. Depending on whether you have a row major or column major matrix, and whether you multiply the vector from the left or right, the result is either a DP4-DP4-DP4-DP4 or MUL-MAD-MAD-MAD sequence.
In the vast majority of the cases the w component of the vector is 1.0, and in this case you can optimize it down to three instructions. For this to work, declare your matrix as row_major. If you previously was passing a column major matrix you'll need to transpose it before passing it to the shader. You need a matrix that works with mul(vertex, matrix) when declared as row_major. Then you can do something like this to accomplish the transformation in three MAD instructions:
float4 pos;
pos = view_proj[2] * vertex.z + view_proj[3];
pos += view_proj[0] * vertex.x;
pos += view_proj[1] * vertex.y;
Shader programming tips #4
Monday, April 27, 2009 | Permalink
The depth buffer is increasingly being used for more than just hidden surface removal. One of the more interesting uses is to find the position of already rendered geometry, for instance in deferred rendering, but also in plain old forward rendering. The easiest way to accomplish this is something like this:
float4 pos = float4(In.Position.xy, sampled_depth, 1.0);
float4 cpos = mul(pos, inverse_view_proj_scale_bias);
float3 world_pos = cpos.xyz / cpos.w;
The inverse_view_proj_scale matrix is the inverse of the view_proj matrix multiplied with a scale_bias matrix that brings the In.Position.xy from [0..w, 0..h] into [-1..1, -1..1] range.
The same technique can of course also be used to compute the view position instead of the world position. In many cases you're only interested in the view Z coordinate though, for instance for fading soft particles, fog distance computations, depth of field etc. While you could execute the above code and just use the Z coordinate this is more work than necessary in most cases. Unless you have a non-standard projection matrix you can do this in just two scalar instructions:
float view_z = 1.0 / (sampled_depth * ZParams.x + ZParams.y);
ZParams is a float2 constant you pass from the application containing the following values:
ZParams.x = 1.0 / far - 1.0 / near;
ZParams.y = 1.0 / near;
If you're using a reversed projection matrix with Z=0 at far plane and Z=1 at near plane you can just swap near and far in the above computation.
Shader programming tips #5
Friday, May 1, 2009 | Permalink
In the comments to Shader programming tips #4 Java Cool Dude mentioned that for fullscreen passes you can pass an interpolated direction vector and use the linearized depth to compute the world position. Basically with view_z computed using the math in #4 you compute:
float3 world_pos = cam_pos + In.dir * view_z;
This amounts to only two scalar operations and one float3 to carry out.
What about regular non-fullscreen passes? Turns out you can do that as well using a nice DX10 feature. The problem is that we need to interpolate the direction vector in screen space, rather than doing perspective correction. For screen aligned primitives it's the same thing, so it works out in this case, but for "normal" mesh data it's a completely different story. In DX10, and also in GLSL, there's now a noperspective keyword you can add to your interpolator, which changes the interpolation mode to eliminate the perspective correction, thus giving you an interpolation that's linear in screen space instead.
How do we compute the direction vector? Just take the position you're writing out from the vertex shader and push it to the far plane. Depending on if you're using a reversed projection matrix or not you either want Z=0 or Z=1, which can be done with using float4(Out.Position.xy, 0, Out.Position.w) or Out.Position.xyww respectively in homogenous coordinates. Transform this vector with the inverse view_proj matrix to get the world position of the far plane equivalent of the point in world space. Now subtract cam_pos from this and that's the direction vector. Instead of subtracting the cam_pos in the vertex shader you can just bake that into the same matrix and get it for free. The resulting vertex shader snippet for this is something like this:
float4 dir = mul(view_proj_inv, Out.position.xyww);
Out.dir = dir.xyz / dir.w;
Finally, note that view_z as computed in #4 goes from 0 at the camera to far_plane at the far clipping plane. For this computation we need it to be 1.0 at the far clipping plane, which can be done by simply multiplying ZParams with far_plane. Alternatively you can divide Out.dir with far_plane in the vertex shader.