I’ve been trying to catch up on what hacks GPU vendors have exposed in Direct3D 9, and it turns out there are a lot of them!
If you know more hacks or more details, please let me know in the comments!
Most hacks are exposed as custom (“FOURCC”) formats, so to check for one you call IDirect3D9::CheckDeviceFormat with the format in question (a sketch follows after the table). Here’s the list (Usage column codes: DS=DepthStencil, RT=RenderTarget; Resource column codes: tex=texture, surf=surface).
| Format | Usage | Resource | Description | NVIDIA GeForce | ATI Radeon | Intel |
|---|---|---|---|---|---|---|
| **Shadow mapping** | | | | | | |
| D3DFMT_D16 | DS | tex | Sample depth buffer directly as shadow map. | 3+ | HD 2xxx+ | 965+ |
| D3DFMT_D24X8 | DS | tex | Sample depth buffer directly as shadow map. | 3+ | HD 2xxx+ | 965+ |
| **Depth Buffer As Texture** | | | | | | |
| DF16 | DS | tex | Read depth buffer as texture. | | 9500+ | G45+ |
| DF24 | DS | tex | Read depth buffer as texture. | | X1300+ | SB+ |
| INTZ | DS | tex | Read depth buffer as texture. | 8+ | HD 4xxx+ | G45+ |
| RAWZ | DS | tex | Read depth buffer as texture. | 6 & 7 | | |
| **Anti-Aliasing related** | | | | | | |
| RESZ | RT | surf | Resolve MSAA’d depth stencil surface into non-MSAA’d depth texture. | | HD 4xxx+ | G45+ |
| ATOC | 0 | surf | Transparency anti-aliasing. | 7+ | | SB+ |
| SSAA | 0 | surf | Transparency anti-aliasing. | 7+ | | |
| n/a | | | Alpha to coverage: all ATI SM2.0+ hardware. | | 9500+ | |
| n/a | | | Coverage Sampled Anti-Aliasing [6]. | 8+ | | |
| **Texturing** | | | | | | |
| ATI1 | 0 | tex | ATI1n & ATI2n texture compression formats. | 8+ | X1300+ | G45+ |
| ATI2 | 0 | tex | ATI1n & ATI2n texture compression formats. | 6+ | 9500+ | G45+ |
| DF24 | DS | tex | Fetch4: when sampling a one-channel texture, return the four touched texel values [1]. Check for DF24 support. | | X1300+ | SB+ |
| **Misc** | | | | | | |
| NULL | RT | surf | Dummy render target surface that does not consume video memory. | 6+ | HD 4xxx+ | HD+ |
| NVDB | 0 | surf | Depth Bounds Test. | 6+ | | |
| R2VB | 0 | surf | Render into vertex buffer. | 6 & 7 | 9500+ | |
| INST | 0 | surf | Geometry Instancing on pre-SM3.0 hardware. | | 9500+ | |
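Here’s a minimal sketch of the support check, using INTZ as the example (the `kFourccINTZ` constant name is mine; the Usage and Resource arguments come straight from the table):

```cpp
#include <d3d9.h>

// FOURCC code for the INTZ hack; the other hacks are probed the same way.
const D3DFORMAT kFourccINTZ = (D3DFORMAT)MAKEFOURCC('I','N','T','Z');

bool SupportsINTZ(IDirect3D9* d3d)
{
    HRESULT hr = d3d->CheckDeviceFormat(
        D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL,
        D3DFMT_X8R8G8B8,           // current display mode format
        D3DUSAGE_DEPTHSTENCIL,     // "Usage" column for INTZ
        D3DRTYPE_TEXTURE,          // "Resource" column for INTZ
        kFourccINTZ);
    return SUCCEEDED(hr);
}
```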
Native support for shadow map sampling & filtering was introduced ages ago (GeForce 3) by NVIDIA. Turns out ATI also implemented the same feature for its DX10-level cards. Intel supports it as well, on the 965 (aka GMA X3100, their Shader Model 3.0 part) and later (G45/X4500/HD) chips.
The usage is quite simple: create a texture with a regular depth/stencil format and render into it. When reading from the texture, one extra component in the texture coordinates is the depth value to compare with; the compared and filtered result is returned.
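A minimal sketch of that setup, assuming D3DFMT_D24X8, a 1024x1024 map, and that `dev` is your IDirect3DDevice9 (variable names are illustrative):

```cpp
// Create a depth/stencil texture to render the shadow map into.
IDirect3DTexture9* shadowTex = NULL;
dev->CreateTexture(1024, 1024, 1, D3DUSAGE_DEPTHSTENCIL,
                   D3DFMT_D24X8, D3DPOOL_DEFAULT, &shadowTex, NULL);

IDirect3DSurface9* shadowSurf = NULL;
shadowTex->GetSurfaceLevel(0, &shadowSurf);
dev->SetDepthStencilSurface(shadowSurf);
// ...render the shadow casters from the light's point of view...
// (D3D9 still wants a color render target bound while doing this;
// the NULL format from the table is handy for that.)

// Later, bind the depth texture and sample it projectively (e.g.
// tex2Dproj in HLSL): the z of the projected coordinate is the depth
// to compare against, and the filtered comparison result comes back.
dev->SetTexture(0, shadowTex);
```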
For some rendering schemes (anything “deferred”) or some effects (SSAO, depth of field, volumetric fog, …), access to the depth buffer is needed. If the native depth buffer can be read as a texture, this saves both memory and either a separate depth-rendering pass or an extra output in MRT setups.
Depending on hardware, this can be achieved via the INTZ, RAWZ, DF16 or DF24 formats; a sketch for INTZ follows.
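A minimal sketch for INTZ, assuming support was confirmed via CheckDeviceFormat as above and reusing `kFourccINTZ` from that snippet:

```cpp
// Create the depth buffer itself as a texture in the INTZ format.
IDirect3DTexture9* depthTex = NULL;
dev->CreateTexture(width, height, 1, D3DUSAGE_DEPTHSTENCIL,
                   kFourccINTZ, D3DPOOL_DEFAULT, &depthTex, NULL);

IDirect3DSurface9* depthSurf = NULL;
depthTex->GetSurfaceLevel(0, &depthSurf);
dev->SetDepthStencilSurface(depthSurf);
// ...render the scene normally; depth testing works as usual...

// Afterwards the same resource can be bound as a texture and the
// depth values read in a shader (e.g. for SSAO or depth of field).
dev->SetTexture(0, depthTex);
```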
NVDB (Depth Bounds Test) is a direct equivalent of the GL_EXT_depth_bounds_test OpenGL extension. See [3] for more information.
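A sketch of how the test is typically driven through render states, per NVIDIA’s documentation [3]; treat the exact states and the float-to-DWORD reinterpretation as assumptions to verify against the current docs:

```cpp
const DWORD kFourccNVDB = MAKEFOURCC('N','V','D','B');

void SetDepthBounds(IDirect3DDevice9* dev, float zMin, float zMax)
{
    // Enable the depth bounds test...
    dev->SetRenderState(D3DRS_ADAPTIVETESS_X, kFourccNVDB);
    // ...and pass the float bounds bit-reinterpreted as DWORDs.
    dev->SetRenderState(D3DRS_ADAPTIVETESS_Z, *(DWORD*)&zMin);
    dev->SetRenderState(D3DRS_ADAPTIVETESS_W, *(DWORD*)&zMax);
}
// Disable again with dev->SetRenderState(D3DRS_ADAPTIVETESS_X, 0);
```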
NVIDIA exposes two controls: transparency multisampling (ATOC) and transparency supersampling (SSAA) [5]. ATI says all Radeons since the 9500 support “alpha to coverage” [1]. Intel supports ATOC starting with SandyBridge (GMA HD 2000/3000) GPUs.
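A sketch of the render-state toggles as described in the vendor docs [1][5]; verify the exact FOURCC codes there before relying on them:

```cpp
// NVIDIA/Intel: pick a transparency AA mode ('ATOC' or 'SSAA')...
dev->SetRenderState(D3DRS_ADAPTIVETESS_Y,
                    (DWORD)MAKEFOURCC('A','T','O','C'));
// ...and switch it off again with D3DFMT_UNKNOWN.
dev->SetRenderState(D3DRS_ADAPTIVETESS_Y, D3DFMT_UNKNOWN);

// ATI: alpha to coverage is toggled through D3DRS_POINTSIZE:
// 'A2M1' enables it, 'A2M0' disables it.
dev->SetRenderState(D3DRS_POINTSIZE, (DWORD)MAKEFOURCC('A','2','M','1'));
```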
R2VB is similar to “stream out” or “memexport” in other APIs/platforms: the results of a rendering pass can be reinterpreted as a vertex buffer. See [2] for more information. Apparently some NVIDIA GPUs (or drivers?) support this as well.
Instancing is supported by Direct3D 9.0c on all Shader Model 3.0 hardware, so no extra hacks are necessary there. ATI exposed a capability to enable instancing on their Shader Model 2.0 hardware as well: check for “INST” support, and do dev->SetRenderState(D3DRS_POINTSIZE, kFourccINST); at startup to enable instancing (see the sketch below).
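A minimal sketch of that check-and-enable sequence; `kFourccINST` is my name for the 'INST' code:

```cpp
const D3DFORMAT kFourccINST = (D3DFORMAT)MAKEFOURCC('I','N','S','T');

void EnableSM20Instancing(IDirect3D9* d3d, IDirect3DDevice9* dev)
{
    // Probe for the INST hack first...
    if (SUCCEEDED(d3d->CheckDeviceFormat(
            D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, D3DFMT_X8R8G8B8,
            0, D3DRTYPE_SURFACE, kFourccINST)))
    {
        // ...then enable it once at startup; after this, the usual
        // SetStreamSourceFreq instancing API works on SM2.0 Radeons.
        dev->SetRenderState(D3DRS_POINTSIZE, (DWORD)kFourccINST);
    }
}
```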
I can’t find any AMD document on this instancing hack anymore. Other references: [7] and [8].
Compressed texture formats. ATI1n is known as the BC4 format in DirectX 10 land; ATI2n as BC5 or 3Dc. Since these are just DX10 formats, support is quite widespread, with NVIDIA having exposed them a while ago and Intel more recently (drivers 15.17 or higher).
Thing to keep in mind: when D3D9 allocates the mip chain, it checks whether the format is a known compressed format and allocates the appropriate space for the smallest mip levels. For example, a 1x1 DXT1 level actually takes up 8 bytes, since the block size is fixed at 4x4 texels; this is true for all block-compressed formats. With the hacked formats, D3D9 doesn’t know it’s dealing with block compression and only allocates the number of bytes the mip would take if it weren’t compressed. For example, a 1x1 ATI1n level gets only 1 byte allocated. What you need to do is stop the mip chain before either dimension shrinks below the block dimensions, otherwise you risk memory corruption; a sketch of the workaround follows.
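A minimal sketch: compute a mip count that stops before either dimension drops below the 4x4 block size, and pass it to CreateTexture instead of 0 (full chain):

```cpp
UINT SafeMipCountForBlockCompressed(UINT width, UINT height)
{
    UINT levels = 0;
    while (width >= 4 && height >= 4)  // stop before going below 4x4
    {
        ++levels;
        width  >>= 1;
        height >>= 1;
    }
    return levels;
}
// e.g. a 256x256 ATI2n texture gets 7 levels (256x256 down to 4x4)
// instead of the full 9-level chain down to 1x1.
```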
Another thing to keep in mind: on the Vista+ (WDDM) driver model, textures in these formats still consume application address space, whereas most regular textures like DXT5 don’t take up additional address space under WDDM. For some reason, ATI1n and ATI2n textures in D3D9 are deemed lockable.
All this information was gathered mostly from the numbered references cited inline above.
Original article: http://aras-p.info/texts/D3D9GPUHacks.html