Performance Tips when Writing Shaders
Original article: http://unity3d.com/support/documentation/Components/SL-ShaderPerformance.html
If anything in this translation is inaccurate, please point it out!
Compute only what you need; anything that is not actually needed can be eliminated. For example, supporting a per-material color makes a shader more flexible, but if you always leave that color set to white, it is a useless computation performed for every vertex or pixel rendered on screen.
Another thing to keep in mind is the frequency of computations. Usually there are many more pixels rendered (and hence pixel shader executions) than there are vertices (vertex shader executions), and more vertices than objects being rendered. So generally, if you can, move computations out of the pixel shader into the vertex shader, or out of shaders completely and set the values once from a script.
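As a rough illustration of moving work from the pixel shader to the vertex shader, the sketch below computes a simple rim term once per vertex and lets the hardware interpolate it, so the fragment shader only samples a texture and adds the result. The shader name and the `_Tint` and `_RimColor` properties are hypothetical, and the code assumes the classic `UnityCG.cginc` helpers (`appdata_base`, `ObjSpaceViewDir`, `TRANSFORM_TEX`, `UNITY_MATRIX_MVP`); newer Unity versions may rewrite the position transform automatically.

```shaderlab
Shader "Examples/PerVertexRim" {               // hypothetical name
    Properties {
        _MainTex ("Texture", 2D) = "white" {}
        _Tint ("Tint (could be set once from a script)", Color) = (1,1,1,1)
        _RimColor ("Rim Color", Color) = (0.5,0.5,0.5,0)
    }
    SubShader {
        Pass {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            float4 _MainTex_ST;
            fixed4 _Tint;
            fixed4 _RimColor;

            struct v2f {
                float4 pos : SV_POSITION;
                float2 uv  : TEXCOORD0;
                fixed4 rim : TEXCOORD1;   // computed per vertex, interpolated per pixel
            };

            v2f vert (appdata_base v) {
                v2f o;
                o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
                o.uv = TRANSFORM_TEX(v.texcoord, _MainTex);
                // Runs once per vertex instead of once per pixel.
                float3 viewDir = normalize(ObjSpaceViewDir(v.vertex));
                half rim = 1.0 - saturate(dot(normalize(v.normal), viewDir));
                o.rim = _RimColor * rim;
                return o;
            }

            fixed4 frag (v2f i) : COLOR {
                // The fragment shader only samples, tints and adds.
                return tex2D(_MainTex, i.uv) * _Tint + i.rim;
            }
            ENDCG
        }
    }
}
```

The per-vertex result is only an approximation of the per-pixel one, so whether the trade is acceptable depends on how finely the mesh is tessellated.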
Surface Shaders are great for writing shaders that interact with lighting. However, their default options are tuned for the "general case". In many cases you can tweak them to make shaders run faster, or at least be smaller:
approxview: a directive for shaders that use the view direction (e.g. Specular). It makes the view direction be normalized per vertex instead of per pixel. This is approximate, but often good enough.
halfasview: for Specular shader types, this is even faster. The half-vector (halfway between the lighting direction and the view vector) is computed and normalized per vertex, and the lighting function receives the half-vector as a parameter instead of the view vector.
noforwardadd: makes a shader fully support only one directional light in Forward rendering. The remaining lights can still have an effect as per-vertex lights or spherical harmonics. This is a good way to make a shader smaller and to make sure it always renders in one pass, even with multiple lights present.
noambient: disables ambient lighting and spherical harmonics lights on a shader. This can be slightly faster.
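These directives are appended to the `#pragma surface` line. The sketch below is modeled on a typical specular surface shader and combines `approxview` and `noforwardadd`; the shader name is hypothetical, and which directives are accepted depends on the Unity version in use, so treat it as an illustration and not a recommended configuration.

```shaderlab
Shader "Examples/CheapSpecular" {              // hypothetical name
    Properties {
        _Color ("Main Color", Color) = (1,1,1,1)
        _SpecColor ("Specular Color", Color) = (0.5,0.5,0.5,1)
        _Shininess ("Shininess", Range (0.03, 1)) = 0.078125
        _MainTex ("Base (RGB) Gloss (A)", 2D) = "white" {}
    }
    SubShader {
        Tags { "RenderType"="Opaque" }
        CGPROGRAM
        // approxview: normalize the view direction per vertex.
        // noforwardadd: one directional light per pixel; other lights fall
        // back to per-vertex or spherical harmonics, so no additive passes.
        #pragma surface surf BlinnPhong approxview noforwardadd

        sampler2D _MainTex;
        fixed4 _Color;
        half _Shininess;
        // _SpecColor is already declared by the built-in lighting include.

        struct Input {
            float2 uv_MainTex;
        };

        void surf (Input IN, inout SurfaceOutput o) {
            fixed4 tex = tex2D(_MainTex, IN.uv_MainTex);
            o.Albedo = tex.rgb * _Color.rgb;
            o.Gloss = tex.a;
            o.Specular = _Shininess;
        }
        ENDCG
    }
    Fallback "VertexLit"
}
```

Swapping `approxview` for `halfasview`, or adding `noambient`, only changes the `#pragma surface` line; the `surf` function stays the same.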
When writing shaders in Cg/HLSL, there are three basic number types: float, half and fixed (as well as vector and matrix variants of them, e.g. half3 and float4x4):
float: high precision floating point. Generally 32 bits, just like the float type in regular programming languages.
half: medium precision floating point. Generally 16 bits, with a range of -60000 to +60000 and 3.3 decimal digits of precision.
fixed: low precision fixed point. Generally 11 bits, with a range of -2.0 to +2.0 and 1/256th precision.
Use the lowest precision possible; this is especially important on mobile platforms like iOS and Android. Good rules of thumb are:
For colors and unit length vectors, use fixed.
For everything else, use half if its range and precision are sufficient; otherwise use float.
On mobile platforms, the key is to keep as much as possible in low precision in the fragment shader. On most mobile GPUs, applying swizzles to low precision (fixed/lowp) types is costly; converting between fixed/lowp and higher precision types is quite costly as well.
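A minimal sketch of these precision rules, assuming a simple textured and tinted shader: the clip-space position stays in float, the texture coordinate is stored as half (which assumes the UVs fit comfortably in half range and precision), and the fragment shader works entirely on fixed4 values with no swizzles and no conversions to a higher precision type. The shader name and the `_Tint` property are hypothetical.

```shaderlab
Shader "Examples/PrecisionSketch" {            // hypothetical name
    Properties {
        _MainTex ("Texture", 2D) = "white" {}
        _Tint ("Tint", Color) = (1,1,1,1)
    }
    SubShader {
        Pass {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            float4 _MainTex_ST;
            fixed4 _Tint;                      // color: fixed

            struct v2f {
                float4 pos : SV_POSITION;      // position needs full range: float
                half2 uv   : TEXCOORD0;        // assumed to fit in half
            };

            v2f vert (appdata_base v) {
                v2f o;
                o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
                o.uv = TRANSFORM_TEX(v.texcoord, _MainTex);
                return o;
            }

            fixed4 frag (v2f i) : COLOR {
                // Both operands are fixed4 and are used whole, so there is
                // no swizzle and no fixed-to-float conversion here.
                fixed4 albedo = tex2D(_MainTex, i.uv);
                return albedo * _Tint;
            }
            ENDCG
        }
    }
}
```

On desktop GPUs these types are commonly all treated as full precision, so the benefit is expected mainly on mobile hardware.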
Fixed function AlphaTest, or its programmable equivalent clip(), has different performance characteristics on different platforms:
Generally, there is a small advantage to using it to cull totally transparent pixels on most platforms.
However, on PowerVR GPUs found in iOS and some Android devices, alpha testing is expensive. Do not try to use it as a "performance optimization" there; it will be slower.
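For reference, a programmable alpha test can be sketched as below. `clip(x)` discards the current pixel when `x` is negative, so `clip(c.a - _Cutoff)` drops every pixel whose alpha is below the threshold. The shader name and the `_Cutoff` property are assumptions made for illustration.

```shaderlab
Shader "Examples/ClipSketch" {                 // hypothetical name
    Properties {
        _MainTex ("Base (RGB) Alpha (A)", 2D) = "white" {}
        _Cutoff ("Alpha cutoff", Range (0, 1)) = 0.5
    }
    SubShader {
        Pass {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;
            float4 _MainTex_ST;
            fixed _Cutoff;

            struct v2f {
                float4 pos : SV_POSITION;
                float2 uv  : TEXCOORD0;
            };

            v2f vert (appdata_base v) {
                v2f o;
                o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
                o.uv = TRANSFORM_TEX(v.texcoord, _MainTex);
                return o;
            }

            fixed4 frag (v2f i) : COLOR {
                fixed4 c = tex2D(_MainTex, i.uv);
                // Discard the pixel when its alpha is below the cutoff.
                clip(c.a - _Cutoff);
                return c;
            }
            ENDCG
        }
    }
}
```

Given the note about PowerVR above, a shader like this is worth profiling on the target device before adopting it.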
On some platforms (mostly mobile GPUs found in iOS and Android devices), using ColorMask to leave out some channels (e.g. ColorMask RGB) can be expensive, so only use it if really necessary.