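This document is mainly my personal understanding and summary of the official Unity Manual (in practice it is mostly a translation record :>).
It is for personal study only and must not be used commercially. Reposting is welcome; please credit the source.
All operations described in this article are based on Unity 2018.3.
Reference link: https://docs.unity3d.com/Manual/OptimizingGraphicsPerformance.html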
Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game’s rendering.
Locate high graphics impact
The graphical parts of your game primarily affect two systems of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is, because the strategies for optimizing for the GPU and for the CPU are quite different (and can even be opposite - for example, it’s quite common to make the GPU do more work while optimizing for the CPU, and vice versa).
Common bottlenecks and ways to check for them:
Less-common bottlenecks:
CPU optimization
To render objects on the screen, the CPU has a lot of processing work to do: working out which lights affect each object, setting up the shader and shader parameters, and sending drawing commands to the graphics driver, which then prepares the commands to be sent off to the graphics card.
All this “per object” CPU usage is resource-intensive, so if you have lots of visible objects, it can add up. For example, if you have a thousand triangles, it is much easier on the CPU if they are all in one mesh, rather than in one mesh per triangle (adding up to 1000 meshes). The cost of both scenarios on the GPU is very similar, but the work done by the CPU to render a thousand objects (instead of one) is significantly higher.
Reduce the visible object count. To reduce the amount of work the CPU needs to do:
Combine objects together so that each mesh has at least several hundred triangles and uses only one Material for the entire mesh. Note that combining two objects which don’t share a material does not give you any performance increase at all. The most common reason for requiring multiple materials is that two meshes don’t share the same textures; to optimize CPU performance, ensure that any objects you combine share the same textures.
When using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense. See the Lighting performance section below to learn how to manage this.
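To make the "combine objects that share a material" rule concrete, here is a small Python sketch. It is only a counting model, not Unity code: the object names and material labels are made up, and it assumes that every group of objects sharing one material can be merged into a single mesh that is drawn once.

```python
from collections import defaultdict

def count_draw_batches(objects):
    """Count draw batches if all objects sharing a material are merged.

    `objects` is a list of (name, material) pairs. Both fields are
    hypothetical labels for this sketch, not Unity types.
    """
    by_material = defaultdict(list)
    for name, material in objects:
        by_material[material].append(name)
    # Assumption: one merged mesh (one batch) per distinct material.
    return len(by_material)

scene = [
    ("crate_a", "wood"),
    ("crate_b", "wood"),
    ("barrel", "wood"),
    ("lamp", "metal"),
]
print(len(scene), "objects ->", count_draw_batches(scene), "batches after combining")
```

Two objects with two different materials still give two batches in this model, which matches the note above that combining objects that don't share a material gains nothing.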
GPU: Optimizing model geometry
There are two basic rules for optimizing the geometry of a model:
Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the number of distinct corner points that make up a model (known as the geometric vertex count). For a graphics card, however, some geometric vertices need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is usually higher than the count given by the 3D application.
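The difference between the geometric and the logical vertex count can be sketched in a few lines of Python. This is an illustration I wrote for these notes, not how Unity's importer is implemented: a face corner is modeled as a (position, normal, uv) tuple, and the helper `hard_edged_cube` is a hypothetical test shape.

```python
from itertools import product

def geometric_vertex_count(corners):
    """Count distinct positions only, as a modeling tool typically reports."""
    return len({position for position, _normal, _uv in corners})

def logical_vertex_count(corners):
    """Count distinct (position, normal, uv) combinations.

    Corners that agree on every attribute can share one vertex;
    any difference forces a split.
    """
    return len(set(corners))

def hard_edged_cube():
    """Build the 24 face corners of a cube whose faces have flat normals."""
    corners = []
    for axis in range(3):
        for sign in (-1, 1):
            normal = tuple(sign if i == axis else 0 for i in range(3))
            for a, b in product((-1, 1), repeat=2):
                position = [a, b]
                position.insert(axis, sign)
                uv = ((a + 1) // 2, (b + 1) // 2)
                corners.append((tuple(position), normal, uv))
    return corners

cube = hard_edged_cube()
print(geometric_vertex_count(cube), "geometric vertices")
print(logical_vertex_count(cube), "logical vertices")
```

For this cube the modeling tool would report 8 vertices, while the renderer needs 24, because each corner position carries three different face normals.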
While the amount of geometry in the models is mostly relevant for the GPU, some features in Unity also process models on the CPU (for example, mesh skinning).
Lighting performance
The fastest option is always to create lighting that doesn’t need to be computed at all. To do this, use Lightmapping to “bake” static lighting just once, instead of computing it each frame. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:
In many cases you can apply simple tricks instead of adding multiple extra lights. For example, instead of adding a light that shines straight into the camera to give a Rim Lighting effect, add a dedicated Rim Lighting computation directly into your shaders (see Surface Shader Examples to learn how to do this).
Lights in forward rendering
Per-pixel dynamic lighting adds significant rendering work to every affected pixel, and can lead to objects being rendered in multiple passes. Avoid having more than one Pixel Light illuminating any single object on less powerful devices, like mobile or low-end PC GPUs, and use lightmaps to light static objects instead of calculating their lighting every frame. Per-vertex dynamic lighting can add significant work to vertex transformations, so try to avoid situations where multiple lights illuminate a single object.
Avoid combining meshes that are far enough apart to be affected by different sets of pixel lights. When you use pixel lighting, each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, it increases the effective size of the combined object. All pixel lights that illuminate any part of this combined object are taken into account during rendering, so the number of rendering passes that need to be made could be increased. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, so nothing is gained by combining meshes.
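The pass-count argument above can be checked with a small Python model. It is deliberately simplified: a mesh costs one pass per pixel light that touches it, and the details of Unity's base and additive passes are ignored. The meshes, vertex counts, and light names are invented for illustration.

```python
def forward_passes(pixel_lights):
    """One rendering pass per distinct pixel light touching the mesh."""
    return len(set(pixel_lights))

def vertices_processed(meshes):
    """Sum vertex_count * passes over (vertex_count, pixel_lights) pairs."""
    return sum(count * forward_passes(lights) for count, lights in meshes)

def combine(meshes):
    """Merge meshes: vertex counts add up and the light sets are unioned."""
    total = sum(count for count, _lights in meshes)
    lights = set().union(*(set(lights) for _count, lights in meshes))
    return (total, lights)

# Two far-apart meshes lit by different (hypothetical) pixel lights.
house = (500, {"torch_1", "torch_2"})
tower = (500, {"torch_3"})
combined = combine([house, tower])

print("passes, separate:", forward_passes(house[1]) + forward_passes(tower[1]))
print("passes, combined:", forward_passes(combined[1]))
print("vertices, separate:", vertices_processed([house, tower]))
print("vertices, combined:", vertices_processed([combined]))
```

In this model the combined object needs as many passes as the two separate objects together, and every pass now pushes all 1000 vertices through the pipeline, so combining saves no passes and costs extra vertex work.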
During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The Quality Settings are used to modify how many of the lights end up as pixel lights, and how many as vertex lights. Each light calculates its importance based on how far away it is from the mesh and how intense its illumination is - and some lights are more important than others purely from the game context. For this reason, every light has a Render Mode setting which can be set to Important or Not Important; lights marked as Not Important have a lower rendering overhead.
Example: Consider a driving game in which the player’s car is driving in the dark with headlights switched on. The headlights are probably the most visually significant light source in the game, so their Render Mode should be set to Important. There may be other lights in the game that are less important, like other cars’ rear lights or distant lampposts, and which don’t improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important to avoid wasting rendering capacity in places where it has little benefit.
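The driving-game example can be turned into a rough Python model of how lights end up as pixel or vertex lights. This is a sketch under my own assumptions, not Unity's implementation: the importance score (intensity divided by one plus distance) is made up, the "Auto" mode stands for lights with no explicit choice, and `pixel_light_count` plays the role of the Quality Settings limit.

```python
def classify_lights(lights, pixel_light_count):
    """Split lights into per-pixel and per-vertex groups (simplified model).

    Each light is a dict with 'name', 'intensity', 'distance' and
    'render_mode' ('Important', 'Not Important' or 'Auto').
    """
    pixel = [l["name"] for l in lights if l["render_mode"] == "Important"]
    vertex = [l["name"] for l in lights if l["render_mode"] == "Not Important"]
    auto = [l for l in lights if l["render_mode"] == "Auto"]
    # Assumed importance score: brighter and closer lights rank higher.
    auto.sort(key=lambda l: l["intensity"] / (1.0 + l["distance"]), reverse=True)
    # Remaining pixel-light slots go to the highest-ranked Auto lights.
    free_slots = max(0, pixel_light_count - len(pixel))
    pixel += [l["name"] for l in auto[:free_slots]]
    vertex += [l["name"] for l in auto[free_slots:]]
    return pixel, vertex

lights = [
    {"name": "headlights", "intensity": 2.0, "distance": 1.0, "render_mode": "Important"},
    {"name": "rear_lights", "intensity": 0.5, "distance": 20.0, "render_mode": "Not Important"},
    {"name": "lamppost", "intensity": 1.0, "distance": 60.0, "render_mode": "Not Important"},
    {"name": "moon", "intensity": 0.2, "distance": 0.0, "render_mode": "Auto"},
]
pixel, vertex = classify_lights(lights, pixel_light_count=1)
print("pixel lights:", pixel)
print("vertex lights:", vertex)
```

With a limit of one pixel light, only the headlights are rendered per pixel; the rear lights, the lamppost, and the leftover Auto light fall back to the cheaper path.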
Optimizing per-pixel lighting saves both CPU and GPU work: the CPU has fewer draw calls to do, and the GPU has fewer vertices to process and pixels to rasterize for all the additional object renders.
GPU: Texture compression and mipmaps
Use Compressed textures to decrease the size of your textures. This can result in faster load times, a smaller memory footprint, and dramatically increased rendering performance. Compressed textures only use a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.
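To get a feel for "a fraction of the memory bandwidth", the following Python sketch compares the size of one texture in three formats. The formats are examples I picked (the text above only mentions uncompressed 32-bit RGBA); DXT1 stores 4 bits per pixel and DXT5 stores 8 bits per pixel. Mipmaps are left out of the calculation.

```python
BITS_PER_PIXEL = {
    "RGBA32": 32,  # uncompressed, 8 bits per channel
    "DXT5": 8,     # block-compressed with alpha (BC3)
    "DXT1": 4,     # block-compressed without alpha (BC1)
}

def texture_bytes(width, height, fmt):
    """Size in bytes of the top mip level of a texture in the given format."""
    return width * height * BITS_PER_PIXEL[fmt] // 8

for fmt in ("RGBA32", "DXT5", "DXT1"):
    size_kib = texture_bytes(1024, 1024, fmt) // 1024
    print(fmt, size_kib, "KiB")
```

For a 1024x1024 texture this gives 4096 KiB uncompressed against 1024 KiB (DXT5) or 512 KiB (DXT1), which is where the savings in memory and bandwidth come from.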
Texture mipmaps
Always enable Generate mipmaps for textures used in a 3D scene. A mipmap texture enables the GPU to use a lower resolution texture for smaller triangles. This is similar to how texture compression can help limit the amount of texture data transferred when the GPU is rendering.
The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a 2D game.
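A short Python sketch shows what a mipmap chain looks like and what it costs in storage. It assumes the usual scheme in which each level halves both dimensions (rounding down, never below 1) until 1x1 is reached; the function names are my own.

```python
def mip_chain(width, height):
    """Return the (width, height) of every level from the base down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels

def total_texels(levels):
    """Total number of texels stored across all levels."""
    return sum(w * h for w, h in levels)

levels = mip_chain(1024, 1024)
base = 1024 * 1024
print(len(levels), "levels")
print("storage relative to base level:", total_texels(levels) / base)
```

A 1024x1024 texture gets 11 levels, and the whole chain needs only about one third more storage than the base level alone, while small or distant triangles can sample the much smaller levels.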
LOD and per-layer cull distances
Culling objects involves making objects invisible. This is an effective way to reduce both the CPU and GPU load.
In many games, a quick and effective way to do this without compromising the player experience is to cull small objects more aggressively than large ones. For example, small rocks and debris could be made invisible at long distances, while large buildings would still be visible.
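The idea of culling small objects earlier than large ones can be modeled in Python as a per-layer distance limit. This is only an illustration of the logic, not Unity's API: the layer names, distances, and the `visible_objects` helper are hypothetical.

```python
def visible_objects(objects, cull_distance_by_layer, default_distance):
    """Return names of objects within their layer's cull distance.

    `objects` is a list of (name, layer, distance_to_camera) tuples.
    Layers without an entry fall back to `default_distance`.
    """
    visible = []
    for name, layer, distance in objects:
        limit = cull_distance_by_layer.get(layer, default_distance)
        if distance <= limit:
            visible.append(name)
    return visible

objects = [
    ("pebble", "small_props", 120.0),
    ("rubble", "small_props", 30.0),
    ("tower", "buildings", 800.0),
]
# Small props disappear beyond 50 units; everything else uses 1000 units.
print(visible_objects(objects, {"small_props": 50.0}, default_distance=1000.0))
```

Here the distant pebble is dropped while the nearby rubble and the far-away tower stay visible.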
There are a number of ways you can achieve this:
Realtime shadows
Realtime shadows are nice, but they can have a high impact on performance, both in terms of extra draw calls for the CPU and extra processing on the GPU. For further details, see the Light Performance page.
GPU: Tips for writing high-performance shaders
Different platforms have vastly different performance capabilities; a high-end PC GPU can handle much more in terms of graphics and shaders than a low-end mobile GPU. The same is true even on a single platform; a fast GPU is dozens of times faster than a slow integrated GPU.
GPU performance on mobile platforms and low-end PCs is likely to be much lower than on your development machine. It’s recommended that you manually optimize your shaders to reduce calculations and texture reads, in order to get good performance across low-end GPU machines. For example, some built-in Unity shaders have “mobile” equivalents that are much faster, but have some limitations or approximations.
Below are some guidelines for mobile and low-end PC graphics cards:
Complex mathematical operations
Transcendental mathematical functions (such as pow, exp, log, cos, sin, tan) are quite resource-intensive, so avoid using them where possible. Consider using lookup textures as an alternative to complex math calculations if applicable.
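The lookup idea can be illustrated on the CPU with a plain Python table. This is an analogy, not shader code: the table stands in for a lookup texture, and the linear interpolation stands in for bilinear texture filtering. The table size of 256 is an arbitrary choice for this sketch.

```python
import math

TABLE_SIZE = 256
# Precompute sin over one full period; the extra entry lets interpolation
# read index + 1 at the end of the table.
SIN_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]

def sin_lookup(x):
    """Approximate sin(x) by linear interpolation between table entries."""
    position = (x / (2.0 * math.pi)) % 1.0 * TABLE_SIZE
    # Clamp so that rounding at the period boundary cannot overrun the table.
    index = min(int(position), TABLE_SIZE - 1)
    fraction = position - index
    return SIN_TABLE[index] + (SIN_TABLE[index + 1] - SIN_TABLE[index]) * fraction

worst = max(abs(sin_lookup(i * 0.01) - math.sin(i * 0.01)) for i in range(-1000, 1000))
print("worst error over the sampled range:", worst)
```

The trade-off is the same as in a shader: the expensive function is evaluated once up front, and each later use costs one read plus a cheap interpolation, at the price of a small approximation error and the memory for the table.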
Avoid writing your own operations (such as normalize, dot, inversesqrt). Unity’s built-in options ensure that the driver can generate much better code. Remember that the Alpha Test (discard) operation often makes your fragment shader slower.
Floating point precision
While the precision (float vs half vs fixed) of floating point variables is largely ignored on desktop GPUs, it is quite important for getting good performance on mobile GPUs. See the Shader Data Types and Precision page for details.
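To see what lower precision means for the stored values, the following Python sketch rounds numbers to 16-bit and 32-bit floats using the standard `struct` module. It assumes that `half` behaves like an IEEE 754 16-bit float and `float` like a 32-bit float; `fixed` is not modeled, and actual shader behavior depends on the GPU.

```python
import struct

def to_half(value):
    """Round a Python float to 16-bit half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def to_single(value):
    """Round a Python float to 32-bit single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

for value in (0.1, 4097.0):
    print(value, "-> half:", to_half(value), "single:", to_single(value))
```

Half precision cannot represent 4097 exactly (neighbouring values are 4 apart at that magnitude), so it is fine for colors and short vectors but risky for large world-space positions or texture coordinates that need many digits.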
For further details about shader performance, see the Shader Performance page.
Simple checklist to make your game faster