Optimizing graphics performance (Performance series, part 1)

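This document is mainly my personal understanding and summary of the official Unity manual (in practice it is mostly a translation record :>).
It is for personal study only and must not be used commercially. Reposting is welcome; please credit the source.
All operations described in this article are based on Unity 2018.3.
Reference link: https://docs.unity3d.com/Manual/OptimizingGraphicsPerformance.html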

Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game’s rendering.

Locate high graphics impact

The graphical parts of your game primarily affect two systems of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is, because strategies for optimizing for GPU vs. CPU are quite different (and can even be opposite - for example, it’s quite common to make the GPU do more work while optimizing for CPU, and vice versa).

Common bottlenecks and ways to check for them:

  • GPU is often limited by fillrate or memory bandwidth.
    • Lower the display resolution and run the game. If a lower display resolution makes the game run faster, you may be limited by fillrate on the GPU.
  • CPU is often limited by the number of batches that need to be rendered.
    • Check “batches” in the Rendering Statistics window. The more batches are being rendered, the higher the cost to the CPU.
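
The resolution check can be written down as a tiny decision rule. The following is only a sketch: the frame times are hypothetical measurements you would take yourself, and the 15% threshold (min_gain) is an assumption chosen for illustration, not a value from the Unity manual.

```python
def likely_fillrate_bound(frame_ms_full_res, frame_ms_low_res, min_gain=0.15):
    """Return True if lowering the resolution sped the frame up noticeably.

    min_gain is a hypothetical threshold: the fraction of frame time that
    must be saved before the difference counts as more than noise.
    """
    saved = frame_ms_full_res - frame_ms_low_res
    return saved / frame_ms_full_res >= min_gain


# Hypothetical measurements, in milliseconds per frame.
print(likely_fillrate_bound(33.0, 18.0))  # big gain: fillrate is a suspect
print(likely_fillrate_bound(33.0, 32.5))  # tiny gain: check batches instead
```

If the frame time barely changes, the GPU's per-pixel work is probably not the limit, and the batch count is the next thing to look at.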

Less-common bottlenecks:

  • The GPU has too many vertices to process. The number of vertices that is acceptable to ensure good performance depends on the GPU and the complexity of vertex shaders. Generally speaking, aim for no more than 100,000 vertices on mobile. A PC manages well even with several million vertices, but it is still good practice to keep this number as low as possible through optimization.
  • The CPU has too many vertices to process. This could be in skinned meshes, cloth simulation, particles, or other game objects and meshes. As above, it is generally good practice to keep this number as low as possible without compromising game quality. See the section on CPU optimization below for guidance on how to do this.
  • If rendering is not a problem on the GPU or the CPU, there may be an issue elsewhere - for example, in your scripts or physics. Use the Unity Profiler to locate the problem.

CPU optimization

To render objects on the screen, the CPU has a lot of processing work to do: working out which lights affect each object, setting up the shader and shader parameters, and sending drawing commands to the graphics driver, which then prepares the commands to be sent off to the graphics card.

All this “per object” CPU usage is resource-intensive, so if you have lots of visible objects, it can add up. For example, if you have a thousand triangles, it is much easier on the CPU if they are all in one mesh, rather than in one mesh per triangle (adding up to 1000 meshes). The cost of both scenarios on the GPU is very similar, but the work done by the CPU to render a thousand objects (instead of one) is significantly higher.

Reduce the visible object count. To reduce the amount of work the CPU needs to do:

  • Combine close objects together, either manually or using Unity’s draw call batching.
  • Use fewer materials in your objects by putting separate textures into a larger texture atlas.
  • Use fewer things that cause objects to be rendered multiple times (such as reflections, shadows and per-pixel lights).

Combine objects together so that each mesh has at least several hundred triangles and uses only one Material for the entire mesh. Note that combining two objects which don’t share a material does not give you any performance increase at all. The most common reason for requiring multiple materials is that two meshes don’t share the same textures; to optimize CPU performance, ensure that any objects you combine share the same textures.
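
To make the "one material per combined mesh" rule concrete, here is a minimal Python sketch that groups objects by material and reports how many combined meshes would remain. The scene data, object names and material names are all hypothetical, and the function only models the bookkeeping; it is not how Unity's batching is implemented.

```python
from collections import defaultdict


def combine_by_material(objects):
    """Group (name, material, triangle_count) tuples by material.

    Returns {material: total_triangles}; each key stands for one
    combined mesh that uses a single material.
    """
    combined = defaultdict(int)
    for _name, material, triangles in objects:
        combined[material] += triangles
    return dict(combined)


# Hypothetical scene: three crates share an atlas material, the barrel does not.
scene = [
    ("crate_a", "props_atlas", 120),
    ("crate_b", "props_atlas", 120),
    ("crate_c", "props_atlas", 120),
    ("barrel", "barrel_mat", 300),
]
meshes = combine_by_material(scene)
print(len(scene), "objects ->", len(meshes), "combined meshes")
```

The barrel stays separate because it does not share the atlas material, which is why moving its texture into the same atlas would be the next step.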

When using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense. See the Lighting performance section below to learn how to manage this.

GPU: Optimizing model geometry

There are two basic rules for optimizing the geometry of a model:

  • Don’t use any more triangles than necessary
  • Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible

Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the number of distinct corner points that make up a model (known as the geometric vertex count). For a graphics card, however, some geometric vertices need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is usually higher than the count given by the 3D application.
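
Vertex splitting is easy to see on a cube. The sketch below counts distinct (position, normal, UV) combinations, which is one simple way to model the logical vertex count; the data layout is an assumption made for illustration, and UVs are left out (None) to keep the example short.

```python
def render_vertex_count(corners):
    """Count distinct (position, normal, uv) combinations.

    Each corner is one use of a vertex by a face. Two corners can share
    a rendered vertex only if every attribute matches.
    """
    return len(set(corners))


# A cube: 8 positions, 6 faces, 4 corners per face.
positions = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

hard_edged = []  # every face uses its own flat normal
for axis in range(3):
    for side in (0, 1):
        normal = tuple((1 if side else -1) if i == axis else 0 for i in range(3))
        for p in positions:
            if p[axis] == side:
                hard_edged.append((p, normal, None))

# Smooth shading: one shared normal per position ("averaged" is a placeholder).
smooth = [(p, "averaged", None) for (p, _normal, _uv) in hard_edged]

print("geometric vertices:", len(positions))
print("hard-edged cube:", render_vertex_count(hard_edged))
print("smooth cube:", render_vertex_count(smooth))
```

With hard edges each corner position carries three different normals, so the 8 geometric vertices become 24 logical ones; UV seams multiply the count in the same way.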

While the amount of geometry in the models is mostly relevant for the GPU, some features in Unity also process models on the CPU (for example, mesh skinning).

Lighting performance

The fastest option is always to create lighting that doesn’t need to be computed at all. To do this, use Lightmapping to “bake” static lighting just once, instead of computing it each frame. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:

  • It runs a lot faster (2–3 times faster for 2 per-pixel lights)
  • It looks a lot better, as you can bake global illumination and the lightmapper can smooth the results

In many cases you can apply simple tricks instead of adding multiple extra lights. For example, instead of adding a light that shines straight into the camera to give a Rim Lighting effect, add a dedicated Rim Lighting computation directly into your shaders (see Surface Shader Examples to learn how to do this).

Lights in forward rendering

Per-pixel dynamic lighting adds significant rendering work to every affected pixel, and can lead to objects being rendered in multiple passes. Avoid having more than one Pixel Light illuminating any single object on less powerful devices, like mobile or low-end PC GPUs, and use lightmaps to light static objects instead of calculating their lighting every frame. Per-vertex dynamic lighting can add significant work to vertex transformations, so try to avoid situations where multiple lights illuminate a single object.

Avoid combining meshes that are far enough apart to be affected by different sets of pixel lights. When you use pixel lighting, each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, it increases the effective size of the combined object. All pixel lights that illuminate any part of this combined object are taken into account during rendering, so the number of rendering passes that need to be made could be increased. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, so nothing is gained by combining meshes.
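
The pass arithmetic can be checked with a small model. This sketch assumes the simplified rule stated above (one pass per pixel light illuminating a mesh) and uses hypothetical meshes, triangle counts and light names; real pass counts depend on the rendering path and settings.

```python
def triangle_passes(meshes):
    """Sum triangles x pixel lights over meshes (one pass per pixel light).

    meshes: list of (triangle_count, set_of_pixel_light_names).
    """
    return sum(tris * len(lights) for tris, lights in meshes)


def combine(meshes):
    """Merge meshes into one; the result is lit by the union of their lights."""
    total_tris = sum(tris for tris, _ in meshes)
    all_lights = set().union(*(lights for _, lights in meshes))
    return [(total_tris, all_lights)]


# Hypothetical meshes that are far apart and lit by different pixel lights.
house = (500, {"street_lamp", "porch_light"})
barn = (400, {"barn_lamp"})

separate = [house, barn]
merged = combine(separate)

print("passes, separate:", sum(len(lights) for _, lights in separate))
print("passes, merged:  ", len(merged[0][1]))
print("triangles drawn, separate:", triangle_passes(separate))
print("triangles drawn, merged:  ", triangle_passes(merged))
```

In this model the number of passes stays the same after merging, while every pass now draws the geometry of both meshes, so the GPU ends up processing more triangles rather than fewer.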

During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The Quality Settings are used to modify how many of the lights end up as pixel lights, and how many as vertex lights. Each light calculates its importance based on how far away it is from the mesh and how intense its illumination is - and some lights are more important than others purely from the game context. For this reason, every light has a Render Mode setting which can be set to Important or Not Important; lights marked as Not Important have a lower rendering overhead.
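
As a rough illustration of how Render Mode and the pixel light budget interact, the sketch below splits lights into pixel and vertex lists. It is a deliberately simplified model, not Unity's actual algorithm: the "Auto" mode comes from Unity's Light component rather than from the text above, brightness_at_mesh is a hypothetical precomputed score, and special cases (such as the brightest directional light) are ignored.

```python
def split_lights(lights, pixel_light_count):
    """Return (pixel_lights, vertex_lights) as lists of light names.

    lights: list of (name, render_mode, brightness_at_mesh), where
    render_mode is "Important", "NotImportant" or "Auto".
    pixel_light_count models the budget from the Quality Settings.
    """
    pixel = [name for name, mode, _ in lights if mode == "Important"]
    vertex = [name for name, mode, _ in lights if mode == "NotImportant"]
    auto = sorted(
        (light for light in lights if light[1] == "Auto"),
        key=lambda light: light[2],
        reverse=True,
    )
    # Remaining budget goes to the brightest "Auto" lights first.
    for name, _mode, _brightness in auto:
        if len(pixel) < pixel_light_count:
            pixel.append(name)
        else:
            vertex.append(name)
    return pixel, vertex


# Hypothetical lights around one mesh.
lights = [
    ("headlights", "Important", 5.0),
    ("tail_lights", "NotImportant", 1.0),
    ("lamp_near", "Auto", 2.0),
    ("lamp_far", "Auto", 0.3),
]
pixel, vertex = split_lights(lights, pixel_light_count=2)
print("pixel lights:", pixel)
print("vertex lights:", vertex)
```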

Example: Consider a driving game in which the player’s car is driving in the dark with headlights switched on. The headlights are probably the most visually significant light source in the game, so their Render Mode should be set to Important. There may be other lights in the game that are less important, like other cars’ rear lights or distant lampposts, and which don’t improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important to avoid wasting rendering capacity in places where it has little benefit.

Optimizing per-pixel lighting saves work on both the CPU and the GPU: the CPU has fewer draw calls to issue, and the GPU has fewer vertices to process and fewer pixels to rasterize for all the additional object renders.

GPU: Texture compression and mipmaps

Use compressed textures to decrease the size of your textures. This can result in faster load times, a smaller memory footprint, and dramatically increased rendering performance. Compressed textures only use a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.
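
The "fraction of the memory" claim can be put into numbers. This sketch compares a 1024x1024 texture in uncompressed RGBA32 with two common block-compressed formats, DXT1 (4 bits per pixel) and DXT5 (8 bits per pixel). The choice of formats and texture size is an assumption for illustration; the formats available depend on the target platform.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of the top mip level of a texture."""
    return width * height * bits_per_pixel // 8


RGBA32_BPP = 32  # uncompressed, 8 bits per channel
DXT1_BPP = 4     # block-compressed, no or 1-bit alpha
DXT5_BPP = 8     # block-compressed with an alpha channel

uncompressed = texture_bytes(1024, 1024, RGBA32_BPP)
dxt1 = texture_bytes(1024, 1024, DXT1_BPP)
dxt5 = texture_bytes(1024, 1024, DXT5_BPP)

for label, size in (("RGBA32", uncompressed), ("DXT1", dxt1), ("DXT5", dxt5)):
    print(f"{label}: {size / (1024 * 1024):.1f} MiB")
```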

Texture mipmaps

Always enable Generate mipmaps for textures used in a 3D scene. A mipmap texture enables the GPU to use a lower resolution texture for smaller triangles. This is similar to how texture compression can help limit the amount of texture data transferred when the GPU is rendering.

The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a 2D game.
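
Mipmaps trade a little memory for less texture data read at render time. As a minimal sketch, the code below lists the mip levels of a 1024x1024 texture and sums their texel counts; the texture size is a hypothetical example.

```python
def mip_chain(width, height):
    """Return the list of (width, height) mip levels down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels


levels = mip_chain(1024, 1024)
base_texels = 1024 * 1024
total_texels = sum(w * h for w, h in levels)

print("mip levels:", len(levels))
print("memory relative to no mipmaps:", round(total_texels / base_texels, 3))
```

The full chain costs about one third more memory than the base level alone, while a distant object can be textured from a level that is a small fraction of the original size.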

LOD and per-layer cull distances

Culling objects involves making objects invisible. This is an effective way to reduce both the CPU and GPU load.

In many games, a quick and effective way to do this without compromising the player experience is to cull small objects more aggressively than large ones. For example, small rocks and debris could be made invisible at long distances, while large buildings would still be visible.

There are a number of ways you can achieve this:

  • Use the Level Of Detail system
  • Manually set per-layer culling distances on the camera
  • Put small objects into a separate layer and set up per-layer cull distances using the Camera.layerCullDistances script function
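
The per-layer idea can be modeled in a few lines. The sketch below is a Python model of the behaviour, not Unity code: it assumes, as Camera.layerCullDistances is documented, an array of 32 distances in which 0 means "use the camera's far plane". The layer index and distances are hypothetical.

```python
NUM_LAYERS = 32


def is_culled_by_distance(layer, distance, layer_cull_distances, far_clip_plane):
    """Return True if an object on `layer` at `distance` should be skipped."""
    limit = layer_cull_distances[layer]
    if limit == 0:
        limit = far_clip_plane  # zero means "no per-layer override"
    return distance > limit


SMALL_PROPS_LAYER = 10  # hypothetical layer holding rocks and debris
distances = [0.0] * NUM_LAYERS
distances[SMALL_PROPS_LAYER] = 50.0

far_plane = 1000.0
print(is_culled_by_distance(SMALL_PROPS_LAYER, 80.0, distances, far_plane))  # rock
print(is_culled_by_distance(0, 80.0, distances, far_plane))                  # building
```

A rock 80 units away is skipped because its layer is limited to 50 units, while a building on the default layer at the same distance is still drawn.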

Realtime shadows

Realtime shadows are nice, but they can have a high impact on performance, both in terms of extra draw calls for the CPU and extra processing on the GPU. For further details, see the Light Performance page.

GPU: Tips for writing high-performance shaders

Different platforms have vastly different performance capabilities; a high-end PC GPU can handle much more in terms of graphics and shaders than a low-end mobile GPU. The same is true even on a single platform; a fast GPU is dozens of times faster than a slow integrated GPU.

GPU performance on mobile platforms and low-end PCs is likely to be much lower than on your development machine. It’s recommended that you manually optimize your shaders to reduce calculations and texture reads, in order to get good performance across low-end GPU machines. For example, some built-in Unity shaders have “mobile” equivalents that are much faster, but have some limitations or approximations.

Below are some guidelines for mobile and low-end PC graphics cards:

Complex mathematical operations

Transcendental mathematical functions (such as pow, exp, log, cos, sin, tan) are quite resource-intensive, so avoid using them where possible. Consider using lookup textures as an alternative to complex math calculations if applicable.
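
A lookup texture is essentially a precomputed table that is sampled with interpolation. The following sketch shows the idea on the CPU in Python, using pow(x, 16) as a stand-in for a specular falloff; the table size of 256 and the choice of function are assumptions for illustration, and a real shader would sample a texture instead of a list.

```python
import math


def build_lut(func, size=256):
    """Precompute func over [0, 1] at evenly spaced points."""
    return [func(i / (size - 1)) for i in range(size)]


def sample_lut(lut, x):
    """Sample the table with linear interpolation, clamping x to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(lut) - 1)
    i = min(int(pos), len(lut) - 2)
    t = pos - i
    return lut[i] * (1 - t) + lut[i + 1] * t


specular_lut = build_lut(lambda x: math.pow(x, 16))

# Largest difference between the table and the exact function on a test grid.
worst = max(
    abs(sample_lut(specular_lut, k / 1000) - math.pow(k / 1000, 16))
    for k in range(1001)
)
print("worst absolute error below 1e-3:", worst < 1e-3)
```

Whether the lookup is actually faster than the math depends on the GPU, since a texture read has its own cost; the table only shows that the accuracy loss can be kept small.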

Avoid writing your own versions of built-in operations (such as normalize, dot, inversesqrt). Using the built-in ones ensures that the driver can generate much better code. Remember that the Alpha Test (discard) operation often makes your fragment shader slower.

Floating point precision

While the precision (float vs half vs fixed) of floating point variables is largely ignored on desktop GPUs, it is quite important for getting good performance on mobile GPUs. See the Shader Data Types and Precision page for details.
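
To get a feel for what lower precision costs, the sketch below rounds values to 16-bit IEEE 754 half precision using Python's struct module. It assumes that half maps to a 16-bit float, which is typical on mobile GPUs but ultimately depends on the hardware; the sample values are hypothetical.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest 16-bit half float and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Small values such as colors or normalized directions survive well.
print("0.1 as half:", to_half(0.1))

# Large values lose whole units: above 4096 the step between halves is 4.
print("4097.0 as half:", to_half(4097.0))
```

This is why reduced precision tends to suit colors and unit-length vectors, while large positions are usually better kept in float.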

For further details about shader performance, see the Shader Performance page.

Simple checklist to make your game faster

  • Keep the vertex count per frame below a budget of roughly 200K to 3M when building for PC (depending on the target GPU).
  • If you’re using built-in shaders, pick ones from the Mobile or Unlit categories. They work on non-mobile platforms as well, but are simplified and approximated versions of the more complex shaders.
  • Keep the number of different materials per scene low, and share as many materials between different objects as possible.
  • Set the Static property on a non-moving object to allow internal optimizations like static batching.
  • Only have a single (preferably directional) pixel light affecting your geometry, rather than multiples. Bake lighting rather than using dynamic lighting.
  • Use compressed texture formats when possible, and use 16-bit textures over 32-bit textures.
  • Avoid using fog where possible.
  • Use Occlusion Culling to reduce the amount of visible geometry and draw calls in cases of complex static scenes with lots of occlusion. Design your levels with occlusion culling in mind.
  • Use skyboxes to “fake” distant geometry.
  • Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
  • Use half precision variables where possible.
  • Minimize use of complex mathematical operations such as pow, sin and cos in pixel shaders.
  • Use fewer textures per fragment.
