UE提供了两套用于实现实例化渲染的方案，分别是ISM（Instanced Static Mesh）与HISM(Hierarchical ISM)，下面对这两套方案的实现原理做一个大致的介绍。

ISM

ISM是Instanced Static Mesh Rendering的缩写，这个方案的做法是，将大量（甚至一个场景内所有）的相同实例合并成一个DrawCall进行绘制，这个方案存在如下的一些问题：

在离线状态下确定哪些mesh需要被放进一个批次进行渲染，导致在运行时因为视角转动，会有部分实例不可见，但依然需要进入渲染造成的浪费
处于同一个批次中的实例需要具备相同的LOD，这就导致，如果一个批次覆盖范围较大，本来可以通过LOD优化的实例，此时依然需要使用较高LOD，导致浪费
ISM中每个Instance的LOD切换是以整个ISM的Boundingbox尺寸来的，只有当整个ISM的boundingbox尺寸达到跟Instance上设定的屏占比时才会触发LOD切换，这就导致了如果Instance分布比较散，那么Instance的LOD就会永远处于一个高精层面，浪费较大
ISM中不会针对Instance做剔除，只会针对整个ISM做剔除，对于每个Instance虽然设定了两个参数，但是这两个参数是用于Dither LOD处理的：

/** 从这个距离开始，实例开始渐隐 */
UPROPERTY(EditAnywhere, BlueprintReadOnly, Category=Culling)
int32 InstanceStartCullDistance;

/** 从这个距离开始，实例完全被剔除 */
UPROPERTY(EditAnywhere, BlueprintReadOnly, Category=Culling)
int32 InstanceEndCullDistance;

这两个参数的使用逻辑如下所示：

// Distance需要乘以一个Scale
const float MaxDrawDistanceScale = GetCachedScalabilityCVars().ViewDistanceScale;
const float StartDistance = InstancingUserData->StartCullDistance * MaxDrawDistanceScale;
const float EndDistance = InstancingUserData->EndCullDistance * MaxDrawDistanceScale;

// 作为参数传入shader
InstancingFadeOutParams.X = StartDistance; // start cull distance
InstancingFadeOutParams.Y = 1.f / (float)(EndDistance - StartDistance); // cull ratio

// 在Vertex Shader中计算剔除概率
Intermediates.PerInstanceParams.y = 1.0 - saturate((length(InstanceTranslatedLocation) - InstancingFadeOutParams.x) * InstancingFadeOutParams.y); // per-instance fade out amount(remaining)

// 在pixel shader中作为材质的GetPerInstanceFadeAmount属性进行访问
/** Get the per-instance fade-out amount when instancing */
float GetPerInstanceFadeAmount(FMaterialPixelParameters Parameters)
{
#if USE_INSTANCING || USE_INSTANCE_CULLING
    return float(Parameters.PerInstanceParams.y);
#else
    return float(1.0);
#endif
}

关于这部分的分析，在[4]中已经有比较详细的论述，这里就不再展开

HISM

HISM是Hierarchical ISM的缩写，这是基于Cluster聚类的N叉树（称之为Cluster Tree），树上各个节点之间存在层级关系（方便降低剔除消耗），最终提交渲染的只有叶子节点，其他节点的存在则是为了方便剔除。

原理

这个方案的大致原理是将需要渲染的实例用一棵N叉树（ClusterTree）来表示，在渲染的时候，我们会有一个InstanceBuffer，其中存储了待渲染物件的实例信息，在HISM下，可以为每个叶子节点分别提供一个InstanceBuffer，也可以更简单一点，借助图形API可以指定Buffer偏移与长度的能力，使用同一个InstanceBuffer[1]

这棵树每个节点中存储了此节点中对应的实例的起始InstanceID与终止InstanceID，基于双面的InstanceBuffer共用逻辑，我们可以非常方便的实现Cluster（节点）的剔除与渲染数据的提交。

对应到UE的数据结构，这里的整棵树用ClusterTree来表示，其中最关键的FClusterNode代表了树的一个节点：

struct FClusterTree
{
 TArray Nodes;
 TArray SortedInstances;
 TArray InstanceReorderTable;
 int32 OutOcclusionLayerNum = 0;
};
struct FClusterNode
{
 // 当前节点的BoundingBox信息
 FVector3f BoundMin;
 FVector3f BoundMax;
 //当前节点的子节点在ClusterTree的Nodes数组中的索引范围
 int32 FirstChild;
 int32 LastChild;
 // 当前节点所包含的Instance在PerInstanceSMData的索引范围
 int32 FirstInstance;
 int32 LastInstance;

 FVector3f MinInstanceScale;
 FVector3f MaxInstanceScale;
}

//截取部分
class ENGINE_API UInstancedStaticMeshComponent : public UStaticMeshComponent, public ISMInstanceManager
{
 /** Array of instances, bulk serialized. */
 TArray PerInstanceSMData;
}

针对这个方案，还有一些细节需要澄清，总的来说，有两个问题需要回答：

N叉树如何构造
运行时N叉树要如何使用

1. N叉树的构造

给定一套物件实例，我们需要构建出对应的树，这个计算逻辑在UHierarchicalInstancedStaticMeshComponent::BuildTree中可以找到。

按照自顶向下的逻辑，这些实例数据可以组成我们整棵树的根节点，我们首先需要判断当前的实例数据是否需要划分出子节点，如果需要，该怎么划分。

一个节点是否需要划分，在UE中由一个叫做BranchingFactor的参数控制，这个参数表明了一个节点的最大子节点数目（在第一步实例划分的时候，这个数值表明叶子节点中可以装载的实例数目，在后续步骤自底向上构建ClusterTree的时候，这个就对应于父节点下子节点的最大数目），在实例划分（构建叶子节点）的时候，这个数值等于叶子节点的最大顶点数目（每批顶点数目）除以实例（LOD0）顶点数，并Clamp到[1, 1024]：

int32 UHierarchicalInstancedStaticMeshComponent::DesiredInstancesPerLeaf()
{
 int32 LOD0Verts = GetVertsForLOD(0);
 int32 VertsToSplit = CVarMinVertsToSplitNode.GetValueOnAnyThread();
 if (LOD0Verts)
 {
 return FMath::Clamp(VertsToSplit / LOD0Verts, 1, 1024);
 }
 return 16;
}

如果某个节点包含的实例数目大于BranchingFactor，那么就要进行二分（这里的二分还不是建树，而是在进行聚类，因此跟我们前面说的N叉树不是一个事情），这里二分是基于所有实例的位置进行的，统计所有实例的XYZ三个坐标轴的跨度，取跨度最大的轴，按照实例数目均分为两个cluster（如果是离线的计算，等分逻辑是否还可以优化一下，比如基于PCA？）：

void Split(int32 Start, int32 End)
{
 int32 NumRange = 1 + End - Start;
 FBox ClusterBounds(ForceInit);
 // 计算所有实例组成的boundingbox，每个实例用一个点表示，数据存于SortPoints
 for (int32 Index = Start; Index <= End; Index++)
 {
 ClusterBounds += SortPoints[SortIndex[Index]];
 }
 //判断是否需要继续细分，不需要就直接返回
 if (NumRange <= BranchingFactor)
 {
 Clusters.Add(FRunPair(Start, NumRange));
 return;
 }

 // 寻找实例分布的最长跨度轴，根据boundingbox来判断
 SortPairs.Reset();
 int32 BestAxis = -1;
 float BestAxisValue = -1.0f;
 for (int32 Axis = 0; Axis < 3; Axis++)
 {
 float ThisAxisValue = ClusterBounds.Max[Axis] - ClusterBounds.Min[Axis];
 if (!Axis || ThisAxisValue > BestAxisValue)
 {
 BestAxis = Axis;
 BestAxisValue = ThisAxisValue;
 }
 }

 // 根据最长轴进行实例排序
 for (int32 Index = Start; Index <= End; Index++)
 {
 FSortPair Pair;

 Pair.Index = SortIndex[Index];
 Pair.d = SortPoints[Pair.Index][BestAxis];
 SortPairs.Add(Pair);
 }
 SortPairs.Sort();

 // 为二分做准备
 for (int32 Index = Start; Index <= End; Index++)
 {
 SortIndex[Index] = SortPairs[Index - Start].Index;
 }
 int32 Half = NumRange / 2;
 int32 EndLeft = Start + Half - 1;
 int32 StartRight = 1 + End - Half;
 if (NumRange & 1)
 {
 if (SortPairs[Half].d - SortPairs[Half - 1].d < SortPairs[Half + 1].d - SortPairs[Half].d)
 {
 EndLeft++;
 }
 else
 {
 StartRight--;
 }
 }

 // 递归对二分后的实例Cluster进行进一步细分处理
 Split(Start, EndLeft);
 Split(StartRight, End);
}

经过上面的递归拆分之后，我们得到了若干个子节点，但同时我们也注意到，在需要二分的时候，实际上在当前函数中并没有添加新的节点，也就是说，经过上述拆分得到的节点（存储于Clusters中）相互之间并没有层级关系，为什么不在拆分的同时把层级关系一并建立起来呢？

这是因为聚类（不论是聚类出叶子节点还是父节点）是按照（空间）二分方式进行的，如果在聚类出叶子节点的同时就计算出对应的父节点，就会存在如下两个问题：

父节点到子节点就不再能遵循空间二分（比如是N叉树，就要N分）的方式进行，其聚类的精确程度会有所下降（想象100个实例在X轴上具有最大跨度，但是这些实例实际上分别聚类于X轴上的两个端点的情况，如果基于X轴N分，可能会出现中间节点为空的情况）
普通节点的子节点数目与叶子节点的子节点（实例）数目是不同的，需要做特殊处理

为了构建出这棵树，我们还需要基于这些已有的节点做一次层级的聚类，这个逻辑在FClusterBuilder::BuildTree接口中完成，整体分为如下几步:

基于之前的Cluster信息（存储了哪些实例分到一个Cluster中）构建节点，得到当前层（叶子）的节点

for (int32 Index = 0; Index < NumRoots; Index++)
{
     FClusterNode& Node = Result->Nodes[Index];
     Node.FirstInstance = Clusters[Index].Start;
     Node.LastInstance = Clusters[Index].Start + Clusters[Index].Num - 1;
     FBox NodeBox(ForceInit);
     for (int32 InstanceIndex = Node.FirstInstance; InstanceIndex <= Node.LastInstance; InstanceIndex++)
     {
     const FMatrix& ThisInstTrans = Transforms[SortedInstances[InstanceIndex]];
     FBox ThisInstBox = InstBox.TransformBy(ThisInstTrans);
     NodeBox += ThisInstBox;
    
     if (GenerateInstanceScalingRange)
     {
     FVector3f CurrentScale(ThisInstTrans.GetScaleVector());
    
     Node.MinInstanceScale = Node.MinInstanceScale.ComponentMin(CurrentScale);
     Node.MaxInstanceScale = Node.MaxInstanceScale.ComponentMax(CurrentScale);
     }
     }
     Node.BoundMin = (FVector3f)NodeBox.Min;
     Node.BoundMax = (FVector3f)NodeBox.Max;
}

将当前层的节点看成是一个个的实例，按照之前的划分（聚类）逻辑进行聚类，聚类的粒度跟此前实例聚类的粒度不同，数值从MaxInstancesPerLeaf（叶子节点最大实例数）变成了InternalNodeBranchingFactor（ClusterTree中节点的最大子节点数，也是我们前面说过N叉树的N，通过CVarFoliageSplitFactor配置），聚类后就得到一系列的父节点，重复这个过程，直到最终我们只得到了一个节点（根节点）。在完成聚类或者建树之后，还需要对数据进行重排，保证父子节点在InstanceBuffer中的数据是可重用的。

// 自底向上，逐层处理
while (NumRoots > 1)
{
     // 将当前层的每个节点（boundingbox的中心）当成一个点（实例）
     SortIndex.Reset();
     SortPoints.Reset();
     SortIndex.AddUninitialized(NumRoots);
     SortPoints.AddUninitialized(NumRoots);
     for (int32 Index = 0; Index < NumRoots; Index++)
     {
     SortIndex[Index] = Index;
     FClusterNode& Node = Result->Nodes[Index];
     SortPoints[Index] = (FVector)(Node.BoundMin + Node.BoundMax) * 0.5f;
     }
     // 对上面的点（实例），重复此前划分（聚类）逻辑，不过这里每个cluster的实例数目从MaxInstancesPerLeaf变成了InternalNodeBranchingFactor，这个决定了分支的数目，比如当前层有100个节点，而InternalNodeBranchingFactor = 20，那么我们就得到了5个父节点，也就是说，每个父节点有20个子节点，那么这棵树就是20叉树
     BranchingFactor = InternalNodeBranchingFactor;
     if (BranchingFactor > 2 && OcclusionLayerTarget && NumRoots / BranchingFactor <= OcclusionLayerTarget)
     {
     BranchingFactor = FMath::Max(2, (NumRoots + OcclusionLayerTarget - 1) / OcclusionLayerTarget);
     OcclusionLayerTarget = 0;
     bIsOcclusionLayer = true;
     }
     Split(NumRoots);//NumRoots是当前层的节点数目
     if (bIsOcclusionLayer)
     {
     Result->OutOcclusionLayerNum = Clusters.Num();
     bIsOcclusionLayer = false;
     }
    
     InverseSortIndex.Reset();
     InverseSortIndex.AddUninitialized(NumRoots);
     for (int32 Index = 0; Index < NumRoots; Index++)
     {
     InverseSortIndex[SortIndex[Index]] = Index;
     }
    
     // 对instances进行重排，从而在此前节点顺序调整后，实现数据匹配，达到一个InstanceBuffer实现任意drawcall的目的
     {
     RemapSortIndex.Reset();
     RemapSortIndex.AddUninitialized(Num);
     int32 OutIndex = 0;
     for (int32 Index = 0; Index < NumRoots; Index++)
     {
     FClusterNode& Node = Result->Nodes[SortIndex[Index]];
     for (int32 InstanceIndex = Node.FirstInstance; InstanceIndex <= Node.LastInstance; InstanceIndex++)
     {
     RemapSortIndex[OutIndex++] = InstanceIndex;
     }
     }
     InverseInstanceIndex.Reset();
     InverseInstanceIndex.AddUninitialized(Num);
     for (int32 Index = 0; Index < Num; Index++)
     {
     InverseInstanceIndex[RemapSortIndex[Index]] = Index;
     }
     for (int32 Index = 0; Index < Result->Nodes.Num(); Index++)
     {
     FClusterNode& Node = Result->Nodes[Index];
     Node.FirstInstance = InverseInstanceIndex[Node.FirstInstance];
     Node.LastInstance = InverseInstanceIndex[Node.LastInstance];
     }
     OldInstanceIndex.Reset();
     Swap(OldInstanceIndex, SortedInstances);
     SortedInstances.AddUninitialized(Num);
     for (int32 Index = 0; Index < Num; Index++)
     {
     SortedInstances[Index] = OldInstanceIndex[RemapSortIndex[Index]];
     }
     }
     //同样，对节点进行重排
     {
     RemapSortIndex.Reset();
     int32 NewNum = Result->Nodes.Num() + Clusters.Num();
     // RemapSortIndex[new index] == old index
     RemapSortIndex.AddUninitialized(NewNum);
     LevelStarts.Reset();
     LevelStarts.Add(Clusters.Num());
     for (int32 Index = 0; Index < NodesPerLevel.Num() - 1; Index++)
     {
     LevelStarts.Add(LevelStarts[Index] + NodesPerLevel[Index]);
     }
    
     for (int32 Index = 0; Index < NumRoots; Index++)
     {
     FClusterNode& Node = Result->Nodes[SortIndex[Index]];
     RemapSortIndex[LevelStarts[0]++] = SortIndex[Index];
    
     int32 LeftIndex = Node.FirstChild;
     int32 RightIndex = Node.LastChild;
     int32 LevelIndex = 1;
     while (RightIndex >= 0)
     {
     int32 NextLeftIndex = MAX_int32;
     int32 NextRightIndex = -1;
     for (int32 ChildIndex = LeftIndex; ChildIndex <= RightIndex; ChildIndex++)
     {
     RemapSortIndex[LevelStarts[LevelIndex]++] = ChildIndex;
     int32 LeftChild = Result->Nodes[ChildIndex].FirstChild;
     int32 RightChild = Result->Nodes[ChildIndex].LastChild;
     if (LeftChild >= 0 && LeftChild <  NextLeftIndex)
     {
     NextLeftIndex = LeftChild;
     }
     if (RightChild >= 0 && RightChild >  NextRightIndex)
     {
     NextRightIndex = RightChild;
     }
     }
     LeftIndex = NextLeftIndex;
     RightIndex = NextRightIndex;
     LevelIndex++;
     }
     }
     checkSlow(LevelStarts[LevelStarts.Num() - 1] == NewNum);
     InverseChildIndex.Reset();
     // InverseChildIndex[old index] == new index
     InverseChildIndex.AddUninitialized(NewNum);
     for (int32 Index = Clusters.Num(); Index < NewNum; Index++)
     {
     InverseChildIndex[RemapSortIndex[Index]] = Index;
     }
     for (int32 Index = 0; Index < Result->Nodes.Num(); Index++)
     {
     FClusterNode& Node = Result->Nodes[Index];
     if (Node.FirstChild >= 0)
     {
     Node.FirstChild = InverseChildIndex[Node.FirstChild];
     Node.LastChild = InverseChildIndex[Node.LastChild];
     }
     }
     {
     Swap(OldNodes, Result->Nodes);
     Result->Nodes.Empty(NewNum);
     for (int32 Index = 0; Index < Clusters.Num(); Index++)
     {
     Result->Nodes.Add(FClusterNode());
     }
     Result->Nodes.AddUninitialized(OldNodes.Num());
     for (int32 Index = 0; Index < OldNodes.Num(); Index++)
     {
     Result->Nodes[InverseChildIndex[Index]] = OldNodes[Index];
     }
     }
     int32 OldIndex = Clusters.Num();
     int32 InstanceTracker = 0;
     for (int32 Index = 0; Index < Clusters.Num(); Index++)
     {
     FClusterNode& Node = Result->Nodes[Index];
     Node.FirstChild = OldIndex;
     OldIndex += Clusters[Index].Num;
     Node.LastChild = OldIndex - 1;
     Node.FirstInstance = Result->Nodes[Node.FirstChild].FirstInstance;
     checkSlow(Node.FirstInstance == InstanceTracker);
     Node.LastInstance = Result->Nodes[Node.LastChild].LastInstance;
     InstanceTracker = Node.LastInstance + 1;
     checkSlow(InstanceTracker <= Num);
     FBox NodeBox(ForceInit);
     for (int32 ChildIndex = Node.FirstChild; ChildIndex <= Node.LastChild; ChildIndex++)
     {
     FClusterNode& ChildNode = Result->Nodes[ChildIndex];
     NodeBox += (FVector)ChildNode.BoundMin;
     NodeBox += (FVector)ChildNode.BoundMax;
    
     if (GenerateInstanceScalingRange)
     {
     Node.MinInstanceScale = Node.MinInstanceScale.ComponentMin(ChildNode.MinInstanceScale);
     Node.MaxInstanceScale = Node.MaxInstanceScale.ComponentMax(ChildNode.MaxInstanceScale);
     }
     }
     Node.BoundMin = (FVector3f)NodeBox.Min;
     Node.BoundMax = (FVector3f)NodeBox.Max;
     }
     NumRoots = Clusters.Num();
     NodesPerLevel.Insert(NumRoots, 0);
     }
    }

2. HISM运行时管理

HISM的运行时管理介绍的是HISM中cluster或者节点的剔除与渲染。

由于HISM为一棵N叉树，因此剔除的时候可以自顶向下，从粗到精进行，主要逻辑存放在FHierarchicalStaticMeshSceneProxy::GetDynamicMeshElements接口中：

从根节点开始遍历处理：FHierarchicalStaticMeshSceneProxy::Traverse
先判断当前节点是否可见（基于boundingbox）
- 如果当前节点不可见，不需要进入子节点处理逻辑，直接返回
节点可见状态下，再进行一系列的判断
- 是否超出需要显示的最大LOD
- 是否被遮挡剔除
- 是否当前节点不可拆分、，不可拆分就不需要拆出子节点
如果上述判断都通过了，就进入子节点递归处理

template
void FHierarchicalStaticMeshSceneProxy::Traverse(const FFoliageCullInstanceParams& Params, int32 Index, int32 MinLOD, int32 MaxLOD, bool bFullyContained) const
{
 const FClusterNode& Node = Params.Tree[Index];
 if (!bFullyContained)
 // 判断当前节点是否被剔除（视锥剔除）
 if (CullNode(Params, Node, bFullyContained))
 return;

 // 更新当前节点的LOD
 if (MinLOD != MaxLOD)
 {
 // 根据boundingbox的距离以及width（尺寸）判断当前节点的最小（高精）最大（低精）LOD
 CalcLOD(MinLOD, MaxLOD, (FVector)Node.BoundMin, (FVector)Node.BoundMax, Params.ViewOriginInLocalZero, Params.ViewOriginInLocalOne, Params.LODPlanesMin, Params.LODPlanesMax);

 // 超出设定的最高LOD，不予显示？
 if (MinLOD >= Params.LODs)
 return;
 }

 // 遮挡剔除判断
 if (Index >= Params.FirstOcclusionNode && Index <= Params.LastOcclusionNode)
 {
 check(Params.OcclusionResults != NULL);
 const TArray& OcclusionResultsArray = *Params.OcclusionResults;
 if (OcclusionResultsArray[Params.OcclusionResultsStart + Index - Params.FirstOcclusionNode])
 {
 INC_DWORD_STAT_BY(STAT_OcclusionCulledFoliageInstances, 1 + Node.LastInstance - Node.FirstInstance);
 return;
 }
 }

 // 节点不可拆分
 bool bShouldGroup = Node.FirstChild < 0
 || ((Node.LastInstance - Node.FirstInstance + 1) < Params.MinInstancesToSplit[MinLOD]
 && CanGroup((FVector)Node.BoundMin, (FVector)Node.BoundMax, Params.ViewOriginInLocalZero, Params.ViewOriginInLocalOne, Params.LODPlanesMax[Params.LODs - 1]));
 bool bSplit = (!bFullyContained || MinLOD < MaxLOD || Index < Params.FirstOcclusionNode)
 && !bShouldGroup;

 if (!bSplit)
 {
 MaxLOD = FMath::Min(MaxLOD, Params.LODs - 1);
 // 连续Index的Cluster节点会在AddRun后进行合并，比如[0, 31], [32, 50]合并成[0, 50]
 Params.AddRun(MinLOD, MaxLOD, Node);
 return;
 }

 // 对子节点进行递归处理
 for (int32 ChildIndex = Node.FirstChild; ChildIndex <= Node.LastChild; ChildIndex++)
 {
 Traverse(Params, ChildIndex, MinLOD, MaxLOD, bFullyContained);
 }
}

最后遍历可渲染列表的InstanceIndex分段，比如[0, 19] [30, 59]等等，来进行提交MeshBatch：FHierarchicalStaticMeshSceneProxy::FillDynamicMeshElements

特点

DrawCall数目相对于ISM有所提升（分得更细了）
聚类粒度缩小，可以实现更为精细的裁剪剔除（为了提高剔除效率，还做了层级处理）
N叉树的N可控，可以配置：CVarFoliageSplitFactor

参考

[1]. [ue4] HISM 大规模植被渲染解决方案

[2]. (UE4 4.27) UHierarchicalInstancedStaticMesh(HISM)原理分析

[3]. UE4的HISM深入探究

[4]. UE4 ISM(实例化静态网格体)理解

【UE】Unreal的实例化渲染

ISM

HISM

原理

1. N叉树的构造

2. HISM运行时管理

特点

参考

你可能感兴趣的:(【UE】Unreal的实例化渲染)