【Vulkan Study Notes - Basics 5】Multithreaded Rendering

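Modern graphics APIs are designed to be friendly to multithreaded rendering. "Multithreading" here does not mean multithreaded image rendering on the GPU. It means that the work the CPU does while submitting draw calls can be parallelized. In other words, multithreaded rendering improves the program's performance on the CPU side.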

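With D3D11 or OpenGL, the relevant state has to be updated and the required resources bound before every draw call, and submitting the draw call also triggers parameter validation and similar work. Each of these costs looks small on its own. However, if a scene has many kinds of geometry and materials, uses many shaders, and runs many passes per frame, it produces a large number of draw calls, and the CPU time spent on these operations can easily become the bottleneck. A natural optimization is to parallelize this CPU-side work, i.e. to perform state changes, validation and so on from multiple threads.

Traditional graphics APIs are not friendly to this. Both D3D11 and OpenGL are built around a Context, which is responsible for binding resources, changing state and issuing draw calls. This model is very unfriendly to multithreading: multithreaded submission is possible in principle, but it is cumbersome and requires many complex synchronization primitives, so the overall performance may not turn out as well as hoped.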

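Modern graphics APIs changed this model to make multithreading much easier. In Vulkan this shows up mainly in Queues and CommandBuffers. As mentioned in earlier posts, every command the GPU executes in Vulkan must be recorded into a CommandBuffer. This covers not only draw calls but also compute dispatches and memory operations. All the state needed for rendering (shaders, DescriptorSets, etc.) must also be bound inside a CommandBuffer. Each CommandBuffer has its own independent copy of this state, so these bindings cannot be skipped in any CommandBuffer. This is very different from traditional APIs, where a state you do not change simply stays as it was. Commands reach the GPU only through a Queue, rather than through a Context bound to a single thread. You submit work to a Queue, and if you need to wait for some piece of work in the Queue to finish, you have to synchronize manually, using the synchronization mechanisms introduced in the previous post.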

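A simple multithreading pattern in Vulkan is therefore: every frame, each thread records its own CommandBuffers; once all threads have finished recording, all the CommandBuffers are submitted to the Queue together.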
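This pattern is essentially a per-frame fork-join. The sketch below illustrates it with plain std::thread; FakeCommandBuffer (a list of strings) is a hypothetical stand-in for VkCommandBuffer, and the sketch spawns threads every frame for brevity, whereas the article's implementation keeps persistent worker threads. What it shows is that each worker writes only to its own buffer, so recording needs no locks, and the main thread gathers the results in a fixed order only after every worker has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for VkCommandBuffer: just the list of "commands" recorded into it.
using FakeCommandBuffer = std::vector<std::string>;

// Records one frame with threadCount workers, each "drawing" objectsPerThread objects,
// and returns all commands in the order they would be submitted.
std::vector<std::string> recordFrame(std::size_t threadCount, std::size_t objectsPerThread)
{
    assert(threadCount > 0);
    // One buffer per thread; a worker only ever touches its own element, so no locking is needed.
    std::vector<FakeCommandBuffer> perThread(threadCount);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&perThread, t, objectsPerThread] {
            for (std::size_t i = 0; i < objectsPerThread; ++i)
                perThread[t].push_back("draw t" + std::to_string(t) + " obj" + std::to_string(i));
        });
    }

    // Wait until every worker has finished recording.
    for (auto &worker : workers)
        worker.join();

    // Gather in a fixed thread order and "submit" everything at once.
    std::vector<std::string> submitted;
    for (const auto &buffer : perThread)
        submitted.insert(submitted.end(), buffer.begin(), buffer.end());
    return submitted;
}
```

For example, recordFrame(4, 3) returns 12 entries, starting with "draw t0 obj0" and ending with "draw t3 obj2", regardless of which thread happened to finish first.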

The scene rendered in this article:
[Figure 1: the scene rendered in this article]
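The scene consists of a large number of UFOs. Note that the middle section of each UFO has a different color, so the rendering state has to be updated for every UFO that is drawn. In addition, the position of every UFO changes every frame.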

The rest of this post shows how this multithreading pattern is implemented.
First, we need a simple Thread class:

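#include <condition_variable>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Thread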
{
private:
	bool destroying = false;
	std::thread worker;
	std::queue<std::function<void()>> jobQueue;
	std::mutex queueMutex;
	std::condition_variable condition;

	// Loop through all remaining jobs
	void queueLoop()
	{
		while (true)
		{
			std::function<void()> job;
			{
				std::unique_lock<std::mutex> lock(queueMutex);
				condition.wait(lock, [this] { return !jobQueue.empty() || destroying; });
				if (destroying)
				{
					break;
				}
				job = jobQueue.front();
			}

			job();

			// Pop only after the job has run, so wait() returns once the work is actually finished
			{
				std::lock_guard<std::mutex> lock(queueMutex);
				jobQueue.pop();
				condition.notify_one();
			}
		}
	}

public:
	Thread()
	{
		worker = std::thread(&Thread::queueLoop, this);
	}

	~Thread()
	{
		if (worker.joinable())
		{
			wait();
			queueMutex.lock();
			destroying = true;
			condition.notify_one();
			queueMutex.unlock();
			worker.join();
		}
	}

	// Add a new job to the thread's queue
	void addJob(std::function<void()> function)
	{
		std::lock_guard<std::mutex> lock(queueMutex);
		jobQueue.push(std::move(function));
		condition.notify_one();
	}

	// Wait until all work items have been finished
	void wait()
	{
		std::unique_lock<std::mutex> lock(queueMutex);
		condition.wait(lock, [this]() { return jobQueue.empty(); });
	}
};

class ThreadPool
{
public:
	std::vector<std::unique_ptr<Thread>> threads;

	// Sets the number of threads to be allocated in this pool
	void setThreadCount(uint32_t count)
	{
		threads.clear();
		for (uint32_t i = 0; i < count; i++)
		{
			threads.push_back(std::make_unique<Thread>());
		}
	}

	// Wait until all threads have finished their work items
	void wait()
	{
		for (auto &thread : threads)
		{
			thread->wait();
		}
	}
};

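Each Thread keeps the jobs it has to execute in a jobQueue. While the jobQueue is empty, the thread sleeps; when a new job is added, the thread is woken up to run it.
ThreadPool creates the Threads, and every frame its wait function blocks until all jobs on all threads have finished.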

The data involved in the multithreaded update:

	struct PushConstantBlock
	{
		glm::mat4 mvp;
		glm::vec3 color;
	};

	struct ObjectData
	{
		glm::mat4 model;
		glm::vec3 pos;
		glm::vec3 rotation;
		float rotationDir;
		float rotationSpeed;
		float scale;
		float deltaT;
		float stateT = 0;
		bool visible = true;
	};

	struct ThreadData
	{
		VkCommandPool commandPool; // one pool per thread: a VkCommandPool must not be used from several threads at once
		std::vector<VkCommandBuffer> commandBufferVec;
		std::vector<PushConstantBlock> pushConstantBlockVec;
		std::vector<ObjectData> objectDataVec;
	};

Note that all the UFOs are drawn with the same shaders. The vertex shader:

#version 450

layout (location = 0) in vec3 inPos;
layout (location = 1) in vec3 inNormal;
layout (location = 2) in vec3 inColor;

layout (std140, push_constant) uniform PushConsts 
{
	mat4 mvp;
	vec3 color;
} pushConsts;

layout (location = 0) out vec3 outNormal;
layout (location = 1) out vec3 outColor;
layout (location = 3) out vec3 outViewVec;
layout (location = 4) out vec3 outLightVec;

void main() 
{
	// Vertices marked with pure red (1, 0, 0) take the per-object color from the push constants
	if ((inColor.r == 1.0) && (inColor.g == 0.0) && (inColor.b == 0.0))
	{	
		outColor = pushConsts.color;
	}
	else
	{
		outColor = inColor;
	}

	vec4 pos = pushConsts.mvp * vec4(inPos, 1.0);
	gl_Position = pos;

	// Simplified lighting: normal, light and view vectors are all derived with the MVP matrix,
	// and the light sits at the origin of that space
	outNormal = mat3(pushConsts.mvp) * inNormal;
	//	vec3 lPos = ubo.lightPos.xyz;
	vec3 lPos = vec3(0.0);
	outLightVec = lPos - pos.xyz;
	outViewVec = -pos.xyz;
}

The fragment shader:

#version 450

layout (location = 0) in vec3 inNormal;
layout (location = 1) in vec3 inColor;
layout (location = 3) in vec3 inViewVec;
layout (location = 4) in vec3 inLightVec;

layout (location = 0) out vec4 outFragColor;


void main() 
{
	vec3 N = normalize(inNormal);
	vec3 L = normalize(inLightVec);
	vec3 V = normalize(inViewVec);
	vec3 R = reflect(-L, N);
	vec3 diffuse = max(dot(N, L), 0.0) * inColor;
	vec3 specular = pow(max(dot(R, V), 0.0), 8.0) * vec3(0.75);
	outFragColor = vec4(diffuse + specular, 1.0);	
}

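As you can see, the different middle color of each UFO comes from the vertex shader, which replaces the color of vertices carrying a special marker color (pure red) with the specified color. The fragment shader is simple Phong shading. For each UFO, only the PushConsts block of the vertex shader needs to change: its mvp matrix determines the UFO's position, and color determines the color of its middle section. Nothing else in the shaders has to be updated at draw time, so this is quite simple. Push constants are written from within a CommandBuffer, so the key task of each thread is to re-record CommandBuffers.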
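Because the push constants are copied byte-for-byte from a C++ struct into the shader's std140 block, the two layouts have to agree. The sketch below checks this at compile time and mirrors the vertex shader's color rule on the CPU. PushConstantBlockPlain and resolveVertexColor are hypothetical names, and plain float arrays stand in for glm::mat4 and glm::vec3 (an assumption: with glm's default configuration they are 16 and 3 floats respectively).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical glm-free mirror of PushConstantBlock.
struct PushConstantBlockPlain
{
    float mvp[16];  // mat4 mvp   -> std140 offset 0, 64 bytes
    float color[3]; // vec3 color -> std140 offset 64 (vec3 is 16-byte aligned), 12 bytes
};

constexpr std::uint32_t kPushConstantSize = sizeof(PushConstantBlockPlain);

// The CPU-side offsets must match what the std140 PushConsts block expects.
static_assert(offsetof(PushConstantBlockPlain, mvp) == 0, "mvp must start at offset 0");
static_assert(offsetof(PushConstantBlockPlain, color) == 64, "color must start at offset 64");
// Push-constant ranges must be a multiple of 4 bytes, and 128 bytes is the minimum
// maxPushConstantsSize every Vulkan implementation guarantees.
static_assert(kPushConstantSize % 4 == 0, "size must be a multiple of 4");
static_assert(kPushConstantSize <= 128, "block exceeds the guaranteed push-constant budget");

// CPU mirror of the vertex shader's color rule: the marker color pure red (1, 0, 0)
// is replaced by the per-object color; every other vertex color passes through unchanged.
std::array<float, 3> resolveVertexColor(const std::array<float, 3> &vertexColor,
                                        const std::array<float, 3> &pushColor)
{
    bool isMarker = vertexColor[0] == 1.0f && vertexColor[1] == 0.0f && vertexColor[2] == 0.0f;
    return isMarker ? pushColor : vertexColor;
}
```

The block is 64 + 12 = 76 bytes, which is also the size passed to vkCmdPushConstants via sizeof(PushConstantBlock).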

	void UpdateCommandBuffer(int ind)
	{
		Update();
		// Inheritance info shared by all secondary command buffers executed in this render pass
		VkCommandBufferInheritanceInfo inheritanceInfo = {};
		inheritanceInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO;
		inheritanceInfo.renderPass = render_pass_;
		inheritanceInfo.subpass = 0;
		inheritanceInfo.framebuffer = frame_buffer_[ind];

		// Begin info for secondary command buffers that live entirely inside the render pass
		VkCommandBufferBeginInfo secondaryBeginInfo = {};
		secondaryBeginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
		secondaryBeginInfo.pInheritanceInfo = &inheritanceInfo;
		secondaryBeginInfo.flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;

		// UI secondary command buffer, recorded on the main thread
		vkBeginCommandBuffer(ui_command_buffer_, &secondaryBeginInfo);
		VkViewport viewport = {};
		viewport.width = width_;
		viewport.height = height_;
		viewport.minDepth = 0.0f;
		viewport.maxDepth = 1.0f;

		VkRect2D scissor = {};
		scissor.extent.width = width_;
		scissor.extent.height = height_;

		vkCmdSetViewport(ui_command_buffer_, 0, 1, &viewport);
		vkCmdSetScissor(ui_command_buffer_, 0, 1, &scissor);
		vkCmdBindPipeline(ui_command_buffer_, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline_);
		imgui_->draw(ui_command_buffer_);
		vkEndCommandBuffer(ui_command_buffer_);

		VkClearValue clearValues[2];
		clearValues[0].color = { 0.0f, 0.0f, 0.0f, 1.0f };
		clearValues[1].depthStencil = { 1.0f, 0 };
		VkRenderPassBeginInfo renderPassBeginInfo = {};
		renderPassBeginInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
		renderPassBeginInfo.renderArea.extent.width = width_;
		renderPassBeginInfo.renderArea.extent.height = height_;
		renderPassBeginInfo.framebuffer = frame_buffer_[ind];
		renderPassBeginInfo.clearValueCount = 2;
		renderPassBeginInfo.pClearValues = clearValues;
		renderPassBeginInfo.renderPass = render_pass_;

		// The primary command buffer needs neither inheritance info nor RENDER_PASS_CONTINUE
		VkCommandBufferBeginInfo primaryBeginInfo = {};
		primaryBeginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;

		vkBeginCommandBuffer(draw_command_buffer_[ind], &primaryBeginInfo);
		vkCmdBeginRenderPass(draw_command_buffer_[ind], &renderPassBeginInfo, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);

		// One job per object; each thread records only into its own command buffers
		const uint32_t objectsPerThread = object_count / thread_count;
		for (uint32_t t = 0; t < thread_count; t++)
		{
			for (uint32_t i = 0; i < objectsPerThread; i++)
			{
				thread_pool_.threads[t]->addJob([this, t, i, inheritanceInfo] { UpdateThreadData(t, i, inheritanceInfo); });
			}
		}

		// Block until every worker has finished recording
		thread_pool_.wait();

		// Gather the secondary command buffers in a fixed order, with the UI last
		std::vector<VkCommandBuffer> commandBufferVec;
		for (uint32_t t = 0; t < thread_count; t++)
		{
			for (uint32_t i = 0; i < objectsPerThread; i++)
			{
				commandBufferVec.push_back(threadDataVec[t].commandBufferVec[i]);
			}
		}
		commandBufferVec.push_back(ui_command_buffer_);
		vkCmdExecuteCommands(draw_command_buffer_[ind], static_cast<uint32_t>(commandBufferVec.size()), commandBufferVec.data());
		vkCmdEndRenderPass(draw_command_buffer_[ind]);
		vkEndCommandBuffer(draw_command_buffer_[ind]);
	}

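This is the top-level update function called every frame. All the threads' CommandBuffers are secondary CommandBuffers executed inside a single RenderPass of one large primary CommandBuffer. Each UFO has its own CommandBuffer, so each thread records several CommandBuffers per frame.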
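UpdateCommandBuffer gives every thread object_count / thread_count objects, so if object_count is not a multiple of thread_count, the leftover objects are never recorded. Below is a hedged sketch of a remainder-aware split; objectsForThread and executionOrder are hypothetical helpers, and using them in the article's code would assume each ThreadData allocates enough command buffers for its share. The second helper also spells out the thread-major order in which the secondary command buffers are gathered for vkCmdExecuteCommands.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: number of objects thread t records when objectCount objects are split
// across threadCount threads. The first (objectCount % threadCount) threads take one extra
// object, so no object is dropped when the division is not exact.
std::uint32_t objectsForThread(std::uint32_t t, std::uint32_t objectCount, std::uint32_t threadCount)
{
    assert(threadCount > 0 && t < threadCount);
    std::uint32_t base = objectCount / threadCount;
    std::uint32_t extra = objectCount % threadCount;
    return base + (t < extra ? 1u : 0u);
}

// Hypothetical helper: the thread-major order of the secondary command buffers,
// as (threadIndex, commandBufferIndex) pairs.
std::vector<std::pair<std::uint32_t, std::uint32_t>> executionOrder(std::uint32_t objectCount,
                                                                    std::uint32_t threadCount)
{
    std::vector<std::pair<std::uint32_t, std::uint32_t>> order;
    for (std::uint32_t t = 0; t < threadCount; ++t)
    {
        std::uint32_t count = objectsForThread(t, objectCount, threadCount);
        for (std::uint32_t i = 0; i < count; ++i)
            order.emplace_back(t, i);
    }
    return order;
}
```

With 10 objects and 4 threads, threads 0 and 1 record 3 objects each and threads 2 and 3 record 2, and the first buffer of thread 1 lands at position 3 in the execution order.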

	void UpdateThreadData(uint32_t threadIndex, uint32_t commandBufferIndex, VkCommandBufferInheritanceInfo inheritanceInfo)
	{
		// Jobs of one thread run sequentially on that thread, and different threads use different
		// ThreadData, so no locking is needed here
		ThreadData &threadData = threadDataVec[threadIndex];
		ObjectData &objectData = threadData.objectDataVec[commandBufferIndex];
		VkCommandBuffer cmd = threadData.commandBufferVec[commandBufferIndex];

		// Secondary command buffer recorded entirely inside the primary's render pass
		VkCommandBufferBeginInfo beginInfo = {};
		beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
		beginInfo.flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
		beginInfo.pInheritanceInfo = &inheritanceInfo;

		vkBeginCommandBuffer(cmd, &beginInfo);

		// Secondary command buffers do not inherit dynamic state or bindings, so set them again
		VkViewport viewport = {};
		viewport.width = width_;
		viewport.height = height_;
		viewport.minDepth = 0.0f;
		viewport.maxDepth = 1.0f;

		VkRect2D scissor = {};
		scissor.extent.width = width_;
		scissor.extent.height = height_;

		vkCmdSetViewport(cmd, 0, 1, &viewport);
		vkCmdSetScissor(cmd, 0, 1, &scissor);
		vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline_);

		// Update object data: spin around Y and bob up and down
		objectData.rotation.y += 2.5f * objectData.rotationSpeed * frame_timer;
		if (objectData.rotation.y > 360.0f) {
			objectData.rotation.y -= 360.0f;
		}
		objectData.deltaT += 0.15f * frame_timer;
		if (objectData.deltaT > 1.0f)
			objectData.deltaT -= 1.0f;
		objectData.pos.y = sin(glm::radians(objectData.deltaT * 360.0f)) * 2.5f;

		objectData.model = glm::translate(glm::mat4(1.0f), objectData.pos);
		objectData.model = glm::rotate(objectData.model, -sinf(glm::radians(objectData.deltaT * 360.0f)) * 0.25f, glm::vec3(objectData.rotationDir, 0.0f, 0.0f));
		objectData.model = glm::rotate(objectData.model, glm::radians(objectData.rotation.y), glm::vec3(0.0f, objectData.rotationDir, 0.0f));
		objectData.model = glm::rotate(objectData.model, glm::radians(objectData.deltaT * 360.0f), glm::vec3(0.0f, objectData.rotationDir, 0.0f));
		objectData.model = glm::scale(objectData.model, glm::vec3(objectData.scale));

		// Update push constants: only the MVP changes per frame, the color is not modified here
		threadData.pushConstantBlockVec[commandBufferIndex].mvp = uboVS.projectionMatrix * uboVS.viewMatrix * objectData.model;
		vkCmdPushConstants(cmd,
			pipeline_layout_,
			VK_SHADER_STAGE_VERTEX_BIT,
			0,
			sizeof(PushConstantBlock),
			&threadData.pushConstantBlockVec[commandBufferIndex]);

		// Bind the shared UFO mesh and draw it
		VkBuffer vertBuffer = ufo.vertices->GetDesc().buffer;
		VkDeviceSize offset = 0;
		vkCmdBindVertexBuffers(cmd, 0, 1, &vertBuffer, &offset);
		vkCmdBindIndexBuffer(cmd, ufo.indices->GetDesc().buffer, 0, VK_INDEX_TYPE_UINT32);
		vkCmdDrawIndexed(cmd, ufo.indexCount, 1, 0, 0, 0);
		vkEndCommandBuffer(cmd);
	}

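The code above is the job each thread executes, and it is fairly straightforward. After getting the CommandBuffer of a UFO, the thread first updates the UFO's own data and then re-records the CommandBuffer from scratch. Although it feels as if only the PushConstant command needs to be resubmitted, every other unchanged piece of state, such as the VertexBuffer, IndexBuffer, Scissor and Viewport, has to be recorded again as well. This is where Vulkan differs from traditional APIs: in D3D11 we would just update a ConstantBuffer, leave everything else alone and issue the draw call. In Vulkan, re-recording a CommandBuffer starts from an empty state, so as soon as anything in it has to change, all the other state must be set again too.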
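The object update at the start of UpdateThreadData touches only that object's own ObjectData, which is why it can run on worker threads without locks. The following glm-free sketch reproduces its rotation and bobbing math with the same constants so it can be checked in isolation; UfoAnimState and advance are hypothetical names, and glm::radians(deltaT * 360.0f) is written as deltaT * 2π.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical glm-free mirror of the animated fields of ObjectData.
struct UfoAnimState
{
    float rotationY = 0.0f;     // degrees
    float rotationSpeed = 1.0f;
    float deltaT = 0.0f;        // normalized phase, wrapped back below 1
    float posY = 0.0f;
};

// Advances one object by frameTimer seconds, using the same constants as UpdateThreadData.
void advance(UfoAnimState &s, float frameTimer)
{
    assert(frameTimer >= 0.0f);
    const float kPi = 3.14159265358979f;

    s.rotationY += 2.5f * s.rotationSpeed * frameTimer;
    if (s.rotationY > 360.0f)
        s.rotationY -= 360.0f;

    s.deltaT += 0.15f * frameTimer;
    if (s.deltaT > 1.0f)
        s.deltaT -= 1.0f;

    // sin(radians(deltaT * 360)) == sin(deltaT * 2*pi): one full bob per unit of phase.
    s.posY = std::sin(s.deltaT * 2.0f * kPi) * 2.5f;
}
```

Starting from deltaT = 0.95 and advancing by one second, the phase wraps to about 0.1 and posY becomes about 2.5 * sin(0.2π) ≈ 1.47. Like the original, the wrap subtracts 1 only once, which is fine for normal frame times.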

That is roughly the whole program; for the details, see the source code: https://github.com/syddf/VulkanRenderExample
(Based on Sascha Willems' samples: https://github.com/SaschaWillems/Vulkan)
