Which GPUs Support Mixed Precision Training? (Mixed Precision Training)

Discover a way to efficiently utilize your GPU

List of things covered

  • What is Mixed Precision Training

  • Why MPT is Important

  • How MPT reduces memory

  • Frameworks with AMP (Automatic Mixed Precision)

What is Mixed Precision Training

Mixed precision training is a technique for training large neural networks in which the model's parameters are stored in different datatype precisions (FP16 vs FP32 vs FP64). It offers a significant performance and computational boost by running much of the training in lower-precision formats. With the release of the RTX 30-series GPUs, it has become even more important to take advantage of these features.

For instance, in PyTorch, single precision means float32, and by default the parameters take the float32 datatype. Now, if we have a parameter (W) that could be stored in FP16 while ensuring that no task-specific accuracy is lost by this move between precisions, then why should we use FP32 or FP64?

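A minimal sketch (assuming PyTorch is installed) illustrates this: parameters default to float32, and the same values can be held in FP16 at half the memory per element.

    import torch

    # By default, PyTorch stores parameters in single precision (float32).
    w = torch.nn.Linear(4, 4).weight
    print(w.dtype)                # torch.float32
    print(w.element_size())       # 4 bytes per element

    # The same parameter can be cast to half precision (float16),
    # halving its per-element memory footprint.
    w_fp16 = w.detach().half()
    print(w_fp16.dtype)           # torch.float16
    print(w_fp16.element_size())  # 2 bytes per element

The trade-off is FP16's narrower numeric range, which is exactly why the training is "mixed" rather than run entirely in FP16.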

Notations

  • FP16 — Half-Precision, 16-bit floating point; occupies 2 bytes of memory

  • FP32 — Single-Precision, 32-bit floating point; occupies 4 bytes of memory

  • FP64 — Double-Precision, 64-bit floating point; occupies 8 bytes of memory (see the sketch after this list)

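A quick sketch (again assuming PyTorch) that prints the per-element size of each of these precisions:

    import torch

    # Per-element size in bytes for each floating point precision.
    for dtype in (torch.float16, torch.float32, torch.float64):
        t = torch.empty(1, dtype=dtype)
        print(dtype, t.element_size())  # 2, 4 and 8 bytes respectively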

Since the introduction of Tensor Cores in the Volta and Turing architectures (NVIDIA), significant training speedups can be achieved by switching to mixed precision, with up to 3x overall speedup on the most arithmetically intensive model architectures. The ability to train deep learning networks with lower precision was introduced in the Pascal architecture and first supported in CUDA® 8 in the NVIDIA Deep Learning SDK.

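As a concrete example of AMP in a framework, here is a minimal PyTorch training-loop sketch using torch.cuda.amp; the model, optimizer, loss and `loader` are placeholders, and a CUDA-capable GPU is assumed:

    import torch
    from torch.cuda.amp import autocast, GradScaler

    # Placeholder model, optimizer and loss; substitute your own.
    model = torch.nn.Linear(512, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    scaler = GradScaler()  # scales the loss to avoid FP16 gradient underflow

    for inputs, targets in loader:  # `loader` is assumed to yield CUDA tensors
        optimizer.zero_grad()
        with autocast():            # ops run in FP16 where safe, FP32 otherwise
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
        scaler.scale(loss).backward()  # backpropagate the scaled loss
        scaler.step(optimizer)         # unscale gradients, then optimizer step
        scaler.update()                # adjust the scale factor for the next step

On Volta, Turing and Ampere GPUs, the matrix multiplications inside the autocast region are dispatched to Tensor Cores, which is where most of the speedup comes from.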
