CS231n Study Notes -- 15. Efficient Methods and Hardware for Deep Learning

Agenda

[Figure 1]

Hardware 101: the Family

[Figure 2]

Hardware 101: Number Representation

[Figure 3]

Hardware 101: Number Representation

[Figure 4]
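The number-representation slides compare formats by how their bits are divided. The minimal sketch below uses only the standard library and pulls out the sign, exponent and mantissa fields of IEEE 754 FP32 and FP16 values. The helper names `fp32_fields` and `fp16_fields` are hypothetical.

```python
import struct

def fp32_fields(x):
    """Split an IEEE 754 single-precision value into (sign, exponent, mantissa) bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, bias 127
    mantissa = bits & 0x7FFFFF       # 23 fraction bits
    return sign, exponent, mantissa

def fp16_fields(x):
    """Same split for IEEE 754 half precision ('e' is struct's binary16 format)."""
    (bits,) = struct.unpack(">H", struct.pack(">e", x))
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, bias 15
    mantissa = bits & 0x3FF          # 10 fraction bits
    return sign, exponent, mantissa

# A normal number decodes as (-1)**sign * 2**(exponent - bias) * (1 + mantissa / 2**fraction_bits)
print(fp32_fields(-2.5))   # (1, 128, 2097152), i.e. -1 * 2**1 * 1.25
print(fp16_fields(-2.5))   # (1, 16, 256)
```

Compared with FP32, FP16 gives up 3 exponent bits and 13 mantissa bits. In exchange, it halves memory footprint and bandwidth.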

1. Algorithms for Efficient Inference

1.1 Pruning Neural Networks

[Figure 5]

[Figure 6]
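Pruning removes connections whose weights have small magnitude. The sketch below works on one flat list of weights and chooses the threshold so that a fixed fraction is removed. Both the function name and the fraction-based threshold are illustrative assumptions; the lecture may choose its threshold differently.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Returns (pruned_weights, mask) where mask[i] is 1 if weight i is kept, 0 if pruned.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest |w| to largest
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask

weights = [0.5, -0.05, 0.01, -0.8, 0.2, 0.03]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(mask)   # [1, 0, 0, 1, 1, 0]
```

The mask is kept so that later retraining updates only the surviving connections.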

Iteratively Retrain to Recover Accuracy

[Figure 7]

Pruning RNN and LSTM

[Figure 8]

After pruning, accuracy improves slightly:

[Figure 9]

Pruning Changes Weight Distribution

[Figure 10]

1.2 Weight Sharing

Trained Quantization

[Figure 11]
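Trained quantization clusters a layer's weights so that each weight is stored as a short index into a small codebook of shared values. During fine-tuning, each shared value is updated with the summed gradients of its members. The stdlib-only sketch below makes three assumptions:

- one-dimensional k-means;
- linear (evenly spaced) centroid initialization;
- the storage ratio n·b / (n·log2 k + k·b) for n weights of b bits.

All helper names are hypothetical.

```python
import math

def assign(weights, centroids):
    """Index of the nearest shared value (centroid) for every weight."""
    return [min(range(len(centroids)), key=lambda j: abs(w - centroids[j])) for w in weights]

def kmeans_1d(weights, k, iters=20):
    """Cluster scalar weights into a k-entry codebook of shared values."""
    lo, hi = min(weights), max(weights)
    # Linear initialization: centroids evenly spaced between min and max weight
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        idx = assign(weights, centroids)
        for j in range(k):
            members = [w for w, c in zip(weights, idx) if c == j]
            if members:                      # leave empty clusters where they are
                centroids[j] = sum(members) / len(members)
    return centroids, assign(weights, centroids)

def centroid_grads(grads, idx, k):
    """Fine-tuning: each shared value receives the sum of its members' gradients."""
    out = [0.0] * k
    for g, c in zip(grads, idx):
        out[c] += g
    return out

def compression_ratio(n, k, bits=32):
    """n weights of `bits` bits -> n indices of log2(k) bits plus a k-entry codebook."""
    return n * bits / (n * math.ceil(math.log2(k)) + k * bits)

W = [2.09, -0.98, 1.48, 0.09, 0.05, -0.14, -1.08, 2.12,
     -0.91, 1.92, 0.0, -1.03, 1.87, 0.0, 1.53, 1.49]   # a toy 4x4 layer, flattened
codebook, idx = kmeans_1d(W, k=4)
print([round(c, 2) for c in codebook])
print(compression_ratio(len(W), 4))      # 3.2
```

For this made-up matrix, the four shared values settle at roughly -1.0, 0.0, 1.5 and 2.0. Each weight then needs only a 2-bit index.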

How Many Bits do We Need?

[Figure 12]

Pruning + Trained Quantization Work Together

[Figure 13]

Huffman Coding

[Figure 14]
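Huffman coding exploits the non-uniform distribution of quantized weight indices: frequent values get shorter codes. The minimal sketch below builds the code with `heapq`. The index stream is made-up illustration data.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring}; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: code built so far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Skewed cluster indices, e.g. from a quantized layer (hypothetical data)
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(indices)
print(sum(len(code[s]) for s in indices))   # 28 bits instead of 16 * 2 = 32
```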

Summary of Deep Compression

[Figure 15]

Results: Compression Ratio

[Figure 16]

SqueezeNet

[Figure 17]
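SqueezeNet's Fire module saves parameters in two ways. It squeezes the channel count with 1×1 filters before the expand layer, and it replaces part of the 3×3 filters with 1×1 filters. The back-of-the-envelope count below uses illustrative channel counts (96 in, 16 squeeze, 64 + 64 expand). These are assumptions, roughly in the range of SqueezeNet's early Fire modules.

```python
def fire_params(c_in, s1x1, e1x1, e3x3):
    """Weights in a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand (biases ignored)."""
    squeeze = c_in * s1x1
    expand = s1x1 * e1x1 + 9 * s1x1 * e3x3
    return squeeze + expand

def conv3x3_params(c_in, c_out):
    """Weights in a plain 3x3 convolution with the same input and output channel counts."""
    return 9 * c_in * c_out

fire = fire_params(96, 16, 64, 64)       # output: 64 + 64 = 128 channels
plain = conv3x3_params(96, 128)
print(fire, plain)                       # 11776 110592
```

With these counts, the Fire module needs about 9.4× fewer weights than a plain 3×3 convolution with the same input and output channels.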

Compressing SqueezeNet

[Figure 18]

1.3 Quantization

Quantizing the Weight and Activation

[Figure 19]
**Quantization Result**: 8 bits are chosen
[Figure 20]
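Quantizing weights and activations can be sketched in three steps:

1. Gather the range of values.
2. Choose where the radix point sits.
3. Round every value onto that fixed-point grid.

The sketch below assumes dynamic fixed point with a single radix position per tensor. The function names are hypothetical.

```python
import math

def choose_frac_bits(values, total_bits=8):
    """Place the radix point so the largest |value| fits in a signed fixed-point word."""
    max_abs = max(abs(v) for v in values)
    int_bits = math.floor(math.log2(max_abs)) + 1   # bits needed left of the radix point
    return total_bits - 1 - int_bits                # one bit is reserved for the sign

def quantize_fixed(x, total_bits=8, frac_bits=4):
    """Round x to the nearest representable fixed-point value, saturating at the range ends."""
    scale = 2 ** frac_bits
    q_min, q_max = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = max(q_min, min(q_max, round(x * scale)))    # integer code actually stored
    return q / scale

stats = [0.5, -3.2, 1.0]                 # e.g. observed weight or activation values
fb = choose_frac_bits(stats)             # 5 fractional bits -> range [-4.0, 3.96875]
print(quantize_fixed(1.03, 8, fb))       # 1.03125
print(quantize_fixed(10.0, 8, fb))       # 3.96875 (saturated)
```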

1.4 Low Rank Approximation

Low Rank Approximation for Conv: similar to the Inception module

[Figure 21]

Low Rank Approximation for FC: matrix factorization

[Figure 22]
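Matrix factorization replaces the m×n weight matrix of a fully connected layer with an m×r matrix U times an r×n matrix V. The product y = W x then becomes y = U (V x), which needs r·(m + n) multiply-adds instead of m·n. The pure-Python sketch below uses power iteration with deflation as a stand-in for a library SVD. The helper names are hypothetical.

```python
import math
import random

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def low_rank_factors(W, r, iters=100, seed=0):
    """Approximate W (m x n) as U (m x r) times V (r x n).

    Power iteration with deflation finds the top-r singular directions
    (a stdlib stand-in for a truncated SVD).
    """
    rng = random.Random(seed)
    m, n = len(W), len(W[0])
    R = [row[:] for row in W]                  # residual not yet captured
    U_cols, V_rows = [], []
    for _ in range(r):
        Rt = transpose(R)
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):                 # power iteration on R^T R
            v = matvec(Rt, matvec(R, v))
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:                    # residual is already zero
                break
            v = [x / norm for x in v]
        u = matvec(R, v)                       # left singular vector scaled by sigma
        U_cols.append(u)
        V_rows.append(v)
        R = [[R[i][j] - u[i] * v[j] for j in range(n)] for i in range(m)]
    return transpose(U_cols), V_rows

W = [[4.0, 5.0], [8.0, 10.0], [12.0, 15.0]]   # rank 1: outer product of [1,2,3] and [4,5]
U, V = low_rank_factors(W, r=1)
x = [1.0, -1.0]
print(matvec(U, matvec(V, x)))   # close to matvec(W, x) = [-1.0, -2.0, -3.0]
```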

1.5 Binary / Ternary Net

Trained Ternary Quantization

[Figure 23]
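Trained ternary quantization maps each weight to one of three values: +W_p, 0 or -W_n. The two scaling factors W_p and W_n are trained as well. The sketch below shows the forward mapping and the scaling-factor gradients that follow from the chain rule. The threshold ratio t = 0.05 and the helper names are illustrative assumptions. The gradient flowing back to the latent full-precision weights is not shown.

```python
def ternarize(weights, w_p, w_n, t=0.05):
    """Map full-precision weights to +w_p, 0 or -w_n using the threshold t * max|w|."""
    delta = t * max(abs(w) for w in weights)
    out = []
    for w in weights:
        if w > delta:
            out.append(w_p)
        elif w < -delta:
            out.append(-w_n)
        else:
            out.append(0.0)
    return out, delta

def scale_grads(weights, grads_t, delta):
    """Chain rule on the mapping above, given dL/d(ternary weight) for each position.

    dL/dw_p sums the gradients over the positive set; since negative-set weights
    equal -w_n, dL/dw_n is minus the sum over the negative set.
    """
    g_p = sum(g for w, g in zip(weights, grads_t) if w > delta)
    g_n = -sum(g for w, g in zip(weights, grads_t) if w < -delta)
    return g_p, g_n

w = [0.9, -0.02, 0.3, -0.6, 0.01]
tern, delta = ternarize(w, w_p=0.8, w_n=0.5)
print(tern)   # [0.8, 0.0, 0.8, -0.5, 0.0]
```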

Weight Evolution during Training

[Figure 24]

Error Rate on ImageNet

[Figure 25]

1.6 Winograd Transformation

3x3 DIRECT Convolutions

[Figure 26]

Direct convolution: we need 9 × C × 4 = 36 × C FMAs for 4 outputs

3x3 WINOGRAD Convolutions

Transform Data to Reduce Math Intensity

[Figure 27]

Direct convolution: we need 9 × C × 4 = 36 × C FMAs for 4 outputs
Winograd convolution: we need 16 × C FMAs for 4 outputs, i.e. 2.25× fewer FMAs
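The count of 16 multiplies per channel comes from Winograd's minimal filtering algorithm F(2×2, 3×3). The 4×4 input tile and the 3×3 filter are each transformed, multiplied element-wise (16 multiplies per channel), and then transformed back. The pure-Python sketch below uses the standard F(2, 3) transform matrices and checks the result against the direct computation. It handles a single channel; with C channels, the element-wise products would be summed over channels before the output transform.

```python
# Standard Winograd F(2, 3) transforms; the 2D version F(2x2, 3x3) nests them.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def winograd_2x2_3x3(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g (one channel)."""
    U = matmul(matmul(G, g), transpose(G))        # transformed filter, can be precomputed
    V = matmul(matmul(BT, d), transpose(BT))      # BT entries are 0/+-1: additions only
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]   # the 16 multiplies
    return matmul(matmul(AT, M), transpose(AT))   # AT entries are 0/+-1: additions only

def direct_2x2_3x3(d, g):
    """Direct computation (cross-correlation, as CNNs use): 4 outputs x 9 = 36 multiplies."""
    return [[sum(d[i + u][j + v] * g[u][v] for u in range(3) for v in range(3))
             for j in range(2)] for i in range(2)]

d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 1, 0], [0, 1, 1]]
print(direct_2x2_3x3(d, g))
print(winograd_2x2_3x3(d, g))   # same values, as floats
```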

2. Hardware for Efficient Inference

Hardware for Efficient Inference:

A common goal: minimize memory access

[Figure 28]

Google TPU

[Figure 29]

[Figure 30]

Roofline Model: Identify Performance Bottleneck

[Figure 31]
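The roofline model bounds attainable throughput by min(peak compute, memory bandwidth × operational intensity). The minimal sketch below uses made-up hardware numbers, not the CPU, GPU or TPU figures from the slides.

```python
def attainable_gops(peak_gops, bandwidth_gb_s, ops_per_byte):
    """Roofline model: throughput is capped by peak compute or by memory traffic."""
    return min(peak_gops, bandwidth_gb_s * ops_per_byte)

def ridge_point(peak_gops, bandwidth_gb_s):
    """Operational intensity (ops/byte) where a kernel moves from memory-bound to compute-bound."""
    return peak_gops / bandwidth_gb_s

# Hypothetical chip: 1000 Gop/s peak compute, 100 GB/s memory bandwidth
print(ridge_point(1000, 100))            # 10.0
print(attainable_gops(1000, 100, 2))     # 200: memory-bound
print(attainable_gops(1000, 100, 50))    # 1000: compute-bound
```

A fully connected layer run at batch size 1 performs about two operations (a multiply and an add) per weight it reads. With 8-bit weights, that is roughly 2 ops/byte, which sits on the sloped, memory-bound part of the roof. Reusing each weight more often, through batching or the weight sharing of convolutions, moves a kernel to the right.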

Log Rooflines for CPU, GPU, TPU

[Figure 32]

EIE: the First DNN Accelerator for Sparse, Compressed Model
Zero values are neither stored nor computed.

[Figure 33]

[Figure 34]
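EIE stores the pruned weight matrix in a compressed sparse column (CSC) variant and skips columns whose input activation is zero. The simplified, single-processing-element sketch below assumes plain CSC with absolute row indices. It leaves out EIE's relative indexing, 4-bit shared weights and work distribution across PEs. The function names are hypothetical.

```python
def to_csc(W):
    """Store only nonzero weights, column by column (compressed sparse column)."""
    values, row_idx, col_ptr = [], [], [0]
    for j in range(len(W[0])):
        for i in range(len(W)):
            if W[i][j] != 0:
                values.append(W[i][j])
                row_idx.append(i)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr

def sparse_matvec(csc, x, n_rows):
    """y = W x, skipping zero activations x[j] and never touching pruned weights."""
    values, row_idx, col_ptr = csc
    y = [0.0] * n_rows
    macs = 0
    for j, xj in enumerate(x):
        if xj == 0:                           # zero activation: skip the whole column
            continue
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * xj
            macs += 1
    return y, macs

W = [[0, 2, 0, 0], [1, 0, 0, 3], [0, 0, 4, 0]]
x = [5, 0, 0, 1]                              # e.g. a ReLU output with many zeros
y, macs = sparse_matvec(to_csc(W), x, n_rows=3)
print(y, macs)                                # [0.0, 8.0, 0.0] 2  (dense would need 12)
```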

EIE Architecture

[Figure 35]

Micro Architecture for each PE

[Figure 36]

Comparison: Throughput

[Figure 37]

Comparison: Energy Efficiency

[Figure 38]

3. Algorithms for Efficient Training

3.1 Parallelization

Data Parallel – Run multiple inputs in parallel

[Figure 39]

Parameter Update

Shared parameter update: all workers read and update one common set of parameters.

[Figure 40]
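In data parallelism, each worker runs the same model on its own slice of the inputs. The gradients are then combined and the shared parameters updated. The minimal synchronous sketch below uses a one-parameter model y = w·x. The model, the data and the function names are illustrative assumptions; parameter servers can also apply updates asynchronously, which is not shown.

```python
def grad_mse(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr):
    """Synchronous step: every worker computes a gradient on its own inputs,
    the gradients are averaged, and the shared parameter is updated once."""
    grads = [grad_mse(w, shard) for shard in shards]   # runs in parallel on real hardware
    return w - lr * sum(grads) / len(grads)

shards = [[(1, 3), (2, 6)], [(3, 9), (4, 12)]]   # two workers, data from y = 3x
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)
print(round(w, 4))   # 3.0
```

With equal-sized shards, the averaged gradient equals the full-batch gradient. The result therefore matches single-machine training on the whole batch.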

Model-Parallel Convolution – by output region (x,y)

[Figure 41]

Model Parallel Fully-Connected Layer (M x V)

[Figure 42]

[Figure 43]
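For a fully connected layer computing M × V, one way to split the work is to give each worker a block of rows of M. Every worker needs the whole input vector but only its own weights, and the output slices are concatenated at the end. The sketch below assumes this row partitioning. Splitting by columns instead would require summing the workers' partial outputs.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def model_parallel_matvec(W, x, n_workers):
    """Split the output neurons (rows of W) across workers; each computes its slice of y."""
    chunk = -(-len(W) // n_workers)           # ceiling division
    parts = [W[k:k + chunk] for k in range(0, len(W), chunk)]
    # Every worker needs the full input x, but only its own block of weights.
    return [y for part in parts for y in matvec(part, x)]

W = [[1, 2, 0], [0, 1, 3], [2, 2, 2], [1, 0, 1], [3, 1, 0]]
x = [1, -1, 2]
print(model_parallel_matvec(W, x, n_workers=2))   # [-1, 5, 4, 3, 2]
```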

Summary of Parallelism

[Figure 44]

3.2 Mixed Precision with FP16 and FP32

[Figure 45]

Mixed Precision Training

[Figure 46]
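One reason to keep an FP32 master copy of the weights: an update much smaller than the gap between neighbouring FP16 values is rounded away when added directly to an FP16 weight. The small emulation below uses struct's half- and single-precision formats in place of real hardware. The gradient and learning rate are made-up numbers.

```python
import struct

def to_fp16(x):
    """Round to the nearest IEEE half-precision value (emulated with struct's 'e' format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def run(steps, lr=1.0, grad=1e-4):
    w_fp16_only = to_fp16(1.0)      # weights stored and updated only in FP16
    w_master = to_fp32(1.0)         # FP32 master copy of the same weight
    for _ in range(steps):
        g = to_fp16(grad)           # gradient produced by the FP16 backward pass
        w_fp16_only = to_fp16(w_fp16_only - lr * g)
        w_master = to_fp32(w_master - lr * g)
    # An FP16 copy of the master weights is what the next forward pass would use
    return w_fp16_only, w_master, to_fp16(w_master)

print(run(100))   # the FP16-only weight is still 1.0; the master copy has moved to about 0.99
```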

Comparison of results:

[Figure 47]

3.3 Model Distillation

The student model is much smaller than the teacher model.

[Figure 48]

Softened outputs reveal the dark knowledge

[Figure 49]

Softened outputs reveal the dark knowledge

[Figure 50]
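The "dark knowledge" is the relative probability a teacher assigns to wrong classes, which a temperature T > 1 makes visible. The minimal sketch below shows the softened softmax and a soft-target cross-entropy. The logits and class names are made up. Combining this term with the usual hard-label loss (commonly with the soft term scaled by T²) is left out.

```python
import math

def softmax_t(logits, T=1.0):
    """Softmax of logits / T; larger T gives a softer (higher-entropy) distribution."""
    z = [v / T for v in logits]
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, T):
    """Cross-entropy of the student's softened output against the teacher's softened output."""
    p = softmax_t(teacher_logits, T)
    q = softmax_t(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [9.0, 4.0, 0.5]          # e.g. "dog", "cat", "car" (hypothetical classes)
print([round(p, 3) for p in softmax_t(teacher, T=1)])
print([round(p, 3) for p in softmax_t(teacher, T=4)])
```

At T = 1 the teacher's output is nearly one-hot. At T = 4 the "cat" class receives a visible share of the probability, and this class-similarity information is what the student learns from.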

3.4 DSD: Dense-Sparse-Dense Training

[Figure 51]

DSD produces the same model architecture but finds a better optimization solution: it reaches a better local minimum and achieves higher prediction accuracy across a wide range of deep neural networks, including CNNs, RNNs and LSTMs.
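DSD has three phases:

1. Train a dense network.
2. Prune it and retrain under the sparsity mask.
3. Remove the mask and retrain the full network, with the pruned weights restarting from zero.

The sketch below runs these phases on a tiny linear least-squares model. The model, the data, the step counts and the 1/10 learning rate for the final dense phase are all assumptions for illustration. They are not the settings used for the networks in the slides.

```python
def mse_grad(w, data):
    """Gradient of the mean squared error of the linear model y = w . x."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2 * err * xi / len(data)
    return g

def train(w, data, lr, steps, mask=None):
    """Plain gradient descent; with a mask, pruned weights are held at zero."""
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, mse_grad(w, data))]
        if mask is not None:
            w = [wi * m for wi, m in zip(w, mask)]
    return w

def dsd(w, data, sparsity, lr, steps=200):
    w = train(w, data, lr, steps)                        # Dense: ordinary training
    n_prune = int(len(w) * sparsity)
    smallest = sorted(range(len(w)), key=lambda i: abs(w[i]))[:n_prune]
    mask = [0 if i in smallest else 1 for i in range(len(w))]
    w_sparse = train([wi * m for wi, m in zip(w, mask)], data, lr, steps, mask)  # Sparse
    w_final = train(w_sparse, data, lr / 10, steps)      # Dense again: mask removed
    return w_sparse, w_final, mask

# Hypothetical data generated by y = 2*x0 - x1 + 0.05*x2 + 0.5*x3
true_w = [2.0, -1.0, 0.05, 0.5]
xs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
      [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
data = [(x, sum(a * b for a, b in zip(true_w, x))) for x in xs]
w_sparse, w_final, mask = dsd([0.0] * 4, data, sparsity=0.25, lr=0.5)
print(mask)   # [1, 1, 0, 1]: the 0.05 weight is pruned, then restored in the last phase
```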

DSD: Intuition

[Figure 52]

DSD is General Purpose: Vision, Speech, Natural Language

[Figure 53]

DSD on Caption Generation

[Figure 54]

4. Hardware for Efficient Training

GPU / TPU

[Figure 55]

Google Cloud TPU

[Figure 56]

Future

[Figure 57]

Outlook: the Focus for Computation

[Figure 58]
