The Story of Neural Networks and Computer Architecture, Part 01

A 2.2 GHz SRAM with High Temperature Variation Immunity for Deep Learning Application under 28nm (paper)

1. Abstract

(1) The implementation of machine learning algorithms requires intensive memory access, and the SRAM is critical to overall performance. This paper proposes a new high-speed SRAM design for machine learning purposes.

(2) Compared with the Samsung HL 152, our design has a smaller area (121×43 µm² vs 127×44 µm²), fewer than half the pin ports (12 vs 25), and higher speed (2.2 GHz vs 0.8 GHz).

2. Introduction

(1) In hardware implementations of machine learning algorithms, data transfer is critical to both performance and power consumption.

(2) As shown in [2, 3], conventional SRAM designs can only sustain stable read/write speeds of 0.4–0.8 GHz under normal operating conditions. Processor clocks, on the other hand, can run at over 2 GHz. Consequently, it usually takes 3–5 clock cycles to finish a memory access, which can become a bottleneck for hardware deep learning (DL) performance.
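
A quick back-of-the-envelope check makes the 3–5 cycle figure concrete. This is a sketch with assumed round numbers (a 2 GHz processor clock and the SRAM speeds quoted above), not measurements from the paper:

```python
# Rough illustration with assumed round numbers (not the paper's measurements):
# how many processor clock cycles does one SRAM access cost?
import math

cpu_freq_ghz = 2.0                 # processor clock, >2 GHz per the text
cpu_cycle_ns = 1.0 / cpu_freq_ghz

for sram_freq_ghz in (0.4, 0.8, 2.2):
    sram_cycle_ns = 1.0 / sram_freq_ghz
    cycles = math.ceil(sram_cycle_ns / cpu_cycle_ns)
    print(f"SRAM @ {sram_freq_ghz} GHz -> {sram_cycle_ns:.2f} ns "
          f"= {cycles} CPU cycle(s) per access")

# SRAM @ 0.4 GHz -> 2.50 ns = 5 CPU cycle(s) per access
# SRAM @ 0.8 GHz -> 1.25 ns = 3 CPU cycle(s) per access
# SRAM @ 2.2 GHz -> 0.45 ns = 1 CPU cycle(s) per access
```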

(3) SRAM access timing is sensitive to variations in process, voltage, and temperature (PVT).

(4) The contributions of this paper are as follows:

  1. We designed a high-speed SRAM running at 2.2 GHz. With fast access time, the proposed SRAM can help accelerate hardware in machine learning applications.
  2. We proposed a temperature-variation-immune SRAM design. The proposed work is compatible with the conventional SRAM process and incurs no additional mask cost. A temperature-variation-immune SRAM benefits machine learning applications, since the intensive memory accesses of machine learning workloads can cause significant SRAM temperature variation, which in turn affects access timing.
  3. A smaller bank size is used to gain higher configurability and faster read/write. The machine learning algorithm can turn off the SRAM banks associated with unused neurons to avoid unnecessary leakage current and reduce power consumption (see the sketch after this list).
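
As a rough illustration of contribution 3, the hypothetical sketch below power-gates the SRAM banks that serve inactive neurons to cut leakage. The bank count, neuron-to-bank mapping, and per-bank leakage figure are illustrative assumptions, not values from the paper:

```python
# Hypothetical sketch of bank-level power gating for inactive neurons.
# The bank count, neuron-to-bank mapping, and leakage figure are illustrative
# assumptions, not values from the paper.

LEAKAGE_PER_BANK_UW = 5.0   # assumed leakage per powered-on bank, in microwatts

def banks_needed(active_neurons: set[int], neurons_per_bank: int) -> set[int]:
    """Map active neurons to the SRAM banks that hold their weights."""
    return {n // neurons_per_bank for n in active_neurons}

def leakage_uw(powered_banks: set[int]) -> float:
    """Total leakage of the banks left powered on."""
    return LEAKAGE_PER_BANK_UW * len(powered_banks)

total_banks = 16
active = {0, 1, 2, 3, 40, 41}                 # e.g. a pruned layer uses few neurons
on = banks_needed(active, neurons_per_bank=4)
print(f"banks on: {sorted(on)}")              # banks on: [0, 10]
print(f"leakage: {leakage_uw(on)} uW (vs {LEAKAGE_PER_BANK_UW * total_banks} uW "
      f"with all {total_banks} banks on)")    # leakage: 10.0 uW (vs 80.0 uW ...)
```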

3. Access in Machine Learning Applications

(1) Convolution layer

(2) Fully connected layer

(3) Flow chart of neuron computation in neural network
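
To make "intensive memory access" concrete, the sketch below counts weight and activation reads for one convolution layer and one fully connected layer. The layer shapes are illustrative assumptions rather than the paper's workload, and the count ignores any data reuse in registers or caches:

```python
# Illustrative count of SRAM reads per layer. The layer shapes are assumptions
# (not the paper's workload) and the count ignores reuse in registers/caches.

def conv_reads(h, w, c_in, c_out, k):
    """Weight + input-activation reads for a stride-1 'same' convolution."""
    macs = h * w * c_out * (k * k * c_in)   # multiply-accumulate operations
    return 2 * macs                         # one weight read + one activation read per MAC

def fc_reads(n_in, n_out):
    """Weight + input-activation reads for a fully connected layer."""
    return 2 * n_in * n_out

print(f"conv 56x56x64 -> 64 channels, 3x3 kernel: {conv_reads(56, 56, 64, 64, 3):,} reads")
print(f"fc   4096 -> 1000                       : {fc_reads(4096, 1000):,} reads")
# conv 56x56x64 -> 64 channels, 3x3 kernel: 231,211,008 reads
# fc   4096 -> 1000                       : 8,192,000 reads
```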

(4) The conventional design methodology requires every part of the SRAM to pass the worst-case temperature-variation condition, which forces a slower SRAM system clock.

(5) To avoid functional failure, the design has to be pessimistic. To escape this over-conservative methodology, real-time temperature monitors for the different SRAM parts need to be implemented, together with corresponding temperature compensation, so that SRAM performance does not degrade even under significant temperature variation.
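
The gap between the two methodologies can be shown with a small numeric sketch; the delay values and the 5% compensation residual below are assumptions for illustration, not the paper's silicon data:

```python
# Illustrative comparison with assumed delay numbers (not the paper's silicon data):
# worst-case margining vs. temperature compensation.

# Assumed SRAM access delay (ns) at a few temperatures, nominal process corner.
delay_ns = {0: 0.90, 25: 1.00, 85: 1.15, 125: 1.25}

# Conventional methodology: clock everything at the slowest (worst-case) corner.
worst_case_delay = max(delay_ns.values())
print(f"worst-case clock : {1.0 / worst_case_delay:.2f} GHz")   # 0.80 GHz

# Compensated design: a monitor senses temperature and a compensation knob
# (e.g. a tunable delay or bias) pulls the effective delay back toward nominal,
# so the clock can be set near the nominal-corner speed.
compensated_delay = delay_ns[25] * 1.05   # assume compensation holds delay within ~5%
print(f"compensated clock: {1.0 / compensated_delay:.2f} GHz")  # 0.95 GHz
```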

4. System Architecture

In the proposed SRAM, fast access is achieved by a folded SRAM structure and careful layout optimization. In addition, dual-loop process/temperature compensation is implemented to eliminate the effect of process and temperature variation on access time. Furthermore, the proposed SRAM uses a single-port design instead of the conventional dual-port design, thereby saving routing resources. With the optimized structure, the proposed fast SRAM can run reliably at 1.8–2.2 GHz.

(1) Folded Structure
(2) Dual-Loop Process/Temperature Compensation
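
As a conceptual illustration of item (2), the sketch below models how a generic dual-loop compensation scheme can operate: a one-time process loop sets a coarse trim at power-up, and a runtime temperature loop adjusts a fine trim as the delay drifts. This is a hypothetical model of the general technique, not the paper's actual circuit:

```python
# Conceptual model of a generic dual-loop compensation scheme; a hypothetical
# sketch, not the paper's actual circuit.
# Loop 1 (process): one-time calibration at power-up sets a coarse trim code.
# Loop 2 (temperature): a runtime loop nudges a fine trim code as conditions drift.

def process_loop(measured_delay_ns: float, target_ns: float, step_ns: float = 0.05) -> int:
    """Coarse trim chosen once at startup to cancel the process-induced delay offset."""
    return round((measured_delay_ns - target_ns) / step_ns)

def temperature_loop(fine_code: int, current_delay_ns: float, target_ns: float,
                     tol_ns: float = 0.02) -> int:
    """Runtime fine trim: bump the code when the delay drifts out of tolerance."""
    if current_delay_ns > target_ns + tol_ns:
        return fine_code + 1      # path too slow: speed it up
    if current_delay_ns < target_ns - tol_ns:
        return fine_code - 1      # path too fast: slow it back down
    return fine_code

coarse = process_loop(measured_delay_ns=1.12, target_ns=1.00)   # -> 2 coarse steps
fine = 0
for drifted_delay in (1.00, 1.03, 1.05, 1.01):   # delay drifting as temperature rises
    fine = temperature_loop(fine, drifted_delay, target_ns=1.00)
print(coarse, fine)   # 2 2  (fine code stepped up twice while the delay ran slow)
```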

5. Conclusion

In this paper, we propose a new high-speed SRAM design for machine learning purposes. With a fast access time (cycle time: 650 ps, access time: 350 ps), low sensitivity to temperature variation (less than 10% performance difference between the 125_rcw_tt and 0_rcw_tt corners), and high configurability, the proposed SRAM is a better candidate for hardware machine learning systems than conventional SRAM. Compared with the Samsung HL 152, our design has a smaller area (121×43 µm² vs 127×44 µm²), fewer than half the pin ports (12 vs 25), and higher speed (2.2 GHz vs 0.8 GHz).
