Stochastic Computing + Quantization

Table of Contents

  • Google Scholar Profiles of Leading Researchers
  • 1. Conference Papers: Neural Network Compression Algorithms and Their Hardware Accelerators
    • 1.1 Deep Compression and Its Hardware Implementations
    • 1.2 Conference Papers
      • 2018 DAC
  • 2. Conference Papers: SC-Based Neural Network
    • 2018: DAC, ASP-DAC, DATE
    • 2017: ASPLOS, ICCD, DAC, DATE, ASP-DAC, ICCAD
    • 2016: DAC
  • 3. Nonlinear Activation Function in Stochastic Computing
  • 4. Stochastic Computing Theory Papers
    • 4.1 Encoding Schemes
      • 4.1.1 Time Encoding with Analog Pulses
      • 4.1.2 Deterministic Encoding
    • 4.2 Random Number Generators
    • 4.3 Other Topics
  • 5. Stochastic Computing with Spintronics and Other Emerging Devices
  • Papers Still to Read

Google Scholar Profiles of Leading Researchers

  • Jie Han—University of Alberta
  • Jongeun Lee—UNIST
  • Kiyoung Choi—Seoul National University
  • John P. Hayes—University of Michigan
  • Kia Bazargan—University of Minnesota
  • M. Hassan Najafi—University of Louisiana
  • Marc Riedel—University of Minnesota
  • Siddharth Garg—New York University
  • Brandon Reagen—Facebook
  • Gu-Yeon Wei—Harvard

1. Conference Papers: Neural Network Compression Algorithms and Their Hardware Accelerators

A collection of papers from recent editions of ICLR, ISCA, ASPLOS, and DAC.

1.1 Deep Compression and Its Hardware Implementations

  • 2016 ICLR Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (Stanford, Tsinghua, NVIDIA)
    Song Han's seminal paper: pruning, quantization, and Huffman coding.
  • 2016 ISCA EIE: Efficient Inference Engine on Compressed Deep Neural Network (Stanford, NVIDIA)
    The EIE accelerator: static weight sparsity, dynamic input sparsity, and quantization.
    Only nonzero data are computed and stored; the positions of the nonzero weights are encoded in CSC (compressed sparse column) format.
  • 2018 ISQED Quantized neural networks with new stochastic multipliers (UMN, CUNY)
    Uses stochastic computing to implement retrained neural networks with different quantization levels; proposes a stochastic quantized matrix multiplier in which shifted unary code adders (SUC-Adders) serve the quantized network. The method realizes high accuracy for partial matrix multiplication efficiently with only a few AND and OR gates (see the sketch of the basic unipolar AND/OR primitives after this list).
    Only the weight parameters are quantized; a stochastic computing adder is proposed that gathers input information over different time segments.
    My impression: apart from the accuracy improvement, this hardly has anything to do with quantization at all...
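
A quick illustration of the AND/OR-gate arithmetic mentioned above (generic unipolar SC primitives, not the paper's SUC-Adder itself): in unipolar SC a value p in [0, 1] is encoded as the probability of a 1 in a bit stream, an AND gate multiplies two independent streams, and an OR gate acts as an approximate adder computing p1 + p2 - p1*p2.

```python
import numpy as np

rng = np.random.default_rng(0)

def unipolar_encode(p, n, rng):
    """Encode a value p in [0, 1] as a random bit stream of length n."""
    return (rng.random(n) < p).astype(np.uint8)

def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return stream.mean()

N = 4096
a, b = 0.4, 0.3
sa = unipolar_encode(a, N, rng)
sb = unipolar_encode(b, N, rng)              # independent second stream

print(decode(sa & sb), "~", a * b)           # AND gate: multiplication
print(decode(sa | sb), "~", a + b - a * b)   # OR gate: approximate addition
```

The OR-based adder is only accurate when its inputs are small (so that p1*p2 is negligible), which is why scaled MUX adders or counters are the usual choice when exact sums are needed.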

1.2 Conference Papers

2018 DAC

  • Session14: CONFIGURE TO CONQUER: DYNAMIC HW/SW RECONFIGURATION FOR DEEP LEARNING
    The papers in this session have a common theme in that they propose to dynamically reconfigure DNN accelerators to improve their efficiency. The first paper dynamically scales the precision of computations. The second paper proposes to reconfigure the micro-architectural parameters of a neural network accelerator. The third paper reconfigures in software by changing the data-flow used for computation. The final paper dynamically partitions resources on reconfigurable platforms for modern deep neural network topologies like Inception and residual networks.

    • 14-1 Dynamic Precision Scaling for Stochastic Computing-based Deep Neural Networks (UNIST)
    • 14-2 DyHard-DNN: Even More DNN Acceleration with Dynamic Hardware Reconfiguration (Virginia, IBM)
    • 14-3 Exploring the Programmability for Deep Learning Processors: from Architecture to Tensorization (Washington)
    • 14-4 LCP: a Layer Clusters Paralleling mapping method for accelerating Inception and Residual networks on FPGA (THU)
  • Session19: WATCH YOUR BITS: PRECISION AND FAULT TOLERANCE IN DEEP LEARNING
    Designers must account for the effect of error and imprecision on DNN behavior, especially since these characteristics of DNN can be leveraged to improve performance and energy. Ares presents a fault-injection framework for estimating the resilience of DNNs to permanent hardware faults. DeepN-JPEG revisits JPEG quantization in order to improve classification accuracy when using compressed images. ThUnderVolt enables voltage underscaling of DNN accelerators by tolerating timing errors. Loom presents an accelerator that exploits the variable precision required by different layers of a CNN, increasing performance by reducing precision.

    • 19-1 Ares: A framework for quantifying the resilience of deep neural networks (Harvard)
      Introduces a method that injects hardware noise (faults) to evaluate DNN accuracy; noise can be injected separately into the weights, activations, and hidden layers. The paper considers two kinds of variation, static variation and transient variation, and studies bit-level fault tolerance.
    • 19-2 DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework (FIU, Indiana, Miami, Syracuse)
    • 19-3 ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators (NYU, IITK)
      Enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy.
    • 19-4 Loom: Exploiting Weight and Activation Precisions to Accelerate Convolutional Neural Networks (Toronto)
  • Session20: COMPUTE-IN-MEMORY MEETS 3DIC
    From the first paper in the session you will learn how to use SRAM arrays to create binary neural networks; the second paper shows how floating-gate memory arrays can be used to accelerate analog-style vector-matrix multiplication for machine learning; the third paper focuses on how to reduce the impact of cross-coupling in TSV-based 3DICs using coding techniques; finally, the fourth paper in the session shows how turning a 3D stack on the side to create a “loaf-of-bread” structure (unlike the more common “pancake” structure) can be used in rad-hard space applications.

    • 20-1 Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks (Arizona, NTHU) Shimeng Yu
  • Session28: STAY COOL WITH CROSS-LAYER OPTIMIZATION!
    This session addresses various energy efficient, and yet robust, schemes from application level to circuit level. The first paper presents a new quantization methodology to improve energy-efficiency of DNNs. The next paper addresses the impact of temperature on the accuracy of ReRAM-based neuromorphic computing systems. The third paper provides a novel compiler-guided clock scheduling algorithm to minimize energy without performance degradation. Next, a memory-based energy minimization method in dual-voltage near-threshold computing systems is presented. The next paper presents a workload-dependent voltage scaling method for ultra-low-power CPUs. The last paper provides an analytical approach to characterize power delivery networks for voltage-stacked manycore systems.

    • 28-1 Compensated-DNN: Energy Efficient Low-Precision Deep Neural Networks by Compensating Quantization Errors (Purdue, IBM) best paper
    • 28-2 Thermal-aware Optimizations of ReRAM-based Neuromorphic Computing Systems (Northwestern)
  • Session56: LEARNING HOW TO THINK
    In this session, we consider how technology impacts both training and inference in various types of neuro-inspired computing models. The first paper considers a ReRAM-based accelerator suitable for both training and inference in CNNs. The next paper presents ways to mitigate accuracy loss in spiking neural networks even when data is quantized. CNNs and binary CNNs are then considered in the context of SOT-MRAM. In papers four and five, RRAM-based compute kernels are considered in the context of sparse neural networks and gradient sparsification. The session concludes with a discussion of hyperdimensional computing.

    • 56-1 AtomLayer: A Universal ReRAM-Based CNN Accelerator with Atomic Layer Computation (Duke)
    • 56-2 Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks (Clarkson)
    • 56-3 CMP-PIM: An Energy-Efficient Comparator-based Processing-In-Memory Neural Network Accelerator (Florida)
    • 56-4 SNrram: An Efficient Sparse Neural Network Computation Architecture Based on Resistive Random-Access Memory (Florida)
    • 56-5 Long Live TIME: Improving Lifetime for Training-In-Memory Engines by Structured Gradient Sparsification (THU, CAS, MIT)
  • Session69: BUILDING FAST AND EFFICIENT NEURAL NETWORKS
    This session includes papers that present innovative solutions to improve efficiency and performance of neural networks. The first two papers present efficient implementations of the Winograd convolutional neural network (CNN) algorithm. The first paper proposes a sparse-optimized dataflow and a load-balancing algorithm for enhancing CNN efficiency. The second paper focuses on an efficient implementation targeting IoT edge devices. The third paper discusses a kernel transformation method to reduce computations and improve performance and power efficiency of binary- and ternary-weight neural networks. The fourth paper pursues mapping XNOR and bitcount operations in binary neural networks onto content addressable memory (CAM) arrays.

    • 69-1 SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs (Peking)
    • 69-2 Efficient Winograd-based Convolution Kernel Implementation on Edge Devices (Intel, NTUA)
    • 69-3 An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference (THU)
    • 69-4 Content Addressable Memory Based Binarized Neural Network Accelerator Using Time-Domain Signal Processing (Korea University)
  • Session72: SPECIAL SESSION: CO-DESIGN OF DEEP NEURAL NETS AND NEURAL NET ACCELERATORS
    As deep neural nets are at the core of many applications, a new problem of HW/SW co-design emerges. It is now common that even highly regarded DNN accelerators benchmark themselves on tiny datasets and antiquated DNN architectures. At the same time, for designers of novel DNN models, details on processor power consumption and timing models have never been harder to obtain. As a result, many DNN accelerator architects focus on increasing the speed and energy efficiency of older DNN models running on out-of-date benchmarks, while the novel DNN models of many computer vision researchers, which increase accuracy on their target benchmarks, are only later discovered to be poorly suited to current generations of processor and DNN accelerator architectures. In this session we bring together three research groups which aim to closely coordinate the novel design of DNN models with the design of processors for efficiently executing them.

    • 72-2 Bandwidth-Efficient Deep Learning (Google)
    • 72-3 Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications (Berkeley AI Research)
  • Session75: APPROXIMATE COMPUTING: GOOD ENOUGH IS ENOUGH
    Low energy, small area, and high performance can be achieved by employing approximate computing. This session covers the broad range of approximate computing, with the first three papers ranging from individual high-efficiency approximate adder and multiplier designs to automated optimization strategies for libraries of approximate components that limit the accuracy loss while maximizing efficiency and providing quality guarantees. The next two papers target the emerging area of approximate computing on reconfigurable fabrics, targeting FPGAs and novel coarse-grained reconfigurable architectures with hardened approximate functional units. The final paper focuses on design space exploration to examine the combined impact of multiple approximate units on the output quality.

    • 75-1 SMApproxLib: Library of FPGA-based Approximate Multipliers (Technische Universität)
    • 75-2 Sign-Magnitude SC: Getting 10X Accuracy for Free in Stochastic Computing for Deep Neural Networks (UNIST)
    • 75-3 Area-Optimized Low-Latency Approximate Multipliers for FPGA-based Hardware Accelerators (Technische Universität, TU Wien)
    • 75-4 Approximate On-The-Fly Coarse-Grained Reconfigurable Acceleration for General-Purpose Applications (UFRGS, TU Wien)
    • 75-5 LEMAX: Learning-based Energy Consumption Minimization in Approximate Computing with Quality Guarantee (UCSD)
  • 12-4 Calibrating Process Variation at System Level with In-Situ Low-Precision Transfer Learning for Analog Neural Network Processors (THU)

2. Conference Papers: SC-Based Neural Network

2018: DAC, ASP-DAC, DATE

  • 2018 DAC DPS: dynamic precision scaling for stochastic computing-based deep neural networks (UNIST)
    Shows that DPS (dynamic precision scaling) SC-CNNs are efficient and accurate for ImageNet-targeted CNNs and more efficient than conventional digital designs, with 50%~100% higher operations per area depending on the DNN and application scenario, at less than a 1% drop in recognition accuracy.
  • 2018 DAC Sign-magnitude SC: getting 10X accuracy for free in stochastic computing for deep neural networks (UNIST)
    Proposes an SC encoding with a separate sign bit and shows that it outperforms the conventional bipolar format (a short encoding comparison follows after this list).
    Evaluated on MNIST, CIFAR-10, and an LSTM benchmark.
  • 2018 ASP-DAC Low latency parallel implementation of traditionally-called stochastic circuits using deterministic shuffling networks (UMN)
    The first work to propose parallel deterministic stochastic bit streams, using decoders to generate them; it relies on simple deterministic thermometer encoding, which yields zero random fluctuation and high accuracy while keeping the output bit-stream length unchanged.
  • 2018 DATE An energy-efficient stochastic computational deep belief network (Alberta, Syracuse, NEU)
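
On the sign-magnitude entry above: the conventional bipolar format maps a value x in [-1, 1] to bit probability (x + 1)/2 and multiplies with an XNOR gate, which is noisy for the small values typical of trained weights. This sketch assumes one plausible reading of sign-magnitude SC (a separate sign bit plus a unipolar magnitude stream multiplied by an AND gate); the paper's actual circuit may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256                               # short streams make the accuracy gap visible

def bitstream(p, n, rng):
    return (rng.random(n) < p).astype(np.uint8)

def bipolar_mul(x, y, n, rng):
    # bipolar: a value v in [-1, 1] is encoded with bit probability (v + 1) / 2
    sx = bitstream((x + 1) / 2, n, rng)
    sy = bitstream((y + 1) / 2, n, rng)
    prod = 1 - (sx ^ sy)              # XNOR multiplies two bipolar streams
    return 2 * prod.mean() - 1        # decode back to [-1, 1]

def sign_magnitude_mul(x, y, n, rng):
    # assumed sign-magnitude format: one sign bit + a unipolar magnitude stream
    sx = bitstream(abs(x), n, rng)
    sy = bitstream(abs(y), n, rng)
    sign = np.sign(x) * np.sign(y)    # signs multiply exactly (an XOR in hardware)
    return sign * (sx & sy).mean()    # AND multiplies the unipolar magnitudes

x, y = 0.1, -0.2                      # small values, typical of trained weights
print("exact          ", x * y)
print("bipolar XNOR   ", bipolar_mul(x, y, N, rng))
print("sign-magnitude ", sign_magnitude_mul(x, y, N, rng))
```

When the true product is near zero, the decoded bipolar estimate has close to its worst-case variance (both operand streams sit near probability 0.5), while the sign-magnitude magnitude streams are mostly zeros, so their product estimate is much tighter.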

2017: ASPLOS, ICCD, DAC, DATE, ASP-DAC, ICCAD

  • 2017 ASPLOS SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing (Syracuse, USC, CUNY)
    This paper presents SC-DCNN, the first comprehensive design and optimization framework of SC-based DCNNs, using a bottom-up approach. We first present the designs of function blocks that perform the basic operations in DCNN, including inner product, pooling, and activation function.
  • 2017 ICCD Accurate and Efficient Stochastic Computing Hardware for Convolutional Neural Networks (Syracuse, USC, CUNY)
    Uses unipolar encoding to reduce multiplication error; proposes the SReLU activation function and the Smax pooling function; normalizes and rescales the weights; shares random number generators across the CNN.
  • 2017 ICCD Neural Network Classifiers Using Stochastic Computing with a Hardware-Oriented Approximate Activation Function (UMN, CUNY)
    This paper looks very similar to the 2018 ISQED quantization paper above.
  • 2017 DAC A New Stochastic Computing Multiplier with Application to Deep Convolutional Neural Networks (UNIST)
    Proposes a new SC multiplication algorithm and its vector extension SC-MVM (a matrix-vector multiplier) to address two key problems of SC-based CNNs: the inherent random fluctuation error of SC and the long latency, both of which degrade accuracy and energy efficiency. A single SC multiplication takes only a few cycles.
  • 2017 DATE Energy-efficient hybrid stochastic-binary neural networks for near-sensor computing (Washington, UMich)
    A hybrid stochastic-binary design in which the binary part is retrained to compensate for the accuracy loss of the stochastic computing part.
  • 2017 DATE Structural design optimization for deep convolutional neural networks using stochastic computing (Syracuse, USC, CUNY)
  • 2017 DATE Energy efficient stochastic computing with Sobol sequences (Alberta)
  • 2017 ASP-DAC Scalable stochastic-computing accelerator for convolutional neural networks (UNIST, Seoul National University)
    A stochastic computing neural network designed for CNNs, using a hybrid stochastic-binary design with clear advantages over both pure binary and conventional stochastic computing. Bit streams are processed in parallel with an accumulative parallel counter, which merges the counter and the accumulator to reduce the parallel-counter overhead (see the dataflow sketch after this list).
  • 2017 ASP-DAC Towards acceleration of deep convolutional neural networks using stochastic computing (USC, Syracuse, CUNY)
    A fully parallel and scalable hardware DCNN design using stochastic computing (SC); mainly describes the approximate parallel counter (APC) based neuron.
  • 2017 ICCAD Deep reinforcement learning: Framework, applications, and embedded implementations: Invited paper (Syracuse, UCR)
  • 2017 IJCNN Hardware-driven nonlinear activation for stochastic computing based deep convolutional neural networks (USC, Syracuse)
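
The accumulative and approximate parallel counter entries above share one dataflow: run one AND gate per weight-activation pair, and on every cycle count how many of the parallel product bits are 1, adding that count into a binary accumulator instead of keeping a separate counter per stream. A generic software sketch of that dataflow (not any specific paper's counter circuit; the sizes M and N are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1024                      # bit-stream length (made up)
M = 64                        # parallel products, e.g. a neuron's fan-in (made up)

x = rng.random(M)             # activations in [0, 1]
w = rng.random(M)             # weights in [0, 1]; unipolar only, for simplicity

# one unipolar stream per operand: rows = inputs, columns = clock cycles
sx = (rng.random((M, N)) < x[:, None]).astype(np.uint8)
sw = (rng.random((M, N)) < w[:, None]).astype(np.uint8)

products = sx & sw            # M parallel AND gates, one product stream each

# parallel counter + accumulator: each cycle, count the 1s among the M
# product bits and add that count to a binary accumulator
acc = 0
for cycle in range(N):
    acc += int(products[:, cycle].sum())

print("SC inner product   ", acc / N)
print("exact inner product", float(x @ w))
```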

2016: DAC

  • 2016 DAC Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks ( Seoul National University, UNIST)
    One of the classic stochastic computing neural network papers! It removes near-zero weights, applies weight scaling, and integrates the activation function into the accumulator; it also exploits the progressive precision property of stochastic computing to allow early decision termination (a sketch of early termination follows after this list).
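
Progressive precision means that the estimate formed from the first n bits of a stream is already an approximation of the final value, so a classifier can stop streaming as soon as the leading class is far enough ahead. A toy sketch of early decision termination built on that idea; the class scores, margin, and minimum bit count below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2048                                        # full bit-stream length
scores = np.array([0.72, 0.55, 0.30])           # made-up per-class scores in [0, 1]
streams = (rng.random((scores.size, N)) < scores[:, None]).astype(np.uint8)

counts = np.zeros(scores.size, dtype=int)
margin = 0.1                                    # required lead of the top class

for n in range(1, N + 1):
    counts += streams[:, n - 1]                 # one more bit of each class stream
    est = counts / n                            # progressive-precision estimates
    top2 = np.sort(est)[-2:]
    if n >= 64 and top2[1] - top2[0] > margin:  # confident enough: terminate early
        break

print(f"decided class {int(np.argmax(est))} after {n} of {N} bits")
```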

3. Nonlinear Activation Function in Stochastic Computing

  • 2017 IJCNN Hardware-driven nonlinear activation for stochastic computing based deep convolutional neural networks (USC, Syracuse)
    In this paper, we design and optimize SC based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units.
  • 2017 GLSVLSI Softmax Regression Design for Stochastic Computing Based Deep Convolutional Neural Networks (USC, Syracuse, CUNY)

4. Stochastic Computing Theory Papers

  • 2015 DAC Introduction to stochastic computing and its challenges (UMich)
  • 2017 DATE Framework for quantifying and managing accuracy in stochastic circuit design (Passau, UMich)
    For combinational SC circuits, accuracy is independent of the size or complexity of the circuit.

4.1 Encoding Schemes

4.1.1 Time Encoding with Analog Pulses

  • 2017 ASP-DAC High-speed stochastic circuits using synchronous analog pulses (UMN)

4.1.2 Deterministic Encoding

  • 2018 TCAD An Efficient and Accurate Stochastic Number Generator Using Even-distribution Coding (UNIST, Samsung, Seoul National University)
  • 2017 DATE Energy efficient stochastic computing with Sobol sequences (Alberta)
  • 2017 ICCD Power and Area Efficient Sorting Networks Using Unary Processing (UMN)
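
The 2018 ASP-DAC entry in Section 2 describes deterministic thermometer (unary) encoding with zero random fluctuation. The idea shared by the deterministic methods listed here is that if every bit of one operand's stream is paired with every bit of the other's exactly once (for example by holding one operand while the other cycles), an AND gate computes the product exactly rather than statistically. A minimal sketch of that pairing; the published schemes (rotation, clock division, relatively prime stream lengths) differ in how they realize it in hardware.

```python
import numpy as np

def unary(value, n):
    """Thermometer (unary) encoding: the first round(value * n) bits are 1."""
    k = int(round(value * n))
    return np.array([1] * k + [0] * (n - k), dtype=np.uint8)

n = 16
a, b = unary(0.25, n), unary(0.75, n)

# Pair every bit of `a` with every bit of `b` exactly once, e.g. hold one
# operand constant while the other cycles. After n * n pairings the AND-gate
# output encodes the exact product, with zero random fluctuation.
total = 0
for i in range(n):
    for j in range(n):
        total += int(a[i] & b[j])

print(total / (n * n))      # 0.1875, exactly 0.25 * 0.75
```

The price is latency: an exact result needs n * n bit pairings, one motivation for the parallel implementations and low-discrepancy (Sobol) sequences covered elsewhere in this list.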

4.2 Random Number Generators

  • 2016 ASP-DAC An energy-efficient random number generator for stochastic (Seoul National University, UNIST)

4.3 Other Topics

  • 2018 DATE Correlation manipulating circuits for stochastic computing (Washington)
    In SC, the function a gate computes depends on how its input streams are correlated; see the small demo after this list.
  • 2016 ASP-DAC Polysynchronous stochastic circuits (UMN)
  • 2017 ICCAD Statistically certified approximate logic synthesis (Cornell)
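
On why correlation manipulation matters in SC: the function a gate computes depends on how its input streams are correlated. With independent streams an AND gate multiplies; with maximally positively correlated streams (generated from the same random numbers) it computes the minimum instead. A quick numerical check of this standard fact:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 4096
p1, p2 = 0.6, 0.3

r = rng.random(N)                        # one random source shared by both streams
s1_corr = (r < p1).astype(np.uint8)
s2_corr = (r < p2).astype(np.uint8)      # maximally correlated with s1_corr

s1_ind = (rng.random(N) < p1).astype(np.uint8)
s2_ind = (rng.random(N) < p2).astype(np.uint8)   # independent streams

print("independent AND:", (s1_ind & s2_ind).mean(), "~ p1 * p2     =", p1 * p2)
print("correlated  AND:", (s1_corr & s2_corr).mean(), "~ min(p1, p2) =", min(p1, p2))
```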

5. Stochastic Computing with Spintronics and Other Emerging Devices

  • 2017 ASP-DAC Spintronics based stochastic computing for efficient Bayesian inference system (Beihang University, Duke)
  • 2017 ICCAD Design of accurate stochastic number generators with noisy emerging devices for stochastic computing (SJTU, UMich, UCF)

Papers Still to Read

  • 2018 Jie Han An Energy-Efficient Online-Learning Stochastic Computational Deep Belief Network
