Understanding the CPU Cache

Which of the following two loops is faster?

int array[1024][1024];

// Loop 1
for(int i = 0; i < 1024; i ++)
    for(int j = 0; j < 1024; j ++)
        array[i][j] ++;

// Loop 2
for(int i = 0; i < 1024; i ++)
    for(int j = 0; j < 1024; j ++)
        array[j][i] ++;

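Loop 1 has a much higher CPU cache hit rate, so it runs about eight times faster than Loop 2! C stores a two-dimensional array row by row, so Loop 1 walks through consecutive addresses, while Loop 2 jumps a full row (1024 ints) ahead on every step.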

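Gallery of Processor Cache Effects uses seven source-code examples to introduce how caches work, vividly and accessibly. However, possibly because of operating system differences, whether compiler optimization is enabled, and the improvement of cache performance in recent years, the third example behaves quite differently on a Mac from what the original article reports. Berkeley's open course CS162 is also richly illustrated and highly recommended. This article plays the role of a courier, gathering the best of both into an introduction to the CPU cache.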

What is Cache

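Wikipedia defines it as follows: in a computer system, the CPU cache is a component that reduces the average time the processor needs to access memory. In the pyramid-shaped memory hierarchy it sits on the second level from the top, just below the CPU registers. Its capacity is far smaller than main memory, but its speed can approach the processor's clock frequency.

Figure source: CS162. Note: early L2 caches sat on the motherboard; today both L2 and L3 caches are packaged inside the CPU chip.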

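When the CPU accesses memory, it first checks whether the cache already holds the data. If it does, the data is returned without touching memory; if not, the data is loaded from memory into the cache and then returned to the processor. From the processor's point of view the cache is a transparent component whose purpose is to speed up memory access. Logically, then, you do not need to think about it when programming, but in terms of performance, understanding how it works helps you write faster programs. The cache is effective because programs access memory with a statistical kind of locality:

Spatial Locality: data adjacent to a recently accessed item is likely to be accessed in the near future.
Temporal Locality: a recently accessed item is itself likely to be accessed again in the near future.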

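Broadly speaking, caches fall into two categories:

Data (instruction) cache: caches the contents of memory. By level it is further divided into L1, L2 and L3. On a miss, the CPU has to access memory to fetch the data (instruction).
TLB (Translation Lookaside Buffer): an addressing cache that stores the mapping between a process's virtual addresses and physical addresses. On a miss, the MMU has to access memory several times to walk the multi-level page table before it can compute the physical address.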

Taking macOS as an example, cache information can be queried with sysctl.

$ sysctl -a

hw.cachelinesize: 64
hw.l1icachesize: 32768
hw.l1dcachesize: 32768
hw.l2cachesize: 262144
hw.l3cachesize: 3145728
machdep.cpu.cache.L2_associativity: 8
machdep.cpu.core_count: 2
machdep.cpu.thread_count: 4
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.small_level1: 64
machdep.cpu.tlb.shared: 1024


Why Cache

Early CPUs had no cache. Take the Intel x86 line, which dates back to 1978: it only began to introduce caches in 1992:

1992: 386 platform introduces the L1 cache
1995: Pentium Pro introduces the L2 cache
2008: Core i3 introduces the L3 cache

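The huge gap between the growth rates of CPU and RAM clock speeds is the direct reason caches were introduced. The figure below shows how the two developed from 1980 to 2010: CPU performance grew by about 60% per year, while RAM managed only about 9%. Decades later, this difference has left CPUs hundreds of times faster than RAM.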

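Figure source: Computer Architecture: A Quantitative Approach, by Hennessy and Patterson

Some people ask why we don't simply make RAM faster. Because it costs too much! Cost is also the reason caches are split into several levels: the faster the memory, the more expensive it is, so its capacity is small; the slower the memory, the cheaper it is, so its capacity can be large. The hierarchy is a compromise between cost and performance. The following two lines from CS162 sum up the role of the cache well.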

Present as much memory as in the cheapest technology
Provide access at speed offered by the fastest technology

Original article

http://wsfdl.com/linux/2016/06/11/%E7%90%86%E8%A7%A3CPU%E7%9A%84cache.html
