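I read about how cache lines affect memory access a long time ago, but had never written and run an example myself. On a whim I finally wrote one, slightly adapting the first example from here. Pay attention to the JVM flags: the young and old generations together get about 2.4 GB, of which the young generation gets 2 GB and eden 1.6 GB. Judging by actual memory usage, the array occupies close to 1.1 GB of eden, and the remaining regions are essentially empty. Also, the demo was run on a Mac.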
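The eden figures above can be checked from inside the JVM. The following is a minimal sketch, separate from the test program, that uses the standard `java.lang.management` memory pool beans. The class name `EdenUsage` and the helper `edenUsedBytes` are made up for illustration, and matching pools by the substring "Eden" is an assumption: pool names depend on the collector in use, and some collectors may place a very large array outside eden.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class EdenUsage {
    /** Sums the bytes currently used by every memory pool whose name contains "Eden". */
    static long edenUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // Assumption: eden pools carry "Eden" in their name (collector dependent).
            if (usage != null && pool.getName().contains("Eden")) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        // Print every pool so the eden / survivor / old split can be read off directly.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            System.out.println(pool.getName()
                    + ": used=" + usage.getUsed() / (1024 * 1024) + " MB"
                    + ", max=" + usage.getMax() + " bytes"); // max is -1 when undefined
        }
        System.out.println("eden used: " + edenUsedBytes() / (1024 * 1024) + " MB");
    }
}
```

Calling `edenUsedBytes()` right after allocating the array, under the same flags as the test, should give a number comparable to the roughly 1.1 GB mentioned above, provided the array really is allocated in eden.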
/**
 * -Xms2500m -Xmx2500m -Xcomp -Xmn2g -XX:NewRatio=10
 * @author tianmai.fh
 * @date 2014-03-12 16:55
 */
public class CacheLineTest {
    public static final int COUNT = 3;

    public static void main(String[] args) {
        // 1 GB of ints; the JVM flags above are chosen so the whole array
        // fits in eden, keeping GC from affecting the results.
        int[] arrs = new int[64 * 1024 * 1024 * 4];
        equivalentWidth(arrs);
        fullLoop(arrs);
    }

    /**
     * Walks the whole array with each stride and compares the time spent
     * when accesses stay within a cache line versus when they cross lines.
     * @param arrs the array to write to
     */
    public static void fullLoop(int[] arrs) {
        int forLen = 256;
        int forAssembly = 0;
        while (forAssembly++ < COUNT) { // repeat three times so that a first, not yet optimized pass does not skew the comparison
            for (int i = 1; i <= forLen; i *= 2) {
                long start = System.currentTimeMillis();
                for (int j = 0, size = arrs.length; j < size; j += i) {
                    arrs[j] = j * 3;
                }
                long end = System.currentTimeMillis();
                System.out.println("Full, factor: " + i + " spent " + (end - start) + " ms");
            }
            System.out.println();
        }
    }

    /**
     * Performs the same number of writes for every stride and compares the
     * time spent when accesses stay within a cache line versus when they cross lines.
     * @param arrs the array to write to
     */
    public static void equivalentWidth(int[] arrs) {
        int forLen = 256;
        int breakWidth = arrs.length / 256;
        int forAssembly = 0;
        while (forAssembly++ < COUNT) { // repeat three times so that a first, not yet optimized pass does not skew the comparison
            for (int i = 1; i <= forLen; i *= 2) {
                long start = System.currentTimeMillis();
                int cnt = 0;
                for (int j = 0, size = arrs.length; j < size; j += i) {
                    arrs[j] = j;
                    if (++cnt > breakWidth) { // every stride accesses this many elements, then stops
                        break;
                    }
                }
                long end = System.currentTimeMillis();
                System.out.println("Equivalent Witdh, factor: " + i + " spent " + (end - start) + " ms");
            }
            System.out.println();
        }
    }
}
Results when every stride accesses the same number of elements:
Equivalent Witdh, factor: 1 spent 3 ms
Equivalent Witdh, factor: 2 spent 7 ms
Equivalent Witdh, factor: 4 spent 2 ms
Equivalent Witdh, factor: 8 spent 3 ms
Equivalent Witdh, factor: 16 spent 7 ms
Equivalent Witdh, factor: 32 spent 10 ms
Equivalent Witdh, factor: 64 spent 10 ms
Equivalent Witdh, factor: 128 spent 8 ms
Equivalent Witdh, factor: 256 spent 9 ms

Equivalent Witdh, factor: 1 spent 1 ms
Equivalent Witdh, factor: 2 spent 1 ms
Equivalent Witdh, factor: 4 spent 2 ms
Equivalent Witdh, factor: 8 spent 4 ms
Equivalent Witdh, factor: 16 spent 7 ms
Equivalent Witdh, factor: 32 spent 10 ms
Equivalent Witdh, factor: 64 spent 10 ms
Equivalent Witdh, factor: 128 spent 9 ms
Equivalent Witdh, factor: 256 spent 8 ms

Equivalent Witdh, factor: 1 spent 2 ms
Equivalent Witdh, factor: 2 spent 1 ms
Equivalent Witdh, factor: 4 spent 1 ms
Equivalent Witdh, factor: 8 spent 4 ms
Equivalent Witdh, factor: 16 spent 7 ms
Equivalent Witdh, factor: 32 spent 9 ms
Equivalent Witdh, factor: 64 spent 10 ms
Equivalent Witdh, factor: 128 spent 9 ms
Equivalent Witdh, factor: 256 spent 8 ms
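Looking at the second and third runs, strides 1 through 8 all finish in the same range (1 to 4 ms), while strides of 16 and above sit in a clearly higher range (7 to 10 ms). A stride of 16 ints is 64 bytes, so assuming the usual 64-byte cache line, from that point on every single access lands on a different cache line.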
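The jump at stride 16 can be illustrated with a little arithmetic. The sketch below is only a model, not a measurement: it assumes 64-byte cache lines, 4-byte ints, and an array that starts on a line boundary. The class `CacheLinesTouched` and the method `linesTouched` are hypothetical names introduced here.

```java
class CacheLinesTouched {
    // Assumptions: 64-byte cache lines, 4-byte ints, array aligned to a line boundary.
    static final int LINE_BYTES = 64;
    static final int INT_BYTES = 4;

    /** Number of distinct cache lines hit by `accesses` writes spaced `stride` ints apart. */
    static long linesTouched(long accesses, int stride) {
        if (accesses <= 0) {
            return 0;
        }
        long lastByte = (accesses - 1) * stride * INT_BYTES; // byte offset of the last write
        long spanLines = lastByte / LINE_BYTES + 1;           // lines covered from first to last write
        return Math.min(accesses, spanLines);                 // never more lines than writes
    }

    public static void main(String[] args) {
        long accesses = 1L << 20; // about the number of writes per stride in the equal-count test
        for (int stride = 1; stride <= 256; stride *= 2) {
            System.out.println("stride " + stride + " -> "
                    + linesTouched(accesses, stride) + " cache lines");
        }
    }
}
```

Under these assumptions the count doubles with each stride up to 16 and then stays at one line per write, which is consistent with the timings levelling off at 7 to 10 ms from stride 16 onward.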
Results when each stride walks the whole array:
Full, factor: 1 spent 351 ms
Full, factor: 2 spent 178 ms
Full, factor: 4 spent 113 ms
Full, factor: 8 spent 111 ms
Full, factor: 16 spent 113 ms
Full, factor: 32 spent 77 ms
Full, factor: 64 spent 40 ms
Full, factor: 128 spent 17 ms
Full, factor: 256 spent 9 ms

Full, factor: 1 spent 351 ms
Full, factor: 2 spent 180 ms
Full, factor: 4 spent 113 ms
Full, factor: 8 spent 114 ms
Full, factor: 16 spent 111 ms
Full, factor: 32 spent 74 ms
Full, factor: 64 spent 40 ms
Full, factor: 128 spent 16 ms
Full, factor: 256 spent 9 ms

Full, factor: 1 spent 355 ms
Full, factor: 2 spent 178 ms
Full, factor: 4 spent 112 ms
Full, factor: 8 spent 111 ms
Full, factor: 16 spent 113 ms
Full, factor: 32 spent 76 ms
Full, factor: 64 spent 40 ms
Full, factor: 128 spent 17 ms
Full, factor: 256 spent 8 ms
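Here stride 1 is the slowest at about 350 ms and stride 2 takes about half of that, while strides 4 through 16 level off at roughly 112 ms. Beyond 16 the time drops by roughly 50% with each doubling of the stride, because the number of cache lines that have to be loaded shrinks along with it.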
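A simple model of the full walk makes the shape of these numbers easier to see. The sketch below counts, for each stride, how many writes the loop performs and how many cache lines those writes touch. It assumes 64-byte cache lines and 4-byte ints; `FullScanModel`, `writes` and `linesTouched` are names invented for this illustration.

```java
class FullScanModel {
    // Assumptions: 64-byte cache lines, 4-byte ints, array aligned to a line boundary.
    static final int LINE_BYTES = 64;
    static final int INT_BYTES = 4;

    /** Writes performed by `for (j = 0; j < length; j += stride)`. */
    static long writes(long length, int stride) {
        return (length + stride - 1) / stride;
    }

    /** Distinct cache lines touched by that loop. */
    static long linesTouched(long length, int stride) {
        long w = writes(length, stride);
        if (w == 0) {
            return 0;
        }
        long lastByte = (w - 1) * stride * INT_BYTES; // byte offset of the last write
        return Math.min(w, lastByte / LINE_BYTES + 1);
    }

    public static void main(String[] args) {
        long length = 64L * 1024 * 1024 * 4; // same element count as the test array
        for (int stride = 1; stride <= 256; stride *= 2) {
            System.out.println("stride " + stride
                    + ": writes=" + writes(length, stride)
                    + ", cache lines=" + linesTouched(length, stride));
        }
    }
}
```

In this model the number of cache lines stays constant up to stride 16 and halves with every doubling after that, which matches the plateau followed by halving in the timings. One plausible reading of the slower strides 1 and 2 is that the sheer number of writes dominates there, but the model alone does not prove that.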