The Effect of Cache Lines on Memory Access

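        I read about the effect of cache lines on memory access a long time ago, but I had never written and run an example myself. On a whim I wrote one, lightly adapting the first example here. Pay attention to the JVM parameters: the young and old generations together get roughly 2.4 GB, of which the young generation gets 2 GB and the eden space 1.6 GB. Judging by actual memory usage, the array occupies nearly 1.1 GB of eden, and the remaining regions are essentially empty. Also, the demo was run on a Mac.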
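
The claim that the array sits in eden can be checked from inside the JVM. The following is a minimal sketch, not part of the original demo: the class name `EdenUsageCheck` and the helper names are hypothetical, and matching pools by the substring "Eden" is an assumption that holds for the usual HotSpot collectors (pool names such as "PS Eden Space" or "G1 Eden Space") but not necessarily for every JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class EdenUsageCheck {

    /** Heuristic (assumption): HotSpot names its eden pool with the word "Eden". */
    static boolean isEdenPool(String poolName) {
        return poolName != null && poolName.contains("Eden");
    }

    /** Sums the bytes currently used in every memory pool that looks like eden. */
    static long edenUsedBytes() {
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            if (usage != null && isEdenPool(pool.getName())) {
                used += usage.getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        // Small default so the sketch runs with any heap size; pass 268435456 to mirror the demo.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 1 << 20;
        long before = edenUsedBytes();
        int[] arrs = new int[length];
        long after = edenUsedBytes();
        System.out.println("array bytes (approx): " + (long) arrs.length * Integer.BYTES);
        System.out.println("eden used before: " + before + ", after: " + after);
    }
}
```

If eden usage grows by about the size of the array, the allocation landed in eden. If it does not, the collector may have placed the large array elsewhere, and GC activity could then influence the timings.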

/**
 * -Xms2500m -Xmx2500m -Xcomp -Xmn2g -XX:NewRatio=10
 * @author tianmai.fh
 * @date 2014-03-12 16:55
 */
public class CacheLineTest {

    public static final int COUNT = 3;
    public static void main(String[] args) {
        int[] arrs = new int[64 * 1024 * 1024 * 4];  // 1 GB of ints; the JVM flags keep the whole array in eden so that GC does not affect the timings
        equivalentWidth(arrs);
        fullLoop(arrs);
    }

    /**
     * Full loop: compare the time spent when consecutive writes cross a cache line and when they do not.
     * @param arrs the array to write to
     */
    public static void fullLoop(int[] arrs){
        int forLen = 256;
        int forAssembly = 0;
        while (forAssembly++ < COUNT) { // run three times so the first pass, before the code is optimized, does not skew the results
            for (int i = 1; i <= forLen; i *= 2) {
                long start = System.currentTimeMillis();
                for (int j = 0, size = arrs.length; j < size; j += i) {
                    arrs[j] = j * 3;
                }
                long end = System.currentTimeMillis();
                System.out.println("Full, factor: " + i + " spent " + (end - start) + " ms");
            }
            System.out.println();
        }
    }
    /**
     * Same number of writes in every pass: compare the time difference between crossing and not crossing cache lines.
     * @param arrs the array to write to
     */
    public static void equivalentWidth(int[] arrs){
        int forLen = 256;
        int breakWidth = arrs.length / 256;
        int forAssembly = 0;
        while (forAssembly++ < COUNT) { // run three times so the first pass, before the code is optimized, does not skew the results
            for (int i = 1; i <= forLen; i *= 2) {
                long start = System.currentTimeMillis();
                int cnt = 0;
                for (int j = 0, size = arrs.length; j < size; j += i) {
                    arrs[j] = j;
                    if (++cnt > breakWidth) {     // each pass accesses only this many elements
                        break;
                    }
                }
                long end = System.currentTimeMillis();
                System.out.println("Equivalent Witdh, factor: " + i + " spent " + (end - start) + " ms");
            }
            System.out.println();
        }
    }
}

 Results when every pass performs the same number of accesses:

Equivalent Witdh, factor: 1 spent 3 ms
Equivalent Witdh, factor: 2 spent 7 ms
Equivalent Witdh, factor: 4 spent 2 ms
Equivalent Witdh, factor: 8 spent 3 ms
Equivalent Witdh, factor: 16 spent 7 ms
Equivalent Witdh, factor: 32 spent 10 ms
Equivalent Witdh, factor: 64 spent 10 ms
Equivalent Witdh, factor: 128 spent 8 ms
Equivalent Witdh, factor: 256 spent 9 ms

Equivalent Witdh, factor: 1 spent 1 ms
Equivalent Witdh, factor: 2 spent 1 ms
Equivalent Witdh, factor: 4 spent 2 ms
Equivalent Witdh, factor: 8 spent 4 ms
Equivalent Witdh, factor: 16 spent 7 ms
Equivalent Witdh, factor: 32 spent 10 ms
Equivalent Witdh, factor: 64 spent 10 ms
Equivalent Witdh, factor: 128 spent 9 ms
Equivalent Witdh, factor: 256 spent 8 ms

Equivalent Witdh, factor: 1 spent 2 ms
Equivalent Witdh, factor: 2 spent 1 ms
Equivalent Witdh, factor: 4 spent 1 ms
Equivalent Witdh, factor: 8 spent 4 ms
Equivalent Witdh, factor: 16 spent 7 ms
Equivalent Witdh, factor: 32 spent 9 ms
Equivalent Witdh, factor: 64 spent 10 ms
Equivalent Witdh, factor: 128 spent 9 ms
Equivalent Witdh, factor: 256 spent 8 ms

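 Looking at the second and third runs, strides 1 through 8 cost roughly the same order of time, while strides of 16 and above sit at a higher level. With 4-byte ints, a 64-byte cache line (the usual size on x86, assumed here) holds 16 elements, so from stride 16 onward every write lands on a different cache line.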
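
The jump near stride 16 can be related to how many distinct cache lines each pass touches. The following is a minimal sketch, not part of the original demo: the class `CacheLineCount` and the method `linesTouched` are hypothetical, and it assumes 64-byte cache lines and that element 0 of the array starts on a line boundary (the real array has an object header, so the alignment is only approximate).

```java
public class CacheLineCount {

    static final int LINE_BYTES = 64;                             // assumption: 64-byte cache lines
    static final int INTS_PER_LINE = LINE_BYTES / Integer.BYTES;  // 16 ints per line

    /**
     * Distinct cache lines touched by `accesses` writes at indices 0, stride, 2*stride, ...
     * assuming index 0 sits at the start of a cache line.
     */
    static long linesTouched(long accesses, int stride) {
        if (accesses <= 0) {
            return 0;
        }
        if (stride >= INTS_PER_LINE) {
            return accesses;                      // every write falls on its own line
        }
        long lastIndex = (accesses - 1) * stride; // no line is skipped when stride < 16
        return lastIndex / INTS_PER_LINE + 1;
    }

    public static void main(String[] args) {
        // equivalentWidth breaks after breakWidth + 1 writes, with breakWidth = arrs.length / 256
        long accesses = (64L * 1024 * 1024 * 4) / 256 + 1;
        for (int stride = 1; stride <= 256; stride *= 2) {
            System.out.println("stride " + stride + " -> "
                    + linesTouched(accesses, stride) + " cache lines");
        }
    }
}
```

Under these assumptions the number of lines touched doubles with each stride doubling up to 16 and then stays constant, which matches the shape of the measured times.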

Results when the whole array is traversed:

Full, factor: 1 spent 351 ms
Full, factor: 2 spent 178 ms
Full, factor: 4 spent 113 ms
Full, factor: 8 spent 111 ms
Full, factor: 16 spent 113 ms
Full, factor: 32 spent 77 ms
Full, factor: 64 spent 40 ms
Full, factor: 128 spent 17 ms
Full, factor: 256 spent 9 ms

Full, factor: 1 spent 351 ms
Full, factor: 2 spent 180 ms
Full, factor: 4 spent 113 ms
Full, factor: 8 spent 114 ms
Full, factor: 16 spent 111 ms
Full, factor: 32 spent 74 ms
Full, factor: 64 spent 40 ms
Full, factor: 128 spent 16 ms
Full, factor: 256 spent 9 ms

Full, factor: 1 spent 355 ms
Full, factor: 2 spent 178 ms
Full, factor: 4 spent 112 ms
Full, factor: 8 spent 111 ms
Full, factor: 16 spent 113 ms
Full, factor: 32 spent 76 ms
Full, factor: 64 spent 40 ms
Full, factor: 128 spent 17 ms
Full, factor: 256 spent 8 ms

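 Here stride 1 is the slowest (about 350 ms), and stride 2 takes roughly half of that. Strides 4 through 16 all take about the same time (around 112 ms). Beyond 16, the time drops by roughly half with each doubling of the stride, because a longer stride also means proportionally fewer cache lines have to be loaded.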
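
One way to read the full-traversal numbers is to divide each measured time by the number of cache lines that pass touches. The following is a minimal sketch, not part of the original demo: the class `FullLoopPerLine` and its method are hypothetical, the millisecond values are copied from the third "Full" run above, and 64-byte cache lines with a line-aligned array start are assumed.

```java
public class FullLoopPerLine {

    static final int INTS_PER_LINE = 64 / Integer.BYTES; // assumption: 64-byte cache lines

    /** Distinct cache lines written when stepping through `length` ints with the given stride. */
    static long linesTouched(long length, int stride) {
        if (length <= 0) {
            return 0;
        }
        long accesses = (length + stride - 1) / stride;   // indices 0, stride, ... below length
        if (stride >= INTS_PER_LINE) {
            return accesses;                              // one line per write
        }
        return (accesses - 1) * stride / INTS_PER_LINE + 1;
    }

    public static void main(String[] args) {
        long length = 64L * 1024 * 1024 * 4;              // same element count as the demo array
        int[] strides = {1, 2, 4, 8, 16, 32, 64, 128, 256};
        long[] millis = {355, 178, 112, 111, 113, 76, 40, 17, 8}; // third "Full" run
        for (int k = 0; k < strides.length; k++) {
            long lines = linesTouched(length, strides[k]);
            double nsPerLine = millis[k] * 1_000_000.0 / lines;
            System.out.printf("stride %3d: %9d lines, %.1f ns per line%n",
                    strides[k], lines, nsPerLine);
        }
    }
}
```

Under these assumptions, strides 1 through 16 all touch the same number of lines, and from 32 onward the count halves with each doubling. That is consistent with the plateau followed by the halving in the measurements. The extra time at strides 1 and 2 would then come from the larger number of writes per line rather than from additional line loads, though this sketch does not prove that.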
