CAS (Compare And Swap) Concurrency Test in Java

Purpose of the test

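As far as I understand, high-concurrency code often uses lock-free programming (lock-free or lockless programming) to avoid the overhead of locks (threads going to sleep, thread context switches), and lock-free programming is built on the CAS operation. So how well does CAS perform under high concurrency, and how can we avoid the problems that contention brings?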

The Java test code:

package com.lqp.test;

import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentCASTest {

    public static void main(String[] args) throws Exception {
        final AtomicLong value = new AtomicLong();

        final int count = 100 * 10000; // 1,000,000 CAS attempts per thread
        int threadCount = 1;           // change to 2, 4, 8 to vary the contention

        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            new Thread(new Runnable() {

                @Override
                public void run() {
                    long start = System.nanoTime();

                    int failCount = 0;
                    for (int i = 0; i < count; i++) {
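                        long initVal = value.get(); // read the current value

                        // succeeds only if no other thread changed the value since the read above
                        boolean suc = value.compareAndSet(initVal, initVal + 1);

                        if (!suc) {
                            failCount++; // failures are only counted here, not retried
                        }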
                    }

                    long diffMilis = (System.nanoTime() - start) / 1000 / 1000;

                    println("Time = " + diffMilis + " milis" + ", ThreadId = " + id + ", fail count = " + failCount + ", failPercent = " + (failCount * 100) / (double)count);
                }
            }).start();
        }
    }

    public static boolean println(String msg) {
        System.out.println(msg);

        return true;
    }
}

Notes:

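  • Loop 1,000,000 times over one AtomicLong: read the value, add 1, then write it back with CAS; count the failed attempts, and at the end record the time taken for the 1,000,000 operations
  • threadCount is the number of threads; vary it to observe how the failures change
  • Although the test is written in Java, what it exercises underneath is still the CPU instruction 'lock cmpxchg' (on x86); I didn't test in C simply because it is less convenient to write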
  • CPU: Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, 4 cores / 8 hardware threads (Hyper-Threading)
  • Java version:
    • java version "1.8.0_91"
    • Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
    • Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
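The test above starts the threads one by one inside the loop, so the first thread may already be partway through its 1,000,000 iterations before the last one starts, and main() does not wait for the threads to finish. A minimal variant, assuming you want all threads released at the same moment and a sanity check at the end, could use a CountDownLatch as a start gate and join() the threads. The class name CasContentionBench and the method run(threadCount, attempts) are hypothetical; this is a sketch, not the code that produced the numbers below.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class CasContentionBench {

    /** Runs threadCount threads, each making `attempts` CAS increments; returns the total number of failures. */
    public static long run(int threadCount, int attempts) throws InterruptedException {
        final AtomicLong value = new AtomicLong();
        final long[] failures = new long[threadCount];   // one slot per thread, read only after join()
        final CountDownLatch startGate = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                try {
                    startGate.await();                   // wait until every thread has been created
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long fail = 0;
                for (int i = 0; i < attempts; i++) {
                    long cur = value.get();
                    if (!value.compareAndSet(cur, cur + 1)) {
                        fail++;
                    }
                }
                failures[id] = fail;
            });
            threads[t].start();
        }

        long start = System.nanoTime();
        startGate.countDown();                           // release all threads at once
        for (Thread th : threads) {
            th.join();
        }
        long millis = (System.nanoTime() - start) / 1_000_000;

        long totalFail = 0;
        for (long f : failures) {
            totalFail += f;
        }
        long totalAttempts = (long) threadCount * attempts;
        // every successful CAS added exactly 1, so the final value must equal attempts - failures
        if (value.get() != totalAttempts - totalFail) {
            throw new IllegalStateException("unexpected final value " + value.get());
        }
        System.out.println("threads = " + threadCount + ", time = " + millis + " ms, failPercent = "
                + 100.0 * totalFail / totalAttempts);
        return totalFail;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int n : new int[] {1, 2, 4, 8}) {
            run(n, 1_000_000);
        }
    }
}
```

With a single thread the failure count is always 0, since nothing else can change the value between get() and compareAndSet(); the timings will of course vary by machine.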

Results:

threadCount = 1, i.e. a single thread with no concurrent modification, 3 runs:

  1. Time = 16 milis, ThreadId = 0, fail count = 0, failPercent = 0.0
  2. Time = 16 milis, ThreadId = 0, fail count = 0, failPercent = 0.0
  3. Time = 16 milis, ThreadId = 0, fail count = 0, failPercent = 0.0

threadCount = 2, two threads modifying concurrently, 3 runs:

  1. Time = 62 milis, ThreadId = 1, fail count = 347064, failPercent = 34.7064
    Time = 64 milis, ThreadId = 0, fail count = 463725, failPercent = 46.3725
  2. Time = 65 milis, ThreadId = 1, fail count = 478589, failPercent = 47.8589
    Time = 66 milis, ThreadId = 0, fail count = 477393, failPercent = 47.7393
  3. Time = 66 milis, ThreadId = 0, fail count = 489875, failPercent = 48.9875
    Time = 65 milis, ThreadId = 1, fail count = 490575, failPercent = 49.0575

threadCount = 4, four threads modifying concurrently, 2 runs:

  1. Time = 150 milis, ThreadId = 0, fail count = 663234, failPercent = 66.3234
    Time = 152 milis, ThreadId = 1, fail count = 747265, failPercent = 74.7265
    Time = 154 milis, ThreadId = 3, fail count = 666898, failPercent = 66.6898
    Time = 154 milis, ThreadId = 2, fail count = 675097, failPercent = 67.5097
  2. Time = 134 milis, ThreadId = 1, fail count = 522787, failPercent = 52.2787
    Time = 148 milis, ThreadId = 3, fail count = 746457, failPercent = 74.6457
    Time = 150 milis, ThreadId = 2, fail count = 744108, failPercent = 74.4108
    Time = 150 milis, ThreadId = 0, fail count = 710502, failPercent = 71.0502

threadCount = 8, eight threads modifying concurrently, 1 run:

Time = 258 milis, ThreadId = 2, fail count = 753286, failPercent = 75.3286
Time = 269 milis, ThreadId = 6, fail count = 769007, failPercent = 76.9007
Time = 277 milis, ThreadId = 5, fail count = 750360, failPercent = 75.036
Time = 281 milis, ThreadId = 3, fail count = 799387, failPercent = 79.9387
Time = 283 milis, ThreadId = 0, fail count = 790358, failPercent = 79.0358
Time = 285 milis, ThreadId = 1, fail count = 765177, failPercent = 76.5177
Time = 279 milis, ThreadId = 7, fail count = 763133, failPercent = 76.3133
Time = 285 milis, ThreadId = 4, fail count = 791819, failPercent = 79.1819

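As you can see, with just two competing threads the failure rate already approaches 50%, which is easy to understand: each thread has roughly an even chance of winning. In terms of time, each thread needs about 4 times as long as in the uncontended case (about 64 ms vs. 16 ms); with 4 competing threads it is about 10 times slower (about 150 ms), and with 8 threads nearly 20 times slower (about 260 to 285 ms). In practice, a failed CAS is usually retried until it succeeds, so on top of the extra time, the retries also waste CPU cycles.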
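The "retry until success" pattern mentioned above can be sketched as a CAS loop that re-reads the value after each failure. The class CasRetryCounter and its method incrementWithRetry() are hypothetical names used only for illustration; for a plain increment, AtomicLong.incrementAndGet() already gives the same effect without a hand-written loop.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasRetryCounter {

    private final AtomicLong value = new AtomicLong();

    /**
     * Increments with a CAS retry loop and returns how many attempts failed
     * before the update succeeded (0 when there was no contention).
     */
    public int incrementWithRetry() {
        int retries = 0;
        while (true) {
            long cur = value.get();                  // read the current value
            if (value.compareAndSet(cur, cur + 1)) { // succeeds only if nobody changed it meanwhile
                return retries;
            }
            retries++;                               // lost the race: re-read and try again (costs CPU cycles)
        }
    }

    public long get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        final CasRetryCounter counter = new CasRetryCounter();
        final int perThread = 1_000_000;
        final long[] retries = new long[2];          // one slot per thread, read only after join()
        Thread[] threads = new Thread[2];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    retries[id] += counter.incrementWithRetry();
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        // every call eventually succeeds, so the value is always 2 * perThread;
        // the retry count is the extra work caused by contention
        System.out.println("value = " + counter.get() + ", retries = " + (retries[0] + retries[1]));
    }
}
```

Unlike the original test, no increment is skipped here; the cost of contention shows up as the retry count and the extra time instead.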

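In what concrete scenarios might this happen? One I know of is memory allocation in the JVM. The heap is shared by all threads, and allocating objects is a very common operation in Java, so with many threads, allocations are likely to contend. Even though an allocation in HotSpot can be done with a single CAS operation, performance still degrades under heavy contention.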
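To make the "single CAS per allocation" idea concrete, here is a toy model of bump-pointer allocation from a shared region: every allocation moves one shared top pointer forward with compareAndSet and retries if another thread got there first. This is a hypothetical sketch (SharedBumpAllocator is not a HotSpot class); addresses are plain long values and no real memory is touched.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model (hypothetical, not HotSpot code) of allocating from a shared heap region
 * by bumping a pointer with CAS. Addresses are plain longs; no real memory is touched.
 */
public class SharedBumpAllocator {

    private final AtomicLong top;   // next free "address", shared by all threads
    private final long end;         // exclusive upper bound of the region

    public SharedBumpAllocator(long start, long end) {
        this.top = new AtomicLong(start);
        this.end = end;
    }

    /** Returns the start address of a block of `size` bytes, or -1 if the region is full. */
    public long allocate(long size) {
        while (true) {
            long oldTop = top.get();
            long newTop = oldTop + size;
            if (newTop > end) {
                return -1;                          // out of space (a real JVM would start a GC here)
            }
            if (top.compareAndSet(oldTop, newTop)) {
                return oldTop;                      // this thread now owns [oldTop, newTop)
            }
            // another thread moved top first: retry; under heavy contention this loop spins often
        }
    }
}
```

Threads calling allocate() concurrently always receive disjoint blocks, but every call goes through the same AtomicLong, which is exactly the kind of contended CAS measured in the test.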

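How can this effect be reduced? One approach is thread-localization: keep operations out of contention as much as possible. For example, in HotSpot each thread requests a somewhat larger block of heap memory in one go, the TLAB (thread local allocation buffer), and object allocation is served from this thread-local block first. Because the block is private to the thread, allocating from it needs no CAS; CAS is used only when the TLAB cannot satisfy a request. The same idea can be borrowed in other places where contention needs to be reduced.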
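The TLAB idea can be sketched on top of the same kind of shared bump pointer: each thread grabs a whole chunk with one CAS and then serves small allocations from it with plain, non-atomic arithmetic. TlabAllocator, tlabSize and sharedTop() are hypothetical names for this toy model; it does not reproduce HotSpot's real TLAB sizing or refill policy, and the unused tail of an old buffer is simply abandoned here.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model (hypothetical, not HotSpot code) of TLAB-style allocation: each thread grabs a
 * chunk from the shared region with one CAS, then serves small allocations from that chunk
 * without any atomic operation.
 */
public class TlabAllocator {

    private final AtomicLong sharedTop;  // shared bump pointer, only moved via CAS
    private final long end;              // exclusive upper bound of the shared region
    private final long tlabSize;         // chunk size each thread grabs at once

    // per-thread buffer: [0] = next free address, [1] = exclusive limit (empty at first)
    private final ThreadLocal<long[]> tlab = ThreadLocal.withInitial(() -> new long[] {0, 0});

    public TlabAllocator(long start, long end, long tlabSize) {
        this.sharedTop = new AtomicLong(start);
        this.end = end;
        this.tlabSize = tlabSize;
    }

    /** Returns the start address of `size` bytes, or -1 if the shared region is exhausted. */
    public long allocate(long size) {
        long[] buf = tlab.get();
        if (buf[0] + size <= buf[1]) {   // fast path: thread-private, no CAS needed
            long addr = buf[0];
            buf[0] += size;
            return addr;
        }
        if (size > tlabSize) {           // large request: allocate directly from the shared region
            return casBump(size);
        }
        long chunk = casBump(tlabSize);  // slow path: refill the buffer with one CAS
        if (chunk < 0) {
            return -1;
        }
        buf[0] = chunk + size;           // the rest of the old buffer is simply abandoned
        buf[1] = chunk + tlabSize;
        return chunk;
    }

    /** Current position of the shared pointer, to see how rarely it moves. */
    public long sharedTop() {
        return sharedTop.get();
    }

    private long casBump(long size) {
        while (true) {
            long oldTop = sharedTop.get();
            long newTop = oldTop + size;
            if (newTop > end) {
                return -1;
            }
            if (sharedTop.compareAndSet(oldTop, newTop)) {
                return oldTop;
            }
        }
    }
}
```

With tlabSize = 100, ten 10-byte allocations from one thread move the shared pointer only once, so that thread touches the contended AtomicLong once instead of ten times. The JDK applies a related idea to counters: java.util.concurrent.atomic.LongAdder (since Java 8) spreads updates across several cells under contention instead of hammering a single value.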
