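As far as I understand, highly concurrent code often turns to lock-free (or lockless) programming to avoid the overhead of locks (sleeping and thread context switches), and lock-free programming is built on the CAS (compare-and-swap) operation. So how does CAS itself behave under heavy contention, and how can the problems caused by that contention be kept to a minimum?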
The Java test code:
package com.lqp.test;

import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentCASTest {

    public static void main(String[] args) throws Exception {
        // Shared counter that every thread tries to update with CAS.
        final AtomicLong value = new AtomicLong();
        // Number of CAS attempts per thread: 1,000,000.
        final int count = 100 * 10000;
        // Number of competing threads; changed between test runs.
        int threadCount = 1;

        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    long start = System.nanoTime();
                    int failCount = 0;
                    for (int j = 0; j < count; j++) {
                        // Read the current value, then try to install value + 1.
                        // A failed CAS is only counted, not retried.
                        long initVal = value.get();
                        boolean suc = value.compareAndSet(initVal, initVal + 1);
                        if (!suc) {
                            failCount++;
                        }
                    }
                    long diffMilis = (System.nanoTime() - start) / 1000 / 1000;
                    println("Time = " + diffMilis + " milis" + ", ThreadId = " + id
                            + ", fail count = " + failCount
                            + ", failPercent = " + (failCount * 100) / (double) count);
                }
            }).start();
        }
    }

    public static boolean println(String msg) {
        System.out.println(msg);
        return true;
    }
}
threadCount = 1, a single thread with no concurrent modification, 3 runs:
- Time = 16 milis, ThreadId = 0, fail count = 0, failPercent = 0.0
- Time = 16 milis, ThreadId = 0, fail count = 0, failPercent = 0.0
- Time = 16 milis, ThreadId = 0, fail count = 0, failPercent = 0.0
threadCount = 2, two threads modifying concurrently, 3 runs:
- Time = 62 milis, ThreadId = 1, fail count = 347064, failPercent = 34.7064
  Time = 64 milis, ThreadId = 0, fail count = 463725, failPercent = 46.3725
- Time = 65 milis, ThreadId = 1, fail count = 478589, failPercent = 47.8589
  Time = 66 milis, ThreadId = 0, fail count = 477393, failPercent = 47.7393
- Time = 66 milis, ThreadId = 0, fail count = 489875, failPercent = 48.9875
  Time = 65 milis, ThreadId = 1, fail count = 490575, failPercent = 49.0575
threadCount = 4, four threads modifying concurrently, 2 runs:
- Time = 150 milis, ThreadId = 0, fail count = 663234, failPercent = 66.3234
  Time = 152 milis, ThreadId = 1, fail count = 747265, failPercent = 74.7265
  Time = 154 milis, ThreadId = 3, fail count = 666898, failPercent = 66.6898
  Time = 154 milis, ThreadId = 2, fail count = 675097, failPercent = 67.5097
- Time = 134 milis, ThreadId = 1, fail count = 522787, failPercent = 52.2787
  Time = 148 milis, ThreadId = 3, fail count = 746457, failPercent = 74.6457
  Time = 150 milis, ThreadId = 2, fail count = 744108, failPercent = 74.4108
  Time = 150 milis, ThreadId = 0, fail count = 710502, failPercent = 71.0502
threadCount = 8, eight threads modifying concurrently, 1 run:
- Time = 258 milis, ThreadId = 2, fail count = 753286, failPercent = 75.3286
  Time = 269 milis, ThreadId = 6, fail count = 769007, failPercent = 76.9007
  Time = 277 milis, ThreadId = 5, fail count = 750360, failPercent = 75.036
  Time = 281 milis, ThreadId = 3, fail count = 799387, failPercent = 79.9387
  Time = 283 milis, ThreadId = 0, fail count = 790358, failPercent = 79.0358
  Time = 285 milis, ThreadId = 1, fail count = 765177, failPercent = 76.5177
  Time = 279 milis, ThreadId = 7, fail count = 763133, failPercent = 76.3133
  Time = 285 milis, ThreadId = 4, fail count = 791819, failPercent = 79.1819
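As the results show, with just two threads competing the failure rate already approaches 50%, which is easy to understand: each thread has roughly an even chance of winning any given update. Compared with the uncontended case, the time a thread needs for the same number of attempts grows about 4x with two threads (16 ms to about 65 ms), about 10x with four threads (about 150 ms), and close to 20x with eight threads (about 280 ms). In practice, a thread whose CAS fails will most likely keep retrying until it succeeds, so on top of the extra elapsed time, the retries caused by failures also waste CPU cycles.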
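The retry-until-success pattern mentioned above can be sketched as follows. This is only an illustration, not part of the original test: the class name `CasRetryDemo` and the helpers `incrementWithRetry` and `run` are made up for this example. Unlike the benchmark, every increment eventually succeeds here, and the failed attempts are counted as wasted work.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; the names are illustrative only.
class CasRetryDemo {

    // Increments `value` by one, retrying the CAS until it succeeds.
    // Returns how many attempts failed before the successful one.
    static int incrementWithRetry(AtomicLong value) {
        int failures = 0;
        while (true) {
            long current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return failures;
            }
            failures++; // another thread won the race; re-read and try again
        }
    }

    // Starts threadCount threads, each performing perThread successful
    // increments, and returns the total number of failed CAS attempts.
    static long run(AtomicLong value, int threadCount, int perThread) {
        AtomicLong totalFailures = new AtomicLong();
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                long failures = 0;
                for (int i = 0; i < perThread; i++) {
                    failures += incrementWithRetry(value);
                }
                totalFailures.addAndGet(failures); // publish once per thread
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return totalFailures.get();
    }

    public static void main(String[] args) {
        AtomicLong value = new AtomicLong();
        long retries = run(value, 4, 1_000_000);
        System.out.println("final value = " + value.get()
                + ", wasted retries = " + retries);
    }
}
```

With 4 threads and 1,000,000 increments each, the final value is always 4000000, because no update is lost. The number of wasted retries varies from run to run and depends on the machine.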
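In what concrete scenario might this show up? One that I know of is memory allocation in the JVM. The heap is shared by all threads, and allocating objects is an extremely common operation in Java, so when many threads allocate at once, contention is likely. Even though an allocation in HotSpot can be completed with a single CAS operation, performance still degrades when contention is heavy.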
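How can this effect be reduced? One approach is to make the work thread-local, so that most operations never take part in the contention. In HotSpot, for example, each thread requests a somewhat larger block of memory from the heap in one go, the TLAB (thread-local allocation buffer), and objects are then allocated from this thread-local block first. Because the block is private to the thread, allocation needs no CAS at all; only when the TLAB cannot satisfy a request does allocation fall back to the CAS-based path. The same idea can be borrowed wherever else contention needs to be reduced.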
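The thread-local buffer idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not HotSpot's actual implementation: the "heap" is just a counter of addresses, and the class `ChunkedAllocatorSketch`, the constant `CHUNK_SIZE`, and the methods `allocate` and `allocateShared` are all hypothetical names chosen for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical TLAB-style allocator over a simulated address space.
class ChunkedAllocatorSketch {

    // Assumed per-thread buffer size in simulated bytes.
    static final long CHUNK_SIZE = 1024;

    // Shared bump pointer standing in for the top of the shared heap.
    private final AtomicLong heapTop = new AtomicLong();

    // Per-thread buffer stored as {cursor, limit}; it starts out empty.
    private final ThreadLocal<long[]> localBuffer =
            ThreadLocal.withInitial(() -> new long[] {0L, 0L});

    // Contended path: bump the shared pointer with a CAS retry loop.
    private long allocateShared(long size) {
        while (true) {
            long top = heapTop.get();
            if (heapTop.compareAndSet(top, top + size)) {
                return top;
            }
        }
    }

    // Returns the start address of a newly reserved block of `size` bytes.
    long allocate(long size) {
        if (size > CHUNK_SIZE) {
            return allocateShared(size); // too large for a local buffer
        }
        long[] buf = localBuffer.get();
        if (buf[0] + size > buf[1]) {
            // Buffer exhausted: one CAS reserves a whole new chunk.
            // Any space left in the old chunk is simply discarded.
            long start = allocateShared(CHUNK_SIZE);
            buf[0] = start;
            buf[1] = start + CHUNK_SIZE;
        }
        // Fast path: plain arithmetic on thread-private state, no CAS.
        long address = buf[0];
        buf[0] += size;
        return address;
    }

    long heapTop() {
        return heapTop.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ChunkedAllocatorSketch allocator = new ChunkedAllocatorSketch();
        Set<Long> addresses = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    addresses.add(allocator.allocate(16));
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println("distinct addresses = " + addresses.size()
                + ", heap used = " + allocator.heapTop());
    }
}
```

In this run each thread makes 1000 allocations of 16 bytes, which fit 64 to a chunk, so each thread refills its buffer 16 times. The 4000 allocations therefore need only 64 successful CAS operations on the shared pointer rather than 4000, and the program prints `distinct addresses = 4000, heap used = 65536`. The price is some wasted space at the end of each chunk, which is the usual trade-off of this design.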