How volatile works under the hood: thread visibility and instruction reordering, volatile vs. synchronized, and observing volatile and synchronized with hsdis

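Contents

  • What volatile is for
    • 1. Thread visibility
    • 2. Preventing instruction reordering
      • Example: does a DCL singleton need volatile?
    • 3. Differences between volatile and synchronized
    • CPU basics
  • Observing synchronized and volatile with hsdis
    • Output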

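What volatile is for

volatile literally means "liable to change". In Java it does two things: it guarantees visibility between threads, and it prevents instruction reordering around the variable.

1. Thread visibility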



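  1. Each thread has its own private working area, and all threads share the heap. When a thread uses an object on the heap, it may work on a private copy of that data (in practice, values held in registers or CPU caches). Changes go to the private copy first and are written back to shared memory (the heap) at some later point; without volatile or locking, nothing says when.

  2. This is the thread visibility problem. Suppose thread 1 should modify an object and thread 2 should then process it. If thread 2 reads the object from shared memory before thread 1 has written its change back, thread 2 works on stale data.

  3. Declaring the field volatile guarantees visibility. A write to a volatile variable is flushed to main memory immediately, and every other thread's cached copy of that variable is invalidated, so those threads must reload the latest value from main memory.

  4. Underneath, this relies on the CPU's cache coherence protocol, the mechanism multi-core machines use to keep the caches of the individual cores consistent.

  5. Warning ⚠️: do not use volatile unless you are sure you need it. The simpler the value it modifies, the better; try not to put it on references, for example:

    volatile ArrayList<Object> mylist = new ArrayList<>();
    

    Here volatile modifies a reference, so it guarantees the visibility of the reference only. Other threads are guaranteed to see mylist being pointed at a different object; they are not guaranteed to see changes to the contents of the list.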

package com.mashibing.testvolatile;

public class T01_ThreadVisibility {

    // Without volatile, the "server" thread may never observe flag = false.
    private static volatile boolean flag = true;

    public static void main(String[] args) throws InterruptedException {
        new Thread(() -> {
            while (flag) {
                // do sth
            }
            System.out.println("end");
        }, "server").start();

        Thread.sleep(1000);

        flag = false;
    }
}
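
Coming back to the warning about volatile references: one way to make changes to a list visible is to never mutate it in place, and instead build a new list and reassign the volatile reference. The following is only a minimal sketch of that idea; the class name VolatileListHolder and its methods are hypothetical and do not come from the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder class, used only for illustration.
class VolatileListHolder {

    // volatile covers the reference itself, not the elements of the list.
    private volatile List<String> items = Collections.emptyList();

    // Publish a change by copying the list and swapping the reference.
    // synchronized keeps two writers from losing each other's update.
    synchronized void add(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        items = Collections.unmodifiableList(copy); // volatile write
    }

    // A reader does one volatile read and gets a snapshot that never changes.
    List<String> snapshot() {
        return items;
    }

    public static void main(String[] args) {
        VolatileListHolder holder = new VolatileListHolder();
        List<String> before = holder.snapshot();
        holder.add("a");
        holder.add("b");
        System.out.println(before + " -> " + holder.snapshot()); // prints "[] -> [a, b]"
    }
}
```

Because every change is a reassignment of the volatile field, readers see either the old list or the new one, never a half-updated list. The cost is one copy per write, so this only suits lists that are read far more often than they are written.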

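2. Preventing instruction reordering


  • CPUs process instructions in a pipeline for speed. To make good use of it, the compiler (and the CPU itself) may reorder instructions, as long as the result within a single thread stays the same.

  • Some instructions, however, must run in program order and must not be reordered; that is what marking the variable volatile is for.

  • We cannot switch off instruction reordering in the CPU in general, because it is one of the CPU's own efficiency strategies. What we can do is forbid reordering at the JVM level around specific accesses.

  • Going one level deeper, reordering is prevented with read barriers (LoadFence) and write barriers (StoreFence). These are CPU primitives, supported directly by the hardware. A LoadFence requires all reads before the barrier to complete before any read after it may run; a StoreFence does the same for writes.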
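
The ordering guarantee described above can be seen from plain Java without touching the barriers directly. The sketch below is an illustration under assumptions: the class VolatilePublishSketch and its field names are made up for this example. A writer stores to a plain field and then to a volatile flag; a reader that sees the flag set is guaranteed by the Java Memory Model to also see the earlier plain write, because the two writes may not be reordered across the volatile store.

```java
// Hypothetical demo class, used only for illustration.
class VolatilePublishSketch {

    private static int data = 0;                   // plain field
    private static volatile boolean ready = false; // volatile flag

    // Runs one writer and one reader; returns the value of data the reader saw.
    static int runOnce() {
        data = 0;
        ready = false;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile read observes true
            }
            seen[0] = data; // ordered after the volatile read of ready
        });
        Thread writer = new Thread(() -> {
            data = 42;    // plain write
            ready = true; // volatile write; the write to data cannot move below it
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```

If ready were a plain field, the reader could legally spin forever or read a stale data, which is exactly the pair of problems (visibility and reordering) this section describes.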

Example: does a DCL singleton need volatile?


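public class MyObject {

    // For the DCL variant this field should be declared volatile (see the discussion below).
    private static /*volatile*/ MyObject INSTANCE;

    // Variant 1: lazy singleton with a thread-safety problem
    public static MyObject getInstanceUnsafe() {
        if (INSTANCE == null) {
            INSTANCE = new MyObject();
        }
        return INSTANCE;
    }

    // Variant 2: use synchronized to make it thread-safe
    public static synchronized MyObject getInstanceSynchronized() {
        if (INSTANCE == null) {
            INSTANCE = new MyObject();
        }
        return INSTANCE;
    }

    // Variant 3: finer lock granularity (DCL, double-checked locking)
    public static MyObject getInstance() {
        // other code that does not need the lock
        if (INSTANCE == null) {
            synchronized (MyObject.class) {
                if (INSTANCE == null) {
                    INSTANCE = new MyObject();
                }
            }
        }
        return INSTANCE;
    }
}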
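  1. The first version of the singleton is definitely broken because it is not thread-safe. Adding synchronized to the getInstance method certainly fixes that.

  2. That raises another problem: we have bluntly turned the whole getInstance() method into a synchronized method, and we would like a finer-grained lock. The end result is DCL (Double-Checked Locking). It looks flawless and in practice very rarely misbehaves, but it can still go wrong.

  3. The flaw lies in the reordering of INSTANCE = new MyObject();. In the JVM, creating an object takes three steps:

    1. Allocate memory for the object

    2. Initialize the object's fields

    3. Point the reference at that memory

    Suppose these steps are reordered to 1, 3, 2. One thread starts creating the object and reaches step 3: the reference already points at the memory, but the memory still holds default values because initialization has not run. A second thread now arrives, sees that the reference is no longer null, and walks away with an object that has not been initialized.

    This is rare and may never show up even under heavy concurrency, but under extremely high concurrency it can happen.

  4. So the INSTANCE field must be declared volatile, which prevents this reordering.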
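
Putting the conclusion together, a DCL singleton with the volatile field in place could look like the sketch below. The class name DclSingleton, the value field, and the local variable are assumptions made for this illustration and are not part of the original code.

```java
// Hypothetical class, used only for illustration.
class DclSingleton {

    // volatile forbids publishing the reference before the constructor's writes.
    private static volatile DclSingleton instance;

    private int value; // deliberately not final, so only volatile protects it

    private DclSingleton() {
        value = 42; // stands in for real initialization work
    }

    static DclSingleton getInstance() {
        DclSingleton local = instance; // a single volatile read on the fast path
        if (local == null) {           // first check, without the lock
            synchronized (DclSingleton.class) {
                local = instance;
                if (local == null) {   // second check, under the lock
                    local = new DclSingleton();
                    instance = local;  // volatile write publishes a fully built object
                }
            }
        }
        return local;
    }

    int value() {
        return value;
    }
}
```

Copying the field into a local variable is optional; it only saves a second volatile read once the instance exists.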

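3. Differences between volatile and synchronized


volatile cannot replace synchronized. volatile guarantees visibility but not atomicity. Take an increment such as count++: it executes as at least three steps (read, add, write back), and another thread can cut in between them. So volatile alone does not prevent the inconsistencies that arise when several threads update shared data.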
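
The difference is easy to observe. The following sketch (the class CounterSketch and its members are hypothetical names chosen for this example) increments a volatile counter and a synchronized counter from several threads. The synchronized counter always reaches the expected total; the volatile one usually falls short because increments get lost, though the exact number varies from run to run.

```java
// Hypothetical demo class, used only for illustration.
class CounterSketch {

    private static volatile int volatileCount; // visible, but ++ is not atomic
    private static int syncCount;              // only touched under the class lock

    private static synchronized void incrementSync() {
        syncCount++;
    }

    // Returns { volatile result, synchronized result }.
    static int[] run(int threads, int perThread) {
        volatileCount = 0;
        syncCount = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCount++;  // read, add, write back: updates can be lost
                    incrementSync();  // the whole increment runs under the lock
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { volatileCount, syncCount };
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("volatile: " + r[0] + ", synchronized: " + r[1] + ", expected: 400000");
    }
}
```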

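CPU basics

  • Cache line alignment
    The cache line (64 bytes on most current CPUs) is the basic unit in which CPU caches are kept in sync. Keeping hot fields on separate cache lines is faster than letting them falsely share one.
    Disruptor uses this technique.

  • Note that JDK 8 introduced the @sun.misc.Contended annotation to get cache line isolation.
    To use it in application code, the restriction must be lifted with: -XX:-RestrictContended

  • Also, the Java compiler or the JIT compiler may eliminate fields that are never used, so the padding fields should be declared volatile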

    package com.mashibing.juc.c_028_FalseSharing;

    public class T02_CacheLinePadding {

        private static class Padding {
            // 7 longs = 56 bytes of padding placed ahead of x
            public volatile long p1, p2, p3, p4, p5, p6, p7;
        }

        private static class T extends Padding {
            public volatile long x = 0L;
        }

        public static T[] arr = new T[2];

        static {
            arr[0] = new T();
            arr[1] = new T();
        }

        public static void main(String[] args) throws Exception {
            Thread t1 = new Thread(() -> {
                for (long i = 0; i < 1000_0000L; i++) {
                    arr[0].x = i;
                }
            });

            Thread t2 = new Thread(() -> {
                for (long i = 0; i < 1000_0000L; i++) {
                    arr[1].x = i;
                }
            });

            final long start = System.nanoTime();
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            // elapsed time in milliseconds
            System.out.println((System.nanoTime() - start) / 100_0000);
        }
    }
    

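    MESI (the cache coherence protocol)

  • False sharing

  • Write combining
    The CPU has a small set of write-combining buffers; the demo below is built around only about 4 of them being usable at once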

    package com.mashibing.juc.c_029_WriteCombining;

    public final class WriteCombining {

        private static final int ITERATIONS = Integer.MAX_VALUE;
        private static final int ITEMS = 1 << 24;
        private static final int MASK = ITEMS - 1;

        private static final byte[] arrayA = new byte[ITEMS];
        private static final byte[] arrayB = new byte[ITEMS];
        private static final byte[] arrayC = new byte[ITEMS];
        private static final byte[] arrayD = new byte[ITEMS];
        private static final byte[] arrayE = new byte[ITEMS];
        private static final byte[] arrayF = new byte[ITEMS];

        public static void main(final String[] args) {
            for (int i = 1; i <= 3; i++) {
                System.out.println(i + " SingleLoop duration (ns) = " + runCaseOne());
                System.out.println(i + " SplitLoop  duration (ns) = " + runCaseTwo());
            }
        }

        public static long runCaseOne() {
            long start = System.nanoTime();
            int i = ITERATIONS;

            while (--i != 0) {
                int slot = i & MASK;
                byte b = (byte) i;
                arrayA[slot] = b;
                arrayB[slot] = b;
                arrayC[slot] = b;
                arrayD[slot] = b;
                arrayE[slot] = b;
                arrayF[slot] = b;
            }
            return System.nanoTime() - start;
        }

        public static long runCaseTwo() {
            long start = System.nanoTime();
            int i = ITERATIONS;
            while (--i != 0) {
                int slot = i & MASK;
                byte b = (byte) i;
                arrayA[slot] = b;
                arrayB[slot] = b;
                arrayC[slot] = b;
            }
            i = ITERATIONS;
            while (--i != 0) {
                int slot = i & MASK;
                byte b = (byte) i;
                arrayD[slot] = b;
                arrayE[slot] = b;
                arrayF[slot] = b;
            }
            return System.nanoTime() - start;
        }
    }
    
  • Instruction reordering

    package com.mashibing.jvm.c3_jmm;

    public class T04_Disorder {

        private static int x = 0, y = 0;
        private static int a = 0, b = 0;

        public static void main(String[] args) throws InterruptedException {
            int i = 0;
            for (;;) {
                i++;
                x = 0; y = 0;
                a = 0; b = 0;
                Thread one = new Thread(new Runnable() {
                    public void run() {
                        // Thread one starts first, so the next line can make it wait for thread two.
                        // Readers can tune the wait to the speed of their own machine.
                        //shortWait(100000);
                        a = 1;
                        x = b;
                    }
                });

                Thread other = new Thread(new Runnable() {
                    public void run() {
                        b = 1;
                        y = a;
                    }
                });
                one.start(); other.start();
                one.join(); other.join();
                String result = "Iteration " + i + " (" + x + "," + y + ")";
                if (x == 0 && y == 0) {
                    // (0,0) cannot happen under any in-order interleaving of the two threads
                    System.err.println(result);
                    break;
                } else {
                    //System.out.println(result);
                }
            }
        }

        public static void shortWait(long interval) {
            long start = System.nanoTime();
            long end;
            do {
                end = System.nanoTime();
            } while (start + interval >= end);
        }
    }
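
As a counterpart to the reordering demo above, the sketch below declares a and b volatile. Under the Java Memory Model, volatile accesses are totally ordered in a way that respects each thread's program order, so at least one thread must see the other's write and the (0,0) result is ruled out. The class name VolatileDisorderSketch and the method countZeroZero are assumptions made for this example.

```java
// Hypothetical variation of the demo, used only for illustration.
class VolatileDisorderSketch {

    private static int x, y;          // only read after join(), plain is enough
    private static volatile int a, b; // volatile: the store may not pass the following load

    // Returns how many iterations ended with (x, y) == (0, 0).
    static int countZeroZero(int iterations) {
        int hits = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0; y = 0;
            a = 0; b = 0;
            Thread one = new Thread(() -> { a = 1; x = b; });
            Thread other = new Thread(() -> { b = 1; y = a; });
            one.start();
            other.start();
            try {
                one.join();
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (x == 0 && y == 0) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("(0,0) observed " + countZeroZero(10_000) + " times"); // prints "(0,0) observed 0 times"
    }
}
```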

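### How the underlying system keeps data consistent

1. If MESI can solve it, MESI is used
2. If not, the bus is locked

### How the underlying system guarantees ordering

1. Memory barriers: system primitives such as sfence, mfence, and lfence
2. Bus locking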

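### How volatile prevents instruction reordering

1: In the source: volatile i

2: In the bytecode: the ACC_VOLATILE flag

3: In the JVM: memory barriers

    Instructions on either side of a barrier may not be reordered across it, which guarantees ordering.

    happens-before

    as-if-serial

4: The HotSpot implementation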

bytecodeinterpreter.cpp

```c++
int field_offset = cache->f2_as_index();
          if (cache->is_volatile()) {
            if (support_IRIW_for_not_multiple_copy_atomic_cpu) {
              OrderAccess::fence();
            }
```

orderaccess_linux_x86.inline.hpp

```c++
inline void OrderAccess::fence() {
  if (os::is_MP()) {
    // always use locked addl since mfence is sometimes expensive
#ifdef AMD64
    __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
    __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  }
}
```

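The LOCK prefix gives the executing instruction exclusive use of shared memory on a multiprocessor.
Its effect is to flush the current processor's cached data to memory and to invalidate the corresponding cache entries on other processors.

It also acts as a memory barrier: instructions cannot be reordered across it.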

Observing synchronized and volatile with hsdis

  1. Install hsdis

  2. Code

    public class T {

      public static volatile int i = 0;

      public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
          m();
          n();
        }
      }

      public static synchronized void m() {

      }

      public static void n() {
        i = 1;
      }
    }
    
  3. java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly T > 1.txt
    

Output

Because the JIT prints assembly for every method it compiles, search for T::m and T::n to find the assembly for m() and n()

============================= C1-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c1)      67    1       3       java.lang.Object::<init> (1 bytes)
 total in heap  [0x00007f81d4d33010,0x00007f81d4d33360] = 848
 relocation     [0x00007f81d4d33170,0x00007f81d4d33198] = 40
 main code      [0x00007f81d4d331a0,0x00007f81d4d33260] = 192
 stub code      [0x00007f81d4d33260,0x00007f81d4d332f0] = 144
 metadata       [0x00007f81d4d332f0,0x00007f81d4d33300] = 16
 scopes data    [0x00007f81d4d33300,0x00007f81d4d33318] = 24
 scopes pcs     [0x00007f81d4d33318,0x00007f81d4d33358] = 64
 dependencies   [0x00007f81d4d33358,0x00007f81d4d33360] = 8

--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Entry Point]
  # {method} {0x00007f81d3cfe650} '<init>' '()V' in 'java/lang/Object'
  #           [sp+0x40]  (sp of caller)
  0x00007f81d4d331a0:   mov    0x8(%rsi),%r10d
  0x00007f81d4d331a4:   shl    $0x3,%r10
  0x00007f81d4d331a8:   cmp    %rax,%r10
  0x00007f81d4d331ab:   jne    0x00007f81d47eed00           ;   {runtime_call ic_miss_stub}
  0x00007f81d4d331b1:   data16 data16 nopw 0x0(%rax,%rax,1)
  0x00007f81d4d331bc:   data16 data16 xchg %ax,%ax
[Verified Entry Point]
  0x00007f81d4d331c0:   mov    %eax,-0x14000(%rsp)
  0x00007f81d4d331c7:   push   %rbp
  0x00007f81d4d331c8:   sub    $0x30,%rsp
  0x00007f81d4d331cc:   movabs $0x7f81d3f33388,%rdi         ;   {metadata(method data for {method} {0x00007f81d3cfe650} '<init>' '()V' in 'java/lang/Object')}
  0x00007f81d4d331d6:   mov    0x13c(%rdi),%ebx
  0x00007f81d4d331dc:   add    $0x8,%ebx
  0x00007f81d4d331df:   mov    %ebx,0x13c(%rdi)
  0x00007f81d4d331e5:   and    $0x1ff8,%ebx
  0x00007f81d4d331eb:   cmp    $0x0,%ebx
  0x00007f81d4d331ee:   je     0x00007f81d4d33204           ;*return {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - java.lang.Object::<init>@0 (line 50)
  0x00007f81d4d331f4:   add    $0x30,%rsp
  0x00007f81d4d331f8:   pop    %rbp
  0x00007f81d4d331f9:   mov    0x108(%r15),%r10
  0x00007f81d4d33200:   test   %eax,(%r10)                  ;   {poll_return}
  0x00007f81d4d33203:   retq   
  0x00007f81d4d33204:   movabs $0x7f81d3cfe650,%r10         ;   {metadata({method} {0x00007f81d3cfe650} '<init>' '()V' in 'java/lang/Object')}
  0x00007f81d4d3320e:   mov    %r10,0x8(%rsp)
  0x00007f81d4d33213:   movq   $0xffffffffffffffff,(%rsp)
  0x00007f81d4d3321b:   callq  0x00007f81d489e000           ; ImmutableOopMap {rsi=Oop }
                                                            ;*synchronization entry
                                                            ; - java.lang.Object::<init>@-1 (line 50)
                                                            ;   {runtime_call counter_overflow Runtime1 stub}
  0x00007f81d4d33220:   jmp    0x00007f81d4d331f4
  0x00007f81d4d33222:   nop
  0x00007f81d4d33223:   nop
  0x00007f81d4d33224:   mov    0x3f0(%r15),%rax
  0x00007f81d4d3322b:   movabs $0x0,%r10
  0x00007f81d4d33235:   mov    %r10,0x3f0(%r15)
  0x00007f81d4d3323c:   movabs $0x0,%r10
  0x00007f81d4d33246:   mov    %r10,0x3f8(%r15)
  0x00007f81d4d3324d:   add    $0x30,%rsp
  0x00007f81d4d33251:   pop    %rbp
  0x00007f81d4d33252:   jmpq   0x00007f81d480be80           ;   {runtime_call unwind_exception Runtime1 stub}
  0x00007f81d4d33257:   hlt    
  0x00007f81d4d33258:   hlt    
  0x00007f81d4d33259:   hlt    
  0x00007f81d4d3325a:   hlt    
  0x00007f81d4d3325b:   hlt    
  0x00007f81d4d3325c:   hlt    
  0x00007f81d4d3325d:   hlt    
  0x00007f81d4d3325e:   hlt    
  0x00007f81d4d3325f:   hlt    
[Exception Handler]
  0x00007f81d4d33260:   callq  0x00007f81d489ad00           ;   {no_reloc}
  0x00007f81d4d33265:   mov    %rsp,-0x28(%rsp)
  0x00007f81d4d3326a:   sub    $0x80,%rsp
  0x00007f81d4d33271:   mov    %rax,0x78(%rsp)
  0x00007f81d4d33276:   mov    %rcx,0x70(%rsp)
  0x00007f81d4d3327b:   mov    %rdx,0x68(%rsp)
  0x00007f81d4d33280:   mov    %rbx,0x60(%rsp)
  0x00007f81d4d33285:   mov    %rbp,0x50(%rsp)
  0x00007f81d4d3328a:   mov    %rsi,0x48(%rsp)
  0x00007f81d4d3328f:   mov    %rdi,0x40(%rsp)
  0x00007f81d4d33294:   mov    %r8,0x38(%rsp)
  0x00007f81d4d33299:   mov    %r9,0x30(%rsp)
  0x00007f81d4d3329e:   mov    %r10,0x28(%rsp)
  0x00007f81d4d332a3:   mov    %r11,0x20(%rsp)
  0x00007f81d4d332a8:   mov    %r12,0x18(%rsp)
  0x00007f81d4d332ad:   mov    %r13,0x10(%rsp)
  0x00007f81d4d332b2:   mov    %r14,0x8(%rsp)
  0x00007f81d4d332b7:   mov    %r15,(%rsp)
  0x00007f81d4d332bb:   movabs $0x7f81f15ff3e2,%rdi         ;   {external_word}
  0x00007f81d4d332c5:   movabs $0x7f81d4d33265,%rsi         ;   {internal_word}
  0x00007f81d4d332cf:   mov    %rsp,%rdx
  0x00007f81d4d332d2:   and    $0xfffffffffffffff0,%rsp
  0x00007f81d4d332d6:   callq  0x00007f81f1108240           ;   {runtime_call}
  0x00007f81d4d332db:   hlt    
[Deopt Handler Code]
  0x00007f81d4d332dc:   movabs $0x7f81d4d332dc,%r10         ;   {section_word}
  0x00007f81d4d332e6:   push   %r10
  0x00007f81d4d332e8:   jmpq   0x00007f81d47ed0a0           ;   {runtime_call DeoptimizationBlob}
  0x00007f81d4d332ed:   hlt    
  0x00007f81d4d332ee:   hlt    
  0x00007f81d4d332ef:   hlt    
--------------------------------------------------------------------------------

============================= C1-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c1)      74    2       3       java.lang.StringLatin1::hashCode (42 bytes)
 total in heap  [0x00007f81d4d33390,0x00007f81d4d338a8] = 1304
 relocation     [0x00007f81d4d334f0,0x00007f81d4d33528] = 56
 main code      [0x00007f81d4d33540,0x00007f81d4d336c0] = 384
 stub code      [0x00007f81d4d336c0,0x00007f81d4d33750] = 144
 metadata       [0x00007f81d4d33750,0x00007f81d4d33758] = 8
 scopes data    [0x00007f81d4d33758,0x00007f81d4d337c0] = 104
 scopes pcs     [0x00007f81d4d337c0,0x00007f81d4d33890] = 208
 dependencies   [0x00007f81d4d33890,0x00007f81d4d33898] = 8
 nul chk table  [0x00007f81d4d33898,0x00007f81d4d338a8] = 16

--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Verified Entry Point]
  # {method} {0x00007f81d3e6ddd0} 'hashCode' '([B)I' in 'java/lang/StringLatin1'
  # parm0:    rsi:rsi   = '[B'
  #           [sp+0x40]  (sp of caller)
  0x00007f81d4d33540:   mov    %eax,-0x14000(%rsp)

[The full output is too long to include here, roughly 140,000+ characters; message me privately if you would like the complete results]
