The two roles of volatile

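volatile serves two purposes. These are the guarantees of Java's volatile. C++'s volatile only stops the compiler from eliding or caching accesses, so in C++ visibility and ordering between threads require std::atomic: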

  1. Thread visibility
  2. Memory barriers that prevent instruction reordering

volatile and thread visibility

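  1. A write is flushed back to system memory immediately.
  2. After the write, the other CPUs are told that their cached copies are stale. The cache line holding the variable is invalidated in their L1/L2/L3 caches, so their next read must fetch it again. On cache-coherent CPUs such as x86, this invalidation is handled by the coherence protocol (e.g. MESI), and the fresh line may come from the writing core's cache rather than from main memory.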

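volatile does not make i++ atomic.
For volatile int i;, i++ is really three separate steps. Each step is atomic on its own, but the sequence as a whole is not:
1) load i from memory into a register
2) increment the register by 1
3) write i back to system memory, invalidating other CPUs' cached copies
Another thread can run between any two of these steps. Two concurrent increments can therefore both read the same old value, and one of the updates is lost.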

volatile and cache lines

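To guarantee visibility, every write to a volatile variable is written back to memory and invalidates the other CPUs' cached copies. Suppose another CPU's thread has cached a different variable that happens to sit in the same cache line. That thread must then re-read the line from memory even though its own variable never changed. This is known as false sharing, and it inevitably hurts performance. How can it be avoided?
A cache line is typically 64 bytes (for example, on x86), so two volatile variables in the same cache line will inevitably slow each other down.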

// Effect of cache lines on volatile variables

#include <chrono>
#include <iostream>
#include <thread>

using namespace std;
using namespace std::chrono; // for system_clock and duration

struct T
{
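    // Pad x with 7 longs before and 7 after (56 bytes each side, assuming an 8-byte long),
    // so no other hot variable can share x's 64-byte cache line.
    // Comment out x1~x14 and compare the timing.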

    volatile long x1 = 0L;
    volatile long x2 = 0L;
    volatile long x3 = 0L;
    volatile long x4 = 0L;
    volatile long x5 = 0L;
    volatile long x6 = 0L;
    volatile long x7 = 0L;

    volatile long x = 0L;

    volatile long x8 = 0L;
    volatile long x9 = 0L;
    volatile long x10 = 0L;
    volatile long x11 = 0L;
    volatile long x12 = 0L;
    volatile long x13 = 0L;
    volatile long x14 = 0L;
};

const long count = 100000000L; // iterations per thread
static T ts[2];
int main()
{

    ts[0].x = 0L;
    ts[1].x = 0L;

    auto beg_t = system_clock::now(); // start time

    auto t1 = thread([]() {
        int i = 0;
        while (i < count)
        {
            ts[0].x++;
            i++;
        }
    });

    auto t2 = thread([]() {
        int i = 0;
        while (i < count)
        {
            ts[1].x++;
            i++;
        }
    });

    t1.join();
    t2.join();

    auto end_t = system_clock::now(); // end time
    duration<double> diff = end_t - beg_t;
    cout << "performTest total time:" << diff.count() << endl;
}

At -O0, the padded version ran more than twice as fast as the unpadded one.


The effect is even larger in Java:

import java.util.concurrent.CountDownLatch;

public class Deprecated {

    public static long COUNT = 100_000_000L;

    private static class T {
        // Uncomment x11~x17 and x1~x7 to pad x onto its own cache line.
        // public volatile long x17 = 0L;
        // public volatile long x16 = 0L;
        // public volatile long x15 = 0L;
        // public volatile long x14 = 0L;
        // public volatile long x13 = 0L;
        // public volatile long x12 = 0L;
        // public volatile long x11 = 0L;
        public volatile long x = 0L;
        // public volatile long x1 = 0L;
        // public volatile long x2 = 0L;
        // public volatile long x3 = 0L;
        // public volatile long x4 = 0L;
        // public volatile long x5 = 0L;
        // public volatile long x6 = 0L;
        // public volatile long x7 = 0L;
    }

    public static T[] arr = new T[2];

    static {
        arr[0] = new T();
        arr[1] = new T();
    }

    public static void main(String[] args) throws Exception {

        CountDownLatch latch = new CountDownLatch(2);

        Thread t1 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[0].x = i;
            }

            latch.countDown();
        });

        Thread t2 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[1].x = i;
            }

            latch.countDown();
        });

        final long start = System.nanoTime();
        t1.start();
        t2.start();
        latch.await();

        final long end = System.nanoTime();
        System.out.println((end - start) / 1_000_000); // elapsed time in milliseconds
    }
}

With the padding fields uncommented, it ran more than 100 times faster.

volatile and instruction reordering

Consider the following code:

// main.cpp
#include <iostream>
#include <thread>

using namespace std;

int main()
{
    static int a = 0;
    static int b = 0;
    static int y = 0;
    static int x = 0;
    static int j = 0;

    while (true)
    {
        a = b = x = y = 0;

        auto t1 = thread([]() {
            a = 1; // L1
            x = b; // L2
        });

        auto t2 = thread([]() {
            b = 1; // L3
            y = a; // L4
        });

        t1.join();
        t2.join();

        j++;

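        // Check the values of x and y
        /*
In a normal execution, what (x, y) outputs are possible?

Execution orders (L1 before L2, L3 before L4):
1. L1-L2-L3-L4 : 0,1
2. L3-L4-L1-L2 : 1,0
3. L1-L3-L2-L4 : 1,1
4. L1-L3-L4-L2 : 1,1
...

In short, at least one of x and y is 1.

For x and y to both be 0, the execution order would have to satisfy both:
1. L2 runs before L3, so x = 0
2. L4 runs before L1, so y = 0

But program order requires L1 before L2 and L3 before L4,
which would give L1 < L2 < L3 < L4 < L1, a contradiction.

So if x = y = 0 ever occurs, instructions must have been reordered.
        */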

        if (x == 0 && y == 0)
        {
            cout << "Reordering observed on iteration " << j << endl;
            break;
        }
    }
}

Compile with:

g++ -std=c++11 main.cpp -lpthread -O2

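When run, the program does eventually print the message, so reordering really happens. On x86, each thread's store can wait in the core's store buffer while its following load already reads the other variable (StoreLoad reordering). The compiler may also swap the two statements, since plain int accesses carry no cross-thread ordering guarantees.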


The Java version is below. In Java, declaring a and b as volatile rules out the x == y == 0 outcome.


public class barrier {

    private static int a = 0;
    private static int b = 0;
    private static int x = 0;
    private static int y = 0;

    public static void main(String[] args) throws InterruptedException{

        int i = 0;
        while (true) {
            i++;
            x = y = a = b = 0;

            Thread t1 = new Thread(new Runnable() {

                @Override
                public void run() {
                    a = 1;
                    x = b;
                }
            });

            Thread t2 = new Thread(new Runnable() {

                @Override
                public void run() {
                    b = 1;
                    y = a;
                }
            });

            t1.start();
            t2.start();
            t1.join();
            t2.join();

            if (x == 0 && y == 0) {
                System.out.println("Reordering observed on iteration " + i);
                break;
            }
        }
    }
}
