Using per_cpu variables

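The idea behind per_cpu is that each CPU gets its own copy of the variable, and each CPU normally reads and writes only its own copy. This avoids lock overhead (including the context switches that lock contention can cause) and the cache misses that occur when several CPUs keep writing the same cache line. In general, it is best to declare a per_cpu struct as cache-line aligned, e.g.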

struct percpu_stat {
  u64 a;
  u64 b;
} ____cacheline_aligned;
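To show what the alignment buys, here is a rough userspace model, not kernel code. It assumes 64-byte cache lines and uses the GCC/Clang `aligned` attribute in place of `____cacheline_aligned` (which the kernel defines in terms of `SMP_CACHE_BYTES`). The array `sim_stats` and the helper `sim_stat_inc` are hypothetical and stand in for "one slot per CPU":

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the kernel's ____cacheline_aligned
 * uses the architecture's real SMP_CACHE_BYTES instead. */
#define SIM_CACHELINE_BYTES 64
#define SIM_NR_CPUS 4

/* Userspace stand-in: the GCC/Clang attribute plays the role of
 * ____cacheline_aligned, and uint64_t replaces the kernel's u64. */
struct percpu_stat {
    uint64_t a;
    uint64_t b;
} __attribute__((aligned(SIM_CACHELINE_BYTES)));

/* Hypothetical: one slot per simulated CPU. Each element starts on its
 * own cache line and sizeof() is rounded up to a whole line, so two
 * CPUs' slots never share a line (no false sharing). */
static struct percpu_stat sim_stats[SIM_NR_CPUS];

/* Each CPU bumps only its own slot. */
static void sim_stat_inc(int cpu)
{
    sim_stats[cpu].a++;
}

static_assert(sizeof(struct percpu_stat) % SIM_CACHELINE_BYTES == 0,
              "struct is padded to whole cache lines");
```

Without the attribute, the struct is 16 bytes, so four CPUs' slots would fit in a single 64-byte line and every increment would bounce that line between CPUs.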


The first kind is the statically defined (global) per_cpu variable. I excerpted the relevant kernel code below:

DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };

__get_cpu_var(netdev_rx_stat).received_rps++;
__get_cpu_var(netdev_rx_stat).total++;
__get_cpu_var(netdev_rx_stat).dropped++;
__get_cpu_var(netdev_rx_stat).time_squeeze++;

static struct netif_rx_stats *softnet_get_online(loff_t *pos)
{
    struct netif_rx_stats *rc = NULL;

    while (*pos < nr_cpu_ids)
        if (cpu_online(*pos)) {
            rc = &per_cpu(netdev_rx_stat, *pos);
            break;
        } else
            ++*pos;
    return rc;
}
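As you can see, a per_cpu variable defined with DEFINE_PER_CPU is usually accessed in one of two ways. __get_cpu_var(var) yields the current CPU's copy. per_cpu(var, cpu) yields a given CPU's copy, where var is the name of the per_cpu variable (not its type) and cpu is the CPU index. __get_cpu_var does not disable preemption itself, so the caller must already be in a context where it cannot migrate to another CPU, e.g. with preemption or interrupts disabled. Newer kernels have removed __get_cpu_var in favour of this_cpu_ptr() and operations such as this_cpu_inc().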
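The two access paths can be modelled in userspace as follows. This is a sketch under stated assumptions, not how the kernel lays out per-CPU areas. A plain array indexed by CPU number stands in for DEFINE_PER_CPU. The names `netdev_rx_stat_sim`, `sim_per_cpu`, `sim_cpu_online` and `sim_get_online` are all hypothetical, and the struct keeps only two illustrative fields. `sim_get_online` follows the same walk as `softnet_get_online()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPU_IDS 4   /* hypothetical stand-in for nr_cpu_ids */

struct netif_rx_stats {
    unsigned int total;
    unsigned int dropped;
};

/* Models DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat):
 * one copy per CPU, zero-initialized like "= { 0, }". */
static struct netif_rx_stats netdev_rx_stat_sim[SIM_NR_CPU_IDS];

/* Models per_cpu(var, cpu): the copy belonging to a given CPU. */
#define sim_per_cpu(var, cpu) (var##_sim[(cpu)])

/* Hypothetical online mask: CPU 1 is offline in this example. */
static const bool sim_cpu_online[SIM_NR_CPU_IDS] = { true, false, true, true };

/* Same walk as softnet_get_online(): advance *pos past offline CPUs and
 * return the first online CPU's copy, or NULL when past the last CPU. */
static struct netif_rx_stats *sim_get_online(long long *pos)
{
    while (*pos < SIM_NR_CPU_IDS) {
        if (sim_cpu_online[*pos])
            return &sim_per_cpu(netdev_rx_stat, *pos);
        ++*pos;
    }
    return NULL;
}
```

In the kernel, `__get_cpu_var(netdev_rx_stat).total++` corresponds to `sim_per_cpu(netdev_rx_stat, <current cpu>).total++` here. The current CPU is implicit there, and explicit in this model.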

The other kind is the dynamically allocated per_cpu variable. I excerpted the relevant Open vSwitch kernel code below:

vport->percpu_stats = alloc_percpu(struct vport_percpu_stats);

free_percpu(vport->percpu_stats);

struct vport {
    struct rcu_head rcu;
    u16 port_no;
    struct datapath *dp;
    struct list_head node;
    u32 upcall_pid;

    struct hlist_node hash_node;
    const struct vport_ops *ops;

    struct vport_percpu_stats __percpu *percpu_stats;

    spinlock_t stats_lock;
    struct vport_err_stats err_stats;
};

/* excerpt: add every CPU's copy into the totals pointed to by stats */
for_each_possible_cpu(i) {
    const struct vport_percpu_stats *percpu_stats;
    struct vport_percpu_stats local_stats;
    unsigned int start;

    percpu_stats = per_cpu_ptr(vport->percpu_stats, i);

    /* re-read the snapshot if the owning CPU updated it meanwhile */
    do {
        start = u64_stats_fetch_begin_bh(&percpu_stats->sync);
        local_stats = *percpu_stats;
    } while (u64_stats_fetch_retry_bh(&percpu_stats->sync, start));

    stats->rx_bytes     += local_stats.rx_bytes;
    stats->rx_packets   += local_stats.rx_packets;
    stats->tx_bytes     += local_stats.tx_bytes;
    stats->tx_packets   += local_stats.tx_packets;
}

void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
{
    struct vport_percpu_stats *stats;

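    /* this CPU's copy, same as this_cpu_ptr(vport->percpu_stats);
       only valid while the task cannot migrate to another CPU */
    stats = per_cpu_ptr(vport->percpu_stats, smp_processor_id());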

    u64_stats_update_begin(&stats->sync);
    stats->rx_packets++;
    stats->rx_bytes += skb->len;
    u64_stats_update_end(&stats->sync);

    ovs_dp_process_received_packet(vport, skb);
}
This kind of per_cpu variable must be created dynamically with alloc_percpu and released with free_percpu. To access it, use per_cpu_ptr to get a pointer to a given CPU's copy.
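The alloc/access/free lifecycle can be sketched in userspace as follows. This is a simplified model, not the kernel allocator. The real alloc_percpu() returns a `__percpu` cookie rather than a plain pointer, and that cookie must go through per_cpu_ptr()/this_cpu_ptr(). The names `sim_alloc_percpu`, `sim_per_cpu_ptr` and `sim_free_percpu`, the CPU count, and the 64-byte padding are assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIM_CACHELINE_BYTES 64   /* assumed cache-line size */
#define SIM_NR_CPUS 4            /* hypothetical number of possible CPUs */

struct vport_percpu_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
};

/* Hypothetical handle: one zeroed, cache-line-padded copy per CPU. */
struct sim_percpu {
    size_t stride;          /* bytes between consecutive CPUs' copies */
    unsigned char *base;
};

/* Models alloc_percpu(type): zero-filled storage for every possible CPU. */
static struct sim_percpu *sim_alloc_percpu(size_t size)
{
    struct sim_percpu *p = malloc(sizeof(*p));
    if (!p)
        return NULL;
    /* round each copy up to whole cache lines */
    p->stride = (size + SIM_CACHELINE_BYTES - 1) / SIM_CACHELINE_BYTES
                * SIM_CACHELINE_BYTES;
    p->base = aligned_alloc(SIM_CACHELINE_BYTES, p->stride * SIM_NR_CPUS);
    if (!p->base) {
        free(p);
        return NULL;
    }
    memset(p->base, 0, p->stride * SIM_NR_CPUS);
    return p;
}

/* Models per_cpu_ptr(ptr, cpu): address of that CPU's copy. */
static void *sim_per_cpu_ptr(struct sim_percpu *p, int cpu)
{
    return p->base + (size_t)cpu * p->stride;
}

/* Models free_percpu(ptr): releases all CPUs' copies at once. */
static void sim_free_percpu(struct sim_percpu *p)
{
    if (!p)
        return;
    free(p->base);
    free(p);
}
```

As in the kernel, one allocation covers every CPU. Writing through one CPU's pointer leaves the other CPUs' copies untouched.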

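Note that each CPU normally updates only its own copy. per_cpu_ptr(ptr, smp_processor_id()), or this_cpu_ptr(ptr), returns the local CPU's copy, whereas per_cpu_ptr(ptr, cpu) can return any CPU's copy. To get a total across all CPUs, iterate over every CPU and add up the copies, as the for_each_possible_cpu loop above does. Iterating over possible CPUs rather than online ones ensures that counts left on a CPU that has since gone offline are still included.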
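The summing loop, together with the u64_stats retry pattern it relies on, can be modelled in userspace. The kernel's u64_stats_sync only needs the sequence counter on 32-bit machines, where a 64-bit counter cannot be read in one instruction. This sketch uses a C11-atomics seqlock in the same spirit and is not the kernel implementation: the counter is odd while the owning CPU is mid-update, and a reader retries if it saw an odd value or the counter changed. `struct sim_stats`, `sim_stats_add` and `sim_stats_sum` are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SIM_NR_CPUS 4   /* hypothetical number of possible CPUs */

/* Fields are atomics accessed with relaxed ordering, so the model is
 * free of data races under C11 even with concurrent readers. */
struct sim_stats {
    atomic_uint seq;                 /* odd while an update is in progress */
    _Atomic uint64_t rx_packets;
    _Atomic uint64_t rx_bytes;
};

static struct sim_stats sim_percpu_stats[SIM_NR_CPUS];

/* Writer: only the owning CPU updates its own slot (like the
 * u64_stats_update_begin/end pair in ovs_vport_receive). */
static void sim_stats_add(struct sim_stats *s, uint64_t bytes)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed); /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->rx_packets,
        atomic_load_explicit(&s->rx_packets, memory_order_relaxed) + 1,
        memory_order_relaxed);
    atomic_store_explicit(&s->rx_bytes,
        atomic_load_explicit(&s->rx_bytes, memory_order_relaxed) + bytes,
        memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release); /* even */
}

/* Reader: visit every possible CPU, taking a consistent snapshot of each
 * slot (retrying on a concurrent update), and accumulate the totals. */
static void sim_stats_sum(uint64_t *packets, uint64_t *bytes)
{
    *packets = 0;
    *bytes = 0;
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
        struct sim_stats *s = &sim_percpu_stats[cpu];
        unsigned int start;
        uint64_t p, b;

        do {
            start = atomic_load_explicit(&s->seq, memory_order_acquire);
            p = atomic_load_explicit(&s->rx_packets, memory_order_relaxed);
            b = atomic_load_explicit(&s->rx_bytes, memory_order_relaxed);
            atomic_thread_fence(memory_order_acquire);
        } while ((start & 1) ||
                 atomic_load_explicit(&s->seq, memory_order_relaxed) != start);

        *packets += p;
        *bytes += b;
    }
}
```

Readers never block the writer. At worst a reader loops once more, which fits statistics that are written on every packet but read rarely.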




