Daily C++ Mini-Program Study · 1 · 2023.7.21

 

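Today: parallelism with C++ threads. We query how many threads the hardware can actually run concurrently, split the task into pieces, run the pieces in parallel, and then merge the results.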

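std::thread::hardware_concurrency() returns the number of threads the hardware can truly run concurrently (for example, the number of CPU cores or hardware threads). It returns 0 when this information cannot be obtained. The standard treats the value only as a hint, but it is a useful guide when deciding how many threads to split a task across.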
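Because the result can be 0, callers usually substitute a fallback. As a minimal sketch, the same fallback that parallel_accumulate uses later can be wrapped in a helper (usable_thread_count is a hypothetical name, not a standard library function):

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: returns the hardware hint, or `fallback` when the
// implementation reports 0 (the information is unavailable).
unsigned int usable_thread_count(unsigned int fallback = 2) {
    assert(fallback > 0);  // a thread count of 0 would be meaningless
    unsigned int const hint = std::thread::hardware_concurrency();
    return hint != 0 ? hint : fallback;
}
```

The fallback of 2 matches the choice made in the code below; any small positive number would work.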

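Below is a simple parallel version of std::accumulate. It derives the number of threads to actually use from the hardware thread count, splits the input range into one block per thread, and finally adds the partial results together.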
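The partition arithmetic can be checked on its own. The sketch below assumes the same rules as the implementation that follows: at least 25 elements per thread, 2 threads when the hardware hint is 0, and the last block absorbing the remainder. The names make_plan and partition_plan are hypothetical and used only for illustration. The max_threads line is ceiling division: for integers x ≥ 0 and k ≥ 1, (x + k - 1) / k equals ⌈x / k⌉.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical summary of how a range is split (mirrors parallel_accumulate).
struct partition_plan {
    unsigned long num_threads;  // threads doing work, including the caller
    unsigned long block_size;   // elements in every block except the last
    unsigned long last_block;   // elements in the last block (gets the remainder)
};

partition_plan make_plan(unsigned long length, unsigned long hardware_threads,
                         unsigned long min_per_thread = 25) {
    assert(length > 0 && min_per_thread > 0);
    // Ceiling division: how many blocks of min_per_thread are needed.
    unsigned long const max_threads =
        (length + min_per_thread - 1) / min_per_thread;
    // Use the hardware hint (or 2 if it is 0), but never more than max_threads.
    unsigned long const num_threads =
        std::min(hardware_threads != 0 ? hardware_threads : 2UL, max_threads);
    unsigned long const block_size = length / num_threads;
    return {num_threads, block_size, length - block_size * (num_threads - 1)};
}
```

For example, with 101 elements and 8 hardware threads, max_threads is (101 + 24) / 25 = 5, so 5 threads are used: four blocks of 20 elements and a final block of 21.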

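```cpp
#include <algorithm>   // std::min, std::for_each
#include <functional>  // std::ref, std::mem_fn
#include <iostream>
#include <iterator>    // std::distance, std::advance
#include <numeric>     // std::accumulate, std::iota
#include <thread>
#include <vector>

// Sub-task run by each thread: accumulates [first, last) into result
template <typename Iterator, typename T>
struct accumulate_block {
    void operator()(Iterator first, Iterator last, T& result) {
        result = std::accumulate(first, last, result);
    }
};

template <typename Iterator, typename T>
T parallel_accumulate(Iterator first, Iterator last, T init) {
    unsigned long const length = std::distance(first, last);  // number of elements in the range
    if (!length) return init;
    // Each thread handles at least 25 elements
    unsigned long const min_per_thread = 25;
    unsigned long const max_threads =
        (length + min_per_thread - 1) / min_per_thread;  // ceiling division: f(x) = (x + k - 1) / k in integer arithmetic
    unsigned long const hardware_threads = std::thread::hardware_concurrency();
    // Fall back to 2 threads when the hardware thread count is unavailable (0)
    unsigned long const num_threads =
        std::min(hardware_threads != 0 ? hardware_threads : 2, max_threads);
    unsigned long const block_size = length / num_threads;

    // Partial results, value-initialized to T() (0 for arithmetic types)
    std::vector<T> results(num_threads);
    // Create num_threads - 1 threads, because the current thread also computes a block
    std::vector<std::thread> threads(num_threads - 1);

    Iterator block_start = first;
    // Launch the worker threads, binding the function object and its arguments
    for (unsigned long i = 0; i < (num_threads - 1); ++i) {
        Iterator block_end = block_start;
        std::advance(block_end, block_size);
        threads[i] = std::thread(accumulate_block<Iterator, T>(), block_start,
                                 block_end, std::ref(results[i]));
        block_start = block_end;
    }
    // The last block (including any remainder) is computed in the current thread
    accumulate_block<Iterator, T>()(block_start, last,
                                    results[num_threads - 1]);
    // Wait for every worker thread to finish
    std::for_each(threads.begin(), threads.end(),
                  std::mem_fn(&std::thread::join));
    // Sum all partial results, starting from init
    return std::accumulate(results.begin(), results.end(), init);
}

// Example: sum 1..1000 in parallel
int main() {
    std::vector<int> v(1000);
    std::iota(v.begin(), v.end(), 1);
    std::cout << parallel_accumulate(v.begin(), v.end(), 0) << '\n';  // prints 500500
}
```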
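One detail worth isolating is std::ref(results[i]) in the launch loop. std::thread copies (decays) its arguments into the new thread, so a function taking a non-const reference would otherwise be handed a temporary; passing the variable without std::ref does not even compile. A minimal sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Writes the sum of [0, n) into `out`; takes `out` by non-const reference,
// just like accumulate_block::operator() does with `result`.
void sum_below(int n, long& out) {
    assert(n >= 0);
    out = 0;
    for (int i = 0; i < n; ++i) out += i;
}

// Runs sum_below on a separate thread and returns what it wrote.
long sum_below_on_thread(int n) {
    long result = -1;
    // std::ref wraps `result` so the worker writes to the caller's variable
    // rather than to a copy owned by the thread.
    std::thread worker(sum_below, n, std::ref(result));
    worker.join();  // wait for the worker before reading `result`
    return result;
}
```

Calling join() before reading the result matters too: the thread's completion synchronizes with the return from join(), which is what makes its write visible to the caller. That is the role of the std::mem_fn(&std::thread::join) loop in parallel_accumulate.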
