C# Parallel Programming: Running the MapReduce Algorithm with PLINQ

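Basic Information

MapReduce (map and reduce), also written Map/Reduce or Map & Reduce, processes large data sets by making full use of parallelism. The basic idea is to break a data-processing problem into two independent operations that can each run in parallel: Map and Reduce.

Map: operates on the data source and computes a key for every data item. The result is a collection of key-value pairs, grouped by key.
Reduce: operates on the key-value pairs produced by Map, applies a reduction to each group, and returns one or more result values.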
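
As a rough illustration of the two phases (not taken from the original article), the sketch below groups a few integers by their remainder modulo 3 in the Map step and sums each group in the Reduce step. The class `MapReduceSketch`, the method `SumByRemainder` and the sample data are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceSketch
{
    // Hypothetical helper: returns remainder -> sum of the numbers with that remainder.
    public static Dictionary<int, int> SumByRemainder(IEnumerable<int> numbers)
    {
        // Map: compute a key for every item (n % 3) and group the items by that key.
        ILookup<int, int> map = numbers.AsParallel().ToLookup(n => n % 3, n => n);

        // Reduce: collapse every group into a single value (here, the sum).
        return map.AsParallel()
                  .Select(g => new { g.Key, Sum = g.Sum() })
                  .ToDictionary(x => x.Key, x => x.Sum);
    }

    public static void Main()
    {
        Dictionary<int, int> sums = SumByRemainder(new[] { 1, 2, 3, 4, 5, 6 });

        // Sort by key so the output order does not depend on parallel scheduling.
        foreach (var pair in sums.OrderBy(p => p.Key))
            Console.WriteLine("Remainder {0}: sum {1}", pair.Key, pair.Value);
    }
}
```

For the input 1 to 6 the groups are {3, 6}, {1, 4} and {2, 5}, so the sums are 9, 5 and 7. With so little data the parallel version brings no speed-up; it only shows the shape of the two phases.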


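Sample program:
Given a piece of text, count how many times each word occurs. The data set is tiny; it only demonstrates how PLINQ carries out MapReduce.

Step 1: Build key-value pairs, each made up of a word and the number 1. This uses the ILookup<> interface.
For details, see: https://msdn.microsoft.com/zh-cn/library/bb534291(v=VS.100).aspx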
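
To make the role of ILookup<> more concrete, here is a minimal sketch of how such a lookup behaves once it is built. The class `LookupSketch`, the method `BuildLookup` and the sample words are hypothetical and not part of the original program.

```csharp
using System;
using System.Linq;

public static class LookupSketch
{
    // Hypothetical helper: maps every word to a sequence of 1s, one per occurrence.
    public static ILookup<string, int> BuildLookup(string[] words)
    {
        return words.AsParallel().ToLookup(word => word, word => 1);
    }

    public static void Main()
    {
        ILookup<string, int> lookup = BuildLookup(new[] { "a", "b", "a", "c", "a" });

        // One key can hold several values: "a" was seen three times.
        Console.WriteLine(lookup["a"].Count());   // 3
        // Count is the number of distinct keys, not the number of input items.
        Console.WriteLine(lookup.Count);          // 3
        // Indexing a missing key yields an empty sequence instead of throwing.
        Console.WriteLine(lookup["z"].Any());     // False
        Console.WriteLine(lookup.Contains("b"));  // True
    }
}
```

Unlike a dictionary, a lookup stores a whole group of values under each key, which is why counting the elements of a group gives the number of occurrences of that word.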

Step 2: Group the key-value pairs and run a select over the groups, keeping only the words whose Count is greater than 1.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Sample6_7_plinq_mapreduce
{
    class Program
    {
        public static string strTarget = @"English is a West Germanic language that was first spoken in 
early medieval England and is now a global lingua franca. 
It is an official language of almost 60 sovereign states, the most 
commonly spoken language in the United Kingdom, the United States, 
Canada, Australia, Ireland, and New Zealand, and a widely spoken 
language in countries in the Caribbean, Africa, and southeast Asia.
It is the third most common native language in the world, after Mandarin and Spanish.
It is widely learned as a second language and is an official language
of the United Nations, of the European Union, and of many other 
world and regional international organisations.";

        static void Main(string[] args)
        {
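            // Split on whitespace and punctuation. The verbatim string contains line breaks,
            // so splitting on ' ' alone would leave them (and commas, periods) attached to words.
            char[] separators = { ' ', '\r', '\n', ',', '.' };
            string[] words = strTarget.Split(separators, StringSplitOptions.RemoveEmptyEntries);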

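            // Map: pair every word with the value 1, grouped by word.
            ILookup<string, int> map = words.AsParallel().ToLookup(p => p, k => 1);

            // Reduce: count each group and keep the words that occur more than once.
            var reduce = from IGrouping<string, int> wordMap
                         in map.AsParallel()
                         where wordMap.Count() > 1
                         select new { Word = wordMap.Key, Count = wordMap.Count() };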

            foreach (var word in reduce)
                Console.WriteLine("Word: '{0}' : Count: {1}", word.Word, word.Count);

            Console.ReadLine();
        }
    }
}
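
The same count can also be written as a single PLINQ pipeline with GroupBy, which folds the grouping and the filter into one query. The sketch below is an alternative to the listing above, not the article's method. The class `GroupBySketch`, the method `CountRepeatedWords`, the separator set and the case-insensitive comparison are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupBySketch
{
    // Hypothetical helper: returns word -> count for every word seen more than once.
    public static Dictionary<string, int> CountRepeatedWords(string text)
    {
        // Assumed separator set: whitespace plus comma and period.
        char[] separators = { ' ', '\r', '\n', '\t', ',', '.' };

        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                   .AsParallel()
                   // Map: group identical words, ignoring case (an assumption).
                   .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
                   // Reduce: turn every group into a (word, count) pair.
                   .Select(g => new { Word = g.Key, Count = g.Count() })
                   .Where(x => x.Count > 1)
                   .ToDictionary(x => x.Word, x => x.Count, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Dictionary<string, int> counts = CountRepeatedWords("It is what it is.");

        // Sort by word so the output order does not depend on parallel scheduling.
        foreach (var pair in counts.OrderBy(p => p.Key, StringComparer.OrdinalIgnoreCase))
            Console.WriteLine("Word: '{0}' : Count: {1}", pair.Key, pair.Value);
    }
}
```

Because the grouping ignores case, "It" and "it" land in the same group; which spelling ends up as the group key may vary between runs, so the returned dictionary uses the same case-insensitive comparer for lookups.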
