Getting Started with Azure Storm (Part 1)



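This post records my experience of using Azure HDInsight Storm.

1. First, you need an Azure account. Sign in to the Azure portal and create an HDInsight Storm cluster, as shown below. Enter the required information. Note that a storage account must also be specified; if you do not have one, create it. HDInsight uses blob storage as its default file system, and as we will see later, the related logs are also stored in the storage account. Click "Create HDInsight cluster" and the cluster is created in the background. The whole process takes about 15 minutes.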

[Image 1]

2. Install one of the following versions of Visual Studio:

(1) Visual Studio 2012 Update 4

(2) Visual Studio 2013 Update 4 or Visual Studio 2013 Community

(3) Visual Studio 2015 or Visual Studio 2015 Community


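3. Install the latest version of the Azure SDK from the official download page, making sure it matches your language and development tools.

After the Azure SDK is installed, HDInsight should appear in Server Explorer on the left side of VS.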

[Image 2]



[Image 3]


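(1) Program.cs: defines the topology for the project. Note that by default a topology containing one Spout and one Bolt is created.

(2) Spout.cs: a sample Spout that emits random numbers.

(3) Bolt.cs: a sample Bolt that keeps a count of the numbers emitted by the Spout.

While the project is being created, the latest SCP.NET package is downloaded from NuGet. (If it is not, later steps will warn that your SCP package needs updating; you can update it through the NuGet package manager.)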

[Image 4]



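(1) NextTuple: called by Storm whenever the Spout is allowed to emit new tuples. In plain terms, Storm itself calls this code; it is the entry point of the program, and it is invoked over and over in a loop.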




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Threading;
using Microsoft.SCP;
using Microsoft.SCP.Rpc.Generated;

// Reads data from an external source into the topology.
namespace StormTestWordCount
{
    public class Spout : ISCPSpout
    {
        private Context ctx;
        private Random r = new Random();
        string[] sentences = new string[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature"
        };

        public Spout(Context ctx)
        {
            // Set the instance context
            this.ctx = ctx;
            Context.Logger.Info("StormTestWordCount, Spout constructor called");
            // Declare the output schema
            Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
            // The schema for the default output stream is
            // a tuple that contains a string field
            outputSchema.Add("default", new List<Type>() { typeof(string) });
            this.ctx.DeclareComponentSchema(new ComponentStreamSchema(null, outputSchema));
        }

        // Get an instance of the spout
        public static Spout Get(Context ctx, Dictionary<string, Object> parms)
        {
            return new Spout(ctx);
        }

        // Called by Storm
        public void NextTuple(Dictionary<string, Object> parms)
        {
            Context.Logger.Info("StormTestWordCount, Spout NextTuple enter");
            // Throttle the spout: wait one minute between sentences
            System.Threading.Thread.Sleep(1000 * 60);
            // The sentence to be emitted
            string sentence;
            // Get a random sentence. The upper bound of Random.Next is exclusive,
            // so pass sentences.Length to make the last sentence reachable.
            sentence = sentences[r.Next(0, sentences.Length)];
            Context.Logger.Info("StormTestWordCount, Spout Emit: {0}", sentence);
            // Emit it
            ctx.Emit(new Values(sentence));
            Context.Logger.Info("StormTestWordCount, Spout NextTuple exit");
        }

        public void Ack(long seqId, Dictionary<string, Object> parms)
        {
            // only used for transactional topologies
        }

        public void Fail(long seqId, Dictionary<string, Object> parms)
        {
            // only used for transactional topologies
        }
    }
}

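8. Delete the existing Bolt.cs file from the project. In Solution Explorer, right-click > Add > New Item. Select "Storm Bolt" from the list and name it "Splitter.cs". In the same way, create a second Bolt named "Counter.cs".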




using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Threading;
using Microsoft.SCP;
using Microsoft.SCP.Rpc.Generated;

// Implements a Bolt that splits each sentence into words and emits a stream of words.
namespace StormTestWordCount
{
    public class Splitter : ISCPBolt
    {
        private Context ctx;

        public Splitter(Context ctx)
        {
            Context.Logger.Info("StormTestWordCount, Splitter constructor called");
            this.ctx = ctx;
            // Declare input and output schemas
            Dictionary<string, List<Type>> inputSchema = new Dictionary<string, List<Type>>();
            // Input contains a tuple with a string field (the sentence)
            inputSchema.Add("default", new List<Type>() { typeof(string) });
            Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
            // Output contains a tuple with a string field (the word)
            outputSchema.Add("default", new List<Type>() { typeof(string) });
            this.ctx.DeclareComponentSchema(new ComponentStreamSchema(inputSchema, outputSchema));
        }

        // Get a new instance of the bolt
        public static Splitter Get(Context ctx, Dictionary<string, Object> parms)
        {
            return new Splitter(ctx);
        }

        // Called when the Bolt receives a tuple to process. Here you can read and
        // process the incoming tuple and emit outgoing tuples.
        public void Execute(SCPTuple tuple)
        {
            Context.Logger.Info("StormTestWordCount, Splitter Execute enter");
            // Get the sentence from the tuple
            string sentence = tuple.GetString(0);
            // Split at space characters
            foreach (var word in sentence.Split(' '))
            {
                Context.Logger.Info("StormTestWordCount, Splitter Emit: {0}", word);
                // Emit each word
                this.ctx.Emit(new Values(word));
            }
            Context.Logger.Info("StormTestWordCount, Splitter Execute exit");
        }
    }
}

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Threading;
using Microsoft.SCP;
using Microsoft.SCP.Rpc.Generated;

// Implements a Bolt that counts the occurrences of each word and emits each word with its count.
namespace StormTestWordCount
{
    public class Counter : ISCPBolt
    {
        private Context ctx;
        // Dictionary for holding words and counts
        private Dictionary<string, int> counts = new Dictionary<string, int>();

        // Constructor
        public Counter(Context ctx)
        {
            Context.Logger.Info("StormTestWordCount, Counter constructor called");
            // Set the instance context
            this.ctx = ctx;
            // Declare input and output schemas
            Dictionary<string, List<Type>> inputSchema = new Dictionary<string, List<Type>>();
            // A tuple containing a string field: the word
            inputSchema.Add("default", new List<Type>() { typeof(string) });
            Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
            // A tuple containing a string and an integer field: the word and the word count
            outputSchema.Add("default", new List<Type>() { typeof(string), typeof(int) });
            this.ctx.DeclareComponentSchema(new ComponentStreamSchema(inputSchema, outputSchema));
        }

        // Get a new instance
        public static Counter Get(Context ctx, Dictionary<string, Object> parms)
        {
            return new Counter(ctx);
        }

        // Called when a new tuple is available
        public void Execute(SCPTuple tuple)
        {
            Context.Logger.Info("StormTestWordCount, Counter Execute enter");
            // Get the word from the tuple
            string word = tuple.GetString(0);
            // Look up the previous count (0 if the word is new), then add this occurrence
            int count = counts.ContainsKey(word) ? counts[word] : 0;
            count++;
            counts[word] = count;
            Context.Logger.Info("StormTestWordCount, Counter Emit: {0}, count: {1}", word, count);
            // Emit the word and count information
            this.ctx.Emit(Constants.DEFAULT_STREAM_ID, new List<SCPTuple> { tuple }, new Values(word, count));
            Context.Logger.Info("StormTestWordCount, Counter Execute exit");
        }
    }
}


[Image 5]

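Sentences are emitted from the Spout and distributed to instances of the Splitter Bolt. The Splitter Bolt splits each sentence into words and distributes those words to the Counter Bolt.

Because word counts are held locally in each Counter instance, we want to make sure that a given word always flows to the same Counter Bolt instance, so that exactly one instance tracks that word (that is, the count for a given word lives in a single Counter Bolt). For the Splitter Bolt, however, it does not matter which instance receives which sentence, so we only need to load-balance sentences across those instances.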
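The routing difference described above can be sketched in plain C#. This is a minimal illustration under stated assumptions, not SCP.NET or Storm code: `GroupingSketch`, `StableHash`, `FieldsGroupingTarget` and `ShuffleGroupingTarget` are hypothetical names, and Storm's real hashing and shuffling are implemented differently. It only shows why hashing on the word keeps each word on one Counter instance, while a shuffle may send a sentence to any Splitter instance.

```csharp
using System;

// Hypothetical helper, not part of SCP.NET or Storm. It only imitates how a
// grouping picks the index of the bolt instance that receives a tuple.
public static class GroupingSketch
{
    // Simple deterministic string hash (assumption: Storm uses its own hashing).
    public static int StableHash(string s)
    {
        unchecked
        {
            int h = 17;
            foreach (char c in s)
            {
                h = h * 31 + c;
            }
            return h & 0x7fffffff; // clear the sign bit so the modulo is non-negative
        }
    }

    // Fields grouping: the target depends only on the field value,
    // so the same word always reaches the same instance.
    public static int FieldsGroupingTarget(string word, int instanceCount)
    {
        return StableHash(word) % instanceCount;
    }

    // Shuffle grouping: any instance may be chosen; this spreads the load.
    public static int ShuffleGroupingTarget(Random rng, int instanceCount)
    {
        return rng.Next(instanceCount); // upper bound is exclusive
    }

    public static void Main()
    {
        var rng = new Random();
        foreach (var word in new[] { "the", "moon", "the", "seven", "the" })
        {
            Console.WriteLine("{0}: fields -> {1}, shuffle -> {2}",
                word,
                FieldsGroupingTarget(word, 4),
                ShuffleGroupingTarget(rng, 4));
        }
    }
}
```

Running the sketch with four instances should print the same fields target for every occurrence of "the", while the shuffle target can change from line to line.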

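Open Program.cs. The important method is GetTopologyBuilder, which returns an ITopologyBuilder and is used to define the topology submitted to Storm. Replace the contents of GetTopologyBuilder with the following code to implement the topology described above.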

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.SCP;
using Microsoft.SCP.Topology;

namespace StormTestWordCount
{
    [Active(true)]
    class Program : TopologyDescriptor
    {
        static void Main(string[] args)
        {
        }

        public ITopologyBuilder GetTopologyBuilder()
        {
            // Create a new topology
            TopologyBuilder topologyBuilder = new TopologyBuilder("StormTestWordCount" + DateTime.Now.ToString("yyyyMMddHHmmss"));
            // Add the spout to the topology.
            // Name the component 'sentences'
            // Name the field that is emitted as 'sentence'
            topologyBuilder.SetSpout(
                "sentences",
                Spout.Get,
                new Dictionary<string, List<string>>()
                {
                    {Constants.DEFAULT_STREAM_ID, new List<string>(){"sentence"}}
                },
                1);
            // Add the splitter bolt to the topology.
            // Name the component 'splitter'
            // Name the field that is emitted 'word'
            // Use shuffleGrouping to distribute incoming tuples from the 'sentences' spout across instances of the splitter
            topologyBuilder.SetBolt(
                "splitter",
                Splitter.Get,
                new Dictionary<string, List<string>>()
                {
                    { Constants.DEFAULT_STREAM_ID, new List<string>() { "word"} }
                },
                1).shuffleGrouping("sentences");    // My reading: input comes from 'sentences'; load balancing is enough
            // Add the counter bolt to the topology.
            // Use fieldsGrouping to ensure that tuples are routed
            //  to counter instances based on the contents of field position 0 (the word).
            //  This could also have been List<string>(){"word"}
            topologyBuilder.SetBolt(
                "counter",
                Counter.Get,
                new Dictionary<string, List<string>>()
                {
                    { Constants.DEFAULT_STREAM_ID, new List<string>() { "word", "count"} }
                },
                1).fieldsGrouping("splitter", new List<int>() { 0 });    // My reading: input comes from 'splitter'; the same word goes to the same Counter
            // Add topology config
            topologyBuilder.SetTopologyConfig(new Dictionary<string, string>()
            {
                {"topology.kryo.register", "[\"[B\"]" }    // Topology configuration setting
            });
            return topologyBuilder;
        }
    }
}


[Image 6]



[Image 7]

[Image 8]

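15. How do we check how the topology is running? Through the logs. Earlier we specified a storage account when creating the cluster, and the messages written by Context.Logger.Info in our code can be viewed there. In Server Explorer in VS, select HDInsight > your Storm cluster name > Hadoop Service Log, then double-click the table underneath; the query results appear on the right. Because other Storm code also produces log entries, apply a filter so that only the log output from our program is shown. The result looks like this: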



When reposting, please credit: 康瑞部落  »  Getting Started with Azure Storm (Part 1)
