II. Real-time big data analysis with Storm, an example: using Storm to detect whether a vehicle's speed exceeds 80 km/h



Implementing the Bolts

The output of the Spout is handed to the Bolts for further processing. Given our use case, the topology needs the two Bolts shown in Figure 3.

Figure 3: Data flow from the Spout to the Bolts.
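To make the data flow in Figure 3 concrete, here is a plain-Java simulation that uses no Storm classes: a "spout" splits each line into fields, a "threshold bolt" forwards only records whose speed exceeds 80 km/h, and a "writer bolt" collects what would be written to the database. The record format `vehicleId,speed` and the class name `PipelineSketch` are assumptions made for illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java simulation of Figure 3; no Storm classes are used.
// Assumption: each input line has the hypothetical format "vehicleId,speed".
class PipelineSketch {
    static final int SPEED_LIMIT = 80; // km/h

    // "Spout": turns each raw line into a tuple (here a String[] of fields)
    static void spout(List<String> lines, Consumer<String[]> next) {
        for (String line : lines) {
            next.accept(line.split(","));
        }
    }

    // "ThresholdCalculatorBolt": forwards only tuples whose speed exceeds the limit
    static void thresholdBolt(String[] fields, Consumer<String[]> next) {
        if (Integer.parseInt(fields[1].trim()) > SPEED_LIMIT) {
            next.accept(fields);
        }
    }

    // "DBWriterBolt": collects rows in a list instead of writing them to a database
    static List<String> run(List<String> lines) {
        List<String> table = new ArrayList<>();
        spout(lines, fields -> thresholdBolt(fields, row -> table.add(String.join(",", row))));
        return table;
    }
}
```

In the real topology each stage runs as its own task and tuples travel between tasks, possibly across machines; the callbacks here only mimic the order in which a tuple is processed.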


ThresholdCalculatorBolt

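The Spout emits tuples, which ThresholdCalculatorBolt receives and checks against a threshold. The check is driven by several inputs:

Threshold checks

  • Threshold column number (which of the split fields is checked)

  • Threshold data type (the type of that field after splitting)

  • Frequency of occurrence of the threshold condition

  • Threshold time window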
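A minimal sketch of how the first two inputs (column number and data type) are used: the line is split, the 1-based column is picked, and the raw string is converted to the declared type. The helper `ColumnSketch.typedValue` and the sample line layout are hypothetical, and only a few type names are handled.

```java
import java.util.Locale;

// Hypothetical helper: picks the 1-based column from a delimited line
// and converts it to the declared data type (only a few types are handled).
class ColumnSketch {
    static Object typedValue(String line, String delimiter, int colNumber, String dataType) {
        // Note: String.split treats the delimiter as a regular expression.
        String[] fields = line.split(delimiter);
        if (colNumber < 1 || colNumber > fields.length) {
            throw new IllegalArgumentException("column " + colNumber + " out of range");
        }
        String raw = fields[colNumber - 1].trim(); // 1-based, like thresholdColNum-1 in Listing Five
        switch (dataType.toLowerCase(Locale.ROOT)) {
            case "int":    return Integer.parseInt(raw);
            case "long":   return Long.parseLong(raw);
            case "double": return Double.parseDouble(raw);
            case "string": return raw;
            default: throw new IllegalArgumentException("unsupported type: " + dataType);
        }
    }
}
```

Rejecting an out-of-range column up front gives a clearer error than the IndexOutOfBoundsException the list lookup would otherwise raise.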

The class in Listing Four holds these values.

Listing Four: The ThresholdInfo class

 

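```java
import java.io.Serializable;

// Holds the threshold settings used by ThresholdCalculatorBolt.
public class ThresholdInfo implements Serializable
{
    private String action;               // comparison operator: "==", "!=", "<", ">"
    private String rule;
    private Object thresholdValue;       // value compared against, e.g. 80
    private int thresholdColNumber;      // 1-based position of the field to check
    private Integer timeWindow;          // optional time window; null means none
    private int frequencyOfOccurence;    // matches tolerated before a tuple is emitted

    // Getters used by Listing Five (setters omitted)
    public String getAction() { return action; }
    public Object getThresholdValue() { return thresholdValue; }
    public int getThresholdColNumber() { return thresholdColNumber; }
    public Integer getTimeWindow() { return timeWindow; }
    public int getFrequencyOfOccurence() { return frequencyOfOccurence; }
}
```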
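ThresholdInfo implements Serializable because Storm serializes spout and bolt objects, including their fields, with Java serialization when the topology is created, and deserializes them on the workers. The sketch below shows that round trip with the JDK alone; `ThresholdInfoSketch` is a hypothetical cut-down stand-in for ThresholdInfo, not the original class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical cut-down stand-in for ThresholdInfo (only three of its fields).
class ThresholdInfoSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final String action;
    final Object thresholdValue;   // must itself be Serializable (Integer, String, ...)
    final int thresholdColNumber;

    ThresholdInfoSketch(String action, Object thresholdValue, int thresholdColNumber) {
        this.action = action;
        this.thresholdValue = thresholdValue;
        this.thresholdColNumber = thresholdColNumber;
    }

    // Writes the object to bytes and reads it back, roughly what happens
    // when a bolt holding this object is shipped to a worker.
    static ThresholdInfoSketch roundTrip(ThresholdInfoSketch info) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(info);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ThresholdInfoSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If a field's class is not Serializable, writeObject throws NotSerializableException. This is one reason resources such as open database connections are usually created in prepare(), which runs on the worker after deserialization.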

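Using the values held in these fields, the threshold check is performed by the execute() method in Listing Five. Most of the code parses the received values and checks them against the threshold.

 

Listing Five: Threshold check code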

 

```java
public void execute(Tuple tuple, BasicOutputCollector collector)
{
    if(tuple!=null)
    {
        List<Object> inputTupleList = tuple.getValues();
        int thresholdColNum = thresholdInfo.getThresholdColNumber();   // 1-based column index
        Object thresholdValue = thresholdInfo.getThresholdValue();
        String thresholdDataType = tupleInfo.getFieldList().get(thresholdColNum-1).getColumnType();
        Integer timeWindow = thresholdInfo.getTimeWindow();
        int frequency = thresholdInfo.getFrequencyOfOccurence();

        if(thresholdDataType.equalsIgnoreCase("string"))
        {
            String valueToCheck = inputTupleList.get(thresholdColNum-1).toString();
            String frequencyChkOp = thresholdInfo.getAction();
            if(timeWindow!=null)
            {
                long curTime = System.currentTimeMillis();
                // Elapsed time since startTime; timeWindow is assumed to be in minutes
                long diffInMinutes = (curTime-startTime)/(60*1000);
                if(diffInMinutes>=timeWindow)
                {
                    if(frequencyChkOp.equals("=="))
                    {
                        if(valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                        {
                            count.incrementAndGet();
                            if(count.get() > frequency)
                                splitAndEmit(inputTupleList,collector);
                        }
                    }
                    else if(frequencyChkOp.equals("!="))
                    {
                        if(!valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                        {
                            count.incrementAndGet();
                            if(count.get() > frequency)
                                splitAndEmit(inputTupleList,collector);
                        }
                    }
                    else
                        System.out.println("Operator not supported");
                }
            }
            else
            {
                // No time window: every tuple is checked
                if(frequencyChkOp.equals("=="))
                {
                    if(valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                    {
                        count.incrementAndGet();
                        if(count.get() > frequency)
                            splitAndEmit(inputTupleList,collector);
                    }
                }
                else if(frequencyChkOp.equals("!="))
                {
                    if(!valueToCheck.equalsIgnoreCase(thresholdValue.toString()))
                    {
                        count.incrementAndGet();
                        if(count.get() > frequency)
                            splitAndEmit(inputTupleList,collector);
                    }
                }
            }
        }
        else if(thresholdDataType.equalsIgnoreCase("int") ||
                thresholdDataType.equalsIgnoreCase("double") ||
                thresholdDataType.equalsIgnoreCase("float") ||
                thresholdDataType.equalsIgnoreCase("long") ||
                thresholdDataType.equalsIgnoreCase("short"))
        {
            String frequencyChkOp = thresholdInfo.getAction();
            if(timeWindow!=null)
            {
                // Parse as double so float/double columns work too (Long.parseLong fails on "80.5")
                double valueToCheck =
                    Double.parseDouble(inputTupleList.get(thresholdColNum-1).toString());
                double threshold = Double.parseDouble(thresholdValue.toString());
                long curTime = System.currentTimeMillis();
                long diffInMinutes = (curTime-startTime)/(60*1000);
                System.out.println("Difference in minutes="+diffInMinutes);
                if(diffInMinutes>=timeWindow)
                {
                    if(frequencyChkOp.equals("<"))
                    {
                        if(valueToCheck < threshold)
                        {
                            count.incrementAndGet();
                            if(count.get() > frequency)
                                splitAndEmit(inputTupleList,collector);
                        }
                    }
                    else if(frequencyChkOp.equals(">"))
                    {
                        if(valueToCheck > threshold)
                        {
                            count.incrementAndGet();
                            if(count.get() > frequency)
                                splitAndEmit(inputTupleList,collector);
                        }
                    }
                    else if(frequencyChkOp.equals("=="))
                    {
                        if(valueToCheck == threshold)
                        {
                            count.incrementAndGet();
                            if(count.get() > frequency)
                                splitAndEmit(inputTupleList,collector);
                        }
                    }
                    else if(frequencyChkOp.equals("!="))
                    {
                        // . . .
                    }
                }
            }
            else
                splitAndEmit(null,collector);
        }
    }
    else
    {
        System.err.println("Emitting null in bolt");
        splitAndEmit(null,collector);
    }
}
```
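The numeric branch of Listing Five repeats the same counting code for every operator. A condensed sketch of the same idea is one comparison helper plus an occurrence counter that signals "emit" once the number of matches exceeds `frequency`, as `count.get() > frequency` does in the listing. The class name is hypothetical, the time-window check is left out, and the `!=` case is written by analogy with the other operators.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical condensed version of Listing Five's numeric comparison and counting
// (time-window handling omitted).
class ThresholdCheckSketch {
    private final String op;            // "<", ">", "==" or "!="
    private final double threshold;     // e.g. 80 for 80 km/h
    private final int frequency;        // signal only after more than `frequency` matches
    private final AtomicInteger count = new AtomicInteger();

    ThresholdCheckSketch(String op, double threshold, int frequency) {
        this.op = op;
        this.threshold = threshold;
        this.frequency = frequency;
    }

    static boolean compare(String op, double value, double threshold) {
        switch (op) {
            case "<":  return value < threshold;
            case ">":  return value > threshold;
            case "==": return value == threshold;
            case "!=": return value != threshold;
            default:   throw new IllegalArgumentException("Operator not supported: " + op);
        }
    }

    // Returns true when the tuple should be emitted, mirroring `count.get() > frequency`.
    boolean accept(double value) {
        if (!compare(op, value, threshold)) {
            return false;
        }
        return count.incrementAndGet() > frequency;
    }
}
```

For the 80 km/h scenario, `new ThresholdCheckSketch(">", 80, 0)` would signal every reading above 80 km/h, because the first match already makes the count exceed 0.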



 

Tuples emitted by this Bolt are passed on to the next Bolt, which in our use case is DBWriterBolt.

DBWriterBolt

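The processed tuples must be persisted so that they can fire triggers or be used further downstream. DBWriterBolt does this persistence work and stores the tuples in a database. The table is created in the prepare() method, which Storm calls once on each bolt task before any call to execute(). The code is shown in Listing Six.

Listing Six: Creating the table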

 

```java
public void prepare(Map stormConf, TopologyContext context)
{
    // dbClass holds the JDBC driver class name
    try
    {
        Class.forName(dbClass);
    }
    catch (ClassNotFoundException e)
    {
        System.out.println("Driver not found");
        e.printStackTrace();
    }

    try
    {
        connection = DriverManager.getConnection(
            "jdbc:mysql://"+databaseIP+":"+databasePort+"/"+databaseName, userName, pwd);
        // The table is recreated on every start, so rows from a previous run are discarded
        connection.prepareStatement("DROP TABLE IF EXISTS "+tableName).execute();

        // Build CREATE TABLE from the field list; String columns become VARCHAR(500)
        StringBuilder createQuery = new StringBuilder(
            "CREATE TABLE IF NOT EXISTS "+tableName+"(");
        for(Field fields : tupleInfo.getFieldList())
        {
            if(fields.getColumnType().equalsIgnoreCase("String"))
                createQuery.append(fields.getColumnName()+" VARCHAR(500),");
            else
                createQuery.append(fields.getColumnName()+" "+fields.getColumnType()+",");
        }
        createQuery.append("thresholdTimeStamp timestamp)");
        connection.prepareStatement(createQuery.toString()).execute();

        // Insert query: one "?" per field plus one for thresholdTimeStamp
        StringBuilder insertQuery = new StringBuilder("INSERT INTO "+tableName+"(");
        for(Field fields : tupleInfo.getFieldList())
        {
            insertQuery.append(fields.getColumnName()+",");
        }
        insertQuery.append("thresholdTimeStamp").append(") values (");
        for(Field fields : tupleInfo.getFieldList())
        {
            insertQuery.append("?,");
        }
        insertQuery.append("?)");
        prepStatement = connection.prepareStatement(insertQuery.toString());
    }
    catch (SQLException e)
    {
        e.printStackTrace();
    }
}
```
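The string building in Listing Six can be checked without a database. The sketch below produces the same CREATE TABLE and INSERT statements from a list of `{name, type}` pairs; the class name and this column representation are assumptions. Because table and column names are concatenated into the SQL, they should come only from trusted configuration; the data values themselves go through the `?` placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, database-free version of the SQL string building in Listing Six.
// Each column is a {name, type} pair; "String" columns become VARCHAR(500).
class SqlBuilderSketch {
    static String createTable(String table, List<String[]> columns) {
        List<String> defs = new ArrayList<>();
        for (String[] c : columns) {
            String type = c[1].equalsIgnoreCase("String") ? "VARCHAR(500)" : c[1];
            defs.add(c[0] + " " + type);
        }
        defs.add("thresholdTimeStamp timestamp"); // extra column filled at insert time
        return "CREATE TABLE IF NOT EXISTS " + table + "(" + String.join(",", defs) + ")";
    }

    static String insert(String table, List<String[]> columns) {
        List<String> names = new ArrayList<>();
        List<String> marks = new ArrayList<>();
        for (String[] c : columns) {
            names.add(c[0]);
            marks.add("?");
        }
        names.add("thresholdTimeStamp");
        marks.add("?");
        return "INSERT INTO " + table + "(" + String.join(",", names) + ") values ("
                + String.join(",", marks) + ")";
    }
}
```

Collecting the pieces in lists and joining them avoids the trailing-comma bookkeeping that the StringBuilder version handles by appending the timestamp column last.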



 

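Data is inserted into the database in batches. The insertion logic is provided by the execute() method in Listing Seven. Most of the code parses input values that may be of different types.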
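The long if/else chain in Listing Seven maps a column type name to a parser and a matching PreparedStatement setter. Here is a table-driven sketch of the parsing half, assuming the same type names as the listing (matched case-insensitively). `TypeParserSketch` is hypothetical and dates are left out. In the bolt, the parsed value could then be bound with `PreparedStatement.setObject(index, value)` instead of one setter per type.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical table-driven replacement for the parsing part of Listing Seven's if/else chain.
class TypeParserSketch {
    // Column type name (lower case) -> parser for the raw string value
    private static final Map<String, Function<String, Object>> PARSERS = new HashMap<>();
    static {
        PARSERS.put("string", s -> s);
        PARSERS.put("int", Integer::valueOf);
        PARSERS.put("long", Long::valueOf);
        PARSERS.put("float", Float::valueOf);
        PARSERS.put("double", Double::valueOf);
        PARSERS.put("short", Short::valueOf);
        PARSERS.put("boolean", Boolean::valueOf);
        PARSERS.put("byte", Byte::valueOf);
    }

    static Object parse(String typeName, String raw) {
        Function<String, Object> parser = PARSERS.get(typeName.toLowerCase(Locale.ROOT));
        if (parser == null) {
            throw new IllegalArgumentException("unsupported column type: " + typeName);
        }
        return parser.apply(raw);
    }
}
```

Adding a type then means adding one map entry rather than another `else if` branch.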

Listing Seven: Inserting the data

 

```java
public void execute(Tuple tuple, BasicOutputCollector collector)
{
    batchExecuted=false;
    if(tuple!=null)
    {
        List<Object> inputTupleList = tuple.getValues();
        int dbIndex=0;
        // Bind each field to its "?" placeholder according to the declared column type
        for(int i=0;i<tupleInfo.getFieldList().size();i++)
        {
            Field field = tupleInfo.getFieldList().get(i);
            try {
                dbIndex = i+1;   // JDBC parameter indexes start at 1
                if(field.getColumnType().equalsIgnoreCase("String"))
                    prepStatement.setString(dbIndex, inputTupleList.get(i).toString());
                else if(field.getColumnType().equalsIgnoreCase("int"))
                    prepStatement.setInt(dbIndex,
                        Integer.parseInt(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("long"))
                    prepStatement.setLong(dbIndex,
                        Long.parseLong(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("float"))
                    prepStatement.setFloat(dbIndex,
                        Float.parseFloat(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("double"))
                    prepStatement.setDouble(dbIndex,
                        Double.parseDouble(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("short"))
                    prepStatement.setShort(dbIndex,
                        Short.parseShort(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("boolean"))
                    prepStatement.setBoolean(dbIndex,
                        Boolean.parseBoolean(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("byte"))
                    prepStatement.setByte(dbIndex,
                        Byte.parseByte(inputTupleList.get(i).toString()));
                else if(field.getColumnType().equalsIgnoreCase("Date"))
                {
                    Date dateToAdd=null;
                    if (!(inputTupleList.get(i) instanceof Date))
                    {
                        // HH is hour of day (0-23); with "hh" (1-12), "12:30:00" would be read as 00:30
                        DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                        try
                        {
                            dateToAdd = df.parse(inputTupleList.get(i).toString());
                        }
                        catch (ParseException e)
                        {
                            System.err.println("Data type not valid");
                        }
                    }
                    else
                    {
                        dateToAdd = (Date)inputTupleList.get(i);
                    }
                    // Bind the date in both cases; store NULL if the string could not be parsed
                    if (dateToAdd != null)
                        prepStatement.setDate(dbIndex, new java.sql.Date(dateToAdd.getTime()));
                    else
                        prepStatement.setNull(dbIndex, java.sql.Types.DATE);
                }
            }
            catch (SQLException e)
            {
                e.printStackTrace();
            }
        }
        Date now = new Date();
        try
        {
            // The last placeholder is thresholdTimeStamp
            prepStatement.setTimestamp(dbIndex+1, new java.sql.Timestamp(now.getTime()));
            prepStatement.addBatch();
            counter.incrementAndGet();
            if (counter.get()== batchSize)
                executeBatch();   // batch is full: write it to the database
        }
        catch (SQLException e1)
        {
            e1.printStackTrace();
        }
    }
    else
    {
        long curTime = System.currentTimeMillis();
        long diffInSeconds = (curTime-startTime)/1000;
        // Flush a partial batch once batchTimeWindowInSeconds has passed
        if(counter.get()<batchSize && diffInSeconds>batchTimeWindowInSeconds)
        {
            try {
                executeBatch();
                startTime = System.currentTimeMillis();
            }
            catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

public void executeBatch() throws SQLException
{
    batchExecuted=true;
    prepStatement.executeBatch();
    counter = new AtomicInteger(0);   // start counting the next batch
}
```
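DBWriterBolt flushes a batch either when `batchSize` rows are pending or when a partial batch has waited longer than `batchTimeWindowInSeconds`. Storm normally does not call execute() with a null tuple, so the time-based branch in Listing Seven may never run; a common alternative is to have Storm send periodic tick tuples (configured with `Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS`) and flush on those. The sketch below models only the flush policy in plain Java with an injectable clock; `BatchBufferSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical model of DBWriterBolt's batching policy: flush when the batch is full,
// or when a non-empty batch has waited at least windowMillis since the last flush.
class BatchBufferSketch<T> {
    private final int batchSize;
    private final long windowMillis;
    private final LongSupplier clock;          // injectable clock so the policy is testable
    private final Consumer<List<T>> flusher;   // in the bolt this would run executeBatch()
    private final List<T> pending = new ArrayList<>();
    private long lastFlush;

    BatchBufferSketch(int batchSize, long windowMillis, LongSupplier clock, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.flusher = flusher;
        this.lastFlush = clock.getAsLong();
    }

    // Like execute() with a real tuple: queue the row, flush if the batch is full
    void add(T row) {
        pending.add(row);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Like handling a periodic tick: flush a partial batch once the window has elapsed
    void tick() {
        if (!pending.isEmpty() && clock.getAsLong() - lastFlush >= windowMillis) {
            flush();
        }
    }

    private void flush() {
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        lastFlush = clock.getAsLong();
    }
}
```

Injecting the clock keeps the time-based rule deterministic under test, which is hard to achieve when the code calls System.currentTimeMillis() directly.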



 

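Once the Spout and Bolts are ready, the topology builder assembles the topology and prepares it for execution. The steps are as follows.

Running and testing the topology on a local cluster

  • Build the topology with TopologyBuilder.

  • Choose how to submit it: StormSubmitter for a real cluster, or LocalCluster for local testing. Both take the topology name, the configuration, and the topology object as arguments.

  • Submit the topology.

Listing Eight: Building and running the topology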

 

```java
// Storm 0.x package names; Storm 1.0 and later use org.apache.storm.* instead
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.topology.TopologyBuilder;

public class StormMain
{
     public static void main(String[] args) throws AlreadyAliveException,
                                                   InvalidTopologyException,
                                                   InterruptedException
     {
          ParallelFileSpout parallelFileSpout = new ParallelFileSpout();
          ThresholdBolt thresholdBolt = new ThresholdBolt();
          DBWriterBolt dbWriterBolt = new DBWriterBolt();

          // Wire spout -> thresholdBolt -> dbWriterBolt, one task each
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("spout", parallelFileSpout, 1);
          builder.setBolt("thresholdBolt", thresholdBolt,1).shuffleGrouping("spout");
          builder.setBolt("dbWriterBolt",dbWriterBolt,1).shuffleGrouping("thresholdBolt");

          // Configuration shared by both modes
          Config conf = new Config();
          if(args!=null && args.length > 0)
          {
              // Cluster mode: args[0] is the topology name
              conf.setNumWorkers(1);
              StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
          }
          else
          {
              // Local mode for testing, inside this JVM
              conf.setDebug(true);
              conf.setMaxTaskParallelism(3);
              LocalCluster cluster = new LocalCluster();
              cluster.submitTopology("Threshold_Test", conf, builder.createTopology());
          }
     }
}
```

 

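Once built, the topology is submitted to the local cluster. After submission it keeps running, with no further changes needed, until it is killed or the cluster is shut down. This is another of Storm's major strengths.

This simple example shows that once you have grasped the concepts of topologies, spouts, and bolts, you can use Storm for real-time processing with little effort. If you want to process big data without diving into the Hadoop world, Storm is well worth considering.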


Based on this article, we built a real-time traffic conditions system for Shenzhen on top of Storm. The source code is open source on GitHub:

https://github.com/whughchen/RealTimeTraffic

Feedback and suggestions for improvement are welcome.

-------------------------------------------------

Related posts:

Storm in practice: real-time traffic analysis and real-time route recommendation for Shenzhen
