A quick Druid test

Single-machine installation notes

1. Download and extract
tar -xzvf druid-0.12.1-bin.tar.gz

2. Install ZooKeeper
(installation steps omitted)
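
For reference, a minimal single-node ZooKeeper setup might look like the sketch below (this assumes ZooKeeper 3.4.10 and the default clientPort of 2181, matching the hosts configured in step 3):

curl -O https://archive.apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
tar -xzf zookeeper-3.4.10.tar.gz
cd zookeeper-3.4.10
cp conf/zoo_sample.cfg conf/zoo.cfg     # sample config already uses clientPort=2181
bin/zkServer.sh start                   # start a standalone ZooKeeper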

3. Configure Druid

# vi conf-quickstart/druid/_common/common.runtime.properties
---

# ZooKeeper connection; if the ZooKeeper port is 2181 it can be omitted, and multiple ZooKeeper hosts are separated by commas
druid.zk.service.host=cdh-01-11:2181,cdh-01-12:2181,cdh-01-13:2181
# Base path under which Druid stores its data in ZooKeeper
druid.zk.paths.base=/druid

4. Initialize Druid
From the Druid root directory, run bin/init. Druid creates a var directory containing two subdirectories:
druid, which holds Hadoop temporary files for the local setup, caches, and per-task temporary files, and tmp, which holds other temporary files.
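
A minimal sketch of this step (the directory name druid-0.12.1 is assumed from the archive extracted in step 1):

cd druid-0.12.1
bin/init
ls var
# druid  tmp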

5. Start Druid

The startup commands have no required order; the Druid nodes can be started in any sequence.

Quickstart mode:
// Start the Historical node (serves historical data)
java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical &

// Start the Broker node (routes queries and merges results)
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker  &

// Start the Coordinator node (manages segment loading and distribution)
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator   &

// Start the Overlord node (assigns ingestion tasks)
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord   &

// Start the MiddleManager node (executes ingestion tasks)
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager   &
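
Since the five commands differ only in the node name, they can also be launched in one loop (a convenience sketch; the var/log path is arbitrary and assumes the var directory created by bin/init):

mkdir -p var/log
for node in historical broker coordinator overlord middleManager; do
  java `cat conf-quickstart/druid/$node/jvm.config | xargs` \
    -cp "conf-quickstart/druid/_common:conf-quickstart/druid/$node:lib/*" \
    io.druid.cli.Main server $node > var/log/$node.log 2>&1 &
done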


Normal startup mode (i.e. distributed mode); this mode needs considerably more memory:

// Start the Historical node (serves historical data)
java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" io.druid.cli.Main server historical &

// Start the Broker node (routes queries and merges results)
java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker  &

// Start the Coordinator node (manages segment loading and distribution)
java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator   &

// Start the Overlord node (assigns ingestion tasks)
java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" io.druid.cli.Main server overlord   &

// Start the MiddleManager node (executes ingestion tasks)
java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager   &
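
To verify that the nodes are up, each process exposes a /status endpoint on its port (8081 coordinator, 8082 broker, 8083 historical, 8090 overlord, 8091 middleManager are the stock defaults; adjust if the runtime.properties change them):

for port in 8081 8082 8083 8090 8091; do
  echo "== port $port =="
  curl -s http://localhost:$port/status
  echo
done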


6. Web consoles
Druid overlord (task) console: http://cdh-01-12:8090/console.html
# alternative host: http://cdh-01-13:8090/console.html

Druid coordinator console (datasource monitoring): http://cdh-01-12:8081/#/datasources/wikiticker
# alternative host: http://cdh-01-13:8081/#/datasources/wikiticker

7. Load and query the test data
Submit the ingestion task (use whichever host the overlord runs on):
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json http://localhost:8090/druid/indexer/v1/task


curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json http://cdh-01-13:8090/druid/indexer/v1/task

Test a query:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2?pretty


Querying over HTTP

Druid data can be queried over HTTP. The query body must be standard JSON and the request must carry the header Content-Type: application/json; otherwise the query will not return data.

http://localhost:8082/druid/v2/?pretty
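
For example, a minimal timeseries query against the wikiticker DataSource can be posted inline (a sketch; the aggregator sums the "count" metric defined by the ingestion spec in section 10):

curl -X POST -H 'Content-Type: application/json' 'http://localhost:8082/druid/v2/?pretty' -d '{
  "queryType": "timeseries",
  "dataSource": "wikiticker",
  "intervals": ["2015-09-12/2015-09-13"],
  "granularity": "all",
  "aggregations": [ { "type": "longSum", "name": "edits", "fieldName": "count" } ]
}'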


8. DataSource structure
Compared with a traditional relational database, a Druid DataSource is roughly a table. A DataSource has the following kinds of columns:

Timestamp column: the time value of each row, stored by default in UTC with millisecond precision. It is the key dimension for aggregation and time-range queries.
Dimension columns: the categorical attributes of each data row.
Metric columns: the columns used for aggregation and computation.
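
For example, taking the first row of the sample data in section 10, the three kinds of columns map roughly as follows:

timestamp column : "time": "2015-09-12T00:46:58.771Z"
dimension columns: "channel": "#en.wikipedia", "page": "Talk:Oswald Tilghman", "user": "GELongstreet", ...
metric columns   : "added": 36, "deleted": 0, "delta": 36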


  Segment structure
A DataSource is a logical concept; a Segment is the actual physical storage format of the data. Segments are organized by timestamp and granularity, and through granularitySpec Druid slices the data inside Segments both horizontally and vertically.
Horizontally, controlled by the segmentGranularity parameter, Druid stores data from different time ranges in separate Segment blocks; this is the horizontal partitioning by time.
The benefit is that a query over a time range only needs to touch the Segments covering that range instead of scanning the whole table. Vertically, within each Segment the data is also stored and compressed column by column; this is the vertical partitioning. (Segments additionally use techniques such as bitmap indexes to speed up data access; not covered in detail here.)
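
For example, with day segmentGranularity the wikiticker data loaded below lands in a segment whose identifier encodes the interval it covers (the version timestamp and partition number here are invented for illustration):

wikiticker_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2018-01-10T03:20:00.000Z_0
(format: dataSource_intervalStart_intervalEnd_version_partitionNumber)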

9. Ingestion spec format
{
  "dataSchema" : {...},       #JSON对象,指明数据源格式、数据解析、维度等信息
  "ioConfig" : {...},         #JSON对象,指明数据如何在Druid中存储
  "tuningConfig" : {...}      #JSON对象,指明存储优化配置(非必填)
}

 dataSchema in detail
{
 "datasource":"...",            #string类型,数据源名称
 "parser": {...},               #JSON对象,包含如何解析数据的相关内容
 "metricsSpec": [...],          #list 包含了所有的指标信息
 "granularitySpec": {...}       #JSON对象,指明数据的存储和查询力度
}


"granularitySpec" : {
  "type" : "uniform",                          //type : 用来指定粒度的类型使用 uniform, arbitrary(尝试创建大小相等的段).
  "segmentGranularity" : "day",                //segmentGranularity : 用来确定每个segment包含的时间戳范围
  "queryGranularity" : "none",               //控制注入数据的粒度。 最小的queryGranularity 是 millisecond(毫秒级)
  "rollup" : false,
  "intervals" : ["2018-01-09/2018-01-13"]    //intervals : 用来确定总的要获取的文件时间戳的范围
}

Summary:

What should the dataset be called? This is the "dataSource" field of "dataSchema".
Where is the dataset located? The file paths go in the "paths" field of "inputSpec"; to load multiple files, supply them as a comma-separated string.
Which field should be treated as the timestamp? This goes in the "column" of "timestampSpec".
Which fields should be treated as dimensions? These go in the "dimensions" of "dimensionsSpec".
Which fields should be treated as metrics? These go in "metricsSpec".
What time range (intervals) is being loaded? This goes in the "intervals" of "granularitySpec".

10. Examples from the official documentation

Sample of the input data:
{"time":"2015-09-12T00:46:58.771Z","channel":"#en.wikipedia","cityName":null,"comment":"added project","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Talk","page":"Talk:Oswald Tilghman","regionIsoCode":null,"regionName":null,"user":"GELongstreet","delta":36,"added":36,"deleted":0}
{"time":"2015-09-12T00:47:00.496Z","channel":"#ca.wikipedia","cityName":null,"comment":"Robot inserta {{Commonscat}} que enllaça amb [[commons:category:Rallicula]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":true,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Rallicula","regionIsoCode":null,"regionName":null,"user":"PereBot","delta":17,"added":17,"deleted":0}
{"time":"2015-09-12T00:47:05.474Z","channel":"#en.wikipedia","cityName":"Auburn","comment":"/* Status of peremptory norms under international law */ fixed spelling of 'Wimbledon'","countryIsoCode":"AU","countryName":"Australia","isAnonymous":true,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Peremptory norm","regionIsoCode":"NSW","regionName":"New South Wales","user":"60.225.66.142","delta":0,"added":0,"deleted":0}
{"time":"2015-09-12T00:47:08.770Z","channel":"#vi.wikipedia","cityName":null,"comment":"fix Lỗi CS1: ngày tháng","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":true,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Apamea abruzzorum","regionIsoCode":null,"regionName":null,"user":"Cheers!-bot","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:11.862Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Atractus flammigerus","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:13.987Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Agama mossambica","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:17.009Z","channel":"#ca.wikipedia","cityName":null,"comment":"/* Imperi Austrohongarès */","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Campanya dels Balcans (1914-1918)","regionIsoCode":null,"regionName":null,"user":"Jaumellecha","delta":-20,"added":0,"deleted":20}
{"time":"2015-09-12T00:47:19.591Z","channel":"#en.wikipedia","cityName":null,"comment":"adding comment on notability and possible COI","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":true,"isRobot":false,"isUnpatrolled":true,"metroCode":null,"namespace":"Talk","page":"Talk:Dani Ploeger","regionIsoCode":null,"regionName":null,"user":"New Media Theorist","delta":345,"added":345,"deleted":0}
{"time":"2015-09-12T00:47:21.578Z","channel":"#en.wikipedia","cityName":null,"comment":"Copying assessment table to wiki","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"User","page":"User:WP 1.0 bot/Tables/Project/Pubs","regionIsoCode":null,"regionName":null,"user":"WP 1.0 bot","delta":121,"added":121,"deleted":0}
{"time":"2015-09-12T00:47:25.821Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Agama persimilis","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}

 Official documentation: batch file ingestion example
  wikiticker-index.json
  
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",        #存储力度
        "queryGranularity" : "none",         #最小查询力度
        "intervals" : ["2015-09-12/2015-09-13"]       #摄取数据的时间段,可以有多个值,可选
      },
      "parser" : {
        "type" : "hadoopyString",            #数据类型
        "parseSpec" : {                      #json对象
          "format" : "json",
          "dimensionsSpec" : {               
            "dimensions" : [                 #维度设置
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {           #时间戳列名和格式
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",              #聚合后指标列名
          "type" : "longSum",            #聚合函数
          "fieldName" : "added"          #聚合用到的列名;可选
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}

    Official documentation: query example
    wikiticker-top-pages.json
{
  "queryType" : "topN",                         #对于timeseries查询,该字段的值必须是timeseries;对于topN查询,该字段的值必须是topN
  "dataSource" : "wikiticker",                  #要查询数据集DataSource名字
  "intervals" : ["2015-09-12/2015-09-13"],      #查询时间区间范围,ISO-8601格式
  "granularity" : "all",                        #查询结果进行聚合的时间力度
  "dimension" : "page",
  "metric" : "edits",                           #进行统计排序的Metric,如PV
  "threshold" : 25,                             #TopN的N的取值
  "aggregations" : [                            #聚合器
    {
      "type" : "longSum",
      "name" : "edits",
      "fieldName" : "count"
    }
  ]
}    


Druid architecture

Druid is a time-series data store: it pre-aggregates data along the time dimension to speed up analytical queries, and it supports structured data only.
Druid characteristics:
1. Fast queries: partial pre-aggregation of data, held in memory
2. Horizontally scalable: distributed data, parallelized queries
3. Real-time analytics: an immutable past, an append-only future


Indexing service: Overlord Node (Indexing Service)
Overlord nodes form a cluster that loads batch and real-time data into the system and responds to changes to data already stored in it (hence "indexing service"). The indexing service also includes MiddleManagers and Peons: a Peon runs a single task, and a MiddleManager manages a set of Peons.

Coordinator Node
Monitors the group of Historical nodes to keep data available, replicated, and in a generally "optimal" configuration. Coordinators read segment metadata from MySQL to decide which segments should be loaded in the cluster, use ZooKeeper to determine which Historical nodes exist, and create ZooKeeper entries that tell Historical nodes to load or drop segments.

Historical Node
Where "historical" (non-real-time) data is stored and queried. Historical nodes answer queries forwarded by Broker nodes and return the results to them. They announce themselves via ZooKeeper and watch ZooKeeper signals to load or drop segments.

Broker Node
Receives queries from external clients and forwards them to Realtime and Historical nodes. When the results come back, the Broker merges them and returns them to the caller. Brokers use ZooKeeper to learn the cluster topology, i.e. which Realtime and Historical nodes exist.

Real-time Node
Ingests data in real time: Realtime nodes listen to an input data stream and make it immediately queryable inside Druid. Like Historical nodes, they only answer query requests from Broker nodes and return results to them. Older data is handed off from Realtime nodes to Historical nodes.


 
