Druid Single-Node Installation Notes
1. Download and extract
tar -xzvf druid-0.12.1-bin.tar.gz
2. Install ZooKeeper
(Steps omitted.)
3. Configure Druid
# vi conf-quickstart/druid/_common/common.runtime.properties
---
# ZooKeeper connection; if the ZooKeeper port is 2181 the port may be omitted. Separate multiple ZooKeeper hosts with commas.
druid.zk.service.host=cdh-01-11:2181,cdh-01-12:2181,cdh-01-13:2181
# Base path for Druid's data in ZooKeeper
druid.zk.paths.base=/druid
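Before starting Druid it is worth confirming that ZooKeeper is reachable. A minimal check, assuming nc (netcat) is installed and the ruok four-letter command is enabled (it is by default on older ZooKeeper releases):
# each server should answer "imok"
echo ruok | nc cdh-01-11 2181
echo ruok | nc cdh-01-12 2181
echo ruok | nc cdh-01-13 2181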
4. Initialize Druid
First cd into the Druid root directory and run bin/init. Druid automatically creates a var directory containing two subdirectories:
one named druid, which holds local-mode Hadoop temporary files, caches, and task temp files; and one named tmp, which holds other temporary files.
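A minimal sketch of this step and the resulting layout (contents as described above):
cd druid-0.12.1   # directory created by extracting the tarball in step 1
bin/init
# var/druid : local-mode Hadoop temp files, caches, task temp files
# var/tmp   : other temporary files
ls var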
5. Start Druid
The start-up commands have no required ordering; the Druid nodes may be started in any order.
Quickstart mode (uses the conf-quickstart configs; see the convenience loop after these commands):
# Start the Historical node (serves historical segments)
java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical &
# Start the Broker node (routes queries and merges results)
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker &
# Start the Coordinator node (manages segment loading and publishing)
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator &
# Start the Overlord node (assigns indexing tasks)
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord &
# Start the MiddleManager node (executes indexing tasks)
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager &
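The five commands above differ only in the service name, so they can also be issued as a single loop (same commands, run from the Druid root directory):
for svc in historical broker coordinator overlord middleManager; do
  java `cat conf-quickstart/druid/$svc/jvm.config | xargs` \
    -cp "conf-quickstart/druid/_common:conf-quickstart/druid/$svc:lib/*" \
    io.druid.cli.Main server $svc &
done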
Standard start-up (clustered/production mode; uses the conf configs). This mode needs considerably more memory.
# Start the Historical node (serves historical segments)
java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" io.druid.cli.Main server historical &
# Start the Broker node (routes queries and merges results)
java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker &
# Start the Coordinator node (manages segment loading and publishing)
java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator &
# Start the Overlord node (assigns indexing tasks)
java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" io.druid.cli.Main server overlord &
# Start the MiddleManager node (executes indexing tasks)
java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager &
6. Web consoles
Druid task-scheduling console (Overlord): http://cdh-01-12:8090/console.html
# alternate host: http://cdh-01-13:8090/console.html
Druid ingestion-task monitoring page (Coordinator): http://cdh-01-12:8081/#/datasources/wikiticker
# alternate host: http://cdh-01-13:8081/#/datasources/wikiticker
7. Load and query test data
Submit the ingestion task:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json http://localhost:8090/druid/indexer/v1/task
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json http://cdh-01-13:8090/druid/indexer/v1/task
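The POST returns a JSON body containing the id of the created task, which can then be polled on the Overlord. A sketch, with a placeholder task id:
# the submit call returns e.g. {"task":"index_hadoop_wikiticker_..."}
# substitute the id actually returned for the placeholder below
curl http://localhost:8090/druid/indexer/v1/task/index_hadoop_wikiticker_xxx/status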
Run a test query (the URL is quoted so the shell does not interpret the ?):
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-top-pages.json 'http://localhost:8082/druid/v2/?pretty'
Querying over HTTP
Druid supports querying data over HTTP. The query payload must be valid JSON and the request header must include Content-Type: application/json; otherwise the query fails.
http://localhost:8082/druid/v2/?pretty
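A query can also be sent inline rather than from a file. A minimal sketch, assuming the wikiticker datasource from step 7 has been loaded:
curl -X POST -H 'Content-Type: application/json' 'http://localhost:8082/druid/v2/?pretty' -d '{
  "queryType" : "timeseries",
  "dataSource" : "wikiticker",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "aggregations" : [ { "type" : "longSum", "name" : "edits", "fieldName" : "count" } ]
}'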
8. DataSource structure
Compared with a traditional relational database, a Druid DataSource is roughly the equivalent of a table. Its structure consists of the following parts:
Timestamp column: the time value of each row, stored in UTC by default and precise to the millisecond. This column is the key dimension for aggregation and range queries.
Dimension columns: describe the categorical attributes of each row.
Metric columns: the columns used for aggregation and computation.
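As an illustration, the three column types map onto one row of the sample data in section 10 like this (field subset only):
{
  "time": "2015-09-12T00:46:58.771Z",      # timestamp column
  "channel": "#en.wikipedia",              # dimension column
  "page": "Talk:Oswald Tilghman",          # dimension column
  "user": "GELongstreet",                  # dimension column
  "added": 36, "deleted": 0, "delta": 36   # metric columns
}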
Segment structure
A DataSource is a logical concept; a Segment is the actual physical storage format of the data. Segments are organized by timestamp and granularity. Through granularitySpec, Druid partitions the data in Segments both horizontally and vertically.
Looking at how data is distributed over time: driven by the segmentGranularity parameter, Druid stores data from different time ranges in different Segment blocks. This is the horizontal partitioning.
The benefit: a query over a time range only needs to touch the Segments covering that range, instead of scanning the whole table. Within a Segment, data is also stored column-oriented and compressed;
this is the vertical partitioning. (Segments additionally use techniques such as Bitmap indexes to optimize data access; I have not looked into Bitmaps in detail.)
9. Data format definition
{
  "dataSchema" : {...}, # JSON object: data source format, parsing, dimensions, etc.
  "ioConfig" : {...}, # JSON object: how the data is stored into Druid
  "tuningConfig" : {...} # JSON object: storage tuning options (optional)
}
dataSchema in detail
{
  "dataSource" : "...", # string: the DataSource name
  "parser" : {...}, # JSON object: how the data is to be parsed
  "metricsSpec" : [...], # list of all metric definitions
  "granularitySpec" : {...} # JSON object: storage and query granularity
}
"granularitySpec" : {
"type" : "uniform", //type : 用来指定粒度的类型使用 uniform, arbitrary(尝试创建大小相等的段).
"segmentGranularity" : "day", //segmentGranularity : 用来确定每个segment包含的时间戳范围
"queryGranularity" : "none", //控制注入数据的粒度。 最小的queryGranularity 是 millisecond(毫秒级)
"rollup" : false,
"intervals" : ["2018-01-09/2018-01-13"] //intervals : 用来确定总的要获取的文件时间戳的范围
}
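For intuition, a sketch of what queryGranularity does at ingestion time (assuming rollup were enabled, unlike the example above):
# raw row timestamp             stored as, with queryGranularity "hour"
# 2018-01-09T10:15:32.123Z  ->  2018-01-09T10:00:00.000Z
# 2018-01-09T10:47:01.999Z  ->  2018-01-09T10:00:00.000Z
# rows whose truncated timestamp and dimension values all match are
# merged into a single row, with the metric columns aggregated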
Summary:
What should the dataset be called? This is the "dataSource" field of the "dataSchema".
Where is the dataset located? The file paths go in the "paths" of the "inputSpec". To load multiple files, provide them as a comma-separated string.
Which field should be treated as the timestamp? This goes in the "column" of the "timestampSpec".
Which fields should be treated as dimensions? These go in the "dimensions" of the "dimensionsSpec".
Which fields should be treated as metrics? These go in the "metricsSpec".
What time range (interval) is being loaded? This goes in the "intervals".
A skeleton spec mapping these answers onto the JSON structure follows; the full wikiticker example is in the next sections.
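# Skeleton only; the <angle-bracket> values are placeholders to fill in
{
  "spec" : {
    "dataSchema" : {
      "dataSource" : "<dataset name>",
      "parser" : { "parseSpec" : {
        "timestampSpec" : { "column" : "<timestamp field>", "format" : "auto" },
        "dimensionsSpec" : { "dimensions" : ["<dimension fields>"] }
      } },
      "metricsSpec" : ["<metric definitions>"],
      "granularitySpec" : { "intervals" : ["<time range>"] }
    },
    "ioConfig" : { "inputSpec" : { "paths" : "<file1>,<file2>" } }
  }
}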
10. Sample data from the official documentation
{"time":"2015-09-12T00:46:58.771Z","channel":"#en.wikipedia","cityName":null,"comment":"added project","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Talk","page":"Talk:Oswald Tilghman","regionIsoCode":null,"regionName":null,"user":"GELongstreet","delta":36,"added":36,"deleted":0}
{"time":"2015-09-12T00:47:00.496Z","channel":"#ca.wikipedia","cityName":null,"comment":"Robot inserta {{Commonscat}} que enllaça amb [[commons:category:Rallicula]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":true,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Rallicula","regionIsoCode":null,"regionName":null,"user":"PereBot","delta":17,"added":17,"deleted":0}
{"time":"2015-09-12T00:47:05.474Z","channel":"#en.wikipedia","cityName":"Auburn","comment":"/* Status of peremptory norms under international law */ fixed spelling of 'Wimbledon'","countryIsoCode":"AU","countryName":"Australia","isAnonymous":true,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Peremptory norm","regionIsoCode":"NSW","regionName":"New South Wales","user":"60.225.66.142","delta":0,"added":0,"deleted":0}
{"time":"2015-09-12T00:47:08.770Z","channel":"#vi.wikipedia","cityName":null,"comment":"fix Lỗi CS1: ngày tháng","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":true,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Apamea abruzzorum","regionIsoCode":null,"regionName":null,"user":"Cheers!-bot","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:11.862Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Atractus flammigerus","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:13.987Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Agama mossambica","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
{"time":"2015-09-12T00:47:17.009Z","channel":"#ca.wikipedia","cityName":null,"comment":"/* Imperi Austrohongarès */","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":false,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Campanya dels Balcans (1914-1918)","regionIsoCode":null,"regionName":null,"user":"Jaumellecha","delta":-20,"added":0,"deleted":20}
{"time":"2015-09-12T00:47:19.591Z","channel":"#en.wikipedia","cityName":null,"comment":"adding comment on notability and possible COI","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":true,"isRobot":false,"isUnpatrolled":true,"metroCode":null,"namespace":"Talk","page":"Talk:Dani Ploeger","regionIsoCode":null,"regionName":null,"user":"New Media Theorist","delta":345,"added":345,"deleted":0}
{"time":"2015-09-12T00:47:21.578Z","channel":"#en.wikipedia","cityName":null,"comment":"Copying assessment table to wiki","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"User","page":"User:WP 1.0 bot/Tables/Project/Pubs","regionIsoCode":null,"regionName":null,"user":"WP 1.0 bot","delta":121,"added":121,"deleted":0}
{"time":"2015-09-12T00:47:25.821Z","channel":"#vi.wikipedia","cityName":null,"comment":"clean up using [[Project:AWB|AWB]]","countryIsoCode":null,"countryName":null,"isAnonymous":false,"isMinor":false,"isNew":false,"isRobot":true,"isUnpatrolled":false,"metroCode":null,"namespace":"Main","page":"Agama persimilis","regionIsoCode":null,"regionName":null,"user":"ThitxongkhoiAWB","delta":18,"added":18,"deleted":0}
Official docs: batch file loading example
wikiticker-index.json
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day", # storage granularity
        "queryGranularity" : "none", # minimum query granularity
        "intervals" : ["2015-09-12/2015-09-13"] # intervals to ingest; several allowed; optional
      },
      "parser" : {
        "type" : "hadoopyString", # parser type
        "parseSpec" : { # JSON object
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [ # dimension columns
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : { # timestamp column name and format
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added", # name of the aggregated metric column
          "type" : "longSum", # aggregation function
          "fieldName" : "added" # source column for the aggregation; optional
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}
Official docs: query example
wikiticker-top-pages.json
{
  "queryType" : "topN", # for a timeseries query this must be "timeseries"; for a topN query, "topN"
  "dataSource" : "wikiticker", # name of the DataSource to query
  "intervals" : ["2015-09-12/2015-09-13"], # query interval, in ISO-8601 format
  "granularity" : "all", # time granularity at which results are aggregated
  "dimension" : "page",
  "metric" : "edits", # metric used for ranking, e.g. PV
  "threshold" : 25, # the N in TopN
  "aggregations" : [ # aggregators
    {
      "type" : "longSum",
      "name" : "edits",
      "fieldName" : "count"
    }
  ]
}
Druid architecture
Druid is a time-series database: it pre-aggregates data along the time axis to speed up analytical queries. It supports structured data only.
Druid characteristics:
1. Fast queries: partial pre-aggregation of data, in-memory processing
2. Horizontal scalability: distributed data, parallelized queries
3. Real-time analysis: an immutable past, an append-only future
Indexing service: Overlord Node (Indexing Service)
Overlords form the cluster that loads batch and real-time data into the system and reacts to changes to the data stored in the system (hence "indexing service"). The service also includes Middle Managers and Peons: a Peon executes a single task, while a Middle Manager manages a set of Peons.
Coordinator Node
Monitors the group of Historical nodes to ensure that data is available, replicated, and in a generally "optimal" configuration. It decides which segments should be loaded in the cluster by reading segment metadata from MySQL, uses ZooKeeper to determine which Historical nodes exist, and creates ZooKeeper entries that tell Historical nodes to load or drop segments.
Historical Node
Where "historical" (non-real-time) data is processed, stored, and queried. Historical nodes answer queries forwarded by Broker nodes and return the results to them. They announce themselves via ZooKeeper and watch ZooKeeper signals to load or drop new segments.
Broker Node
Receives queries from external clients and forwards them to Realtime and Historical nodes. When a Broker receives the results, it merges them and returns them to the caller. To learn the topology, Broker nodes use ZooKeeper to determine which Realtime and Historical nodes exist.
Real-time Node
Ingests data in real time: Realtime nodes listen to incoming data streams and make the data immediately queryable inside Druid. Like Historical nodes, they answer only query requests from Broker nodes, returning results to the Broker. Older data is handed off from Realtime nodes to Historical nodes.
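A rough sketch of the query and data flow described above:
#            queries                        queries
#  client ------------> Broker ---+-------> Historical
#                                 +-------> Realtime --- hand-off ---> Historical
#  batch/stream data -> Overlord -> MiddleManager -> Peon (one per task)
#  (Coordinator and ZooKeeper decide which segments Historical nodes load)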