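Installing Druid on a Single Machine and Importing Offline Data
1.Overview
This guide is a quick installation on a single server. Most configuration can be left at its defaults, and data is stored on the local OS disk. The purpose of the quick install is to get familiar with Druid and with the development workflow for big-data analytics built on it.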
2.Installation requirements
Java 8 or higher
Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
8G of RAM
2 vCPUs
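To check the Java requirement before going further, a small bash sketch can parse the output of `java -version`. It assumes the usual `version "1.8.0_171"` (legacy) and `version "11.0.2"` (newer) formats; `java_major_version` and `require_java8` are hypothetical helper names, not part of Druid.

```shell
#!/usr/bin/env bash
# Print the Java major version parsed from `java -version` output on stdin.
# Handles both the legacy "1.8.0_171" scheme and the newer "11.0.2" scheme.
java_major_version() {
  local v
  v=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;   # 1.8.0_171 -> 8
    *)   echo "${v%%[.+-]*}" ;;          # 11.0.2 -> 11, 17 -> 17
  esac
}

# Fail with a message unless the java on PATH is version 8 or newer.
require_java8() {
  local major
  major=$(java -version 2>&1 | java_major_version)
  if [[ -z "$major" || "$major" -lt 8 ]]; then
    echo "Java 8 or higher is required (found: ${major:-none})" >&2
    return 1
  fi
  echo "Java major version: $major"
}

# Usage: require_java8 || exit 1
```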
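3.Installing ZooKeeper
This guide uses a standalone ZooKeeper. If you use a distributed ZooKeeper ensemble instead, update the corresponding Druid configuration (druid.zk.service.host in conf-quickstart/druid/_common/common.runtime.properties); otherwise nothing needs to change. ZooKeeper listens on port 2181 by default. Download, configure, and start it as follows; once it is running, jps lists a QuorumPeerMain process.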
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz -o zookeeper-3.4.10.tar.gz
tar -xzf zookeeper-3.4.10.tar.gz
cd zookeeper-3.4.10
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
➜ zookeeper-3.4.10 jps
10565 QuorumPeerMain
17832 Jps
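Instead of checking jps by hand, you can wait until ZooKeeper's client port accepts connections. The sketch below is bash-specific (it relies on bash's /dev/tcp redirection) and assumes the default port 2181; `wait_for_port` is a hypothetical helper. Running `./bin/zkServer.sh status` is an alternative check that reports the server mode.

```shell
#!/usr/bin/env bash
# Wait until something accepts TCP connections on host:port.
# Returns 0 as soon as a connection succeeds, 1 after timeout_s seconds.
# Relies on bash's /dev/tcp redirection (not available in sh/dash).
wait_for_port() {
  local host="$1" port="$2" timeout_s="${3:-30}" waited=0
  while (( waited < timeout_s )); do
    # Opening the pseudo-file connects to host:port; the subshell closes it again.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage (ZooKeeper's default client port):
#   wait_for_port localhost 2181 30 && echo "ZooKeeper is listening on 2181"
```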
4.Installing Druid
curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
tar -xzf druid-0.12.3-bin.tar.gz
cd druid-0.12.3
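Directory layout after extraction:
LICENSE - the license file.
bin/ - quickstart scripts.
conf/* - configuration for a clustered installation (including Hadoop).
conf-quickstart/* - quickstart configuration.
extensions/* - Druid extensions.
hadoop-dependencies/* - Druid's Hadoop dependencies.
lib/* - Druid core libraries.
quickstart/* - quickstart example files and data.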
5.Preparing to start Druid
Before starting the Druid services, do two things:
- Start ZooKeeper.
- Change to the Druid root directory and run bin/init.
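6.Starting the Druid services
Start the five Druid processes, each in its own terminal window. In single-node mode all of them run on the same server; in a large distributed cluster, several Druid processes can also share one server. The five processes are Historical, Broker, Coordinator, Overlord, and MiddleManager.
Start the historical process: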
java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical
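Note the difference from the official website, which uses commands like the one below. This Druid installation directory has no examples directory, so that form does not work here; the commands in this section use conf-quickstart instead.
java `cat examples/conf/druid/coordinator/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator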
Start the broker process:
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker
Start the coordinator process:
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
Start the overlord process:
java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord
Start the middleManager process:
java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
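The five commands above can also be scripted instead of run in five terminals. This sketch starts each process in the background with `nohup` and writes its output to `log/<service>.log`; the log location and the function names are assumptions, and the script must be run from the Druid root because every path in it is relative.

```shell
#!/usr/bin/env bash
# Start the five quickstart Druid processes in the background.
# Run from the Druid root (druid-0.12.3): every path below is relative to it.

DRUID_SERVICES=(historical broker coordinator overlord middleManager)

# Classpath for one service, same layout as the commands above.
druid_classpath() {
  echo "conf-quickstart/druid/_common:conf-quickstart/druid/$1:lib/*"
}

# Start one service with nohup, sending its output to log/<service>.log.
start_druid_service() {
  local service="$1"
  mkdir -p log
  # jvm.config has one JVM flag per line; xargs joins them into one line,
  # like `cat jvm.config | xargs` above. The expansion is left unquoted on
  # purpose so each flag becomes a separate argument.
  # shellcheck disable=SC2046
  nohup java $(xargs < "conf-quickstart/druid/${service}/jvm.config") \
    -cp "$(druid_classpath "$service")" \
    io.druid.cli.Main server "$service" > "log/${service}.log" 2>&1 &
  echo "started ${service} (pid $!)"
}

start_all() {
  local s
  for s in "${DRUID_SERVICES[@]}"; do
    start_druid_service "$s"
  done
}

# Usage:
#   start_all
#   tail -f log/broker.log
```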
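7.Druid consoles
If the services above started successfully, the following consoles are available:
- http://localhost:8090/console.html (overlord console) shows how batch ingestion tasks are progressing. Refresh it periodically; a task status of SUCCESS means the task completed successfully.
- http://localhost:8081/ (coordinator console) shows task progress, segment distribution, index creation, and so on.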
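If a console does not load, you can check which processes are answering HTTP. Each Druid process serves GET /status; the sketch below assumes the quickstart's default ports (coordinator 8081, broker 8082, historical 8083, overlord 8090, middleManager 8091), so adjust them if your runtime.properties files differ. `druid_port` and `check_druid_services` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Default HTTP ports of the quickstart processes (assumed defaults; adjust
# if your runtime.properties files use different ports).
druid_port() {
  case "$1" in
    coordinator)   echo 8081 ;;
    broker)        echo 8082 ;;
    historical)    echo 8083 ;;
    overlord)      echo 8090 ;;
    middleManager) echo 8091 ;;
    *)             return 1 ;;
  esac
}

# Query GET /status on every process and report which ones answer.
check_druid_services() {
  local s port failed=0
  for s in coordinator broker historical overlord middleManager; do
    port=$(druid_port "$s")
    if curl -sf "http://localhost:${port}/status" > /dev/null; then
      echo "${s} (port ${port}): up"
    else
      echo "${s} (port ${port}): not reachable"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_druid_services
```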
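8.Importing offline data into Druid
Save the following ingestion task spec as quickstart/wikiticker-index_local.json (the file submitted by the curl command below):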
{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/Users/zzy/Documents/zzy/software/druid-0.12.3/quickstart",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",
          "type" : "longSum",
          "fieldName" : "added"
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 5000000
    }
  }
}
Note: baseDir should preferably be an absolute path.
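To avoid hard-coding a user-specific absolute path such as /Users/zzy/... in baseDir, you can fill it in when submitting the task. This sketch assumes the spec has a single `"baseDir" : "..."` entry on one line, as in the spec above, and that the path contains no `|` or `&` characters (both are special in the sed replacement); `set_base_dir` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print an ingestion spec with its "baseDir" value replaced by the absolute
# path of a directory. Assumes the "baseDir" entry sits on one line and that
# the path contains no '|' or '&' (both are special in the sed replacement).
set_base_dir() {
  local spec="$1" dir="$2" abs
  abs=$(cd "$dir" && pwd) || return 1
  sed "s|\"baseDir\" *: *\"[^\"]*\"|\"baseDir\" : \"${abs}\"|" "$spec"
}

# Usage, from the Druid root (streams the rewritten spec straight to the overlord):
#   set_base_dir quickstart/wikiticker-index_local.json quickstart \
#     | curl -X 'POST' -H 'Content-Type:application/json' -d @- localhost:8090/druid/indexer/v1/task
```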
Submit the task with curl:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index_local.json localhost:8090/druid/indexer/v1/task
Console output:
➜ druid-0.12.3 curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index_local.json localhost:8090/druid/indexer/v1/task
{"task":"index_wikiticker_2018-11-27T03:33:42.307Z"}
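The response contains the task ID, so instead of refreshing the console you can poll the overlord's task status endpoint, /druid/indexer/v1/task/<taskId>/status. The sketch below extracts fields with grep and sed rather than a JSON parser, so it assumes the compact response shapes shown in its comments; the helper names, the 5-second interval, and the poll limit are arbitrary choices.

```shell
#!/usr/bin/env bash
OVERLORD="${OVERLORD:-http://localhost:8090}"

# Read a submit response such as {"task":"index_wikiticker_..."} on stdin
# and print the task ID.
task_id_from_response() {
  sed -n 's/.*"task" *: *"\([^"]*\)".*/\1/p'
}

# Read a status response such as
#   {"task":"...","status":{"id":"...","status":"SUCCESS","duration":109}}
# on stdin and print the state (RUNNING, SUCCESS or FAILED).
task_state_from_status() {
  grep -o '"status" *: *"[A-Z]*"' | head -n 1 | sed 's/.*"\([A-Z]*\)"$/\1/'
}

# Poll the overlord until the task succeeds or fails (or max_polls is reached).
wait_for_task() {
  local task_id="$1" max_polls="${2:-120}" i state
  [[ -n "$task_id" ]] || { echo "no task id" >&2; return 2; }
  for (( i = 0; i < max_polls; i++ )); do
    state=$(curl -s "${OVERLORD}/druid/indexer/v1/task/${task_id}/status" | task_state_from_status)
    echo "${task_id}: ${state:-UNKNOWN}"
    case "$state" in
      SUCCESS) return 0 ;;
      FAILED)  return 1 ;;
    esac
    sleep 5
  done
  return 1
}

# Usage, from the Druid root:
#   task=$(curl -s -X 'POST' -H 'Content-Type:application/json' \
#     -d @quickstart/wikiticker-index_local.json "${OVERLORD}/druid/indexer/v1/task" | task_id_from_response)
#   wait_for_task "$task"
```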
Check the task status in the overlord console at http://localhost:8090/console.html.
The task status was FAILED.
The log showed the following error:
2018-11-27T03:10:43,416 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[AbstractTask{id='index_wikiticker_2018-11-27T03:10:39.850Z', groupId='index_wikiticker_2018-11-27T03:10:39.850Z', taskResource=TaskResource{availabilityGroup='index_wikiticker_2018-11-27T03:10:39.850Z', requiredCapacity=1}, dataSource='wikiticker', context={}}]
java.lang.IllegalStateException: Failed to create directory within 10000 attempts (tried 1543288243332-0 to 1543288243332-9999)
at com.google.common.io.Files.createTempDir(Files.java:600) ~[guava-16.0.1.jar:?]
at io.druid.segment.indexing.RealtimeTuningConfig.createNewBasePersistDirectory(RealtimeTuningConfig.java:58) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.segment.indexing.RealtimeTuningConfig.makeDefaultTuningConfig(RealtimeTuningConfig.java:68) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.segment.realtime.FireDepartment.<init>(FireDepartment.java:62) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:572) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:264) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2018-11-27T03:10:43,420 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_wikiticker_2018-11-27T03:10:39.850Z] status changed to [FAILED].
2018-11-27T03:10:43,423 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_wikiticker_2018-11-27T03:10:39.850Z",
"status" : "FAILED",
"duration" : 109
}
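Fix: create the temporary directory manually; here java.io.tmpdir points to var/tmp under the Druid root. The exception comes from Guava's Files.createTempDir, which creates a numbered subdirectory under java.io.tmpdir and fails with this message when that parent directory does not exist.
mkdir -p var/tmp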
➜ druid-0.12.3 ll var/tmp
total 0
drwxr-xr-x 2 zzy staff 64 Nov 27 11:33 1543289625953-0
➜ druid-0.12.3 pwd
/Users/zzy/Documents/zzy/software/druid-0.12.3
Note: create it inside the Druid directory, not under the filesystem root!
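Because this failure depends on where java.io.tmpdir points, a slightly more general fix is to read the -Djava.io.tmpdir value from a service's jvm.config and create it relative to the Druid root. This is a sketch: it assumes the flag appears in the jvm.config you pass (the var/tmp listing above suggests the quickstart configs use var/tmp), and `ensure_tmpdir` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Create the directory named by -Djava.io.tmpdir in a jvm.config file.
# A relative value (e.g. var/tmp) is resolved against druid_home, because the
# JVM resolves it against its working directory, which is the Druid root when
# the services are started as shown above.
ensure_tmpdir() {
  local druid_home="$1" jvm_config="$2" tmpdir
  tmpdir=$(grep -o -- '-Djava\.io\.tmpdir=[^[:space:]]*' "$jvm_config" | head -n 1 | cut -d= -f2-)
  if [[ -z "$tmpdir" ]]; then
    echo "no -Djava.io.tmpdir in ${jvm_config}" >&2
    return 1
  fi
  case "$tmpdir" in
    /*) ;;                                 # already absolute
    *)  tmpdir="${druid_home}/${tmpdir}" ;;
  esac
  mkdir -p "$tmpdir" && echo "$tmpdir"
}

# Usage, from the Druid root:
#   ensure_tmpdir "$PWD" conf-quickstart/druid/middleManager/jvm.config
```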
After the local data loads successfully, the coordinator console shows a new datasource named wikiticker.
Query the data:
curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json 'http://localhost:8082/druid/v2/?pretty'
Response:
➜ druid-0.12.3 curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/\?pretty
[ {
"timestamp" : "2015-09-12T00:46:58.771Z",
"result" : [ {
"edits" : 33,
"page" : "Wikipedia:Vandalismusmeldung"
}, {
"edits" : 28,
"page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
}, {
"edits" : 27,
"page" : "Jeremy Corbyn"
}, {
"edits" : 21,
"page" : "Wikipedia:Administrators' noticeboard/Incidents"
}, {
"edits" : 20,
"page" : "Flavia Pennetta"
}, {
"edits" : 18,
"page" : "Total Drama Presents: The Ridonculous Race"
}, {
"edits" : 18,
"page" : "User talk:Dudeperson176123"
}, {
"edits" : 18,
"page" : "Wikipédia:Le Bistro/12 septembre 2015"
}, {
"edits" : 17,
"page" : "Wikipedia:In the news/Candidates"
}, {
"edits" : 17,
"page" : "Wikipedia:Requests for page protection"
}, {
"edits" : 16,
"page" : "Utente:Giulio Mainardi/Sandbox"
}, {
"edits" : 16,
"page" : "Wikipedia:Administrator intervention against vandalism"
}, {
"edits" : 15,
"page" : "Anthony Martial"
}, {
"edits" : 13,
"page" : "Template talk:Connected contributor"
}, {
"edits" : 12,
"page" : "Chronologie de la Lorraine"
}, {
"edits" : 12,
"page" : "Wikipedia:Files for deletion/2015 September 12"
}, {
"edits" : 12,
"page" : "Гомосексуальный образ жизни"
}, {
"edits" : 11,
"page" : "Constructive vote of no confidence"
}, {
"edits" : 11,
"page" : "Homo naledi"
}, {
"edits" : 11,
"page" : "Kim Davis (county clerk)"
}, {
"edits" : 11,
"page" : "Vorlage:Revert-Statistik"
}, {
"edits" : 11,
"page" : "Конституция Японской империи"
}, {
"edits" : 10,
"page" : "The Naked Brothers Band (TV series)"
}, {
"edits" : 10,
"page" : "User talk:Buster40004"
}, {
"edits" : 10,
"page" : "User:Valmir144/sandbox"
} ]
} ]
Run a Druid SQL query against the wikiticker datasource:
SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
cat quickstart/wikipedia-top-pages-sql.json
{
  "query":"SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
}
Run:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql
Result:
[{"page":"Wikipedia:Vandalismusmeldung","Edits":33},
{"page":"User:Cyde/List of candidates for speedy deletion/Subpage","Edits":28},
{"page":"Jeremy Corbyn","Edits":27},
{"page":"Wikipedia:Administrators' noticeboard/Incidents","Edits":21},
{"page":"Flavia Pennetta","Edits":20},
{"page":"Total Drama Presents: The Ridonculous Race","Edits":18},
{"page":"User talk:Dudeperson176123","Edits":18},
{"page":"Wikipédia:Le Bistro/12 septembre 2015","Edits":18},
{"page":"Wikipedia:In the news/Candidates","Edits":17},
{"page":"Wikipedia:Requests for page protection","Edits":17}]
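The JSON payload file above has to escape the double quotes around __time by hand. A small bash helper can build the request body from a plain SQL string instead; it only escapes backslashes and double quotes, so it assumes a single-line query, and `sql_payload` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Wrap one SQL statement in the {"query": "..."} body that /druid/v2/sql expects.
# Only backslashes and double quotes are escaped, so keep the query on one line.
sql_payload() {
  local q
  # Escape backslashes first, then double quotes.
  q=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":"%s"}\n' "$q"
}

# Usage:
#   sql_payload 'SELECT page, COUNT(*) AS Edits FROM wikiticker GROUP BY page ORDER BY Edits DESC LIMIT 10' \
#     | curl -X 'POST' -H 'Content-Type:application/json' -d @- http://localhost:8082/druid/v2/sql
```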
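For more queries, see the official tutorial "Tutorial: Querying data".
This completes the single-node Druid installation and the offline data import. More Druid articles will follow; feel free to follow along and share feedback.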
References:
http://yangyangmyself.iteye.com/blog/2321487
http://druid.io/docs/latest/tutorials/index.html
https://blog.csdn.net/paicmis/article/details/72625404
Imply