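Environment
Java 1.8 | MySQL 5.6.42 | Hadoop 3.1.1 | Druid 0.12.3. This article assumes that Java, MySQL, and Hadoop are already installed in your environment, so it does not cover how to set up these Druid dependencies.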
Preparation
1: MySQL (used as Druid's Metadata Storage)
1) Create the database druid for Druid:
CREATE DATABASE druid DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
2) Create the user druid (password druid1234) and grant it all privileges on the druid database:
grant all on druid.* to druid@'%' identified by 'druid1234' WITH GRANT OPTION;
flush privileges;
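2: The Hadoop cluster is up and serving requests (ZooKeeper, HDFS)
HDFS serves as Druid's Deep Storage.
ZooKeeper manages Druid's cluster state and handles cluster coordination.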
Cluster node plan
The cluster is deployed on three nodes (192.168.0.180, 192.168.0.181, 192.168.0.182). For hardware requirements, see the official documentation: http://druid.io/docs/0.12.3/tutorials/cluster.html
Node 192.168.0.180 - Master Server: Coordinator / Overlord
Node 192.168.0.181 - Query Server: Broker
Node 192.168.0.182 - Data Server: Historical / Middle Manager
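Cluster configuration
Download & extract
1) On node 192.168.0.180, download Druid into the /opt/app directory:
curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
2) Extract it: tar -zxvf druid-0.12.3-bin.tar.gz. This produces the folder druid-0.12.3.
Edit the configuration
1) Go to druid-0.12.3/conf/druid/_common and edit the configuration file [common.runtime.properties]: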
[common.runtime.properties]
#Extensions
druid.extensions.loadList=["druid-datasketches", "druid-lookups-cached-global", "mysql-metadata-storage", "druid-hdfs-storage", "druid-histogram", "druid-kafka-indexing-service"]
# Logging
druid.startup.logging.logProperties=true
# Zookeeper
druid.zk.service.host=192.168.0.180:2181,192.168.0.181:2181,192.168.0.182:2181
druid.zk.paths.base=/druid
# For MySQL
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://192.168.0.182:3306/druid?characterEncoding=UTF-8
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid1234
# Deep storage - HDFS
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
# Indexing service logs
druid.indexer.logs.type=hdfs
druid.indexer.logs.directory=/druid/indexing-logs
# Monitoring
druid.monitoring.monitors=["io.druid.java.util.metrics.JvmMonitor"]
druid.emitter=logging
druid.emitter.logging.logLevel=info
# Storage type of double columns
druid.indexing.doubleStorage=double
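The value of druid.extensions.loadList has to be a valid JSON array, and a single missing comma between two extension names is easy to overlook. The following is a minimal Python sketch, not a Druid tool, that parses a properties file and checks that value before you start any node. The helper names parse_properties and check_load_list and the shortened SAMPLE text are made up for illustration.

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as JDBC URIs stay intact.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_load_list(props):
    """Return the extension list, or raise ValueError if it is malformed."""
    raw = props.get("druid.extensions.loadList", "[]")
    try:
        extensions = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"loadList is not valid JSON: {exc}") from exc
    if not isinstance(extensions, list) or not all(
        isinstance(item, str) for item in extensions
    ):
        raise ValueError("loadList must be a JSON array of strings")
    return extensions


# Shortened, hypothetical excerpt of common.runtime.properties.
SAMPLE = """
# Extensions
druid.extensions.loadList=["mysql-metadata-storage","druid-hdfs-storage","druid-histogram"]
druid.storage.type=hdfs
"""

if __name__ == "__main__":
    print(check_load_list(parse_properties(SAMPLE)))
```

To check the real file, read conf/druid/_common/common.runtime.properties and pass its text to parse_properties instead of SAMPLE.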
2) Go to druid-0.12.3/conf/druid/coordinator and edit the configuration files [jvm.config] and [runtime.properties]:
[jvm.config]
-server
-Xms256m
-Xmx256m
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dderby.stream.error.file=var/druid/derby.log
[runtime.properties]
druid.service=druid/coordinator
druid.host=192.168.0.180
druid.port=8081
druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
3) Go to druid-0.12.3/conf/druid/overlord and edit the configuration files [jvm.config] and [runtime.properties]:
[jvm.config]
-server
-Xms512m
-Xmx512m
-XX:NewSize=256m
-XX:MaxNewSize=256m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[runtime.properties]
druid.service=druid/overlord
druid.host=192.168.0.180
druid.port=8090
druid.indexer.queue.startDelay=PT30S
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
4) Copy the druid-0.12.3 directory from node 192.168.0.180 to the other two nodes (replace USER with your login account on those nodes):
scp -r /opt/app/druid-0.12.3/ USER@192.168.0.181:/opt/app/.
scp -r /opt/app/druid-0.12.3/ USER@192.168.0.182:/opt/app/.
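Note
When editing the broker and historical configuration files, size the following parameters according to the rules in the official documentation:
MaxDirectMemorySize >= druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)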
druid.processing.numMergeBuffers = max(2, druid.processing.numThreads / 4)
druid.processing.numThreads = Number of cores - 1 (or 1)
druid.server.http.numThreads = max(10, (Number of cores * 17) / 16 + 2) + 30
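To make these sizing rules concrete, here is a minimal Python sketch that evaluates them. It assumes the divisions are integer divisions, which the rules above do not spell out, and the function names are made up for illustration. The example at the bottom uses the broker values from this article: a 536870912-byte buffer, 2 merge buffers, and 6 processing threads.

```python
def num_threads(cores):
    """druid.processing.numThreads = Number of cores - 1 (or 1)."""
    return max(1, cores - 1)


def num_merge_buffers(threads):
    """druid.processing.numMergeBuffers = max(2, numThreads / 4)."""
    return max(2, threads // 4)  # integer division is an assumption


def http_threads(cores):
    """druid.server.http.numThreads = max(10, (cores * 17) / 16 + 2) + 30."""
    return max(10, (cores * 17) // 16 + 2) + 30  # integer division is an assumption


def min_direct_memory_bytes(buffer_size_bytes, merge_buffers, threads):
    """Lower bound for -XX:MaxDirectMemorySize, in bytes."""
    return buffer_size_bytes * (merge_buffers + threads + 1)


if __name__ == "__main__":
    needed = min_direct_memory_bytes(536870912, 2, 6)
    # 512 MiB * (2 + 6 + 1) = 4608 MiB
    print(needed // (1024 * 1024), "MiB")
```

The result, 4608 MiB, is exactly the -XX:MaxDirectMemorySize=4608m used in the broker's jvm.config, so that setting sits right at the minimum the rule allows.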
5) On node 192.168.0.181, go to druid-0.12.3/conf/druid/broker and edit the configuration files [jvm.config] and [runtime.properties]:
[jvm.config]
-server
-Xms1g
-Xmx1g
-XX:NewSize=256m
-XX:MaxNewSize=256m
-XX:MaxDirectMemorySize=4608m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[runtime.properties]
druid.service=druid/broker
druid.host=192.168.0.181
druid.port=8082
# HTTP server threads
druid.broker.http.numConnections=5
druid.server.http.numThreads=25
# Processing threads and buffers
druid.processing.buffer.sizeBytes=536870912
druid.processing.numMergeBuffers=2
druid.processing.numThreads=6
# Query cache
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
druid.cache.type=local
druid.cache.sizeInBytes=2000000000
# SQL
druid.sql.enable=true
6) On node 192.168.0.182, go to druid-0.12.3/conf/druid/historical and edit the configuration files [jvm.config] and [runtime.properties]:
[jvm.config]
-server
-Xms1g
-Xmx1g
-XX:NewSize=512m
-XX:MaxNewSize=512m
-XX:MaxDirectMemorySize=3072m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[runtime.properties]
druid.service=druid/historical
druid.host=192.168.0.182
druid.port=8083
# HTTP server threads
druid.server.http.numThreads=25
# Processing threads and buffers
druid.processing.buffer.sizeBytes=25600000
druid.processing.numMergeBuffers=2
druid.processing.numThreads=6
# Segment storage
druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":130000000000}]
druid.server.maxSize=130000000000
# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=2000000000
7) On node 192.168.0.182, go to druid-0.12.3/conf/druid/middleManager and edit the configuration files [jvm.config] and [runtime.properties]:
[jvm.config]
-server
-Xms1024m
-Xmx1024m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC+8
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=var/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
[runtime.properties]
druid.service=druid/middleManager
druid.host=192.168.0.182
druid.port=8091
# Number of tasks per middleManager
druid.worker.capacity=3
# Task launch parameters
druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC+8 -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
druid.indexer.task.restoreTasksOnRestart=true
# HTTP server threads
druid.server.http.numThreads=25
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=25600000
druid.indexer.fork.property.druid.processing.numThreads=2
# Hadoop indexing
druid.indexer.task.hadoopWorkingPath=/druid/hadoop-tmp
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:3.1.1"]
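Each ingestion task runs in its own forked JVM (a peon) that is launched with druid.indexer.runner.javaOpts, and one middleManager runs up to druid.worker.capacity of them at the same time. The sketch below is a rough estimate rather than an official formula. It multiplies the two settings to get the worst-case heap the peons can claim on the node, and it ignores direct memory and JVM overhead. The helper names are hypothetical.

```python
import re

# Multipliers for the JVM size suffixes k, m and g.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx(java_opts):
    """Return the -Xmx value in bytes, or None if the option is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not match:
        return None
    number, unit = match.groups()
    return int(number) * _UNITS.get(unit.lower(), 1)


def peon_heap_budget(java_opts, worker_capacity):
    """Worst-case total heap, in bytes, used by all peons on one middleManager."""
    xmx = parse_xmx(java_opts)
    if xmx is None:
        raise ValueError("javaOpts does not set -Xmx")
    return xmx * worker_capacity


if __name__ == "__main__":
    opts = "-server -Xmx2g -Duser.timezone=UTC+8 -Dfile.encoding=UTF-8"
    total = peon_heap_budget(opts, 3)
    print(total // 1024 ** 3, "GiB")  # 3 tasks * 2 GiB each
```

With the values above this comes to 6 GiB, on top of the middleManager's own 1 GiB heap and the historical process that shares node 192.168.0.182.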
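8) Perform the following steps on each of the three nodes:
① Change to the Druid root directory and download the external dependency hadoop-client 3.1.1:
java -classpath "lib/*" io.druid.cli.Main tools pull-deps -h org.apache.hadoop:hadoop-client:3.1.1
② This article stores metadata in MySQL, so download the MySQL metadata storage extension:
curl -O http://static.druid.io/artifacts/releases/mysql-metadata-storage-0.12.3.tar.gz
Extract it: tar -zxvf mysql-metadata-storage-0.12.3.tar.gz. This produces the folder mysql-metadata-storage; copy it into the druid-0.12.3/extensions directory.
③ This article uses HDFS for data storage, so the Hadoop configuration XML files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) must be on the classpath of every Druid node. You can do this by copying them into conf/druid/_common/.
④ Change to the Druid root directory and run bin/init, which creates two folders, var and log, under the root directory.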
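Step ③ is easy to get partly wrong when one of the four files is missing on a node. As a small illustration, the following Python sketch copies the Hadoop client configuration files into Druid's _common directory and reports any file it could not find. The function name copy_hadoop_conf and the source directory used in the usage note are assumptions; adjust them to your Hadoop installation.

```python
import shutil
from pathlib import Path

# The four Hadoop configuration files named in step 3.
HADOOP_CONF_FILES = (
    "core-site.xml",
    "hdfs-site.xml",
    "yarn-site.xml",
    "mapred-site.xml",
)


def copy_hadoop_conf(hadoop_conf_dir, druid_common_dir):
    """Copy the Hadoop XML files; return the names that were not found."""
    src = Path(hadoop_conf_dir)
    dst = Path(druid_common_dir)
    dst.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in HADOOP_CONF_FILES:
        source_file = src / name
        if source_file.is_file():
            shutil.copy2(source_file, dst / name)
        else:
            missing.append(name)
    return missing
```

For example, copy_hadoop_conf("/etc/hadoop/conf", "/opt/app/druid-0.12.3/conf/druid/_common") returns an empty list when all four files were copied. The path /etc/hadoop/conf is only a common default, not something this setup guarantees.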
Starting the cluster
1) On node 192.168.0.180, start the coordinator and the overlord
Option 1: java `cat conf/druid/coordinator/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator
java `cat conf/druid/overlord/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/overlord:lib/*" io.druid.cli.Main server overlord
Option 2: ./bin/coordinator.sh start
./bin/overlord.sh start
2) On node 192.168.0.181, start the broker
Option 1: java `cat conf/druid/broker/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/broker:lib/*" io.druid.cli.Main server broker
Option 2: ./bin/broker.sh start
3) On node 192.168.0.182, start the historical and the middleManager
Option 1: java `cat conf/druid/historical/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/historical:lib/*" io.druid.cli.Main server historical
java `cat conf/druid/middleManager/jvm.config | xargs` -cp "conf/druid/_common:conf/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
Option 2: ./bin/historical.sh start
./bin/middleManager.sh start
Checking cluster status
Coordinator console: http://192.168.0.180:8081
Overlord console: http://192.168.0.180:8090
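Besides the two web consoles, every Druid process answers GET /status with a small JSON document that includes its version. The sketch below polls all five processes using the hosts and ports configured in this article. It is only an illustration: the NODES table and the function names are made up here, and unreachable processes are reported instead of raising.

```python
import json
import urllib.request

# Hosts and ports taken from the runtime.properties files in this article.
NODES = {
    "coordinator": ("192.168.0.180", 8081),
    "overlord": ("192.168.0.180", 8090),
    "broker": ("192.168.0.181", 8082),
    "historical": ("192.168.0.182", 8083),
    "middleManager": ("192.168.0.182", 8091),
}


def status_url(host, port):
    """Build the /status URL of one Druid process."""
    return f"http://{host}:{port}/status"


def fetch_status(host, port, timeout=5):
    """Return the parsed JSON body of GET /status."""
    with urllib.request.urlopen(status_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_cluster(nodes=NODES):
    """Map each process name to its reported version or an error string."""
    results = {}
    for name, (host, port) in nodes.items():
        try:
            results[name] = fetch_status(host, port).get("version", "unknown")
        except (OSError, ValueError) as exc:
            # URLError and socket timeouts are subclasses of OSError.
            results[name] = f"unreachable: {exc}"
    return results
```

Calling check_cluster() from any machine that can reach the three nodes returns one entry per process, which makes it easy to spot a process that failed to start.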
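A small example
Description: this demo loads data from HDFS into Druid, then queries the ingested data from the broker node.
1: First, create a JSON file named hdfs-data.json locally and upload it to the /tmp/druid directory on HDFS. The contents of hdfs-data.json are: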
{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
{"timestamp":"2018-01-01T01:01:51Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":255,"bytes":21133}
{"timestamp":"2018-01-01T01:01:59Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":11,"bytes":5780}
{"timestamp":"2018-01-01T01:02:14Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":38,"bytes":6289}
{"timestamp":"2018-01-01T01:02:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":377,"bytes":359971}
{"timestamp":"2018-01-01T01:03:29Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":49,"bytes":10204}
{"timestamp":"2018-01-02T21:33:14Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":38,"bytes":6289}
{"timestamp":"2018-01-02T21:33:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":123,"bytes":93999}
{"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":12,"bytes":2818}
hdfs dfs -put /opt/app/druid-0.12.3/quickstart/hdfs-book/hdfs-data.json /tmp/druid/.
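Before uploading, it can help to confirm that every line of the file is valid JSON and to see which time range the rows cover, because the ingestion spec has to declare an interval that contains all of them. The following Python sketch does both for newline-delimited JSON in the format shown above. The function names are hypothetical, and SAMPLE holds only the first and last rows of hdfs-data.json.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_timestamps(text):
    """Parse newline-delimited JSON and return each row's timestamp (UTC)."""
    stamps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)  # raises ValueError on a malformed line
        stamp = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        stamps.append(stamp.replace(tzinfo=timezone.utc))
    return stamps


def covering_interval(stamps):
    """Return a day-aligned 'start/end' interval that contains every timestamp."""
    start = min(stamps).date()
    end = max(stamps).date() + timedelta(days=1)  # the end is exclusive
    return f"{start.isoformat()}/{end.isoformat()}"


SAMPLE = """
{"timestamp":"2018-01-01T01:01:35Z","srcIP":"1.1.1.1", "dstIP":"2.2.2.2","packets":20,"bytes":9024}
{"timestamp":"2018-01-02T21:35:45Z","srcIP":"7.7.7.7", "dstIP":"8.8.8.8","packets":12,"bytes":2818}
"""

if __name__ == "__main__":
    print(covering_interval(parse_timestamps(SAMPLE)))  # 2018-01-01/2018-01-03
```

For this data set the smallest day-aligned interval is 2018-01-01/2018-01-03.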
2: Create an ingestion spec file named hdfs-index.json with the following contents:
{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "rollup-tutorial",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "srcIP",
              "dstIP",
              "packets",
              "bytes"
            ]
          },
          "timestampSpec": {
            "column": "timestamp",
            "format": "iso"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2018-01-01/2018-01-03"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/tmp/druid/hdfs-data.json"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "targetPartitionSize" : 5000000,
      "maxRowsInMemory" : 25000,
      "forceExtendableShardSpecs" : true,
      "jobProperties" : {
        "mapreduce.job.classloader" : "true"
      }
    }
  },
  "hadoopDependencyCoordinates" : [
    "org.apache.hadoop:hadoop-client:3.1.1"
  ]
}
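3: Run the following command to submit the ingestion task to the Overlord (192.168.0.180:8090 in this cluster):
curl -X 'POST' -H 'Content-Type:application/json' -d @hdfs-index.json http://192.168.0.180:8090/druid/indexer/v1/task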
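The same submission can be scripted, and once the task has finished the data can be queried through the broker, which has druid.sql.enable=true in this setup. The following Python sketch builds the three HTTP requests involved: submitting the task, checking its status, and running a SQL query. The OVERLORD and BROKER addresses come from this article; the function names are made up for illustration, and nothing is sent until send() is called.

```python
import json
import urllib.request

OVERLORD = "http://192.168.0.180:8090"
BROKER = "http://192.168.0.181:8082"
JSON_HEADERS = {"Content-Type": "application/json"}


def build_task_request(spec, overlord=OVERLORD):
    """POST the ingestion spec to the Overlord task endpoint."""
    return urllib.request.Request(
        f"{overlord}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


def build_status_request(task_id, overlord=OVERLORD):
    """GET the status of a previously submitted task."""
    return urllib.request.Request(
        f"{overlord}/druid/indexer/v1/task/{task_id}/status"
    )


def build_sql_request(sql, broker=BROKER):
    """POST a Druid SQL query to the broker."""
    return urllib.request.Request(
        f"{broker}/druid/v2/sql/",
        data=json.dumps({"query": sql}).encode("utf-8"),
        headers=JSON_HEADERS,
        method="POST",
    )


def send(request, timeout=30):
    """Send a request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Typical use, assuming hdfs-index.json is in the current directory:
#   spec = json.load(open("hdfs-index.json", encoding="utf-8"))
#   task_id = send(build_task_request(spec))["task"]
#   print(send(build_status_request(task_id)))
#   # After the task succeeds and the segments are loaded:
#   print(send(build_sql_request('SELECT COUNT(*) AS cnt FROM "rollup-tutorial"')))
```

The data source name contains a hyphen, so it has to be wrapped in double quotes in the SQL statement.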