Installing Single-Node Druid and Importing Offline Data

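1. Overview

This quick installation targets a single server: most settings can be left at their defaults, and data is stored on the local operating-system disk. The purpose of the quick install is to help you understand, and to serve as a guide to, the development workflow for big-data analytics on Druid.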

2. Requirements

  • Java 8 or higher

  • Linux, Mac OS X, or other Unix-like OS (Windows is not supported)

  • 8 GB of RAM

  • 2 vCPUs
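
The Java requirement above can be checked before going further. The following is a minimal sketch, not part of the original setup: the helper names java_major, installed_java_version and java_ok are made up for illustration, and the parsing assumes the usual `java -version` output format (for example `java version "1.8.0_171"`).

```shell
# Extract the major Java version from a version string such as
# "1.8.0_171" (legacy scheme, major = 8) or "11.0.2" (major = 11).
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 | cut -d- -f1 | cut -d+ -f1 ;;
  esac
}

# Read the version string reported by the local JVM, e.g. 1.8.0_171.
# `java -version` writes to stderr, hence the 2>&1.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when the given version string is Java 8 or newer.
java_ok() {
  [ "$(java_major "$1")" -ge 8 ]
}
```

A typical call would be `java_ok "$(installed_java_version)" && echo "Java OK"`.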

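3. Installing ZooKeeper

This guide uses a standalone ZooKeeper. If you install ZooKeeper in distributed mode instead, the corresponding Druid configuration must be changed; otherwise no change is needed. By default ZooKeeper listens on port 2181.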

curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz -o zookeeper-3.4.10.tar.gz

tar -xzf zookeeper-3.4.10.tar.gz
cd zookeeper-3.4.10
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start

➜  zookeeper-3.4.10 jps
10565 QuorumPeerMain
17832 Jps
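
Besides jps, the listening port can be probed directly. This is a rough sketch under a few assumptions: nc (netcat) is installed, the server accepts ZooKeeper's "ruok" four-letter command, and the function names zk_client_port and zk_ruok are hypothetical helpers introduced here.

```shell
# Print the clientPort configured in a ZooKeeper config file.
zk_client_port() {
  sed -n 's/^[[:space:]]*clientPort[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Send the "ruok" four-letter command to a local ZooKeeper;
# a healthy server is expected to reply "imok".
# -w 2 gives up after two seconds if nothing answers.
zk_ruok() {
  echo ruok | nc -w 2 localhost "$1"
}
```

From the zookeeper-3.4.10 directory, `zk_ruok "$(zk_client_port conf/zoo.cfg)"` should print imok when the server is up. `./bin/zkServer.sh status` is another way to check.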

4. Installing Druid

curl -O http://static.druid.io/artifacts/releases/druid-0.12.3-bin.tar.gz
tar -xzf druid-0.12.3-bin.tar.gz
cd druid-0.12.3

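Directory layout after extraction:

LICENSE - the license file.
bin/ - quickstart scripts.
conf/* - configuration for a clustered installation (including Hadoop).
conf-quickstart/* - configuration for the quickstart.
extensions/* - Druid extensions.
hadoop-dependencies/* - Druid Hadoop dependencies.
lib/* - core Druid packages.
quickstart/* - sample files and data for the quickstart.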

5. Preparing to start Druid

Before starting the Druid services, do two things:

  1. Start ZooKeeper.
  2. Change to the Druid root directory and run bin/init.

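6. Starting the Druid services

Start the five Druid processes, each in its own remote terminal window. Because this is single-node mode, all processes run on the same server; in a large distributed cluster, many Druid processes can still share a server. The five processes to start are: historical, broker, coordinator, overlord, and middleManager.

Start historical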

java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical

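Note the difference from the official site, which gives commands like the one below. The Druid installation directory has no examples directory, so the conf-quickstart paths are used here instead: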

java `cat examples/conf/druid/coordinator/jvm.config | xargs` -cp "examples/conf/druid/_common:examples/conf/druid/_common/hadoop-xml:examples/conf/druid/coordinator:lib/*" io.druid.cli.Main server coordinator

Start broker

java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker

Start coordinator

java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator

Start overlord

java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord

Start middleManager

java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager
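
The five commands above differ only in the service name, so they can be wrapped in a small loop. This is only a sketch: the function names are hypothetical, the services are started in the background rather than in separate terminals, and LOG_DIR is an arbitrary location picked for this example. Run it from the Druid root directory.

```shell
# Directory holding the quickstart configuration, as used in the commands above.
CONF_DIR="conf-quickstart/druid"

# Build the classpath for one Druid service (historical, broker, ...).
druid_classpath() {
  echo "$CONF_DIR/_common:$CONF_DIR/$1:lib/*"
}

# Start one service in the background.
# LOG_DIR is a hypothetical log location chosen for this sketch.
start_druid_service() {
  LOG_DIR="${LOG_DIR:-var/sketch-logs}"
  mkdir -p "$LOG_DIR"
  # The command substitution is left unquoted on purpose so that each
  # JVM flag from jvm.config becomes a separate argument.
  nohup java $(cat "$CONF_DIR/$1/jvm.config" | xargs) \
    -cp "$(druid_classpath "$1")" \
    io.druid.cli.Main server "$1" > "$LOG_DIR/$1.log" 2>&1 &
}

# Start all five services in one go.
start_all_druid() {
  for svc in historical broker coordinator overlord middleManager; do
    start_druid_service "$svc"
  done
}
```

Calling `start_all_druid` launches everything; each service writes to its own file under LOG_DIR, which replaces watching five terminal windows.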

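7. Druid consoles

If the services above started successfully, the following consoles are available:

    1. http://localhost:8090/console.html shows the status of tasks that batch-load data into Druid. Refresh the console every so often; a task status of SUCCESS means the task completed, as shown in the figure below:
druid-console.png
    2. http://localhost:8081/ shows task progress, segment distribution, index creation, and so on.
druid-006.png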
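
Service availability can also be checked from the command line through each process's /status endpoint. The sketch below makes some assumptions: ports 8081, 8082 and 8090 come from this article, while 8083 (historical) and 8091 (middleManager) are assumed quickstart defaults that should be confirmed in conf-quickstart/druid/<service>/runtime.properties. The function names are illustrative only.

```shell
# Port for each service (see the assumptions in the text above).
druid_port() {
  case "$1" in
    coordinator)   echo 8081 ;;
    broker)        echo 8082 ;;
    historical)    echo 8083 ;;
    overlord)      echo 8090 ;;
    middleManager) echo 8091 ;;
    *) return 1 ;;
  esac
}

# URL of the /status endpoint for a service; fails for unknown names.
druid_status_url() {
  port=$(druid_port "$1") || return 1
  echo "http://localhost:$port/status"
}

# Report which services answer on /status.
check_druid_services() {
  for svc in coordinator broker historical overlord middleManager; do
    if curl -sf "$(druid_status_url "$svc")" > /dev/null; then
      echo "$svc: up"
    else
      echo "$svc: not responding"
    fi
  done
}
```

Running `check_druid_services` after startup gives a quick overview before opening the consoles in a browser.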

8. Importing offline data into Druid

{ "type" : "index", 
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/Users/zzy/Documents/zzy/software/druid-0.12.3/quickstart",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      }
    },
    "dataSchema" : {
      "dataSource" : "wikiticker",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "channel",
              "cityName",
              "comment",
              "countryIsoCode",
              "countryName",
              "isAnonymous",
              "isMinor",
              "isNew",
              "isRobot",
              "isUnpatrolled",
              "metroCode",
              "namespace",
              "page",
              "regionIsoCode",
              "regionName",
              "user"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "count",
          "type" : "count"
        },
        {
          "name" : "added",
          "type" : "longSum",
          "fieldName" : "added"
        },
        {
          "name" : "deleted",
          "type" : "longSum",
          "fieldName" : "deleted"
        },
        {
          "name" : "delta",
          "type" : "longSum",
          "fieldName" : "delta"
        },
        {
          "name" : "user_unique",
          "type" : "hyperUnique",
          "fieldName" : "user"
        }
      ]
    },
    "tuningConfig" : {
      "type" : "index",
      "partitionsSpec" : {
        "type" : "hashed",
        "targetPartitionSize" : 5000000
      },
      "jobProperties" : {}
    }
  }
}

Note: baseDir should preferably be an absolute path.
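
One way to avoid hard-coding the absolute path is to rewrite baseDir when the spec is submitted. A minimal sketch, assuming the spec keeps "baseDir" on a single line as in the listing above; with_base_dir is a hypothetical helper name.

```shell
# Print a copy of the spec file ($1) with the "baseDir" value replaced by $2.
# '|' is used as the sed delimiter so slashes in the path need no escaping;
# the path itself must not contain '|', '&' or backslashes.
with_base_dir() {
  sed 's|"baseDir"[[:space:]]*:[[:space:]]*"[^"]*"|"baseDir" : "'"$2"'"|' "$1"
}
```

For example, from the Druid root directory: `with_base_dir quickstart/wikiticker-index_local.json "$(pwd)/quickstart" > /tmp/wikiticker-index_abs.json`, where the output file name is arbitrary.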

Run the curl command:
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index_local.json localhost:8090/druid/indexer/v1/task

The console prints:

➜  druid-0.12.3 curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index_local.json localhost:8090/druid/indexer/v1/task
{"task":"index_wikiticker_2018-11-27T03:33:42.307Z"}%
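
The task ID in this response can be used to query the overlord's task status endpoint instead of refreshing the console. The helpers below are a sketch with hypothetical names; they assume the overlord runs on localhost:8090 as in this article.

```shell
# Extract the task ID from the overlord's reply, e.g.
# {"task":"index_wikiticker_2018-11-27T03:33:42.307Z"}
task_id_from_response() {
  printf '%s\n' "$1" | sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Fetch the current status document for a task from the overlord.
task_status() {
  curl -s "http://localhost:8090/druid/indexer/v1/task/$1/status"
}
```

A possible flow is to capture the curl output of the submission in a variable, pass it to task_id_from_response, and call task_status on the result every few seconds until the status is no longer RUNNING.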

Check the task status in the overlord console: http://localhost:8090/console.html

druid-007.png

The task status is FAILED.

druid-008.png
druid-009.png

The log shows the following error:

2018-11-27T03:10:43,416 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[AbstractTask{id='index_wikiticker_2018-11-27T03:10:39.850Z', groupId='index_wikiticker_2018-11-27T03:10:39.850Z', taskResource=TaskResource{availabilityGroup='index_wikiticker_2018-11-27T03:10:39.850Z', requiredCapacity=1}, dataSource='wikiticker', context={}}]
java.lang.IllegalStateException: Failed to create directory within 10000 attempts (tried 1543288243332-0 to 1543288243332-9999)
  at com.google.common.io.Files.createTempDir(Files.java:600) ~[guava-16.0.1.jar:?]
  at io.druid.segment.indexing.RealtimeTuningConfig.createNewBasePersistDirectory(RealtimeTuningConfig.java:58) ~[druid-server-0.12.3.jar:0.12.3]
  at io.druid.segment.indexing.RealtimeTuningConfig.makeDefaultTuningConfig(RealtimeTuningConfig.java:68) ~[druid-server-0.12.3.jar:0.12.3]
  at io.druid.segment.realtime.FireDepartment.<init>(FireDepartment.java:62) ~[druid-server-0.12.3.jar:0.12.3]
  at io.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:572) ~[druid-indexing-service-0.12.3.jar:0.12.3]
  at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:264) ~[druid-indexing-service-0.12.3.jar:0.12.3]
  at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
  at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
  at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
2018-11-27T03:10:43,420 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_wikiticker_2018-11-27T03:10:39.850Z] status changed to [FAILED].
2018-11-27T03:10:43,423 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_wikiticker_2018-11-27T03:10:39.850Z",
  "status" : "FAILED",
  "duration" : 109
}

Fix: create the temporary directory manually; here the temporary directory is var/tmp. From the Druid root directory:

mkdir -p var/tmp

➜  druid-0.12.3 ll var/tmp
total 0
drwxr-xr-x  2 zzy  staff  64 Nov 27 11:33 1543289625953-0
➜  druid-0.12.3 pwd
/Users/zzy/Documents/zzy/software/druid-0.12.3

Note: create it under the Druid directory, not under the filesystem root (/)!
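
To avoid guessing which directory is expected, the temp directory can be read from the JVM settings. This sketch assumes each conf-quickstart jvm.config lists one flag per line and includes a -Djava.io.tmpdir=... entry; the function names are hypothetical. Run it from the Druid root directory so that relative paths resolve there.

```shell
# Print the value of -Djava.io.tmpdir found in a jvm.config file, if any.
jvm_tmpdir() {
  sed -n 's/^-Djava\.io\.tmpdir=\(.*\)$/\1/p' "$1" | head -n 1
}

# Create the configured temp directory for every service.
ensure_tmpdirs() {
  for svc in historical broker coordinator overlord middleManager; do
    dir=$(jvm_tmpdir "conf-quickstart/druid/$svc/jvm.config")
    if [ -n "$dir" ]; then
      mkdir -p "$dir"
    fi
  done
}
```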

Once the local data has loaded successfully, the coordinator page shows a new datasource named wikiticker.

druid-010.png

Querying the data

curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json 'http://localhost:8082/druid/v2/?pretty'

The response:

➜  druid-0.12.3 curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/\?pretty
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "edits" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "edits" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "edits" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "edits" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "edits" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "edits" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "edits" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "edits" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "edits" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "edits" : 17,
    "page" : "Wikipedia:Requests for page protection"
  }, {
    "edits" : 16,
    "page" : "Utente:Giulio Mainardi/Sandbox"
  }, {
    "edits" : 16,
    "page" : "Wikipedia:Administrator intervention against vandalism"
  }, {
    "edits" : 15,
    "page" : "Anthony Martial"
  }, {
    "edits" : 13,
    "page" : "Template talk:Connected contributor"
  }, {
    "edits" : 12,
    "page" : "Chronologie de la Lorraine"
  }, {
    "edits" : 12,
    "page" : "Wikipedia:Files for deletion/2015 September 12"
  }, {
    "edits" : 12,
    "page" : "Гомосексуальный образ жизни"
  }, {
    "edits" : 11,
    "page" : "Constructive vote of no confidence"
  }, {
    "edits" : 11,
    "page" : "Homo naledi"
  }, {
    "edits" : 11,
    "page" : "Kim Davis (county clerk)"
  }, {
    "edits" : 11,
    "page" : "Vorlage:Revert-Statistik"
  }, {
    "edits" : 11,
    "page" : "Конституция Японской империи"
  }, {
    "edits" : 10,
    "page" : "The Naked Brothers Band (TV series)"
  }, {
    "edits" : 10,
    "page" : "User talk:Buster40004"
  }, {
    "edits" : 10,
    "page" : "User:Valmir144/sandbox"
  } ]
} ]%

Running a Druid SQL query

SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

cat quickstart/wikipedia-top-pages-sql.json
{
  "query":"SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE \"__time\" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10"
}

Run the command:

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikipedia-top-pages-sql.json http://localhost:8082/druid/v2/sql

Result:

[{"page":"Wikipedia:Vandalismusmeldung","Edits":33},
{"page":"User:Cyde/List of candidates for speedy deletion/Subpage","Edits":28},
{"page":"Jeremy Corbyn","Edits":27},
{"page":"Wikipedia:Administrators' noticeboard/Incidents","Edits":21},
{"page":"Flavia Pennetta","Edits":20},
{"page":"Total Drama Presents: The Ridonculous Race","Edits":18},
{"page":"User talk:Dudeperson176123","Edits":18},
{"page":"Wikipédia:Le Bistro/12 septembre 2015","Edits":18},
{"page":"Wikipedia:In the news/Candidates","Edits":17},
{"page":"Wikipedia:Requests for page protection","Edits":17}]
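
Writing the JSON file by hand means escaping every double quote in the SQL. The sketch below builds the request body from a plain SQL string instead; sql_payload is a hypothetical helper, and it only handles single-line statements (backslashes and double quotes are escaped, newlines are not).

```shell
# Wrap a SQL statement in the JSON body posted to /druid/v2/sql,
# escaping backslashes and double quotes so the result is valid JSON.
sql_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query":"%s"}\n' "$escaped"
}
```

The output can be piped straight to the broker, for example `sql_payload 'SELECT COUNT(*) FROM wikiticker' | curl -X POST -H 'Content-Type:application/json' -d @- http://localhost:8082/druid/v2/sql`.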

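For more queries, see the official Tutorial: Querying data.

This completes the single-node Druid installation and the offline data import. More Druid articles will follow; feel free to follow along and share what you learn.

References: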

http://yangyangmyself.iteye.com/blog/2321487

http://druid.io/docs/latest/tutorials/index.html

https://blog.csdn.net/paicmis/article/details/72625404

imply
