安装
系统环境
linux版本:redhat6
jdk:jdk1.7
1.本地安装与测试
1.1安装
1.1.1下载Drill M1 binary release
http://people.apache.org/~jacques/apache-drill-1.0.0-m1.rc3/apache-drill-1.0.0-m1-binary-release.tar.gz
1.1.2 解压apache-drill-1.0.0-m1-binary-release.tar.gz并做链接
tar -zxf apache-drill-1.0.0-m1-binary-release.tar.gz
做link链接
ln -s apache-drill-1.0.0-m1 drill
1.1.3 配置环境变量
export DRILL_HOME=/home/{username}/drill
export PATH=$PATH:$DRILL_HOME/bin
1.2测试
1.2.1连接
[sudo] sqlline -u jdbc:drill:schema=parquet-local -n admin -p admin
解析:schema原生定义了5种类型:
parquet-local(本地parquet),parquet-cp(classpath-parquet), jsonl(本地json),parquet(classpath-parquet),parquet
具体的定义,参照conf/storage-engines.json
1.2.2退出
jdbc:drill:schema=parquet-local> !q
1.2.3运行一个QUERY
select * from “sample-data/region.parquet";
语句指南
https://developers.google.com/bigquery/query-reference
https://cwiki.apache.org/confluence/display/DRILL/Running+Queries
2. 分布式安装与测试
2.1安装
2.1.1.安装Hadoop
当前drill的原生支持的版本为hadoop1.2
http://litongbupt.iteye.com/blog/1473179
http://litongbupt.iteye.com/blog/1473265
启动hadoop
2.1.2.安装Zookeeper
官网推荐安装Zookeeper3.4.3,经笔者测试,3.4.5也是可以使用的。
部署并启动zookeeper
http://litongbupt.iteye.com/admin/blogs/1987737
2.1.3 部署drill的分布式模式
- 修改conf/drill-override.conf文件 zk:connect:“{zookeeper地址}:2181”
- 修改conf/storage-engines文件
"parquet" :
{
"type":"parquet",
"dfsName" : “hdfs://{hadoop的namenode地址}:9000”
},
"json" :
{
"type":"json",
"dfsName" : "hdfs://{hadoop的namenode地址}:9000"
}
- 将drill目录拷贝到其他节点
- 将.bashrc拷贝到其他节点
- 在每一个节点启动drill: sudo drillbit.sh start
2.2测试
2.2.1测试drill集群是否启动成功
zkCli.sh -server {zookeeper地址}:2181
get /drill/drillbits1
cZxid = 0x100000003
ctime = Tue Dec 10 10:18:42 CST 2013
mZxid = 0x100000003
mtime = Tue Dec 10 10:18:42 CST 2013
pZxid = 0x10000001c
cversion = 12
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 4
这次测试用了numChildren = 4个节点
2.2.2测试QUERY
把数据放到HDFS上 hadoop fs -put sample-data /
链接集群 sqlline -u jdbc:drill:schema=parquet
SELECT _MAP['R_REGIONKEY'] as region_key, _MAP['R_NAME'] AS name, _MAP['R_COMMENT'] AS comment FROM “/sample-data/region.parquet";
SELECT count(distinct _MAP['N_REGIONKEY']) FROM “/sample-data/nation.parquet";
SELECT _MAP['N_REGIONKEY'] as regionKey, _MAP['N_NAME'] as name FROM “/sample-data/nation.parquet" WHERE cast(_MAP['N_NAME'] as varchar) < 'M';
2.3 关闭集群
2.3.1关闭drill集群
在每个节点上执行 sudo drillbit.sh stop
2.3.2关闭zookeeper
在每个节点上执行 sudo zkServer.sh stop
2.3.3在namenode上执行
sudo stop-all.sh