本来是想既然可以通过es-hadoop插件用hive查询es的数据,为啥不能用impala来做分析呢;
结果是hive查es成功了,impala查询不了,悲剧,但还是记录一下过程中遇到的问题,特别是日期格式转换那里搞了好久。
安装:
下载地址:https://www.elastic.co/cn/downloads/hadoop
找对应或高于自己es的版本即可
下载下来其实是一堆jar包
找了个简单的es索引,就两列,detectTime,flow
"mappings": {
"flow_message_": {
"properties": {
"detectTime": {
"format": "YYYY-MM-dd HH:mm:ss",
"type": "date"
},
"flow": {
"type": "integer"
}
}
}
},
进入hive命令行:
1.添加jar
hive> add jar file:///home/elasticsearch/es-hadoop/elasticsearch-hadoop-5.5.0.jar;
2.创建表
CREATE EXTERNAL TABLE flow_message_xxx (
id string,
detect_time timestamp,
flow int
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = '192.168.x.xx:9500,192.168.x.xx:9500',
'es.index.auto.create' = 'false',
'es.resource' = 'flow_message_201708/flow_message_',
'es.read.metadata' = 'true',
'es.mapping.names' = 'id:_metadata._id, detect_time:detectTime, flow:flow');
//es.nodes节点
//es.resource:es index/type
//es.mapping.names:hive字段和es字段对应
表创建成功
然而悲剧的是查询失败:
Failed with exception java.io.IOException:
org.elasticsearch.hadoop.rest.EsHadoopParsingException:
Cannot parse value [2017-08-24 14:15:52] for field [detectTime]
应该是es的这种时间格式无法正确解析
导入es-hadoop源码看了下
发现HiveValueReader中使用的是下面这个方法解析的时间字符串
@Override
protected Object parseDate(String value, boolean richDate) {
return (richDate ? new TimestampWritable(new Timestamp(DatatypeConverter.parseDateTime(value).getTimeInMillis())) : parseString(value));
}
DatatypeConverter.parseDateTime(value)
并不是我们自定的格式YYYY-MM-dd HH:mm:ss
这样我们可以在Hive中指定时间格式,并自定义自己的Reader
CREATE EXTERNAL TABLE flow_message_xxx (
id string,
detect_time timestamp,
flow int
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes' = 'es.nodes' = '192.168.x.xx:9500,192.168.x.xx:9500',
'es.index.auto.create' = 'false',
'es.resource' = 'flow_message_201708/flow_message_',
'es.read.metadata' = 'true',
'es.date.format' = 'yyyy-MM-dd HH:mm:ss',
'es.ser.reader.value.class' = 'com.eshadoop.EsValueReader',
'es.mapping.names' = 'id:_metadata._id,detect_time:detectTime');
//es.date.format:格式
//es.ser.reader.value.class:类
自定义的EsValueReader:
package com.eshadoop;
/**
* @function
* @author meyao
* @create 2018-04-18 16:21
* @version v1.0
**/
import org.apache.hadoop.hive.serde2.io.TimestampWritable;
import org.elasticsearch.hadoop.cfg.Settings;
import org.elasticsearch.hadoop.hive.HiveValueReader;
import javax.xml.bind.DatatypeConverter;
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
public class EsValueReader extends HiveValueReader {
private String dateFormat;
private static SimpleDateFormat format_ymd_hms;
@Override
public void setSettings(Settings settings) {
super.setSettings(settings);
//获取我们自定的时间格式
dateFormat = settings.getProperty("es.date.format");
format_ymd_hms = new SimpleDateFormat(dateFormat);
}
@Override
protected Object parseDate(String value, boolean richDate) {
Date d = null;
if (!"".equals(dateFormat) && dateFormat != null) {
try {
d = format_ymd_hms.parse(value);
} catch (ParseException e) {
e.printStackTrace();
}
} else {
d = DatatypeConverter.parseDateTime(value).getTime();
}
return (richDate ? new TimestampWritable(new Timestamp(d.getTime())) : parseString(value));
}
}