Reading and Writing ElasticSearch Data with Hive

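    This article shows how to read ElasticSearch data through Hive. Once an index is exposed as a Hive table, it can be queried like any other Hive table, which is a great convenience for developers. The component versions used here are Hive 2.3.3, HDFS 2.7.2, and ElasticSearch 6.3.0.
0. Existing data in ElasticSearch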
"_index": "contract_index_v2",
"_type": "contract_type_v2",
"_id": "1_217",
"_score": 1,
Fields: districtCode, cityCode, goodsTypeName, ...

 
 
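1. Reading ElasticSearch data from Hive (the steps below were carried out by the author on an Alibaba Cloud environment)
(1) Download the ElasticSearch jar package for Hive and unzip it
    Reference blog: https://www.cnblogs.com/Dhouse/p/7228557.html (in the author's own testing, only the second method described there worked)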
 
[root@emr-worker-2 software]#  wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-6.3.0.zip
[root@emr-worker-2 software]# pwd
/home/software
[root@emr-worker-2 software]# ls -l
total 7424
drwxr-xr-x 3 root root    4096 Jun 12  2018 elasticsearch-hadoop-6.3.0
-rw-r--r-- 1 root root 7596770 Jun 13  2018 elasticsearch-hadoop-6.3.0.zip
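
The transcript above shows the download and the unpacked directory, but not the unzip command itself. A minimal sketch of the whole step follows. The version, URL, and working directory are copied from the transcript; the helper name `fetch_and_unzip` is hypothetical, and the sketch assumes `wget` and `unzip` are installed.

```shell
#!/bin/sh
# Values copied from the transcript above.
ES_HADOOP_VERSION="6.3.0"
WORK_DIR="/home/software"
ZIP_NAME="elasticsearch-hadoop-${ES_HADOOP_VERSION}.zip"
ZIP_URL="https://artifacts.elastic.co/downloads/elasticsearch-hadoop/${ZIP_NAME}"
# Location of the connector jar once the archive is unpacked.
JAR_PATH="${WORK_DIR}/elasticsearch-hadoop-${ES_HADOOP_VERSION}/dist/elasticsearch-hadoop-${ES_HADOOP_VERSION}.jar"

# fetch_and_unzip is a hypothetical helper name, not part of any tool.
# The body runs in a subshell, so the caller's working directory is unchanged.
fetch_and_unzip() (
    cd "$WORK_DIR" || exit 1
    # Download only if the archive is not already there.
    [ -f "$ZIP_NAME" ] || wget "$ZIP_URL" || exit 1
    # -o: overwrite existing files without prompting.
    unzip -o "$ZIP_NAME"
)

echo "After fetch_and_unzip, the jar is expected at: $JAR_PATH"
```

The script only defines the helper; call `fetch_and_unzip` on the target machine to perform the download and extraction.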
 
(2) Create an auxlib folder under ${HIVE_HOME}, then put the custom jar file into it.
[root@emr-worker-2 software]# whereis hive
hive: /opt/apps/ecm/service/hive/2.3.3-1.0.1/package/apache-hive-2.3.3-1.0.1-bin/bin/hive
[root@emr-worker-2 software]# scp -r /home/software/elasticsearch-hadoop-6.3.0/dist/elasticsearch-hadoop-6.3.0.jar /opt/apps/ecm/service/hive/2.3.3-1.0.1/package/apache-hive-2.3.3-1.0.1-bin/auxlib/
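
The transcript copies the jar but does not show the auxlib folder being created. The sketch below combines both steps. `install_aux_jar` is a hypothetical helper name, and the sketch assumes the jar and the Hive home directory are on the same machine, so a plain `cp` is enough.

```shell
#!/bin/sh
# install_aux_jar is a hypothetical helper name used only in this sketch.
# Usage: install_aux_jar <path-to-jar> <hive-home>
install_aux_jar() {
    jar="$1"
    hive_home="$2"
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    # The auxlib folder may not exist yet, so create it first.
    mkdir -p "${hive_home}/auxlib" || return 1
    cp "$jar" "${hive_home}/auxlib/"
}
```

With the paths from this article, the call would be `install_aux_jar /home/software/elasticsearch-hadoop-6.3.0/dist/elasticsearch-hadoop-6.3.0.jar /opt/apps/ecm/service/hive/2.3.3-1.0.1/package/apache-hive-2.3.3-1.0.1-bin`. Hive's launcher normally picks up jars in auxlib only at startup, so an already running Hive session probably needs to be restarted before the storage handler class becomes visible.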

2. Creating an external table in Hive and loading the data


CREATE EXTERNAL TABLE stg_contract_index_v2(
  id string,
  warehouseCode string,
  warehouseTypeCode string,
  detailAddress string,
  cityCode string,
  cityName string,
  provinceCode string,
  provinceName string,
  zone string,
  zoneName string,
  goodsTypeName string,
  gStoreType TINYINT,
  projectStatusName string,
  statusName string,
  businessModeType TINYINT,
  deleted TINYINT,
  status TINYINT
)
  STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
  TBLPROPERTIES(
  'es.resource'='contract_index_v2/contract_type_v2',
  'es.nodes'='172.16.120.174,172.16.120.159,172.16.120.160',
  'es.port'='9200',
  'es.nodes.wan.only'='true',
  'es.read.metadata' = 'true',
  'es.mapping.names' = '
id:id,
warehouseCode:warehouseCode,
warehouseTypeCode:warehouseTypeCode,
detailAddress:detailAddress,
cityCode:cityCode,
cityName:cityName,
provinceCode:provinceCode,
provinceName:provinceName,
zone:zone,
zoneName:zoneName,
goodsTypeName:goodsTypeName,
gStoreType:gStoreType,
projectStatusName:projectStatusName,
statusName:statusName,
businessModeType:businessModeType,
deleted:deleted,
status:status');
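
Once the external table exists, it can be queried like any other Hive table. The sketch below writes a few example statements to a file for `hive -f`. The column names come from the table definition above; the file name `read_contracts.hql` and the table name `contract_snapshot` are hypothetical, and the last statement assumes ORC is an acceptable storage format for the local copy.

```shell
#!/bin/sh
# Write the example statements to a file; the file name is arbitrary.
cat > read_contracts.hql <<'EOF'
-- Peek at a few documents through the external table.
SELECT id, cityName, provinceName, goodsTypeName
FROM stg_contract_index_v2
LIMIT 10;

-- Count documents per city. Hive does the grouping on rows read from ElasticSearch.
SELECT cityCode, count(*) AS contract_count
FROM stg_contract_index_v2
GROUP BY cityCode;

-- Snapshot the index into a native Hive table.
-- contract_snapshot is a hypothetical table name.
CREATE TABLE contract_snapshot STORED AS ORC AS
SELECT * FROM stg_contract_index_v2;
EOF

# On a node where Hive is installed, run the file with:
#   hive -f read_contracts.hql
```

Copying into a native table is worth considering when the same data is queried repeatedly, since every query against the external table reads from ElasticSearch again.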

How the data appears in Hive:

 
[Image 2: screenshot of the query result in Hive]
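
The steps above only cover reading. Writing goes through the same storage handler: inserting rows into an ElasticSearch-backed external table indexes them as documents. The following is a rough sketch, not something tested in the environment described here. The table name `contract_city_export`, the target resource `contract_city_export/doc`, and the file name are all hypothetical; the node addresses are reused from the table definition above.

```shell
#!/bin/sh
# Write the example statements to a file; the file name is arbitrary.
cat > write_contracts.hql <<'EOF'
-- contract_city_export and its target index are hypothetical names.
CREATE EXTERNAL TABLE contract_city_export(
  id string,
  cityCode string,
  cityName string
)
  STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
  TBLPROPERTIES(
  'es.resource'='contract_city_export/doc',
  'es.nodes'='172.16.120.174,172.16.120.159,172.16.120.160',
  'es.port'='9200',
  'es.nodes.wan.only'='true',
  'es.mapping.id'='id',
  'es.mapping.names'='cityCode:cityCode,cityName:cityName');

-- Each selected row is indexed as one document in the target index.
INSERT OVERWRITE TABLE contract_city_export
SELECT id, cityCode, cityName
FROM stg_contract_index_v2;
EOF

# On a node where Hive is installed, run the file with:
#   hive -f write_contracts.hql
```

Setting `es.mapping.id` tells the connector to use the `id` column as the document id, so rerunning the insert should overwrite the same documents instead of adding duplicates. The `es.mapping.names` entries keep the camelCase field names on the ElasticSearch side, since Hive treats column names as case-insensitive.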
 
3. References:
     https://resources.zaloni.com/blog/hive-basics-elasticsearch-integration
     https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
