Flume 1.9.0: sinking Kafka data to Hive


  • Create the Hive table
    • Hive table requirements
    • Hive configuration requirements
  • Configure the Flume conf file
  • Add dependency jars to Flume's lib directory
  • Run the startup command from Flume's bin directory
  • Finally, query the Hive table to see the data

Create the Hive table

CREATE TABLE IF NOT EXISTS user (userid string, sex string)
PARTITIONED BY (dt string)
CLUSTERED BY (sex) INTO 2 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

Hive table requirements

1. The table must be bucketed (CLUSTERED BY ... INTO n BUCKETS).
2. The storage format must be ORC.
3. Transactions must be enabled ('transactional'='true').
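
A quick way to check that an existing table satisfies all three requirements is to inspect its metadata (standard Hive command; the items to look for are noted in the comment):

-- Look for: Num Buckets: 2, an ORC InputFormat/OutputFormat, and transactional=true
hive> desc formatted user;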

Hive configuration requirements

hive> set hive.support.concurrency=true;
hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
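
Note that set only affects the current session. When Flume writes through Hive streaming, these transaction settings are usually made permanent in hive-site.xml, typically along with the compactor settings that ACID tables rely on. A minimal sketch follows; the compactor values are assumptions to adjust for your cluster:

<!-- hive-site.xml: make the transaction settings permanent for all sessions -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<!-- Assumed: compaction enabled so ACID delta files get merged -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>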

Configure the Flume conf file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Kafka source
# Note: topic / groupId / zookeeperConnect are deprecated names in Flume 1.9.0;
# the current properties are kafka.topics, kafka.consumer.group.id and
# kafka.bootstrap.servers (which points at the Kafka brokers rather than ZooKeeper).
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = xxx:2181
a1.sources.r1.topic = user
a1.sources.r1.groupId = flume
a1.sources.r1.kafka.consumer.timeout.ms = 1000
#a1.sources.r1.kafka.consumer.auto.offset.reset = earliest

# Hive sink: describe the sink
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://xxxx:9083
a1.sinks.k1.hive.database = default
a1.sinks.k1.hive.table = user
a1.sinks.k1.hive.partition = %y-%m-%d
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = ","
a1.sinks.k1.serializer.serdeSeparator = ','
a1.sinks.k1.serializer.fieldnames =userid,sex

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 15000
a1.channels.c1.transactionCapacity = 15000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Add dependency jars to Flume's lib directory

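The Flume Hive sink depends on the Hive/HCatalog streaming classes, which do not ship with Flume, so the matching jars from the Hive installation have to be copied into Flume's lib directory. A rough sketch, assuming a standard Hive layout; jar names and versions vary with your Hive release:

# Copy the HCatalog streaming jars and the Hive client jars onto Flume's classpath
cp $HIVE_HOME/hcatalog/share/hcatalog/hive-hcatalog-streaming-*.jar $FLUME_HOME/lib/
cp $HIVE_HOME/hcatalog/share/hcatalog/hive-hcatalog-core-*.jar $FLUME_HOME/lib/
cp $HIVE_HOME/lib/hive-metastore-*.jar $HIVE_HOME/lib/hive-exec-*.jar $FLUME_HOME/lib/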

Run the startup command from Flume's bin directory

flume-ng agent --conf ../conf --conf-file ../conf/hive.conf --name a1 -Dflume.root.logger=INFO,console

Finally, query the Hive table to see the data
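
For example, a simple check against the table created above:

hive> show partitions user;
hive> select * from user limit 10;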
