Flume receives data and writes it into HBase, generating a specified rowkey and column

Interface source article: https://blogs.apache.org/flume/entry/streaming_data_into_apache_hbase
Reference blog: https://blog.csdn.net/m0_37739193/article/details/72868456

Goal: have Flume take the data out of the event and use it as the HBase rowkey.
Flume receives the data and passes it straight into HBase, with the requirement that the data not be staged through intermediate files.
Flume uses an HTTP source as the entry point and an HBase sink to load the data; a file channel persists Flume's buffered data to local disk (so that if the cluster fails, the data is backed up locally).
The incoming data format is http:10.0.0.1_{asdasd}, i.e. url_data.

The result stored in HBase:
rowkey: currentTime_url
value: the data
In other words, the incoming data has to be split: the url becomes one part of the rowkey, the current time the other part, and the data itself is stored as the value.
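
For illustration, the following standalone sketch mirrors that transformation outside of Flume (the class name DemoSplit and the sample body are made up for this example; the real logic is in the serializer source further down):

package com.hbase;

import java.text.SimpleDateFormat;
import java.util.Date;

public class DemoSplit {
    public static void main(String[] args) {
        String body = "http:10.0.0.1_{asdasd}";   // sample incoming body, format url_data
        String url = body.split("_")[0];          // "http:10.0.0.1"
        String value = body.split("_")[1];        // "{asdasd}"
        // rowkey = currentTime_url; the value goes into the cell
        String rowkey = new SimpleDateFormat("yyyyMMddHHmmss").format(new Date()) + "_" + url;
        System.out.println(rowkey + " -> " + value);  // e.g. 20181108104034_http:10.0.0.1 -> {asdasd}
    }
}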

Steps:
1. Implement the Flume interface that allows the rowkey to be specified (the HbaseEventSerializer interface), then package it into a jar.
The Java source is given below.

2. Place the built jar into Flume's /home/hadoop/apache-flume-1.6.0-cdh5.5.2-bin/lib directory.
3. Flume configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = http
a1.sources.r1.port = 44444
a1.sources.r1.bind = 10.0.0.183

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.channel = c1
a1.sinks.k1.table = httpdata
a1.sinks.k1.columnFamily = a
a1.sinks.k1.serializer = com.hbase.Rowkey

# Use a file channel which persists events to local disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/x/oyzm_test/flu-hbase/checkpoint/
a1.channels.c1.useDualCheckpoints = false
a1.channels.c1.dataDirs = /home/x/oyzm_test/flu-hbase/flumedir/
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 
a1.sinks.k1.channel = c1

4. Create the table in HBase: create 'httpdata','a'

5. Flume startup command
flume-ng agent -c . -f /mysoftware/flume-1.7.0/conf/hbase_simple.conf -n a1 -Dflume.root.logger=INFO,console

6. Command to send data to Flume
curl -X POST -d '[{"body":"http:10.0.0.1_{asdasd}"}]' http://10.0.0.183:44444

Resulting data in HBase:
20181108104034_http:10.0.0.183 column=a:data, timestamp=1541644834926, value={asdasd}

Java source:

package com.hbase;

import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.apache.flume.Context;  
import org.apache.flume.Event;
import org.apache.flume.conf.ComponentConfiguration;  
import org.apache.flume.sink.hbase.HbaseEventSerializer;  
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;

public class Rowkey implements HbaseEventSerializer {
    // Column family (default value; replaced by the configured family in initialize())
    private byte[] colFam = "cf".getBytes();
    // The event currently being serialized
    private Event currentEvent;

    @Override
    public void initialize(Event event, byte[] colFam) {
        this.currentEvent = event;
        this.colFam = colFam;
    }

    @Override
    public void configure(Context context) {}

    @Override
    public void configure(ComponentConfiguration conf) {}

    // Build the Put: specify the rowkey, column qualifier and value
    @Override
    public List<Row> getActions() {
        // The event body has the format: url_value
        String eventStr = new String(currentEvent.getBody());
        String url = eventStr.split("_")[0];
        String value = eventStr.split("_")[1];

        // Current system time, formatted as yyyyMMddHHmmss
        Date d = new Date();
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmss");

        // rowkey = currentTime_url
        byte[] currentRowKey = (df.format(d) + "_" + url).getBytes();

        // HBase Put: column family, column qualifier "data", value
        // e.g. column=a:data, value={asdasd}
        List<Row> puts = new ArrayList<Row>();
        Put putReq = new Put(currentRowKey);
        putReq.addColumn(colFam, "data".getBytes(), value.getBytes());
        puts.add(putReq);
        return puts;
    }

    @Override
    public List<Increment> getIncrements() {
        // No counter increments are needed for this sink
        return new ArrayList<Increment>();
    }

    // Release references
    @Override
    public void close() {
        colFam = null;
        currentEvent = null;
    }
}
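
To sanity-check the serializer outside of a running Flume agent, a small local driver like the one below could be used (a hypothetical check: the class name RowkeyLocalTest is made up, and it assumes flume-ng-core and hbase-client are on the classpath):

package com.hbase;

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;

public class RowkeyLocalTest {
    public static void main(String[] args) {
        // Build an event with the same body format the HTTP source would deliver
        Event event = EventBuilder.withBody("http:10.0.0.1_{asdasd}".getBytes(StandardCharsets.UTF_8));
        Rowkey serializer = new Rowkey();
        // "a" is the column family configured for the sink
        serializer.initialize(event, "a".getBytes(StandardCharsets.UTF_8));
        for (Row action : serializer.getActions()) {
            Put put = (Put) action;
            // Prints the generated rowkey, e.g. 20181108104034_http:10.0.0.1
            System.out.println(new String(put.getRow(), StandardCharsets.UTF_8));
        }
        serializer.close();
    }
}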

POM file (dependencies):

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flume.flume-ng-sinks</groupId>
      <artifactId>flume-ng-hbase-sink</artifactId>
      <version>1.7.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>1.2.4</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.8</version>
      <scope>system</scope>
      <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
    </dependency>
  </dependencies>
