Flume + HBase setup problems explained

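I've recently been building a logging system and picked a few components for it. I ran into problems while setting it up, so I'm writing them down here.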

Connecting flume-ng to HBase
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
# HBase table name
a1.sinks.k1.table = Router
# Column family in HBase
a1.sinks.k1.columnFamily = log
# HBase column names, comma-separated
a1.sinks.k1.serializer.payloadColumn = serviceTime,browerOS,clientTime,screenHeight,screenWidth,url,userAgent,mobileDevice,gwId,mac
# Serializer class that turns events into HBase writes
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.BaimiAsyncHbaseEventSerializer
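Flume reads the agent configuration with java.util.Properties, which only treats # (or !) as a comment when it is the first non-blank character of a line. A # written after a value on the same line is kept as part of the value, so a line such as "a1.sinks.k1.table = Router #table name" would make the table name "Router #table name". The helper below is a hypothetical sketch (InlineCommentCheck is not part of Flume): it loads a config snippet the same way and lists keys whose values look like they swallowed an inline comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical checker: java.util.Properties only recognizes '#' or '!' as a
// comment marker at the start of a line, so "value # note" keeps the note.
final class InlineCommentCheck {

    /** Loads properties text and returns the keys whose values contain " #". */
    static List<String> keysWithInlineComments(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new IllegalStateException(e);
        }
        List<String> suspicious = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            if (props.getProperty(key).contains(" #")) {
                suspicious.add(key);
            }
        }
        suspicious.sort(null); // natural (alphabetical) order for stable output
        return suspicious;
    }
}
```

Running it over the agent's config file contents before starting Flume is a cheap way to catch comments that were meant to be separate lines.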


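A few properties deserve explanation. a1.sinks.k1.serializer.payloadColumn lists all the column names. a1.sinks.k1.serializer sets the Flume serializer class. The BaimiAsyncHbaseEventSerializer class reads the payloadColumn value and splits it on commas to obtain the column names.
A custom serializer is needed because flume-ng generally only supports simple writes into HBase out of the box.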
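BaimiAsyncHbaseEventSerializer is the author's own class and its source is not shown here. The sketch below only illustrates the column-mapping step described above, under two assumptions: each event body is a single comma-separated line, and its fields appear in the same order as payloadColumn. PayloadColumnMapper is a hypothetical name. A real serializer would implement Flume's AsyncHbaseEventSerializer interface and turn the resulting map into asynchbase PutRequest objects in getActions(); that part is left out so the example compiles without Flume on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the parsing a custom serializer might do.
// It does NOT implement Flume's AsyncHbaseEventSerializer interface.
final class PayloadColumnMapper {
    private final String[] columns;

    /** @param payloadColumn value of serializer.payloadColumn, e.g. "url,mac" */
    PayloadColumnMapper(String payloadColumn) {
        String[] parts = payloadColumn.split(",");
        columns = new String[parts.length];
        for (int i = 0; i < parts.length; i++) {
            columns[i] = parts[i].trim(); // tolerate spaces around commas
        }
    }

    /**
     * Assumes the body is one comma-separated line whose fields follow the
     * payloadColumn order. Extra fields are ignored; missing ones are skipped.
     */
    Map<String, String> map(byte[] eventBody) {
        String[] values = new String(eventBody, StandardCharsets.UTF_8).split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length && i < values.length; i++) {
            row.put(columns[i], values[i]);
        }
        return row;
    }
}
```

For example, with a made-up body "1414126532,/index,00:11:22", new PayloadColumnMapper("serviceTime,url,mac").map(body) yields {serviceTime=1414126532, url=/index, mac=00:11:22}; each entry would become one cell in the configured column family.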

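Because of dependency version conflicts, you need to replace the protobuf jar in Flume's lib folder with version 2.5.0 from Hadoop 2.2.0, and also replace the guava jar in Flume's lib folder with the one from Hadoop 2.2.0, deleting the original jars. The change takes effect on the next start. (PS: This is what you'll find online; with the latest versions none of this is actually necessary.)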
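Before swapping jars, it helps to see whether Flume's lib directory already holds more than one version of the same library, which is the situation the replacement is meant to avoid. The sketch below uses a hypothetical class, JarVersionConflicts. It groups jar file names by artifact, assuming the usual artifact-version.jar naming with a version that starts with a digit, and reports artifacts present in several versions. It only looks at file names, not jar contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: group jar names by artifact and report artifacts that
// appear in more than one version (e.g. two guava jars in one lib directory).
final class JarVersionConflicts {
    // "<artifact>-<version>.jar"; the lazy group stops at the first "-<digit>".
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d[^/]*)\\.jar$");

    static Map<String, List<String>> conflicts(List<String> jarNames) {
        Map<String, TreeSet<String>> versions = new TreeMap<>();
        for (String name : jarNames) {
            Matcher m = JAR.matcher(name);
            if (m.matches()) { // non-jar files and unversioned jars are ignored
                versions.computeIfAbsent(m.group(1), k -> new TreeSet<>()).add(m.group(2));
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        versions.forEach((artifact, vs) -> {
            if (vs.size() > 1) {
                result.put(artifact, new ArrayList<>(vs));
            }
        });
        return result;
    }
}
```

Pass it Arrays.asList(new File(flumeHome, "lib").list()), where flumeHome is your Flume install directory. For two hypothetical files guava-10.0.1.jar and guava-11.0.2.jar it reports guava=[10.0.1, 11.0.2].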

While setting up HBase, the following problem occurred:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
        at org.apache.flume.sink.hbase.HBaseSink.<init>(HBaseSink.java:116)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:43)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:415)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 17 more


Speechless. You have to copy HBase's hbase-common-0.98.7-hadoop2.jar into Flume's lib directory.
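Instead of guessing which HBase jar supplies a missing class, you can search the jars in HBase's lib directory for the matching .class entry. The sketch below uses a hypothetical class, JarClassLocator. It converts the class name from the NoClassDefFoundError into a jar entry path and checks every jar in a directory with java.util.jar.JarFile. Nested class names keep their $ separators, so a name like MasterProtos$MasterService$BlockingInterface can be passed as-is.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: find which jar(s) in a directory contain a class,
// e.g. to decide which file from HBase's lib directory to copy into flume/lib.
final class JarClassLocator {

    static List<String> jarsContaining(File libDir, String className) throws IOException {
        // "org.apache.hadoop.hbase.HBaseConfiguration"
        //   -> "org/apache/hadoop/hbase/HBaseConfiguration.class"
        String entry = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        File[] files = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        for (File f : files) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getJarEntry(entry) != null) {
                    hits.add(f.getName());
                }
            }
        }
        hits.sort(null);
        return hits;
    }
}
```

For example, jarsContaining(new File("/path/to/hbase/lib"), "org.apache.hadoop.hbase.HBaseConfiguration"), where the path is a placeholder for your HBase installation's lib directory.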

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTable
        at org.apache.flume.sink.hbase.HBaseSink$1.run(HBaseSink.java:148)
        at org.apache.flume.sink.hbase.HBaseSink$1.run(HBaseSink.java:145)
        at org.apache.flume.sink.hbase.HBaseSink.runPrivileged(HBaseSink.java:427)
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:145)
        at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
        at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.HTable
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 14 more
Damn, this again. Same fix: copy hbase-client-0.98.7-hadoop2.jar.

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/protobuf/generated/MasterProtos$MasterService$BlockingInterface
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:399)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:388)
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:269)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:198)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:160)
        at org.apache.flume.sink.hbase.HBaseSink$1.run(HBaseSink.java:148)
        at org.apache.flume.sink.hbase.HBaseSink$1.run(HBaseSink.java:145)
        at org.apache.flume.sink.hbase.HBaseSink.runPrivileged(HBaseSink.java:427)
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:145)
        at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
        at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingInterface
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 21 more
Still a missing jar: hbase-protocol-0.98.7-hadoop2.jar.

org.apache.flume.FlumeException: Could not load table, test from HBase
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:159)
        at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
        at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:411)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:388)
        at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:269)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:198)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:160)
        at org.apache.flume.sink.hbase.HBaseSink$1.run(HBaseSink.java:148)
        at org.apache.flume.sink.hbase.HBaseSink$1.run(HBaseSink.java:145)
        at org.apache.flume.sink.hbase.HBaseSink.runPrivileged(HBaseSink.java:427)
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:145)
        ... 10 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:409)
        ... 18 more
Caused by: java.lang.NoClassDefFoundError: org/cloudera/htrace/Trace
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:218)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479)
        at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
        at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:837)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:640)
        ... 23 more

Still a missing jar: htrace-core-2.04.jar.
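The four jars above were found one error at a time. A quicker check is to try loading one class from each jar using the same classpath the agent uses, and list everything that is missing in a single pass. The class-to-jar pairs below come from the errors in this post (HBase 0.98.7). HBaseClasspathProbe is a hypothetical name, and the check only means something when run in a JVM started with Flume's lib directory on the classpath.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical probe: report which jars are missing by trying to load one
// marker class from each, without initializing the classes.
final class HBaseClasspathProbe {

    // Marker class -> jar that supplied it, as seen in the errors of this post.
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("org.apache.hadoop.hbase.HBaseConfiguration", "hbase-common-0.98.7-hadoop2.jar");
        REQUIRED.put("org.apache.hadoop.hbase.client.HTable", "hbase-client-0.98.7-hadoop2.jar");
        REQUIRED.put("org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingInterface",
                "hbase-protocol-0.98.7-hadoop2.jar");
        REQUIRED.put("org.cloudera.htrace.Trace", "htrace-core-2.04.jar");
    }

    /** Returns the jars whose marker class cannot be loaded. */
    static List<String> missingJars(Map<String, String> classToJar) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = HBaseClasspathProbe.class.getClassLoader();
        for (Map.Entry<String, String> e : classToJar.entrySet()) {
            try {
                // initialize=false: check presence only, skip static initializers
                Class.forName(e.getKey(), false, loader);
            } catch (ClassNotFoundException | LinkageError ex) {
                missing.add(e.getValue());
            }
        }
        return missing;
    }
}
```

With all four jars copied into Flume's lib directory, missingJars(REQUIRED) should return an empty list.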

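This time it started without errors, but the following message appeared, possibly because my ZooKeeper was not configured correctly: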
[ERROR - org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.checkIfBaseNodeAvailable(HConnectionManager.java:858)] The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.

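This means the port is already taken. Fine, I'll move the other one to a different port. It turned out that really didn't work: flume-ng itself seemed to offer no way to configure the port, so the only option left appeared to be modifying the source and recompiling.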


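I read a lot of articles. None of them explained the actual cause, and they offered all sorts of bizarre workarounds. In the end I went to the official documentation, found the zookeeperQuorum parameter, and then read the source code, which contains the following: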
    String zkQuorum = context.getString("zookeeperQuorum");
    Integer port = null;

    if ((zkQuorum != null) && (!zkQuorum.isEmpty())) {
      StringBuilder zkBuilder = new StringBuilder();
      logger.info(new StringBuilder().append("Using ZK Quorum: ").append(zkQuorum).toString());
      // Entries are comma-separated, each in host:port form
      String[] zkHosts = zkQuorum.split(",");
      int length = zkHosts.length;
      for (int i = 0; i < length; i++) {
        String[] zkHostAndPort = zkHosts[i].split(":");
        // Only the host part goes into the rebuilt quorum string
        zkBuilder.append(zkHostAndPort[0].trim());
        if (i != length - 1)
          zkBuilder.append(",");
        else {
          zkQuorum = zkBuilder.toString();
        }
        // Note: for an entry without ":port", split() returns a one-element
        // array, so zkHostAndPort[1] throws ArrayIndexOutOfBoundsException
        // before this null check can fire.
        if (zkHostAndPort[1] == null) {
          throw new FlumeException("Expected client port for the ZK node!");
        }
        // The first port seen becomes the client port; all others must match it
        if (port == null)
          port = Integer.valueOf(Integer.parseInt(zkHostAndPort[1].trim()));
        else if (!port.equals(Integer.valueOf(Integer.parseInt(zkHostAndPort[1].trim())))) {
          throw new FlumeException("All Zookeeper nodes in the quorum must use the same client port.");
        }
      }

      if (port == null) {
        port = Integer.valueOf(2181);
      }
      // ... (rest of the method omitted)
    }


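Ha, that makes it simple: the sink accepts a zookeeperQuorum setting made of comma-separated host:port entries that all share one client port, so adding it to the sink configuration solves the problem.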
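Because the sink strips the ports and insists on a single shared client port, a malformed zookeeperQuorum value only shows up when the agent starts. The hypothetical ZkQuorumCheck below (not part of Flume) applies the same rules as the HBaseSink code quoted above to a quorum string, so the value can be checked beforehand. There is one deliberate difference. It checks the array length instead of comparing zkHostAndPort[1] to null, so an entry without a port gets a clear error message instead of an ArrayIndexOutOfBoundsException.

```java
// Hypothetical pre-flight check mirroring HBaseSink's zookeeperQuorum rules:
// every entry must be host:port, and all entries must use the same port.
final class ZkQuorumCheck {

    static final class Result {
        final String hosts; // hosts with ports stripped, e.g. "zk1,zk2"
        final int port;     // the single client port shared by all nodes
        Result(String hosts, int port) { this.hosts = hosts; this.port = port; }
    }

    static Result parse(String zkQuorum) {
        StringBuilder hosts = new StringBuilder();
        Integer port = null;
        for (String node : zkQuorum.split(",")) {
            String[] hostAndPort = node.trim().split(":");
            // split() yields a one-element array when ":port" is missing
            if (hostAndPort.length < 2) {
                throw new IllegalArgumentException("Expected client port for ZK node: " + node);
            }
            int p = Integer.parseInt(hostAndPort[1].trim());
            if (port == null) {
                port = p;
            } else if (port != p) {
                throw new IllegalArgumentException(
                        "All ZooKeeper nodes in the quorum must use the same client port.");
            }
            if (hosts.length() > 0) {
                hosts.append(',');
            }
            hosts.append(hostAndPort[0].trim());
        }
        return new Result(hosts.toString(), port);
    }
}
```

For example, parse("zk1:2181,zk2:2181") gives hosts "zk1,zk2" and port 2181, while "zk1:2181,zk2:2182" is rejected. zk1 and zk2 are placeholder host names.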

java.net.ConnectException: 拒绝连接 (Connection refused)
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
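Connection refused again. Come on, isn't ZooKeeper supposed to be built in? Digging further, it seems flume-ng actually still needs an external ZooKeeper: when connecting to HBase, both sides must use the same ZooKeeper, so the port configured for Flume has to match the one HBase uses.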
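A "Connection refused" from the ZooKeeper client means nothing accepted a TCP connection on the host and port it tried. The hypothetical ZkPortCheck below performs the same plain TCP connect. It is a quick way to confirm which host:port actually has something listening before comparing it with the client port HBase uses (hbase.zookeeper.property.clientPort, 2181 by default). It does not speak the ZooKeeper protocol; it only tests whether the port accepts connections.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic: can we open a TCP connection to host:port at all?
final class ZkPortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // ConnectException (refused), SocketTimeoutException, unknown host, ...
            return false;
        }
    }
}
```

For example, isListening("zk1", 2181, 3000) returning false, where zk1 is a placeholder for one of the hosts in zookeeperQuorum, reproduces the error above without starting the agent.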

Once that was fixed, the following problem appeared:
org.apache.flume.FlumeException: Error getting column family from HBase.Please verify that the table test and Column Family, cf exists in HBase, and the current user has permissions to access that table.
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:176)
        at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
        at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Table test has no such column family cf
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:169)
        ... 10 more
2014-10-24 12:55:32,062 (lifecycleSupervisor-1-5) [ERROR - org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:253)] Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@452d0297 counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.IllegalArgumentException: Please call stop before calling start on an old instance.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
        at org.apache.flume.sink.hbase.HBaseSink.start(HBaseSink.java:132)
        at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:46)
        at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

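What a trap. It is clearly a wrong HBase column family: table test has no column family cf. After reading the HBase documentation carefully I got the configuration right, and the agent started successfully. The repeated "Please call stop before calling start on an old instance" errors appear to be a knock-on effect: the lifecycle supervisor keeps retrying start() on the sink whose first start failed partway through, so they go away once the column family is correct.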
