My manager handed me a requirement: import the data in HBase into Elasticsearch via MapReduce, redesigning the storage structure in ES along the way. Being a newbie touching ES for the first time, I ran into the pitfalls below, which I'm summarizing here.
First, let me recommend two blog posts. By following them, plus hundreds of experiments of my own, I finally got the job done. Many thanks to both authors for sharing:
1.https://blog.csdn.net/fxsdbt520/article/details/53893421?utm_source=itdadao&utm_medium=referral
2.https://blog.csdn.net/u014231523/article/details/52816218
Here are the pitfalls I personally hit:
Following blog post 1, I got the basic project set up, but running it threw the following error:
Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.&lt;init&gt;()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:229)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:202)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.&lt;init&gt;(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324)
at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:88)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.&lt;init&gt;(RegionSizeCalculator.java:81)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.eminem.hadoop.ESInitCall.run(ESInitCall.java:51)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.eminem.hadoop.ESInitCall.main(ESInitCall.java:75)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.&lt;init&gt;()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1185)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1152)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
... 26 more
This error is caused by a Guava conflict between HBase and ES: HBase uses guava-12.0.1 while ES uses guava-18. After half a day of searching for solutions online, what finally worked was creating a separate Maven project that depends only on ES, keeping HBase and ES apart, as follows.
This is an empty project containing nothing but a pom.xml; its configuration is:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>my.elasticsearch</groupId>
  <artifactId>es-shaded</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>es-shaded</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <elasticsearch.version>2.3.4</elasticsearch.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch</artifactId>
      <version>${elasticsearch.version}</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.3.2</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.1</version>
        <configuration>
          <createDependencyReducedPom>false</createDependencyReducedPom>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <relocations>
                <relocation>
                  <pattern>com.google.guava</pattern>
                  <shadedPattern>my.elasticsearch.guava</shadedPattern>
                </relocation>
                <relocation>
                  <pattern>org.joda</pattern>
                  <shadedPattern>my.elasticsearch.joda</shadedPattern>
                </relocation>
                <relocation>
                  <pattern>com.google.common</pattern>
                  <shadedPattern>my.elasticsearch.common</shadedPattern>
                </relocation>
                <relocation>
                  <pattern>org.elasticsearch</pattern>
                  <shadedPattern>my.elasticsearch</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
Once this is configured, run Maven clean, update, and package; a jar will be generated under the project's build output path. Then close this already-built project and reference the jar from the main project's pom.xml.
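One way to reference the shaded jar from the main project (assuming you ran `mvn clean install` in the es-shaded project so the jar landed in the local repository) is an ordinary dependency on the coordinates defined above; this is a sketch, not the exact snippet from the original post:

```xml
<!-- In the main project's pom.xml; assumes `mvn clean install`
     was run in es-shaded so the jar is in the local Maven repo -->
<dependency>
  <groupId>my.elasticsearch</groupId>
  <artifactId>es-shaded</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
```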
Note:
The relocation section of the configuration is extremely important: in the main project you must change the package names of the ES classes you import to match the relocated packages. For example:
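Following the relocation rules above (org.elasticsearch → my.elasticsearch), an import in the main project changes roughly like this; the TransportClient class is just an illustrative example of an ES 2.x client class:

```java
// Before shading (direct ES dependency):
// import org.elasticsearch.client.transport.TransportClient;

// After shading, the same class lives under the relocated package:
import my.elasticsearch.client.transport.TransportClient;
```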
With the basic problem solved, the program ran fine in Eclipse, but on the cluster it kept failing with
Error: FAIL_ON_SYMBOL_HASH_OVERFLOW
which gave me a real headache. Checking the cluster logs, the error was:
2018-08-14 14:51:15,802 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: FAIL_ON_SYMBOL_HASH_OVERFLOW
at my.elasticsearch.common.xcontent.json.JsonXContent.&lt;clinit&gt;(JsonXContent.java:49)
at my.elasticsearch.common.xcontent.XContentFactory.contentBuilder(XContentFactory.java:122)
at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:382)
at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:372)
at my.elasticsearch.action.update.UpdateRequest.doc(UpdateRequest.java:472)
at my.elasticsearch.action.update.UpdateRequestBuilder.setDoc(UpdateRequestBuilder.java:163)
at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:135)
at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Searching online, every answer said the Jackson version was wrong (FAIL_ON_SYMBOL_HASH_OVERFLOW is a JsonFactory feature that only exists in newer Jackson releases), but no matter which Jackson versions I added to the two projects' pom files, it didn't help. In the end I went back to the earlier Maven relocation trick and added one more relocation to the shaded ES project's pom.xml.
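Since the error originates in Jackson, the extra relocation is presumably one for the Jackson packages, alongside the existing relocations in the es-shaded pom.xml; the shadedPattern name here is an assumption:

```xml
<!-- Assumed: relocate Jackson next to the other <relocation> entries -->
<relocation>
  <pattern>com.fasterxml.jackson</pattern>
  <shadedPattern>my.elasticsearch.jackson</shadedPattern>
</relocation>
```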
After adding that configuration, I rebuilt and repackaged the project, deployed it to the cluster, and it ran successfully!
Below is the code I shared on Gitee, adapted from blog post 1:
https://gitee.com/zhangxiaoze/hbaseToEs