Pitfalls encountered while importing HBase 1.1.2 data into Elasticsearch 2.3.4 with MapReduce

My manager asked me to import the data in HBase into Elasticsearch (ES) via MapReduce, redesigning the storage structure on the ES side along the way. As a newcomer touching ES for the first time, I ran into the pitfalls below, which I summarize here:

First, let me recommend two blog posts. By following them, plus a few hundred experiments of my own, I finally got the job done; many thanks to both authors for sharing.

1.https://blog.csdn.net/fxsdbt520/article/details/53893421?utm_source=itdadao&utm_medium=referral 

2.https://blog.csdn.net/u014231523/article/details/52816218 

Here are the pitfalls I ran into:

Following blog 1, the basic project came together, but running it threw the following error:

Exception in thread "main" org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.translateException(RpcRetryingCaller.java:229)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:202)
	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
	at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
	at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
	at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
	at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
	at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324)
	at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:88)
	at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94)
	at org.apache.hadoop.hbase.util.RegionSizeCalculator.<init>(RegionSizeCalculator.java:81)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at org.eminem.hadoop.ESInitCall.run(ESInitCall.java:51)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.eminem.hadoop.ESInitCall.main(ESInitCall.java:75)
Caused by: java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.hbase.zookeeper.MetaTableLocator
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:596)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1185)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1152)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
	... 26 more

This error is caused by a Guava conflict between HBase and ES: HBase uses guava-12.0.1, while ES needs guava-18 (Guava 17 made the Stopwatch constructor non-public, which is exactly the IllegalAccessError above). After half a day of searching for a fix, the solution that finally worked was to create a separate Maven project that pulls in only the ES dependency and shades it, keeping HBase and ES apart, as follows:

[Figure 1: the standalone es-shaded project]

This is an empty project; nothing is configured except its pom.xml, shown here:


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.elasticsearch</groupId>
    <artifactId>es-shaded</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>es-shaded</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <elasticsearch.version>2.3.4</elasticsearch.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <relocations>
                                <relocation>
                                    <pattern>com.google.guava</pattern>
                                    <shadedPattern>my.elasticsearch.guava</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>org.joda</pattern>
                                    <shadedPattern>my.elasticsearch.joda</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>com.google.common</pattern>
                                    <shadedPattern>my.elasticsearch.common</shadedPattern>
                                </relocation>
                                <relocation>
                                    <pattern>org.elasticsearch</pattern>
                                    <shadedPattern>my.elasticsearch</shadedPattern>
                                </relocation>
                            </relocations>
                            <transformers>
                                <!-- merges META-INF/services entries from the shaded dependencies -->
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer" />
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Once this is configured, do a Maven clean, update, and package; the shaded jar lands in your local Maven repository. Then close this project and reference the jar from the main project's pom.xml.
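Concretely, as a sketch of that build-and-reference step: run mvn clean install in the es-shaded project to install the shaded jar into the local repository, then declare it as an ordinary dependency in the main project's pom.xml (the coordinates come straight from the pom above):

    <dependency>
        <groupId>my.elasticsearch</groupId>
        <artifactId>es-shaded</artifactId>
        <version>1.0-SNAPSHOT</version>
    </dependency>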

Note:

[Figure 2: the relocations section of the pom.xml]

This block of configuration is crucial: in the main project you must change the package names on every imported ES class to the relocated ones. For example:
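(An illustrative before/after pair; TransportClient stands in for whichever ES classes your code actually imports.)

    // Before shading, the import would have been:
    // import org.elasticsearch.client.transport.TransportClient;
    // After shading, the same class lives under the relocated package:
    import my.elasticsearch.client.transport.TransportClient;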

With the basic problem solved, the program ran fine from Eclipse, but on the cluster it kept failing with

Error: FAIL_ON_SYMBOL_HASH_OVERFLOW 

This one gave me a real headache. The cluster logs showed the following error:

2018-08-14 14:51:15,802 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchFieldError: FAIL_ON_SYMBOL_HASH_OVERFLOW
	at my.elasticsearch.common.xcontent.json.JsonXContent.<clinit>(JsonXContent.java:49)
	at my.elasticsearch.common.xcontent.XContentFactory.contentBuilder(XContentFactory.java:122)
	at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:382)
	at my.elasticsearch.action.index.IndexRequest.source(IndexRequest.java:372)
	at my.elasticsearch.action.update.UpdateRequest.doc(UpdateRequest.java:472)
	at my.elasticsearch.action.update.UpdateRequestBuilder.setDoc(UpdateRequestBuilder.java:163)
	at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:135)
	at org.eminem.hadoop.mapper.ESInitMapper.map(ESInitMapper.java:1)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Every answer I found online said the Jackson version was wrong, but no matter which Jackson versions I added to the two projects' pom files, nothing helped. (The underlying issue: JsonFactory.Feature.FAIL_ON_SYMBOL_HASH_OVERFLOW only exists from Jackson 2.4 onward, and on the cluster the older jackson-core shipped with Hadoop wins on the task classpath, whereas in Eclipse the newer one does.) In the end I tried the same Maven relocation trick as before and added the following to the es-shaded project's pom:

                               
                                <relocation>
                                    <pattern>com.fasterxml.jackson</pattern>
                                    <shadedPattern>my.elasticsearch</shadedPattern>
                                </relocation>

With that relocation in place, the relevant part of the pom file now looks like this:

[Figure 3: the pom.xml after adding the Jackson relocation]

After adding this configuration, I recompiled and repackaged the project, pushed it to the cluster, and it ran successfully!
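Resubmitting was just the standard hadoop jar invocation (the jar name here is hypothetical; org.eminem.hadoop.ESInitCall is the driver class visible in the stack traces above):

    hadoop jar hbaseToEs-1.0.jar org.eminem.hadoop.ESInitCall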

[Figure 4: the MapReduce job completing successfully on the cluster]


Below is my code, adapted from blog 1 and shared on Gitee:

https://gitee.com/zhangxiaoze/hbaseToEs
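To give a feel for the Mapper side once everything is shaded, here is a minimal sketch of a TableMapper that bulk-indexes HBase rows into ES via the ES 2.x TransportClient. It is an illustration only: the class name, index/type names, cluster name, host, and batch size are all hypothetical, and the real code in the repo builds UpdateRequest upserts (as the stack trace above shows) rather than plain index requests. The one essential point it demonstrates is that every ES import uses the relocated my.elasticsearch.* packages.

import java.io.IOException;
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

// Note the my.elasticsearch.* imports -- the relocated packages from the shaded jar.
import my.elasticsearch.action.bulk.BulkRequestBuilder;
import my.elasticsearch.client.Client;
import my.elasticsearch.client.transport.TransportClient;
import my.elasticsearch.common.settings.Settings;
import my.elasticsearch.common.transport.InetSocketTransportAddress;

public class HBaseToEsMapper extends TableMapper<NullWritable, NullWritable> {

    private Client client;
    private BulkRequestBuilder bulk;

    @Override
    protected void setup(Context context) throws IOException {
        // Hypothetical cluster name and host -- replace with your own.
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "my-es-cluster").build();
        client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("es-host"), 9300));
        bulk = client.prepareBulk();
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Flatten every cell of the HBase row into a field of the ES document.
        Map<String, Object> doc = new HashMap<String, Object>();
        for (Cell cell : value.rawCells()) {
            doc.put(Bytes.toString(CellUtil.cloneQualifier(cell)),
                    Bytes.toString(CellUtil.cloneValue(cell)));
        }
        // Use the HBase row key as the ES document id.
        bulk.add(client.prepareIndex("my_index", "my_type",
                Bytes.toString(row.get())).setSource(doc));
        if (bulk.numberOfActions() >= 1000) { // flush in batches
            bulk.get();
            bulk = client.prepareBulk();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        if (bulk.numberOfActions() > 0) { // flush whatever is left
            bulk.get();
        }
        client.close();
    }
}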
