Importing and exporting Elasticsearch index data

Recently, a business requirement came up: import the index data from one cluster into an index on another cluster. The idea is simply to read the raw index files and write the documents they contain into the target index. The code is below (note: it targets Elasticsearch versions below 2.0, and the path parameter must point all the way into a shard's Lucene directory, i.e. .../indexName/shardNumber/index):

import java.io.File;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

    public static void executeRecreate(String path, String indexName,
            String indexType) {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "es").put("client.transport.sniff", true)
                .put("client.transport.ping_timeout", "30s")
                .put("client.transport.nodes_sampler_interval", "30s").build();
        TransportClient client = new TransportClient(settings);
        client.addTransportAddress(
                new InetSocketTransportAddress("127.0.0.1", 9300));

        try {
            // Open the raw Lucene directory of one shard.
            Directory dir = FSDirectory.open(new File(path));
            IndexReader reader = DirectoryReader.open(dir);

            // maxDoc() also counts deleted documents, so consult liveDocs below.
            Bits liveDocs = MultiFields.getLiveDocs(reader);
            int maxDoc = reader.maxDoc();
            int docNum = 0;
            BulkRequestBuilder bulkBuilder = client.prepareBulk();
            for (int i = 0; i < maxDoc; i++) {
                if (liveDocs != null && !liveDocs.get(i)) {
                    continue; // skip deleted documents
                }
                Document doc = reader.document(i);
                List<IndexableField> fields = doc.getFields();
                for (IndexableField field : fields) {
                    // The stored _source field holds the original JSON document.
                    if (field instanceof StoredField) {
                        BytesRef bytesRef = ((StoredField) field).binaryValue();
                        bulkBuilder.add(client.prepareIndex(indexName, indexType)
                                .setSource(bytesRef.utf8ToString()).request());
                        docNum++;
                    }
                }
                // Flush every 1000 documents to keep each bulk request small.
                if (docNum >= 1000) {
                    bulkBuilder.execute().actionGet();
                    bulkBuilder = client.prepareBulk();
                    docNum = 0;
                }
            }
            // Flush whatever is left over after the loop.
            if (bulkBuilder.numberOfActions() > 0) {
                bulkBuilder.execute().actionGet();
            }
            reader.close();
            dir.close();
        } catch (Exception e) {
            System.out.println("fail message: " + e.getMessage());
        } finally {
            client.close();
        }
    }

There is also a second approach: read the data with a scroll search and write it into the other index (I recommend this one — no need to point at file paths one by one, and the speed is decent). The code is below:

This code targets Elasticsearch 5.0 and above; below 5.0 the logic is the same once you swap the class names:

import java.net.InetSocketAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

    public static void scrollCopy(String clustName, String sourceIp, String indexName,
            String destiClustName, String destiIp,
            String destiIndexName, String destiIndexType) throws Exception {
        // build source client
        Settings settings = Settings.builder()
                .put("cluster.name", clustName).put("client.transport.sniff", true)
                .put("client.transport.ping_timeout", "30s")
                .put("client.transport.nodes_sampler_interval", "30s").build();
        TransportClient client = new PreBuiltTransportClient(settings);
        client.addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress(sourceIp, 9300)));

        // build destination client
        Settings destiSettings = Settings.builder()
                .put("cluster.name", destiClustName).put("client.transport.sniff", true)
                .put("client.transport.ping_timeout", "30s")
                .put("client.transport.nodes_sampler_interval", "30s").build();
        TransportClient destiClient = new PreBuiltTransportClient(destiSettings);
        destiClient.addTransportAddress(new InetSocketTransportAddress(new InetSocketAddress(destiIp, 9300)));

        // open a scroll over the source index, 1000 hits per page
        SearchResponse scrollResp = client.prepareSearch(indexName)
                .setScroll(new TimeValue(60000)).setSize(1000).execute().actionGet();

        ExecutorService executor = Executors.newFixedThreadPool(5);
        while (true) {
            final BulkRequestBuilder bulk = destiClient.prepareBulk();
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                IndexRequest req = destiClient.prepareIndex().setIndex(destiIndexName)
                        .setType(destiIndexType).setSource(hit.getSourceAsString()).request();
                bulk.add(req);
            }
            // submit each page's bulk on a worker thread;
            // an empty bulk would fail request validation, so guard it
            if (bulk.numberOfActions() > 0) {
                executor.execute(new Runnable() {
                    @Override
                    public void run() {
                        bulk.execute().actionGet();
                    }
                });
            }

            Thread.sleep(10); // crude throttle so bulk submissions don't pile up

            // fetch the next page; an empty page means the scroll is exhausted
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(60000)).execute().actionGet();
            if (scrollResp.getHits().getHits().length == 0) {
                break;
            }
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.MINUTES);
        client.close();
        destiClient.close();
    }

More recently I also found a small open-source tool, elasticdump. It can export an index's data to a file, copy index data into an index on another cluster, and more (for details see GitHub: https://github.com/taskrabbit/elasticsearch-dump).

I tested it, though, and it has one drawback: it runs fairly slowly — not as fast as the read-the-files-directly approach above.

Here is a quick introduction to using it:

1. First, install Node.js and npm

Download the Node.js tarball (from my downloads page or from the official site) and extract it; inside the bin directory you will find two files, node and npm.

2. Then add node and npm to the path with the following two symlinks:

ln -s /tools/node-v0.10.32-linux-x64/bin/node /usr/local/bin/node
ln -s /tools/node-v0.10.32-linux-x64/bin/npm /usr/local/bin/npm

Run node -v and npm -v to check the version numbers; once they print, the prerequisites are ready.

3. Now install elasticdump:

npm install elasticdump -g

4. Once installed, it is ready to use. Here is an example (copying the data from index a into index b):

elasticdump --input=http://localhost:9200/a --output=http://localhost:9200/b --type=data
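Besides copying between clusters, elasticdump can also dump an index to local JSON files, as mentioned above. A minimal sketch (the /tmp output paths are just examples; dump the mapping first so the target index can be recreated faithfully):

```shell
# dump the mapping of index "a" to a local file
elasticdump --input=http://localhost:9200/a --output=/tmp/a_mapping.json --type=mapping
# then dump the documents themselves
elasticdump --input=http://localhost:9200/a --output=/tmp/a_data.json --type=data
```

To restore, swap --input and --output (file in, cluster URL out), again running the mapping command before the data command.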


When you run into this kind of requirement, any of the approaches above will work; elasticdump is just somewhat slower.

Questions and comments are welcome.
