Solr /export: exporting massive amounts of data


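    To export massive amounts of data, Solr uses streaming export. The export is stream-based: as soon as the server matches the first document, it starts flushing data to the client.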
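On the client side, the benefit of a streamed response is that it can be processed as it arrives instead of being buffered whole. The following is a generic JDK-only sketch, not SolrJ code. ChunkedReadSketch and consume are hypothetical names. It reads an InputStream in fixed-size chunks and hands each chunk to a handler, so memory use depends on the buffer size rather than on the size of the export.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.BiConsumer;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class ChunkedReadSketch {
    // Reads the stream chunk by chunk and passes (buffer, bytesRead) to the handler
    // as soon as each chunk arrives. Returns the total number of bytes read.
    static long consume(InputStream in, int bufferSize, BiConsumer<byte[], Integer> handler)
            throws IOException {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            handler.accept(buf, n); // only buf[0..n) is valid for this chunk
            total += n;
        }
        return total;
    }
}
```

In practice the InputStream would be the body of the HTTP response. The handler must copy any bytes it wants to keep, because the buffer is reused for the next chunk.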

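    Every field to be exported must have docValues="true" set on its field element in the schema. The /export request handler must also be configured in solrconfig.xml: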

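
    <requestHandler name="/export" class="solr.SearchHandler">
      <lst name="invariants">
        <str name="rq">{!xport}</str>
        <str name="wt">xsort</str>
        <str name="distrib">false</str>
      </lst>
      <arr name="components">
        <str>query</str>
      </arr>
    </requestHandler>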
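With the handler in place, a request to /export must include both a sort and an fl parameter, and Solr returns an error if either is missing. Every field named in either parameter needs docValues. The following is a minimal sketch of building such a query string. The helper name buildExportQuery and the example field names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of SolrJ.
final class ExportRequestSketch {
    // Builds "q=...&sort=...&fl=..." for a GET against /export.
    // Rejects a missing sort or fl up front, since /export requires both.
    static String buildExportQuery(String q, String sort, String fl) {
        if (sort == null || sort.isEmpty() || fl == null || fl.isEmpty()) {
            throw new IllegalArgumentException("/export requires both sort and fl");
        }
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("sort", sort);
        params.put("fl", fl);
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

For example, buildExportQuery("*:*", "id asc", "id,price") yields a string that can be appended to a hypothetical core URL such as http://localhost:8983/solr/core1/export?.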

 

 The client-side query code is as follows:

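    import java.io.File;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.util.concurrent.atomic.AtomicInteger;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.StreamingResponseCallback;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SolrStreamExport {

        // query must already carry q; fields is a comma-separated field list
        public static void export(String url, SolrQuery query, String fields, File outfile)
                throws IOException, SolrServerException {
            final String[] fl = fields.trim().split("\\s*,\\s*");
            query.setDistrib(false); // query only this core, no distributed fan-out
            query.setFields(fl);
            query.setRows(9999999);  // large enough to return every hit

            // HttpSolrClient(String) is the SolrJ 5.x constructor; newer SolrJ uses HttpSolrClient.Builder
            try (SolrClient client = new HttpSolrClient(url);
                 PrintWriter writer = new PrintWriter(
                         Files.newBufferedWriter(outfile.toPath(), StandardCharsets.UTF_8))) {
                writer.println(String.join(",", fl)); // header row
                final AtomicInteger count = new AtomicInteger(0);

                // SolrJ parses the javabin response incrementally and hands each document to the callback
                QueryResponse result = client.queryAndStreamResponse(query,
                        new StreamingResponseCallback() {
                            @Override
                            public void streamSolrDocument(SolrDocument doc) {
                                // one row per document, columns in fl order
                                for (int i = 0; i < fl.length; i++) {
                                    if (i > 0) {
                                        writer.print(",");
                                    }
                                    Object v = doc.getFieldValue(fl[i]);
                                    writer.print(v == null ? "" : v);
                                }
                                writer.println();
                                count.incrementAndGet();
                            }

                            @Override
                            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                                // called once before the first document
                            }
                        });
                System.out.println("numFound:" + result.getResults().getNumFound()
                        + ", written:" + count.get());
            }
        }
    }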
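When the export is written as comma-separated text, any value that itself contains a comma, a double quote, or a line break will corrupt the row. A multivalued field is one example, since its Java string form looks like "[a, b]". The following is a minimal sketch of RFC 4180-style quoting that a streamSolrDocument implementation could apply to each field value. CsvSketch, escape, and row are hypothetical helpers, not part of SolrJ.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical CSV helpers for illustration; not part of SolrJ.
final class CsvSketch {
    // Quotes a value if it contains a comma, quote, CR or LF; embedded quotes are doubled.
    // null becomes an empty field.
    static String escape(Object value) {
        if (value == null) {
            return "";
        }
        String s = String.valueOf(value);
        boolean needsQuotes = s.indexOf(',') >= 0 || s.indexOf('"') >= 0
                || s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return s;
        }
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    // Joins already-ordered field values into one CSV row (without the line terminator).
    static String row(List<?> values) {
        StringJoiner joiner = new StringJoiner(",");
        for (Object v : values) {
            joiner.add(escape(v));
        }
        return joiner.toString();
    }
}
```

Inside the callback, the values would be collected in fl order with doc.getFieldValue(name), and the result of row(...) would be passed to writer.println.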

 

Relevant Solr server-side code:

Query parser (QP):

  ExportQParserPlugin: the QP used by the export handler (the {!xport} in the handler configuration).
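As I understand it, the {!xport} query does not keep a top-N priority queue the way an ordinary search does. Instead, its collector marks every matching document in a bitset for each index segment, so the whole result set is captured at a cost of one bit per document. The sketch below illustrates only that idea. CollectAllSketch and the matches predicate are hypothetical stand-ins, not Solr's classes.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Conceptual sketch, not Solr code: collect every hit instead of only the top N.
final class CollectAllSketch {
    // Scans doc ids 0..maxDoc-1 of one segment and sets a bit for each match.
    // Memory is roughly maxDoc / 8 bytes, regardless of how many documents match.
    static BitSet collectAll(int maxDoc, IntPredicate matches) {
        BitSet hits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (matches.test(doc)) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

The bitset records only which documents matched, not their order, so sorting is left to the response writer.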

Streaming, sorted output of the query results:

  SortingResponseWriter (the response writer registered as xsort)
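The writer must emit the collected hits in sort order without materializing all of them at once. My reading is that it works roughly as follows, using repeated passes with a fixed-size queue. The sketch is simplified: it assumes a single long sort value per document (read from docValues in the real system) and a caller-chosen batch size. Each pass scans the remaining hits and keeps the smallest batch in a bounded max-heap. It then writes that batch out in order and clears those documents from the bitset before the next pass. BatchedSortSketch is a hypothetical name, and the real writer also handles multiple sort fields and types.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntConsumer;

// Conceptual sketch, not Solr's implementation.
final class BatchedSortSketch {
    // Emits every doc in hits ordered by sortValue[doc] ascending (ties broken by doc id),
    // holding at most batchSize docs in the heap at any time.
    static void export(BitSet hits, long[] sortValue, int batchSize, IntConsumer out) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        Comparator<Integer> asc = Comparator.<Integer>comparingLong(d -> sortValue[d])
                .thenComparingInt(d -> d);
        BitSet remaining = (BitSet) hits.clone(); // leave the caller's bitset untouched
        while (!remaining.isEmpty()) {
            // max-heap: the head is the largest of the current candidates
            PriorityQueue<Integer> heap = new PriorityQueue<>(batchSize, asc.reversed());
            for (int doc = remaining.nextSetBit(0); doc >= 0; doc = remaining.nextSetBit(doc + 1)) {
                heap.offer(doc);
                if (heap.size() > batchSize) {
                    heap.poll(); // drop the largest so only the smallest batchSize remain
                }
            }
            List<Integer> batch = new ArrayList<>(heap);
            batch.sort(asc);
            for (int doc : batch) {
                out.accept(doc);       // e.g. write the document's fields to the response
                remaining.clear(doc);  // never revisit it in later passes
            }
        }
    }
}
```

Each pass costs a full scan of the remaining hits. The trade-off is that memory stays bounded by the batch size while the number of passes grows with the result size.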

 
