Solr 5.3.1: indexing JSON, XML, CSV and other file types

With earlier Solr versions, a JSON file could be indexed with curl like this:
curl http://localhost:8080/solr/update/json --data-binary @books.json -H 'Content-type:text/json; charset=utf-8'

However, after switching to Solr 5, every attempt to build the index with this curl command failed, and I never worked out why.
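One possible cause, not verified here: the curl URL has no core name, while in Solr 5 update handlers are normally addressed per core (for example /solr/core0/update). The Java 11 sketch below sends the same kind of request with the core in the path, using java.net.http to POST a JSON file with Content-Type application/json. The host, port 8080 and core name core0 are assumptions borrowed from the post.jar examples later in this post, and commit=true is added so the documents become searchable right away.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class SolrJsonPost {
    // Builds a POST to a core-specific update handler.
    // host, port and core are assumptions; adjust them to your installation.
    static HttpRequest buildRequest(String host, int port, String core, String json) {
        URI uri = URI.create("http://" + host + ":" + port + "/solr/" + core + "/update?commit=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(json)) // body sent as UTF-8
                .build();
    }

    public static void main(String[] args) throws Exception {
        // books.json is expected in the working directory, as in the curl command.
        String json = Files.readString(Path.of("books.json"));
        HttpRequest req = buildRequest("localhost", 8080, "core0", json);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // Print the HTTP status and Solr's response body.
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

If the missing core name is indeed the problem, the original curl command would likely work with the core inserted into the path, e.g. http://localhost:8080/solr/core0/update/json.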

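I then switched to post.jar, which posts the contents of local files to Solr and supports many data types, including JSON and XML.
post.jar is in the example/exampledocs directory of the Solr download archive; run the commands below from the directory that contains post.jar.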

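  1. Index a single JSON file (replace the core name and port with your own):
     java -Dc=core0 -Dport=8080 -Dtype=application/json -jar post.jar books.json
  2. Index all files of that type in a directory (batch indexing; replace dir with the directory path):
     java -Dc=core0 -Dport=8080 -Dtype=application/json -jar post.jar dir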
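Step 2 depends on how post.jar chooses files inside a directory. Judging from the help text further down (-Dfiletypes, -Dauto and the afolder examples), files appear to be selected by extension, and the default -Dfiletypes list contains xml, csv, pdf and others besides json; adding -Dfiletypes=json is probably the safer way to limit a batch to JSON files. The Java sketch below imitates that extension-based selection and the file-name-based type guess of -Dauto. DirSelect and its small extension table are hypothetical illustrations, not post.jar's own code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirSelect {
    // Hypothetical subset of an extension -> Content-Type table, similar in spirit to -Dauto.
    static final Map<String, String> TYPES = Map.of(
            "xml", "application/xml",
            "json", "application/json",
            "csv", "text/csv",
            "txt", "text/plain");

    // Lower-cased extension after the last dot, or "" if there is none.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Parses a comma-separated list such as "json,csv" (the -Dfiletypes format).
    static Set<String> parseFileTypes(String fileTypes) {
        return Arrays.stream(fileTypes.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    static boolean accepts(String fileName, Set<String> fileTypes) {
        return fileTypes.contains(extension(fileName));
    }

    // Returns null when the extension is not in the table.
    static String guessType(String fileName) {
        return TYPES.get(extension(fileName));
    }

    // Non-recursive: only regular files directly inside dir, sorted by path.
    static List<Path> select(Path dir, Set<String> fileTypes) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.filter(Files::isRegularFile)
                    .filter(p -> accepts(p.getFileName().toString(), fileTypes))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Path p : select(dir, parseFileTypes("json"))) {
            System.out.println(p + " -> " + guessType(p.getFileName().toString()));
        }
    }
}
```

For example, `java DirSelect path/to/dir` lists the .json files directly inside that directory together with the guessed Content-Type.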



post.jar accepts many options. Its built-in help (java -jar post.jar -h) reads:

SimplePostTool version 5.1.0
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

Supported System Properties and their defaults:
  -Durl=<base Solr update URL> (overrides -Dc option if specified)
  -Ddata=files|web|args|stdin (default=files)
  -Dtype=<content-type> (default=application/xml)
  -Dhost=<host> (default: localhost)
  -Dport=<port> (default: 8983)
  -Dauto=yes|no (default=no)
  -Drecursive=yes|no|<depth> (default=0)
  -Ddelay=<seconds> (default=0 for files, 10 for web)
  -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
  -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
  -Dcommit=yes|no (default=yes)
  -Doptimize=yes|no (default=no)
  -Dout=yes|no (default=no)

This is a simple command line tool for POSTing raw data to a Solr port.
NOTE: Specifying the url/core/collection name is mandatory.
Data can be read from files specified as commandline args,
URLs specified as args, as raw commandline arg strings or via STDIN.
  java -Dc=gettingstarted -jar post.jar *.xml
  java -Ddata=args -Dc=gettingstarted -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -Dc=gettingstarted -jar post.jar < hd.xml
  java -Ddata=web -Dc=gettingstarted -jar post.jar
  java -Dtype=text/csv -Dc=gettingstarted -jar post.jar *.csv
  java -Dtype=application/json -Dc=gettingstarted -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/techproducts/update/extract -jar post.jar solr-word.pdf
  java -Dauto -Dc=gettingstarted -jar post.jar *
  java -Dauto -Dc=gettingstarted -Drecursive -jar post.jar afolder
  java -Dauto -Dc=gettingstarted -Dfiletypes=ppt,html -jar post.jar afolder
The options controlled by System Properties include the Solr
URL to POST to, the Content-Type of the data, whether a commit
or optimize should be executed, and whether the response should
be written to STDOUT. If auto=yes the tool will try to set type
automatically from file name. When posting rich documents the
file name will be propagated as "" and also used
as "". You may override these or any other request parameter
through the -Dparams property. To do a commit only, use "-" as argument.
The web mode is a simple crawler following links within domain, default delay=10s.
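The help text states how the target is chosen: -Durl, when given, overrides -Dc; otherwise the defaults host=localhost and port=8983 apply, and -Dparams values must be URL-encoded. The Java sketch below models those rules under one assumption the help text does not spell out: that without -Durl the tool posts to http://<host>:<port>/solr/<core>/update. PostUrl, updateUrl and encodeParams are hypothetical names, not post.jar's API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.StringJoiner;

public class PostUrl {
    // -Durl wins; otherwise host/port/core build the (assumed) update URL.
    static String updateUrl(Properties p) {
        String url = p.getProperty("url");
        if (url != null) return url;
        String core = p.getProperty("c");
        if (core == null) {
            throw new IllegalArgumentException("url or core/collection name is mandatory");
        }
        String host = p.getProperty("host", "localhost");
        String port = p.getProperty("port", "8983");
        return "http://" + host + ":" + port + "/solr/" + core + "/update";
    }

    // Builds a -Dparams style string: values URL-encoded, pairs joined with '&'.
    static String encodeParams(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // -D flags on the java command line become system properties.
        System.out.println(updateUrl(System.getProperties()));
    }
}
```

Under this assumption, `java -Dc=core0 -Dport=8080 PostUrl` prints http://localhost:8080/solr/core0/update, the address the single-file command in step 1 would post to.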
