Solr5 POST TOOL

Solr includes a simple command line tool for POSTing various types of content to a Solr server. The tool is bin/post. The bin/post tool is a Unix shell script; for Windows (non-Cygwin) usage, see the Windows section below.

To run it, open a terminal window and enter:

bin/post -c gettingstarted example/films/films.json

This will contact the server at localhost:8983. Specifying the collection/core name is mandatory. The '-help' (or simply '-h') option outputs usage information (for example, bin/post -help).

Using the bin/post Tool

Specifying either the collection/core name or the full update url is mandatory when using bin/post.

The basic usage of bin/post is:

$ bin/post -help
 
Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]>
     or post -help
    collection name defaults to DEFAULT_SOLR_COLLECTION if not specified
 
OPTIONS
=======
   Solr options:
     -url <base Solr update URL> (overrides collection, host, and port)
     -host <host> (default: localhost)
     -port <port> (default: 8983)
     -commit yes|no (default: yes)
   Web crawl options:
     -recursive <depth> (default: 1)
     -delay <seconds> (default: 10)
   Directory crawl options:
     -delay <seconds> (default: 0)
   stdin/args options:
     -type <content/type> (default: application/xml)
   Other options:
     -filetypes <type>[,<type>,...] (default: xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
     -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
     -out yes|no (default: no; yes outputs Solr response to console)
...
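
The help text says that -url overrides collection, host, and port. As a rough sketch of how those settings relate, the function below assembles an update URL from them. The http://HOST:PORT/solr/COLLECTION/update layout is an assumption based on Solr's conventional update endpoint, and solr_update_url is a hypothetical helper, not part of bin/post.

```shell
# Build the update URL that a given collection/host/port combination
# would correspond to.
# ASSUMPTION: the /solr/<collection>/update path is Solr's conventional
# update endpoint; it is not stated in the help output itself.
solr_update_url() {
  collection=$1
  host=${2:-localhost}   # same default as -host
  port=${3:-8983}        # same default as -port
  printf 'http://%s:%s/solr/%s/update' "$host" "$port" "$collection"
}

# Dry run: print the -url form of the first example instead of running it.
url=$(solr_update_url gettingstarted)
echo "bin/post -url $url example/films/films.json"
```

Under that assumption, the printed command targets the same endpoint as bin/post -c gettingstarted example/films/films.json.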

 

Examples

There are several ways to use bin/post.  This section presents several examples.

Indexing XML

Add all documents with file extension .xml to the collection or core named gettingstarted.

bin/post -c gettingstarted *.xml

Add all documents with file extension .xml to the gettingstarted collection/core on Solr running on port 8984.

bin/post -c gettingstarted -port 8984 *.xml

Send XML arguments to delete a document from gettingstarted.

bin/post -c gettingstarted -d '<delete><id>42</id></delete>'
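
The argument to -d is sent as an XML update message, and a delete-by-id message in Solr's XML update format has the shape <delete><id>...</id></delete>. When the id comes from a variable, it helps to escape XML special characters first. A minimal sketch follows; xml_escape and delete_by_id are hypothetical helper names, and the command is only printed, not executed.

```shell
# Escape the characters that are special inside XML text content.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Wrap an id in a delete-by-id update message.
delete_by_id() {
  printf '<delete><id>%s</id></delete>' "$(xml_escape "$1")"
}

# Dry run: show the command that would delete document 42.
echo "bin/post -c gettingstarted -d '$(delete_by_id 42)'"
```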

Indexing CSV

Index all CSV files into gettingstarted:

bin/post -c gettingstarted *.csv

Index a tab-separated file into gettingstarted:

bin/post -c gettingstarted -params "separator=%09" -type text/csv data.tsv

The content type (-type) parameter is required so that the file is treated as CSV. Without it, bin/post does not know what type of content a .tsv file holds, so it ignores the file and logs a WARNING. The CSV handler supports the separator parameter, which is passed through to Solr using the -params setting.
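
Values given to -params must be URL-encoded, which is why the tab character appears as %09. One possible way to produce such values in a POSIX shell is sketched below. urlencode is a hypothetical helper, not something bin/post provides; it percent-encodes every byte outside the unreserved set (letters, digits, '-', '.', '_', '~').

```shell
# Percent-encode a string byte by byte.
urlencode() {
  # od prints each byte as two hex digits; tr puts one value per line.
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
    [ -n "$hex" ] || continue
    case "$hex" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        # Unreserved byte: convert the hex value back to the literal character.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Everything else becomes %XX with uppercase hex digits.
        printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
}

# Dry run: build the separator parameter for a tab-separated file.
sep=$(urlencode "$(printf '\t')")
echo "bin/post -c gettingstarted -params \"separator=$sep\" -type text/csv data.tsv"
```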

Indexing JSON

Index all JSON files into gettingstarted.

bin/post -c gettingstarted *.json

Indexing rich documents (PDF, Word, HTML, etc)

Index a PDF file into gettingstarted.

bin/post -c gettingstarted a.pdf

Automatically detect content types in a folder, and recursively scan it for documents for indexing into gettingstarted.

bin/post -c gettingstarted afolder/

Automatically detect content types in a folder, but limit it to PPT and HTML files and index into gettingstarted.

bin/post -c gettingstarted -filetypes ppt,html afolder/
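
Before posting a large folder, it can be useful to preview which files carry one of the listed extensions. The sketch below does this with find. list_by_filetypes is a hypothetical helper, and its matching (case-sensitive, by the text after the last dot) is an approximation that may not agree exactly with how bin/post selects files.

```shell
# List files under a directory whose extension appears in a
# comma-separated list such as "ppt,html".
list_by_filetypes() {
  dir=$1
  types=$2
  find "$dir" -type f | while IFS= read -r path; do
    ext=${path##*.}              # text after the last dot in the path
    case ",$types," in
      *",$ext,"*) printf '%s\n' "$path" ;;
    esac
  done | sort
}

# Example call (not run here because the folder may not exist):
#   list_by_filetypes afolder ppt,html
```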

Windows support

bin/post currently exists only as a Unix shell script; however, it delegates its work to a cross-platform Java program. The SimplePostTool can be run directly in supported environments, including Windows.

SimplePostTool

The bin/post script currently delegates to a standalone Java program called SimplePostTool. This tool, bundled into an executable JAR, can be run directly using java -jar example/exampledocs/post.jar. Start from the help output to see how to post files, recurse a website or file system folder, or send direct commands to a Solr server.

$ java -jar example/exampledocs/post.jar -h
SimplePostTool version 5.0.0
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
.
.
.
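
The [SystemProperties] slot in the usage line indicates that settings are passed to post.jar as -D Java system properties. A minimal sketch follows, assuming -Dc=<collection> is the counterpart of the -c option of bin/post (check the -h output of your version to confirm). post_jar_cmd is a hypothetical helper that only prints the command, so it can be reviewed or pasted into another shell.

```shell
# Print (rather than run) a direct SimplePostTool invocation.
# ASSUMPTION: -Dc=<collection> selects the target collection/core.
post_jar_cmd() {
  collection=$1
  shift
  printf 'java -Dc=%s -jar example/exampledocs/post.jar' "$collection"
  # Append each file, folder, URL, or argument as given.
  for f in "$@"; do
    printf ' %s' "$f"
  done
  printf '\n'
}

# Dry run of the first example from this page.
post_jar_cmd gettingstarted example/films/films.json
```

The printed line is a plain java command, so under the same assumption it should be usable from a Windows command prompt as well.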
