"Solr in Action" Notes and Summary: Part 1

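I had never worked on search-engine features before. Recently, Koudai's literature and guideline search needed tuning and optimization, so I jumped into Solr. Surprisingly, there is very little material on Solr (let alone in Chinese), and the official documentation is dry, mostly a pile of examples. Then I came across the book Solr in Action, which uses Solr 4 throughout. After reading a few chapters I had learned a great deal, so this biweekly share collects the passages from Solr in Action that gave me the biggest "aha" moments.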

Why do I need a search engine?

Search engines like Solr are optimized to handle data exhibiting four main characteristics:

  1. Text-centric
  2. Read-dominant
  3. Document-oriented
  4. Flexible schema


Text-centric

We think text-centric is more appropriate for describing the type of data Solr handles.
Of course, a search engine also supports non-text data such as dates and numbers, but its primary strength is handling text data based on natural language.


Read-dominant

Think of read-dominant as meaning that documents are read far more often than they’re created or updated.
If you must update existing data in a search engine often, that could be an indication that a search engine might not be the best solution for your needs. Another NoSQL technology, like Cassandra, might be a better choice when you need fast random writes to existing data.



Document-oriented

In a search engine, a document is a self-contained collection of fields, in which each field only holds data and doesn’t contain nested fields.
In general, you should store the minimal set of information for each document needed to satisfy search requirements.
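To make "self-contained, no nested fields" concrete, here is a minimal Python sketch. The field names (id, title, authors, year) are hypothetical, and the check is only an illustration of the idea, not something Solr runs: a flat document maps each field name to a single value or to a list of values (a multi-valued field), never to another object.

```python
from typing import Any

# A hypothetical article document: a flat mapping of field names to values.
# Multi-valued fields are plain lists of scalars; there are no nested objects.
doc = {
    "id": "guideline-001",
    "title": "Hypertension management guideline",
    "authors": ["Zhang", "Li"],
    "year": 2018,
}

SCALARS = (str, int, float, bool)


def is_flat(document: dict) -> bool:
    """Return True if every field holds a scalar or a list of scalars."""
    for value in document.values():
        if isinstance(value, list):
            if not all(isinstance(v, SCALARS) for v in value):
                return False
        elif not isinstance(value, SCALARS):
            return False
    return True


print(is_flat(doc))                                   # True
print(is_flat({"id": "x", "meta": {"source": "a"}}))  # False: nested field
```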


Flexible schema

In a relational database, every row in a table has the same structure. In Solr, documents can have different fields.
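A small Python sketch of what "documents can have different fields" looks like in practice. The two documents are made-up examples. Note that in a classic Solr 4 setup each field name still has to be declared in schema.xml or matched by a dynamic field; the flexibility is that a given document only carries the fields it actually has, instead of a fixed column set with empty values.

```python
# Two hypothetical documents in the same index with different sets of fields.
docs = [
    {"id": "1", "title": "Diabetes guideline", "year": 2019},
    {"id": "2", "title": "Aspirin review", "journal": "Example Journal",
     "keywords": ["aspirin", "stroke"]},
]

# A relational table would force every row onto one shared column set;
# here each document simply omits the fields it doesn't have.
all_fields = sorted(set().union(*(d.keys() for d in docs)))
print(all_fields)  # ['id', 'journal', 'keywords', 'title', 'year']

for d in docs:
    missing = [f for f in all_fields if f not in d]
    print(d["id"], "missing:", missing)
```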


Don’t use a search engine to ...

  1. First, search engines are designed to return a small set of documents per query, usually 10 to 100.
  2. Another use case in which you shouldn’t use a search engine is deep analytic tasks that require access to a large subset of the index.
  3. Also, there’s no direct support in most search engines for document-level security, at least not in Solr.
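The first point shows up directly in how you query Solr: a request asks for one small page of ranked results through the start (offset) and rows (page size) parameters of the select handler. The sketch below only builds such a URL with Python's standard library and sends no request; the host, port, and the core name articles are assumptions for illustration. Requesting very large start or rows values works against what the engine is optimized for.

```python
from urllib.parse import urlencode

# Hypothetical Solr 4 core URL; adjust host, port and core name for your setup.
SELECT_URL = "http://localhost:8983/solr/articles/select"


def page_url(query: str, page: int, page_size: int = 10) -> str:
    """Build a select URL asking for one small page of ranked results.

    `page` is zero-based, so page=2 returns documents 20..29.
    """
    params = {
        "q": query,
        "start": page * page_size,  # offset of the first document to return
        "rows": page_size,          # number of documents to return
        "wt": "json",               # response format
    }
    return f"{SELECT_URL}?{urlencode(params)}"


print(page_url("title:hypertension", page=2))
```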

What is Solr?

Information retrieval engine

Solr is built on Apache Lucene, a popular, Java-based, open source, information retrieval library.

In a nutshell, Solr uses Lucene to provide the core data structures for indexing documents and executing searches to find documents.
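The central data structure behind "indexing documents and executing searches" is the inverted index: a map from each term to the documents that contain it. The toy Python version below illustrates only the idea; Lucene's real index also stores term frequencies, positions and more, in compressed on-disk segments, and its analysis is far richer than the crude tokenizer used here.

```python
import re
from collections import defaultdict

docs = {
    1: "Solr is built on Lucene",
    2: "Lucene is a Java library",
    3: "Solr adds schema.xml on top of Lucene",
}


def tokenize(text):
    # Very rough analysis: lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


# Inverted index: term -> set of ids of documents containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)


def search_all(query):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()


print(sorted(index["lucene"]))            # [1, 2, 3]
print(sorted(search_all("solr lucene")))  # [1, 3]
```

Lookup cost depends on the number of query terms and the length of their posting lists, not on scanning every document, which is why search engines answer keyword queries quickly.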

One key difference between a Lucene query and a database query is that in Lucene results are ranked by their relevance to a query, while database results can only be sorted by one or more of the table columns.
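To see what "ranked by relevance" means, here is a minimal sketch that scores documents with a plain TF-IDF formula: a document scores higher when it contains the query terms more often, and rare terms weigh more than common ones. This is a simplified stand-in chosen for illustration, not Lucene's actual scoring formula (Lucene adds normalization factors, and newer versions default to BM25); the documents are made up.

```python
import math
import re
from collections import Counter

docs = {
    "a": "aspirin for stroke prevention",
    "b": "aspirin aspirin dosage in stroke and heart attack",
    "c": "heart failure guideline",
}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
n_docs = len(docs)


def idf(term):
    # Rarer terms get a higher weight; the +1 terms keep the value positive.
    df = sum(1 for counts in term_counts.values() if term in counts)
    return math.log((n_docs + 1) / (df + 1)) + 1


def score(doc_id, query):
    counts = term_counts[doc_id]  # Counter returns 0 for absent terms
    return sum(counts[t] * idf(t) for t in tokenize(query))


query = "aspirin stroke"
ranked = sorted(docs, key=lambda d: score(d, query), reverse=True)
print(ranked)  # ['b', 'a', 'c']
```

A database `ORDER BY` could only sort these rows by a column such as an id or a date; the relevance score is computed per query.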

MapReduce is a programming model that distributes large-scale data-processing operations across a cluster of commodity servers by formulating an algorithm into two phases: map and reduce.
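The two phases are easiest to see on the classic word-count example. The sketch below runs everything in a single Python process; in a real cluster, a framework such as Hadoop runs the map calls in parallel on different servers and performs the grouping ("shuffle") step over the network before the reduce calls.

```python
from collections import defaultdict
from itertools import chain

documents = ["solr uses lucene", "lucene is java", "solr is fast"]


def map_phase(doc):
    # Emit (key, value) pairs: one (word, 1) pair per word occurrence.
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between the two phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(counts)
```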

With Lucene, you need to write Java code to define fields and how to analyze those fields. Solr adds a simple, declarative way to define the structure of your index and how you want fields to be represented and analyzed: an XML-configuration document named schema.xml. Solr also provides copy and dynamic fields.
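"How you want fields to be analyzed" means that in schema.xml each field type declares an analyzer: one tokenizer followed by a chain of token filters, for example solr.StandardTokenizerFactory, then solr.LowerCaseFilterFactory, then solr.StopFilterFactory. The Python sketch below only mimics that pipeline to show what analysis does to a piece of text; it is not Solr code, and the stop-word list is made up.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "in"}  # hypothetical stop-word list


def tokenizer(text):
    # Split on runs of non-word characters (roughly like a standard tokenizer).
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text, chain=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then each filter in order, like a field type's analyzer."""
    tokens = tokenizer(text)
    for token_filter in chain:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The Management of Hypertension in Adults"))
# ['management', 'hypertension', 'adults']
```

Filter order matters: lowercasing runs before the stop filter here, so "The" is removed; with the stop filter alone, the capitalized "The" would slip through.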

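OK, since Solr is built on Lucene, how do the two differ? Lucene on its own is not user-friendly: used directly, it requires tedious Java code to define fields, whereas Solr lets you configure fields in a simple XML file. Solr also provides copy and dynamic fields.
A copy field gives you a combined field: the values of several source fields are copied into one destination field, so a single field name can stand for multiple fields.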
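In schema.xml these are declared with elements such as `<copyField source="title" dest="text"/>` and `<dynamicField name="*_s" type="string"/>`. The Python sketch below only simulates the two behaviors to make them concrete; the rule lists and field names are hypothetical. It simplifies Solr's rules: Solr copies the original (pre-analysis) input values, only allows a `*` at the start or end of a dynamic-field pattern, and prefers longer patterns, whereas this sketch just takes the first matching pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical rules mirroring <copyField source="..." dest="..."/> entries.
COPY_FIELDS = [("title", "text"), ("abstract", "text"), ("keywords", "text")]

# Hypothetical patterns mirroring <dynamicField name="*_s" .../> entries.
DYNAMIC_FIELDS = {"*_s": "string", "*_i": "int"}


def apply_copy_fields(doc):
    """Copy source field values into their (multi-valued) destination fields."""
    out = dict(doc)
    for source, dest in COPY_FIELDS:
        if source not in doc:
            continue
        value = doc[source]
        values = value if isinstance(value, list) else [value]
        # Assumes dest is absent or already a list; builds a new list each time.
        out[dest] = out.get(dest, []) + values
    return out


def dynamic_type(field_name):
    """Return the type of the first matching dynamic-field pattern, or None."""
    for pattern, field_type in DYNAMIC_FIELDS.items():
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


doc = {"title": "Aspirin review", "abstract": "A meta-analysis", "source_s": "pubmed"}
print(apply_copy_fields(doc)["text"])  # ['Aspirin review', 'A meta-analysis']
print(dynamic_type("source_s"), dynamic_type("year_i"), dynamic_type("title"))
# string int None
```

A catch-all destination like `text` lets a single query search titles, abstracts and keywords at once, while dynamic fields let you add fields such as `source_s` without editing the schema for each one.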
