Flexible schema for new fields:
We frequently want to expand our Solr index with new fields, and normally this requires adding each new field to Solr's schema.xml. Solr provides a handy feature called dynamic fields, which lets us add new fields to the index without changing the schema definition, as long as the field name follows a certain pattern.
We can define a dynamic field for each common data type, with the type information built into the field name. A dynamic field's "type" attribute refers to a fieldType definition in schema.xml, which defines its analyze/tokenize/filter behavior.
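To make the matching rule concrete, here is a small Java model of how a field name is resolved against dynamicField patterns. It is a hypothetical class, not Solr's implementation; it assumes the documented rules that a pattern must start or end with `*`, that longer patterns are tried first, and that equal-length patterns are tried in declaration order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified model (not Solr code) of dynamicField pattern resolution. */
final class DynamicFieldMatcher {
    record Rule(String pattern, String type) {}

    private final List<Rule> rules = new ArrayList<>();

    void add(String pattern, String type) {
        if (!pattern.startsWith("*") && !pattern.endsWith("*")) {
            throw new IllegalArgumentException("pattern must start or end with *: " + pattern);
        }
        rules.add(new Rule(pattern, type));
        // Longest pattern first; ArrayList.sort is stable, so equal lengths keep declaration order.
        rules.sort(Comparator.comparingInt((Rule r) -> r.pattern().length()).reversed());
    }

    /** Returns the field type for a field name, or null if no pattern matches. */
    String typeFor(String fieldName) {
        for (Rule r : rules) {
            String p = r.pattern();
            boolean matches = p.startsWith("*")
                    ? fieldName.endsWith(p.substring(1))                      // "*_int_mv"
                    : fieldName.startsWith(p.substring(0, p.length() - 1));   // "attr_*"
            if (matches) {
                return r.type();
            }
        }
        return null;
    }
}
```

With both `*_mv` and `*_int_mv` declared, `userrate_int_mv` resolves to the more specific `*_int_mv` rule.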
Eg: we want to add a user rating field to the product document. A user rating is an integer value in the range 1-10, and each product can have multiple user ratings. In schema.xml, we have:
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<dynamicField name="*_int_mv" type="int" indexed="true" stored="true" multiValued="true"/>
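To submit the document to Solr, add each rating as a separate value of the same field:
doc.addField("userrate_int_mv", rating1);  // doc is a SolrInputDocument
doc.addField("userrate_int_mv", rating2);  // one call per value of the multi-valued field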
In Java, we can easily use reflection or annotations to generate the field name from the field type plus additional metadata.
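A minimal sketch of the annotation approach, assuming a hypothetical `@SolrDynamic` annotation (not part of SolrJ) that carries the type suffix and a multi-valued flag. `Product`, `DynamicFieldMapper` and the `_mv` naming convention follow the example above but are otherwise illustrative.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical annotation: index this field under a dynamic-field name. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SolrDynamic {
    String suffix();                       // type part of the name, e.g. "int", "text"
    boolean multiValued() default false;   // appends "_mv" when true
}

/** Hypothetical domain object. */
class Product {
    @SolrDynamic(suffix = "int", multiValued = true)
    List<Integer> userrate = List.of(7, 9);

    @SolrDynamic(suffix = "text")
    String name = "Trail Running Shoe";

    String internalNote = "not indexed";   // no annotation, so it is skipped
}

final class DynamicFieldMapper {
    private DynamicFieldMapper() {}

    /** Builds dynamic field name -> values, e.g. "userrate_int_mv" -> [7, 9]. */
    static Map<String, List<Object>> toFields(Object bean) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Field f : bean.getClass().getDeclaredFields()) {
            SolrDynamic meta = f.getAnnotation(SolrDynamic.class);
            if (meta == null) {
                continue;
            }
            f.setAccessible(true);
            String name = f.getName() + "_" + meta.suffix() + (meta.multiValued() ? "_mv" : "");
            Object value;
            try {
                value = f.get(bean);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            List<Object> values = new ArrayList<>();
            if (value instanceof Collection<?> c) {
                values.addAll(c);              // one entry per value of a multi-valued field
            } else if (value != null) {
                values.add(value);
            }
            out.put(name, values);
        }
        return out;
    }
}
```

Each resulting entry can then be passed to `SolrInputDocument.addField(name, value)` once per value, which is the same pattern used above for `userrate_int_mv`.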
Sorting:
Use separate fields for searching and sorting: a field for searching needs a full analyzer that splits the text into multiple tokens, while a field for sorting must be preserved as a single token.
Eg:
<fieldType name="sortabletext" class="solr.TextField" sortMissingLast="true" omitNorms="true">
<analyzer>
<!-- KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token -->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<!-- The LowerCase TokenFilter does what you expect, which can be useful when you want your sorting to be case insensitive -->
<filter class="solr.LowerCaseFilterFactory" />
<!-- The TrimFilter removes any leading or trailing whitespace -->
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldType>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
</analyzer>
</fieldType>
<dynamicField name="*_text" type="text" indexed="true" stored="true" />
<dynamicField name="*_sortabletext" type="sortabletext" indexed="true" stored="true"/>
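The difference between the two field types can be sketched in plain Java. This is a rough emulation, not Lucene: the real StandardTokenizer and WordDelimiterFilter do much more than a regex split. It assumes a hypothetical `title` value that is indexed twice, as `title_text` for search and `title_sortabletext` for sorting (e.g. `sort=title_sortabletext asc`).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Rough emulation (not Lucene code) of the two analyzer chains above. */
final class SortKeyDemo {
    private SortKeyDemo() {}

    /** sortabletext: KeywordTokenizer keeps the whole input as one token, then lowercase, then trim. */
    static String sortableToken(String input) {
        return input.toLowerCase(Locale.ROOT).trim();
    }

    /** Crude stand-in for the "text" chain: lowercase and split on non-alphanumerics. */
    static List<String> textTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Orders titles by their single sortable token, as a sort on *_sortabletext would. */
    static List<String> sortByTitle(List<String> titles) {
        List<String> copy = new ArrayList<>(titles);
        copy.sort(Comparator.comparing(SortKeyDemo::sortableToken));
        return copy;
    }
}
```

With a raw string comparison, `Cherry` would sort before `banana` because uppercase letters come first; the lowercase-and-trim key gives the case-insensitive order the sortabletext type is meant to provide. A field like `text`, which yields several tokens per document, cannot produce a meaningful sort order.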
Multi-language search:
Use a separate field and field type for each language, so that each language can have its own tokenizer and filter configuration.
Eg:
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="de.hybris.search.analyze.IKTokenizerFactory" useSmart="true"/>
<filter class="de.hybris.search.suggest.PinYinFilterFactory"/>
<filter class="de.hybris.platform.solrfacetsearch.ysolr.synonyms.HybrisSynonymFilterFactory" ignoreCase="true" synonyms="zh" coreName="${solr.core.name}"/>
</analyzer>
</fieldType>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StandardFilterFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="de.hybris.platform.solrfacetsearch.ysolr.synonyms.HybrisSynonymFilterFactory" ignoreCase="true" synonyms="en" coreName="${solr.core.name}"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" />
<filter class="de.hybris.platform.solrfacetsearch.ysolr.stopwords.HybrisStopWordsFilterFactory" ignoreCase="true" coreName="${solr.core.name}"/>
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English"/>
</analyzer>
</fieldType>
<dynamicField name="*_text_en" type="text_en" indexed="true" stored="true" />
<dynamicField name="*_text_zh" type="text_zh" indexed="true" stored="true" />
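One way to use these per-language fields from Java is to derive the field name from the user's locale. The helper below is hypothetical: it assumes a base field name (for example `name`) is indexed once per language as `name_text_en` and `name_text_zh`, and it falls back to English for languages without a dedicated field type.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: maps a base field name to its language-specific dynamic field. */
final class LanguageFields {
    private LanguageFields() {}

    // Languages that have a text_<lang> field type in the schema above.
    private static final Set<String> SUPPORTED = Set.of("en", "zh");
    private static final String FALLBACK = "en";

    static String fieldFor(String baseName, Locale locale) {
        String lang = locale.getLanguage();   // e.g. "zh" for Locale.CHINA
        if (!SUPPORTED.contains(lang)) {
            lang = FALLBACK;                  // no dedicated analyzer: use English
        }
        return baseName + "_text_" + lang;    // matches *_text_en / *_text_zh
    }
}
```

The same name can be used when adding the field at index time and when choosing the query field (for example through `df` or `qf`) at search time.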
Multi-language suggestion:
Define multiple spell checkers with different names inside the SpellCheckComponent. Each spell checker serves one language and is built on a different field with its own analyzer configuration.
Eg:
<requestHandler name="/suggest" class="solr.SearchHandler">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">default</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_spell</str>
<lst name="spellchecker">
<str name="name">default</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">autosuggest_en</str>
<str name="buildOnCommit">true</str>
<str name="buildOnOptimize">true</str>
<str name="accuracy">0.35</str>
</lst>
<lst name="spellchecker">
<str name="name">en</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">autosuggest_en</str>
<str name="buildOnCommit">true</str>
<str name="buildOnOptimize">true</str>
<str name="accuracy">0.35</str>
</lst>
<lst name="spellchecker">
<str name="name">zh</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">com.hybris.search.suggest.PinYinTSTLookupFactory</str>
<str name="storeDir">spellcheckdata</str>
<str name="field">autosuggest_zh</str>
<str name="buildOnCommit">true</str>
<str name="buildOnOptimize">true</str>
<str name="accuracy">0.35</str>
</lst>
</searchComponent>
At search time, build the SolrJ query (query is a SolrQuery) as:
query.setRequestHandler("/suggest");        // matches the name of the request handler
query.set("spellcheck.dictionary", "zh");   // matches the name of the spellchecker in the Solr configuration
query.set("spellcheck.q", keyword);         // the autosuggest keyword typed by the user
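For reference, those SolrJ calls correspond to a plain HTTP request against the handler. The sketch assumes a hypothetical core called `products` at `http://localhost:8983/solr`; the remaining spellcheck parameters (count, collate, and so on) come from the handler's `defaults` list.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds the suggest request as a plain URL. */
final class SuggestRequest {
    private SuggestRequest() {}

    static String url(String solrBase, String core, String dictionary, String keyword) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("spellcheck.dictionary", dictionary); // spellchecker name: "en" or "zh"
        params.put("spellcheck.q", keyword);             // text to complete
        StringJoiner qs = new StringJoiner("&");
        params.forEach((k, v) -> qs.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        // "/suggest" is the requestHandler name, so it becomes part of the path.
        return solrBase + "/" + core + "/suggest?" + qs;
    }
}
```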
Multi-select facet search:
When a field is used both in a filter query and as a facet, every facet value except the filtered one gets a count of 0, because facet counts are computed over the already filtered result set.
Eg: q=mainquery&fq=status:public&fq=doctype:pdf&facet=on&facet.field=doctype
Use tagging and exclusion to solve this: tag the filter, then exclude that tag when computing the facet, so the values not selected in the filter still return counts.
Eg: q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
To facet on the same field more than once with different exclusions, give each facet its own output key:
Eg: facet.field={!ex=dt key=mylabel}doctype
This returns the doctype facet under the name mylabel with the dt filter excluded, which is also useful for display purposes.
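A small builder that produces these parameters from Java. The class and method names are hypothetical; the tag (`dt`), field (`doctype`) and optional `key` are passed in, and every value is URL-encoded with `URLEncoder`, so the braces and colons above appear percent-encoded in the real request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for multi-select facet parameters (tag + exclusion). */
final class MultiSelectFacet {
    private final List<String> params = new ArrayList<>();

    MultiSelectFacet(String q) {
        params.add("q=" + enc(q));
        params.add("facet=on");
    }

    /** Untagged filter, applied to the facet counts too (e.g. status:public). */
    MultiSelectFacet filter(String fq) {
        params.add("fq=" + enc(fq));
        return this;
    }

    /** Tagged filter, e.g. {!tag=dt}doctype:pdf. */
    MultiSelectFacet taggedFilter(String tag, String field, String value) {
        params.add("fq=" + enc("{!tag=" + tag + "}" + field + ":" + value));
        return this;
    }

    /** Facet that ignores filters carrying excludeTag; key renames it in the response (may be null). */
    MultiSelectFacet facet(String field, String excludeTag, String key) {
        String local = "{!ex=" + excludeTag + (key == null ? "" : " key=" + key) + "}";
        params.add("facet.field=" + enc(local + field));
        return this;
    }

    String toQueryString() {
        return String.join("&", params);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```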
Multi-core:
It is usually worth using Solr's multi-core feature, with one core per data entity (e.g. product, location, or each of several catalogs), provided the entities are relatively independent. Cores are configured in solr.xml; each one has an instanceDir for its configuration and a dataDir for storing its index. With multiple cores, each data entity can have its own indexing and searching configuration, which gives greater flexibility.
Multiple cores also help with indexing. You can create a backup core to receive the index update while the original core keeps serving normal requests, unaffected. Once the update completes, you perform a core swap, and the updated core then serves user requests. This has two benefits: 1. search response times are not affected while indexing runs; 2. you can easily roll back to the original index if the update corrupts it.
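The swap step maps to the CoreAdmin `SWAP` action, which takes `core` and `other` parameters. This sketch only builds the request URL; the core names `products` and `products_rebuild` and the base URL are hypothetical, and it assumes standalone (non-SolrCloud) Solr, where CoreAdmin manages cores directly.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the rebuild-then-swap workflow. */
final class CoreSwap {
    private CoreSwap() {}

    /** URL that exchanges the names of the live core and the freshly rebuilt core. */
    static String swapUrl(String solrBase, String liveCore, String rebuildCore) {
        return solrBase + "/admin/cores?action=SWAP"
                + "&core=" + URLEncoder.encode(liveCore, StandardCharsets.UTF_8)
                + "&other=" + URLEncoder.encode(rebuildCore, StandardCharsets.UTF_8);
    }
}
```

Because SWAP exchanges the two names, rolling back is just sending the same request again: the old index, now registered as `products_rebuild`, becomes `products` once more.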