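The previous post (http://blog.csdn.net/peppengliu/article/details/51463918) set up a Solr test environment; this post looks at documents and fields in Solr.
The following introduces documents, fields, and related concepts:
document: the document is the basic unit of indexing, a collection of descriptions of one item of data to be indexed. A document is made up of fields. A simple document looks like this: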
{"id":"138761234112","goods_name":"product","value":12}
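Besides JSON, Solr's update handler also accepts documents as XML. Here is a sketch of the same document in Solr's XML update message format. It makes the "a document is a set of named fields" structure explicit. The field names and values are simply those of the JSON example above.

```xml
<!-- One <doc> per document; each key of the JSON object
     becomes a named <field> element. -->
<add>
  <doc>
    <field name="id">138761234112</field>
    <field name="goods_name">product</field>
    <field name="value">12</field>
  </doc>
</add>
```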
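field: a component of a document that describes the data to be indexed in more detail. Each field definition declares the field's data type, together with the following attributes: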
default
The default value for this field if none is provided while adding documents.
indexed=true|false
True if this field should be "indexed". If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
stored=true|false
True if the value of the field should be retrievable during a search, or if you're using highlighting or MoreLikeThis.
compressed=true|false
True if this field should be stored using gzip compression. (This will only apply if the field type is compressible; among the standard field types, only TextField and StrField are.)
compressThreshold=
multiValued=true|false
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document.
omitNorms=true|false
This is arguably an advanced option.
Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
termVectors=false|true (Solr 1.1 and later)
If set, include full term vector info.
If enabled, often also used with termPositions="true" and termOffsets="true".
To use interactively, requires TermVectorComponent
Corresponds to the TV button in Luke and the V field attribute.
omitTermFreqAndPositions=true|false (Solr 1.4 and later)
If set, omit term freq, positions and payloads from postings for this field. This can be a performance boost for fields that don't require that information and reduces storage space required for the index. Queries that rely on position that are issued on a field with this option fail with an exception. Prior to Solr 4.0 the queries would silently fail to find documents.
omitPositions=true|false (Solr 3.4 and later)
If set, omits positions but keeps term frequencies.
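To make the attributes concrete, here is a minimal sketch of how the three keys of the sample document might be declared in schema.xml. It assumes the field types string, text_general and int are defined as in Solr's bundled example schema. The particular attribute choices are illustrative and are not taken from the original post.

```xml
<!-- Hypothetical declarations for the sample document's keys.
     Types string, text_general and int are assumed to exist as in
     Solr's example schema.xml. -->

<!-- Unique identifier: indexed for lookups, stored so it is returned. -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false"/>

<!-- Full-text field: analyzed by text_general; term vectors with
     positions and offsets are kept, e.g. for the TermVectorComponent. -->
<field name="goods_name" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- Numeric field: norms are not needed (primitive types usually omit
     them by default anyway); 0 is used when a document omits "value". -->
<field name="value" type="int" indexed="true" stored="true"
       omitNorms="true" default="0"/>
```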
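Fields fall into three categories: defined fields (declared with field), copy fields (declared with copyField), and dynamic fields (declared with dynamicField).
field analysis: the analyzer for a field's values, which defines how the value of a field is analyzed. It must be defined when a field needs extra processing such as tokenization or filtering. A typical configuration looks like this: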
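Here is a sketch of such a configuration, modeled on the text_general type shipped in Solr's example schema.xml. The exact configuration the author used is not known. It assumes stopwords.txt and synonyms.txt exist in the core's conf directory.

```xml
<!-- A field type whose values are tokenized and filtered.
     Separate analyzer chains run at index time and at query time. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split text into tokens on word boundaries. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Drop the words listed in stopwords.txt. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Normalize tokens to lower case. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Expand query terms using synonyms.txt (query time only). -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The two chains do not have to match. In this sketch synonyms are expanded only at query time, so the index does not need to be rebuilt when synonyms.txt changes.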
This configuration defines a data type named text_general. Whenever a field's type is text_general, its values are automatically processed by the analyzers defined in this element.
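To tie this back to the three field categories listed earlier, here is a sketch of how a defined field, a dynamic field and copy fields might use text_general together. The names *_txt and text are hypothetical, chosen only for illustration.

```xml
<!-- Defined field: its declared type selects the analysis chain. -->
<field name="goods_name" type="text_general" indexed="true" stored="true"/>

<!-- Dynamic field: any incoming field whose name ends in "_txt" is
     treated as text_general without its own <field> declaration. -->
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- Hypothetical catch-all field: searchable but not stored. -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- copyField rules: at index time, source values are copied into
     "text", so one query on "text" covers all of them. -->
<copyField source="goods_name" dest="text"/>
<copyField source="*_txt" dest="text"/>
```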
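All of the above is configured in schema.xml. For the other configuration options, see http://wiki.apache.org/solr/SchemaXml
This post draws mainly on http://wiki.apache.org/solr/SchemaXml and https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide. If anything is missing or wrong, corrections in the comments are welcome.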