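1. UIMA Integration
You can integrate Solr with the Apache Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of analysis engines that incrementally add metadata to documents as annotations.
For more information about Solr UIMA integration, see https://wiki.apache.org/solr/SolrUIMA.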
1.1 Configuring UIMA
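The Solr UIMA UpdateRequestProcessor is a custom update request processor. It takes the documents being indexed, sends them to a UIMA pipeline, and returns them enriched with the extracted metadata. To configure UIMA, follow these steps:
1. Make the UIMA jars available to Solr: copy /solr-4.x.y/dist/solr-uima-4.x.y.jar and the libraries under contrib/uima/lib to Solr's library directory, or reference them with <lib/> directives in solrconfig.xml: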
<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />
2. In schema.xml, add the fields that will hold the metadata:
<field name="language" type="string" indexed="true" stored="true" required="false" />
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false" />
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />
3. Add the following snippet to solrconfig.xml:
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
        <str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
      </lst>
      <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
      <bool name="ignoreErrors">true</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
          <lst name="mapping">
            <str name="feature">text</str>
            <str name="field">concept</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
          <lst name="mapping">
            <str name="feature">language</str>
            <str name="field">language</str>
          </lst>
        </lst>
        <lst name="type">
          <str name="name">org.apache.uima.SentenceAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">sentence</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
4. In solrconfig.xml, replace the existing update request handler or create a new one that uses the uima chain:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
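Once the steps above are in place, documents posted to the /update handler should pass through the uima chain before they are indexed. The following is a minimal sketch of building and sending such a document from Python, not part of the Solr distribution. It assumes the schema's unique key field is named id, that Solr is reachable at the hypothetical URL shown, and that the text field is the one listed under analyzeFields; the function names build_add_message and post_update are illustrative only.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint: adjust host, port, and core to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_add_message(doc_id, text):
    """Build a Solr XML <add> message containing a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" is an assumed unique key; "text" is the field the UIMA chain analyzes.
    for name, value in (("id", doc_id), ("text", text)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="utf-8")


def post_update(body, url=SOLR_UPDATE_URL):
    """POST the XML body to the update handler and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


message = build_add_message("doc-1", "Apache Solr can be integrated with UIMA.")
print(message.decode("utf-8"))
# To index against a running Solr instance, call: post_update(message)
```

The network call is kept in a separate function so the message can be inspected without a running server. If the chain is working, the indexed document would be expected to carry values in the language, concept, and sentence fields defined in schema.xml, provided the configured API keys are valid.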