Extracting file metadata with Apache Tika in SolrNet

1. Add the jar files (for example to contrib/extraction/lib, which the configuration in step 2 loads):

tika-core-0.10.jar

tika-parsers-0.10.jar

.....

2. Edit solrconfig.xml, then restart the Solr instance:

  <lib dir="path/to/solr/dist/" regex="apache-solr-cell-\d.*\.jar" />
  <lib dir="path/to/solr/contrib/extraction/lib" regex=".*\.jar" />

  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.Last-Modified">last_modified</str>
      <str name="uprefix">metadata_</str>
    </lst>
  </requestHandler>
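
The two defaults above decide which Solr field each piece of Tika metadata lands in: an explicit mapping (Last-Modified to last_modified) is applied first, and any resulting name the schema does not know gets the uprefix value (metadata_) prepended. For Solr to accept those prefixed fields, schema.xml presumably needs a matching dynamic field such as metadata_*. The sketch below only simulates that resolution order to make it concrete; it is not Solr's source code, and the class name ExtractFieldMapper and the set of schema fields are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical simulation of how the /update/extract defaults resolve a
// Tika metadata name into a Solr field name. NOT Solr's implementation;
// it only mirrors the order: explicit mapping first, then uprefix for
// names the schema does not define.
public static class ExtractFieldMapper
{
    public static string Resolve(
        string tikaName,
        IDictionary<string, string> fieldMap, // e.g. "Last-Modified" -> "last_modified"
        ISet<string> schemaFields,            // fields assumed to exist in schema.xml
        string unknownPrefix)                 // the uprefix value, e.g. "metadata_"
    {
        // 1. Apply an explicit field mapping if one is configured.
        string name = fieldMap.TryGetValue(tikaName, out var mapped) ? mapped : tikaName;

        // 2. Names unknown to the schema get the prefix, so a dynamic
        //    field like metadata_* can catch them.
        return schemaFields.Contains(name) ? name : unknownPrefix + name;
    }
}
```

In this sketch, with the configuration above, Last-Modified resolves to last_modified, while a metadata entry such as Content-Type ends up as metadata_Content-Type.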

3. Calling it from C#:

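using System;
using System.IO;
using Microsoft.Practices.ServiceLocation;
using SolrNet;

// Assumes Startup.Init<IndexDocument>("http://host:port/solr") has already been called.
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<IndexDocument>>();

private void AddFile(ISolrOperations<IndexDocument> solr, string id, byte[] content, string resourceName)
{
    using (MemoryStream stream = new MemoryStream(content))
    {
        // Sends the file to /update/extract. id becomes the document's unique key;
        // resourceName (the file name) helps Tika detect the file type.
        var response = solr.Extract(new ExtractParameters(stream, id, resourceName)
        {
            ExtractFormat = ExtractFormat.Text, // only takes effect when ExtractOnly = true
            ExtractOnly = false,                // false: index the document, not just return its text
            Fields = new[]
            {
                // Extra literal field values stored alongside the extracted content
                new ExtractField("name1", "value1"),
                new ExtractField("name2", "value2")
            }
        });
        // Content is only filled in when ExtractOnly = true; when indexing it is usually empty.
        Console.WriteLine(response.Content);
    }
    // Commit so the new document becomes visible to searches.
    solr.Commit();
}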
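
Under the hood, Extract posts the file bytes to the /update/extract handler configured in step 2, with the ExtractParameters turned into request parameters. The sketch below builds that request URL by hand to show roughly what goes over the wire. It assumes the standard Solr Cell parameter names (literal.<field>, resource.name, extractOnly); the helper name ExtractUrlBuilder is hypothetical, and whether SolrNet sends exactly this set of parameters is an assumption, not taken from SolrNet's source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds an /update/extract URL that roughly mirrors the
// ExtractParameters used in AddFile. The file bytes themselves would be sent
// as the POST body; this only covers the query string.
public static class ExtractUrlBuilder
{
    public static string Build(
        string solrBaseUrl,
        string id,
        string resourceName,
        bool extractOnly,
        IEnumerable<KeyValuePair<string, string>> literals)
    {
        var query = new List<string>
        {
            "literal.id=" + Uri.EscapeDataString(id),               // unique key of the document
            "resource.name=" + Uri.EscapeDataString(resourceName),  // file-name hint for Tika
            "extractOnly=" + (extractOnly ? "true" : "false"),
        };
        // Each extra field is assumed to become a literal.<name>=<value> parameter.
        query.AddRange(literals.Select(kv =>
            "literal." + Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
        return solrBaseUrl.TrimEnd('/') + "/update/extract?" + string.Join("&", query);
    }
}
```

For example, Build("http://localhost:8983/solr/", "doc1", "my file.pdf", false, ...) with a single name1/value1 literal yields http://localhost:8983/solr/update/extract?literal.id=doc1&resource.name=my%20file.pdf&extractOnly=false&literal.name1=value1.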
