Integrating Apache Solr into Tomcat


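    The existing system is already built: the framework is based on SSH (Struts, Spring, Hibernate), pages are encoded in GBK, the database is Oracle, and the container is Tomcat 6. Full-text search now needs to be integrated; what follows is only a simple integration test.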

1. Embedding in Tomcat

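Extract apache-solr-1.3.0.tgz and copy the apache-solr-1.3.0\example\example-DIH\solr directory into the Tomcat installation directory. Then edit solr.xml in that solr directory and comment out the rss core, so that it reads as follows: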

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<solr sharedLib="lib" persistent="true">
 <cores adminPath="/admin/cores">
  <core default="true" instanceDir="db" name="db"></core>
<!--
  <core default="false" instanceDir="rss" name="rss"></core>
 -->
 </cores>
</solr>

  • Delete the Tomcat\solr\rss directory.
  • Add the required jars to Tomcat\solr\db\lib: ojdbc14.jar, slf4j-jdk14-1.5.5.jar, slf4j-api-1.5.5.jar, solr-dataimporthandler-1.4-SNAPSHOT.jar.
  • Copy apache-solr-1.3.0\example\webapps\solr.war to Tomcat\webapps.
  • Create Tomcat\conf\Catalina\localhost\solr.xml with the following content:

<Context docBase="${catalina.home}/webapps/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="${catalina.home}/solr" override="true" />
</Context>

  • Edit Tomcat\conf\server.xml and add a Connector on port 8983, as follows:

<Connector port="8983" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" URIEncoding="UTF-8"/>
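
The connector above decodes request URIs as UTF-8, while the application's own pages are GBK. A client that calls Solr on this port therefore presumably has to percent-encode non-ASCII query terms as UTF-8. A minimal sketch, using the term 工作 that appears in the test query later in this post; the class and method names are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: percent-encodes a query term as UTF-8 so that it
// matches URIEncoding="UTF-8" on the Tomcat connector.
class QueryEncoding {

    static String encodeTerm(String term) {
        try {
            return URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is one of the charsets every JVM must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u5de5\u4f5c is the term 工作, written with escapes so the result
        // does not depend on the encoding of the source file.
        // Prints %E5%B7%A5%E4%BD%9C (three UTF-8 bytes per character).
        System.out.println(encodeTerm("\u5de5\u4f5c"));
    }
}
```

Encoding the same term as GBK would produce different bytes, which the connector would then decode into the wrong characters.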





2. Configuring DataImportHandler

change @ 20097110:19:57

    The main changes are in three configuration files: Tomcat\solr\db\conf\db-data-config.xml, Tomcat\solr\db\conf\schema.xml, and Tomcat\solr\db\conf\solrconfig.xml:

  • db-data-config.xml

<dataConfig>
    <dataSource driver="oracle.jdbc.driver.OracleDriver"
                url="jdbc:oracle:thin:@localhost:1521:orcl"
                user="solr" password="solr" batchSize="50"/>
    <document name="contents">
        <entity name="content" pk="ID"
                query="select * from CONTENT"
                deltaQuery="select ID from CONTENT where to_char(PUBTIME,'yyyy-mm-dd hh24:mi:ss') > '${dataimporter.last_index_time}'"
                transformer="ClobTransformer">
            <field name="title" column="TITLE" />
            <field column="CONTENT" clob="true"/>
            <field name="pubtime" column="PUBTIME" />
        </entity>
    </document>
</dataConfig>
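
The deltaQuery above compares timestamps as strings. That only works because the pattern is fixed-width and puts the most significant field first, so lexicographic order matches chronological order. A minimal sketch, assuming last_index_time is written as yyyy-MM-dd HH:mm:ss (the Java counterpart of the Oracle mask yyyy-mm-dd hh24:mi:ss); the class name and sample timestamps are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical illustration of the string comparison done by the deltaQuery.
class DeltaTimestamp {

    // Formats a date in the fixed-width pattern assumed for last_index_time.
    // UTC is set here only to make the output reproducible.
    static String format(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    // Mirrors: to_char(PUBTIME, ...) > '${dataimporter.last_index_time}'
    static boolean changedSince(String pubtime, String lastIndexTime) {
        return pubtime.compareTo(lastIndexTime) > 0;
    }

    public static void main(String[] args) {
        String lastIndexTime = format(new Date(0L));        // 1970-01-01 00:00:00
        String pubtime = format(new Date(86400000L));       // one day later
        System.out.println(pubtime + " > " + lastIndexTime + " : "
                + changedSince(pubtime, lastIndexTime));
    }
}
```

A pattern without zero padding, or one that starts with the day, would break this ordering, so both sides of the comparison need to use the same mask.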

  • schema.xml

    At the end of the types element, append a fieldtype named text_cjk that uses the Chinese/Japanese/Korean (CJK) analyzer:

    ......

    <fieldtype name="text_cjk" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    </fieldtype>

 </types>
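
CJKAnalyzer indexes runs of CJK characters as overlapping two-character tokens (bigrams) rather than as dictionary words. The sketch below imitates that idea in plain Java to show roughly what ends up in the index. It is a simplified illustration, not Lucene's implementation, and the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified picture of bigram tokenization; not Lucene code.
class BigramSketch {

    // Splits a run of CJK characters into overlapping two-character tokens.
    // Assumes every character is in the Basic Multilingual Plane.
    // In this sketch a lone character is kept as a single token.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 2 <= text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u5168\u6587\u68c0\u7d22 is 全文检索 ("full-text search").
        // Prints [全文, 文检, 检索]
        System.out.println(bigrams("\u5168\u6587\u68c0\u7d22"));
    }
}
```

Because the query text goes through the same analyzer, a search for 检索 produces the token 检索 and matches the third bigram.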



    Comment out or remove everything inside <fields></fields> and add the following:

 <fields>
   <field name="id" type="slong" indexed="true" stored="true" required="true" />
   <field name="title" type="text_cjk" indexed="true" stored="false"/>
   <field name="content" type="text_cjk" indexed="true" stored="true"/>
   <field name="pubtime" type="date" indexed="true" stored="true"/>
   <field name="searchtext" type="text_cjk" indexed="true" stored="false" multiValued="true"/>
 </fields>



    Set the unique key to the id field defined above:

 <uniqueKey>id</uniqueKey>



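    Set the default search field to searchtext, defined above, and copy the two fields to be searched, title and content, into searchtext so that they can be queried together: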

......



 <defaultSearchField>searchtext</defaultSearchField>

......



   <copyField source="title" dest="searchtext"/>

   <copyField source="content" dest="searchtext"/>

......
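
The effect of the two copyField rules can be pictured as follows: at index time the raw values of title and content are also added to the multi-valued searchtext field, so a single default field covers both. A minimal sketch in plain Java, not Solr code; the class name and sample values are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java picture of copyField; this is not how Solr is implemented.
class CopyFieldSketch {

    // Appends every value of the source field to the destination field.
    static void copyField(Map<String, List<String>> doc, String source, String dest) {
        List<String> values = doc.get(source);
        if (values == null) {
            return; // the document has no such field, nothing to copy
        }
        List<String> target = doc.get(dest);
        if (target == null) {
            target = new ArrayList<String>();
            doc.put(dest, target);
        }
        target.addAll(values);
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<String, List<String>>();
        doc.put("title", Arrays.asList("sample title"));
        doc.put("content", Arrays.asList("sample body text"));

        copyField(doc, "title", "searchtext");
        copyField(doc, "content", "searchtext");

        // Prints [sample title, sample body text]
        System.out.println(doc.get("searchtext"));
    }
}
```

This is why searchtext is declared multiValued="true" in the schema: it receives one value from each source field.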

 

add @ 20097110:19:57

  • solrconfig.xml

    Change the index data path in the <dataDir></dataDir> element, as follows:

<dataDir>${catalina.home}/solr/db/data</dataDir>

 

 



3. Importing and querying:

  • Full import:
    http://localhost:8983/solr/db/dataimport?command=full-import
  • Delta (incremental) import:
    http://localhost:8983/solr/db/dataimport?command=delta-import
  • Query:
    http://localhost:8983/solr
    Click db to open a search page and enter the following:

pubtime:[2007-11-16T00:00:00Z TO 2008-11-28T00:00:00Z]

AND

工作;

pubtime desc

    Click Search to test.
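
The pubtime range in the query above uses Solr's date syntax, which is ISO 8601 in UTC with a trailing Z. When the bounds come from java.util.Date values, they can be formatted as in this sketch; the class and method names are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper that builds a Solr date range clause.
class PubtimeRange {

    // Formats a date the way Solr date fields expect it: UTC with a literal Z.
    static String solrDate(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    // Builds an inclusive range clause such as field:[from TO to].
    static String rangeClause(String field, Date from, Date to) {
        return field + ":[" + solrDate(from) + " TO " + solrDate(to) + "]";
    }

    public static void main(String[] args) {
        // Prints pubtime:[1970-01-01T00:00:00Z TO 1970-01-02T00:00:00Z]
        System.out.println(rangeClause("pubtime", new Date(0L), new Date(86400000L)));
    }
}
```

Setting the time zone explicitly matters: with the JVM default zone the literal Z would be appended to a local time and the range would be shifted.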

4. XML parsing:

A simple helper class for reading Solr query results:



import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
import org.apache.commons.lang.time.DateUtils;

public class SolrUtils {

    private List<Node> docs = new ArrayList<Node>();
    private Number numFound = 0;
    private Document doc;

    public List<Node> getDocs() {
        return docs;
    }

    public Number getNumFound() {
        return numFound;
    }

    // Fetches the XML response at urlString and extracts the result docs and the hit count.
    @SuppressWarnings("unchecked")
    public SolrUtils(String urlString) {
        doc = documentFromURL(urlString);
        if (doc != null) {
            docs = (List<Node>) doc.selectNodes("/response/result/doc");
            numFound = doc.numberValueOf("/response/result/@numFound");
        }
    }

    // Returns the parsed response, or null if the request or the parsing fails.
    public Document documentFromURL(String urlString) {
        try {
            SAXReader reader = new SAXReader();
            URL url = new URL(urlString);
            doc = reader.read(url);
            return doc;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    // Reads a <str name="..."> child of a result doc.
    public static String valueOf(Object obj, String name) {
        return valueOf(obj, "str", name);
    }

    // Reads a <date name="..."> child, accepting zero to three fractional digits.
    public static Date dateValueOf(Object obj, String name) {
        String[] parsePatterns = new String[] {
            "yyyy-MM-dd'T'HH:mm:ss'Z'",
            "yyyy-MM-dd'T'HH:mm:ss.S'Z'",
            "yyyy-MM-dd'T'HH:mm:ss.SS'Z'",
            "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        };
        try {
            return DateUtils.parseDate(valueOf(obj, "date", name), parsePatterns);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    // Reads the child element <type name="name"> of a result doc as text.
    public static String valueOf(Object obj, String type, String name) {
        String path = "./" + type + "[@name='" + name + "']";
        if (obj instanceof Node) {
            Node n = (Node) obj;
            return n.valueOf(path);
        }
        return "";
    }

    // Same lookup as valueOf, but converted to a number.
    public static Number numberValueOf(Object obj, String type, String name) {
        String path = "./" + type + "[@name='" + name + "']";
        if (obj instanceof Node) {
            Node n = (Node) obj;
            return n.numberValueOf(path);
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on";
        SolrUtils su = new SolrUtils(url);
        System.out.println(su.getNumFound());
        System.out.println(su.getDocs().size());
        for (Node doc : su.getDocs()) {
            // id is an slong in schema.xml, which Solr writes as a <long> element, not <str>.
            System.out.println(valueOf(doc, "long", "id"));
            // title is declared stored="false" in the schema above, so this prints
            // an empty string unless the field is changed to stored="true".
            System.out.println(valueOf(doc, "title"));
            System.out.println(dateValueOf(doc, "pubtime"));
        }
    }
}
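
SolrUtils depends on the layout of Solr's XML response: a result element with a numFound attribute, containing doc elements whose children are named after their type (str, long, date) and carry a name attribute. The sketch below runs the same XPath expressions against an inline sample using only the JDK's DOM and XPath APIs, so it runs without dom4j or a live Solr. The sample response and the class name are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// JDK-only illustration of the XPath lookups that SolrUtils performs.
class ResponseSketch {

    // Hand-written sample in the shape of a Solr XML response (not real output).
    static final String SAMPLE =
          "<response>"
        + "<result name=\"response\" numFound=\"1\" start=\"0\">"
        + "<doc>"
        + "<long name=\"id\">42</long>"
        + "<str name=\"content\">hello</str>"
        + "<date name=\"pubtime\">2008-01-02T03:04:05Z</date>"
        + "</doc>"
        + "</result>"
        + "</response>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath expression to a string; a missing node yields "".
    static String text(String expression, Node context) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static NodeList docs(Document response) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/response/result/doc", response, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same relative path that SolrUtils builds: ./type[@name='name']
    static String valueOf(Node doc, String type, String name) {
        return text("./" + type + "[@name='" + name + "']", doc);
    }

    public static void main(String[] args) {
        Document response = parse(SAMPLE);
        System.out.println(text("/response/result/@numFound", response));
        NodeList docs = docs(response);
        for (int i = 0; i < docs.getLength(); i++) {
            Node doc = docs.item(i);
            System.out.println(valueOf(doc, "long", "id"));
            System.out.println(valueOf(doc, "str", "content"));
            System.out.println(valueOf(doc, "date", "pubtime"));
        }
    }
}
```

Looking up a field under the wrong type element, for example valueOf(doc, "str", "id") here, returns an empty string rather than failing, so the type passed to the helper has to match the field type declared in schema.xml.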



References:

1. New features in Apache Solr

http://www.ibm.com/developerworks/cn/java/j-solr-update/

2. Solr development experience

http://www.jinsehupan.com/blog/?p=25

3. slf4j-jdk14-1.5.5.jar, slf4j-api-1.5.5.jar, solr-dataimporthandler-1.4-SNAPSHOT.jar

https://svn.apache.org/repos/asf/lucene/solr/trunk/lib/slf4j-jdk14-1.5.5.jar

https://svn.apache.org/repos/asf/lucene/solr/trunk/lib/slf4j-api-1.5.5.jar

http://people.apache.org/repo/m2-snapshot-repository/org/apache/solr/solr-dataimporthandler/1.4-SNAPSHOT/solr-dataimporthandler-1.4-SNAPSHOT.jar

4. URL of this article

http://docs.google.com/View?id=ajfmzbdvh8wz_37f4jv46gb







