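Integrating Apache Solr into Tomcat
The existing system is already built: it is based on the SSH stack (Struts, Spring, Hibernate), its pages are encoded in GBK, the database is Oracle, and the container is Tomcat 6. Full-text search needs to be added to it; what follows is only a simple integration test.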
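1. Embedding in Tomcat:
Unpack apache-solr-1.3.0.tgz and copy the apache-solr-1.3.0\example\example-DIH\solr directory into the Tomcat installation directory. Then edit solr.xml in that solr directory and comment out the rss core configuration, as follows: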
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- Delete the Tomcat\solr\rss directory.
- Add the required JARs to Tomcat\solr\db\lib: ojdbc14.jar, slf4j-jdk14-1.5.5.jar, slf4j-api-1.5.5.jar, solr-dataimporthandler-1.4-SNAPSHOT.jar
- Copy apache-solr-1.3.0\example\webapps\solr.war into Tomcat\webapps.
- Create Tomcat\conf\Catalina\localhost\solr.xml with the following content:
<Context docBase="${catalina.home}/webapps/solr.war" debug="0" crossContext="true" >
- Edit Tomcat\conf\server.xml and add a Connector on port 8983, as follows:
<Connector port="8983" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8"/>
2. Configuring DataImportHandler:
change @ 2009-07-01 10:19:57
Three configuration files need to be edited: Tomcat\solr\db\conf\db-data-config.xml, Tomcat\solr\db\conf\schema.xml, and Tomcat\solr\db\conf\solrconfig.xml.
- db-data-config.xml
<dataConfig>
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@localhost:1521:orcl"
              user="solr" password="solr" batchSize="50"/>
  <document name="contents">
    <entity name="content" pk="ID"
            query="select * from CONTENT"
            deltaQuery="select ID from CONTENT where to_char(PUBTIME,'yyyy-mm-dd hh24:mi:ss') > '${dataimporter.last_index_time}'"
            transformer="ClobTransformer">
      <field name="title" column="TITLE" />
      <field column="CONTENT" clob="true"/>
      <field name="pubtime" column="PUBTIME" />
    </entity>
  </document>
</dataConfig>
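The deltaQuery above compares two strings rather than two dates: PUBTIME is rendered with to_char and compared with the text of ${dataimporter.last_index_time}. The sketch below illustrates why that works, assuming both sides use the same fixed-width, zero-padded yyyy-MM-dd HH:mm:ss layout, in which alphabetical order and chronological order coincide. The class and method names are made up for this illustration and are not part of Solr.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

class DeltaQuerySketch {
    // Assumption: this pattern matches both the Oracle mask
    // 'yyyy-mm-dd hh24:mi:ss' and the text substituted for last_index_time.
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Formats a date the way the comparison expects it.
    static String format(Date date) {
        return new SimpleDateFormat(PATTERN).format(date);
    }

    // Mirrors the WHERE clause of the deltaQuery: a plain string comparison.
    static boolean changedSince(String pubtime, String lastIndexTime) {
        return pubtime.compareTo(lastIndexTime) > 0;
    }

    public static void main(String[] args) {
        String lastIndexTime = "2009-07-01 10:19:57";
        System.out.println(changedSince("2009-07-01 10:20:00", lastIndexTime)); // true
        System.out.println(changedSince("2009-06-30 23:59:59", lastIndexTime)); // false
        System.out.println(changedSince(format(new Date()), lastIndexTime));
    }
}
```

If the two sides ever used different layouts (for example a 12-hour clock on one side), the string comparison would no longer follow time order, which is presumably why the query fixes the mask explicitly.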
- schema.xml
At the end of the types element, append a fieldtype named text_cjk that uses the CJK (Chinese, Japanese, Korean) analyzer:
......
<fieldtype name="text_cjk" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldtype>
</types>
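To give a feel for what a CJK analyzer does to Chinese text, here is a much-simplified sketch of bigram tokenization: a run of characters is split into overlapping two-character tokens. This is only an illustration under that assumption and is not Lucene's code; the real CJKAnalyzer also deals with Latin words, digits, and stop words. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CjkBigramSketch {
    // Splits a run of characters into overlapping two-character tokens.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        if (text.length() == 1) {
            tokens.add(text); // a lone character is kept as a single token
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A four-character phrase yields three overlapping pairs.
        System.out.println(bigrams("全文检索"));
    }
}
```

Because the query string is tokenized the same way, a search for a two-character word matches any document that contains that pair of adjacent characters, with no dictionary needed.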
Comment out or delete everything inside <fields></fields> and add the following:
<fields>
  <field name="id" type="slong" indexed="true" stored="true" required="true" />
  <field name="title" type="text_cjk" indexed="true" stored="false"/>
  <field name="content" type="text_cjk" indexed="true" stored="true"/>
  <field name="pubtime" type="date" indexed="true" stored="true"/>
  <field name="searchtext" type="text_cjk" indexed="true" stored="false" multiValued="true"/>
</fields>
Set the unique key to the id field defined above:
<uniqueKey>id</uniqueKey>
Set the default search field to the searchtext field defined above, and copy title and content, the two fields to be searched, into searchtext so that a single field covers both:
......
<defaultSearchField>searchtext</defaultSearchField> ......
<copyField source="title" dest="searchtext"/>
<copyField source="content" dest="searchtext"/> ......
add @ 2009-07-01 10:19:57
- solrconfig.xml
Change the index data directory in the <dataDir></dataDir> element, as follows:
<dataDir>${catalina.home}/solr/db/data</dataDir>
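3. Importing and querying:
- Full import:
http://localhost:8983/solr/db/dataimport?command=full-import
- Delta import:
http://localhost:8983/solr/db/dataimport?command=delta-import
- Query:
http://localhost:8983/solr
Click db to open a search page and enter the following (工作, "work", is the search term; the part after the semicolon sorts by pubtime in descending order):
pubtime:[2007-11-16T00:00:00Z TO 2008-11-28T00:00:00Z] AND 工作; pubtime desc
Click Search to test.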
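When these URLs are built from application code rather than typed into a browser, the query text has to be percent-encoded. The sketch below is one way to do that, assuming the db core and port 8983 configured earlier and encoding as UTF-8 to match the URIEncoding="UTF-8" attribute on the Connector. The class and method names are hypothetical helpers, not part of Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SolrUrlSketch {
    // Assumption: the db core on the Connector added in step 1.
    static final String BASE = "http://localhost:8983/solr/db";

    // command is "full-import" or "delta-import".
    static String dataImportUrl(String command) {
        return BASE + "/dataimport?command=" + command;
    }

    // Builds a select URL with the query percent-encoded as UTF-8.
    static String selectUrl(String query, int start, int rows) {
        try {
            return BASE + "/select/?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&version=2.2&start=" + start + "&rows=" + rows + "&indent=on";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dataImportUrl("full-import"));
        System.out.println(selectUrl("*:*", 0, 10));
        System.out.println(selectUrl("pubtime:[2007-11-16T00:00:00Z TO 2008-11-28T00:00:00Z] AND 工作", 0, 10));
    }
}
```

Since the application's own pages are GBK, encoding the query with the page encoding instead of UTF-8 would likely produce bytes that Tomcat decodes into the wrong characters, so the encoding is named explicitly here.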
4. Parsing the XML:
A simple helper class for reading Solr query results:
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.apache.commons.lang.time.DateUtils;
import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

/** Fetches a Solr XML response and exposes its doc nodes. */
public class SolrUtils {
    private List<Node> docs = new ArrayList<Node>();
    private Number numFound = 0;
    private Document doc;

    @SuppressWarnings("unchecked")
    public SolrUtils(String urlString) {
        doc = documentFromURL(urlString);
        if (doc != null) {
            docs = (List<Node>) doc.selectNodes("/response/result/doc");
            numFound = doc.numberValueOf("/response/result/@numFound");
        }
    }

    public List<Node> getDocs() { return docs; }

    public Number getNumFound() { return numFound; }

    /** Reads and parses the XML at the given URL; returns null on failure. */
    public Document documentFromURL(String urlString) {
        try {
            SAXReader reader = new SAXReader();
            URL url = new URL(urlString);
            doc = reader.read(url);
            return doc;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /** Value of the <str name="..."> child of a doc node. */
    public static String valueOf(Object obj, String name) {
        return valueOf(obj, "str", name);
    }

    /** Value of the <date name="..."> child, parsed in the JVM's default time zone. */
    public static Date dateValueOf(Object obj, String name) {
        String[] parsePatterns = new String[] {
            "yyyy-MM-dd'T'HH:mm:ss'Z'",
            "yyyy-MM-dd'T'HH:mm:ss.S'Z'",
            "yyyy-MM-dd'T'HH:mm:ss.SS'Z'",
            "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
        };
        try {
            return DateUtils.parseDate(valueOf(obj, "date", name), parsePatterns);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /** Value of the <type name="..."> child; empty string if obj is not a Node. */
    public static String valueOf(Object obj, String type, String name) {
        String path = "./" + type + "[@name='" + name + "']";
        if (obj instanceof Node) {
            Node n = (Node) obj;
            return n.valueOf(path);
        }
        return "";
    }

    /** Numeric value of the <type name="..."> child; null if obj is not a Node. */
    public static Number numberValueOf(Object obj, String type, String name) {
        String path = "./" + type + "[@name='" + name + "']";
        if (obj instanceof Node) {
            Node n = (Node) obj;
            return n.numberValueOf(path);
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/db/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on";
        SolrUtils su = new SolrUtils(url);
        System.out.println(su.getNumFound());
        System.out.println(su.getDocs().size());
        for (Node doc : su.getDocs()) {
            // id is an slong field, so it is returned as <long>, not <str>.
            System.out.println(numberValueOf(doc, "long", "id"));
            // title is stored="false" in schema.xml and is not returned; print content.
            System.out.println(valueOf(doc, "content"));
            System.out.println(dateValueOf(doc, "pubtime"));
        }
    }
}
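To make the XPath expressions used by the helper concrete, the sketch below runs the same paths against an inline sample response using only the JDK's built-in XML APIs instead of dom4j. The sample XML is hypothetical: it is shaped like what the schema above would be expected to return (id as a long element, content as str, pubtime as date), not captured from a real server, and the class name is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SolrResponseSketch {
    // Hypothetical response with one document.
    static final String SAMPLE =
          "<response>"
        + "<result name=\"response\" numFound=\"1\" start=\"0\">"
        + "<doc>"
        + "<long name=\"id\">1</long>"
        + "<str name=\"content\">sample body</str>"
        + "<date name=\"pubtime\">2008-11-20T08:30:00Z</date>"
        + "</doc>"
        + "</result>"
        + "</response>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Reads the numFound attribute of the result element.
    static int numFound(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate(
                    "/response/result/@numFound", parse(xml), XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads one typed field of the doc at docIndex; "" when nothing matches.
    static String field(String xml, int docIndex, String type, String name) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList docs = (NodeList) xpath.evaluate(
                    "/response/result/doc", parse(xml), XPathConstants.NODESET);
            return xpath.evaluate("./" + type + "[@name='" + name + "']", docs.item(docIndex));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(numFound(SAMPLE));                   // 1
        System.out.println(field(SAMPLE, 0, "long", "id"));     // 1
        System.out.println(field(SAMPLE, 0, "str", "content")); // sample body
        System.out.println(field(SAMPLE, 0, "date", "pubtime"));
    }
}
```

The element name in the path has to match the field's type in the response: asking for a str element named id against this sample returns an empty string, because the id arrives inside a long element.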
References:
1. New Features in Apache Solr
http://www.ibm.com/developerworks/cn/java/j-solr-update/
2. Solr Development Experience [original]
http://www.jinsehupan.com/blog/?p=25
3. slf4j-jdk14-1.5.5.jar, slf4j-api-1.5.5.jar, solr-dataimporthandler-1.4-SNAPSHOT.jar
https://svn.apache.org/repos/asf/lucene/solr/trunk/lib/slf4j-jdk14-1.5.5.jar
https://svn.apache.org/repos/asf/lucene/solr/trunk/lib/slf4j-api-1.5.5.jar
http://people.apache.org/repo/m2-snapshot-repository/org/apache/solr/solr-dataimporthandler/1.4-SNAPSHOT/solr-dataimporthandler-1.4-SNAPSHOT.jar
4. Address of this article
http://docs.google.com/View?id=ajfmzbdvh8wz_37f4jv46gb