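For the new architecture I was choosing between Compass and Hibernate Search as the search framework. After comparing them, Hibernate Search still seemed less mature than Compass, and since we will later need to crawl and search web pages, I decided to make Compass the default search engine of the architecture. There is plenty of material about Compass online, but almost no introductory document that is both reasonably complete and detailed. The official Compass documentation is thorough, but its examples are a mess and have serious problems. This post records the setup process as a getting-started guide.
Through OSEM (Object/Search Engine Mapping), Compass maps the application's domain model onto the search engine, so that objects are ultimately accessed through the common meta data.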
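1.1 Annotations vs. XML configuration files
Compass configuration files fall into three categories:
Type 1: *.cmd.xml files
A *.cmd.xml file defines the common meta data, that is, the most basic metadata that appears in search results.
Type 2: *.cpm.xml files
A *.cpm.xml file is the Object/Search Engine Mapping; it maps POJOs to the common meta data.
Type 3: *.cfg.xml files
The Compass *.cfg.xml file defines settings such as the index storage path and the analyzer (tokenizer) used by the search engine.
Compared with XML configuration files, the annotation approach is simpler, especially with Spring: there is no need to write *.cmd.xml, *.cpm.xml or *.cfg.xml files. Unlike Hibernate, which has many annotations, Compass has only four core annotations (@Searchable, @SearchableId, @SearchableProperty and @SearchableComponent), which are easy to remember. I therefore recommend the annotation approach.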
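To make the annotation idea concrete, here is a minimal sketch in plain Java of how annotation-driven mapping can turn an object into a flat set of index fields. The Demo* annotations, DemoArticle, OsemSketch and the "$id" key are hypothetical stand-ins invented for this illustration; they are not the Compass annotations and do not describe how Compass implements OSEM internally.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in annotations for illustration only (assumption, not the Compass API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DemoSearchable {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DemoSearchableId {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DemoSearchableProperty { String name(); }

@DemoSearchable
class DemoArticle {
    @DemoSearchableId Integer id = 1;
    @DemoSearchableProperty(name = "title") String title = "Compass Test";
    String internalNote = "not indexed"; // no annotation, so it is skipped
}

class OsemSketch {
    /** Copies the annotated fields of a searchable object into a name-to-value map. */
    static Map<String, String> toDocument(Object entity) {
        Class<?> type = entity.getClass();
        if (!type.isAnnotationPresent(DemoSearchable.class)) {
            throw new IllegalArgumentException(type.getName() + " is not searchable");
        }
        Map<String, String> document = new LinkedHashMap<String, String>();
        for (Field field : type.getDeclaredFields()) {
            try {
                field.setAccessible(true);
                if (field.isAnnotationPresent(DemoSearchableId.class)) {
                    // "$id" is a hypothetical key chosen for this sketch.
                    document.put("$id", String.valueOf(field.get(entity)));
                } else if (field.isAnnotationPresent(DemoSearchableProperty.class)) {
                    String name = field.getAnnotation(DemoSearchableProperty.class).name();
                    document.put(name, String.valueOf(field.get(entity)));
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        System.out.println(toDocument(new DemoArticle()));
    }
}
```

Only the fields that carry an annotation end up in the map, which is the same division of labor the real annotations express: the class opts in as a whole, one field supplies the identifier, and each remaining annotated field becomes a named index property.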
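1.2 Compass core API
The core Compass API borrows Hibernate's terminology, so it is used in much the same way as Hibernate. The core interfaces are:
CompassConfiguration (similar to Hibernate Configuration): configures Compass from settings, configuration files and mapping definitions. Usually used to create a Compass instance.
Compass (similar to Hibernate SessionFactory): a thread-safe instance used to open CompassSessions, each of which is meant for single-threaded use. It also provides some index-level search engine operations.
CompassSession (similar to Hibernate Session): performs search operations such as save, delete, find and load. Very lightweight, but not thread-safe.
CompassTransaction (similar to Hibernate Transaction): the interface for managing Compass transactions. It does not have to be used directly when a transaction management environment (such as Spring or JTA) is in place.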
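1.3 Integrating Compass with Spring
Compass already wraps Spring integration well. Much like Spring's support for Hibernate, Compass provides CompassTemplate to simplify the handling of sessions, transactions and exceptions; using it wherever possible saves a lot of boilerplate. For example: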
CompassTemplate ct = (CompassTemplate) context.getBean("compassTemplate");
Article article = new Article();
article.setTitle("Compass Test");
article.setPublishDate(new Date());
article.setAuthor(1);
ct.save(article); // Store the object's indexable data in the Compass index.
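What a template of this kind saves you from writing is the open/begin/commit-or-rollback/close sequence around every operation. The following is a minimal sketch of that callback pattern in plain Java; DemoSession, DemoCallback and DemoTemplate are hypothetical stand-ins made up for illustration, not Compass classes, and the event list exists only so the order of steps can be observed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; named Demo* to avoid confusion with the real Compass types.
interface DemoSession {
    void save(Object entity);
}

interface DemoCallback<T> {
    T doInSession(DemoSession session);
}

class DemoTemplate {
    /** Records each lifecycle step so the ordering can be inspected. */
    final List<String> events = new ArrayList<String>();

    /** Runs the callback inside an open session and a transaction. */
    <T> T execute(DemoCallback<T> action) {
        events.add("open");
        events.add("begin");
        try {
            DemoSession session = new DemoSession() {
                public void save(Object entity) {
                    events.add("save:" + entity);
                }
            };
            T result = action.doInSession(session);
            events.add("commit");
            return result;
        } catch (RuntimeException e) {
            events.add("rollback"); // undo the work, then let the caller see the failure
            throw e;
        } finally {
            events.add("close"); // the session is always released
        }
    }

    /** Convenience method: one call instead of the whole sequence. */
    void save(final Object entity) {
        execute(new DemoCallback<Void>() {
            public Void doInSession(DemoSession session) {
                session.save(entity);
                return null;
            }
        });
    }

    public static void main(String[] args) {
        DemoTemplate template = new DemoTemplate();
        template.save("article");
        System.out.println(template.events);
    }
}
```

The caller only writes the body of the callback (or uses a convenience method such as save), while resource handling stays in one place.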
Spring: 2.5
Compass: 1.2.1
Hibernate: 3.2.5
MySQL: 5.0.5
CREATE TABLE `article` (
  `Id` int(11) NOT NULL auto_increment,
  `title` varchar(40) NOT NULL default '',
  `author` int(11) default '0',
  `publish_date` date NOT NULL default '0000-00-00',
  PRIMARY KEY (`Id`)
) TYPE=MyISAM;

CREATE TABLE `author` (
  `Id` int(11) NOT NULL auto_increment,
  `username` varchar(20) NOT NULL default '',
  `password` varchar(20) NOT NULL default '',
  `age` smallint(6) default '0',
  PRIMARY KEY (`Id`)
) TYPE=MyISAM;
Starting from the test case makes the relationships easier to sort out; otherwise the pile of terms and concepts gets confusing.
import java.util.Date;

import junit.framework.TestCase;

import org.apache.log4j.Logger;
import org.compass.core.Compass;
import org.compass.core.CompassDetachedHits;
import org.compass.core.CompassHit;
import org.compass.core.CompassHits;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;
import org.compass.core.CompassTransaction;
import org.compass.core.support.search.CompassSearchCommand;
import org.compass.core.support.search.CompassSearchResults;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import com.mobilesoft.esales.dao.hibernate.ArticleDAO;
import com.mobilesoft.esales.dao.hibernate.AuthorDAO;
import com.mobilesoft.esales.model.Article;
import com.mobilesoft.esales.model.Author;
import com.mobilesoft.framework.search.service.CompassSearchService;

/**
 * Test case showing how to use the Compass services.
 *
 * @author [email protected]
 */
public class TestCompass extends TestCase {

    private static final Logger logger = Logger.getLogger(TestCompass.class);

    private static ClassPathXmlApplicationContext context = null;
    private static CompassTemplate ct;

    static {
        context = new ClassPathXmlApplicationContext(new String[] {
                "applicationContext.xml",
                "applicationContext-resources.xml",
                "applicationContext-dao.xml",
                "applicationContext-service.xml",
                "applicationContext-compass.xml" });
        ct = (CompassTemplate) context.getBean("compassTemplate");
    }

    protected void setUp() throws Exception {
    }

    /**
     * Inserts test data.
     */
    public void testInsert() {
        ArticleDAO articleDao = (ArticleDAO) context.getBean("articleDAO");
        AuthorDAO authorDao = (AuthorDAO) context.getBean("authorDAO");
        Article article = new Article();
        Author author = new Author();
        author.setAge((short) 27);
        author.setUsername("liangchuan");
        author.setPassword("liangchuan");
        article.setTitle("Compass Test");
        article.setPublishDate(new Date());
        article.setAuthor(1);
        authorDao.save(author);
        articleDao.save(article);
        ct.save(article);
        ct.save(author);
    }

    /**
     * Tests searching inside an explicit CompassTransaction.
     */
    public void testTransactionalFind() {
        Compass compass = ct.getCompass();
        CompassSession session = compass.openSession();
        CompassTransaction tx = null;
        try {
            tx = session.beginTransaction();
            CompassHits hits = session.find("Compass*");
            logger.error("testTransactionalFind() - CompassHits hits=" + hits.getLength());
            for (int i = 0; i < hits.getLength(); i++) {
                Object hit = hits.data(i);
                if (hit instanceof Article) {
                    Article item = (Article) hit;
                    logger.error("testTransactionalFind() - article hits=" + item.getTitle());
                } else if (hit instanceof Author) {
                    Author item = (Author) hit;
                    logger.error("testTransactionalFind() - author hits=" + item.getUsername());
                } else {
                    logger.error("testTransactionalFind() - error hits=");
                }
            }
            tx.commit();
        } catch (Exception e) {
            if (tx != null) {
                tx.rollback();
            }
        } finally {
            session.close();
        }
    }

    /**
     * Demonstrates CompassDetachedHits.
     * A result set obtained through CompassTemplate can only be used inside a
     * transactional context, so outside of one the detached variant must be used.
     */
    public void testDetachedFind() {
        CompassDetachedHits hits = ct.findWithDetach("Compass*");
        logger.error("testDetachedFind() - CompassHits hits=" + hits.getLength());
        for (int i = 0; i < hits.getLength(); i++) {
            Object hit = hits.data(i);
            if (hit instanceof Article) {
                Article item = (Article) hit;
                logger.error("testDetachedFind() - article hits=" + item.getTitle());
            } else if (hit instanceof Author) {
                Author item = (Author) hit;
                logger.error("testDetachedFind() - author hits=" + item.getUsername());
            } else {
                logger.error("testDetachedFind() - error hits=");
            }
        }
    }

    /**
     * Demonstrates com.mobilesoft.framework.search.service.CompassSearchService.
     */
    class CompassSearch extends CompassSearchService {
        CompassSearch() {
            Compass compass = ct.getCompass();
            CompassSession session = compass.openSession();
            CompassTransaction tx = null;
            try {
                tx = session.beginTransaction();
                CompassSearchCommand command = new CompassSearchCommand();
                command.setQuery("Compass");
                CompassSearchResults results = performSearch(command, session);
                logger.error("CompassSearch() - CompassHit TotalHits value=" + results.getTotalHits());
                for (int i = 0; i < results.getHits().length; i++) {
                    CompassHit hits = results.getHits()[i];
                    Object hit = hits.getData();
                    logger.error("CompassSearch() - CompassHit hit=" + hit); //$NON-NLS-1$
                    if (hit instanceof Article) {
                        Article item = (Article) hit;
                        logger.error("testCompassSearchService() - article hits=" + item.getTitle());
                    } else if (hit instanceof Author) {
                        Author item = (Author) hit;
                        logger.error("testCompassSearchService() - author hits=" + item.getUsername());
                    } else {
                        logger.error("testCompassSearchService() - error hits=");
                    }
                }
                // Commit once, after all hits have been read (not inside the loop).
                tx.commit();
            } catch (Exception e) {
                if (tx != null) {
                    tx.rollback();
                }
            } finally {
                session.close();
            }
        }
    }

    public void testCompassSearchService() {
        new CompassSearch();
    }

    protected void tearDown() throws Exception {
    }
}
applicationContext-compass.xml
<?xml version="1.0"?>
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN 2.0//EN"
    "http://www.springframework.org/dtd/spring-beans-2.0.dtd">
<beans default-lazy-init="true">

    <bean id="compassTemplate" class="org.compass.core.CompassTemplate">
        <property name="compass" ref="compass"/>
    </bean>

    <bean id="annotationConfiguration"
          class="org.compass.annotations.config.CompassAnnotationsConfiguration">
    </bean>

    <bean id="compass" class="org.compass.spring.LocalCompassBean">
        <property name="classMappings">
            <list>
                <value>com.mobilesoft.esales.model.Article</value>
                <value>com.mobilesoft.esales.model.Author</value>
            </list>
        </property>
        <property name="compassConfiguration" ref="annotationConfiguration"/>
        <property name="compassSettings">
            <props>
                <prop key="compass.engine.connection">file://compass</prop>
                <prop key="compass.transaction.factory">org.compass.spring.transaction.SpringSyncTransactionFactory</prop>
                <prop key="compass.engine.highlighter.default.formatter.simple.pre"><![CDATA[<font color="red"><b>]]></prop>
                <prop key="compass.engine.highlighter.default.formatter.simple.post"><![CDATA[</b></font>]]></prop>
            </props>
        </property>
        <property name="transactionManager" ref="transactionManager"/>
    </bean>

    <bean id="hibernateGpsDevice" class="org.compass.gps.device.hibernate.HibernateGpsDevice">
        <property name="name">
            <value>hibernateDevice</value>
        </property>
        <property name="sessionFactory" ref="sessionFactory"/>
        <property name="mirrorDataChanges">
            <value>true</value>
        </property>
    </bean>

    <bean id="compassGps" class="org.compass.gps.impl.SingleCompassGps"
          init-method="start" destroy-method="stop">
        <property name="compass" ref="compass"/>
        <property name="gpsDevices">
            <list>
                <bean class="org.compass.spring.device.SpringSyncTransactionGpsDeviceWrapper">
                    <property name="gpsDevice" ref="hibernateGpsDevice"/>
                </bean>
            </list>
        </property>
    </bean>

    <bean id="compassSearchService"
          class="com.mobilesoft.framework.search.service.CompassSearchService">
        <property name="compass" ref="compass"/>
        <property name="pageSize" value="15"/>
    </bean>

    <bean id="compassIndexBuilder"
          class="com.mobilesoft.framework.search.service.CompassIndexBuilder"
          lazy-init="false">
        <property name="compassGps" ref="compassGps"/>
        <property name="buildIndex" value="false"/>
        <property name="lazyTime" value="10"/>
    </bean>

</beans>
applicationContext-dao.xml, applicationContext-service.xml, applicationContext-resources.xml and the like are omitted.
AdvancedSearchCommand.java
package com.mobilesoft.framework.search.service;

import java.util.HashSet;
import java.util.Set;

import org.apache.commons.lang.StringUtils;
import org.compass.core.CompassQuery.SortDirection;
import org.compass.core.CompassQuery.SortPropertyType;
import org.compass.core.support.search.CompassSearchCommand;
import org.springframework.util.Assert;

public class AdvancedSearchCommand extends CompassSearchCommand {

    /**
     * Wraps the sort parameters used by Compass.
     */
    class CompassSort {

        private String name;
        private SortPropertyType type;
        private SortDirection direction;

        public CompassSort() {
        }

        public CompassSort(String sortParamName, String paramType, boolean isAscend) {
            Assert.isTrue(StringUtils.isNotBlank(sortParamName));
            setName(sortParamName);
            if ("int".equalsIgnoreCase(paramType)) {
                setType(SortPropertyType.INT);
            } else if ("float".equalsIgnoreCase(paramType)) {
                setType(SortPropertyType.FLOAT);
            } else if ("string".equalsIgnoreCase(paramType)) {
                setType(SortPropertyType.STRING);
            } else {
                setType(SortPropertyType.AUTO);
            }
            if (isAscend) {
                setDirection(SortDirection.AUTO);
            } else {
                setDirection(SortDirection.REVERSE);
            }
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public SortPropertyType getType() {
            return type;
        }

        public void setType(SortPropertyType type) {
            this.type = type;
        }

        public SortDirection getDirection() {
            return direction;
        }

        public void setDirection(SortDirection direction) {
            this.direction = direction;
        }
    }

    /**
     * Sort criteria for the search results.
     */
    private Set<CompassSort> sortMap = new HashSet<CompassSort>();

    private String[] highlightFields;

    /**
     * @param sortParamName name of the property to sort by
     * @param paramType three types are currently defined: int, string and float.
     *        Anything else is treated as SortPropertyType.AUTO; see
     *        {@link org.compass.core.CompassQuery.SortPropertyType}
     * @param isAscend true for ascending order, false for descending order
     * @see org.compass.core.CompassQuery.SortPropertyType#AUTO
     * @see org.compass.core.CompassQuery.SortPropertyType#INT
     * @see org.compass.core.CompassQuery.SortPropertyType#STRING
     * @see org.compass.core.CompassQuery.SortPropertyType#FLOAT
     * @see org.compass.core.CompassQuery.SortDirection#AUTO
     * @see org.compass.core.CompassQuery.SortDirection#REVERSE
     */
    public void addSort(String sortParamName, String paramType, boolean isAscend) {
        this.sortMap.add(new CompassSort(sortParamName, paramType, isAscend));
    }

    public Set<CompassSort> getSortMap() {
        return sortMap;
    }

    public void setSortMap(Set<CompassSort> sortMap) {
        this.sortMap = sortMap;
    }

    public String[] getHighlightFields() {
        return highlightFields;
    }

    public void setHighlightFields(String[] highlightFields) {
        this.highlightFields = highlightFields;
    }
}
CompassIndexBuilder.java
package com.mobilesoft.framework.search.service;

import org.apache.log4j.Logger;
import org.compass.gps.CompassGps;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.util.Assert;

/**
 * Builder that rebuilds the index either on a schedule via Quartz or automatically
 * when the Spring ApplicationContext starts.
 * After startup it waits a few seconds, then calls compassGps.index() on a new thread.
 * When buildIndex is true the index is rebuilt every time the web application starts;
 * set buildIndex to false (the default in this listing) to disable this.
 * You can also skip this builder and call compassGps.index() from your own code.
 */
public class CompassIndexBuilder implements InitializingBean {

    private static final Logger log = Logger.getLogger(CompassIndexBuilder.class);

    // Whether to build the index; set to false to disable this builder.
    private boolean buildIndex = false;

    // Delay before the indexing thread starts working, in seconds.
    private int lazyTime = 10;

    // Compass GPS wrapper.
    private CompassGps compassGps;

    // Indexing thread.
    private Thread indexThread = new Thread() {
        @Override
        public void run() {
            try {
                Thread.sleep(lazyTime * 1000);
                log.info("begin compass index...");
                long beginTime = System.currentTimeMillis();
                // Rebuild the index.
                // If the index files defined for the Compass entities already exist,
                // a temporary index is built first and replaces them once indexing finishes.
                compassGps.index();
                long costTime = System.currentTimeMillis() - beginTime;
                log.info("compass index finished.");
                log.info("costed " + costTime + " milliseconds");
            } catch (InterruptedException e) {
                // simply proceed
            }
        }
    };

    /**
     * Implements InitializingBean; starts the indexing thread once all
     * properties have been injected.
     *
     * @see org.springframework.beans.factory.InitializingBean#afterPropertiesSet()
     */
    public void afterPropertiesSet() throws Exception {
        if (buildIndex) {
            Assert.notNull(compassGps, "CompassIndexBuilder not set CompassGps yet.");
            indexThread.setDaemon(true);
            indexThread.setName("Compass Indexer");
            indexThread.start();
        }
    }

    public void setBuildIndex(boolean buildIndex) {
        this.buildIndex = buildIndex;
    }

    public void setLazyTime(int lazyTime) {
        this.lazyTime = lazyTime;
    }

    public void setCompassGps(CompassGps compassGps) {
        this.compassGps = compassGps;
    }
}
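The delayed-start pattern used by CompassIndexBuilder can be tried out without Compass or Spring. Below is a minimal sketch, assuming a plain Runnable in place of the compassGps.index() call; DelayedIndexSketch and startDelayed are hypothetical names chosen for this example. Unlike the listing above, the sketch restores the interrupt flag instead of silently ignoring the interruption, which is a design choice of the sketch rather than something the original code does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class DelayedIndexSketch {

    /**
     * Starts a daemon thread that sleeps for delayMillis and then runs the job once.
     * The job stands in for the real indexing call (assumption for illustration).
     */
    static Thread startDelayed(final Runnable indexJob, final long delayMillis) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(delayMillis);
                    indexJob.run();
                } catch (InterruptedException e) {
                    // Restore the flag so code further up can see the interruption.
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive
        worker.setName("Compass Indexer (sketch)");
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch done = new CountDownLatch(1);
        startDelayed(new Runnable() {
            public void run() {
                System.out.println("indexing...");
                done.countDown();
            }
        }, 100L);
        // Wait for the job, because a daemon thread alone would not block JVM exit.
        done.await(5, TimeUnit.SECONDS);
    }
}
```

Because the worker is a daemon thread, the application can shut down even if indexing has not finished; callers that need the result have to wait for it explicitly, as main does here.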
CompassSearchService.java
package com.mobilesoft.framework.search.service;

import org.compass.core.Compass;
import org.compass.core.CompassCallback;
import org.compass.core.CompassDetachedHits;
import org.compass.core.CompassHits;
import org.compass.core.CompassQuery;
import org.compass.core.CompassSession;
import org.compass.core.CompassTemplate;
import org.compass.core.CompassTransaction;
import org.compass.core.support.search.CompassSearchCommand;
import org.compass.core.support.search.CompassSearchResults;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.util.Assert;

import com.mobilesoft.framework.search.service.AdvancedSearchCommand.CompassSort;

/**
 * A service modeled on the code in {@link org.compass.spring.web.mvc.CompassSearchController},
 * convenient when Spring MVC is not used.
 *
 * @see org.compass.spring.web.mvc.CompassSearchController
 * @see org.compass.spring.web.mvc.AbstractCompassCommandController
 */
public class CompassSearchService implements InitializingBean {

    // Number of entries shown per page.
    private Integer pageSize = 15;

    private Compass compass;

    private CompassTemplate compassTemplate;

    /**
     * Public search entry point; returns the matching results. Works much like
     * {@link org.compass.spring.web.mvc.CompassSearchController#handle(javax.servlet.http.HttpServletRequest,
     * javax.servlet.http.HttpServletResponse,Object,org.springframework.validation.BindException)}.
     *
     * @see org.compass.spring.web.mvc.CompassSearchController#handle(javax.servlet.http.HttpServletRequest,
     * javax.servlet.http.HttpServletResponse,java.lang.Object,org.springframework.validation.BindException)
     */
    public CompassSearchResults search(final CompassSearchCommand command) {
        return (CompassSearchResults) getCompassTemplate().execute(
                CompassTransaction.TransactionIsolation.READ_ONLY_READ_COMMITTED,
                new CompassCallback() {
                    public Object doInCompass(CompassSession session) {
                        return performSearch(command, session);
                    }
                });
    }

    /**
     * Calls the search engine and collects the matching results.
     *
     * @see org.compass.spring.web.mvc.CompassSearchController#performSearch(
     * org.compass.spring.web.mvc.CompassSearchCommand,org.compass.core.CompassSession)
     */
    protected CompassSearchResults performSearch(CompassSearchCommand searchCommand,
            CompassSession session) {
        long time = System.currentTimeMillis();
        CompassQuery query = buildQuery(searchCommand, session);
        CompassHits hits = query.hits();
        CompassDetachedHits detachedHits;
        CompassSearchResults.Page[] pages = null;
        if (pageSize == null) {
            doProcessBeforeDetach(searchCommand, session, hits, -1, -1);
            detachedHits = hits.detach();
        } else {
            int iPageSize = pageSize;
            int page = 0;
            int hitsLength = hits.getLength();
            if (searchCommand.getPage() != null) {
                page = searchCommand.getPage();
            }
            int from = page * iPageSize;
            if (from > hits.getLength()) {
                // The requested start lies beyond the number of hits: fall back to the
                // last page worth of hits (never below index 0).
                from = Math.max(0, hitsLength - iPageSize);
                doProcessBeforeDetach(searchCommand, session, hits, from, hitsLength);
                detachedHits = hits.detach(from, hitsLength);
            } else if ((from + iPageSize) > hitsLength) {
                // The requested end lies beyond the number of hits.
                doProcessBeforeDetach(searchCommand, session, hits, from, hitsLength);
                detachedHits = hits.detach(from, hitsLength);
            } else {
                // A page in the middle: take exactly one page of entries.
                doProcessBeforeDetach(searchCommand, session, hits, from, iPageSize);
                detachedHits = hits.detach(from, iPageSize);
            }
            doProcessAfterDetach(searchCommand, session, detachedHits);
            int numberOfPages = (int) Math.ceil((float) hitsLength / iPageSize);
            pages = new CompassSearchResults.Page[numberOfPages];
            for (int i = 0; i < pages.length; i++) {
                pages[i] = new CompassSearchResults.Page();
                pages[i].setFrom(i * iPageSize + 1);
                pages[i].setSize(iPageSize);
                pages[i].setTo((i + 1) * iPageSize);
                if (from >= (pages[i].getFrom() - 1) && from < pages[i].getTo()) {
                    pages[i].setSelected(true);
                } else {
                    pages[i].setSelected(false);
                }
            }
            if (numberOfPages > 0) {
                CompassSearchResults.Page lastPage = pages[numberOfPages - 1];
                if (lastPage.getTo() > hitsLength) {
                    lastPage.setSize(hitsLength - lastPage.getFrom());
                    lastPage.setTo(hitsLength);
                }
            }
        }
        time = System.currentTimeMillis() - time;
        CompassSearchResults searchResults =
                new CompassSearchResults(detachedHits.getHits(), time, pageSize);
        searchResults.setPages(pages);
        return searchResults;
    }

    /**
     * Builds the Lucene query.
     */
    protected CompassQuery buildQuery(CompassSearchCommand searchCommand, CompassSession session) {
        CompassQuery query = session.queryBuilder()
                .queryString(searchCommand.getQuery().trim()).toQuery();
        if (AdvancedSearchCommand.class.isAssignableFrom(searchCommand.getClass())) {
            AdvancedSearchCommand advancedSearchCommand = (AdvancedSearchCommand) searchCommand;
            for (CompassSort sort : advancedSearchCommand.getSortMap()) {
                query.addSort(sort.getName(), sort.getType(), sort.getDirection());
            }
        }
        return query;
    }

    /**
     * Hook for work that must happen before the hits are detached, such as highlighting.
     *
     * @param from index of the first hit to process; note that -1 is passed
     *        when no pageSize has been set
     * @param size number of hits to process, starting at from
     */
    protected void doProcessBeforeDetach(CompassSearchCommand searchCommand,
            CompassSession session, CompassHits hits, int from, int size) {
        if (AdvancedSearchCommand.class.isAssignableFrom(searchCommand.getClass())) {
            if (from < 0) {
                from = 0;
                size = hits.getLength();
            }
            String[] highlightFields = ((AdvancedSearchCommand) searchCommand).getHighlightFields();
            if (highlightFields == null) {
                return;
            }
            // Highlight the requested fields of each hit on the current page.
            // The end index is from + size, capped at the number of hits.
            int end = Math.min(from + size, hits.getLength());
            for (int i = from; i < end; i++) {
                for (String highlightField : highlightFields) {
                    hits.highlighter(i).fragment(highlightField);
                }
            }
        }
    }

    /**
     * An option to perform any type of processing after the hits are detached.
     */
    protected void doProcessAfterDetach(CompassSearchCommand searchCommand,
            CompassSession session, CompassDetachedHits hits) {
    }

    public void afterPropertiesSet() throws Exception {
        Assert.notNull(compass, "Must set compass property");
        this.compassTemplate = new CompassTemplate(compass);
    }

    public Integer getPageSize() {
        return pageSize;
    }

    public void setPageSize(Integer pageSize) {
        this.pageSize = pageSize;
    }

    public void setCompass(Compass compass) {
        this.compass = compass;
    }

    protected CompassTemplate getCompassTemplate() {
        return this.compassTemplate;
    }
}
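The page arithmetic in performSearch is easier to follow with concrete numbers. The sketch below reproduces only that arithmetic in plain Java, with no Compass types; PageRangeSketch and Range are hypothetical names for this example, and the bounds are 1-based and inclusive, as in the from/to values of the listing above.

```java
import java.util.ArrayList;
import java.util.List;

class PageRangeSketch {

    /** One page of results; from and to are 1-based and inclusive. */
    static final class Range {
        final int from;
        final int to;

        Range(int from, int to) {
            this.from = from;
            this.to = to;
        }

        int size() {
            return to - from + 1; // both ends are included
        }

        @Override
        public String toString() {
            return from + "-" + to;
        }
    }

    /** Splits totalHits results into pages of at most pageSize entries. */
    static List<Range> pages(int totalHits, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Range> result = new ArrayList<Range>();
        int numberOfPages = (totalHits + pageSize - 1) / pageSize; // integer ceiling
        for (int i = 0; i < numberOfPages; i++) {
            int from = i * pageSize + 1;
            int to = Math.min((i + 1) * pageSize, totalHits); // clip the last page
            result.add(new Range(from, to));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pages(32, 15)); // prints [1-15, 16-30, 31-32]
    }
}
```

With 32 hits and a page size of 15 there are three pages, and only the last one is clipped: it covers hits 31 to 32, so its size under inclusive bounds is 2.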
@SearchableId declares the id of the Document;
@SearchableProperty declares a field to be indexed;
@SearchableComponent declares another associated object to be indexed.
Article.java
package com.mobilesoft.esales.model;

import java.util.Date;

import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.core.CompassTemplate;

@Searchable
public class Article implements java.io.Serializable {

    @SearchableId
    private Integer id;

    @SearchableProperty(name = "title")
    private String title;

    @SearchableProperty(name = "author")
    private Integer author;

    @SearchableProperty(name = "publishDate")
    private Date publishDate;

    /** default constructor */
    public Article() {
    }

    /** minimal constructor */
    public Article(String title, Date publishDate) {
        this.title = title;
        this.publishDate = publishDate;
    }

    /** full constructor */
    public Article(String title, Integer author, Date publishDate) {
        this.title = title;
        this.author = author;
        this.publishDate = publishDate;
    }

    public Integer getId() {
        return this.id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getTitle() {
        return this.title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public Integer getAuthor() {
        return this.author;
    }

    public void setAuthor(Integer author) {
        this.author = author;
    }

    public Date getPublishDate() {
        return this.publishDate;
    }

    public void setPublishDate(Date publishDate) {
        this.publishDate = publishDate;
    }
}
Author.java
package com.mobilesoft.esales.model;

import org.compass.annotations.Searchable;
import org.compass.annotations.SearchableId;
import org.compass.annotations.SearchableProperty;
import org.compass.core.CompassTemplate;

@Searchable
public class Author implements java.io.Serializable {

    @SearchableId
    private Integer id;

    @SearchableProperty(name = "username")
    private String username;

    private String password;

    @SearchableProperty(name = "age")
    private Short age;

    public Author() {
    }

    /** minimal constructor */
    public Author(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** full constructor */
    public Author(String username, String password, Short age) {
        this.username = username;
        this.password = password;
        this.age = age;
    }

    // Property accessors
    public Integer getId() {
        return this.id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getUsername() {
        return this.username;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    public String getPassword() {
        return this.password;
    }

    public void setPassword(String password) {
        this.password = password;
    }

    public Short getAge() {
        return this.age;
    }

    public void setAge(Short age) {
        this.age = age;
    }
}
ArticleDAO.java and AuthorDAO.java are omitted.
They were generated directly by MyEclipse and contain nothing special.
http://www.compass-project.org/docs/1.2.1/reference/html/
The Compass Framework Search made easy.pdf
Compass TSSJS Europe 06.pdf
Hello World Tutorial
InfoQ: Compass: Integrate Search into your apps
InfoQ: Compass: Simplifying and Extending Lucene to Provide Google-like Search
InfoQ: Compass: Integrate Search into your apps (Chinese edition)
Compass Guide (in Chinese)
http://www.kimchy.org/