Implementing Custom Sorting in Lucene

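When you search content with Lucene, the order in which results are displayed naturally matters. The built-in sort orders that Lucene provides often do not fit what an application needs. To match your own application's scenario you have to define the sort yourself, so this section looks at how to implement custom sorting in Lucene.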

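Custom sorting in Lucene works much like custom sorting of Java collections: in both cases you implement a comparison interface. In plain Java, implementing the Comparable interface is enough. In Lucene you implement two interfaces, SortComparatorSource and ScoreDocComparator. Before turning to a concrete implementation, let's look at how these two interfaces are defined.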
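For comparison, the plain-Java side of this analogy might look like the sketch below. It is not Lucene code: the `Place` class, its fields, and `PlaceSortDemo` are hypothetical names made up for illustration. `Place` implements Comparable to define a natural order, and a separate Comparator supplies an alternative order from outside the class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class used only for this illustration.
class Place implements Comparable<Place> {
    final String name;
    final int rating;

    Place(String name, int rating) {
        this.name = name;
        this.rating = rating;
    }

    // Natural order: alphabetical by name.
    @Override
    public int compareTo(Place other) {
        return name.compareTo(other.name);
    }
}

class PlaceSortDemo {
    // Alternative order defined outside the class: highest rating first.
    static final Comparator<Place> BY_RATING_DESC = new Comparator<Place>() {
        @Override
        public int compare(Place a, Place b) {
            return Integer.compare(b.rating, a.rating);
        }
    };

    // Returns the names in sorted order; a null comparator means natural order.
    static List<String> sortedNames(List<Place> input, Comparator<Place> order) {
        List<Place> copy = new ArrayList<Place>(input);
        if (order == null) {
            Collections.sort(copy);
        } else {
            Collections.sort(copy, order);
        }
        List<String> names = new ArrayList<String>();
        for (Place p : copy) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place("b", 1), new Place("a", 2), new Place("c", 3));
        System.out.println(sortedNames(places, null));
        System.out.println(sortedNames(places, BY_RATING_DESC));
    }
}
```

The Lucene interfaces split the same idea in two: one object creates the comparator for a given index and field, and the other does the comparing.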

SortComparatorSource returns a comparator that is used to sort ScoreDocs (in the words of the Javadoc, "Expert: returns a comparator for sorting ScoreDocs"). The interface defines a single method:

Java code
  /**
   * Creates a comparator for the field in the given index.
   * @param reader - Index to create comparator for.
   * @param fieldname - Field to create comparator for.
   * @return Comparator of ScoreDoc objects.
   * @throws IOException - If an error occurs reading the index.
   */
  public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException;

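This method only creates the ScoreDocComparator instance that performs the sorting, so we also have to implement the ScoreDocComparator interface. Its job is to compare two ScoreDoc objects for sorting ("Compares two ScoreDoc objects for sorting"). The interface declares two static instances that Lucene itself implements: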

Java code
  // Special comparator for sorting hits according to computed relevance (document score).
  public static final ScoreDocComparator RELEVANCE;
  // Special comparator for sorting hits according to index order (document number).
  public static final ScoreDocComparator INDEXORDER;

It also declares three sorting-related methods that we must implement:

Java code
  /**
   * Compares two ScoreDoc objects and returns a result indicating their sort order.
   * @param i First ScoreDoc
   * @param j Second ScoreDoc
   * @return -1 if i should come before j;
   *         1 if i should come after j;
   *         0 if they are equal
   */
  public int compare(ScoreDoc i, ScoreDoc j);

  /**
   * Returns the value used to sort the given document. The object returned must
   * implement the java.io.Serializable interface. This is used by multisearchers
   * to determine how to collate results from their searchers.
   * @param i Document
   * @return Serializable object
   */
  public Comparable sortValue(ScoreDoc i);

  /**
   * Returns the type of sort. Should return SortField.SCORE, SortField.DOC,
   * SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM.
   * It is not valid to return SortField.AUTO. This is used by multisearchers
   * to determine how to collate results from their searchers.
   * @return One of the constants in SortField.
   */
  public int sortType();

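Let's look at an example.

The example is an implementation from Lucene in Action that finds the names of the restaurants nearest to you. Each restaurant's coordinates are stored as a string of the form "x,y".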

Java code
  package com.nikee.lucene;

  import java.io.IOException;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.index.TermEnum;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.ScoreDocComparator;
  import org.apache.lucene.search.SortComparatorSource;
  import org.apache.lucene.search.SortField;

  // Finds the restaurants nearest to you. Restaurant coordinates are stored as the string "x,y".
  // DistanceComparatorSource implements the SortComparatorSource interface.
  public class DistanceComparatorSource implements SortComparatorSource {
      private static final long serialVersionUID = 1L;

      // x and y hold the reference point that distances are measured from
      private int x;
      private int y;

      public DistanceComparatorSource(int x, int y) {
          this.x = x;
          this.y = y;
      }

      // Returns the ScoreDocComparator that performs the sorting
      public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
          return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
      }

      // DistanceScoreDocLookupComparator implements ScoreDocComparator and does the sorting
      private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
          private float[] distances; // distance from each restaurant to the reference point, indexed by document number

          // The constructor does almost all of the preparation work.
          public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
              System.out.println("fieldName2 = " + fieldname);
              final TermEnum enumerator = reader.terms(new Term(fieldname, ""));
              System.out.println("maxDoc = " + reader.maxDoc());
              distances = new float[reader.maxDoc()]; // one slot per document
              if (distances.length > 0) {
                  TermDocs termDocs = reader.termDocs();
                  try {
                      if (enumerator.term() == null) {
                          throw new RuntimeException("no terms in field " + fieldname);
                      }
                      int i = 0, j = 0;
                      do {
                          System.out.println("in do-while: " + i++);
                          Term term = enumerator.term(); // current term
                          // Terms are enumerated field by field, so the first term that belongs
                          // to another field means this field is finished. equals() is used
                          // so the check does not depend on the field name being interned.
                          if (!fieldname.equals(term.field()))
                              break;
                          // Sets this to the data for the current term in a TermEnum.
                          // This may be optimized in some implementations.
                          termDocs.seek(enumerator); // see the TermDocs Javadoc
                          while (termDocs.next()) {
                              System.out.println("in while: " + j++);
                              System.out.println("in while, Term: " + term.toString());
                              String[] xy = term.text().split(","); // extract x and y
                              int deltax = Integer.parseInt(xy[0]) - x;
                              int deltay = Integer.parseInt(xy[1]) - y;
                              // compute the distance
                              distances[termDocs.doc()] = (float) Math.sqrt(deltax * deltax + deltay * deltay);
                          }
                      } while (enumerator.next());
                  } finally {
                      termDocs.close();
                  }
              }
          }

          // With the constructor's preparation done, the comparison itself is simple
          public int compare(ScoreDoc i, ScoreDoc j) {
              if (distances[i.doc] < distances[j.doc])
                  return -1;
              if (distances[i.doc] > distances[j.doc])
                  return 1;
              return 0;
          }

          // Returns the distance as the sort value
          public Comparable sortValue(ScoreDoc i) {
              return new Float(distances[i.doc]);
          }

          // Declares the sort type
          public int sortType() {
              return SortField.FLOAT;
          }
      }

      public String toString() {
          return "Distance from (" + x + "," + y + ")";
      }
  }
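The arithmetic at the heart of the constructor can be tried without an index. The following is a minimal sketch in plain Java, not Lucene code; the class name `DistanceSketch` and the method `distance` are hypothetical names introduced here. It parses a location stored as "x,y" and computes the Euclidean distance to a reference point, the same calculation the constructor performs for each term.

```java
class DistanceSketch {
    // Parses a location stored as "x,y" and returns its Euclidean distance
    // from the reference point (refX, refY).
    static float distance(String location, int refX, int refY) {
        String[] xy = location.split(",");
        int deltaX = Integer.parseInt(xy[0].trim()) - refX;
        int deltaY = Integer.parseInt(xy[1].trim()) - refY;
        return (float) Math.sqrt(deltaX * deltaX + deltaY * deltaY);
    }

    public static void main(String[] args) {
        // Distance from (10,10) to the point stored as "9,6": sqrt(1 + 16) = sqrt(17)
        System.out.println(distance("9,6", 10, 10));
        // Distance from the origin to the point stored as "1,2": sqrt(1 + 4) = sqrt(5)
        System.out.println(distance("1,2", 0, 0));
    }
}
```

Doing this once per document when the comparator is constructed means each later comparison is only two array lookups.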


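These two classes implement the two interfaces described above, and the comments explain each step. As you can see, custom sorting is not especially hard. To check that the implementation is correct, let's see whether the test code passes.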

Java code
  package com.nikee.lucene.test;

  import java.io.IOException;

  import junit.framework.TestCase;

  import org.apache.lucene.analysis.WhitespaceAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.FieldDoc;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.ScoreDoc;
  import org.apache.lucene.search.Sort;
  import org.apache.lucene.search.SortField;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.TopFieldDocs;
  import org.apache.lucene.store.RAMDirectory;

  import com.nikee.lucene.DistanceComparatorSource;

  public class DistanceComparatorSourceTest extends TestCase {
      private RAMDirectory directory;
      private IndexSearcher searcher;
      private Query query;

      // Builds the test fixture: an in-memory index with four restaurants
      protected void setUp() throws Exception {
          directory = new RAMDirectory();
          IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
          addPoint(writer, "El Charro", "restaurant", 1, 2);
          addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
          addPoint(writer, "Los Betos", "restaurant", 9, 6);
          addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);
          writer.close();
          searcher = new IndexSearcher(directory);
          query = new TermQuery(new Term("type", "restaurant"));
      }

      private void addPoint(IndexWriter writer, String name, String type, int x, int y) throws IOException {
          Document doc = new Document();
          doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
          doc.add(new Field("type", type, Field.Store.YES, Field.Index.TOKENIZED));
          // The location is indexed as a single untokenized term of the form "x,y"
          doc.add(new Field("location", x + "," + y, Field.Store.YES, Field.Index.UN_TOKENIZED));
          writer.addDocument(doc);
      }

      public void testNearestRestaurantToHome() throws Exception {
          // Build a SortField from a DistanceComparatorSource; home is at (0,0)
          Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(0, 0)));
          Hits hits = searcher.search(query, sort); // run the search
          // Check the closest and the furthest restaurant
          assertEquals("closest", "El Charro", hits.doc(0).get("name"));
          assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
      }

      public void testNearestRestaurantToWork() throws Exception {
          Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(10, 10))); // work is at (10,10)
          // The previous test exercises the custom sort but cannot see the values it
          // sorted by. TopFieldDocs gives access to that information.
          TopFieldDocs docs = searcher.search(query, null, 3, sort);
          assertEquals(4, docs.totalHits);
          assertEquals(3, docs.scoreDocs.length);
          // FieldDoc exposes the sort values; see the FieldDoc Javadoc
          FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
          assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)), fieldDoc.fields[0]);
          Document document = searcher.doc(fieldDoc.doc);
          assertEquals("Los Betos", document.get("name"));
          dumpDocs(sort, docs); // print the sort details
      }

      // Prints information about the sort
      private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
          System.out.println("Sorted by: " + sort);
          ScoreDoc[] scoreDocs = docs.scoreDocs;
          for (int i = 0; i < scoreDocs.length; i++) {
              FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
              Float distance = (Float) fieldDoc.fields[0];
              Document doc = searcher.doc(fieldDoc.doc);
              System.out.println("   " + doc.get("name") + " @ (" + doc.get("location") + ") -> " + distance);
          }
      }
  }
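The orderings these two tests expect can also be checked with a small stand-alone program. The sketch below uses no Lucene classes and only mimics the approach: it precomputes one distance per document number, as the comparator's constructor does, and then sorts document numbers by looking values up in that array. `LookupSortSketch`, `NAMES`, `POINTS`, and `namesByDistance` are hypothetical names for this illustration, and `Float.compare` stands in for the explicit -1/1/0 comparison.

```java
import java.util.Arrays;
import java.util.Comparator;

class LookupSortSketch {
    // The four restaurants from the test, in insertion order (document numbers 0..3).
    static final String[] NAMES = {"El Charro", "Cafe Poca Cosa", "Los Betos", "Nico's Taco Shop"};
    static final int[][] POINTS = {{1, 2}, {5, 9}, {9, 6}, {3, 8}};

    // Returns the restaurant names ordered from nearest to furthest from (refX, refY).
    static String[] namesByDistance(int refX, int refY) {
        // Preparation step: one distance per document number.
        final float[] distances = new float[POINTS.length];
        for (int doc = 0; doc < POINTS.length; doc++) {
            int dx = POINTS[doc][0] - refX;
            int dy = POINTS[doc][1] - refY;
            distances[doc] = (float) Math.sqrt(dx * dx + dy * dy);
        }

        Integer[] docs = new Integer[POINTS.length];
        for (int doc = 0; doc < docs.length; doc++) {
            docs[doc] = doc;
        }

        // Comparison step: look up the precomputed distances by document number.
        Arrays.sort(docs, new Comparator<Integer>() {
            @Override
            public int compare(Integer i, Integer j) {
                return Float.compare(distances[i], distances[j]);
            }
        });

        String[] ordered = new String[docs.length];
        for (int k = 0; k < docs.length; k++) {
            ordered[k] = NAMES[docs[k]];
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(namesByDistance(0, 0)));   // home
        System.out.println(Arrays.toString(namesByDistance(10, 10))); // work
    }
}
```

From (0,0) the squared distances are 5, 106, 117 and 73, so El Charro is nearest and Los Betos furthest; from (10,10) they are 145, 26, 17 and 53, so Los Betos comes first, which is what the assertions check.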
