Getting Started with Solr, Using Forum Post Search as an Example

Posted: May 27, 2009 | Category: Search | Tags: solr
Copyright: this article may be freely reposted, provided the repost links back to the original source given below.


Original source: http://blog.chenlb.com/2009/05/apache-solr-quick-start-and-demo.html

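A while ago I gave an introductory talk on building an Apache Solr application, and I am writing it up here for newcomers. The running example is searching forum posts.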

1. Download Apache Solr 1.3 from http://apache.etoak.com/lucene/solr/1.3.0/apache-solr-1.3.0.zip and unzip it to, for example, E:\apache-solr-1.3.0.

2. Download Apache Tomcat 6.0.18 from http://labs.xiaonei.com/apache-mirror/tomcat/tomcat-6/v6.0.18/bin/apache-tomcat-6.0.18.zip and unzip it to, for example, E:\apache-tomcat-6.0.18.

3. Install Solr into Tomcat. Edit E:\apache-tomcat-6.0.18\conf\server.xml and add URIEncoding="UTF-8" to the Connector for port 8080, so that the block reads:

<Connector port="8080" protocol="HTTP/1.1" 
           connectionTimeout="20000" 
           redirectPort="8443" URIEncoding="UTF-8"/> 
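
Why the URIEncoding attribute matters can be shown with a small Python sketch. As far as I know, Tomcat 6 decodes request URIs as ISO-8859-1 unless told otherwise, while browsers send non-ASCII query terms as percent-encoded UTF-8 bytes. The search term used here is only an illustration; it is not part of the Tomcat configuration.

```python
from urllib.parse import quote, unquote

term = "服务器"
# Percent-encode the term as UTF-8 bytes, the way a browser would send it.
encoded = quote(term)

# Decoding with the matching charset recovers the original term.
as_utf8 = unquote(encoded, encoding="utf-8")
# Decoding the same bytes as ISO-8859-1 yields one wrong character per byte.
as_latin1 = unquote(encoded, encoding="latin-1")

print(encoded)
print(as_utf8 == term)
print(len(term), len(as_latin1))
```

With the wrong decoding, the three-character term turns into nine unrelated characters, so a query for it would not match anything in the index.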

Save the following as E:\apache-tomcat-6.0.18\conf\Catalina\localhost\solr.xml; create the directory yourself if it does not exist.

<Context docBase="E:/apache-solr-1.3.0/dist/apache-solr-1.3.0.war" reloadable="true" > 
    <Environment name="solr/home" type="java.lang.String" value="E:/apache-solr-1.3.0/example/solr" override="true" /> 
</Context> 

For other ways to install Solr, see: solr install

4. Installation is now done. Start Tomcat and open http://localhost:8080/solr/admin/ to check the admin interface.

5. Design the index structure for the forum post search application:

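Field       Description
id          Post id
user        Username or user id of the poster
title       Title
content     Body
timestamp   Time of posting
text        Title and content are both copied here so they can be searched together.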

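6. Tell Solr about the index structure above: overwrite E:\apache-solr-1.3.0\example\solr\conf\schema.xml with the content below (back up the original file first so you can consult the official example later):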

<?xml version="1.0" encoding="UTF-8" ?> 
 
<schema name="example" version="1.1"> 
 
  <types> 
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> 
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/> 
 
    <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and  
         is a more restricted form of the canonical representation of dateTime  
         http://www.w3.org/TR/xmlschema-2/#dateTime  
         The trailing "Z" designates UTC time and is mandatory.  
         Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z  
         All other components are mandatory.  
 
         Expressions can also be used to denote calculations that should be  
         performed relative to "NOW" to determine the value, ie...  
 
               NOW/HOUR  
                  ... Round to the start of the current hour  
               NOW-1DAY  
                  ... Exactly 1 day prior to now  
               NOW/DAY+6MONTHS+3DAYS  
                  ... 6 months and 3 days in the future from the start of  
                      the current day  
 
         Consult the DateField javadocs for more information.  
      --> 
    <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/> 
 
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> 
      <analyzer> 
        <tokenizer class="solr.CJKTokenizerFactory"/> 
      </analyzer> 
    </fieldType> 
 
</types> 
 
<fields> 
   <field name="id" type="sint" indexed="true" stored="true" required="true" /> 
   <field name="user" type="string" indexed="true" stored="true"/> 
   <field name="title" type="text" indexed="true" stored="true"/> 
   <field name="content" type="text" indexed="true" stored="true" /> 
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/> 
 
   <!-- catchall field, containing all other searchable text fields (implemented  
        via copyField further on in this schema  --> 
   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/> 
</fields> 
 
<!-- Field to use to determine and enforce document uniqueness.  
      Unless this field is marked with required="false", it will be a required field  
   --> 
<uniqueKey>id</uniqueKey> 
 
<!-- field for the QueryParser to use when an explicit fieldname is absent --> 
<defaultSearchField>text</defaultSearchField> 
 
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" --> 
<solrQueryParser defaultOperator="AND"/> 
 
  <!-- copyField commands copy one field to another at the time a document  
        is added to the index.  It's used either to index the same field differently,  
        or to add multiple fields to the same field for easier/faster searching.  --> 
<!-- --> 
   <copyField source="title" dest="text"/> 
   <copyField source="content" dest="text"/> 
 
</schema> 

7. Restart Tomcat, then manually create two XML data files in E:\apache-solr-1.3.0\example\exampledocs, saved as demo-doc1.xml and demo-doc2.xml respectively:

<?xml version="1.0" encoding="UTF-8" ?> 
<add> 
    <doc> 
        <field name="id">1</field> 
        <field name="user">chenlb</field> 
        <field name="title">solr 应用演讲</field> 
        <field name="content">这一小节是讲提交数据给服务器做索引,这里有一些数据,如:服务器,可以试查找它。</field> 
    </doc> 
</add> 

<?xml version="1.0" encoding="UTF-8" ?> 
<add> 
    <doc> 
        <field name="id">2</field> 
        <field name="user">bory.chan</field> 
        <field name="title">搜索引擎</field> 
        <field name="content">搜索服务器那边有很多数据。</field> 
        <field name="timestamp">2009-02-18T00:00:00Z</field> 
    </doc> 
    <doc> 
        <field name="id">3</field> 
        <field name="user">other</field> 
        <field name="title">这是什么</field> 
        <field name="content">你喜欢什么运动?篮球?</field> 
        <field name="timestamp">2009-02-18T12:33:05.123Z</field> 
    </doc> 
</add> 
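
Files like these do not have to be written by hand. The following is a minimal sketch, assuming the posts are available as Python dictionaries keyed by the schema's field names; the helper name build_add_xml is made up for this illustration and is not part of Solr.

```python
import xml.etree.ElementTree as ET

def build_add_xml(posts):
    """Return a Solr <add> message as UTF-8 bytes for a list of post dicts.

    build_add_xml is a hypothetical helper for this sketch, not a Solr API.
    """
    add = ET.Element("add")
    for post in posts:
        doc = ET.SubElement(add, "doc")
        for name, value in post.items():
            # Each dict entry becomes <field name="...">value</field>.
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")

posts = [
    {"id": 2, "user": "bory.chan", "title": "搜索引擎",
     "content": "搜索服务器那边有很多数据。",
     "timestamp": "2009-02-18T00:00:00Z"},
]
xml_bytes = build_add_xml(posts)
print(xml_bytes.decode("utf-8"))
```

ElementTree takes care of escaping characters such as & and < in field values, which is easy to get wrong when the XML is assembled by string concatenation.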

8. Submit the data for indexing. Go to E:\apache-solr-1.3.0\example\exampledocs and run:

E:\apache-solr-1.3.0\example\exampledocs>java -Durl=http://localhost:8080/solr/update -Dcommit=yes -jar post.jar demo-doc*.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8080/solr/update..
SimplePostTool: POSTing file demo-doc1.xml
SimplePostTool: POSTing file demo-doc2.xml
SimplePostTool: COMMITting Solr index changes..

9. View the search results:

All documents: http://localhost:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
 
<lst name="responseHeader"> 
<int name="status">0</int> 
<int name="QTime">0</int> 
<lst name="params"> 
  <str name="indent">on</str> 
  <str name="start">0</str> 
  <str name="q">*:*</str> 
  <str name="rows">10</str> 
  <str name="version">2.2</str> 
</lst> 
</lst> 
<result name="response" numFound="3" start="0"> 
<doc> 
  <str name="content">这一小节是讲提交数据给服务器做索引,这里有一些数据,如:服务器,可以试查找它。</str> 
  <int name="id">1</int> 
  <date name="timestamp">2009-05-27T04:07:54.89Z</date> 
  <str name="title">solr 应用演讲</str> 
  <str name="user">chenlb</str> 
</doc> 
<doc> 
  <str name="content">搜索服务器那边有很多数据。</str> 
  <int name="id">2</int> 
  <date name="timestamp">2009-02-18T00:00:00Z</date> 
  <str name="title">搜索引擎</str> 
  <str name="user">bory.chan</str> 
</doc> 
<doc> 
  <str name="content">你喜欢什么运动?篮球?</str> 
  <int name="id">3</int> 
  <date name="timestamp">2009-02-18T12:33:05.123Z</date> 
  <str name="title">这是什么</str> 
  <str name="user">other</str> 
</doc> 
</result> 
</response> 
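
A client application would normally parse this XML rather than read it by eye. Here is a minimal sketch using only the Python standard library; the embedded sample is a trimmed copy of a response in this format, and parse_response is a hypothetical helper name, not something Solr provides.

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same shape as the response shown above.
SAMPLE = """<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<result name="response" numFound="1" start="0">
<doc>
  <str name="content">搜索服务器那边有很多数据。</str>
  <int name="id">2</int>
  <date name="timestamp">2009-02-18T00:00:00Z</date>
  <str name="title">搜索引擎</str>
  <str name="user">bory.chan</str>
</doc>
</result>
</response>"""

def parse_response(xml_text):
    """Return (numFound, list of field dicts) from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    num_found = int(result.get("numFound"))
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for el in doc:
            # The element tag carries the type; only <int> is converted here.
            value = int(el.text) if el.tag == "int" else el.text
            fields[el.get("name")] = value
        docs.append(fields)
    return num_found, docs

num_found, docs = parse_response(SAMPLE)
print(num_found, docs[0]["id"], docs[0]["user"])
```

Note that numFound is the total number of matches, which can be larger than the number of doc elements returned when rows limits the page size.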

Posts by the user bory.chan: http://localhost:8080/solr/select/?q=user%3Abory.chan&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
 
<lst name="responseHeader"> 
<int name="status">0</int> 
<int name="QTime">0</int> 
<lst name="params"> 
  <str name="indent">on</str> 
  <str name="start">0</str> 
  <str name="q">user:bory.chan</str> 
  <str name="rows">10</str> 
  <str name="version">2.2</str> 
</lst> 
</lst> 
<result name="response" numFound="1" start="0"> 
<doc> 
  <str name="content">搜索服务器那边有很多数据。</str> 
  <int name="id">2</int> 
  <date name="timestamp">2009-02-18T00:00:00Z</date> 
  <str name="title">搜索引擎</str> 
  <str name="user">bory.chan</str> 
</doc> 
</result> 
</response> 

Posts within a time range: http://localhost:8080/solr/select/?q=timestamp%3A%5B%222009-02-18T00%3A00%3A00Z%22+TO+%222009-02-19T00%3A00%3A00Z%22%5D&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?> 
<response> 
 
<lst name="responseHeader"> 
<int name="status">0</int> 
<int name="QTime">16</int> 
<lst name="params"> 
  <str name="indent">on</str> 
  <str name="start">0</str> 
  <str name="q">timestamp:["2009-02-18T00:00:00Z" TO "2009-02-19T00:00:00Z"]</str> 
  <str name="rows">10</str> 
  <str name="version">2.2</str> 
</lst> 
</lst> 
<result name="response" numFound="2" start="0"> 
<doc> 
  <str name="content">搜索服务器那边有很多数据。</str> 
  <int name="id">2</int> 
  <date name="timestamp">2009-02-18T00:00:00Z</date> 
  <str name="title">搜索引擎</str> 
  <str name="user">bory.chan</str> 
</doc> 
<doc> 
  <str name="content">你喜欢什么运动?篮球?</str> 
  <int name="id">3</int> 
  <date name="timestamp">2009-02-18T12:33:05.123Z</date> 
  <str name="title">这是什么</str> 
  <str name="user">other</str> 
</doc> 
</result> 
</response> 
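
The percent-encoded query URLs used in this step are tedious to type by hand. The sketch below builds them with the Python standard library; the base URL and parameters are taken from the examples above, while the helper names solr_date and select_url are made up for this illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8080/solr/select/"

def solr_date(dt):
    """Format an aware datetime as a Solr UTC date, e.g. 2009-02-18T00:00:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def select_url(q, start=0, rows=10):
    """Build a select URL with the same parameters as the examples above."""
    params = {"q": q, "version": "2.2", "start": start,
              "rows": rows, "indent": "on"}
    # urlencode percent-encodes ':' '[' ']' and '"', and turns spaces into '+'.
    return SELECT_URL + "?" + urlencode(params)

by_user = select_url("user:bory.chan")

range_start = solr_date(datetime(2009, 2, 18, tzinfo=timezone.utc))
range_end = solr_date(datetime(2009, 2, 19, tzinfo=timezone.utc))
by_time = select_url('timestamp:["%s" TO "%s"]' % (range_start, range_end))

print(by_user)
print(by_time)
```

Paging through a larger result set is then a matter of calling select_url with increasing start values.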

For the commonly used Solr query parameters, see: Solr query parameters explained

That completes the simple example. By default the index files end up under CWD/solr/data/index. To have them placed under solr.home/data instead, comment out dataDir in E:\apache-solr-1.3.0\example\solr\conf\solrconfig.xml, like this:

<!-- 
<dataDir>${solr.data.dir:./solr/data}</dataDir> 
--> 

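Note: the example above does not use a real Chinese word segmenter; it uses the stock CJK tokenizer. For an example that uses the mmseg4j Chinese segmenter, see: Using the mmseg4j Chinese word segmenter with Solr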
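
To give a feel for what the CJK tokenizer does: as I understand it, it does not look for real words but indexes every run of CJK characters as overlapping two-character tokens. The Python sketch below is a simplified approximation of that idea, not the Lucene implementation. It only recognizes the basic CJK Unified Ideographs range and ignores all non-CJK text, which the real tokenizer does handle.

```python
def is_cjk(ch):
    # Assumption for brevity: only the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"

def cjk_bigrams(text):
    """Emit overlapping two-character tokens for each run of CJK characters."""
    tokens, run = [], ""
    for ch in text + " ":  # the trailing space flushes the final run
        if is_cjk(ch):
            run += ch
            continue
        if len(run) == 1:
            tokens.append(run)  # a lone character is kept as its own token
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""
    return tokens

query_tokens = cjk_bigrams("服务器")
doc_tokens = cjk_bigrams("搜索服务器那边有很多数据。")
print(query_tokens)
print(set(query_tokens) <= set(doc_tokens))
```

Because both the query and the document are cut into the same bigrams, a search for 服务器 can match without any dictionary. The cost is a larger index and occasional false matches across word boundaries, which is where a dictionary-based segmenter such as mmseg4j can do better.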
