Real-Time Full-Text Search with Sphinx

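Sphinx 0.9.9 and earlier do not natively support real-time indexes. The usual workaround is a main index plus a delta index, which gives "near-real-time" indexing. The latest 1.10.1 (in trunk, not yet released) finally supports real-time indexes. Following the documentation in SVN, it is easy to build an on-demand index full-text search system with Sphinx.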

Reference: http://filiptepper.com/2010/05/27/real-time-indexing-and-searching-with-sphinx-1-10-1-dev.html

First, check out the latest code from the sphinxsearch SVN repository, then build and install it:

svn checkout http://sphinxsearch.googlecode.com/svn/trunk sphinx
cd sphinx/
./configure --prefix=/path/to/sphinx
make
make install

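If the build succeeds, create a sphinx.conf configuration file in the etc directory under the Sphinx install path. Be sure to include the character-set settings for Chinese; otherwise Chinese text will not be searched correctly: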

index rt {
    # Set the index type to real-time index
    type = rt
    # Use UTF-8 encoding
    charset_type  = utf-8
    # Character table for UTF-8
    charset_table  = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    # Unigram tokenization (one character per token)
    ngram_len = 1
    # Characters to be split into n-grams
    ngram_chars   = U+3000..U+2FA1F
    # Where the index files are stored
    path = /path/to/sphinx/data/rt
    # Full-text field
    rt_field = message
    # Attribute
    rt_attr_uint = message_id
}
 
searchd {
    log = /path/to/sphinx/log/searchd.log
    query_log = /path/to/sphinx/log/query.log
    pid_file = /path/to/sphinx/log/searchd.pid
    workers = threads
    # Sphinx emulates the MySQL interface, so no real MySQL server is needed; mysql41 means the MySQL 4.1 to MySQL 5.1 protocol is supported
    listen = 127.0.0.1:9527:mysql41
}
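The effect of ngram_len = 1 together with ngram_chars can be pictured with a rough Python sketch. It is only a simplified model of the settings above, not Sphinx's actual tokenizer, and the names tokenize and matches are hypothetical helpers introduced for illustration. Characters inside the ngram_chars range become single-character tokens, while characters listed in charset_table are grouped into lowercased words.

```python
# Illustrative only: a simplified model of the tokenizer settings above,
# not Sphinx's actual implementation.

NGRAM_LO, NGRAM_HI = 0x3000, 0x2FA1F  # mirrors ngram_chars = U+3000..U+2FA1F


def is_ngram_char(ch):
    """True if ch falls in the range that is indexed one character at a time."""
    return NGRAM_LO <= ord(ch) <= NGRAM_HI


def is_word_char(ch):
    """True if ch is listed in charset_table (digits, Latin, '_', Cyrillic)."""
    return (
        "0" <= ch <= "9"
        or "a" <= ch <= "z"
        or "A" <= ch <= "Z"
        or ch == "_"
        or 0x410 <= ord(ch) <= 0x44F
    )


def tokenize(text):
    """Split text into lowercased words and single-character n-gram tokens."""
    tokens, word = [], []
    for ch in text:
        if is_ngram_char(ch):
            if word:  # an n-gram character ends the current word
                tokens.append("".join(word))
                word = []
            tokens.append(ch)
        elif is_word_char(ch):
            word.append(ch.lower())  # case folding, as in A..Z->a..z
        elif word:  # any other character acts as a separator
            tokens.append("".join(word))
            word = []
    if word:
        tokens.append("".join(word))
    return tokens


def matches(query, document):
    """True if every query token also occurs in the document."""
    doc_tokens = set(tokenize(document))
    return all(tok in doc_tokens for tok in tokenize(query))
```

Under this model a query for a single Chinese character can only match documents that contain exactly that character, which is the behavior unigram indexing is meant to provide.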

Start the Sphinx service:

/path/to/sphinx/bin/searchd --config /path/to/sphinx/etc/sphinx.conf

Insert a few rows to try it out:

ubuntu:chaoqun ~:mysql -h127.0.0.1 -P9527
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 1.10.1-dev (r2351)

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> INSERT INTO rt VALUES (1, 'this message has a body', 1);
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO rt VALUES (2, '测试中文OK', 2);
Query OK, 1 row affected (0.00 sec)

mysql>
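Because searchd speaks the MySQL protocol, the same inserts could in principle be issued from application code. The sketch below assumes a Python DB-API connection object created by some MySQL driver that substitutes %s parameters on the client side and is pointed at 127.0.0.1:9527; that driver, and the helper names build_insert and insert_message, are assumptions and not part of the original setup. It has not been verified against this development build, and depending on the driver's autocommit setting an explicit conn.commit() may also be needed.

```python
# Hypothetical helpers: the statement mirrors the INSERT commands shown above.
INSERT_SQL = "INSERT INTO rt VALUES (%s, %s, %s)"


def build_insert(doc_id, message, message_id):
    """Return the parameterized statement and its arguments for one row."""
    return INSERT_SQL, (int(doc_id), str(message), int(message_id))


def insert_message(conn, doc_id, message, message_id):
    """Insert one row into the rt index through a DB-API connection.

    conn is assumed to be a MySQL-protocol connection to searchd.
    """
    sql, params = build_insert(doc_id, message, message_id)
    cur = conn.cursor()
    try:
        # The driver quotes the message text, so it is never pasted into SQL by hand.
        cur.execute(sql, params)
    finally:
        cur.close()
```

Passing the message as a parameter rather than formatting it into the statement keeps quotes inside the text from breaking the query.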

Test full-text search:

mysql> SELECT * FROM rt WHERE MATCH('message');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    1 |   1643 |          1 |
+------+--------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM rt WHERE MATCH('OK');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.01 sec)

mysql> SELECT * FROM rt WHERE MATCH('中');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM rt WHERE MATCH('我');
Empty set (0.00 sec)

mysql>
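When the search text comes from users, characters that have a meaning in Sphinx's query syntax may need escaping before they reach MATCH(). The sketch below is modeled on the EscapeString helper found in the official Sphinx API clients; the exact set of special characters for this development build is an assumption, and escape_match and search are hypothetical names. As in the insert case, conn is assumed to be a DB-API connection to searchd from a MySQL driver that handles %s parameters.

```python
import re

# Assumed set of query-syntax characters, following the API clients' EscapeString.
_SPECIAL = re.compile(r'([\\()|\-!@~"&/^$=])')


def escape_match(text):
    """Backslash-escape characters that act as operators in a MATCH() query."""
    return _SPECIAL.sub(r"\\\1", text)


def search(conn, text):
    """Run a full-text query against the rt index and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM rt WHERE MATCH(%s)", (escape_match(text),))
        return cur.fetchall()
    finally:
        cur.close()
```

Plain words and Chinese characters pass through unchanged, so search(conn, '中') would send the same query as the session above.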

Simple and convenient. That's all there is to it.

