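InnoDB tables (before MySQL 5.6) do not support full-text indexes, so fuzzy matching is slow: a LIKE query over a few hundred thousand rows typically takes minutes. Sphinx solves this problem well and can answer the same query in a fraction of a second. The steps below show how to add Sphinx support to an InnoDB table.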
I. Installing Sphinx
Official download page: http://sphinxsearch.com/downloads/archive/
coreseek 4.1 requires Sphinx 2.0.2, and my operating system is CentOS 5.4, so I chose the RHEL/CentOS 5.x.x86_64 RPM 2.0.2-beta package. After downloading it, run:
rpm -ivh sphinx-2.0.2-1.el5.x86_64.rpm
Sphinx is now installed.
II. Installing mmseg
Download the coreseek 4.1 source package: http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
tar -zxvf coreseek-4.1-beta.tar.gz
cd coreseek-4.1-beta
cd mmseg-3.2.14
./configure --prefix=/usr/local/mmseg
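If configure fails with the error "config.status: error: cannot find input file: src/Makefile.in", run the following commands to regenerate the build files and then configure again: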
aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
./configure --prefix=/usr/local/mmseg
make && make install
mmseg is now installed.
III. Installing csft
The unpacked coreseek 4.1 source also contains a csft-4.1 directory next to mmseg-3.2.14:
cd ../csft-4.1
sh buildconf.sh
./configure --prefix=/usr/local/coreseek --with-mysql=/usr/local/mysql/ --with-mmseg=/usr/local/mmseg/ --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/
make && make install
IV. Installing the PHP sphinx client extension
Download: http://pecl.php.net/get/sphinx-1.3.0.tgz
tar -xvf sphinx-1.3.0.tgz
cd sphinx-1.3.0
/usr/local/php/bin/phpize
./configure --with-php-config=/usr/local/php/bin/php-config
If configure fails with "configure error: cannot find libsphinxclient headers", build and install libsphinxclient from the coreseek source first:
cd ~/coreseek-4.1-beta/csft-4.1/api/libsphinxclient
./configure
make && make install
Then go back to the sphinx-1.3.0 directory and configure again:
./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx
make && make install
When the build finishes it produces a sphinx.so file. Enable it by adding the line extension=sphinx.so to php.ini, then restart PHP-FPM:
service php-fpm restart
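One way to confirm that the extension was actually loaded is to look for it in the module list printed by php -m. The helper below is only a sketch: module_listed is a name made up for this example, and the usage line assumes the PHP binary lives under /usr/local/php/bin, as in the phpize step.

```shell
# Hypothetical helper: succeeds if the module name given as $1 appears
# as a whole line (case-insensitively) in the list read from stdin.
module_listed() {
    grep -F -i -x -q -- "$1"
}

# Usage, assuming the PHP path used above:
#   /usr/local/php/bin/php -m | module_listed sphinx && echo "sphinx extension loaded"
```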
The environment is now set up. The next step is to build a Sphinx index from the MySQL table so that it can be searched.
V. Configuring coreseek
cd /usr/local/coreseek/etc
cp sphinx-min.conf.dist csft.conf
coreseek loads csft.conf by default. Edit csft.conf as follows:
source src
{
type = mysql
sql_host = host
sql_user = user
sql_pass = pass
sql_db = db
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
# the first column of sql_query must be the unique integer document ID
sql_query = SELECT id,title,content FROM article
sql_ranged_throttle = 0
}
index idx
{
source = src
path = /usr/local/coreseek/var/data/test
docinfo = extern
mlock = 0
morphology = none
min_word_len = 1
charset_type = zh_cn.utf-8
charset_dictpath = /usr/local/coreseek/dict # very important: Chinese search does not work without it
html_strip = 0
}
indexer
{
mem_limit = 1024M
}
searchd
{
listen = 3312
listen = 9306:mysql41
log = /usr/local/coreseek/var/log/searchd.log
query_log = /usr/local/coreseek/var/log/query.log
read_timeout = 5
client_timeout = 300
max_children = 30
pid_file = /usr/local/coreseek/var/log/searchd.pid
max_matches = 10000
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
mva_updates_pool = 1M
max_packet_size = 8M
max_filters = 256
max_filter_values = 4096
max_batch_queries = 32
workers = threads # for RT to work
}
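Since the PHP client has to connect to one of the listen addresses defined in the searchd section, it can help to print them straight from the config file. This is a rough sketch: list_listen is a hypothetical helper, and it simply matches lines of the form "listen = value" without understanding config sections.

```shell
# Hypothetical helper: print the value of every "listen = ..." line.
list_listen() {
    # $1: path to the config file, e.g. /usr/local/coreseek/etc/csft.conf
    sed -n 's/^[[:space:]]*listen[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1"
}

# Usage with the config path from this article:
#   list_listen /usr/local/coreseek/etc/csft.conf
```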
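VI. Setting up the Chinese dictionary
charset_type = zh_cn.utf-8
charset_dictpath = /usr/local/coreseek/dict # very important: Chinese search does not work without it
The two settings above depend on the Chinese dictionary being in place; without it, Chinese queries will not match. Copy uni.lib and mmseg.ini from the etc directory under the mmseg install prefix (/usr/local/mmseg/etc) into /usr/local/coreseek/dict.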
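The copy step might look like the following. install_dict is a hypothetical helper written for this example; the two directories in the usage line are the ones used in this article, and the function stops with an error if either file is missing.

```shell
# Hypothetical helper: copy the mmseg dictionary files into charset_dictpath.
install_dict() {
    src=$1   # mmseg etc directory, e.g. /usr/local/mmseg/etc
    dst=$2   # value of charset_dictpath, e.g. /usr/local/coreseek/dict
    for f in uni.lib mmseg.ini; do
        if [ ! -f "$src/$f" ]; then
            echo "missing $src/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dst" && cp "$src/uni.lib" "$src/mmseg.ini" "$dst/"
}

# Usage with the paths from this article:
#   install_dict /usr/local/mmseg/etc /usr/local/coreseek/dict
```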
VII. Starting the search service
Now the index can be built and the search service started.
1. Build the index:
cd /usr/local/coreseek/bin
./indexer --all
With a large data set this step can take a while.
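When the index is rebuilt later while searchd is already running, indexer cannot overwrite the index files in place; the --rotate flag makes it build new files and signal searchd to swap them in. Here is a small sketch that picks the invocation based on the pid_file from the searchd section (indexer_command is a hypothetical name, and the kill -0 check assumes the current user is allowed to signal the searchd process):

```shell
# Hypothetical helper: print the indexer command suited to the current state.
indexer_command() {
    pid_file=$1
    # A readable pid file whose process is alive means searchd is running.
    if [ -r "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null; then
        echo "./indexer --all --rotate"
    else
        echo "./indexer --all"
    fi
}

# Usage with the pid_file configured above:
#   indexer_command /usr/local/coreseek/var/log/searchd.pid
```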
2. Start the search service:
cd /usr/local/coreseek/bin
./searchd
The table now has a fast Chinese search service. To test it (the query 小说 means "novel"), run: ./search 小说
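Because the searchd section also listens on 9306 with the mysql41 protocol, the same index can be queried with SphinxQL from an ordinary mysql client. The sketch below only builds the statement. sphinxql_match is a hypothetical helper, idx is the index name from csft.conf, and having a mysql client installed on the machine is an assumption.

```shell
# Hypothetical helper: build a SphinxQL full-text query for one index.
sphinxql_match() {
    index=$1
    keyword=$2
    limit=${3:-10}
    # Escape backslashes first, then single quotes, for the quoted MATCH() string.
    escaped=$(printf '%s' "$keyword" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
    printf "SELECT * FROM %s WHERE MATCH('%s') LIMIT %s\n" "$index" "$escaped" "$limit"
}

# Usage against the 9306 listener defined in csft.conf:
#   mysql -h 127.0.0.1 -P 9306 -e "$(sphinxql_match idx 小说)"
```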
VIII. Calling it from PHP
Here is a simple program that calls the search service from PHP:
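<?php
// Connect to the searchd instance configured above (listen = 3312).
$s = new SphinxClient();
$s->setServer("127.0.0.1", 3312);
$s->setMatchMode(SPH_MATCH_ANY);   // match documents containing any of the query words
$s->setMaxQueryTime(5000);         // give up after 5000 ms
$s->setLimits(0, 10000, 10000);    // offset, limit, max_matches
$result = $s->query("小说");
if ($result === false) {
    // query() returns false on failure; print the reason
    echo $s->getLastError();
}
?>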
And that's it, everything works!