前阵子手贱,下载网上流传的某酒店2000w开房记录,顺手就给下载了。下载cvs格式,导入数据时好多失败,随后下载Sql-Server-2008-R2版本的记录,由于模糊查询非常慢,就开始改造mysql版本的(注:SQL-Server不熟而且太占内存了,4G的基本跑起来比较费力)。贴上SQL-Server的建表语句:
CREATE TABLE [dbo].[cdsgus]( [Name] [nvarchar](2000) NULL, [CardNo] [nvarchar](2000) NULL, [Descriot] [nvarchar](2000) NULL, [CtfTp] [nvarchar](2000) NULL, [CtfId] [nvarchar](2000) NULL, [Gender] [nvarchar](2000) NULL, [Birthday] [nvarchar](2000) NULL, [Address] [nvarchar](2000) NULL, [Zip] [nvarchar](2000) NULL, [Dirty] [nvarchar](2000) NULL, [District1] [nvarchar](2000) NULL, [District2] [nvarchar](2000) NULL, [District3] [nvarchar](2000) NULL, [District4] [nvarchar](2000) NULL, [District5] [nvarchar](2000) NULL, [District6] [nvarchar](2000) NULL, [FirstNm] [nvarchar](2000) NULL, [LastNm] [nvarchar](2000) NULL, [Duty] [nvarchar](2000) NULL, [Mobile] [nvarchar](2000) NULL, [Tel] [nvarchar](2000) NULL, [Fax] [nvarchar](2000) NULL, [EMail] [nvarchar](2000) NULL, [Nation] [nvarchar](2000) NULL, [Taste] [nvarchar](2000) NULL, [Education] [nvarchar](2000) NULL, [Company] [nvarchar](2000) NULL, [CTel] [nvarchar](2000) NULL, [CAddress] [nvarchar](2000) NULL, [CZip] [nvarchar](2000) NULL, [Family] [nvarchar](2000) NULL, [Version] [nvarchar](2000) NULL, [id] [int] IDENTITY(1,1) NOT NULL, CONSTRAINT [PK_cdsgus] PRIMARY KEY CLUSTERED ([id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]
从建表语句看,这表建得实在是不太好!不管,先看看数据有多少:
select COUNT(*) FROM [shifenzheng].[dbo].[cdsgus]; # 查询非常慢的(5分钟左右)
mysql> show create table customer; +--------------------------+ | Table | Create Table +--------------------------+ | customer | CREATE TABLE `customer` ( `Name` varchar(80) NOT NULL, `CardNo` varchar(10) DEFAULT NULL, `Descriot` varchar(100) DEFAULT NULL, `CtfTp` varchar(4) DEFAULT NULL, `CtfId` varchar(40) DEFAULT NULL, `Gender` varchar(8) DEFAULT NULL, `Birthday` int(9) DEFAULT NULL, `Address` varchar(100) DEFAULT NULL, `Zip` int(10) DEFAULT NULL, `Dirty` varchar(20) DEFAULT NULL, `District1` varchar(6) DEFAULT NULL, `District2` varchar(4) DEFAULT NULL, `District3` varchar(6) DEFAULT NULL, `District4` varchar(6) DEFAULT NULL, `District5` varchar(8) DEFAULT NULL, `District6` varchar(20) DEFAULT NULL, `FirstNm` varchar(50) DEFAULT NULL, `LastNm` varchar(20) DEFAULT NULL, `Duty` varchar(20) DEFAULT NULL, `Mobile` varchar(40) DEFAULT NULL, `Tel` varchar(40) DEFAULT NULL, `Fax` varchar(40) DEFAULT NULL, `EMail` varchar(60) DEFAULT NULL, `Nation` varchar(25) DEFAULT NULL, `Taste` varchar(100) DEFAULT NULL, `Education` varchar(20) DEFAULT NULL, `Company` varchar(80) DEFAULT NULL, `CTel` varchar(20) DEFAULT NULL, `CAddress` varchar(60) DEFAULT NULL, `CZip` int(8) DEFAULT NULL, `Family` int(10) DEFAULT NULL, `Version` datetime DEFAULT NULL, `id` int(11) NOT NULL, PRIMARY KEY (`id`), KEY `index_Name` (`Name`) USING BTREE, KEY `index_CtfId` (`CtfId`) USING BTREE ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | +--------------------------+ mysql> show create table test; +--------------------------+ | Table | Create Table +--------------------------+ | test | CREATE TABLE `test` ( `Name` varchar(80) NOT NULL, `CardNo` varchar(10) DEFAULT NULL, `Descriot` varchar(100) DEFAULT NULL, `CtfTp` varchar(4) DEFAULT NULL, `CtfId` varchar(40) DEFAULT NULL, `Gender` varchar(8) DEFAULT NULL, `Birthday` int(9) DEFAULT NULL, `Address` varchar(100) DEFAULT NULL, `Zip` int(10) DEFAULT NULL, `Dirty` varchar(20) DEFAULT NULL, `District1` varchar(6) DEFAULT NULL, `District2` varchar(4) DEFAULT NULL, `District3` varchar(6) DEFAULT NULL, `District4` varchar(6) DEFAULT NULL, `District5` varchar(8) DEFAULT NULL, `District6` varchar(20) DEFAULT NULL, `FirstNm` varchar(50) DEFAULT NULL, `LastNm` varchar(20) DEFAULT NULL, `Duty` varchar(20) DEFAULT NULL, `Mobile` varchar(40) DEFAULT NULL, `Tel` varchar(40) DEFAULT NULL, `Fax` varchar(40) DEFAULT NULL, `EMail` varchar(60) DEFAULT NULL, `Nation` varchar(25) DEFAULT NULL, `Taste` varchar(100) DEFAULT NULL, `Education` varchar(20) DEFAULT NULL, `Company` varchar(80) DEFAULT NULL, `CTel` varchar(20) DEFAULT NULL, `CAddress` varchar(60) DEFAULT NULL, `CZip` int(8) DEFAULT NULL, `Family` int(10) DEFAULT NULL, `Version` datetime DEFAULT NULL, `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 | +--------------------------+
表customer,test结构式一样的,数据也是一样的,只是customer多了两个索引:
KEY `index_Name` (`Name`) USING BTREE, KEY `index_CtfId` (`CtfId`) USING BTREE
终端编码统一设置成utf8:
mysql> show variables like '%char%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | +--------------------------+----------------------------+
先看看表空间的大小:
mysql> select table_name, data_length/(1024 * 1024), index_length/(1024 * 1024) from tables where table_schema='hotel'; +------------+---------------------------+----------------------------+ | table_name | data_length/(1024 * 1024) | index_length/(1024 * 1024) | +------------+---------------------------+----------------------------+ | customer | 2713.3337 | 566.7627 | | test | 2713.3337 | 196.2305 | +------------+---------------------------+----------------------------+ 2 rows in set (0.00 sec)多了两列的索引就多了370M左右。
条数\时间 | TABLE `test` | TABLE `customer` |
① | |
|
count(*) ... where name like '徐%' |
369670 (1 min 2.63 sec) |
369670 (0.45 sec) |
count(*) ... where name like 'xu%' |
1834 (1 min 2.32 sec) |
1834 (0.45 sec) |
name ... where name like '徐%' |
369670 (1 min 2.91 sec) |
369670 (0.30 sec) |
name ... where name like 'xu%' |
1834 (1 min 3.27 sec) |
1834 (0.15 sec) |
name ... where binary name like 'xu%' |
605 (1 min 2.89 sec) |
605 (7.62 sec) |
② | |
|
count(*) ... where name like '%徐' |
912(1 min 1.89 sec) |
912(7.45 sec) |
count(*) ... where name like '%xu' | 326(1 min 1.71 sec) |
326(7.25 sec) |
name ... where name like '%徐' |
912 (1 min 3.19 sec) |
912 (7.42 sec) |
name ... where name like '%xu' |
326 (1 min 3.69 sec) |
326 (7.48 sec) |
name ... where binary name like '%xu' |
117 (1 min 7.49 sec) |
117 (6.76 sec) |
③ | |
|
count(*) ... where name like '%徐%' |
373621(1 min 2.98 sec) |
373621(8.08 sec) |
count(*) ... where name like '%xu%' |
3347(1 min 3.26 sec) |
3347(7.98 sec) |
name ... where name like '%徐%' |
373621 (1 min 3.02 sec) |
373621 (7.49 sec) |
name ... where name like '%xu%' |
3347 (1 min 3.55 sec) |
3347 (7.29 sec) |
name ... where binary name like '%xu%' |
1059 (1 min 2.93 sec) |
1059 (6.94 sec) |
④ |
|
|
name,ctfid,id ... where name like '徐%' |
369670 (1 min 1.48 sec) |
369670 (大于40min) |
name,ctfid,id ... where name like 'xu%' |
1834 (1 min 1.40 sec) |
1834 (17.77, 0.02 sec) |
name,ctfid,id ... where name binary like 'xu%' |
605 (1 min 4.45 sec) |
605 (1min 12.59 sec) |
⑤ | |
|
name,ctfid,id ... where name like '%徐%' |
373621 (1 min 2.59 sec) |
373621 (1 min 2.18 sec) |
name,ctfid,id ... where name like '%xu%' |
3347 (1 min 5.52 sec) |
3347 (1min 1.77 sec) |
name,ctfid,id ... where binary name like '%xu%' |
1059 (1 min 2.83 sec) |
1059 (1min 0.38 sec) |
⑥ |
|
|
name ... where name = '徐' |
120 (1 min 2.70 sec) |
120 (0.16 sec) |
name ... where name = 'xu' |
38 (1 min 8.17 sec) |
38 (0.01 sec) |
⑦ | |
|
name,ctfid,id ... where name = '徐' |
120 (1 min 11.80 sec) |
120 (0.60 sec) |
name,ctfid,id ... where name = 'xu' |
38 (1 min 3.43 sec) |
38 (0.58 sec) |
⑧ |
|
|
name,ctfid,id ... where name like '%徐%' limit 1000 |
1000 (0.05 sec) |
1000 (0.06 sec) |
name,ctfid,id ... where name like '%徐%' limit 10000 |
10000 (0.40 sec) |
10000 (0.39 sec) |
name,ctfid,id ... where name like '%xu%' limit 1000 |
1000 (18.91 sec) |
1000 (19.28 sec) |
name,ctfid,id ... where name like '%xu%' limit 10000 |
10000 (1 min 1.74 sec) |
10000 (59.71 sec) |
⑨ |
|
|
name,ctfid,id ... where name like '%徐%' limit 1000, 1000 |
1000 (0.31 sec) |
1000 (0.31 sec) |
name,ctfid,id ... where name like '%徐%' limit 10000, 1000 |
1000 (1.48, 0.41 sec) |
1000 (1.52, 0.46 sec) |
name,ctfid,id ... where name like '%徐%' limit 100000, 1000 |
1000 (17.18, 17.01 sec) |
1000 (14.01, 13.61 sec) |
name,ctfid,id ... where name like '%xu%' limit 1000,1000 |
1000 (44.71 sec) |
1000 (41.81 sec) |
name,ctfid,id ... where name like '%xu%' limit 2000,1000 |
1000 (48.02 sec) |
1000 (49.56 sec) |
注:有索引的时候,模糊查询单列数据占有很大的优势,等于查询也是。对比①②发现,模糊查询还是开始字符不模糊很有优势的(基本可以了解索引的分组规则)。第④条,说明索引也是有问题的,大概46分钟的时间(可能是bug),第④的第二条说明索引查询过的数据会缓存起来,索引第二次查询明显很快。对比binary(区分大小写),明显更加耗时。第⑨条,说明越往后的数据,limit查询起来越费时。
limit关键字在数据量很多的性能瓶颈,其实就如这篇文章所说的:http://www.cnblogs.com/fjytzh/archive/2010/04/02/1702886.html,offset偏移比较大时,可以采用嵌套查询的方式来提高效率,但是目前我需要满足模糊查询的需要,这个方案有问题(确实测试了一下,不可行,比直接limit还要慢)。
试试http://grb12508.blog.163.com/blog/static/273784582009102448061/复合索引的方法(综上:如果对于有where 条件,又想走索引用limit的,必须设计一个索引,将where 放第一位,limit用到的主键放第2位,而且只能select 主键!),主要想通过name来查询id,故以name, id建复合索引(测试结果还可以)。
-- 添加复合索引 alter table test add index (name, id);
-- 查询的SQL语句(注意:这里如果用 id in (...)会很慢! http://codingstandards.iteye.com/blog/1344833 ) select name, ctfid, id from test t inner join (select id as i_id from test where name like '徐%' limit 20000, 2000) as i on t.id=i.i_id;
第一次查询比较慢,重复查询一下就很快(应该有缓存)。
CREATE TABLE `customer` ( `Name` varchar(80) NOT NULL, `CardNo` varchar(10) DEFAULT NULL, `Descriot` varchar(100) DEFAULT NULL, `CtfTp` varchar(4) DEFAULT NULL, `CtfId` varchar(40) DEFAULT NULL, `Gender` varchar(8) DEFAULT NULL, `Birthday` int(9) DEFAULT NULL, `Address` varchar(100) DEFAULT NULL, `Zip` int(10) DEFAULT NULL, `Dirty` varchar(20) DEFAULT NULL, `District1` varchar(6) DEFAULT NULL, `District2` varchar(4) DEFAULT NULL, `District3` varchar(6) DEFAULT NULL, `District4` varchar(6) DEFAULT NULL, `District5` varchar(8) DEFAULT NULL, `District6` varchar(20) DEFAULT NULL, `FirstNm` varchar(50) DEFAULT NULL, `LastNm` varchar(20) DEFAULT NULL, `Duty` varchar(20) DEFAULT NULL, `Mobile` varchar(40) DEFAULT NULL, `Tel` varchar(40) DEFAULT NULL, `Fax` varchar(40) DEFAULT NULL, `EMail` varchar(60) DEFAULT NULL, `Nation` varchar(25) DEFAULT NULL, `Taste` varchar(100) DEFAULT NULL, `Education` varchar(20) DEFAULT NULL, `Company` varchar(80) DEFAULT NULL, `CTel` varchar(20) DEFAULT NULL, `CAddress` varchar(60) DEFAULT NULL, `CZip` int(8) DEFAULT NULL, `Family` int(10) DEFAULT NULL, `Version` datetime DEFAULT NULL, `id` int(11) NOT NULL, PRIMARY KEY (`id`), KEY `index_Name_Id` (`Name`, `id`) USING BTREE, KEY `index_CtfId_Id` (`CtfId`, `id`) USING BTREE ) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;