Mysql查询(基于某酒店2000w条数据)

1、仅仅是学习

前阵子手贱,下载网上流传的某酒店2000w开房记录,顺手就给下载了。下载cvs格式,导入数据时好多失败,随后下载Sql-Server-2008-R2版本的记录,由于模糊查询非常慢,就开始改造mysql版本的(注:SQL-Server不熟而且太占内存了,4G的基本跑起来比较费力)。贴上SQL-Server的建表语句:

Mysql查询(基于某酒店2000w条数据)

CREATE TABLE [dbo].[cdsgus](
    [Name] [nvarchar](2000) NULL,
    [CardNo] [nvarchar](2000) NULL,
    [Descriot] [nvarchar](2000) NULL,
    [CtfTp] [nvarchar](2000) NULL,
    [CtfId] [nvarchar](2000) NULL,
    [Gender] [nvarchar](2000) NULL,
    [Birthday] [nvarchar](2000) NULL,
    [Address] [nvarchar](2000) NULL,
    [Zip] [nvarchar](2000) NULL,
    [Dirty] [nvarchar](2000) NULL,
    [District1] [nvarchar](2000) NULL,
    [District2] [nvarchar](2000) NULL,
    [District3] [nvarchar](2000) NULL,
    [District4] [nvarchar](2000) NULL,
    [District5] [nvarchar](2000) NULL,
    [District6] [nvarchar](2000) NULL,
    [FirstNm] [nvarchar](2000) NULL,
    [LastNm] [nvarchar](2000) NULL,
    [Duty] [nvarchar](2000) NULL,
    [Mobile] [nvarchar](2000) NULL,
    [Tel] [nvarchar](2000) NULL,
    [Fax] [nvarchar](2000) NULL,
    [EMail] [nvarchar](2000) NULL,
    [Nation] [nvarchar](2000) NULL,
    [Taste] [nvarchar](2000) NULL,
    [Education] [nvarchar](2000) NULL,
    [Company] [nvarchar](2000) NULL,
    [CTel] [nvarchar](2000) NULL,
    [CAddress] [nvarchar](2000) NULL,
    [CZip] [nvarchar](2000) NULL,
    [Family] [nvarchar](2000) NULL,
    [Version] [nvarchar](2000) NULL,
    [id] [int] IDENTITY(1,1) NOT NULL,
    CONSTRAINT [PK_cdsgus] PRIMARY KEY CLUSTERED ([id] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]

从建表语句看,这表建得实在是不太好!不管,先看看数据有多少:

select COUNT(*) FROM [shifenzheng].[dbo].[cdsgus]; # 查询非常慢的(5分钟左右)

2、开始MySql之旅 

开始导入数据(开始mysql建表语句也和上面一样),然后几经折腾,最终确定表结构如下(内存不够,放弃ENGINE=MEMORY):
mysql> show create table customer;
+--------------------------+
| Table    | Create Table   
+--------------------------+
| customer | CREATE TABLE `customer` (
  `Name` varchar(80) NOT NULL,
  `CardNo` varchar(10) DEFAULT NULL,
  `Descriot` varchar(100) DEFAULT NULL,
  `CtfTp` varchar(4) DEFAULT NULL,
  `CtfId` varchar(40) DEFAULT NULL,
  `Gender` varchar(8) DEFAULT NULL,
  `Birthday` int(9) DEFAULT NULL,
  `Address` varchar(100) DEFAULT NULL,
  `Zip` int(10) DEFAULT NULL,
  `Dirty` varchar(20) DEFAULT NULL,
  `District1` varchar(6) DEFAULT NULL,
  `District2` varchar(4) DEFAULT NULL,
  `District3` varchar(6) DEFAULT NULL,
  `District4` varchar(6) DEFAULT NULL,
  `District5` varchar(8) DEFAULT NULL,
  `District6` varchar(20) DEFAULT NULL,
  `FirstNm` varchar(50) DEFAULT NULL,
  `LastNm` varchar(20) DEFAULT NULL,
  `Duty` varchar(20) DEFAULT NULL,
  `Mobile` varchar(40) DEFAULT NULL,
  `Tel` varchar(40) DEFAULT NULL,
  `Fax` varchar(40) DEFAULT NULL,
  `EMail` varchar(60) DEFAULT NULL,
  `Nation` varchar(25) DEFAULT NULL,
  `Taste` varchar(100) DEFAULT NULL,
  `Education` varchar(20) DEFAULT NULL,
  `Company` varchar(80) DEFAULT NULL,
  `CTel` varchar(20) DEFAULT NULL,
  `CAddress` varchar(60) DEFAULT NULL,
  `CZip` int(8) DEFAULT NULL,
  `Family` int(10) DEFAULT NULL,
  `Version` datetime DEFAULT NULL,
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_Name` (`Name`) USING BTREE,
  KEY `index_CtfId` (`CtfId`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
+--------------------------+

mysql> show create table test;
+--------------------------+
| Table    | Create Table   
+--------------------------+
| test | CREATE TABLE `test` (
  `Name` varchar(80) NOT NULL,
  `CardNo` varchar(10) DEFAULT NULL,
  `Descriot` varchar(100) DEFAULT NULL,
  `CtfTp` varchar(4) DEFAULT NULL,
  `CtfId` varchar(40) DEFAULT NULL,
  `Gender` varchar(8) DEFAULT NULL,
  `Birthday` int(9) DEFAULT NULL,
  `Address` varchar(100) DEFAULT NULL,
  `Zip` int(10) DEFAULT NULL,
  `Dirty` varchar(20) DEFAULT NULL,
  `District1` varchar(6) DEFAULT NULL,
  `District2` varchar(4) DEFAULT NULL,
  `District3` varchar(6) DEFAULT NULL,
  `District4` varchar(6) DEFAULT NULL,
  `District5` varchar(8) DEFAULT NULL,
  `District6` varchar(20) DEFAULT NULL,
  `FirstNm` varchar(50) DEFAULT NULL,
  `LastNm` varchar(20) DEFAULT NULL,
  `Duty` varchar(20) DEFAULT NULL,
  `Mobile` varchar(40) DEFAULT NULL,
  `Tel` varchar(40) DEFAULT NULL,
  `Fax` varchar(40) DEFAULT NULL,
  `EMail` varchar(60) DEFAULT NULL,
  `Nation` varchar(25) DEFAULT NULL,
  `Taste` varchar(100) DEFAULT NULL,
  `Education` varchar(20) DEFAULT NULL,
  `Company` varchar(80) DEFAULT NULL,
  `CTel` varchar(20) DEFAULT NULL,
  `CAddress` varchar(60) DEFAULT NULL,
  `CZip` int(8) DEFAULT NULL,
  `Family` int(10) DEFAULT NULL,
  `Version` datetime DEFAULT NULL,
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
+--------------------------+

表customer,test结构式一样的,数据也是一样的,只是customer多了两个索引:

KEY `index_Name` (`Name`) USING BTREE,
  KEY `index_CtfId` (`CtfId`) USING BTREE

终端编码统一设置成utf8:

mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

先看看表空间的大小:

mysql> select table_name, data_length/(1024 * 1024), index_length/(1024 * 1024) from tables where table_schema='hotel';
+------------+---------------------------+----------------------------+
| table_name | data_length/(1024 * 1024) | index_length/(1024 * 1024) |
+------------+---------------------------+----------------------------+
| customer   |                 2713.3337 |                   566.7627 |
| test       |                 2713.3337 |                   196.2305 |
+------------+---------------------------+----------------------------+
2 rows in set (0.00 sec)
多了两列的索引就多了370M左右。

3、MySql的测试数据:

条数\时间 TABLE `test` TABLE `customer`


count(*) ... where name like '徐%'
369670 (1 min 2.63 sec)
369670 (0.45 sec)
count(*) ... where name like 'xu%'           
1834 (1 min 2.32 sec)
 1834 (0.45 sec)
name ... where name like '徐%'   
369670 (1 min 2.91 sec)
369670 (0.30 sec)
name ... where name like 'xu%'
1834 (1 min 3.27 sec)  
1834 (0.15 sec)
name ... where binary name like 'xu%'
605 (1 min 2.89 sec)
 605 (7.62 sec)


count(*) ... where name like '%徐'
912(1 min 1.89 sec)
912(7.45 sec)
count(*) ... where name like '%xu'  326(1 min 1.71 sec)
 326(7.25 sec)
name ... where name like '%徐'
912 (1 min 3.19 sec)
 912 (7.42 sec)
name ... where name like '%xu' 
 326 (1 min 3.69 sec)  
326 (7.48 sec)
name ... where binary name like '%xu'
117 (1 min 7.49 sec) 
 117 (6.76 sec)


count(*) ... where name like '%徐%'
373621(1 min 2.98 sec)
 373621(8.08 sec)
count(*) ... where name like '%xu%'
 3347(1 min 3.26 sec)
3347(7.98 sec)  
name ... where name like '%徐%'
373621 (1 min 3.02 sec)
373621 (7.49 sec)
name ... where name like '%xu%'
 3347 (1 min 3.55 sec)
3347 (7.29 sec)
name ... where binary name like '%xu%'
1059 (1 min 2.93 sec) 
1059 (6.94 sec)



name,ctfid,id ... where name like '徐%'  
 369670 (1 min 1.48 sec) 
369670 (大于40min)
name,ctfid,id ... where name like 'xu%' 
 1834 (1 min 1.40 sec)  
 1834 (17.77, 0.02 sec)
name,ctfid,id ... where name binary like 'xu%'
605 (1 min 4.45 sec)  
605 (1min 12.59 sec)


name,ctfid,id ... where name like '%徐%' 
373621 (1 min 2.59 sec)
 373621 (1 min 2.18 sec)
name,ctfid,id ... where name like '%xu%'  
3347 (1 min 5.52 sec)
 3347 (1min 1.77 sec)
name,ctfid,id ... where binary name like '%xu%' 
1059 (1 min 2.83 sec)
 1059 (1min 0.38 sec)



name ... where name = '徐' 
120 (1 min 2.70 sec)
120 (0.16 sec)
name ... where name = 'xu'
 38 (1 min 8.17 sec) 
38 (0.01 sec)


name,ctfid,id ... where name = '徐'
120 (1 min 11.80 sec) 
120 (0.60 sec)
name,ctfid,id ... where name = 'xu' 
38 (1 min 3.43 sec)
38 (0.58 sec)



name,ctfid,id ... where name like '%徐%' limit 1000 
1000 (0.05 sec)
1000 (0.06 sec)
name,ctfid,id ... where name like '%徐%' limit 10000 
10000 (0.40 sec)
10000 (0.39 sec)
name,ctfid,id ... where name like '%xu%' limit 1000
1000 (18.91 sec)
1000 (19.28 sec)
name,ctfid,id ... where name like '%xu%' limit 10000
 10000 (1 min 1.74 sec) 
10000 (59.71 sec)



name,ctfid,id ... where name like '%徐%' limit 1000, 1000 
1000 (0.31 sec)
1000 (0.31 sec)
name,ctfid,id ... where name like '%徐%' limit 10000, 1000 
1000 (1.48, 0.41 sec)
1000 (1.52, 0.46 sec)
name,ctfid,id ... where name like '%徐%' limit 100000, 1000 
1000 (17.18, 17.01 sec)
1000 (14.01, 13.61 sec)
name,ctfid,id ... where name like '%xu%' limit 1000,1000
1000 (44.71 sec)
1000 (41.81 sec)
name,ctfid,id ... where name like '%xu%' limit 2000,1000
1000 (48.02 sec)
1000 (49.56 sec)
  

注:有索引的时候,模糊查询单列数据占有很大的优势,等于查询也是。对比①②发现,模糊查询还是开始字符不模糊很有优势的(基本可以了解索引的分组规则)。第④条,说明索引也是有问题的,大概46分钟的时间(可能是bug),第④的第二条说明索引查询过的数据会缓存起来,索引第二次查询明显很快。对比binary(区分大小写),明显更加耗时。第⑨条,说明越往后的数据,limit查询起来越费时。

4、待解决的问题

limit关键字在数据量很多的性能瓶颈,其实就如这篇文章所说的:http://www.cnblogs.com/fjytzh/archive/2010/04/02/1702886.html,offset偏移比较大时,可以采用嵌套查询的方式来提高效率,但是目前我需要满足模糊查询的需要,这个方案有问题(确实测试了一下,不可行,比直接limit还要慢)。

试试http://grb12508.blog.163.com/blog/static/273784582009102448061/复合索引的方法(综上:如果对于有where 条件,又想走索引用limit的,必须设计一个索引,将where 放第一位,limit用到的主键放第2位,而且只能select 主键!),主要想通过name来查询id,故以name, id建复合索引(测试结果还可以)。

-- 添加复合索引
 alter table test add index (name, id);
-- 查询的SQL语句(注意:这里如果用 id in (...)会很慢! http://codingstandards.iteye.com/blog/1344833 )
 select name, ctfid, id from test t  inner join  (select id as i_id from test where  name like '徐%'  limit 20000, 2000) as i on t.id=i.i_id;



第一次查询比较慢,重复查询一下就很快(应该有缓存)。

5、最终建表语句

 CREATE TABLE `customer` (
  `Name` varchar(80) NOT NULL,
  `CardNo` varchar(10) DEFAULT NULL,
  `Descriot` varchar(100) DEFAULT NULL,
  `CtfTp` varchar(4) DEFAULT NULL,
  `CtfId` varchar(40) DEFAULT NULL,
  `Gender` varchar(8) DEFAULT NULL,
  `Birthday` int(9) DEFAULT NULL,
  `Address` varchar(100) DEFAULT NULL,
  `Zip` int(10) DEFAULT NULL,
  `Dirty` varchar(20) DEFAULT NULL,
  `District1` varchar(6) DEFAULT NULL,
  `District2` varchar(4) DEFAULT NULL,
  `District3` varchar(6) DEFAULT NULL,
  `District4` varchar(6) DEFAULT NULL,
  `District5` varchar(8) DEFAULT NULL,
  `District6` varchar(20) DEFAULT NULL,
  `FirstNm` varchar(50) DEFAULT NULL,
  `LastNm` varchar(20) DEFAULT NULL,
  `Duty` varchar(20) DEFAULT NULL,
  `Mobile` varchar(40) DEFAULT NULL,
  `Tel` varchar(40) DEFAULT NULL,
  `Fax` varchar(40) DEFAULT NULL,
  `EMail` varchar(60) DEFAULT NULL,
  `Nation` varchar(25) DEFAULT NULL,
  `Taste` varchar(100) DEFAULT NULL,
  `Education` varchar(20) DEFAULT NULL,
  `Company` varchar(80) DEFAULT NULL,
  `CTel` varchar(20) DEFAULT NULL,
  `CAddress` varchar(60) DEFAULT NULL,
  `CZip` int(8) DEFAULT NULL,
  `Family` int(10) DEFAULT NULL,
  `Version` datetime DEFAULT NULL,
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `index_Name_Id` (`Name`, `id`) USING BTREE,
  KEY `index_CtfId_Id` (`CtfId`, `id`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;



你可能感兴趣的:(索引,msyql,limit优化,某酒店2000w数据)