摘要
- 本文内容基于10.4.8-MariaDB
Q: 假设一个订单的编号规则是AAAAOrder2020-0000001,AAAAOrder2020-0000002…后面的数字是自增长,如果订单号码达到AAAAOrder2020-1000000(100万),数据库中应该有100万条数据,此时我随机删除2条数据(物理删除,且不考虑日志和备份),请问怎么找到删掉的数据的编号?给出解题思路即可,答案需要在1秒内运行得到。
思路
- 其实查找丢失数据方法有很多,不过这里重点强调1秒内运行得到,就限制成了只能在数据库层面完成,实现方法基本上就是sql或者存储过程,而且要尽量减少表的扫描次数和扫描范围、尽可能的使用索引。
- 订单编号的序号是自增长,所以可以认为数据是按照编号自增的顺序插入数据表的,如果数据表存在单独的自增主键Id,则可以基于错位法得到缺失的主键Id,这样可以使用主键索引,而且主键自增索引的查询效率是最高的。
- 如果没有独立的主键Id,而是使用的orderId作为主键,则需要对orderId截断后进行比较,查询条件一定要使用完整的orderId,以便使用索引,也可以使用行号和截断后的订单编号进行比较的方法。具体实现见下文。
参考答案,作者水平有限,仅供学习交流
- 存在自增主键Id,与orderId序号一致,可以通过错位法计算不存在的Id值。
运行时间基本符合要求。
select id-1 as id_del from test_order where id not in (select id+1 from test_order);
+--------+
| id_del |
+--------+
| 233490 |
| 943220 |
| 0 | #这里去除0号记录,因为不存在id为0的记录
+--------+
3 rows in set (1.063 sec)
- 如果不存在自增主键ID,而是使用orderId作为主键,则也可以通过错位法计算不存在的Id值,主要比较时一定要用完整的orderId,否则不能使用索引。
运行时间超过要求。
SELECT RIGHT(orderId, 7) - 1 AS id_del FROM test_order
WHERE orderId NOT in(
SELECT concat( left(orderId, 14), LPAD( RIGHT(orderId, 7) + 1, 7, 0))
FROM test_order);
+--------+
| id_del |
+--------+
| 0 | #这里去除0号记录,因为不存在id为0的记录
| 233490 |
| 943220 |
+--------+
3 rows in set (2.999 sec)
- 如果不存在自增主键ID,而是使用orderId作为主键,也可以比较orderId的序号与行号,获取到不一致的第一条记录,然后在从该记录开始继续比较获得下一条不一致的记录,也可以通过存储过程实现。
运行时间符合要求。
sql示例1
-- 查询第一条丢失记录
select substring(a.orderId,15)-1 as orderId from (
SELECT @rownum:=@rownum+1 AS rownum, test_order.orderId FROM (SELECT @rownum:=0) r, test_order
) a where substring(a.orderId,15) != a.rownum limit 1;
+---------+
| orderId |
+---------+
| 233490 |
+---------+
1 row in set (0.341 sec)
-- 将上面查询结果作为第二条sql的参数,这里上一条查询结果为 233490,所以下面手工填写对应的值,
-- 这里因为limit不能写变量,所以需要手工填写,如果希望一次执行,可以参考后面的sql示例2和存储过程示例
select substring(a.orderId,15)-1 as orderId from (
SELECT @rownum:=@rownum+1 AS rownum, test_order.orderId FROM (SELECT @rownum:=0) r, test_order
limit 233489,1000000
) a where substring(a.orderId,15)-233490 != a.rownum limit 1;
+---------+
| orderId |
+---------+
| 943220 |
+---------+
1 row in set (0.372 sec)
-- 验证是否准确
select orderId from test_order limit 233488,2;
+----------------------+
| orderId |
+----------------------+
| aaaaorder2020-233489 |
| aaaaorder2020-233491 |
+----------------------+
2 rows in set (0.036 sec)
sql示例2
-- 可以直接粘贴到mysql终端执行,也可以保存到文件,然后在mysql终端执行source /xx/xx/xx.sql
-- 这里修改定界符,就是为了方便看到一个总的执行时间
delimiter ;;
select substring(a.orderId,15)-1 as orderId into @first_order_id from (
SELECT @rownum:=@rownum+1 AS rownum, test_order.orderId FROM (SELECT @rownum:=0) r, test_order
) a where substring(a.orderId,15) != a.rownum limit 1;
SET @second_limit = @first_order_id - 1;
SET @limitsql = CONCAT( 'SELECT @rownum:=@rownum+1 AS rownum, test_order.orderId FROM (SELECT @rownum:=0) r, test_order
limit ', @second_limit, ',1000000' );
SET @SQL = CONCAT( 'select substring(a.orderId,15)-1 as orderId into @second_order_id from (', @limitsql, '
) a where substring(a.orderId,15)-', @first_order_id, ' != a.rownum limit 1' );
PREPARE stmt FROM @SQL;
EXECUTE stmt;
SET @del_ids = concat( @first_order_id, ',', @second_order_id );
;;
delimiter ;
SELECT @del_ids;
source /Users/hanqf/Desktop/exec.sql
Query OK, 1 row affected (0.710 sec)
Query OK, 0 rows affected (0.710 sec)
Query OK, 0 rows affected (0.710 sec)
Query OK, 0 rows affected (0.710 sec)
Query OK, 0 rows affected (0.710 sec)
Statement prepared
Query OK, 1 row affected (0.710 sec)
Query OK, 0 rows affected (0.710 sec)
+---------------+
| @del_ids |
+---------------+
| 233490,943220 |
+---------------+
1 row in set (0.000 sec)
存储过程示例
DROP PROCEDURE IF EXISTS find_delete_order_id;
CREATE PROCEDURE `find_delete_order_id`(OUT del_ids varchar(255))
BEGIN
DECLARE first_order_id INT;
DECLARE second_order_id INT;
DECLARE second_limit INT;
select substring(a.orderId,15)-1 as orderId into first_order_id from (
SELECT @rownum:=@rownum+1 AS rownum, test_order.orderId FROM (SELECT @rownum:=0) r, test_order
) a where substring(a.orderId,15) != a.rownum limit 1;
set second_limit = first_order_id-1;
select substring(a.orderId,15)-1 as orderId into second_order_id from (
SELECT @rownum:=@rownum+1 AS rownum, test_order.orderId FROM (SELECT @rownum:=0) r, test_order
limit second_limit,1000000
) a where substring(a.orderId,15)-first_order_id != a.rownum limit 1;
set del_ids = concat(first_order_id,',',second_order_id);
-- 此处为了方便测试,所以在存储过程中就打印了结果
select del_ids;
END;
CALL `find_delete_order_id`(@del_ids);
+---------------+
| del_ids |
+---------------+
| 233490,943220 |
+---------------+
1 row in set (0.704 sec)
Query OK, 2 rows affected (0.704 sec)
select @del_ids;
+---------------+
| @del_ids |
+---------------+
| 233490,943220 |
+---------------+
1 row in set (0.000 sec)
表结构
DROP TABLE IF EXISTS `test_order`;
CREATE TABLE `test_order` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`orderId` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_order_id` (`orderId`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
数据初始化
说明:
直接将100w的数据插入数据表有点慢,实测130多秒吧,这里可以先将数据初始化到内存表中,然后再从内存表导入即可,总时间大概只需要12秒左右。
-
因为数据都是写入内存中,所以写入数据时可能会报内存不足,解决方法如下:
a.永久修改,在my.cnf中增加如下配置:
tmp_table_size=1G
max_heap_table_size=1Gb.临时修改,mysql终端执行:
set tmp_table_size = 1073741824;
set max_heap_table_size = 1073741824;
show variables like "%table_size%";
-- 创建内存表,重启数据库后,内存表中的数据会被清空,但是表结构依旧存在,不需要时可以drop掉
DROP TABLE IF EXISTS `test_order_memory`;
CREATE TABLE `test_order_memory` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`orderId` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_order_id` (`orderId`) USING HASH
) ENGINE=MEMORY AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
-- 初始化内存表
DROP PROCEDURE IF EXISTS add_test_order_memory;
CREATE PROCEDURE `add_test_order_memory`(IN n int)
BEGIN
DECLARE i INT DEFAULT 1;
WHILE (i <= n ) DO
INSERT INTO test_order_memory (orderId) VALUES (concat('aaaaorder2020-',LPAD(i, 7, 0)));
set i=i+1;
END WHILE;
END;
-- 创建1000000条数据
call add_test_order_memory(1000000);
Query OK, 1000000 rows affected (7.354 sec)
-- 将内存表中数据导入实际表中
INSERT into test_order SELECT * from test_order_memory;
Query OK, 1000000 rows affected (5.035 sec)
Records: 1000000 Duplicates: 0 Warnings: 0
-- 查看结果
select * from test_order limit 5;
+----+-----------------------+
| id | orderId |
+----+-----------------------+
| 1 | aaaaorder2020-0000001 |
| 2 | aaaaorder2020-0000002 |
| 3 | aaaaorder2020-0000003 |
| 4 | aaaaorder2020-0000004 |
| 5 | aaaaorder2020-0000005 |
+----+-----------------------+
5 rows in set (0.000 sec)
-- 删除两条数据,这里就随便填两个id号
delete from test_order where id in (233490,943220);
Query OK, 2 rows affected (0.001 sec)