Building a Tree with MySQL Tables
Introduction to the data structure and table structure:
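In software design we often use tree structures to represent how data items relate to one another: the departments of a company, site sections, products, provinces and regions, categories, and so on. These trees usually have to be persisted in a database. However, today's relational databases all store data as two-dimensional tables, so a tree cannot be stored in the DBMS directly. Designing a suitable schema and the CRUD algorithms that go with it is the key to storing a tree in a relational database. Ideally, the stored tree should have these properties: little data redundancy and an intuitive layout; simple, efficient retrieval and traversal; and efficient node CRUD operations.
The advantage of a scheme that records the parent-child relationships directly is obvious: it is natural to design and implement, and very intuitive and convenient. Its drawback is just as prominent: because only the direct parent-child links are stored, any CRUD operation on the tree is inefficient. The root cause is frequent recursion; each recursive step hits the database again, and every database I/O costs time. That said, the scheme still has its uses: when the tree is fairly small, we can optimize with a cache, loading the tree into memory and processing it there to avoid the cost of direct database I/O.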
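A minimal sketch of the in-memory optimization mentioned above, assuming the scheme stores one parent id per row. The `rows` list, the `(node_id, parent_id)` layout and both function names are hypothetical; they only stand in for whatever a single `SELECT` over such a table would return.

```python
from collections import defaultdict

# Hypothetical (node_id, parent_id) rows, fetched once from the database.
rows = [(1, None), (2, 1), (3, 2), (4, 1), (5, 4), (6, 4), (7, 6)]

def build_children(rows):
    """Index the rows by parent so that later lookups never touch the database."""
    children = defaultdict(list)
    for node_id, parent_id in rows:
        children[parent_id].append(node_id)
    return children

def descendants(children, node_id):
    """Collect all descendants of node_id in pre-order, using an explicit stack."""
    result = []
    stack = list(reversed(children.get(node_id, [])))
    while stack:
        current = stack.pop()
        result.append(current)
        stack.extend(reversed(children.get(current, [])))
    return result

children = build_children(rows)
print(descendants(children, 4))  # [5, 6, 7]
```

The whole tree costs one query here, which is why this only pays off while the tree is small enough to keep in memory.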
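In typical database-backed applications, queries are needed far more often than deletes and updates. To avoid recursion when querying a tree, we can store the tree with a left/right value encoding based on pre-order traversal, which needs no recursion for queries and allows unlimited nesting.
Seeing this table structure for the first time, most people probably cannot tell how the left value (Lft) and right value (Rgt) are computed, and the design does not even seem to store the parent-child relationships. But trace the numbers in the table with your finger, counting from 1 to 18, and you should notice something. Right: the order in which your finger moves is the order of a pre-order traversal of the tree, as shown in the figure below. We start on the left side of the root node Food and mark it 1, then follow the pre-order traversal and write the next number at each step along the path, until we return to the root node Food and write 18 on its right side.
From this design we can infer that every node whose left value is greater than 2 and whose right value is less than 11 is a descendant of Fruit; the whole structure of the tree is stored in the left and right values. That is not enough, though: our goal is to perform CRUD operations on the tree, so we need the algorithms that go with it.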
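The numbering walk described above can be sketched in a few lines. This is only an illustration: the `children` mapping and the function name `assign_left_right` are assumptions, and the tree shape is the seven-node tree used in the experiment that follows.

```python
def assign_left_right(children, root):
    """Number nodes in one depth-first walk: left on the way down, right on the way back up."""
    numbers = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter                      # written when we first reach the node
        for child in children.get(node, []):
            visit(child)
        counter += 1
        numbers[node] = (left, counter)     # written when we leave the node

    visit(root)
    return numbers

# parent -> ordered list of children
children = {1: [2, 4], 2: [3], 4: [5, 6], 6: [7]}
numbers = assign_left_right(children, 1)
for node in sorted(numbers):
    print(node, numbers[node])
```

A tree with n nodes always uses the numbers 1 to 2n, so the root's right value is twice the node count.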
Experiment
Left and right values of each node, assigned while traversing the tree in pre-order
Using the tree:
1. Create the table
CREATE TABLE `comment` (
`comment_id` int(11) DEFAULT NULL,
`left_num` int(11) DEFAULT NULL,
`right_num` int(11) DEFAULT NULL
);
2. Insert the data
INSERT INTO `comment` VALUES
(1,1,14),
(2,2,5),
(3,3,4),
(4,6,13),
(5,7,8),
(6,9,12),
(7,10,11);
CREATE INDEX idx$comment$left_num$right_num ON `comment` (`left_num`, `right_num`);
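# The rows above encode the topology of the tree
3. Find all descendants of node 4
Idea: we only need to find the nodes whose left value lies between the left and right values of 'node 4'.
Put simply: the nodes that 'node 4' encloses. A node's left and right values tell us whether 'node 4' encloses it.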
select c.* from comment AS p, comment AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.comment_id = 4;
+------------+----------+-----------+
| comment_id | left_num | right_num |
+------------+----------+-----------+
| 4 | 6 | 13 |
| 5 | 7 | 8 |
| 6 | 9 | 12 |
| 7 | 10 | 11 |
+------------+----------+-----------+
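In this SQL, node 4's left_num and right_num bound the range of its descendants, so all descendants are fetched from that range. BETWEEN is inclusive, so node 4 itself is returned as well.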
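For experimenting without a MySQL server, the same lookup can be sketched with Python's built-in `sqlite3` module. The function names and the `include_self` flag are assumptions added for illustration; the table and rows are the ones from the steps above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (comment_id INTEGER, left_num INTEGER, right_num INTEGER)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)

def descendants(conn, node_id, include_self=True):
    """Ids of the nodes enclosed by node_id's (left_num, right_num) interval, in pre-order."""
    lower = ">=" if include_self else ">"
    sql = f"""
        SELECT c.comment_id
        FROM comment AS p
        JOIN comment AS c
          ON c.left_num {lower} p.left_num AND c.left_num < p.right_num
        WHERE p.comment_id = ?
        ORDER BY c.left_num
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]

def descendant_count(conn, node_id):
    """Every descendant uses two numbers inside the interval, so no join is needed."""
    left, right = conn.execute(
        "SELECT left_num, right_num FROM comment WHERE comment_id = ?", (node_id,)
    ).fetchone()
    return (right - left - 1) // 2

print(descendants(conn, 4))                      # [4, 5, 6, 7]
print(descendants(conn, 4, include_self=False))  # [5, 6, 7]
print(descendant_count(conn, 4))                 # 3
```

Ordering by `left_num` returns the subtree in pre-order, which is usually the order wanted for display.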
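4. Find all ancestors of node 6:
Idea: find the nodes whose left value is less than that of 'node 6' and whose right value is greater than that of 'node 6'. The query below uses an inclusive BETWEEN, so node 6 itself is returned too.
Put simply: find the nodes that enclose 'node 6'.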
select p.* from comment as p , comment as c
where c.left_num between p.left_num and p.right_num
and c.comment_id=6;
+------------+----------+-----------+
| comment_id | left_num | right_num |
+------------+----------+-----------+
| 1 | 1 | 14 |
| 4 | 6 | 13 |
| 6 | 9 | 12 |
+------------+----------+-----------+
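A small sketch of the same ancestor lookup, run against SQLite from Python. The function name `path_from_root` is an assumption; the one addition to the query above is `ORDER BY p.left_num`, which returns the ancestors as a path from the root down to the node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (comment_id INTEGER, left_num INTEGER, right_num INTEGER)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)

def path_from_root(conn, node_id):
    """Ancestors of node_id (including itself), ordered from the root downwards."""
    sql = """
        SELECT p.comment_id
        FROM comment AS c
        JOIN comment AS p
          ON c.left_num BETWEEN p.left_num AND p.right_num
        WHERE c.comment_id = ?
        ORDER BY p.left_num
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]

print(path_from_root(conn, 6))  # [1, 4, 6]
print(path_from_root(conn, 7))  # [1, 4, 6, 7]
```

Without the `ORDER BY`, the row order is not guaranteed, which matters when the result is used as a breadcrumb.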
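5. Compute the depth of node 4
On MySQL 5.7 the sql_mode has to be changed first: the default mode includes ONLY_FULL_GROUP_BY, which would otherwise reject the query below because it selects c.* while grouping only by c.comment_id.
Idea: find the ancestors first (including the node itself), then count them. The root therefore has depth 1.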
SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
SELECT c.*,COUNT(c.comment_id) AS depth
FROM comment AS p, comment AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND c.comment_id = 4
GROUP BY c.comment_id;
+------------+----------+-----------+-------+
| comment_id | left_num | right_num | depth |
+------------+----------+-----------+-------+
| 4 | 6 | 13 | 2 |
+------------+----------+-----------+-------+
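As a hedged variation, the depth of every node can be computed in one grouped query. The sketch below runs it on SQLite from Python; the function name `depths` is an assumption. It selects only the grouped columns and the aggregate, so this form should not depend on relaxing the GROUP BY mode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (comment_id INTEGER, left_num INTEGER, right_num INTEGER)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)

def depths(conn):
    """Map comment_id -> depth, where depth counts the node and all of its ancestors."""
    sql = """
        SELECT c.comment_id, COUNT(p.comment_id) AS depth
        FROM comment AS c
        JOIN comment AS p
          ON c.left_num BETWEEN p.left_num AND p.right_num
        GROUP BY c.comment_id, c.left_num
        ORDER BY c.left_num
    """
    return dict(conn.execute(sql).fetchall())

print(depths(conn))  # {1: 1, 2: 2, 3: 3, 4: 2, 5: 3, 6: 3, 7: 4}
```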
Get all descendants of 'node 4' together with their depth relative to 'node 4'
SELECT sub_child.*,
(COUNT(sub_parent.comment_id) - 1) AS depth
FROM (
SELECT child.*
FROM comment AS parent, comment AS child
WHERE child.left_num BETWEEN parent.left_num AND parent.right_num
AND parent.comment_id = 4
) AS sub_child, (
SELECT child.*
FROM comment AS parent, comment AS child
WHERE child.left_num BETWEEN parent.left_num AND parent.right_num
AND parent.comment_id = 4
) AS sub_parent
WHERE sub_child.left_num BETWEEN sub_parent.left_num AND sub_parent.right_num
GROUP BY sub_child.comment_id
ORDER BY sub_child.left_num;
+------------+----------+-----------+-------+
| comment_id | left_num | right_num | depth |
+------------+----------+-----------+-------+
| 4 | 6 | 13 | 0 |
| 5 | 7 | 8 | 1 |
| 6 | 9 | 12 | 1 |
| 7 | 10 | 11 | 2 |
+------------+----------+-----------+-------+
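The two identical subqueries above can also be written as a three-way self-join. The sketch below is one possible formulation, run on SQLite from Python; the aliases `r` (subtree root), `c` (node) and `a` (ancestor inside the subtree) and the function name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (comment_id INTEGER, left_num INTEGER, right_num INTEGER)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)

def subtree_with_depth(conn, root_id):
    """(comment_id, depth) for every node under root_id; the subtree root has depth 0."""
    sql = """
        SELECT c.comment_id, COUNT(a.comment_id) - 1 AS depth
        FROM comment AS r
        JOIN comment AS c
          ON c.left_num BETWEEN r.left_num AND r.right_num
        JOIN comment AS a
          ON c.left_num BETWEEN a.left_num AND a.right_num
         AND a.left_num BETWEEN r.left_num AND r.right_num
        WHERE r.comment_id = ?
        GROUP BY c.comment_id, c.left_num
        ORDER BY c.left_num
    """
    return conn.execute(sql, (root_id,)).fetchall()

print(subtree_with_depth(conn, 4))  # [(4, 0), (5, 1), (6, 1), (7, 2)]
```

Only ancestors that lie inside the subtree are counted, which is what makes the depth relative to the chosen root.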
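Inserting data:
Inserting is rather troublesome: the right values of all of the new node's ancestors must be updated, and so must the left and right values of every node to the right of the insertion point (including the parent's existing children).
As in the figure above, suppose we want to add a child 'node 44' to 'node 4'. To keep things simple, we place the new child at the leftmost position under its parent, so 'node 44' goes to the left of 'node 5', as in the figure below.
After the change it looks like the figure below:
In the figure above, the left and right values that must change are shown in purple, and the new node's values in green.
Update approach:
1. Add 2 to the left value of every node whose left value is greater than the left value of 'node 4'.
2. Add 2 to the right value of every node whose right value is greater than the left value of 'node 4'.
-- Get the left and right values of 'node 4' and of its first child (node 5)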
SELECT c.*
FROM comment AS p, comment AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.comment_id = 4;
+------------+----------+-----------+
| comment_id | left_num | right_num |
+------------+----------+-----------+
| 4 | 6 | 13 |
| 5 | 7 | 8 |
| 6 | 9 | 12 |
| 7 | 10 | 11 |
+------------+----------+-----------+
-- Use the values obtained above to shift the left and right values of the affected nodes (node 4's ancestors and descendants)
UPDATE comment SET left_num = left_num + 2 WHERE left_num > 6;
UPDATE comment SET right_num = right_num + 2 WHERE right_num > 6;
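Insertion approach:
1. Set the left value of 'node 44' to the left value of 'node 4' plus 1.
2. Set the right value of 'node 44' to the left value of 'node 4' plus 2.
Insert node 44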
INSERT INTO comment SELECT 44, left_num + 1, left_num + 2 FROM comment WHERE comment_id = 4;
Check
SELECT c.*
FROM comment AS p, comment AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.comment_id = 4;
+------------+----------+-----------+
| comment_id | left_num | right_num |
+------------+----------+-----------+
| 4 | 6 | 15 |
| 5 | 9 | 10 |
| 6 | 11 | 14 |
| 7 | 12 | 13 |
| 44 | 7 | 8 |
+------------+----------+-----------+
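The two updates and the insert belong together: if only part of them is applied, the numbering is broken. The sketch below wraps them in one transaction and reads the parent's left value instead of hard-coding 6. It runs on SQLite from Python, and the function name `insert_first_child` is an assumption, not part of the original walkthrough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (comment_id INTEGER, left_num INTEGER, right_num INTEGER)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)
conn.commit()

def insert_first_child(conn, parent_id, new_id):
    """Insert new_id as the leftmost child of parent_id, atomically."""
    with conn:  # commits on success, rolls back if any statement raises
        (parent_left,) = conn.execute(
            "SELECT left_num FROM comment WHERE comment_id = ?", (parent_id,)
        ).fetchone()
        # Open a gap of two numbers right after the parent's left value.
        conn.execute(
            "UPDATE comment SET left_num = left_num + 2 WHERE left_num > ?", (parent_left,)
        )
        conn.execute(
            "UPDATE comment SET right_num = right_num + 2 WHERE right_num > ?", (parent_left,)
        )
        conn.execute(
            "INSERT INTO comment VALUES (?, ?, ?)",
            (new_id, parent_left + 1, parent_left + 2),
        )

insert_first_child(conn, 4, 44)
for row in conn.execute("SELECT * FROM comment ORDER BY left_num"):
    print(row)
```

In a concurrent setting the read of the parent's left value would also need to be protected against other writers; that is outside this sketch.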
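This kind of tree structure is generally used where reads are frequent and inserts and updates are rare (for example region tables, category tables and the like).
Designing a region table
Create the table: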
CREATE TABLE `area` (
`area_id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'region ID',
`name` varchar(40) NOT NULL DEFAULT 'unknown' COMMENT 'region name',
`area_code` varchar(10) NOT NULL DEFAULT 'unknown' COMMENT 'region code',
`pid` int(11) DEFAULT NULL COMMENT 'parent ID',
`left_num` mediumint(8) unsigned NOT NULL COMMENT 'left value of the node',
`right_num` mediumint(8) unsigned NOT NULL COMMENT 'right value of the node',
PRIMARY KEY (`area_id`),
KEY `idx$area$pid` (`pid`),
KEY `idx$area$left_num` (`left_num`),
KEY `idx$area$right_num` (`right_num`)
);
Insert the data
Operations on the table
SELECT * FROM area WHERE name LIKE '%广州%';
+---------+-----------+-----------+------+----------+-----------+
| area_id | name | area_code | pid | left_num | right_num |
+---------+-----------+-----------+------+----------+-----------+
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2904 |
+---------+-----------+-----------+------+----------+-----------+
Query 广州 (Guangzhou) and its descendants
SELECT c.*
FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.area_id = 2148;
+---------+-----------+-----------+------+----------+-----------+
| area_id | name | area_code | pid | left_num | right_num |
+---------+-----------+-----------+------+----------+-----------+
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2904 |
| 2161 | 从化市 | 440184 | 2148 | 2880 | 2881 |
| 2160 | 增城市 | 440183 | 2148 | 2882 | 2883 |
| 2159 | 花都区 | 440114 | 2148 | 2884 | 2885 |
| 2158 | 番禺区 | 440113 | 2148 | 2886 | 2887 |
| 2157 | 黄埔区 | 440112 | 2148 | 2888 | 2889 |
| 2156 | 白云区 | 440111 | 2148 | 2890 | 2891 |
| 2154 | 天河区 | 440106 | 2148 | 2892 | 2893 |
| 2153 | 海珠区 | 440105 | 2148 | 2894 | 2895 |
| 2152 | 越秀区 | 440104 | 2148 | 2896 | 2897 |
| 2151 | 荔湾区 | 440103 | 2148 | 2898 | 2899 |
| 2150 | 东山区 | 230406 | 2148 | 2900 | 2901 |
| 2149 | 其它区 | 440189 | 2148 | 2902 | 2903 |
+---------+-----------+-----------+------+----------+-----------+
List all descendants of '广州' with their depth, and show the hierarchy
SELECT sub_child.area_id,
(COUNT(sub_parent.name) - 1) AS depth,
CONCAT(REPEAT(' ', (COUNT(sub_parent.name) - 1)), sub_child.name) AS name
FROM (
SELECT child.*
FROM area AS parent, area AS child
WHERE child.left_num BETWEEN parent.left_num AND parent.right_num
AND parent.area_id = 2148
) AS sub_child, (
SELECT child.*
FROM area AS parent, area AS child
WHERE child.left_num BETWEEN parent.left_num AND parent.right_num
AND parent.area_id = 2148
) AS sub_parent
WHERE sub_child.left_num BETWEEN sub_parent.left_num AND sub_parent.right_num
GROUP BY sub_child.area_id
ORDER BY sub_child.left_num;
+---------+-------+-------------+
| area_id | depth | name |
+---------+-------+-------------+
| 2148 | 0 | 广州市 |
| 2161 | 1 | 从化市 |
| 2160 | 1 | 增城市 |
| 2159 | 1 | 花都区 |
| 2158 | 1 | 番禺区 |
| 2157 | 1 | 黄埔区 |
| 2156 | 1 | 白云区 |
| 2154 | 1 | 天河区 |
| 2153 | 1 | 海珠区 |
| 2152 | 1 | 越秀区 |
| 2151 | 1 | 荔湾区 |
| 2150 | 1 | 东山区 |
| 2149 | 1 | 其它区 |
+---------+-------+-------------+
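When the rows of a subtree are already loaded, the depth and indentation can also be derived on the client in a single pass, without the self-join. The sketch below is one way to do that, assuming rows are available as `(name, left_num, right_num)` tuples; the function name `indent_tree` is hypothetical, and the sample rows are three of the rows shown earlier for 广州市.

```python
def indent_tree(rows, indent="  "):
    """rows: (name, left_num, right_num) tuples. Returns indented lines in pre-order."""
    lines = []
    open_rights = []  # right_num of every ancestor that is still "open"
    for name, left, right in sorted(rows, key=lambda r: r[1]):
        # A node on the stack is finished once its right_num is below the current one.
        while open_rights and open_rights[-1] < right:
            open_rights.pop()
        lines.append(indent * len(open_rights) + name)  # stack size == relative depth
        open_rights.append(right)
    return lines

rows = [
    ("广州市", 2879, 2904),
    ("从化市", 2880, 2881),
    ("增城市", 2882, 2883),
]
for line in indent_tree(rows):
    print(line)
```

The stack holds exactly the ancestors of the current row, so its length plays the role of the `depth` column in the query above.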
Show the direct ancestors of '广州' (including itself)
SELECT p.* FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND c.area_id = 2148;
+---------+-----------+-----------+------+----------+-----------+
| area_id | name | area_code | pid | left_num | right_num |
+---------+-----------+-----------+------+----------+-----------+
| 2147 | 广东省 | 440000 | 0 | 2580 | 2905 |
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2904 |
| 3611 | 中国 | 100000 | -1 | 1 | 7218 |
+---------+-----------+-----------+------+----------+-----------+
Insert a region '南沙区' under '广州'
-- Update the left and right values
UPDATE area SET left_num = left_num + 2 WHERE left_num > 2879;
UPDATE area SET right_num = right_num + 2 WHERE right_num > 2879;
-- Insert the '南沙区' row
INSERT INTO area
SELECT NULL, '南沙区', '440115', 2148, left_num + 1, left_num + 2
FROM area WHERE area_id = 2148;
-- Check that the result is as expected
SELECT c.*
FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.area_id = 2148;
+---------+-----------+-----------+------+----------+-----------+
| area_id | name | area_code | pid | left_num | right_num |
+---------+-----------+-----------+------+----------+-----------+
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2906 |
| 3612 | 南沙区 | 440115 | 2148 | 2880 | 2881 |
| 2161 | 从化市 | 440184 | 2148 | 2882 | 2883 |
| 2160 | 增城市 | 440183 | 2148 | 2884 | 2885 |
...
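Reworking the region table's tree
So far the hierarchy of the region table has been a single tree whose root is '中国' (China). With a single tree and a large amount of data, adding or updating one region rewrites about half of the table on average. That is clearly unreasonable.
From a single tree to multiple trees
Until now we maintained the hierarchy of the whole table at the granularity of China. From here on we maintain it at the granularity of a province. In practice the province is usually the coarsest granularity we work with anyway.
Structural change:
Since the granularity is now the province, all later operations work on the regions under one particular province. So every region needs an extra column that identifies which province it belongs to.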
ALTER TABLE area
ADD top_layer_id INT NOT NULL DEFAULT 0;
Find all provinces
SELECT * FROM area WHERE pid = 0;
Set each region's top_layer_id to the ID of its own province
DROP PROCEDURE IF EXISTS set_top_layer_id;
DELIMITER //
CREATE PROCEDURE set_top_layer_id()
BEGIN
DECLARE num INT;
DECLARE cur_area_id INT;
DECLARE done INT DEFAULT FALSE;
DECLARE cur_area CURSOR FOR
SELECT area_id
FROM area
WHERE pid = 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur_area;
read_loop: LOOP
FETCH cur_area INTO cur_area_id;
IF done THEN
LEAVE read_loop;
END IF;
UPDATE area ,(
SELECT c.area_id
FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.area_id = cur_area_id
) AS tmp
SET area.top_layer_id = cur_area_id
WHERE tmp.area_id = area.area_id;
END LOOP;
CLOSE cur_area;
COMMIT;
END //
DELIMITER ;
CALL set_top_layer_id;
DROP PROCEDURE IF EXISTS set_top_layer_id;
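The cursor loop issues one UPDATE per province. As a hedged alternative, the same assignment can be expressed as a single set-based statement. The sketch below uses SQLite from Python with a correlated subquery; in MySQL a join-style multi-table UPDATE would probably be the closer equivalent. The tiny data set (Country, ProvinceA, ...) is made up for illustration and only mirrors the `pid = 0` convention for provinces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE area (
        area_id INTEGER PRIMARY KEY,
        name TEXT, pid INTEGER,
        left_num INTEGER, right_num INTEGER,
        top_layer_id INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO area (area_id, name, pid, left_num, right_num) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Country", -1, 1, 14),      # hypothetical rows, not the real region data
        (2, "ProvinceA", 0, 2, 7),
        (3, "CityA1", 2, 3, 6),
        (4, "DistrictA1a", 3, 4, 5),
        (5, "ProvinceB", 0, 8, 13),
        (6, "CityB1", 5, 9, 12),
        (7, "DistrictB1a", 6, 10, 11),
    ],
)

# For every row, look up the province (pid = 0) whose interval encloses it.
# Rows above province level find no match and keep 0.
conn.execute("""
    UPDATE area
    SET top_layer_id = COALESCE((
        SELECT p.area_id
        FROM area AS p
        WHERE p.pid = 0
          AND area.left_num BETWEEN p.left_num AND p.right_num
    ), 0)
""")
conn.commit()

print(conn.execute("SELECT area_id, top_layer_id FROM area ORDER BY area_id").fetchall())
```

This has to run while the table is still numbered as one tree, because it relies on province intervals not overlapping.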
Operating on the table
Look up 广州
SELECT * FROM area WHERE name LIKE '%广州%';
+---------+-----------+-----------+------+----------+-----------+--------------+
| area_id | name | area_code | pid | left_num | right_num | top_layer_id |
+---------+-----------+-----------+------+----------+-----------+--------------+
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2906 | 2147 |
+---------+-----------+-----------+------+----------+-----------+--------------+
List all descendants of 广州
SELECT c.*
FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.area_id = 2148
AND p.top_layer_id = 2147
AND c.top_layer_id = p.top_layer_id;
+---------+-----------+-----------+------+----------+-----------+--------------+
| area_id | name | area_code | pid | left_num | right_num | top_layer_id |
+---------+-----------+-----------+------+----------+-----------+--------------+
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2906 | 2147 |
| 3612 | 南沙区 | 440115 | 2148 | 2880 | 2881 | 2147 |
| 2161 | 从化市 | 440184 | 2148 | 2882 | 2883 | 2147 |
| 2160 | 增城市 | 440183 | 2148 | 2884 | 2885 | 2147 |
| 2159 | 花都区 | 440114 | 2148 | 2886 | 2887 | 2147 |
| 2158 | 番禺区 | 440113 | 2148 | 2888 | 2889 | 2147 |
| 2157 | 黄埔区 | 440112 | 2148 | 2890 | 2891 | 2147 |
| 2156 | 白云区 | 440111 | 2148 | 2892 | 2893 | 2147 |
| 2154 | 天河区 | 440106 | 2148 | 2894 | 2895 | 2147 |
| 2153 | 海珠区 | 440105 | 2148 | 2896 | 2897 | 2147 |
| 2152 | 越秀区 | 440104 | 2148 | 2898 | 2899 | 2147 |
| 2151 | 荔湾区 | 440103 | 2148 | 2900 | 2901 | 2147 |
| 2150 | 东山区 | 230406 | 2148 | 2902 | 2903 | 2147 |
| 2149 | 其它区 | 440189 | 2148 | 2904 | 2905 | 2147 |
+---------+-----------+-----------+------+----------+-----------+--------------+
List all descendants of '广州' with their depth, and show the hierarchy
SELECT sub_child.area_id,
(COUNT(sub_parent.name) - 1) AS depth,
CONCAT(REPEAT(' ', (COUNT(sub_parent.name) - 1)), sub_child.name) AS name
FROM (
SELECT child.*
FROM area AS parent, area AS child
WHERE child.left_num BETWEEN parent.left_num AND parent.right_num
AND parent.area_id = 2148
AND parent.top_layer_id = 2147
AND child.top_layer_id = parent.top_layer_id
) AS sub_child, (
SELECT child.*
FROM area AS parent, area AS child
WHERE child.left_num BETWEEN parent.left_num AND parent.right_num
AND parent.area_id = 2148
AND parent.top_layer_id = 2147
AND child.top_layer_id = parent.top_layer_id
) AS sub_parent
WHERE sub_child.left_num BETWEEN sub_parent.left_num AND sub_parent.right_num
GROUP BY sub_child.area_id
ORDER BY sub_child.left_num;
+---------+-------+-------------+
| area_id | depth | name |
+---------+-------+-------------+
| 2148 | 0 | 广州市 |
| 3612 | 1 | 南沙区 |
| 2161 | 1 | 从化市 |
| 2160 | 1 | 增城市 |
| 2159 | 1 | 花都区 |
| 2158 | 1 | 番禺区 |
| 2157 | 1 | 黄埔区 |
| 2156 | 1 | 白云区 |
| 2154 | 1 | 天河区 |
| 2153 | 1 | 海珠区 |
| 2152 | 1 | 越秀区 |
| 2151 | 1 | 荔湾区 |
| 2150 | 1 | 东山区 |
| 2149 | 1 | 其它区 |
+---------+-------+-------------+
Show the direct ancestors of '广州' (including itself)
SELECT p.*
FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND c.area_id = 2148
AND p.top_layer_id = 2147;
+---------+-----------+-----------+------+----------+-----------+--------------+
| area_id | name | area_code | pid | left_num | right_num | top_layer_id |
+---------+-----------+-----------+------+----------+-----------+--------------+
| 2147 | 广东省 | 440000 | 0 | 2580 | 2907 | 2147 |
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2906 | 2147 |
+---------+-----------+-----------+------+----------+-----------+--------------+
Insert a region '北沙区' under '广州'
-- Update the left and right values
-- Note the affected-row counts here: clearly fewer than with the earlier whole-table update.
UPDATE area
SET left_num = left_num + 2
WHERE left_num > 2879
AND top_layer_id = 2147;
Query OK, 13 rows affected (0.03 sec)
Rows matched: 13 Changed: 13 Warnings: 0
UPDATE area
SET right_num = right_num + 2
WHERE right_num > 2879
AND top_layer_id = 2147;
Query OK, 15 rows affected (0.01 sec)
Rows matched: 15 Changed: 15 Warnings: 0
-- Insert the '北沙区' row
INSERT INTO area
SELECT NULL, '北沙区', '440116', 2148, left_num + 1, left_num + 2, 2147
FROM area WHERE area_id = 2148;
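-- Check the result. This query does not filter on top_layer_id, so a row from another
-- province whose left_num falls inside the range also shows up (湖南省 below); adding
-- AND c.top_layer_id = p.top_layer_id would restrict it to one tree.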
SELECT c.*
FROM area AS p, area AS c
WHERE c.left_num BETWEEN p.left_num AND p.right_num
AND p.area_id = 2148;
+---------+-----------+-----------+------+----------+-----------+--------------+
| area_id | name | area_code | pid | left_num | right_num | top_layer_id |
+---------+-----------+-----------+------+----------+-----------+--------------+
| 2148 | 广州市 | 440100 | 2147 | 2879 | 2908 | 2147 |
| 3613 | 北沙区 | 440116 | 2148 | 2880 | 2881 | 2147 |
| 3612 | 南沙区 | 440115 | 2148 | 2882 | 2883 | 2147 |
| 2161 | 从化市 | 440184 | 2148 | 2884 | 2885 | 2147 |
| 2160 | 增城市 | 440183 | 2148 | 2886 | 2887 | 2147 |
| 2159 | 花都区 | 440114 | 2148 | 2888 | 2889 | 2147 |
| 2158 | 番禺区 | 440113 | 2148 | 2890 | 2891 | 2147 |
| 2157 | 黄埔区 | 440112 | 2148 | 2892 | 2893 | 2147 |
| 2156 | 白云区 | 440111 | 2148 | 2894 | 2895 | 2147 |
| 2154 | 天河区 | 440106 | 2148 | 2896 | 2897 | 2147 |
| 2153 | 海珠区 | 440105 | 2148 | 2898 | 2899 | 2147 |
| 2152 | 越秀区 | 440104 | 2148 | 2900 | 2901 | 2147 |
| 2151 | 荔湾区 | 440103 | 2148 | 2902 | 2903 | 2147 |
| 2150 | 东山区 | 230406 | 2148 | 2904 | 2905 | 2147 |
| 2149 | 其它区 | 440189 | 2148 | 2906 | 2907 | 2147 |
| 1997 | 湖南省 | 430000 | 0 | 2908 | 3209 | 1997 |
+---------+-----------+-----------+------+----------+-----------+--------------+
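Once each province is renumbered on its own, the intervals of different provinces can overlap, so both the writes and the reads need the tree id. The sketch below illustrates this on SQLite from Python with a made-up two-province data set; the function names and the `same_tree_only` flag are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE area (
        area_id INTEGER PRIMARY KEY,
        name TEXT, pid INTEGER,
        left_num INTEGER, right_num INTEGER,
        top_layer_id INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO area VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2, "ProvinceA", 0, 2, 7, 2),       # hypothetical rows
        (3, "CityA1", 2, 3, 6, 2),
        (4, "DistrictA1a", 3, 4, 5, 2),
        (5, "ProvinceB", 0, 8, 13, 5),
        (6, "CityB1", 5, 9, 12, 5),
        (7, "DistrictB1a", 6, 10, 11, 5),
    ],
)
conn.commit()

def insert_first_child(conn, parent_id, name):
    """Insert a leftmost child, shifting only rows of the parent's own tree."""
    with conn:
        left, tree = conn.execute(
            "SELECT left_num, top_layer_id FROM area WHERE area_id = ?", (parent_id,)
        ).fetchone()
        moved_left = conn.execute(
            "UPDATE area SET left_num = left_num + 2 WHERE left_num > ? AND top_layer_id = ?",
            (left, tree),
        ).rowcount
        moved_right = conn.execute(
            "UPDATE area SET right_num = right_num + 2 WHERE right_num > ? AND top_layer_id = ?",
            (left, tree),
        ).rowcount
        conn.execute(
            "INSERT INTO area (name, pid, left_num, right_num, top_layer_id) VALUES (?, ?, ?, ?, ?)",
            (name, parent_id, left + 1, left + 2, tree),
        )
    return moved_left, moved_right

def descendants(conn, node_id, same_tree_only=True):
    tree_filter = "AND c.top_layer_id = p.top_layer_id" if same_tree_only else ""
    sql = f"""
        SELECT c.name
        FROM area AS p
        JOIN area AS c
          ON c.left_num BETWEEN p.left_num AND p.right_num {tree_filter}
        WHERE p.area_id = ?
        ORDER BY c.top_layer_id, c.left_num
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]

print(insert_first_child(conn, 3, "DistrictNew"))   # (1, 3): rows shifted, ProvinceB untouched
print(descendants(conn, 2))                         # only ProvinceA's tree
print(descendants(conn, 2, same_tree_only=False))   # also picks up rows of ProvinceB
```

After the insert, ProvinceA spans 2 to 9 while ProvinceB still starts at 8, which is why the unfiltered query returns rows from the neighbouring province.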