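MySQL (8.0 and later) supports two kinds of common table expression (CTE): non-recursive and recursive.
Why use a CTE?
A derived table cannot be referenced twice in the same query. When the same result is needed twice, the subquery has to be written out twice and is therefore evaluated two or more times, which can be a serious performance problem. With a CTE, the subquery is evaluated only once.
Non-recursive CTEs
A derived table is typically used like this: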
SELECT... FROM (subquery) AS derived, t1 ...
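Here the subquery sits in the FROM clause.
The CTE syntax looks like this:
WITH derived AS (subquery) SELECT ... FROM derived, t1 ...
Here the subquery sits in the WITH ... AS clause, which comes before the SELECT, UPDATE, or DELETE statement it belongs to. A WITH clause may also appear at the start of a subquery.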
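Suppose you want the year-over-year percentage increase in total salary. Without a CTE you have to write the same subquery twice. MySQL does not recognize that the two are identical, so the subquery is executed twice: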
mysql> SELECT q1.year, q2.year AS next_year, q1.sum, q2.sum AS next_sum,
              100*(q2.sum-q1.sum)/q1.sum AS pct
       FROM (SELECT year(from_date) AS year, sum(salary) AS sum
             FROM salaries GROUP BY year) AS q1,
            (SELECT year(from_date) AS year, sum(salary) AS sum
             FROM salaries GROUP BY year) AS q2
       WHERE q1.year = q2.year-1;
+------+-----------+-------------+-------------+----------+
| year | next_year | sum         | next_sum    | pct      |
+------+-----------+-------------+-------------+----------+
| 1985 |      1986 |   972864875 |  2052895941 | 111.0155 |
| 1986 |      1987 |  2052895941 |  3156881054 |  53.7770 |
| 1987 |      1988 |  3156881054 |  4295598688 |  36.0710 |
| 1988 |      1989 |  4295598688 |  5454260439 |  26.9732 |
| 1989 |      1990 |  5454260439 |  6626146391 |  21.4857 |
| 1990 |      1991 |  6626146391 |  7798804412 |  17.6974 |
| 1991 |      1992 |  7798804412 |  9027872610 |  15.7597 |
| 1992 |      1993 |  9027872610 | 10215059054 |  13.1502 |
| 1993 |      1994 | 10215059054 | 11429450113 |  11.8882 |
| 1994 |      1995 | 11429450113 | 12638817464 |
A non-recursive CTE lets the query reuse the result, so the subquery is evaluated only once:
mysql> WITH CTE AS (
         SELECT year(from_date) AS year, SUM(salary) AS sum
         FROM salaries GROUP BY year
       )
       SELECT q1.year, q2.year AS next_year, q1.sum, q2.sum AS next_sum,
              100*(q2.sum-q1.sum)/q1.sum AS pct
       FROM CTE AS q1, CTE AS q2
       WHERE q1.year = q2.year-1;
+------+-----------+-------------+-------------+----------+
| year | next_year | sum         | next_sum    | pct      |
+------+-----------+-------------+-------------+----------+
| 1985 |      1986 |   972864875 |  2052895941 | 111.0155 |
| 1986 |      1987 |  2052895941 |  3156881054 |  53.7770 |
| 1987 |      1988 |  3156881054 |  4295598688 |  36.0710 |
| 1988 |      1989 |  4295598688 |  5454260439 |  26.9732 |
| 1989 |      1990 |  5454260439 |  6626146391 |  21.4857 |
| 1990 |      1991 |  6626146391 |  7798804412 |  17.6974 |
| 1991 |      1992 |  7798804412 |  9027872610 |  15.7597 |
| 1992 |      1993 |  9027872610 | 10215059054 |  13.1502 |
| 1993 |      1994 | 10215059054 | 11429450113 |  11.8882 |
| 1994 |      1995 | 11429450113 | 12638817464 |  10.5812 |
| 1995 |      1996 | 12638817464 | 13888587737 |   9.8883 |
| 1996 |      1997 | 13888587737 | 15056011781 |   8.4056 |
| 1997 |      1998 | 15056011781 | 16220495471 |   7.7343 |
| 1998 |      1999 | 16220495471 | 17360258862 |   7.0267 |
| 1999 |      2000 | 17360258862 | 17535667603 |   1.0104 |
| 2000 |      2001 | 17535667603 | 17507737308 |  -0.1593 |
| 2001 |      2002 | 17507737308 | 10243358658 | -41.4924 |
+------+-----------+-------------+-------------+----------+
17 rows in set (1.63 sec)
The result is the same, and in this test the query ran nearly 50% faster.
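The pattern of defining a CTE once and referencing it twice can be tried without a MySQL server. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in engine. The tiny salaries table, its from_year column, and the figures in it are invented for illustration and are not the data used above. It shows only the shape of the query, not the performance difference, which depends on how MySQL materializes the CTE.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table and its data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (from_year INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [(1985, 100), (1985, 100), (1986, 300), (1987, 450)],
)

# The CTE is defined once and joined to itself under two aliases.
query = """
WITH cte AS (
    SELECT from_year AS year, SUM(salary) AS total
    FROM salaries
    GROUP BY from_year
)
SELECT q1.year, q2.year AS next_year, q1.total, q2.total AS next_total,
       100.0 * (q2.total - q1.total) / q1.total AS pct
FROM cte AS q1
JOIN cte AS q2 ON q1.year = q2.year - 1
ORDER BY q1.year
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each printed tuple pairs one year with the next, just as q1 and q2 do in the MySQL query.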
In addition, derived tables cannot refer to one another:
SELECT ... FROM (SELECT ... FROM ...) AS d1, (SELECT ... FROM d1 ...) AS d2 ...
ERROR: 1146 (42S02): Table 'db.d1' doesn't exist
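The first derived table is named d1, and the second derived table then tries to select from d1, which is not allowed.
A CTE, by contrast, can refer to CTEs defined before it: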
WITH d1 AS (SELECT ... FROM ...), d2 AS (SELECT ... FROM d1 ...) SELECT ... FROM d1, d2 ...
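d1 and d2 are two separate subqueries, but d2 selects from the result set of d1.
To summarize: with a non-recursive CTE, you first define the subqueries with WITH ... AS, separating multiple definitions with commas, and then write a SELECT statement that refers to them by name.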
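To make the chaining concrete, here is a small sketch in which d2 reads from d1 and the final SELECT uses both. It again assumes Python's sqlite3 module in place of MySQL, and the salaries table and its values are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table and its data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (from_year INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [(1985, 100), (1985, 100), (1986, 300), (1987, 450)],
)

# d1 aggregates per year; d2 is defined after d1 and selects from it.
query = """
WITH d1 AS (
    SELECT from_year AS year, SUM(salary) AS total
    FROM salaries
    GROUP BY from_year
),
d2 AS (
    SELECT MAX(total) AS max_total FROM d1
)
SELECT d1.year, d1.total
FROM d1, d2
WHERE d1.total = d2.max_total
"""
top_years = conn.execute(query).fetchall()
print(top_years)  # the year(s) with the highest salary total
```

Swapping the order of the two definitions would fail, because a non-recursive CTE can only see the ones defined before it.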
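Recursive CTEs
In a recursive CTE the subquery refers to itself, and the clause must begin with WITH RECURSIVE instead of WITH. A recursive CTE must contain two parts: a seed query, which must not refer to the CTE itself, and a recursive query. The two parts are combined with UNION, UNION ALL, or UNION DISTINCT.
The seed SELECT is executed only once and produces the initial set of rows. The recursive SELECT is executed repeatedly until it produces no new rows, and the final result contains all the rows produced along the way. This is very useful for deep traversals such as parent-child hierarchies.
As a simple example, suppose you want to print the numbers 1 through 5. A recursive CTE does it like this: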
mysql> WITH RECURSIVE cte (n) AS (
         SELECT 1 /* seed query */
         UNION ALL
         SELECT n + 1 FROM cte WHERE n < 5 /* recursive query */
       )
       SELECT * FROM cte;
+---+
| n |
+---+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+---+
5 rows in set (0.00 sec)
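Let's look at the WITH RECURSIVE clause first:
cte is the name of the subquery and (n) is its column list. The subquery is (SELECT 1 UNION ALL SELECT n + 1 FROM cte WHERE n < 5). SELECT 1 is the seed SELECT and runs only once. SELECT n + 1 FROM cte WHERE n < 5 is the recursive SELECT; note that it refers to cte itself. It keeps running until it produces no more rows, which happens once n reaches 5 and the condition n < 5 no longer holds. With the subquery defined, a final SELECT reads from cte.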
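The seed-then-repeat evaluation can be mimicked in a few lines of plain Python. This is only a conceptual sketch of the UNION ALL case, not a description of MySQL's internals: the helper names evaluate_recursive_cte and step are hypothetical, and the cross-check at the end assumes SQLite (via the sqlite3 module) as a stand-in engine.

```python
import sqlite3


def evaluate_recursive_cte(seed_rows, recursive_step):
    """Mimic UNION ALL recursion: feed each batch of new rows back into the step."""
    result = list(seed_rows)
    working = list(seed_rows)  # rows produced by the previous iteration
    while working:             # stop once an iteration produces no rows
        working = recursive_step(working)
        result.extend(working)
    return result


def step(rows):
    # Counterpart of: SELECT n + 1 FROM cte WHERE n < 5
    return [(n + 1,) for (n,) in rows if n < 5]


numbers = evaluate_recursive_cte([(1,)], step)  # seed: SELECT 1
print(numbers)

# Cross-check against a real engine (SQLite here, as an assumption).
sql_rows = sqlite3.connect(":memory:").execute(
    "WITH RECURSIVE cte (n) AS ("
    " SELECT 1 UNION ALL SELECT n + 1 FROM cte WHERE n < 5"
    ") SELECT n FROM cte"
).fetchall()
print(sorted(sql_rows) == sorted(numbers))
```

The loop ends on the iteration whose input is the row (5,), because the filter n < 5 rejects it and the step returns an empty list.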
Now suppose you want to query a company's organization chart to find the chain of management.
Create a test table:
mysql> CREATE TABLE employees_mgr (
         id INT PRIMARY KEY NOT NULL,
         name VARCHAR(100) NOT NULL,
         manager_id INT NULL,
         INDEX (manager_id),
         FOREIGN KEY (manager_id) REFERENCES employees_mgr (id)
       );
Insert some sample data:
mysql> INSERT INTO employees_mgr VALUES
       (333, "Yasmina", NULL), # Yasmina is the CEO (manager_id is NULL)
       (198, "John", 333),     # John has ID 198 and reports to 333 (Yasmina)
       (692, "Tarek", 333),
       (29, "Pedro", 198),
       (4610, "Sarah", 29),
       (72, "Pierre", 29),
       (123, "Adil", 692);
Run the recursive CTE:
mysql> WITH RECURSIVE employee_paths (id, name, path) AS (
         SELECT id, name, CAST(id AS CHAR(200))
         FROM employees_mgr
         WHERE manager_id IS NULL
         UNION ALL
         SELECT e.id, e.name, CONCAT(ep.path, ',', e.id)
         FROM employee_paths AS ep
         JOIN employees_mgr AS e ON ep.id = e.manager_id
       )
       SELECT * FROM employee_paths ORDER BY path;
The result looks like this:
+------+---------+-----------------+
| id   | name    | path            |
+------+---------+-----------------+
|  333 | Yasmina | 333             |
|  198 | John    | 333,198         |
|   29 | Pedro   | 333,198,29      |
| 4610 | Sarah   | 333,198,29,4610 |
|   72 | Pierre  | 333,198,29,72   |
|  692 | Tarek   | 333,692         |
|  123 | Adil    | 333,692,123     |
+------+---------+-----------------+
7 rows in set (0.00 sec)
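The path column shows the chain of management: 333 is at the top, while 4610, 72, and 123 are the rank-and-file employees with no one reporting to them.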
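The same traversal can be run locally as a sketch, assuming Python's sqlite3 module in place of MySQL. Two things differ from the query above and are assumptions of this example: the `||` operator replaces CONCAT, and an extra depth column (not in the original query) counts how many levels each employee sits below the CEO.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, so the DDL is simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees_mgr (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees_mgr (id)
    )
""")
conn.executemany(
    "INSERT INTO employees_mgr VALUES (?, ?, ?)",
    [
        (333, "Yasmina", None), (198, "John", 333), (692, "Tarek", 333),
        (29, "Pedro", 198), (4610, "Sarah", 29), (72, "Pierre", 29),
        (123, "Adil", 692),
    ],
)

# depth is a hypothetical extra column: 0 for the CEO, +1 per level down.
query = """
WITH RECURSIVE employee_paths (id, name, depth, path) AS (
    SELECT id, name, 0, CAST(id AS TEXT)
    FROM employees_mgr
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.name, ep.depth + 1, ep.path || ',' || e.id
    FROM employee_paths AS ep
    JOIN employees_mgr AS e ON ep.id = e.manager_id
)
SELECT id, name, depth, path FROM employee_paths ORDER BY path
"""
paths = conn.execute(query).fetchall()
for row in paths:
    print(row)
```

Because path is a string, ORDER BY path sorts it character by character, which happens to keep each manager directly above their reports for this data set.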
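Summary:
WITH RECURSIVE is typically used to query tree-structured data: define the subquery and its columns first, then read from the CTE with a SELECT.
If you simply need the same subquery several times, use a non-recursive CTE, which is more efficient.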