left join 的一些问题,不一定以左表为准,数据行数减少或者数据增多

 

left join 完,行数一定等于左表行数吗? 不一定!!

实验如下:

SELECT * FROM hdd_test1 a LEFT JOIN hdd_test2 b ON a.a_id = b.b_id LIMIT 144

8 rows (134ms)

id created_at updated_at a_id id created_at updated_at b_id
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1 1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1
2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2 2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1 3 2018-11-05 20:44:00.0 2018-11-05 20:44:00.0 1
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1 4 2018-11-05 20:44:04.0 2018-11-05 20:44:04.0 1
3 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 3        
4 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 4        
5 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 5        
6 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 6        

hdd_test1有6条记录, hdd_test2 有4条记录,直接left join ,产生6条记录,别高兴太早,看下面的查询:

SELECT * FROM hdd_test1 a LEFT JOIN hdd_test2 b ON a.a_id = b.b_id WHERE b.b_id > 1 LIMIT 144

1 row (37ms)查看CSV

id created_at updated_at a_id id created_at updated_at b_id
2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2 2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2

 

如图所示,当where 条件在外,会先聚合后过滤,所以结果会变少

 

还有一种情况需要注意,left join 结果的行数也可能大于左表:

SELECT * FROM hdd_test1 LIMIT 144

2 rows (33ms)查看CSV

id created_at updated_at a_id
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1
2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2
SELECT * FROM hdd_test2 LIMIT 144

4 rows (34ms)查看CSV

id created_at updated_at b_id
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1
2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2
3 2018-11-05 20:44:00.0 2018-11-05 20:44:00.0 1
4 2018-11-05 20:44:04.0 2018-11-05 20:44:04.0 1
SELECT * FROM hdd_test1 a LEFT JOIN hdd_test2 b ON a.a_id = b.b_id LIMIT 144

4 rows (110ms)查看CSV

id created_at updated_at a_id id created_at updated_at b_id
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1 1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1
2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2 2 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 2
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1 3 2018-11-05 20:44:00.0 2018-11-05 20:44:00.0 1
1 2018-10-09 10:10:10.0 2018-10-09 10:10:10.0 1 4 2018-11-05 20:44:04.0 2018-11-05 20:44:04.0 1

看到没? 产生了四条记录,因为这个时候b 有重复的b_id,如果这时候计算a表值的count 或者sum ,结果会比预想的大。

虽然很简单,但是很容易犯错误

你可能感兴趣的:(问题bug)