JOIN关键字所起的效果最先并不是使用JOIN来做到的,ANSI SQL:1989所采用以下的格式:
FROM T1, T2 WHERE where_filter
同时LEFT JOIN和RIGHT JOIN是使用*=和=*来起到相同作用。
到了ANSI SQL:1992,就出现了JOIN:
FROM T1 <join_type> JOIN T2 ON <on_filter> WHERE where_filter
CROSS JOIN
返回的是2个表的笛卡尔积,左N-ROWS,右M-ROWS,即为M*N。
OUTER
OUTER一般和LEFT,RIGHT,JOIN在一起使用。
注意旧写法如果不注意,将会导致一种错误,下面是个例子(求没有订单的顾客)。
新写法:
SELECT C.CustomerID, CompanyName, OrderID FROM dbo.Customers AS C LEFT OUTER JOIN dbo.Orders AS O ON C.CustomerID = O.CustomerID WHERE O.CustomerID IS NULL;
返回了2个CUSTOMER,结果正确。
旧写法(*错误的写法):
SELECT C.CustomerID, CompanyName, OrderID FROM dbo.Customers AS C, dbo.Orders AS O WHERE C.CustomerID *= O.CustomerID AND O.CustomerID IS NULL;
在旧的写法中,返回了所有的91个顾客。为什么会这样呢,取决于WHERE和ON条件判断发生的时间的不同。在这里,WHERE里AND前,AND后是没有任何区别的,你不能控制在加入OUTER行之前到底执行那个过滤条件。
理论上,他是在加入OUTER行之前进行了过滤。实际上很明显,O.CustomerID IS NULL是肯定不满足WHERE条件的(在ORDER表里根本就没有Customer为空的数据),所以这个查询直接就进行左半的连接,返回了CUSTOMER的数据(91条)。
那如何修正旧的写法呢,可以按下面的方法:
SELECT C.CustomerID, CompanyName, OrderID FROM dbo.Customers AS C, dbo.Orders AS O WHERE C.CustomerID *= O.CustomerID GROUP BY C.CustomerID, CompanyName, OrderID HAVING OrderID IS NULL;
这样,就在加入OUTER ROW之前进行了条件的过滤。
CONTROLLING THE PHYSICAL JOIN EVALUATION ORDER
在JOIN过程中,SQLSERVER做了大量的内部优化,所以物理的执行顺序并不是真正的执行顺序。如果要强加于物理的顺序,可以
OPTION (FORCE ORDER);
但是,严重不推荐这样做,除非你对JOIN查询的情况完全的了解,同时去查询MSDN。
CONTROLLING THE LOGICAL JOIN EVALUATION ORDER
ON的使用:ON的位置的摆放很有讲究,请见下面的例子。
SELECT DISTINCT C.CompanyName AS customer, S.CompanyName AS supplier FROM dbo.Customers AS C LEFT OUTER JOIN dbo.Orders AS O ON O.CustomerID = C.CustomerID JOIN dbo.[Order Details] AS OD ON OD.OrderID = O.OrderID JOIN dbo.Products AS P ON P.ProductID = OD.ProductID JOIN dbo.Suppliers AS S ON S.SupplierID = P.SupplierID;
这个例子返回了1326行,可是我们想返回的是1328行,为什么会掉了2行,原因就是LEFT OUTER JOIN没有适用到后3个JOIN连接中。
可以修改这个例子,给后3个JOIN连接加上LEFT,那么如下所示:
SELECT DISTINCT C.CompanyName AS customer, S.CompanyName AS supplier FROM dbo.Customers AS C LEFT OUTER JOIN dbo.Orders AS O ON O.CustomerID = C.CustomerID LEFT OUTER JOIN dbo.[Order Details] AS OD ON OD.OrderID = O.OrderID LEFT OUTER JOIN dbo.Products AS P ON P.ProductID = OD.ProductID LEFT OUTER JOIN dbo.Suppliers AS S ON S.SupplierID = P.SupplierID;
这样,成功的返回了1328行。但是,有一种更好的写法,在最后对CUSTOMER进行RIGHT JOIN的组合,这样就能包含所有的客户:
SELECT DISTINCT C.CompanyName AS customer, S.CompanyName AS supplier FROM dbo.Orders AS O JOIN dbo.[Order Details] AS OD ON OD.OrderID = O.OrderID JOIN dbo.Products AS P ON P.ProductID = OD.ProductID JOIN dbo.Suppliers AS S ON S.SupplierID = P.SupplierID RIGHT OUTER JOIN dbo.Customers AS C ON O.CustomerID = C.CustomerID;
同样,也是返回了1328行。更好的更准确的写法如下,注意ON的位置变化:
SELECT DISTINCT C.CompanyName AS customer, S.CompanyName AS supplier FROM dbo.Customers AS C LEFT OUTER JOIN (dbo.Orders AS O JOIN dbo.[Order Details] AS OD ON OD.OrderID = O.OrderID JOIN dbo.Products AS P ON P.ProductID = OD.ProductID JOIN dbo.Suppliers AS S ON S.SupplierID = P.SupplierID) ON O.CustomerID = C.CustomerID;
这里使用()是为了更方便理解,其实不用()也可以。
SEMI JOINS
有些时候,使用EXISTS可以代替使用JOIN ON,而且使用EXISTS效率要高很多。
比如下面的查询:
SELECT DISTINCT C.CustomerID, C.CompanyName FROM dbo.Customers AS C JOIN dbo.Orders AS O ON O.CustomerID = C.CustomerID WHERE Country = N'Spain';
完全可以用下面的替换:
SELECT CustomerID, CompanyName FROM dbo.Customers AS C WHERE Country = N'Spain' AND EXISTS (SELECT * FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID);
JOIN ALGORITHMS
Index的位置放在那里,才是最有效率的(最优化的方案)?
差的性能<— 无索引(表审视每行的所有数据)—> 非聚集索引无覆盖(寻找+偏序审查+发现)—> 聚集索引(寻找+部分审查)—> 非聚集索引集覆盖(寻找+部分审查)—> 非聚集索引包含非KEY的列(寻找+部分审查)—> 最好的性能