开发随笔——NOT IN vs NOT EXISTS

原文: 开发随笔——NOT IN vs NOT EXISTS

原文出处: http://blog.csdn.net/dba_huangzj/article/details/31374037  转载请引用

之前在论坛中见到一个针对in/exists的讨论,原帖懒得找了,这里介绍一下最近的学习小结:

NOT IN和NOT EIXTS在对允许为null的列查询时会有一定的风险。特别是NOT IN,如果子查询包含了最少一个NULL,会出现非预期的结果。下面做一个演示。

 

IF OBJECT_ID('ShipmentItems', 'U') IS NOT NULL 

    DROP TABLE dbo.ShipmentItems; 

GO 

CREATE TABLE dbo.ShipmentItems 

    ( 

      ShipmentBarcode VARCHAR(30) NOT NULL , 

      Description VARCHAR(100) NULL , 

      Barcode VARCHAR(30) NOT NULL 

    ); 

GO 

INSERT  INTO dbo.ShipmentItems 

        ( ShipmentBarcode , 

          Barcode , 

          Description 

        ) 

        SELECT  '123456' , 

                '1010203' , 

                'Some cool widget' 

        UNION ALL 

        SELECT  '123654' , 

                '1010203' , 

                'Some cool widget' 

        UNION ALL 

        SELECT  '123654' , 

                '1010204' , 

                'Some cool stuff for some gadget'; 

GO 

-- retrieve all the items from shipment 123654 

-- that are not shipped in shipment 123456 

SELECT  Barcode 

FROM    dbo.ShipmentItems 

WHERE   ShipmentBarcode = '123654' 

        AND Barcode NOT IN ( SELECT Barcode 

                             FROM   dbo.ShipmentItems 

                             WHERE  ShipmentBarcode = '123456' ); 

/* 

Barcode 

------------------------------ 

1010204

*/


 

可以看出得到了期待结果。下面看看修改表结构,允许列为null的情况:

ALTER TABLE dbo.ShipmentItems 

ALTER COLUMN Barcode VARCHAR(30) NULL; 

INSERT  INTO dbo.ShipmentItems 

        ( ShipmentBarcode , 

          Barcode , 

          Description 

        ) 

        SELECT  '123456' , 

                NULL , 

                'Users manual for some gadget'; 

GO 

SELECT  Barcode 

FROM    dbo.ShipmentItems 

WHERE   ShipmentBarcode = '123654' 

        AND Barcode NOT IN ( SELECT Barcode 

                             FROM   dbo.ShipmentItems 

                             WHERE  ShipmentBarcode = '123456' ); 

/* 

Barcode 

------------------------------

*/


很多人会觉得这是一个bug,有时候能查出数据,有时候却不能。但是实际上不是bug,当NOT IN子句返回最少一个NULL时,查询会返回空,下面的语句能更好地说明这个想法:

SELECT  CASE WHEN 1 NOT IN ( 2, 3 ) THEN 'True' 

             ELSE 'Unknown or False' 

        END , 

        CASE WHEN 1 NOT IN ( 2, 3, NULL ) THEN 'True' 

             ELSE 'Unknown or False' 

        END; 

/* 

---- ---------------- 

True Unknown or False 

*/


实际上,由于IN的本质是OR操作,所以:


SELECT  CASE WHEN 1 IN ( 1, 2, NULL ) THEN 'True' 

             ELSE 'Unknown or False' 

        END ;

中,1 in 1,也就是为TRUE,所以返回true,这个语句的逻辑实际上是:

SELECT  CASE WHEN ( 1 = 1 ) 

                  OR ( 1 = 2 ) 

                  OR ( 1 = NULL ) THEN 'True' 

             ELSE 'Unknown or False' 

        END ;

 


当使用NOT IN 时,如下面的语句:

SELECT  CASE WHEN 1 NOT IN ( 1, 2, NULL ) THEN 'True' 

             ELSE 'Unknown or False' 

        END ;


会转变成:

SELECT  CASE WHEN NOT ( ( 1 = 1 ) 

                        OR ( 1 = 2 ) 

                        OR ( 1 = NULL ) 

                      ) THEN 'True' 

             ELSE 'Unknown or False' END ;


根据离散数学的概念,可以转换为:

SELECT  CASE WHEN ( ( 1 <> 1 ) 

                    AND ( 1 <> 2 ) 

                    AND ( 1 <> NULL ) 

                  ) THEN 'True' 

             ELSE 'Unknown or False' 

        END ;


谓词有短路特性,即在AND条件中,只要有一个条件为false,整个条件都为false,而1<>1是为false,所以后面的也不需要判断了,直接返回else部分。即使是1<>null,根据集合论的特性,NULL和实际数据的对比总是返回unknown,所以也是为false。如果你非要用NOT IN ,请确保子查询永远不会有NULL返回。或者需要额外处理去除NULL,比如:

SELECT  Barcode 

FROM    dbo.ShipmentItems 

WHERE   ShipmentBarcode = '123654' 

  AND Barcode NOT IN ( SELECT Barcode 

                       FROM   dbo.ShipmentItems 

                       WHERE  ShipmentBarcode = '123456' 

                         AND Barcode IS NOT NULL ) ;


还有一种方法就是改写语句,用NOT EXISTS来等价替换:

SELECT  i.Barcode 

FROM    dbo.ShipmentItems AS i 

WHERE   i.ShipmentBarcode = '123654' 

        AND NOT EXISTS ( SELECT * 

                         FROM   dbo.ShipmentItems AS i1 

                         WHERE  i1.ShipmentBarcode = '123456' 

                                AND i1.Barcode = i.Barcode ); 

/* 

Barcode 

------------------------------ 

1010204

*/


另外,基于SARG要求,一般不建议用NOT IN/NOT EXISTS这种反向扫描,避免影响性能。还有一个选择使用IN/EXISTS的要点,就是多列匹配的问题,在T-SQL中,多列同时匹配要用EXISTS,而单列匹配可以用EXISTS/IN。可能可以用其他写法来实现IN的多列匹配,但是一般我个人会选择使用EXISTS来匹配多列。


原文出自:CSDN博客:黄钊吉的博客

你可能感兴趣的:(not exists)