Functionally, IN and EXISTS are the same (unlike NOT IN and NOT EXISTS, which differ functionally in one scenario; read this post for the differences between the NOT IN and NOT EXISTS clauses: http://decipherinfosys.wordpress.com/2007/01/21/32/ ). However, there are performance implications of using one over the other that one needs to be aware of. Assume that we have two tables, TABLE_A and TABLE_B, and that the match is being done on TABLE_A.col1 = TABLE_B.col2. In that scenario, an IN statement like:
select <select column list> from TABLE_A where col1 in (Select col2 from TABLE_B)
will typically get processed in this way:
1) The sub-query gets evaluated first, and its results are de-duplicated (made distinct) and indexed,
2) The output from it is then joined with TABLE_A.
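The two steps above can be pictured as a join against a de-duplicated derived table. The sketch below is only an illustration: it assumes an empty scratch database, the integer column types and sample rows are made up, and the multi-row VALUES syntax may need to be split into one INSERT per row on some databases. Whether a given optimizer really picks this plan is something to confirm from its execution plan.

```sql
-- Hypothetical sample tables; column types and rows are assumptions.
CREATE TABLE TABLE_A (col1 INTEGER);
CREATE TABLE TABLE_B (col2 INTEGER);
INSERT INTO TABLE_A (col1) VALUES (1), (2), (3);
INSERT INTO TABLE_B (col2) VALUES (2), (2), (3), (4);

-- The IN form from the text: returns the TABLE_A rows 2 and 3.
SELECT col1
FROM TABLE_A
WHERE col1 IN (SELECT col2 FROM TABLE_B);

-- Roughly what steps 1 and 2 describe: de-duplicate the sub-query
-- output first, then join it to TABLE_A. Also returns rows 2 and 3.
SELECT a.col1
FROM TABLE_A a
JOIN (SELECT DISTINCT col2 FROM TABLE_B) b
  ON b.col2 = a.col1;
```

Without the DISTINCT, the value 2 would come back twice from the join, which is why the de-duplication step matters.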
Re-writing the above query using the EXISTS clause will give:
Select <select column list> from TABLE_A
where exists (select 1 from Table_B where Table_B.col2 = Table_A.col1)
This gets evaluated in this order:
1) For every value of Table_A.col1, loop through and match the values in Table_B.col2.
2) If we get a match, select that value and move on to the next one. If there is no match, discard that value.
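As a small illustration of that row-by-row check, the sketch below runs the EXISTS form against made-up data. It assumes an empty scratch database; the column types and sample rows are invented for the example.

```sql
-- Hypothetical sample tables; column types and rows are assumptions.
CREATE TABLE TABLE_A (col1 INTEGER);
CREATE TABLE TABLE_B (col2 INTEGER);
INSERT INTO TABLE_A (col1) VALUES (1), (2), (3);
INSERT INTO TABLE_B (col2) VALUES (2), (2), (3), (4);

-- For each TABLE_A row, the sub-query only needs to find one match.
-- col1 = 1 has no match and is discarded; 2 and 3 are kept.
SELECT a.col1
FROM TABLE_A a
WHERE EXISTS (SELECT 1 FROM TABLE_B b WHERE b.col2 = a.col1)
ORDER BY a.col1;
```

Note that the value 2 is returned once even though TABLE_B holds it twice, because EXISTS only tests whether a match is present.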
So, where should one use an IN vs the EXISTS clause? If the result of the sub-query "Select col2 from TABLE_B" is huge, TABLE_A is a relatively small set, and executing "select 1 from Table_B where Table_B.col2 = Table_A.col1" is very fast because of a proper index on Table_B.col2, then an EXISTS clause will be better, since the optimizer can do a full table scan (FTS) on Table_A and then use the index to do the probe/seek operations for Table_B.
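To make the "proper index" part concrete, here is a minimal sketch. It assumes an empty scratch database; the column types and the index name idx_table_b_col2 are hypothetical.

```sql
-- Hypothetical table definitions; column types are assumptions.
CREATE TABLE TABLE_A (col1 INTEGER);
CREATE TABLE TABLE_B (col2 INTEGER);

-- Index that lets each probe into TABLE_B be a seek rather than a scan.
-- The index name is made up for this example.
CREATE INDEX idx_table_b_col2 ON TABLE_B (col2);

-- With a small TABLE_A and a large, indexed TABLE_B, the optimizer can
-- scan TABLE_A once and probe the index for each col1 value.
SELECT a.col1
FROM TABLE_A a
WHERE EXISTS (SELECT 1 FROM TABLE_B b WHERE b.col2 = a.col1);
```

Whether the index is actually used depends on the database and its statistics, so it is worth checking the execution plan with whatever plan-display facility your database provides.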
If the result of the sub-query is small, then the IN clause is much faster. If the results of both the sub-query and the outer query are large, then IN and EXISTS would perform about the same; it depends upon your indexing scheme.
Please do note that the example used above is a very simplistic one in order to illustrate the point. In the real world, you would have queries with additional filter criteria on those tables that narrow down the result sets. As a generic rule, if the result of the outer query is small and the result set of the inner sub-query is large, then use EXISTS. If it is the other way around, then use the IN clause.
Tips:
Where column > (subquery)
1. When the subquery references the outer query in the subquery's WHERE clause, this is known as a correlated subquery. Correlated subqueries can be very expensive to process because, unless the optimizer rewrites the query, the inner query must be executed for every row returned by the outer query.
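A minimal sketch of such a correlated subquery follows. The employees table, its columns, and the sample rows are all hypothetical and only serve to show the shape of the query.

```sql
-- Hypothetical table and data, invented for this example.
CREATE TABLE employees (emp_id INTEGER, dept_id INTEGER, salary INTEGER);
INSERT INTO employees (emp_id, dept_id, salary) VALUES
  (1, 10, 100), (2, 10, 300), (3, 20, 200), (4, 20, 200);

-- The inner query references e.dept_id from the outer query, so it is
-- logically re-evaluated for each outer row.
SELECT e.emp_id, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(e2.salary)
                  FROM employees e2
                  WHERE e2.dept_id = e.dept_id);
```

With these sample rows only emp_id 2 qualifies: its salary of 300 is above the department 10 average of 200, while nobody in department 20 is above that department's average.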
Where exists (subquery)
2. The WHERE EXISTS subquery is used when we want to display all rows that have a matching column value in both tables. In most cases, this type of subquery can be rewritten as a standard join to improve performance, provided the join does not introduce duplicate rows.
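Here is one way such a rewrite might look. The customers and orders tables, their columns, and the rows are hypothetical; the point is only to compare the two forms.

```sql
-- Hypothetical tables and data, invented for this example.
CREATE TABLE customers (customer_id INTEGER, name VARCHAR(20));
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
INSERT INTO customers (customer_id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders (order_id, customer_id) VALUES (100, 1), (101, 1), (102, 3);

-- EXISTS form: customers that have at least one order (Ann and Cy).
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

-- Join form: DISTINCT is needed because Ann has two orders and would
-- otherwise be returned twice.
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;
```

Whether the join is really faster depends on the database; many optimizers treat both forms the same way, so compare the execution plans before rewriting.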
Where not exists (subquery)
3. As a general rule, the use of NOT EXISTS subqueries is discouraged because the query can often be rewritten as a standard outer join that keeps only the unmatched rows, often with faster performance.
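The sketch below shows the kind of outer join rewrite meant here. The products and order_items tables, their columns, and the rows are hypothetical.

```sql
-- Hypothetical tables and data, invented for this example.
CREATE TABLE products (product_id INTEGER, name VARCHAR(20));
CREATE TABLE order_items (item_id INTEGER, product_id INTEGER);
INSERT INTO products (product_id, name) VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
INSERT INTO order_items (item_id, product_id) VALUES (10, 1), (11, 1), (12, 3);

-- NOT EXISTS form: products that were never ordered (only 'ink').
SELECT p.product_id, p.name
FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items i
                  WHERE i.product_id = p.product_id);

-- Outer join form: keep every product, then keep only the rows where
-- no order_items row was matched. Also returns only 'ink'.
SELECT p.product_id, p.name
FROM products p
LEFT JOIN order_items i ON i.product_id = p.product_id
WHERE i.product_id IS NULL;
```

The IS NULL test should be on the join column of the inner table, so that a NULL can only come from a missing match.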
Where column not in (subquery)
4. There are serious problems with subqueries that may return NULL values. It is a good idea to avoid the NOT IN clause (which invokes a subquery) and to prefer NOT EXISTS (which invokes a correlated subquery), since a NOT IN query returns no rows if any row returned by the subquery contains a NULL value.
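The NULL problem is easy to reproduce. In the sketch below the parent_rows and child_rows tables and their contents are hypothetical; the single NULL in child_rows is what triggers the behavior.

```sql
-- Hypothetical tables and data, invented for this example.
CREATE TABLE parent_rows (id INTEGER);
CREATE TABLE child_rows (parent_id INTEGER);
INSERT INTO parent_rows (id) VALUES (1), (2), (3);
INSERT INTO child_rows (parent_id) VALUES (1), (NULL);

-- NOT IN: returns no rows at all. For id 2 and 3 the comparison with
-- NULL is unknown, so the NOT IN condition is never true.
SELECT id
FROM parent_rows
WHERE id NOT IN (SELECT parent_id FROM child_rows);

-- NOT EXISTS: returns 2 and 3, because the NULL row simply never
-- matches the correlated equality test.
SELECT p.id
FROM parent_rows p
WHERE NOT EXISTS (SELECT 1 FROM child_rows c WHERE c.parent_id = p.id);
```

If NOT IN has to be kept, adding "WHERE parent_id IS NOT NULL" inside the subquery avoids the empty result.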
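NOT EXISTS means "does not exist": as soon as the query in the parentheses that follow returns any data, the condition no longer holds. You can think of the NOT EXISTS before the parentheses as a left-hand expression and the query inside the parentheses as a right-hand expression; the condition holds only when the right-hand expression also yields "not exists" (that is, when the query returns no rows).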
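IN is suited to the case where the outer table is large and the inner table is small; EXISTS is suited to the case where the outer table is small and the inner table is large.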