Login

This issue came up when I got different records counts for what I thought were identical queries one using a `not in` `where` constraint and the other a `left join`. The table in the `not in` constraint had one null value (bad data) which caused that query to return a count of 0 records. I sort of understand why but I could use some help fully grasping the concept.

To state it simply, why does query A return a result but B doesn't?

A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)

This was on SQL Server 2005. I also found that calling `set ansi_nulls off` causes B to return a result.

In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)

In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.

Compare to null is undefined, unless you use IS NULL.

So, when comparing 3 to NULL (query A), it returns undefined.

I.e. SELECT 'true' where 3 in (1,2,null)
and
SELECT 'true' where 3 not in (1,2,null)

will produce the same result, as NOT (UNDEFINED) is still undefined, but not TRUE

Null signifies and absence of data, that is it is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this because in C type languages when using pointers null is indeed nothing.

Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned

In the second however you can reduce it to

*select 'true' where 3 not in (null)*

So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.

also this might be of use to know the logical difference between join, exists and in

[To see links please register here]

Query A is the same as:

select 'true' where 3 = 1 or 3 = 2 or 3 = 3 or 3 = null

Since `3 = 3` is true, you get a result.

Query B is the same as:

select 'true' where 3 <> 1 and 3 <> 2 and 3 <> null

When `ansi_nulls` is on, `3 <> null` is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.

When `ansi_nulls` is off, `3 <> null` is true, so the predicate evaluates to true, and you get a row.

The title of this question at the time of writing is

> SQL NOT IN constraint and NULL values

From the text of the question it appears that the problem was occurring in a SQL DML `SELECT` query, rather than a SQL DDL `CONSTRAINT`.

However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading statements, those along the lines of (paraphrasing)

> When the predicate evaluates to UNKNOWN you don't get any rows.

Although this is the case for SQL DML, when considering constraints the effect is different.

Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by @Brannon):

DECLARE @T TABLE
(
true CHAR(4) DEFAULT 'true' NOT NULL,
CHECK ( 3 IN (1, 2, 3, NULL )),
CHECK ( 3 NOT IN (1, 2, NULL ))
);

INSERT INTO @T VALUES ('true');

SELECT COUNT(*) AS tally FROM @T;

As per @Brannon's answer, the first constraint (using `IN`) evaluates to TRUE and the second constraint (using `NOT IN`) evaluates to UNKNOWN. **However**, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result.

The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec

> **7.6 where clause**
>
> The result of the <where clause> is a table of those rows of T for
> which the result of the search condition is true.
>
>
> **4.10 Integrity constraints**
>
> A table check constraint is satisfied if and only if the specified
> search condition is not false for any row of a table.

In other words:

In SQL DML, rows are removed from the result when the `WHERE` evaluates to UNKNOWN because it **does not** satisfy the condition "is true".

In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it **does** satisfy the condition "is not false".

Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraints would have to explicitly handle nulls and that would be very unsatisfactory from a language design perspective (not to mention, a right pain for coders!)

p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!

IF you want to filter with NOT IN for a subquery containg NULLs justcheck for not null

SELECT blah FROM t WHERE blah NOT IN
(SELECT someotherBlah FROM t2 WHERE someotherBlah IS NOT NULL )

`NOT IN` returns 0 records when compared against an unknown value
-----------------------------------------------------------------

Since `NULL` is an unknown, a `NOT IN` query containing a `NULL` or `NULL`s in the list of possible values will always return `0` records since there is no way to be sure that the `NULL` value is not the value being tested.

It may be concluded from answers here that `NOT IN (subquery)` doesn't handle nulls correctly and should be avoided in favour of `NOT EXISTS`. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is `NOT IN` that handles nulls correctly and returns the correct result, rather than `NOT EXISTS`.

Consider a table `sp` to represent suppliers (`sno`) who are known to supply parts (`pno`) in quantity (`qty`). The table currently holds the following values:

VALUES ('S1', 'P1', NULL),
('S2', 'P1', 200),
('S3', 'P1', 1000)

Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.

The task is to find the suppliers who are known supply part number 'P1' but not in quantities of 1000.

The following uses `NOT IN` to correctly identify supplier 'S2' only:

WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND 1000 NOT IN (
SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
);

However, the below query uses the same general structure but with `NOT EXISTS` but incorrectly includes supplier 'S1' in the result (i.e. for which the quantity is null):

WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND NOT EXISTS (
SELECT *
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
AND spy.qty = 1000
);

So `NOT EXISTS` is not the silver bullet it may have appeared!

Of course, source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.

This can be achieved (among other possible designs) using two tables:

- `sp` suppliers known to supply parts
- `spq` suppliers known to supply parts in known quantities

noting there should probably be a foreign key constraint where `spq` references `sp`.

The result can then be obtained using the 'minus' relational operator (being the `EXCEPT` keyword in Standard SQL) e.g.

WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1' ),
( 'S2', 'P1' ),
( 'S3', 'P1' ) )
AS T ( sno, pno )
),
spq AS
( SELECT *
FROM ( VALUES ( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT sno
FROM spq
WHERE pno = 'P1'
EXCEPT
SELECT sno
FROM spq
WHERE pno = 'P1'
AND qty = 1000;

cathedralic209231

laguerre725252

flechette70937

pteridospermjyta

orthodontia390

aydenh

blepharocarcinomarkalee

Profabienzdbfc

spilikin80749

manidae723087