Skip to content

fix: handle IS TRUE correctly in EliminateOuterJoin#22444

Open
neilconway wants to merge 1 commit into
apache:mainfrom
neilconway:neilc/fix-outer-join-is-true
Open

fix: handle IS TRUE correctly in EliminateOuterJoin#22444
neilconway wants to merge 1 commit into
apache:mainfrom
neilconway:neilc/fix-outer-join-is-true

Conversation

@neilconway
Copy link
Copy Markdown
Contributor

@neilconway neilconway commented May 21, 2026

Which issue does this PR close?

Rationale for this change

EliminateOuterJoin needs to identify "null-rejecting" columns; a column is null-rejecting with respect to an expression if a NULL value in the column yields a NULL or false value for the expression.

This analysis was unsound with respect to IS TRUE, IS FALSE, and IS NOT UNKNOWN operators: those operators are null-rejecting at the toplevel of the WHERE clause, but they may not be null-rejecting when nested inside an expression tree. The analysis checked this correctly for IS NOT NULL but neglected to apply similar logic for these other three operators. This resulted in incorrectly converting outer joins to inner joins in some cases, producing incorrect query results.

As part of fixing this, this PR also makes a bunch of improvements to the null-rejection analysis, enumerated below, resulting in more accurate null-rejection analysis.

What changes are included in this PR?

  • Rename extract_non_nullable_columns to extract_null_rejecting_columns: "non_nullable" is a property of a column, "null-rejecting" is a more complex property describing the relationship between a column and an expression.
  • Use Operator::returns_null_on_null() instead of maintaining a hand-rolled and very incomplete list of null-propagating binary operators. We now compute null-rejection correctly for arithmetic, bitwise, and regex operators, for example.
  • Handle null-rejection correctly for Expr::Negative
  • Rewrite the logic for handling OR and nested AND operators to be more clear, and also more efficient
  • Rewrite and expand comments throughout for clarity
  • Add unit and SLT tests

Are these changes tested?

Yes; new unit and SLT tests added.

Are there any user-facing changes?

Yes, query result correctness fix.

The null-rejection analysis used by `EliminateOuterJoin` to decide
whether an outer join can be tightened had a soundness gap:

`IS TRUE` / `IS FALSE` / `IS NOT UNKNOWN` recursed into their operand
at any depth. These predicates return a definite FALSE on NULL input
(never NULL), so they're null-rejecting at the top of a WHERE clause
but not under a surrounding `NOT` (or any context that resets
`top_level`), where the FALSE inverts to TRUE and accepts NULL rows.
The existing `IS NOT NULL` arm already gated on `top_level`; the
three others now share the same gate.

While here, clean up several other rough edges:

* Defer to `Operator::returns_null_on_null()` instead of maintaining
  a hand-rolled list of null-propagating binary operators in
  `extract_non_nullable_columns`. The hand list previously only
  covered the six comparison operators; the upstream method is the
  single source of truth and also picks up arithmetic / bitwise /
  regex / SIMILAR TO / array-containment operators that were missed.
* Rename `extract_non_nullable_columns` to
  `extract_null_rejecting_columns`. The returned columns are columns
  the predicate rejects NULL on (a predicate property), not columns
  that are themselves non-nullable (a schema property). The two are
  distinct.
* Rewrite the OR / nested-AND merge logic from an O(n*m) double-loop
  with break into bucketed per-side lookups.
* Handle `Expr::Negative` alongside `Expr::Not` (both propagate NULL
  on their operand).
* Unify `IsNotNull` / `IsTrue` / `IsFalse` / `IsNotUnknown` into a
  single match arm now that they share the same `top_level` gate.
* Rewrite the function-level doc and inline comments.

Tests:

* 7 new unit tests covering the soundness fix (three
  `NOT(... IS TRUE/FALSE/NOT UNKNOWN)` regressions), the IS DISTINCT
  FROM / IS NOT DISTINCT FROM negative cases, and positive coverage
  for the arithmetic / unary-minus shapes that the operator-list
  extension enables.
* 1 new sqllogictest case asserting both the logical plan
  (`Left Join` preserved) and the row set (LEFT-padded NULL rows
  survive) for `WHERE NOT((t2.y > 150) IS TRUE)` — the row-set
  assertion is what catches a future regression that an explain-only
  test would let through under snapshot updates.
@github-actions github-actions Bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels May 21, 2026
@neilconway neilconway changed the title fix: handle IS TRUE properly in EliminateOuterJoin fix: handle IS TRUE correctly in EliminateOuterJoin May 21, 2026
@neilconway
Copy link
Copy Markdown
Contributor Author

@SubhamSinghal @2010YOUY01 FYI, I'd love feedback on this if you have a moment (related to #21549)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Outer join elimination does not handle IS TRUE correctly

1 participant