Skip to content

[SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in InsertIntoStatement/V2WriteCommand table slot#56024

Closed
haoyangeng-db wants to merge 2 commits into
apache:masterfrom
haoyangeng-db:spark-46625-followup-resolve-identifier
Closed

[SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in InsertIntoStatement/V2WriteCommand table slot#56024
haoyangeng-db wants to merge 2 commits into
apache:masterfrom
haoyangeng-db:spark-46625-followup-resolve-identifier

Conversation

@haoyangeng-db
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER placeholder in command name slot").

SPARK-46625 moved PlanWithUnresolvedIdentifier from wrapping the whole command into the command's identifier slot at parse time. For InsertIntoStatement and V2WriteCommand the placeholder now lives in .table, which is a non-child LogicalPlan slot (override def child = query). That PR correctly added explicit recursion for that slot in BindParameters (parameter binding) and ResolveIdentifierClause (materializing the placeholder once identifierExpr is resolved), but the same recursion was missing from ResolveReferences, which owns column / variable resolution.

This PR adds two cases at the top of ResolveReferences.doApply that mirror the existing pattern: when InsertIntoStatement.table or V2WriteCommand.table is an unresolved PlanWithUnresolvedIdentifier, resolve identifierExpr via resolveExpressionByPlanChildren(..., includeLastResort = true) (which runs the resolveColsLastResortpath:resolveVariables compose resolveOuterRef). The !identifierExpr.resolved` guard keeps the cases idempotent under bottom-up traversal.

Why are the changes needed?

Without this, INSERT INTO IDENTIFIER(<sql-variable>) ... fails analysis: the UnresolvedAttribute for the variable name sitting inside PlanWithUnresolvedIdentifier.identifierExpr is never rewritten to a VariableReference. Since ResolveIdentifierClause only fires when identifierExpr.resolved && childrenResolved, the placeholder never materializes; the plan reaches PreprocessTableInsertion with an unresolved attribute and errors out (e.g. UNSUPPORTED_INSERT.RDD_BASED).

Repro on master before this fix:

CREATE TABLE t (a INT) USING PARQUET;
DECLARE OR REPLACE VARIABLE target_table STRING;
SET VAR target_table = 't';
INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a;

The same shape applies to OverwriteByExpression.table (e.g. REPLACE WHERE, REPLACE ON, REPLACE USING variants of INSERT) - fixed by the same V2WriteCommand case.

Does this PR introduce any user-facing change?

No. Bug-fix only.

How was this patch tested?

New test added.

Was this patch authored or co-authored using generative AI tooling?

Co-authored with Claude Code.

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the followup. The fix is well-targeted: parent SPARK-46625 moved PlanWithUnresolvedIdentifier into the command's identifier slot, and for InsertIntoStatement.table / V2WriteCommand.table that slot is non-child (child = query), so the default ResolveReferences traversal never visits identifierExpr. BindParameters and ResolveIdentifierClause already recurse explicitly into the slot; this PR adds the matching recursion in ResolveReferences so an UnresolvedAttribute (a SQL-variable name) inside identifierExpr is rewritten to VariableReference before ResolveIdentifierClause materializes the placeholder. End-to-end test placement in ParametersSuite is consistent with the existing SPARK-46625 tests.

One design comment on resolution scope, one cleanup, and a test-coverage suggestion inline.

Resolution scope. The new cases call resolveExpressionByPlanChildren(p.identifierExpr, i / w, includeLastResort = true). Because i.children = Seq(query) (and same for w), the resolver searches query.output when resolving nested UnresolvedAttributes in identifierExpr. That differs from how a PlanWithUnresolvedIdentifier is resolved when it sits in a regular child slot: the generic fallback at Analyzer.scala:1865 runs q.mapExpressions(resolveExpressionByPlanChildren(_, q, includeLastResort = true)) with q = PlanWithUnresolvedIdentifier, whose children are Nil for the INSERT/OverwriteByExpression path built by buildWriteTableSlot (AstBuilder.scala:1186-1193). With children = Nil, column resolution is a no-op and only the last-resort variable resolution fires. The narrow consequence of the wider scope here is that a SQL variable whose name matches a query output column resolves to the column instead, and IdentifierResolution.evalIdentifierExpr then throws NOT_A_CONSTANT_STRING.NOT_CONSTANT. Passing p instead of i / w mirrors the regular flow and avoids the regression.

Cleanup. The InsertIntoStatement case wraps the result in resolveColumnDefaultInCommandInputQuery(...), which short-circuits unless i.table.resolved. Here i.table is still a PlanWithUnresolvedIdentifier, so the call is a no-op; the column-default work is picked up later by the existing fallback case i: InsertIntoStatement once ResolveIdentifierClause has materialized the table. The V2WriteCommand case doesn't wrap, so the asymmetry is worth removing.

Comment phrasing (nit). "Mirror the recursion that BindParameters and ResolveIdentifierClause already do for the same shape" reads as if those rules share resolution semantics with this one. What's actually shared is the structural recursion into the non-child slot — those rules bind parameters / materialize on an already-resolved expression rather than performing attribute resolution.

!i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
val resolvedExpr = resolveExpressionByPlanChildren(
p.identifierExpr, i, includeLastResort = true)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this resolve against p rather than i? Passing i widens the resolution scope to i.children = Seq(query), so a nested UnresolvedAttribute in identifierExpr can match a query output column. With a colliding name like

DECLARE VARIABLE a STRING DEFAULT 't_shadow';
CREATE TABLE t_shadow (a INT) USING PARQUET;
INSERT INTO IDENTIFIER(a) SELECT 42 AS a;

the column wins and IdentifierResolution.evalIdentifierExpr then throws NOT_A_CONSTANT_STRING.NOT_CONSTANT (since an AttributeReference is not foldable). Resolving against p (whose children are Nil for the INSERT/OverwriteByExpression path built by buildWriteTableSlot) mirrors how PlanWithUnresolvedIdentifier is resolved in regular child slots via the generic fallback at line 1865 — column resolution becomes a no-op and only the last-resort variable resolution fires.

Same point applies to the V2WriteCommand case below (line 1561).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
val resolvedExpr = resolveExpressionByPlanChildren(
p.identifierExpr, i, includeLastResort = true)
resolveColumnDefaultInCommandInputQuery(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrap looks like a no-op in this iteration: ResolveColumnDefaultInCommandInputQuery.apply short-circuits unless i.table.resolved, and i.table here is still a PlanWithUnresolvedIdentifier (just with the identifier expression now resolved). The column-default work happens later via the existing case i: InsertIntoStatement => resolveColumnDefaultInCommandInputQuery(i) once ResolveIdentifierClause has materialized the table. The V2WriteCommand case below doesn't wrap — consider dropping the wrap here too for symmetry:

Suggested change
resolveColumnDefaultInCommandInputQuery(
i.copy(table = p.copy(identifierExpr = resolvedExpr))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
…tion scope

Address review feedback on apache#56024:

- Resolve identifierExpr against the PlanWithUnresolvedIdentifier itself
  (whose children are Nil on the INSERT / OverwriteByExpression path built
  by buildWriteTableSlot) instead of the surrounding InsertIntoStatement /
  V2WriteCommand. This keeps query output columns out of scope, so a SQL
  variable name in IDENTIFIER(<name>) cannot be shadowed by a column from
  the SELECT and only the last-resort variable resolution path fires.
- Drop the no-op resolveColumnDefaultInCommandInputQuery wrap on the
  InsertIntoStatement case for symmetry with the V2WriteCommand case
  (it short-circuits while table is still a PlanWithUnresolvedIdentifier;
  the existing fallback case handles column-default work afterwards).
- Add companion test where the variable name collides with a query output
  column to lock in the narrowed scope.
- Clarify the code comment to distinguish structural recursion into the
  non-child slot from resolution semantics.
@haoyangeng-db haoyangeng-db requested a review from cloud-fan May 21, 2026 16:05
@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master/4.x/4.2!

@cloud-fan cloud-fan closed this in 96f43d9 May 22, 2026
cloud-fan pushed a commit that referenced this pull request May 22, 2026
…ntoStatement/V2WriteCommand table slot

### What changes were proposed in this pull request?

This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER placeholder in command name slot").

SPARK-46625 moved `PlanWithUnresolvedIdentifier` from wrapping the whole command into the command's identifier slot at parse time. For `InsertIntoStatement` and `V2WriteCommand` the placeholder now lives in `.table`, which is a non-child `LogicalPlan` slot (`override def child = query`). That PR correctly added explicit recursion for that slot in `BindParameters` (parameter binding) and `ResolveIdentifierClause` (materializing the placeholder once `identifierExpr` is resolved), but the same recursion was missing from `ResolveReferences`, which owns column / variable resolution.

This PR adds two cases at the top of `ResolveReferences.doApply` that mirror the existing pattern: when `InsertIntoStatement.table` or `V2WriteCommand.table` is an unresolved `PlanWithUnresolvedIdentifier`, resolve `identifierExpr` via `resolveExpressionByPlanChildren(..., includeLastResort = true)` (which runs the resolveColsLastResort` path: `resolveVariables compose resolveOuterRef`). The `!identifierExpr.resolved` guard keeps the cases idempotent under bottom-up traversal.

### Why are the changes needed?

Without this, `INSERT INTO IDENTIFIER(<sql-variable>) ...` fails analysis: the `UnresolvedAttribute` for the variable name sitting inside `PlanWithUnresolvedIdentifier.identifierExpr` is never rewritten to a `VariableReference`. Since `ResolveIdentifierClause` only fires when `identifierExpr.resolved && childrenResolved`, the placeholder never materializes; the plan reaches `PreprocessTableInsertion` with an unresolved attribute and errors out (e.g. `UNSUPPORTED_INSERT.RDD_BASED`).

Repro on master before this fix:

```sql
CREATE TABLE t (a INT) USING PARQUET;
DECLARE OR REPLACE VARIABLE target_table STRING;
SET VAR target_table = 't';
INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a;
```

The same shape applies to `OverwriteByExpression.table` (e.g. `REPLACE WHERE`, `REPLACE ON`, `REPLACE USING` variants of INSERT) - fixed by the same `V2WriteCommand` case.

### Does this PR introduce _any_ user-facing change?

No.  Bug-fix only.

### How was this patch tested?

New test added.

### Was this patch authored or co-authored using generative AI tooling?

Co-authored with Claude Code.

Closes #56024 from haoyangeng-db/spark-46625-followup-resolve-identifier.

Authored-by: haoyangeng-db <haoyan.geng@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 96f43d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 22, 2026
…ntoStatement/V2WriteCommand table slot

### What changes were proposed in this pull request?

This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER placeholder in command name slot").

SPARK-46625 moved `PlanWithUnresolvedIdentifier` from wrapping the whole command into the command's identifier slot at parse time. For `InsertIntoStatement` and `V2WriteCommand` the placeholder now lives in `.table`, which is a non-child `LogicalPlan` slot (`override def child = query`). That PR correctly added explicit recursion for that slot in `BindParameters` (parameter binding) and `ResolveIdentifierClause` (materializing the placeholder once `identifierExpr` is resolved), but the same recursion was missing from `ResolveReferences`, which owns column / variable resolution.

This PR adds two cases at the top of `ResolveReferences.doApply` that mirror the existing pattern: when `InsertIntoStatement.table` or `V2WriteCommand.table` is an unresolved `PlanWithUnresolvedIdentifier`, resolve `identifierExpr` via `resolveExpressionByPlanChildren(..., includeLastResort = true)` (which runs the resolveColsLastResort` path: `resolveVariables compose resolveOuterRef`). The `!identifierExpr.resolved` guard keeps the cases idempotent under bottom-up traversal.

### Why are the changes needed?

Without this, `INSERT INTO IDENTIFIER(<sql-variable>) ...` fails analysis: the `UnresolvedAttribute` for the variable name sitting inside `PlanWithUnresolvedIdentifier.identifierExpr` is never rewritten to a `VariableReference`. Since `ResolveIdentifierClause` only fires when `identifierExpr.resolved && childrenResolved`, the placeholder never materializes; the plan reaches `PreprocessTableInsertion` with an unresolved attribute and errors out (e.g. `UNSUPPORTED_INSERT.RDD_BASED`).

Repro on master before this fix:

```sql
CREATE TABLE t (a INT) USING PARQUET;
DECLARE OR REPLACE VARIABLE target_table STRING;
SET VAR target_table = 't';
INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a;
```

The same shape applies to `OverwriteByExpression.table` (e.g. `REPLACE WHERE`, `REPLACE ON`, `REPLACE USING` variants of INSERT) - fixed by the same `V2WriteCommand` case.

### Does this PR introduce _any_ user-facing change?

No.  Bug-fix only.

### How was this patch tested?

New test added.

### Was this patch authored or co-authored using generative AI tooling?

Co-authored with Claude Code.

Closes #56024 from haoyangeng-db/spark-46625-followup-resolve-identifier.

Authored-by: haoyangeng-db <haoyan.geng@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 96f43d9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants