[SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in InsertIntoStatement/V2WriteCommand table slot#56024
Conversation
…ntoStatement/V2WriteCommand table slot
cloud-fan
left a comment
There was a problem hiding this comment.
Thanks for the followup. The fix is well-targeted: parent SPARK-46625 moved PlanWithUnresolvedIdentifier into the command's identifier slot, and for InsertIntoStatement.table / V2WriteCommand.table that slot is non-child (child = query), so the default ResolveReferences traversal never visits identifierExpr. BindParameters and ResolveIdentifierClause already recurse explicitly into the slot; this PR adds the matching recursion in ResolveReferences so an UnresolvedAttribute (a SQL-variable name) inside identifierExpr is rewritten to VariableReference before ResolveIdentifierClause materializes the placeholder. End-to-end test placement in ParametersSuite is consistent with the existing SPARK-46625 tests.
One design comment on resolution scope, one cleanup, and a test-coverage suggestion inline.
Resolution scope. The new cases call resolveExpressionByPlanChildren(p.identifierExpr, i / w, includeLastResort = true). Because i.children = Seq(query) (and same for w), the resolver searches query.output when resolving nested UnresolvedAttributes in identifierExpr. That differs from how a PlanWithUnresolvedIdentifier is resolved when it sits in a regular child slot: the generic fallback at Analyzer.scala:1865 runs q.mapExpressions(resolveExpressionByPlanChildren(_, q, includeLastResort = true)) with q = PlanWithUnresolvedIdentifier, whose children are Nil for the INSERT/OverwriteByExpression path built by buildWriteTableSlot (AstBuilder.scala:1186-1193). With children = Nil, column resolution is a no-op and only the last-resort variable resolution fires. The narrow consequence of the wider scope here is that a SQL variable whose name matches a query output column resolves to the column instead, and IdentifierResolution.evalIdentifierExpr then throws NOT_A_CONSTANT_STRING.NOT_CONSTANT. Passing p instead of i / w mirrors the regular flow and avoids the regression.
Cleanup. The InsertIntoStatement case wraps the result in resolveColumnDefaultInCommandInputQuery(...), which short-circuits unless i.table.resolved. Here i.table is still a PlanWithUnresolvedIdentifier, so the call is a no-op; the column-default work is picked up later by the existing fallback case i: InsertIntoStatement once ResolveIdentifierClause has materialized the table. The V2WriteCommand case doesn't wrap, so the asymmetry is worth removing.
Comment phrasing (nit). "Mirror the recursion that BindParameters and ResolveIdentifierClause already do for the same shape" reads as if those rules share resolution semantics with this one. What's actually shared is the structural recursion into the non-child slot — those rules bind parameters / materialize on an already-resolved expression rather than performing attribute resolution.
| !i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved => | ||
| val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier] | ||
| val resolvedExpr = resolveExpressionByPlanChildren( | ||
| p.identifierExpr, i, includeLastResort = true) |
There was a problem hiding this comment.
Should this resolve against p rather than i? Passing i widens the resolution scope to i.children = Seq(query), so a nested UnresolvedAttribute in identifierExpr can match a query output column. With a colliding name like
DECLARE VARIABLE a STRING DEFAULT 't_shadow';
CREATE TABLE t_shadow (a INT) USING PARQUET;
INSERT INTO IDENTIFIER(a) SELECT 42 AS a;the column wins and IdentifierResolution.evalIdentifierExpr then throws NOT_A_CONSTANT_STRING.NOT_CONSTANT (since an AttributeReference is not foldable). Resolving against p (whose children are Nil for the INSERT/OverwriteByExpression path built by buildWriteTableSlot) mirrors how PlanWithUnresolvedIdentifier is resolved in regular child slots via the generic fallback at line 1865 — column resolution becomes a no-op and only the last-resort variable resolution fires.
Same point applies to the V2WriteCommand case below (line 1561).
| val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier] | ||
| val resolvedExpr = resolveExpressionByPlanChildren( | ||
| p.identifierExpr, i, includeLastResort = true) | ||
| resolveColumnDefaultInCommandInputQuery( |
There was a problem hiding this comment.
This wrap looks like a no-op in this iteration: ResolveColumnDefaultInCommandInputQuery.apply short-circuits unless i.table.resolved, and i.table here is still a PlanWithUnresolvedIdentifier (just with the identifier expression now resolved). The column-default work happens later via the existing case i: InsertIntoStatement => resolveColumnDefaultInCommandInputQuery(i) once ResolveIdentifierClause has materialized the table. The V2WriteCommand case below doesn't wrap — consider dropping the wrap here too for symmetry:
| resolveColumnDefaultInCommandInputQuery( | |
| i.copy(table = p.copy(identifierExpr = resolvedExpr)) |
…tion scope Address review feedback on apache#56024: - Resolve identifierExpr against the PlanWithUnresolvedIdentifier itself (whose children are Nil on the INSERT / OverwriteByExpression path built by buildWriteTableSlot) instead of the surrounding InsertIntoStatement / V2WriteCommand. This keeps query output columns out of scope, so a SQL variable name in IDENTIFIER(<name>) cannot be shadowed by a column from the SELECT and only the last-resort variable resolution path fires. - Drop the no-op resolveColumnDefaultInCommandInputQuery wrap on the InsertIntoStatement case for symmetry with the V2WriteCommand case (it short-circuits while table is still a PlanWithUnresolvedIdentifier; the existing fallback case handles column-default work afterwards). - Add companion test where the variable name collides with a query output column to lock in the narrowed scope. - Clarify the code comment to distinguish structural recursion into the non-child slot from resolution semantics.
|
thanks, merging to master/4.x/4.2! |
…ntoStatement/V2WriteCommand table slot ### What changes were proposed in this pull request? This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER placeholder in command name slot"). SPARK-46625 moved `PlanWithUnresolvedIdentifier` from wrapping the whole command into the command's identifier slot at parse time. For `InsertIntoStatement` and `V2WriteCommand` the placeholder now lives in `.table`, which is a non-child `LogicalPlan` slot (`override def child = query`). That PR correctly added explicit recursion for that slot in `BindParameters` (parameter binding) and `ResolveIdentifierClause` (materializing the placeholder once `identifierExpr` is resolved), but the same recursion was missing from `ResolveReferences`, which owns column / variable resolution. This PR adds two cases at the top of `ResolveReferences.doApply` that mirror the existing pattern: when `InsertIntoStatement.table` or `V2WriteCommand.table` is an unresolved `PlanWithUnresolvedIdentifier`, resolve `identifierExpr` via `resolveExpressionByPlanChildren(..., includeLastResort = true)` (which runs the resolveColsLastResort` path: `resolveVariables compose resolveOuterRef`). The `!identifierExpr.resolved` guard keeps the cases idempotent under bottom-up traversal. ### Why are the changes needed? Without this, `INSERT INTO IDENTIFIER(<sql-variable>) ...` fails analysis: the `UnresolvedAttribute` for the variable name sitting inside `PlanWithUnresolvedIdentifier.identifierExpr` is never rewritten to a `VariableReference`. Since `ResolveIdentifierClause` only fires when `identifierExpr.resolved && childrenResolved`, the placeholder never materializes; the plan reaches `PreprocessTableInsertion` with an unresolved attribute and errors out (e.g. `UNSUPPORTED_INSERT.RDD_BASED`). Repro on master before this fix: ```sql CREATE TABLE t (a INT) USING PARQUET; DECLARE OR REPLACE VARIABLE target_table STRING; SET VAR target_table = 't'; INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a; ``` The same shape applies to `OverwriteByExpression.table` (e.g. `REPLACE WHERE`, `REPLACE ON`, `REPLACE USING` variants of INSERT) - fixed by the same `V2WriteCommand` case. ### Does this PR introduce _any_ user-facing change? No. Bug-fix only. ### How was this patch tested? New test added. ### Was this patch authored or co-authored using generative AI tooling? Co-authored with Claude Code. Closes #56024 from haoyangeng-db/spark-46625-followup-resolve-identifier. Authored-by: haoyangeng-db <haoyan.geng@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 96f43d9) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ntoStatement/V2WriteCommand table slot ### What changes were proposed in this pull request? This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER placeholder in command name slot"). SPARK-46625 moved `PlanWithUnresolvedIdentifier` from wrapping the whole command into the command's identifier slot at parse time. For `InsertIntoStatement` and `V2WriteCommand` the placeholder now lives in `.table`, which is a non-child `LogicalPlan` slot (`override def child = query`). That PR correctly added explicit recursion for that slot in `BindParameters` (parameter binding) and `ResolveIdentifierClause` (materializing the placeholder once `identifierExpr` is resolved), but the same recursion was missing from `ResolveReferences`, which owns column / variable resolution. This PR adds two cases at the top of `ResolveReferences.doApply` that mirror the existing pattern: when `InsertIntoStatement.table` or `V2WriteCommand.table` is an unresolved `PlanWithUnresolvedIdentifier`, resolve `identifierExpr` via `resolveExpressionByPlanChildren(..., includeLastResort = true)` (which runs the resolveColsLastResort` path: `resolveVariables compose resolveOuterRef`). The `!identifierExpr.resolved` guard keeps the cases idempotent under bottom-up traversal. ### Why are the changes needed? Without this, `INSERT INTO IDENTIFIER(<sql-variable>) ...` fails analysis: the `UnresolvedAttribute` for the variable name sitting inside `PlanWithUnresolvedIdentifier.identifierExpr` is never rewritten to a `VariableReference`. Since `ResolveIdentifierClause` only fires when `identifierExpr.resolved && childrenResolved`, the placeholder never materializes; the plan reaches `PreprocessTableInsertion` with an unresolved attribute and errors out (e.g. `UNSUPPORTED_INSERT.RDD_BASED`). Repro on master before this fix: ```sql CREATE TABLE t (a INT) USING PARQUET; DECLARE OR REPLACE VARIABLE target_table STRING; SET VAR target_table = 't'; INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a; ``` The same shape applies to `OverwriteByExpression.table` (e.g. `REPLACE WHERE`, `REPLACE ON`, `REPLACE USING` variants of INSERT) - fixed by the same `V2WriteCommand` case. ### Does this PR introduce _any_ user-facing change? No. Bug-fix only. ### How was this patch tested? New test added. ### Was this patch authored or co-authored using generative AI tooling? Co-authored with Claude Code. Closes #56024 from haoyangeng-db/spark-46625-followup-resolve-identifier. Authored-by: haoyangeng-db <haoyan.geng@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 96f43d9) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER placeholder in command name slot").
SPARK-46625 moved
PlanWithUnresolvedIdentifierfrom wrapping the whole command into the command's identifier slot at parse time. ForInsertIntoStatementandV2WriteCommandthe placeholder now lives in.table, which is a non-childLogicalPlanslot (override def child = query). That PR correctly added explicit recursion for that slot inBindParameters(parameter binding) andResolveIdentifierClause(materializing the placeholder onceidentifierExpris resolved), but the same recursion was missing fromResolveReferences, which owns column / variable resolution.This PR adds two cases at the top of
ResolveReferences.doApplythat mirror the existing pattern: whenInsertIntoStatement.tableorV2WriteCommand.tableis an unresolvedPlanWithUnresolvedIdentifier, resolveidentifierExprviaresolveExpressionByPlanChildren(..., includeLastResort = true)(which runs the resolveColsLastResortpath:resolveVariables compose resolveOuterRef). The!identifierExpr.resolved` guard keeps the cases idempotent under bottom-up traversal.Why are the changes needed?
Without this,
INSERT INTO IDENTIFIER(<sql-variable>) ...fails analysis: theUnresolvedAttributefor the variable name sitting insidePlanWithUnresolvedIdentifier.identifierExpris never rewritten to aVariableReference. SinceResolveIdentifierClauseonly fires whenidentifierExpr.resolved && childrenResolved, the placeholder never materializes; the plan reachesPreprocessTableInsertionwith an unresolved attribute and errors out (e.g.UNSUPPORTED_INSERT.RDD_BASED).Repro on master before this fix:
The same shape applies to
OverwriteByExpression.table(e.g.REPLACE WHERE,REPLACE ON,REPLACE USINGvariants of INSERT) - fixed by the sameV2WriteCommandcase.Does this PR introduce any user-facing change?
No. Bug-fix only.
How was this patch tested?
New test added.
Was this patch authored or co-authored using generative AI tooling?
Co-authored with Claude Code.