Skip to content

fix: custom_datasource example ignores projection pushdown in execute()#22417

Open
kumarUjjawal wants to merge 3 commits into
apache:mainfrom
kumarUjjawal:fix/custom-datasource-example
Open

fix: custom_datasource example ignores projection pushdown in execute()#22417
kumarUjjawal wants to merge 3 commits into
apache:mainfrom
kumarUjjawal:fix/custom-datasource-example

Conversation

@kumarUjjawal
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

The custom_data_source example computes a projected_schema from the projection it receives in scan, advertising projection-pushdown support but its CustomExec::execute then always emits both source columns. As soon as
the planner pushes a non-identity projection (e.g. SELECT 1 FROM tSome([]), SELECT COUNT(id) FROM tSome([0])), the RecordBatch's column count diverges from projected_schema and the query fails.

What changes are included in this PR?

  • Store the projection on CustomExec so execute can apply it.
  • In execute, build a RecordBatch over the full source schema and use RecordBatch::project to drop the columns the planner didn't ask for.project preserves row count, which matters when the projection selects zero columns (SELECT 1 FROM t).
  • Extend custom_datasource() to register the table with a SessionContext and run SELECT 1 AS a FROM accounts and SELECT COUNT(id) FROM accounts, which are the two shapes reported in the issue.

Are these changes tested?

Yes

Are there any user-facing changes?

No

Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kumarUjjawal
Thanks for working on this!
Everything looks good overall. I just left one small suggestion around strengthening the regression coverage.

// - `SELECT COUNT(id) ...` requests a single column (projection: Some([0]))
let ctx = SessionContext::new();
ctx.register_table("accounts", Arc::new(db))?;
ctx.sql("SELECT 1 AS a FROM accounts")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition to the regression coverage 👍

Right now the test only checks that the queries execute successfully. Since this example already runs in CI, it could be helpful to also assert the returned shape or values.

For example, SELECT 1 AS a FROM accounts could assert that it returns 3 rows and 1 column, and SELECT COUNT(id) could assert that the result is 3.

That would make the test a bit stronger against future changes that accidentally preserve the schema contract while changing the query semantics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added the actual values check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Built-in/Custom UDAF not working on custom datasource example

2 participants