fix: custom_datasource example ignores projection pushdown in execute()#22417
fix: custom_datasource example ignores projection pushdown in execute()#22417kumarUjjawal wants to merge 3 commits into
Conversation
There was a problem hiding this comment.
@kumarUjjawal
Thanks for working on this!
Everything looks good overall. I just left one small suggestion around strengthening the regression coverage.
| // - `SELECT COUNT(id) ...` requests a single column (projection: Some([0])) | ||
| let ctx = SessionContext::new(); | ||
| ctx.register_table("accounts", Arc::new(db))?; | ||
| ctx.sql("SELECT 1 AS a FROM accounts") |
There was a problem hiding this comment.
Nice addition to the regression coverage 👍
Right now the test only checks that the queries execute successfully. Since this example already runs in CI, it could be helpful to also assert the returned shape or values.
For example, SELECT 1 AS a FROM accounts could assert that it returns 3 rows and 1 column, and SELECT COUNT(id) could assert that the result is 3.
That would make the test a bit stronger against future changes that accidentally preserve the schema contract while changing the query semantics.
There was a problem hiding this comment.
Thanks! Added the actual values check.
Which issue does this PR close?
Rationale for this change
The
custom_data_sourceexample computes aprojected_schemafrom the projection it receives inscan, advertising projection-pushdown support but itsCustomExec::executethen always emits both source columns. As soon asthe planner pushes a non-identity projection (e.g.
SELECT 1 FROM t→Some([]),SELECT COUNT(id) FROM t→Some([0])), the RecordBatch's column count diverges fromprojected_schemaand the query fails.What changes are included in this PR?
CustomExecsoexecutecan apply it.execute, build aRecordBatchover the full source schema and useRecordBatch::projectto drop the columns the planner didn't ask for.projectpreserves row count, which matters when the projection selects zero columns (SELECT 1 FROM t).custom_datasource()to register the table with aSessionContextand runSELECT 1 AS a FROM accountsandSELECT COUNT(id) FROM accounts, which are the two shapes reported in the issue.Are these changes tested?
Yes
Are there any user-facing changes?
No