fix(bendpy): register_csv/register_tsv fails with column position error#19444
fix(bendpy): register_csv/register_tsv fails with column position error#19444
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed9758317f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cd10f5373
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let select_clause = match file_format { | ||
| "csv" | "tsv" => self.build_column_select(&file_path, file_format, connection, py)?, | ||
| _ => "*".to_string(), |
There was a problem hiding this comment.
Skip TSV schema inference until infer_schema supports TSV
register_tsv() now routes through build_column_select, which emits infer_schema(..., file_format => 'TSV'); however the infer-schema table function currently rejects TSV formats (it only accepts Parquet/CSV/NDJSON in src/query/service/src/table_functions/infer_schema/infer_schema_table.rs), so TSV registration still fails on every call. This means the patch does not actually fix the TSV path and users still cannot register TSV files.
Useful? React with 👍 / 👎.
3e7a038 to
62966a3
Compare
9e0390a to
b3c5792
Compare
…r_csv/register_tsv Fixes #19443
- Extract resolve_file_path() and extract_string_column() as standalone helpers - Replace imperative loop with functional iterator chain - Rename infer_column_names to build_column_select for clarity - Deduplicate mock logic in test_connections.py via _register_delimited()
…tion The test_bendpy job runs on a bare self-hosted ARM64 runner without clang/build-essential installed. Add a setup step that runs dev_setup.sh to install build dependencies and sets JEMALLOC env vars for Linux development builds, matching what the macOS path already does.
infer_schema only supports Parquet, CSV, and NDJSON formats. Routing TSV through build_column_select would fail at runtime. Fall back to SELECT * for TSV and remove the TSV integration test accordingly.
…nux CI - Remove braces from single-expression match arm to satisfy cargo fmt - Replace dev_setup.sh with direct apt-get install to avoid perl errors on the self-hosted ARM64 runner
The .cargo/config.toml uses -fuse-ld=mold on Linux targets, so mold must be installed on the self-hosted runner alongside clang.
infer_schema requires the stage system (system.stage table) which is not available in bendpy's embedded context. Replace with direct file reading of the CSV header line to extract column names for building $1 AS col1, $2 AS col2, ... select clauses.
The embedded bendpy context does not have the system.stage table, so file-based queries (SELECT FROM 'fs://...') cannot work. Remove the integration test; SQL generation is already covered by mock tests in test_connections.py.
b3c5792 to
776a4cd
Compare
🤖 CI Job Analysis
📊 Summary
❌ NO RETRY NEEDEDAll failures appear to be code/test issues requiring manual fixes. 🔍 Job Details
🤖 AboutAutomated analysis using job annotations to distinguish infrastructure issues (auto-retried) from code/test issues (manual fixes needed). |
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
register_csv()andregister_tsv()generateSELECT *in the underlyingCREATE VIEW, which fails because CSV/TSV files require explicit column positions ($1, $2, ...).Fix: call
infer_schema()first to detect column names, then generateSELECT $1 AS col1, $2 AS col2, ...instead ofSELECT *.Tests
Type of change
This change is