Skip to content

Commit 77a58c5

Browse files
authored
refactor(sql): improve eager aggregation rewrites (#19559)
1 parent abd5c38 commit 77a58c5

37 files changed

+2419
-1007
lines changed

AGENTS.md

Lines changed: 26 additions & 87 deletions
Original file line numberDiff line numberDiff line change
@@ -1,89 +1,28 @@
11
# Repository Guidelines
22

3-
## Project Structure & Module Organization
4-
Databend is a Rust workspace rooted at `Cargo.toml`. The query engine (`src/query/`), meta service (`src/meta/`), and shared crates (`src/common/`) compile into binaries stored under `src/binaries/`. Tests live in `tests/` with SQL suites (`suites/`), sqllogic cases, and meta-specific harnesses, while performance experiments sit in `benchmark/`. Tooling and deployment helpers are collected in `scripts/` and `docker/`, fixtures in `_data/`, and the `Makefile` lists every supported task.
5-
6-
## Build, Test, and Development Commands
7-
- `make setup` – install cargo components plus linters (taplo, shfmt, typos, machete, ruff).
8-
- `make build` / `make build-release` – compile debug or optimized `databend-{query,meta,metactl}` into `target/`.
9-
- `make run-debug` / `make kill` – launch or stop a local standalone deployment using the latest build.
10-
- `make unit-test`, `make stateless-test`, `make sqllogic-test`, `make metactl-test` – run focused suites; `make test` expands to the default CI matrix.
11-
- `make fmt` and `make lint` – enforce formatting and clippy checks before committing.
12-
13-
## Coding Style & Naming Conventions
14-
Rust sources follow `rustfmt.toml` (4-space indent, 100-column width) and must pass `cargo clippy -- -D warnings`. Keep modules and files in `snake_case`, exposed types and traits in `CamelCase`, and SQL suite files prefixed with the numeric order already in `tests/suites/`. Favor `common/exception` helpers plus `tracing` spans for errors and observability. Python utilities in `tests/` should satisfy Ruff defaults, and shell scripts must round-trip through `shfmt -l -w`.
15-
16-
## Debug Guidelines
17-
Always use `cargo clippy` to make sure there are no compilation errors. Fully verifying the whole project may take too long, partial verification can speed things up, but eventually a full verification will be needed.
18-
19-
## Testing Guidelines
20-
Unit tests stay close to the affected crate (`#[cfg(test)]` modules), and integration behavior belongs in the relevant SQL suites or meta harness (`tests/metactl`, `tests/meta-kvapi`). Every planner, executor, or storage change should add at least one regression SQL file plus expected output when deterministic. Use cluster variants (`make stateless-cluster-test` and TLS mode) whenever coordination, transactions, or auth are involved. Document new fixtures or configs in `tests/README.md` (or inline comments) so CI remains reproducible.
21-
22-
23-
## Commit & Pull Request Guidelines
24-
25-
History follows a Conventional-style subject such as `fix(storage): avoid stale snapshot (#19174)` or `feat: support self join elimination (#19169)`; keep the first line imperative and under 72 characters. Commits should stay scoped to a logical change set and include formatting/linting updates in the same patch. PRs must outline motivation, implementation notes, and validation commands, plus link issues or RFCs, and the final description should follow `PULL_REQUEST_TEMPLATE.md` (checkboxes, verification, screenshots when needed). Attach screenshots or sample queries when UI, SQL plans, or system tables change, and call out rollout risks (migrations, config toggles, backfills) so reviewers can plan accordingly.
26-
27-
There is the example of pull requests:
28-
29-
````
30-
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
31-
32-
## Summary
33-
34-
- Enable table functions like `generate_series` and `range` to accept scalar subqueries as arguments
35-
- Return NULL for empty scalar subqueries to align with existing scalar subquery semantics
36-
37-
## Changes
38-
39-
This PR enables SQL like:
40-
41-
```sql
42-
SELECT generate_series AS install_date
43-
FROM generate_series(
44-
(SELECT count() FROM numbers(10))::int,
45-
(SELECT count() FROM numbers(39))::int
46-
);
47-
````
48-
49-
Previously, table function arguments could only be constants. Now they can be scalar subqueries that return a single value.
50-
51-
## Implementation
52-
53-
1. Added `contains_subquery()` function to detect subqueries in AST expressions
54-
2. Added `execute_subquery_for_scalar()` to execute and extract scalar values from subqueries
55-
3. Modified `bind_table_args` to try constant folding first, then fall back to subquery execution
56-
4. The subquery executor is passed from the binder to enable runtime evaluation
57-
5. Returns `Scalar::Null` for empty subquery results (aligns with LeftSingleJoin behavior)
58-
59-
## Tests
60-
61-
- [x] Unit Test
62-
- [x] Logic Test
63-
- [ ] Benchmark Test
64-
- [ ] No Test - Pair with the reviewer to explain why
65-
66-
Added tests in `02_0063_function_generate_series.test` for:
67-
68-
- `generate_series` with subquery arguments
69-
- `range` with subquery arguments
70-
71-
## Type of change
72-
73-
- [x] New feature (non-breaking change which adds functionality)
74-
75-
<!-- Reviewable:start -->
76-
77-
---
78-
79-
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/databendlabs/databend/19213)
80-
81-
<!-- Reviewable:end -->
82-
83-
```
84-
85-
86-
Pull request mus be pushed into fork and create pr into origin.
87-
You can use gh tools to do it.
88-
```
89-
3+
This file is the entry point for repository-specific working rules. Keep it focused on overall direction; read the matching detail documents in `agents/` before making changes in that area.
4+
5+
## Core Workflow
6+
- Build context from the codebase first. Databend is a multi-crate Rust workspace, so understand the affected module boundaries before editing.
7+
- Validate incrementally. Run the smallest relevant checks early, and apply the verification standard that matches the task type.
8+
- Treat tests as part of the change. Planner, executor, storage, and behavior changes should come with regression coverage.
9+
- Keep contributions reviewable. Commits should stay scoped, and pull requests should follow the repository's expected collaboration workflow.
10+
11+
## Task Types
12+
- Implementation tasks: the result is intended to stay in the branch, be reviewed, or be submitted. Follow the normal quality bar for edits, validation, tests, commits, and PR readiness.
13+
- Exploration tasks: the result is mainly temporary, used for investigation, debugging, measurement, or option evaluation, and is not intended to be submitted as-is. Prioritize speed and useful conclusions over polish.
14+
- If the output you plan to keep is only notes, logs, ad hoc scripts, or temporary experiments, treat it as exploration work.
15+
- If the output includes code, tests, or docs that are expected to remain in the branch for review or submission, treat it as implementation work.
16+
- For exploration tasks that do not produce submit-worthy changes, avoid spending time on low-value checks, exhaustive formatting passes, or broad test runs unless they are needed to answer the question correctly.
17+
- If a task starts as exploration and turns into a real code change that should be kept, switch back to the implementation-task standard before handoff.
18+
19+
## Detail Index
20+
- [`agents/repository-structure.md`](agents/repository-structure.md) for workspace layout and where code, tests, tooling, and fixtures live.
21+
- [`agents/development-commands.md`](agents/development-commands.md) for setup, build, run, test, format, and lint commands.
22+
- [`agents/coding-style.md`](agents/coding-style.md) for Rust, Python, shell, naming, error handling, and observability conventions.
23+
- [`agents/debug-and-validation.md`](agents/debug-and-validation.md) for clippy expectations and testing strategy.
24+
- [`agents/commit-and-pr.md`](agents/commit-and-pr.md) for commit format, PR requirements, and the Databend PR template example.
25+
- If you discover high-value information that would materially help development or testing but cannot be found through this Detail Index and the linked documents beneath it, call that gap out explicitly instead of assuming the current documentation path is sufficient.
26+
27+
## Default Rule
28+
If guidance in a detail file is relevant to the task, follow it. Determine the task type first, then apply the matching validation and testing rules. Exploration-task guidance is an explicit exception to the default implementation-task quality bar.

agents/coding-style.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
# Coding Style & Naming Conventions
2+
3+
## Rust
4+
- Follow `rustfmt.toml` with 4-space indentation and 100-column width.
5+
- Pass `cargo clippy -- -D warnings`.
6+
- Use `snake_case` for modules and files.
7+
- Use `CamelCase` for exposed types and traits.
8+
- Prefer helpers from `common/exception` for error handling.
9+
- Use `tracing` spans when observability matters.
10+
11+
## Tests and Fixtures
12+
- Keep SQL suite file prefixes consistent with the existing numeric ordering in `tests/suites/`.
13+
14+
## Python
15+
- Python utilities in `tests/` should satisfy Ruff defaults.
16+
17+
## Shell
18+
- Shell scripts should round-trip through `shfmt -l -w`.

agents/commit-and-pr.md

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
# Commit & Pull Request Guidelines
2+
3+
## Commits
4+
- Use a Conventional-style subject such as `fix(storage): avoid stale snapshot (#19174)` or `feat: support self join elimination (#19169)`.
5+
- Keep the first line imperative and under 72 characters.
6+
- Keep each commit scoped to one logical change set.
7+
- Include formatting and linting updates in the same patch when they belong to the change.
8+
9+
## Pull Requests
10+
- Outline motivation, implementation notes, and validation commands.
11+
- Link issues or RFCs when relevant.
12+
- Follow `PULL_REQUEST_TEMPLATE.md`, including checkboxes, verification, and screenshots when needed.
13+
- Attach screenshots or sample queries when UI, SQL plans, or system tables change.
14+
- Call out rollout risks such as migrations, config toggles, or backfills.
15+
- Push the branch to your fork and create the PR into `origin`.
16+
- You can use `gh` tooling for the PR flow.
17+
18+
## Example PR Description
19+
20+
```md
21+
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
22+
23+
## Summary
24+
25+
- Enable table functions like `generate_series` and `range` to accept scalar subqueries as arguments
26+
- Return NULL for empty scalar subqueries to align with existing scalar subquery semantics
27+
28+
## Changes
29+
30+
This PR enables SQL like:
31+
32+
```sql
33+
SELECT generate_series AS install_date
34+
FROM generate_series(
35+
(SELECT count() FROM numbers(10))::int,
36+
(SELECT count() FROM numbers(39))::int
37+
);
38+
```
39+
40+
Previously, table function arguments could only be constants. Now they can be scalar subqueries that return a single value.
41+
42+
## Implementation
43+
44+
1. Added `contains_subquery()` function to detect subqueries in AST expressions
45+
2. Added `execute_subquery_for_scalar()` to execute and extract scalar values from subqueries
46+
3. Modified `bind_table_args` to try constant folding first, then fall back to subquery execution
47+
4. The subquery executor is passed from the binder to enable runtime evaluation
48+
5. Returns `Scalar::Null` for empty subquery results (aligns with LeftSingleJoin behavior)
49+
50+
## Tests
51+
52+
- [x] Unit Test
53+
- [x] Logic Test
54+
- [ ] Benchmark Test
55+
- [ ] No Test - Pair with the reviewer to explain why
56+
57+
Added tests in `02_0063_function_generate_series.test` for:
58+
59+
- `generate_series` with subquery arguments
60+
- `range` with subquery arguments
61+
62+
## Type of change
63+
64+
- [x] New feature (non-breaking change which adds functionality)
65+
66+
<!-- Reviewable:start -->
67+
68+
---
69+
70+
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/databendlabs/databend/19213)
71+
72+
<!-- Reviewable:end -->
73+
```

agents/debug-and-validation.md

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
# Debug Guidelines
2+
3+
- For implementation tasks that touch Rust code, use `cargo clippy` to confirm there are no compilation or lint errors.
4+
- For implementation tasks, start with partial verification when a full workspace pass is too expensive, but move toward full verification before handoff when practical.
5+
- For exploration tasks that do not need to be submitted, keep validation minimal and purpose-driven. Only run checks that are necessary to establish the conclusion or unblock the investigation.
6+
- If an exploration task begins producing branch-worthy code or tests, reclassify it as implementation work and apply the implementation-task validation bar.
7+
8+
# Testing Guidelines
9+
10+
## Implementation Tasks
11+
12+
- Keep unit tests close to the affected crate with `#[cfg(test)]` modules.
13+
- Put integration behavior in the relevant SQL suites or meta harnesses such as `tests/metactl` and `tests/meta-kvapi`.
14+
- Every planner, executor, or storage change should add at least one regression SQL file plus expected output when deterministic.
15+
- Use cluster variants such as `make stateless-cluster-test` and TLS mode when coordination, transactions, or auth are involved.
16+
- Document new fixtures or configs in `tests/README.md` or inline comments so CI stays reproducible.
17+
18+
## Exploration Tasks
19+
20+
- Do not apply the full implementation-task test bar by default.
21+
- Run only the tests needed to establish the conclusion, compare options, or unblock the investigation.
22+
- If the exploration output is being converted into a real change for submission, switch to the implementation-task test bar before handoff.

agents/development-commands.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# Build, Test, and Development Commands
2+
3+
- `make setup`: install cargo components plus linters such as `taplo`, `shfmt`, `typos`, `machete`, and `ruff`.
4+
- `make build` / `make build-release`: compile debug or optimized `databend-{query,meta,metactl}` binaries into `target/`.
5+
- `make run-debug` / `make kill`: launch or stop a local standalone deployment using the latest build.
6+
- `make unit-test`, `make stateless-test`, `make sqllogic-test`, `make metactl-test`: run focused suites.
7+
- `make test`: run the default CI test matrix.
8+
- `make fmt`: apply repository formatting rules.
9+
- `make lint`: run lint checks before committing.

agents/repository-structure.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
# Repository Structure & Module Organization
2+
3+
Databend is a Rust workspace rooted at `Cargo.toml`. Start by deciding whether the change belongs to the query engine or the metadata system, then read the closest module README before editing.
4+
5+
## Start Here: Meta vs Query
6+
7+
- `src/query/` is the main query engine: SQL parsing, planning, optimization, expression evaluation, execution pipeline, query service integration, and table/storage-facing behavior all converge here. If the task changes SQL behavior, planner output, execution, storage access, or query-side tests, start with `src/query/README.md`.
8+
- `src/meta/` is the metadata side of the system: metadata types, protocol definitions, compatibility layers, supporting tooling, and the Databend meta-service binaries live here. If the task touches catalog/schema metadata, protobuf compatibility, meta APIs, or `databend-meta` tooling, start with `src/meta/README.md`.
9+
- `src/meta/README.md` also notes an important boundary: core meta-service implementation has moved to the separate `databend-meta` repository. Read it early if you suspect the code you need is no longer in this workspace.
10+
11+
## Query Module Guide
12+
13+
- Use `src/query/README.md` as the index for query-side crates. It explains the role of modules such as `ast`, `catalog`, `expression`, `functions`, `pipeline`, `service`, `storages`, and related support crates.
14+
- Prefer the nearest module README when one exists instead of inferring structure from directory names alone.
15+
- Read `src/query/ast/README.md` when the task is about SQL parser structure, grammar internals, or parser-specific development workflow.
16+
- Read `src/query/functions/README.md` when adding or debugging scalar or aggregate functions, function registration, or function-specific tests.
17+
- Read `src/query/ee_features/README.md` when the task involves enterprise query features and you need to confirm which feature layer owns the behavior.
18+
- Read `src/query/sql/README.md` when the task lives in SQL planning, binding, optimizer behavior, or SQL-side planner tests.
19+
20+
## Other Workspace Landmarks
21+
22+
- Shared crates live in `src/common/`.
23+
- Binaries are stored under `src/binaries/`.
24+
- Tests live in `tests/`, including SQL suites under `tests/suites/`, sqllogic cases, and meta-specific harnesses.
25+
- Performance experiments live in `benchmark/`.
26+
- Tooling and deployment helpers live in `scripts/` and `docker/`.
27+
- Fixtures live in `_data/`.
28+
- The `Makefile` lists supported top-level tasks.

src/query/README.md

Lines changed: 18 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -2,19 +2,21 @@
22

33
Databend Query is a Distributed Query Engine at scale.
44

5-
- [`ast`](./ast/), new version of sql parser based on `nom-rule`.
6-
- [`catalog`](./catalog/) contains structures and traits for catalogs management, `Catalog`, `Database`, `Table` and `TableContext`.
7-
- [`codegen`](./codegen/) is used to generate the arithmetic result type.
8-
- [`config`](./config/) provides config support for databend query.
9-
- [`datavalues`](./datavalues/), the definition of each type of Column, which represents the layout of data in memory, will be gradually migrated to `expressions`.
10-
- [`expression`](./expression/), the new scalar expression framework with expression definition (AST), type checking, and evaluation runtime.
11-
- [`formats`](./formats/), the serialization and deserialization of data in various formats to the outside.
12-
- [`functions`](./functions/), scalar functions and aggregate functions, etc.
13-
- [`management`](./management/) for clusters, quotas, etc.
14-
- [`pipeline`](./pipeline/) implements the scheduling framework for physical operators.
15-
- [`service`](./service/) -> `databend-query`, the query service library of Databend.
16-
- [`settings`](./settings/), global and session level settings.
17-
- [`storages`](./storages/) relates to table engines, including the commonly used fuse engine and indexes etc.
18-
- [`users`](./users/), role-based access and control.
19-
- [`ee`](ee/) contains enterprise functionalities.
20-
- [`ee-features`](ee_features/) contains enterprise features.
5+
- [`ast`](./ast/) contains the SQL parser and related parser tooling.
6+
- [`catalog`](./catalog/) defines catalog-facing abstractions such as `Catalog`, `Database`, `Table`, and `TableContext`.
7+
- [`codegen`](./codegen/) contains code generators used by query-side crates, including arithmetic result-type generation.
8+
- [`config`](./config/) provides query-service configuration types and parsing.
9+
- [`datavalues`](./datavalues/) is the legacy in-memory value/column representation layer; newer work continues to move toward [`expression`](./expression/).
10+
- [`expression`](./expression/) is the core scalar expression framework, including expression representation, type checking, and evaluation.
11+
- [`formats`](./formats/) handles data serialization and deserialization for external formats.
12+
- [`functions`](./functions/) implements scalar functions, aggregate functions, and related function infrastructure.
13+
- [`management`](./management/) contains query-side management utilities such as cluster and quota support.
14+
- [`pipeline`](./pipeline/) implements the execution pipeline and scheduling framework for physical operators.
15+
- [`script`](./script/) contains script execution support used by the query service.
16+
- [`service`](./service/) is the `databend-query` service crate and runtime integration layer.
17+
- [`settings`](./settings/) defines global and session settings.
18+
- [`sql`](./sql/) contains SQL-side planning, binding, optimization, and related execution support. Read `./sql/README.md` for crate structure and shared optimizer test-support entry points.
19+
- [`storages`](./storages/) contains table engines and storage-layer integrations, including Fuse and related indexing components.
20+
- [`users`](./users/) implements user, role, and access-control support.
21+
- [`ee`](./ee/) contains enterprise query functionality.
22+
- [`ee_features`](./ee_features/) contains enterprise feature modules used by the query layer.

src/query/service/tests/it/sql/planner/optimizer/optimizers/rule/agg_rules/mod.rs

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,5 +13,4 @@
1313
// limitations under the License.
1414

1515
mod agg_index_query_rewrite;
16-
mod eager_aggregation;
1716
use crate::sql::planner::optimizer::test_utils;

0 commit comments

Comments
 (0)