perf: optimize arrays_zip perfect list zips#22285
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a fast-path to arrays_zip that can reuse existing offsets and value buffers when inputs are “perfectly aligned” list arrays, and updates benchmarks accordingly.
Changes:
- Introduce
try_perfect_list_zipfast-path and invoke it early fromarrays_zip_inner - Add unit tests covering reuse behavior and fallback conditions
- Rename the no-nulls benchmark case to reflect the new “perfect zip” scenario
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| datafusion/functions-nested/src/arrays_zip.rs | Adds perfect-zip fast-path plus tests validating reuse and fallback behavior |
| datafusion/functions-nested/benches/arrays_zip.rs | Renames benchmark to better represent the new fast-path scenario |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
comphead
left a comment
There was a problem hiding this comment.
Thanks @puneetdixit200 would you mind sharing bench details for perfect list, with and without the fix?
|
I ran the benchmark on the PR head (
So the perfect-list path is about 179x faster. The 10% null case is effectively unchanged in this run, about 2.8% lower, because it still takes the general path. |
|
Thanks @puneetdixit200 I'll check it today |
|
Looks like we have a duplicate PR for this: Perhaps we can get numbers from both of these to compare which approach is better? cc @BipashaBi |
|
I re-ran the same microbenchmark against both PR heads on the same machine. Command: CARGO_TARGET_DIR=/tmp/datafusion-pr-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10Environment: macOS arm64,
So this branch is about 4.2x faster for the perfect-list fast path on this run. The 10% null case is effectively in the same range, which is expected because this branch falls back to the general path there. The implementation difference I see is that #22245 currently reuses the first concrete input null bitmap for the output. This branch builds the output null bitmap using the existing semantics: a row is null only when all concrete inputs are null, and it rejects null rows with hidden values before taking the fast path. That should avoid the mixed null/empty-list issue called out on #22245. |
|
run benchmark array_zip |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing codex/arrays-zip-perfect-zip (0356891) to 66f82af (merge-base) diff File an issue against this benchmark runner |
|
Benchmark for this request failed. Last 20 lines of output: Click to expandFile an issue against this benchmark runner |
|
run benchmark arrays_zip |
|
Hi @puneetdixit200, thanks for the request (#22285 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
run benchmark arrays_zip |
1 similar comment
|
run benchmark arrays_zip |
|
Hi @puneetdixit200, thanks for the request (#22285 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas. File an issue against this benchmark runner |
|
🤖 Criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing codex/arrays-zip-perfect-zip (0356891) to 66f82af (merge-base) diff File an issue against this benchmark runner |
|
🤖 Criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Pushed 6ea6184 as a follow-up to the fast-path review. Aligned list null bitmaps now use NullBuffer::union_many; mixed null-bitmaps keep explicit all-null-row construction to preserve arrays_zip semantics; mixed null rows with hidden values still fall back to the general path. Verified locally with cargo test -p datafusion-functions-nested arrays_zip::tests -- --nocapture, cargo fmt --check, git diff --check, and cargo test -p datafusion-sqllogictest --test sqllogictests -- array/arrays_zip.slt. |
|
Could a committer please trigger the Rust workflow for latest head |
|
Follow-up: latest head is now |
Which issue does this PR close?
arrays_zipto avoid row-by-row copying in the perfect-zip case #22225.Rationale for this change
arrays_zipcurrently uses the generalMutableArrayDatapath even when all regularListArrayinputs are already perfectly aligned. In that case, the output list offsets match the inputs and each struct child column can reuse the corresponding input values array instead of copying one row at a time.What changes are included in this PR?
ListArrayzips that reuses the first input's offsets and clones the input values arrays into the output struct children.LargeList,FixedSizeList,Nullinputs, and null rows that would require padding.arrays_zip_perfect_zip_8192.Are these changes tested?
Yes. Local checks run:
cargo fmt --allcargo test -p datafusion-functions-nestedcargo clippy -p datafusion-functions-nested --all-targets --all-features -- -D warningsCARGO_TARGET_DIR=C:\df-target cargo clippy --all-targets --all-features -- -D warningsCARGO_TARGET_DIR=C:\df-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10Latest local benchmark sample:
arrays_zip_perfect_zip_8192:11.234 ?s 11.600 ?s 12.045 ?sarrays_zip_10pct_nulls_8192:4.3463 ms 4.5531 ms 4.7898 msAre there any user-facing changes?
No. This is an internal performance optimization with the same
arrays_zipoutput semantics.Review follow-up validation
2026-05-21:
cargo fmt --all --checkpassed after formatting.cargo test -p datafusion-functions-nested perfect_zip_uses_supplied_field_names --libcould not compile locally because this Windows GNU toolchain is missingdlltool.exe.cargo clippy -p datafusion-functions-nested --all-targets -- -D warningshit the same localdlltool.exetoolchain blocker before checking the crate.