Skip to content

perf: optimize arrays_zip perfect list zips#22285

Open
puneetdixit200 wants to merge 9 commits into
apache:mainfrom
puneetdixit200:codex/arrays-zip-perfect-zip
Open

perf: optimize arrays_zip perfect list zips#22285
puneetdixit200 wants to merge 9 commits into
apache:mainfrom
puneetdixit200:codex/arrays-zip-perfect-zip

Conversation

@puneetdixit200
Copy link
Copy Markdown

@puneetdixit200 puneetdixit200 commented May 16, 2026

Which issue does this PR close?

Rationale for this change

arrays_zip currently uses the general MutableArrayData path even when all regular ListArray inputs are already perfectly aligned. In that case, the output list offsets match the inputs and each struct child column can reuse the corresponding input values array instead of copying one row at a time.

What changes are included in this PR?

  • Add a fast path for perfect regular ListArray zips that reuses the first input's offsets and clones the input values arrays into the output struct children.
  • Keep the existing general path for ragged inputs, LargeList, FixedSizeList, Null inputs, and null rows that would require padding.
  • Add unit coverage for offset/value reuse, zero-length null rows, and null rows with hidden values falling back to the general path.
  • Rename the no-null benchmark case to arrays_zip_perfect_zip_8192.

Are these changes tested?

Yes. Local checks run:

  • cargo fmt --all
  • cargo test -p datafusion-functions-nested
  • cargo clippy -p datafusion-functions-nested --all-targets --all-features -- -D warnings
  • CARGO_TARGET_DIR=C:\df-target cargo clippy --all-targets --all-features -- -D warnings
  • CARGO_TARGET_DIR=C:\df-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10

Latest local benchmark sample:

  • arrays_zip_perfect_zip_8192: 11.234 ?s 11.600 ?s 12.045 ?s
  • arrays_zip_10pct_nulls_8192: 4.3463 ms 4.5531 ms 4.7898 ms

Are there any user-facing changes?

No. This is an internal performance optimization with the same arrays_zip output semantics.

Review follow-up validation

2026-05-21:

  • cargo fmt --all --check passed after formatting.
  • cargo test -p datafusion-functions-nested perfect_zip_uses_supplied_field_names --lib could not compile locally because this Windows GNU toolchain is missing dlltool.exe.
  • cargo clippy -p datafusion-functions-nested --all-targets -- -D warnings hit the same local dlltool.exe toolchain blocker before checking the crate.

Copilot AI review requested due to automatic review settings May 16, 2026 23:01
@github-actions github-actions Bot added the functions Changes to functions implementation label May 16, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a fast-path to arrays_zip that can reuse existing offsets and value buffers when inputs are “perfectly aligned” list arrays, and updates benchmarks accordingly.

Changes:

  • Introduce try_perfect_list_zip fast-path and invoke it early from arrays_zip_inner
  • Add unit tests covering reuse behavior and fallback conditions
  • Rename the no-nulls benchmark case to reflect the new “perfect zip” scenario

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
datafusion/functions-nested/src/arrays_zip.rs Adds perfect-zip fast-path plus tests validating reuse and fallback behavior
datafusion/functions-nested/benches/arrays_zip.rs Renames benchmark to better represent the new fast-path scenario

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated
Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated
Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @puneetdixit200 would you mind sharing bench details for perfect list, with and without the fix?

@puneetdixit200
Copy link
Copy Markdown
Author

I ran the benchmark on the PR head (973e3dc17) and the merge base (66f82af69) on the same machine:

case merge base PR head
perfect / no-null lists 916.69 us (arrays_zip_no_nulls_8192) 5.1096 us (arrays_zip_perfect_zip_8192)
10% null lists 1.0153 ms 986.94 us

So the perfect-list path is about 179x faster. The 10% null case is effectively unchanged in this run, about 2.8% lower, because it still takes the general path.

@comphead
Copy link
Copy Markdown
Contributor

Thanks @puneetdixit200 I'll check it today

@Jefffrey
Copy link
Copy Markdown
Contributor

Looks like we have a duplicate PR for this:

Perhaps we can get numbers from both of these to compare which approach is better?

cc @BipashaBi

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated
@puneetdixit200
Copy link
Copy Markdown
Author

I re-ran the same microbenchmark against both PR heads on the same machine.

Command:

CARGO_TARGET_DIR=/tmp/datafusion-pr-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10

Environment: macOS arm64, rustc 1.95.0.

branch head perfect/no-null case 10% null case
this PR 035689147f61 5.0013 us 5.0086 us 5.0147 us (arrays_zip_perfect_zip_8192) 1.0152 ms 1.0324 ms 1.0416 ms
#22245 9e63ba262601 20.870 us 20.928 us 20.987 us (arrays_zip_no_nulls_8192) 994.76 us 1.0028 ms 1.0148 ms

So this branch is about 4.2x faster for the perfect-list fast path on this run. The 10% null case is effectively in the same range, which is expected because this branch falls back to the general path there.

The implementation difference I see is that #22245 currently reuses the first concrete input null bitmap for the output. This branch builds the output null bitmap using the existing semantics: a row is null only when all concrete inputs are null, and it rejects null rows with hidden values before taking the fast path. That should avoid the mixed null/empty-list issue called out on #22245.

@Jefffrey
Copy link
Copy Markdown
Contributor

run benchmark array_zip

@adriangbot
Copy link
Copy Markdown

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541159838-321-57bss 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/arrays-zip-perfect-zip (0356891) to 66f82af (merge-base) diff
BENCH_NAME=array_zip
BENCH_COMMAND=cargo bench --features=parquet --bench array_zip
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

Benchmark for this request failed.

Last 20 lines of output:

Click to expand
Cloning into '/workspace/datafusion-branch'...
From https://github.com/apache/datafusion
 * [new ref]         refs/pull/22285/head -> codex/arrays-zip-perfect-zip
 * branch            main                 -> FETCH_HEAD
Switched to branch 'codex/arrays-zip-perfect-zip'
66f82af6983cf2d931eb06bbadd7b2f93698bfab
Cloning into '/workspace/datafusion-base'...
HEAD is now at 66f82af perf: bypass values.value(i) for inline strings in ArrowBytesViewMap (#22172)
rustc 1.95.0 (59807616e 2026-04-14)
035689147f610fe7d793723651848ac0323475c2
66f82af6983cf2d931eb06bbadd7b2f93698bfab
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
error: no bench target named `array_zip` in default-run packages

help: a target with a similar name exists: `arrays_zip`

File an issue against this benchmark runner

@puneetdixit200
Copy link
Copy Markdown
Author

run benchmark arrays_zip

@adriangbot
Copy link
Copy Markdown

@Jefffrey
Copy link
Copy Markdown
Contributor

run benchmark arrays_zip

1 similar comment
@puneetdixit200
Copy link
Copy Markdown
Author

run benchmark arrays_zip

@adriangbot
Copy link
Copy Markdown

@adriangbot
Copy link
Copy Markdown

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541213973-322-srh78 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/arrays-zip-perfect-zip (0356891) to 66f82af (merge-base) diff
BENCH_NAME=arrays_zip
BENCH_COMMAND=cargo bench --features=parquet --bench arrays_zip
BENCH_FILTER=
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot
Copy link
Copy Markdown

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

group                          codex_arrays-zip-perfect-zip           main
-----                          ----------------------------           ----
arrays_zip_10pct_nulls_8192    1.04    599.0±2.27µs        ? ?/sec    1.00    577.7±1.87µs        ? ?/sec
arrays_zip_no_nulls_8192                                              1.00    531.6±2.43µs        ? ?/sec
arrays_zip_perfect_zip_8192    1.00      2.9±0.01µs        ? ?/sec  

Resource Usage

base (merge-base)

Metric Value
Wall time 25.0s
Peak memory 4.3 GiB
Avg memory 4.2 GiB
CPU user 26.8s
CPU sys 0.7s
Peak spill 0 B

branch

Metric Value
Wall time 20.0s
Peak memory 4.3 GiB
Avg memory 4.3 GiB
CPU user 25.0s
CPU sys 0.2s
Peak spill 0 B

File an issue against this benchmark runner

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated
Comment thread datafusion/functions-nested/src/arrays_zip.rs
@puneetdixit200
Copy link
Copy Markdown
Author

Pushed 6ea6184 as a follow-up to the fast-path review. Aligned list null bitmaps now use NullBuffer::union_many; mixed null-bitmaps keep explicit all-null-row construction to preserve arrays_zip semantics; mixed null rows with hidden values still fall back to the general path. Verified locally with cargo test -p datafusion-functions-nested arrays_zip::tests -- --nocapture, cargo fmt --check, git diff --check, and cargo test -p datafusion-sqllogictest --test sqllogictests -- array/arrays_zip.slt.

Comment thread datafusion/functions-nested/src/arrays_zip.rs
Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated
@puneetdixit200
Copy link
Copy Markdown
Author

Could a committer please trigger the Rust workflow for latest head 6ea618462? The latest head only shows the labeler check so far.

@puneetdixit200
Copy link
Copy Markdown
Author

Follow-up: latest head is now 1aaae2c18355f58bac7ecb8e1a402e5a81b49b18 after the null-buffer review adjustment. Could a committer please trigger the Rust workflow for this head? The PR still only shows the labeler check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize arrays_zip to avoid row-by-row copying in the perfect-zip case

5 participants