perf: optimize arrays_zip perfect list zips by puneetdixit200 · Pull Request #22285 · apache/datafusion

puneetdixit200 · 2026-05-16T23:01:34Z

Which issue does this PR close?

Closes Optimize arrays_zip to avoid row-by-row copying in the perfect-zip case #22225.

Rationale for this change

arrays_zip currently uses the general MutableArrayData path even when all regular ListArray inputs are already perfectly aligned. In that case, the output list offsets match the inputs and each struct child column can reuse the corresponding input values array instead of copying one row at a time.

What changes are included in this PR?

Add a fast path for perfect regular ListArray zips that reuses the first input's offsets and clones the input values arrays into the output struct children.
Keep the existing general path for ragged inputs, LargeList, FixedSizeList, Null inputs, and null rows that would require padding.
Add unit coverage for offset/value reuse, zero-length null rows, and null rows with hidden values falling back to the general path.
Rename the no-null benchmark case to arrays_zip_perfect_zip_8192.

Are these changes tested?

Yes. Local checks run:

cargo fmt --all
cargo test -p datafusion-functions-nested
cargo clippy -p datafusion-functions-nested --all-targets --all-features -- -D warnings
CARGO_TARGET_DIR=C:\df-target cargo clippy --all-targets --all-features -- -D warnings
CARGO_TARGET_DIR=C:\df-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10

Latest local benchmark sample:

arrays_zip_perfect_zip_8192: 11.234 ?s 11.600 ?s 12.045 ?s
arrays_zip_10pct_nulls_8192: 4.3463 ms 4.5531 ms 4.7898 ms

Are there any user-facing changes?

No. This is an internal performance optimization with the same arrays_zip output semantics.

Review follow-up validation

2026-05-21:

cargo fmt --all --check passed after formatting.
cargo test -p datafusion-functions-nested perfect_zip_uses_supplied_field_names --lib could not compile locally because this Windows GNU toolchain is missing dlltool.exe.
cargo clippy -p datafusion-functions-nested --all-targets -- -D warnings hit the same local dlltool.exe toolchain blocker before checking the crate.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a fast-path to arrays_zip that can reuse existing offsets and value buffers when inputs are “perfectly aligned” list arrays, and updates benchmarks accordingly.

Changes:

Introduce try_perfect_list_zip fast-path and invoke it early from arrays_zip_inner
Add unit tests covering reuse behavior and fallback conditions
Rename the no-nulls benchmark case to reflect the new “perfect zip” scenario

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
datafusion/functions-nested/src/arrays_zip.rs	Adds perfect-zip fast-path plus tests validating reuse and fallback behavior
datafusion/functions-nested/benches/arrays_zip.rs	Renames benchmark to better represent the new fast-path scenario

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

comphead

Thanks @puneetdixit200 would you mind sharing bench details for perfect list, with and without the fix?

puneetdixit200 · 2026-05-19T06:36:00Z

I ran the benchmark on the PR head (973e3dc17) and the merge base (66f82af69) on the same machine:

case	merge base	PR head
perfect / no-null lists	916.69 us (`arrays_zip_no_nulls_8192`)	5.1096 us (`arrays_zip_perfect_zip_8192`)
10% null lists	1.0153 ms	986.94 us

So the perfect-list path is about 179x faster. The 10% null case is effectively unchanged in this run, about 2.8% lower, because it still takes the general path.

comphead · 2026-05-19T15:37:09Z

Thanks @puneetdixit200 I'll check it today

Jefffrey · 2026-05-20T02:31:57Z

Looks like we have a duplicate PR for this:

Optimize arrays_zip for perfectly aligned arrays #22245

Perhaps we can get numbers from both of these to compare which approach is better?

cc @BipashaBi

puneetdixit200 · 2026-05-25T22:43:50Z

I re-ran the same microbenchmark against both PR heads on the same machine.

Command:

CARGO_TARGET_DIR=/tmp/datafusion-pr-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10

Environment: macOS arm64, rustc 1.95.0.

branch	head	perfect/no-null case	10% null case
this PR	`035689147f61`	`5.0013 us 5.0086 us 5.0147 us` (`arrays_zip_perfect_zip_8192`)	`1.0152 ms 1.0324 ms 1.0416 ms`
#22245	`9e63ba262601`	`20.870 us 20.928 us 20.987 us` (`arrays_zip_no_nulls_8192`)	`994.76 us 1.0028 ms 1.0148 ms`

So this branch is about 4.2x faster for the perfect-list fast path on this run. The 10% null case is effectively in the same range, which is expected because this branch falls back to the general path there.

The implementation difference I see is that #22245 currently reuses the first concrete input null bitmap for the output. This branch builds the output null bitmap using the existing semantics: a row is null only when all concrete inputs are null, and it rejects null rows with hidden values before taking the fast path. That should avoid the mixed null/empty-list issue called out on #22245.

Jefffrey · 2026-05-26T06:31:46Z

run benchmark array_zip

adriangbot · 2026-05-26T06:34:54Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541159838-321-57bss 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/arrays-zip-perfect-zip (0356891) to 66f82af (merge-base) diff
BENCH_NAME=array_zip
BENCH_COMMAND=cargo bench --features=parquet --bench array_zip
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-26T06:34:58Z

Benchmark for this request failed.

Last 20 lines of output:

Click to expand

Cloning into '/workspace/datafusion-branch'...
From https://github.com/apache/datafusion
 * [new ref]         refs/pull/22285/head -> codex/arrays-zip-perfect-zip
 * branch            main                 -> FETCH_HEAD
Switched to branch 'codex/arrays-zip-perfect-zip'
66f82af6983cf2d931eb06bbadd7b2f93698bfab
Cloning into '/workspace/datafusion-base'...
HEAD is now at 66f82af perf: bypass values.value(i) for inline strings in ArrowBytesViewMap (#22172)
rustc 1.95.0 (59807616e 2026-04-14)
035689147f610fe7d793723651848ac0323475c2
66f82af6983cf2d931eb06bbadd7b2f93698bfab
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
    Blocking waiting for file lock on package cache
error: no bench target named `array_zip` in default-run packages

help: a target with a similar name exists: `arrays_zip`

File an issue against this benchmark runner

puneetdixit200 · 2026-05-26T06:35:54Z

run benchmark arrays_zip

adriangbot · 2026-05-26T06:35:56Z

Hi @puneetdixit200, thanks for the request (#22285 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas.

File an issue against this benchmark runner

Jefffrey · 2026-05-26T06:36:13Z

run benchmark arrays_zip

puneetdixit200 · 2026-05-26T06:36:16Z

run benchmark arrays_zip

adriangbot · 2026-05-26T06:36:17Z

Hi @puneetdixit200, thanks for the request (#22285 (comment)). Only whitelisted users can trigger benchmarks. Allowed users: Dandandan, Fokko, Jefffrey, Omega359, adriangb, alamb, asubiotto, brunal, buraksenn, cetra3, codephage2020, coderfender, comphead, erenavsarogullari, etseidl, friendlymatthew, gabotechs, geoffreyclaude, grtlr, haohuaijin, jonathanc-n, kevinjqliu, klion26, kosiew, kumarUjjawal, kunalsinghdadhwal, liamzwbao, mbutrovich, mkleen, mzabaluev, neilconway, rluvaton, sdf-jkl, timsaucer, xudong963, zhuqi-lucas.

File an issue against this benchmark runner

adriangbot · 2026-05-26T06:37:16Z

🤖 Criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4541213973-322-srh78 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing codex/arrays-zip-perfect-zip (0356891) to 66f82af (merge-base) diff
BENCH_NAME=arrays_zip
BENCH_COMMAND=cargo bench --features=parquet --bench arrays_zip
BENCH_FILTER=
Results will be posted here when complete

File an issue against this benchmark runner

adriangbot · 2026-05-26T06:43:22Z

🤖 Criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

group                          codex_arrays-zip-perfect-zip           main
-----                          ----------------------------           ----
arrays_zip_10pct_nulls_8192    1.04    599.0±2.27µs        ? ?/sec    1.00    577.7±1.87µs        ? ?/sec
arrays_zip_no_nulls_8192                                              1.00    531.6±2.43µs        ? ?/sec
arrays_zip_perfect_zip_8192    1.00      2.9±0.01µs        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	25.0s
Peak memory	4.3 GiB
Avg memory	4.2 GiB
CPU user	26.8s
CPU sys	0.7s
Peak spill	0 B

branch

Metric	Value
Wall time	20.0s
Peak memory	4.3 GiB
Avg memory	4.3 GiB
CPU user	25.0s
CPU sys	0.2s
Peak spill	0 B

File an issue against this benchmark runner

puneetdixit200 · 2026-05-26T07:22:06Z

Pushed 6ea6184 as a follow-up to the fast-path review. Aligned list null bitmaps now use NullBuffer::union_many; mixed null-bitmaps keep explicit all-null-row construction to preserve arrays_zip semantics; mixed null rows with hidden values still fall back to the general path. Verified locally with cargo test -p datafusion-functions-nested arrays_zip::tests -- --nocapture, cargo fmt --check, git diff --check, and cargo test -p datafusion-sqllogictest --test sqllogictests -- array/arrays_zip.slt.

puneetdixit200 · 2026-05-26T07:29:28Z

Could a committer please trigger the Rust workflow for latest head 6ea618462? The latest head only shows the labeler check so far.

puneetdixit200 · 2026-05-26T08:17:33Z

Follow-up: latest head is now 1aaae2c18355f58bac7ecb8e1a402e5a81b49b18 after the null-buffer review adjustment. Could a committer please trigger the Rust workflow for this head? The PR still only shows the labeler check.

perf: optimize arrays_zip perfect list zips

ce21df8

Copilot AI review requested due to automatic review settings May 16, 2026 23:01

github-actions Bot added the functions Changes to functions implementation label May 16, 2026

Copilot AI reviewed May 16, 2026

View reviewed changes

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated

puneetdixit200 and others added 4 commits May 17, 2026 04:37

Potential fix for pull request finding

7795270

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

perf: reject non-perfect arrays_zip offsets earlier

61459f1

docs: explain arrays_zip perfect-list fast path

b02aad1

Document arrays zip fast path constraints

973e3dc

comphead reviewed May 18, 2026

View reviewed changes

comphead reviewed May 20, 2026

View reviewed changes

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated

perf: pass arrays_zip field names to fast path

0356891

Jefffrey reviewed May 26, 2026

View reviewed changes

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated

Comment thread datafusion/functions-nested/src/arrays_zip.rs

puneetdixit200 added 2 commits May 26, 2026 12:38

perf: refine arrays_zip fast path nulls

b184e03

perf: use null buffer union for aligned list nulls

6ea6184

Jefffrey reviewed May 26, 2026

View reviewed changes

Comment thread datafusion/functions-nested/src/arrays_zip.rs

Comment thread datafusion/functions-nested/src/arrays_zip.rs Outdated

perf: avoid redundant null buffer union

1aaae2c

Conversation

puneetdixit200 commented May 16, 2026 • edited by Jefffrey Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Review follow-up validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

comphead left a comment

Choose a reason for hiding this comment

Uh oh!

puneetdixit200 commented May 19, 2026

Uh oh!

comphead commented May 19, 2026

Uh oh!

Jefffrey commented May 20, 2026

Uh oh!

Uh oh!

puneetdixit200 commented May 25, 2026

Uh oh!

Jefffrey commented May 26, 2026

Uh oh!

adriangbot commented May 26, 2026

Uh oh!

adriangbot commented May 26, 2026

Uh oh!

puneetdixit200 commented May 26, 2026

Uh oh!

adriangbot commented May 26, 2026

Uh oh!

Jefffrey commented May 26, 2026

Uh oh!

puneetdixit200 commented May 26, 2026

Uh oh!

adriangbot commented May 26, 2026

Uh oh!

adriangbot commented May 26, 2026

Uh oh!

adriangbot commented May 26, 2026

Uh oh!

Uh oh!

Uh oh!

puneetdixit200 commented May 26, 2026

Uh oh!

Uh oh!

Uh oh!

puneetdixit200 commented May 26, 2026

Uh oh!

puneetdixit200 commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

puneetdixit200 commented May 16, 2026 •

edited by Jefffrey

Loading