Skip to content

GH-46856: [C++][Python] Add binary view comparison kernels#49964

Merged
pitrou merged 4 commits into
apache:mainfrom
Periecle:fix-binary-view-compare-kernels
Jun 9, 2026
Merged

GH-46856: [C++][Python] Add binary view comparison kernels#49964
pitrou merged 4 commits into
apache:mainfrom
Periecle:fix-binary-view-compare-kernels

Conversation

@Periecle

@Periecle Periecle commented May 12, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

pyarrow.compute.equal fails for pa.binary_view() arrays because C++ compute has no registered comparison kernel for (binary_view, binary_view).

This fixes that missing kernel path and also enables the same comparisons for utf8_view.

What changes are included in this PR?

This adds comparison kernel support for binary_view and utf8_view.

The following functions now work for same-type inputs:

  • equal
  • not_equal
  • greater
  • greater_equal
  • less
  • less_equal

Are these changes tested?

Added C++ tests covering:

  • inline and out-of-line values
  • nulls
  • sliced arrays
  • array-array comparisons
  • array-scalar and scalar-array comparisons
  • all six comparison functions

Added Python regression tests for pa.binary_view() and pa.string_view().

Verified the same cases fail before this patch at a0d2885b101acb439f7f79ec2237028974e74e64 with ArrowNotImplementedError: no kernel matching input types.

Are there any user-facing changes?

pyarrow.compute comparison functions now work for pa.binary_view() and pa.string_view() arrays where they previously failed with a missing kernel error.

AI Usage

Tests were generated by LLM agents along with part or PR summary

Addresses: GH-46856
Partially addresses: GH-44336

@Periecle Periecle requested review from AlenkaF, raulcd and rok as code owners May 12, 2026 08:04
@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #46856 has been automatically assigned in GitHub to PR creator.

@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #46856 has been automatically assigned in GitHub to PR creator.

3 similar comments
@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #46856 has been automatically assigned in GitHub to PR creator.

@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #46856 has been automatically assigned in GitHub to PR creator.

@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #46856 has been automatically assigned in GitHub to PR creator.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds missing comparison kernel registrations for Arrow C++ compute so that binary_view and utf8_view arrays can be compared (and thus pyarrow.compute.* comparison functions work for pa.binary_view() / pa.string_view() inputs).

Changes:

  • Register (binary_view, binary_view) and (utf8_view, utf8_view) comparison kernels in C++ compute.
  • Extend internal kernel codegen iteration utilities to support binary-view-like arrays.
  • Add C++ and Python tests covering comparisons for view types (including nulls and slicing).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
python/pyarrow/tests/test_compute.py Adds Python regression tests ensuring view-type arrays work with the 6 comparison functions.
cpp/src/arrow/compute/kernels/scalar_compare.cc Registers comparison kernels for binary_view and utf8_view.
cpp/src/arrow/compute/kernels/scalar_compare_test.cc Adds C++ test coverage for array/array, sliced arrays, and array/scalar comparisons on view types.
cpp/src/arrow/compute/kernels/codegen_internal.h Adds ArrayIterator support for binary-view-like arrays so generated kernels can read values correctly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +436 to +441
for (const std::shared_ptr<DataType>& ty : {binary_view(), utf8_view()}) {
auto exec =
GenerateVarBinaryViewBase<applicator::ScalarBinaryEqualTypes, BooleanType, Op>(
*ty);
DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks valid, applied

@Periecle

Copy link
Copy Markdown
Contributor Author

@AlenkaF @rok @raulcd would you like to address something regarding PR?

@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 26, 2026

@raulcd raulcd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter CI job is currently failing, could you take a look please?

@Periecle

Copy link
Copy Markdown
Contributor Author

@raulcd I have fixed linter violations, should be all good.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment on lines +1253 to +1263
CheckScalarBinary("equal", arr, scalar,
ArrayFromJSON(boolean(), R"([false, true, false, null])"));
CheckScalarBinary("equal", scalar, arr,
ArrayFromJSON(boolean(), R"([false, true, false, null])"));
CheckScalarBinary("greater", arr, scalar,
ArrayFromJSON(boolean(), R"([false, false, true, null])"));
CheckScalarBinary("less", scalar, arr,
ArrayFromJSON(boolean(), R"([false, false, true, null])"));
CheckScalarBinary("equal", arr, null_scalar,
ArrayFromJSON(boolean(), R"([null, null, null, null])"));
}
@Periecle

Periecle commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Seems like latest C++/AMD64 Windows Mingw CLAGN64 C++ job failure is not related to changes.

@raulcd raulcd requested a review from pitrou June 2, 2026 09:17
}
}

TEST(TestBinaryViewCompareKernel, SlicedArrays) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckScalarBinary already checks sliced inputs, this test is not necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CheckScalarBinary already checks sliced inputs, this test is not necessary.

removed redundant test.

Comment thread python/pyarrow/tests/test_compute.py Outdated
assert result.equals(con([False, True, True, None]))


def test_equal_binary_view():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's useful to add these tests on the Python side, since we're already exercising this in C++ tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. I thought it would be cleaner to address initial issue by writing python tests covering that.

I can remove it, if you think that is redundant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's redundant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it's redundant.

@pitrou removed redundant tests.

@pitrou

pitrou commented Jun 4, 2026

Copy link
Copy Markdown
Member

@github-actions crossbow submit -g cpp

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

Revision: c5f8316

Submitted crossbow builds: ursacomputing/crossbow @ actions-dab3a0e97c

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou merged commit d11916f into apache:main Jun 9, 2026
58 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jun 9, 2026
@conbench-apache-arrow

Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit d11916f.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants