feat: support new ComparisonJoinKey field in HashJoin/MergeJoin#845
Open
nielspardon wants to merge 2 commits into
Open
feat: support new ComparisonJoinKey field in HashJoin/MergeJoin#845nielspardon wants to merge 2 commits into
nielspardon wants to merge 2 commits into
Conversation
The deprecated `left_keys`/`right_keys` fields of `HashJoinRel` and `MergeJoinRel` are being replaced upstream by `repeated ComparisonJoinKey keys`. Migrate substrait-java to consume both the old and new proto fields while only producing the new `keys` field, so it keeps working today and is ready once the deprecated fields are removed. - Add a `ComparisonJoinKey` POJO modelling the proto message, including the `ComparisonType` oneof (`SimpleComparison`/`CustomComparison`) and the `SimpleComparisonType` enum. - `HashJoin`/`MergeJoin` now expose `getKeys()` as the source of truth; `getLeftKeys()`/`getRightKeys()` are kept as `@Deprecated` derived views. - Writer emits only `keys`; reader prefers `keys` and falls back to the deprecated fields (paired with an EQ comparison) when `keys` is empty. - Update `RelCopyOnWriteVisitor` and the `SubstraitBuilder` DSL accordingly. - Add round-trip, legacy-consumption, precedence and full-fidelity tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Writing only the new `keys` field breaks consumers that have not yet adopted it. Follow the same lossless dual-write policy used elsewhere: keep `keys` as the canonical representation, but additionally populate the deprecated `left_keys`/`right_keys` when every key is a plain `EQ` comparison those fields can represent without loss. Keys using `IS_NOT_DISTINCT_FROM`, `MIGHT_EQUAL`, or custom comparisons are intentionally left out of the deprecated fields, since an old consumer would silently misinterpret them as equality. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Member
Author
|
Updated the writer to follow the lossless dual-write policy raised in substrait-go#256, which applies equally here. Previously the writer emitted only the new
The reader side already preferred 🤖 Generated with Claude Code |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The deprecated
left_keys/right_keysfields ofHashJoinRelandMergeJoinRelare being replaced upstream byrepeated ComparisonJoinKey keys = 8. This PR migrates substrait-java to consume both the old and new proto fields, always produce the newkeysfield, and additionally produce the deprecated fields when doing so is lossless (plainEQjoins). This keeps it working against the current proto, preserves compatibility with consumers that have not yet adoptedkeys, and is ready once the deprecated fields are removed.The fields were deprecated in Jan 2024 via substrait-io/substrait#585 and thus deprecated in Substrait since v0.42.0.
Changes
ComparisonJoinKeyPOJO modelling the proto message, including theComparisonTypeoneof (SimpleComparison/CustomComparison) and theSimpleComparisonTypeenum, following the existingWindowBoundoneof andJoin.JoinTypeenum conventions.HashJoin/MergeJoinnow exposegetKeys()as the source of truth.getLeftKeys()/getRightKeys()are kept as@Deprecatedderived convenience views so existing consumers keep compiling.RelProtoConverter): always emitskeys, and also emits the deprecatedleft_keys/right_keyswhen every key is a plainEQcomparison those fields can represent without loss. Keys usingIS_NOT_DISTINCT_FROM,MIGHT_EQUAL, or a custom comparison are intentionally left out of the deprecated fields, so an old consumer cannot silently misinterpret them as equality.ProtoRelConverter): preferskeyswhen present, falling back to the deprecated fields (paired with aSIMPLE_COMPARISON_TYPE_EQcomparison) whenkeysis empty.RelCopyOnWriteVisitorand theSubstraitBuilderDSL accordingly (public DSL signatures unchanged).This dual-write-when-lossless policy mirrors the approach taken for the URI → URN migration, and matches the feedback on the analogous substrait-go#256.
Tests
New
HashMergeJoinKeysTestcovering:EQjoin emitskeysand matching deprecated fields),EQkey emits onlykeys),keyswins when both old and new are set),🤖 Generated with Claude Code