Add parameterized forward index compression codecs#18097
Add parameterized forward index compression codecs#18097xiangfu0 wants to merge 3 commits intoapache:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18097 +/- ##
============================================
+ Coverage 63.14% 63.17% +0.03%
- Complexity 1616 1635 +19
============================================
Files 3213 3216 +3
Lines 195730 195973 +243
Branches 30240 30281 +41
============================================
+ Hits 123592 123814 +222
- Misses 62261 62272 +11
- Partials 9877 9887 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
39e9169 to
938028b
Compare
There was a problem hiding this comment.
Pull request overview
Adds a structured, canonicalized compressionCodec specification (including optional compression level) and threads it through forward-index creation and segment metadata so level-only changes can trigger rewrites on reload.
Changes:
- Introduce
CompressionCodecSpec+ parser/capabilities to supportCODECandCODEC(level)syntax with alias canonicalization (e.g.,ZSTD→ZSTANDARD). - Plumb optional compression levels into raw/forward-index writers and chunk compressor construction while preserving legacy behavior for non-parameterized codecs.
- Persist explicit forward-index compression spec in segment metadata and enhance reload logic to detect level-only changes; add/extend unit and integration tests.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/test/java/org/apache/pinot/spi/config/table/CompressionCodecSpecTest.java | Adds parser/serde tests for canonicalization, level ranges, and invalid specs. |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java | Switches compressionCodec to CompressionCodecSpec while retaining enum-based compatibility paths. |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/CompressionCodecSpecParser.java | Implements parsing/validation for CODEC and CODEC(level) syntax. |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/CompressionCodecSpec.java | Adds canonical spec model with Jackson (de)serialization via config string. |
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/CompressionCodecCapabilities.java | Centralizes alias canonicalization and per-codec supported level ranges. |
| pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/index/ForwardIndexConfigTest.java | Adds coverage for parameterized compression codec parsing in forward-index config. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/V1Constants.java | Adds a metadata key for persisting forward-index compression spec. |
| pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/ForwardIndexConfig.java | Adds CompressionCodecSpec support, legacy interop, and equality/hash updates. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java | Updates FieldConfig construction to match new compressionCodec type. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfigTest.java | Verifies parameterized codec is propagated through index loading config. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandlerTest.java | Tests rewrite detection and metadata persistence for level-only codec changes. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexTypeTest.java | Ensures legacy field-config path supports parameterized codec specs. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/io/compression/TestCompression.java | Adds compressor roundtrip tests with explicit levels and rejection for unsupported codecs. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/InvertedIndexAndDictionaryBasedForwardIndexCreator.java | Persists forward-index compression spec into metadata for raw forward indexes. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/ForwardIndexHandler.java | Uses persisted spec to detect level-only changes and updates metadata on rewrite. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexType.java | Switches FieldConfig-to-ForwardIndexConfig mapping to use CompressionCodecSpec. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/forward/ForwardIndexCreatorFactory.java | Threads optional compression level into raw index creator selection. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java | Adds compression-level plumbing into var-byte raw forward index writers. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/SingleValueFixedByteRawIndexCreator.java | Adds compression-level plumbing into fixed-byte raw forward index writers. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueVarByteRawIndexCreator.java | Adds compression-level plumbing into MV var-byte raw forward index writers. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreator.java | Adds compression-level plumbing into MV fixed-byte raw forward index writers. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/BaseSegmentCreator.java | Writes canonical explicit forward-index compression spec to segment metadata. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkForwardIndexWriterV6.java | Adds constructors to pass optional compression level to base writer. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkForwardIndexWriterV5.java | Adds constructors to pass optional compression level to base writer. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkForwardIndexWriterV4.java | Uses level-aware compressor factory when building chunk compressor. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/VarByteChunkForwardIndexWriter.java | Adds compression-level support in base var-byte writer constructor path. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/FixedByteChunkForwardIndexWriter.java | Adds compression-level support in fixed-byte writer constructor path. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/writer/impl/BaseChunkForwardIndexWriter.java | Passes optional level to compressor factory for chunk writers. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ZstandardCompressor.java | Adds configurable Zstd compression level. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/LZ4WithLengthCompressor.java | Adds configurable LZ4 HC level for length-prefixed LZ4 compressor. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/LZ4Compressor.java | Adds configurable LZ4 HC level and avoids recreating compressor on each call. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/GzipCompressor.java | Adds configurable Deflater compression level. |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/compression/ChunkCompressorFactory.java | Adds factory overload to construct compressors with optional explicit level. |
| pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/custom/TextIndicesTest.java | Updates FieldConfig construction for new compression codec spec type. |
| pinot-core/src/test/java/org/apache/pinot/queries/BaseFSTBasedRegexpLikeQueriesTest.java | Updates FieldConfig construction for new compression codec spec type. |
| pinot-core/src/test/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunctionTest.java | Updates FieldConfig construction for new compression codec spec type. |
| pinot-core/src/test/java/org/apache/pinot/core/data/manager/TableIndexingTest.java | Updates FieldConfig construction for new compression codec spec type. |
| pinot-common/src/test/java/org/apache/pinot/common/utils/config/TableConfigSerDeUtilsTest.java | Adds serde coverage for parameterized codec spec in table config. |
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
Show resolved
Hide resolved
...t-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java
Outdated
Show resolved
Hide resolved
d522a5e to
c1aeec4
Compare
xiangfu0
left a comment
There was a problem hiding this comment.
Backward Compatibility & Segment Format Analysis
I've reviewed this PR for critical backward compatibility concerns with segment format changes. Here are my findings:
CRITICAL: Segment Rewrite Detection Logic Gap
Existing segments written before this PR will lack the new FORWARD_INDEX_COMPRESSION_CODEC metadata key. If a user later modifies only the compression level (e.g., changing from ZSTD to ZSTD(3)), the ForwardIndexHandler rewrite detection will fail to trigger because:
- Old segments have no persisted codec spec in metadata
- The rewrite logic compares current config against persisted metadata spec
- Missing metadata = no comparison possible = no rewrite triggered
- Result: Silent configuration skew - user expects rewrite, segments remain at old level
Recommendation: ForwardIndexHandler should explicitly handle missing codec metadata as "unknown level" state and trigger rewrites when config specifies explicit levels for old segments.
SAFE: Segment Format Read Compatibility
✓ New metadata key is additive (V1Constants.Column.FORWARD_INDEX_COMPRESSION_CODEC)
✓ Old code reading new segments: unaffected (skips unknown metadata)
✓ New code reading old segments: unaffected (spec is optional, defaults apply)
✓ No changes to forward-index header or chunk encoding format
SAFE: Configuration Serialization
✓ CompressionCodecSpec uses Jackson @JsonValue/@JsonCreator for stable string format
✓ Format is human-readable: CODEC or CODEC(level)
✓ Parser is strict (rejects malformed input early in config validation)
✓ Alias canonicalization (ZSTD → ZSTANDARD) handled consistently
SAFE: Legacy Configuration Compatibility
✓ Plain codec names (ZSTANDARD, GZIP, LZ4) preserve prior behavior
✓ Explicit levels default to library defaults when omitted
✓ FieldConfig compatibility constructors maintain config deserialization paths
✓ Comprehensive test coverage for serde paths
c1aeec4 to
3433bb5
Compare
|
Addressed the legacy-segment rewrite gap called out in review. Changes in this push:
Focused validation run:
|
Summary
CompressionCodecSpecparser/model forcompressionCodec, includingCODECandCODEC(level)syntax plusZSTD -> ZSTANDARDcanonicalizationUser manual
This change keeps the external config field name as
compressionCodecand extends its accepted values.Accepted syntax:
CODECCODEC(level)Parsing rules:
ZSTDis accepted anywhereZSTANDARDis acceptedZSTDnormalized toZSTANDARDExamples of accepted values:
ZSTANDARDZSTDZSTANDARD(3)ZSTD(3)GZIP(6)LZ4(12)Supported level ranges:
ZSTANDARD/ZSTD:1..22GZIP:0..9LZ4:1..17Codecs that do not expose a meaningful level in Pinot continue to reject parameterized syntax. For example:
SNAPPY(3)PASS_THROUGH(1)MV_ENTRY_DICT(2)Malformed values are also rejected. For example:
ZSTD()ZSTD(foo)UNKNOWN(3)Sample configs
Legacy raw-index field config remains valid, now with parameterized syntax:
{ "fieldConfigList": [ { "name": "message", "encodingType": "RAW", "compressionCodec": "ZSTD(3)" }, { "name": "payload", "encodingType": "RAW", "compressionCodec": "GZIP(6)" } ] }The newer forward-index config under
indexes.forwardalso accepts the samecompressionCodecsyntax:{ "fieldConfigList": [ { "name": "message", "indexes": { "forward": { "compressionCodec": "ZSTANDARD(3)", "deriveNumDocsPerChunk": true, "rawIndexWriterVersion": 10 } } }, { "name": "blob", "indexes": { "forward": { "compressionCodec": "LZ4(12)" } } } ] }Alias handling is equivalent, so these are treated the same:
{ "compressionCodec": "ZSTD" } { "compressionCodec": "ZSTANDARD" } { "compressionCodec": "ZSTD(3)" } { "compressionCodec": "ZSTANDARD(3)" }Backward compatibility
This change is intentionally conservative.
Config compatibility:
compressionCodecSNAPPY,PASS_THROUGH,ZSTANDARD, andLZ4continue to deserialize and behave as beforeZSTDis newly accepted as an alias and normalized internally toZSTANDARDWrite-path compatibility:
LZ4still uses Pinot's existing fast-compressor pathLZ4(level)switches to HC compression with the requested levelRead-path and segment compatibility:
Reload/rewrite behavior:
Validation
./mvnw -pl pinot-segment-spi,pinot-segment-local -am -Dcheckstyle.skip=true -Dtest=ForwardIndexConfigTest,ForwardIndexTypeTest,IndexLoadingConfigTest,TestCompression,ForwardIndexHandlerTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-spi,pinot-common -am -Dcheckstyle.skip=true -Dtest=CompressionCodecSpecTest,TableConfigSerDeUtilsTest -Dsurefire.failIfNoSpecifiedTests=false test./mvnw -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local spotless:apply -DskipTests./mvnw -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local checkstyle:check -DskipTests./mvnw -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local license:format -DskipTests./mvnw -pl pinot-spi,pinot-segment-spi,pinot-common,pinot-segment-local license:check -DskipTests