Upgrade t-digest from 3.2 to 3.3 with error rate fix#18103
Upgrade t-digest from 3.2 to 3.3 with error rate fix#18103
Conversation
There was a problem hiding this comment.
Pull request overview
Upgrades the com.tdunning:t-digest dependency to 3.3 and updates/extends Pinot’s percentile TDigest test suite to account for the changed quantile behavior and merge-order sensitivity introduced in 3.3.
Changes:
- Bump
t-digestfrom 3.2 → 3.3 (and updateLICENSE-binaryreference). - Adjust star-tree vs non-star-tree pre-aggregated TDigest test to use higher compression and tighter error bounds.
- Update expected percentile results in aggregation-function tests and add a new production-behavior accuracy test for
PercentileTDigestValueAggregator.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Updates the t-digest.version property to 3.3 for the build. |
| LICENSE-binary | Updates the recorded binary dependency version for com.tdunning:t-digest. |
| pinot-core/src/test/java/org/apache/pinot/core/startree/v2/PreAggregatedPercentileTDigestStarTreeV2Test.java | Raises test digest compression and tightens the star-tree vs non-star-tree divergence tolerance for 3.3 behavior. |
| pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/function/PercentileTDigestMVAggregationFunctionTest.java | Updates expected percentile outputs to match 3.3’s results on small datasets. |
| pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/function/PercentileSmartTDigestAggregationFunctionTest.java | Updates expected outputs consistent with 3.3 behavior and removes an override that is no longer needed. |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/aggregator/PercentileTDigestValueAggregatorTest.java | Adds a new accuracy-focused test suite comparing TDigest outputs against exact quantiles under several production-like merge/ser-deser scenarios. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18103 +/- ##
=============================================
- Coverage 63.04% 33.44% -29.60%
+ Complexity 1617 794 -823
=============================================
Files 3202 3202
Lines 194718 194727 +9
Branches 30047 30047
=============================================
- Hits 122760 65132 -57628
- Misses 62233 124069 +61836
+ Partials 9725 5526 -4199
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
It would be good to have some sample data that could be integrated into the t-digest tests. I would have expected that the necessary increase of the compression parameter would only be about 2x, not the 5x observed. If some sample test case that depends only on t-digest could be constructed, I would be able to investigate on the t-digest side. |
| /** | ||
| * Tests that the {@link PercentileTDigestValueAggregator} produces accurate quantile estimates | ||
| * under production-like conditions using the default compression (100). | ||
| * | ||
| * <p>These tests simulate single-path production behavior — where raw values are ingested into | ||
| * a TDigest, serialized/deserialized through segment and star-tree building, and then queried. |
There was a problem hiding this comment.
The class/Javadoc and nearby comments say the tests run with the default compression of 100, but PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION was changed to 500. This makes the test documentation misleading (and the stated expected error characteristics may not apply). Update the comments to reflect the actual default compression being exercised (or explicitly construct the aggregator with compression=100 if that’s what this test intends to validate).
| // Always inject the default compression so it is stored in star-tree metadata. | ||
| // This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100) | ||
| // from new segments (metadata present, built with the current default). |
There was a problem hiding this comment.
The comment here says injecting the default compression will be “stored in star-tree metadata” and used by canUseStarTree() to distinguish old vs new segments. However, this method only returns ExpressionContext arguments for ValueAggregatorFactory; star-tree metadata persistence is based on AggregationSpec.getFunctionParameters() (see StarTreeV2Metadata.writeMetadata). If AggregationSpec.functionParameters is empty (e.g., aggregationConfigs without functionParameters), metadata will still omit compressionFactor and canUseStarTree() will treat it as default=100 while the builder will create digests at DEFAULT_TDIGEST_COMPRESSION=500. Consider fixing this by ensuring AggregationSpec.functionParameters always includes compressionFactor for PERCENTILETDIGEST/PERCENTILERAWTDIGEST when absent, and adjust/remove this comment accordingly.
| // Always inject the default compression so it is stored in star-tree metadata. | |
| // This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100) | |
| // from new segments (metadata present, built with the current default). | |
| // Persist the default compression in functionParameters so downstream star-tree metadata writers | |
| // record the same compression used by the builder when the config omits it. | |
| functionParameters.put(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY, | |
| PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION); |
| /** | ||
| * Returns the default {@link AggregationSpec} for the given function type. For PERCENTILETDIGEST and | ||
| * PERCENTILERAWTDIGEST, the default compression factor is explicitly included in the function parameters so that | ||
| * it is persisted in star-tree metadata. This allows {@code canUseStarTree()} to distinguish old segments (built | ||
| * with the previous default of 100, which have no compression metadata) from new segments. | ||
| */ | ||
| private static AggregationSpec getDefaultAggregationSpec(AggregationFunctionType functionType) { | ||
| switch (functionType) { | ||
| case PERCENTILETDIGEST: | ||
| case PERCENTILERAWTDIGEST: | ||
| return new AggregationSpec(null, null, null, null, null, | ||
| Map.of(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY, | ||
| PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION)); | ||
| default: | ||
| return AggregationSpec.DEFAULT; | ||
| } |
There was a problem hiding this comment.
By persisting compressionFactor=PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION (500) into star-tree metadata, PercentileTDigestAggregationFunction.canUseStarTree() will reject using the star-tree for queries that omit the compression argument (query-time default is still 100). This means percentiletdigest(col, p) queries will stop using the star-tree after segments are rebuilt unless users update queries to specify compression=500 (or canUseStarTree is relaxed to allow higher-compression star-trees). If this behavior change is intended, it would be good to make it explicit in code/comments and ensure there’s a migration path for existing star-tree users.
|
Does this mean with the same compression level, 3.3 has much higher error rate than 3.2? |
|
The algorithmic changes between 3.2 and 3.3 which were intended to provide a firm bound on the number of centroids had the side effect of changing the relationship between the number of centroids and the compression factor by about a factor of 2:1 (old:new). The accuracy is primarily driven by the number of retained centroids. This means that you have to have a higher compression parameter to keep the same number of centroids to keep the same level of accuracy. I am surprised, however, by the assertion that a 5:1 increase in compression parameter is required and would want to understand more about the test that supports that assertion. |
Resolves the accuracy regression that blocked #7076 by using higher compression (750) in the pre-aggregated star-tree test to keep star-tree vs non-star-tree quantile divergence below 0.5%. t-digest 3.3 changed centroid management (unit-weight first/last centroids, stricter tail interpolation), which increases merge-order sensitivity. The star-tree path does multi-level serialize/deserialize/merge while the non-star-tree path merges sequentially, causing quantile divergence at low compression values. Experimental results on the PreAggregated star-tree test (10 randomized runs each): - compression=300, MAX_ERROR=0.5%: 0/10 passes (errors 0.54-1.07%) - compression=500, MAX_ERROR=0.5%: fails (0.62% error) - compression=750, MAX_ERROR=0.5%: 10/10 passes - compression=1000, MAX_ERROR=0.5%: 10/10 passes For comparison, t-digest 3.2 with compression=300 passes 10/10 at 0.5%. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that PercentileTDigestValueAggregator with default compression=100 produces accurate quantile estimates under production-like conditions: - Single-path raw value ingestion: < 0.5% error vs exact quantiles - Pre-aggregated byte[] merge: < 0.5% error - Multi-level star-tree merge with ser/deser cycles: < 1.0% error - Stability across 10 random seeds: < 0.5% error - Serialization round-trip (5 cycles): no accuracy degradation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
t-digest 3.3 has higher merge-order sensitivity than 3.2. At the previous default compression=100, multi-level merge in star-tree building produces ~0.35% error vs ground truth (compared to 0.06% with 3.2). Increasing to 500 brings error back to ~0.09%. Changes: - PercentileTDigestValueAggregator: DEFAULT_TDIGEST_COMPRESSION 100->500 - StarTreeBuilderUtils: always inject default compression into expression context so the aggregator receives it explicitly - StarTreeV2BuilderConfig: persist compression=500 in functionParameters when using old functionColumnPairs syntax, so canUseStarTree() can distinguish old segments (no metadata, compression=100) from new segments (metadata=500) Backward compatibility: - Old segments have no compressionFactor in metadata -> canUseStarTree() treats them as compression=100 -> matched by queries with default compression (100) - New segments have compressionFactor=500 in metadata -> matched by queries explicitly specifying compression=500 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
298e411 to
73f569b
Compare
|
Split the pure reproducer into a separate PR for clarity: #18166 The reproducer is in
To reproduce locally on the split branch: ./mvnw -pl pinot-segment-local
-Dtest=TDigestMergeOrderReproducerTest
-Dsurefire.failIfNoSpecifiedTests=false testI also ran that test in 10 fresh Surefire JVMs and it passed The asserted behavior on the current 3.3 branch is:
That should make it easier to discuss the sample case independently from the production compression change in #18103. |
|
Follow-up on the split reproducer PR: I updated #18166 so it now includes a direct What changed:
So the answer to “is this a test that passes on 3.2 but fails on 3.3?” is:
Repro: ./mvnw -pl pinot-segment-local -Dtest=TDigestVersionComparisonTest,TDigestMergeOrderReproducerTest -Dsurefire.failIfNoSpecifiedTests=false testThe comparison test asserts that, on the minimized exact-quantile scenario:
From the local exploration used to lock in that scenario, the observed numbers were roughly:
I also reran the repro command in 10 fresh Surefire JVMs and it passed |
Summary
Upgrades the t-digest library from 3.2 to 3.3, resolving the accuracy regression that blocked #7076 (open since June 2021). Increases star-tree default compression from 100 to 500 to preserve error rates for existing users.
Background
PR #7076 attempted this upgrade but was never merged because @Jackie-Jiang observed that t-digest 3.3 produced >1% error between star-tree and non-star-tree merge paths, whereas 3.2 stayed <0.5%. @tdunning (t-digest author) discussed the issue but it was never fully resolved.
Root Cause
t-digest 3.3 changed centroid management (unit-weight first/last centroids, stricter tail interpolation), which increases merge-order sensitivity. The star-tree path does multi-level serialize/deserialize/merge while the non-star-tree path merges sequentially — this causes quantile divergence at low compression values that was tolerable in 3.2 but exceeds tolerance in 3.3.
Fix
Performance: 3.2 vs 3.3
At the same compression=100, 3.3 is significantly faster:
add()100k valuesmerge()100 digestsserialize()deserialize()quantile()At the new star-tree default of compression=500, 3.3 is still comparable to 3.2 at compression=100:
add()100k valuesmerge()100 digestsserialize()deserialize()quantile()Accuracy: multi-level merge error vs ground truth
Backward compatibility
compressionFactorin metadata):canUseStarTree()treats them as compression=100, matching queries with default query-time compression (100)compressionFactor=500explicitly persisted in metadata): matched by queries specifying compression=500PERCENTILE_TDIGEST(col, 95, 500)or star-treeaggregationConfigs.functionParameters.compressionFactorChanges
pom.xml: t-digest 3.2 → 3.3LICENSE-binary: Updated version referencePercentileTDigestValueAggregator:DEFAULT_TDIGEST_COMPRESSION100→500 with rationale commentStarTreeBuilderUtils: Always inject default compression into expression context for PERCENTILETDIGEST/PERCENTILERAWTDIGESTStarTreeV2BuilderConfig: PersistcompressionFactor=500infunctionParameterswhen using oldfunctionColumnPairssyntax, socanUseStarTree()can distinguish old segments (no metadata = 100) from new segments (metadata = 500)PreAggregatedPercentileTDigestStarTreeV2Test: COMPRESSION 50→750, MAX_ERROR 5%→0.5%PercentileTDigestMVAggregationFunctionTest: Updated expected values for 3.3's improved interpolationPercentileSmartTDigestAggregationFunctionTest: Updated expected valuesPercentileTDigestValueAggregatorTest(new): Production-behavior accuracy test verifying quantile estimates against exact ground truthTest plan
PreAggregatedPercentileTDigestStarTreeV2Test— passesPercentileTDigestStarTreeV2Test— passesPercentileTDigestWithMVStarTreeV2Test— passesPercentileTDigestMVAggregationFunctionTest— passesPercentileSmartTDigestAggregationFunctionTest— all 96 tests passPercentileTDigestValueAggregatorTest— all 5 tests passPercentileTDigestQueriesTest— passesPercentileTDigestMVQueriesTest— passesspotless:apply,checkstyle:check,license:check— all pass🤖 Generated with Claude Code