Skip to content

Upgrade t-digest from 3.2 to 3.3 with error rate fix#18103

Open
xiangfu0 wants to merge 3 commits intomasterfrom
claude/crazy-wilbur
Open

Upgrade t-digest from 3.2 to 3.3 with error rate fix#18103
xiangfu0 wants to merge 3 commits intomasterfrom
claude/crazy-wilbur

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Apr 6, 2026

Summary

Upgrades the t-digest library from 3.2 to 3.3, resolving the accuracy regression that blocked #7076 (open since June 2021). Increases star-tree default compression from 100 to 500 to preserve error rates for existing users.

Background

PR #7076 attempted this upgrade but was never merged because @Jackie-Jiang observed that t-digest 3.3 produced >1% error between star-tree and non-star-tree merge paths, whereas 3.2 stayed <0.5%. @tdunning (t-digest author) discussed the issue but it was never fully resolved.

Root Cause

t-digest 3.3 changed centroid management (unit-weight first/last centroids, stricter tail interpolation), which increases merge-order sensitivity. The star-tree path does multi-level serialize/deserialize/merge while the non-star-tree path merges sequentially — this causes quantile divergence at low compression values that was tolerable in 3.2 but exceeds tolerance in 3.3.

Fix

  1. Upgrade t-digest to 3.3
  2. Increase star-tree default compression from 100 to 500 — restores multi-level merge error to ~0.09% (comparable to 3.2's 0.06% at compression=100)
  3. Always persist compression in star-tree metadata — ensures backward compatibility with old segments

Performance: 3.2 vs 3.3

At the same compression=100, 3.3 is significantly faster:

Operation 3.2 @ 100 3.3 @ 100 Change
add() 100k values 6.39 ms 5.33 ms 17% faster
merge() 100 digests 1.08 ms 0.26 ms 4x faster
serialize() 4.11 us 1.95 us 53% faster
deserialize() 4.83 us 3.37 us 30% faster
quantile() 228 ns 101 ns 56% faster
Serialized size (100k values) 2,208 B 1,104 B 2x smaller

At the new star-tree default of compression=500, 3.3 is still comparable to 3.2 at compression=100:

Operation 3.2 @ 100 3.3 @ 500 Change
add() 100k values 6.39 ms 5.82 ms 9% faster
merge() 100 digests 1.08 ms 1.07 ms same
serialize() 4.11 us 2.95 us 28% faster
deserialize() 4.83 us 9.27 us 92% slower
quantile() 228 ns 99 ns 56% faster
Serialized size (100k values) 2,208 B 4,544 B 2x larger

Accuracy: multi-level merge error vs ground truth

Version Compression Multi-level error Single-path error
3.2 100 0.061% 0.056%
3.3 100 0.354% 0.129%
3.3 200 0.180% 0.089%
3.3 300 0.137% 0.064%
3.3 500 (new default) 0.094% 0.042%
3.3 750 0.052% 0.043%
3.3 1000 0.049% 0.035%

Backward compatibility

  • Old segments (built with 3.2, no compressionFactor in metadata): canUseStarTree() treats them as compression=100, matching queries with default query-time compression (100)
  • New segments (built with 3.3, compressionFactor=500 explicitly persisted in metadata): matched by queries specifying compression=500
  • Query-time default remains 100 (unchanged) — this is the compression used for non-star-tree aggregation
  • Users can explicitly set compression via PERCENTILE_TDIGEST(col, 95, 500) or star-tree aggregationConfigs.functionParameters.compressionFactor

Changes

  • pom.xml: t-digest 3.2 → 3.3
  • LICENSE-binary: Updated version reference
  • PercentileTDigestValueAggregator: DEFAULT_TDIGEST_COMPRESSION 100→500 with rationale comment
  • StarTreeBuilderUtils: Always inject default compression into expression context for PERCENTILETDIGEST/PERCENTILERAWTDIGEST
  • StarTreeV2BuilderConfig: Persist compressionFactor=500 in functionParameters when using old functionColumnPairs syntax, so canUseStarTree() can distinguish old segments (no metadata = 100) from new segments (metadata = 500)
  • PreAggregatedPercentileTDigestStarTreeV2Test: COMPRESSION 50→750, MAX_ERROR 5%→0.5%
  • PercentileTDigestMVAggregationFunctionTest: Updated expected values for 3.3's improved interpolation
  • PercentileSmartTDigestAggregationFunctionTest: Updated expected values
  • PercentileTDigestValueAggregatorTest (new): Production-behavior accuracy test verifying quantile estimates against exact ground truth

Test plan

  • PreAggregatedPercentileTDigestStarTreeV2Test — passes
  • PercentileTDigestStarTreeV2Test — passes
  • PercentileTDigestWithMVStarTreeV2Test — passes
  • PercentileTDigestMVAggregationFunctionTest — passes
  • PercentileSmartTDigestAggregationFunctionTest — all 96 tests pass
  • PercentileTDigestValueAggregatorTest — all 5 tests pass
  • PercentileTDigestQueriesTest — passes
  • PercentileTDigestMVQueriesTest — passes
  • spotless:apply, checkstyle:check, license:check — all pass

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Upgrades the com.tdunning:t-digest dependency to 3.3 and updates/extends Pinot’s percentile TDigest test suite to account for the changed quantile behavior and merge-order sensitivity introduced in 3.3.

Changes:

  • Bump t-digest from 3.2 → 3.3 (and update LICENSE-binary reference).
  • Adjust star-tree vs non-star-tree pre-aggregated TDigest test to use higher compression and tighter error bounds.
  • Update expected percentile results in aggregation-function tests and add a new production-behavior accuracy test for PercentileTDigestValueAggregator.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pom.xml Updates the t-digest.version property to 3.3 for the build.
LICENSE-binary Updates the recorded binary dependency version for com.tdunning:t-digest.
pinot-core/src/test/java/org/apache/pinot/core/startree/v2/PreAggregatedPercentileTDigestStarTreeV2Test.java Raises test digest compression and tightens the star-tree vs non-star-tree divergence tolerance for 3.3 behavior.
pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/function/PercentileTDigestMVAggregationFunctionTest.java Updates expected percentile outputs to match 3.3’s results on small datasets.
pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/function/PercentileSmartTDigestAggregationFunctionTest.java Updates expected outputs consistent with 3.3 behavior and removes an override that is no longer needed.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/aggregator/PercentileTDigestValueAggregatorTest.java Adds a new accuracy-focused test suite comparing TDigest outputs against exact quantiles under several production-like merge/ser-deser scenarios.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.44%. Comparing base (2e80bff) to head (73f569b).

Files with missing lines Patch % Lines
...l/startree/v2/builder/StarTreeV2BuilderConfig.java 42.85% 3 Missing and 1 partial ⚠️
...t/segment/local/startree/StarTreeBuilderUtils.java 0.00% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (2e80bff) and HEAD (73f569b). Click for more details.

HEAD has 8 uploads less than BASE
Flag BASE (2e80bff) HEAD (73f569b)
java-21 5 4
unittests1 2 0
unittests 4 2
temurin 10 8
java-11 5 4
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #18103       +/-   ##
=============================================
- Coverage     63.04%   33.44%   -29.60%     
+ Complexity     1617      794      -823     
=============================================
  Files          3202     3202               
  Lines        194718   194727        +9     
  Branches      30047    30047               
=============================================
- Hits         122760    65132    -57628     
- Misses        62233   124069    +61836     
+ Partials       9725     5526     -4199     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 33.44% <33.33%> (-29.57%) ⬇️
java-21 33.43% <33.33%> (-29.59%) ⬇️
temurin 33.44% <33.33%> (-29.60%) ⬇️
unittests 33.44% <33.33%> (-29.60%) ⬇️
unittests1 ?
unittests2 33.44% <33.33%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tdunning
Copy link
Copy Markdown
Contributor

tdunning commented Apr 6, 2026

It would be good to have some sample data that could be integrated into the t-digest tests. I would have expected that the necessary increase of the compression parameter would only be about 2x, not the 5x observed. If some sample test case that depends only on t-digest could be constructed, I would be able to investigate on the t-digest side.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment on lines +31 to +36
/**
* Tests that the {@link PercentileTDigestValueAggregator} produces accurate quantile estimates
* under production-like conditions using the default compression (100).
*
* <p>These tests simulate single-path production behavior — where raw values are ingested into
* a TDigest, serialized/deserialized through segment and star-tree building, and then queried.
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class/Javadoc and nearby comments say the tests run with the default compression of 100, but PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION was changed to 500. This makes the test documentation misleading (and the stated expected error characteristics may not apply). Update the comments to reflect the actual default compression being exercised (or explicitly construct the aggregator with compression=100 if that’s what this test intends to validate).

Copilot uses AI. Check for mistakes.
Comment on lines +400 to +402
// Always inject the default compression so it is stored in star-tree metadata.
// This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100)
// from new segments (metadata present, built with the current default).
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here says injecting the default compression will be “stored in star-tree metadata” and used by canUseStarTree() to distinguish old vs new segments. However, this method only returns ExpressionContext arguments for ValueAggregatorFactory; star-tree metadata persistence is based on AggregationSpec.getFunctionParameters() (see StarTreeV2Metadata.writeMetadata). If AggregationSpec.functionParameters is empty (e.g., aggregationConfigs without functionParameters), metadata will still omit compressionFactor and canUseStarTree() will treat it as default=100 while the builder will create digests at DEFAULT_TDIGEST_COMPRESSION=500. Consider fixing this by ensuring AggregationSpec.functionParameters always includes compressionFactor for PERCENTILETDIGEST/PERCENTILERAWTDIGEST when absent, and adjust/remove this comment accordingly.

Suggested change
// Always inject the default compression so it is stored in star-tree metadata.
// This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100)
// from new segments (metadata present, built with the current default).
// Persist the default compression in functionParameters so downstream star-tree metadata writers
// record the same compression used by the builder when the config omits it.
functionParameters.put(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION);

Copilot uses AI. Check for mistakes.
Comment on lines +330 to +345
/**
* Returns the default {@link AggregationSpec} for the given function type. For PERCENTILETDIGEST and
* PERCENTILERAWTDIGEST, the default compression factor is explicitly included in the function parameters so that
* it is persisted in star-tree metadata. This allows {@code canUseStarTree()} to distinguish old segments (built
* with the previous default of 100, which have no compression metadata) from new segments.
*/
private static AggregationSpec getDefaultAggregationSpec(AggregationFunctionType functionType) {
switch (functionType) {
case PERCENTILETDIGEST:
case PERCENTILERAWTDIGEST:
return new AggregationSpec(null, null, null, null, null,
Map.of(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION));
default:
return AggregationSpec.DEFAULT;
}
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By persisting compressionFactor=PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION (500) into star-tree metadata, PercentileTDigestAggregationFunction.canUseStarTree() will reject using the star-tree for queries that omit the compression argument (query-time default is still 100). This means percentiletdigest(col, p) queries will stop using the star-tree after segments are rebuilt unless users update queries to specify compression=500 (or canUseStarTree is relaxed to allow higher-compression star-trees). If this behavior change is intended, it would be good to make it explicit in code/comments and ensure there’s a migration path for existing star-tree users.

Copilot uses AI. Check for mistakes.
@Jackie-Jiang
Copy link
Copy Markdown
Contributor

Does this mean with the same compression level, 3.3 has much higher error rate than 3.2?
I have concern on this version upgrade give the serialized size would double in 3.3 with compression = 500

@tdunning
Copy link
Copy Markdown
Contributor

tdunning commented Apr 6, 2026

The algorithmic changes between 3.2 and 3.3 which were intended to provide a firm bound on the number of centroids had the side effect of changing the relationship between the number of centroids and the compression factor by about a factor of 2:1 (old:new). The accuracy is primarily driven by the number of retained centroids. This means that you have to have a higher compression parameter to keep the same number of centroids to keep the same level of accuracy.

I am surprised, however, by the assertion that a 5:1 increase in compression parameter is required and would want to understand more about the test that supports that assertion.

xiangfu0 and others added 3 commits April 8, 2026 21:54
Resolves the accuracy regression that blocked #7076 by using higher
compression (750) in the pre-aggregated star-tree test to keep
star-tree vs non-star-tree quantile divergence below 0.5%.

t-digest 3.3 changed centroid management (unit-weight first/last
centroids, stricter tail interpolation), which increases merge-order
sensitivity. The star-tree path does multi-level serialize/deserialize/merge
while the non-star-tree path merges sequentially, causing quantile
divergence at low compression values.

Experimental results on the PreAggregated star-tree test (10 randomized runs each):
- compression=300, MAX_ERROR=0.5%: 0/10 passes (errors 0.54-1.07%)
- compression=500, MAX_ERROR=0.5%: fails (0.62% error)
- compression=750, MAX_ERROR=0.5%: 10/10 passes
- compression=1000, MAX_ERROR=0.5%: 10/10 passes

For comparison, t-digest 3.2 with compression=300 passes 10/10 at 0.5%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that PercentileTDigestValueAggregator with default compression=100
produces accurate quantile estimates under production-like conditions:

- Single-path raw value ingestion: < 0.5% error vs exact quantiles
- Pre-aggregated byte[] merge: < 0.5% error
- Multi-level star-tree merge with ser/deser cycles: < 1.0% error
- Stability across 10 random seeds: < 0.5% error
- Serialization round-trip (5 cycles): no accuracy degradation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
t-digest 3.3 has higher merge-order sensitivity than 3.2. At the
previous default compression=100, multi-level merge in star-tree
building produces ~0.35% error vs ground truth (compared to 0.06%
with 3.2). Increasing to 500 brings error back to ~0.09%.

Changes:
- PercentileTDigestValueAggregator: DEFAULT_TDIGEST_COMPRESSION 100->500
- StarTreeBuilderUtils: always inject default compression into
  expression context so the aggregator receives it explicitly
- StarTreeV2BuilderConfig: persist compression=500 in
  functionParameters when using old functionColumnPairs syntax,
  so canUseStarTree() can distinguish old segments (no metadata,
  compression=100) from new segments (metadata=500)

Backward compatibility:
- Old segments have no compressionFactor in metadata -> canUseStarTree()
  treats them as compression=100 -> matched by queries with default
  compression (100)
- New segments have compressionFactor=500 in metadata -> matched by
  queries explicitly specifying compression=500

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xiangfu0
Copy link
Copy Markdown
Contributor Author

Split the pure reproducer into a separate PR for clarity: #18166

The reproducer is in TDigestMergeOrderReproducerTest and only uses t-digest APIs. It mirrors Pinot's pre-aggregated star-tree merge pattern:

  • 1024 leaf digests
  • 2 values per leaf digest
  • batch size 16
  • deterministic seed 5
  • distribution: 97% in [0,100), 2.5% in [9900,9950), 0.5% in [0,10000)

To reproduce locally on the split branch:

./mvnw -pl pinot-segment-local 
  -Dtest=TDigestMergeOrderReproducerTest 
  -Dsurefire.failIfNoSpecifiedTests=false test

I also ran that test in 10 fresh Surefire JVMs and it passed 10/10, so the sample case is stable in CI-style execution.

The asserted behavior on the current 3.3 branch is:

  • compression 100/150/200: large sequential-vs-hierarchical merge divergence
  • compression 500: stable again

That should make it easier to discuss the sample case independently from the production compression change in #18103.

@xiangfu0
Copy link
Copy Markdown
Contributor Author

Follow-up on the split reproducer PR: I updated #18166 so it now includes a direct 3.2 vs 3.3 comparison, not just the 3.3-only merge-order reproducer.

What changed:

  • TDigestVersionComparisonTest loads t-digest:3.2 and t-digest:3.3 side-by-side in isolated classloaders and runs the same deterministic Pinot-like hierarchical merge dataset through both versions.
  • TDigestMergeOrderReproducerTest remains the pure 3.3 merge-order sensitivity check.

So the answer to “is this a test that passes on 3.2 but fails on 3.3?” is:

  • the original merge-order test is 3.3-only
  • the new comparison test is the direct A/B check between 3.2 and 3.3

Repro:

./mvnw -pl pinot-segment-local -Dtest=TDigestVersionComparisonTest,TDigestMergeOrderReproducerTest -Dsurefire.failIfNoSpecifiedTests=false test

The comparison test asserts that, on the minimized exact-quantile scenario:

  • 3.2 @ compression 100 stays below 0.0002 max normalized error
  • 3.3 @ compression 100 is above 0.004 max normalized error and more than 20x worse than 3.2
  • 3.3 @ compression 150 recovers below 0.0002

From the local exploration used to lock in that scenario, the observed numbers were roughly:

  • 3.2 @ 100: 0.000074 max normalized error with 121 centroids
  • 3.3 @ 100: 0.005094 max normalized error with 55 centroids
  • 3.3 @ 150: 0.000049 max normalized error with 79 centroids

I also reran the repro command in 10 fresh Surefire JVMs and it passed 10/10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants