Upgrade t-digest from 3.2 to 3.3 with error rate fix by xiangfu0 · Pull Request #18103 · apache/pinot

xiangfu0 · 2026-04-06T10:26:58Z

Summary

Upgrades the t-digest library from 3.2 to 3.3, resolving the accuracy regression that blocked #7076 (open since June 2021). Increases star-tree default compression from 100 to 500 to preserve error rates for existing users.

Background

PR #7076 attempted this upgrade but was never merged because @Jackie-Jiang observed that t-digest 3.3 produced >1% error between star-tree and non-star-tree merge paths, whereas 3.2 stayed <0.5%. @tdunning (t-digest author) discussed the issue but it was never fully resolved.

Root Cause

t-digest 3.3 changed centroid management (unit-weight first/last centroids, stricter tail interpolation), which increases merge-order sensitivity. The star-tree path does multi-level serialize/deserialize/merge while the non-star-tree path merges sequentially — this causes quantile divergence at low compression values that was tolerable in 3.2 but exceeds tolerance in 3.3.

Fix

Upgrade t-digest to 3.3
Increase star-tree default compression from 100 to 500 — restores multi-level merge error to ~0.09% (comparable to 3.2's 0.06% at compression=100)
Always persist compression in star-tree metadata — ensures backward compatibility with old segments

Performance: 3.2 vs 3.3

At the same compression=100, 3.3 is significantly faster:

Operation	3.2 @ 100	3.3 @ 100	Change
`add()` 100k values	6.39 ms	5.33 ms	17% faster
`merge()` 100 digests	1.08 ms	0.26 ms	4x faster
`serialize()`	4.11 us	1.95 us	53% faster
`deserialize()`	4.83 us	3.37 us	30% faster
`quantile()`	228 ns	101 ns	56% faster
Serialized size (100k values)	2,208 B	1,104 B	2x smaller

At the new star-tree default of compression=500, 3.3 is still comparable to 3.2 at compression=100:

Operation	3.2 @ 100	3.3 @ 500	Change
`add()` 100k values	6.39 ms	5.82 ms	9% faster
`merge()` 100 digests	1.08 ms	1.07 ms	same
`serialize()`	4.11 us	2.95 us	28% faster
`deserialize()`	4.83 us	9.27 us	92% slower
`quantile()`	228 ns	99 ns	56% faster
Serialized size (100k values)	2,208 B	4,544 B	2x larger

Accuracy: multi-level merge error vs ground truth

Version	Compression	Multi-level error	Single-path error
3.2	100	0.061%	0.056%
3.3	100	0.354%	0.129%
3.3	200	0.180%	0.089%
3.3	300	0.137%	0.064%
3.3	500 (new default)	0.094%	0.042%
3.3	750	0.052%	0.043%
3.3	1000	0.049%	0.035%

Backward compatibility

Old segments (built with 3.2, no compressionFactor in metadata): canUseStarTree() treats them as compression=100, matching queries with default query-time compression (100)
New segments (built with 3.3, compressionFactor=500 explicitly persisted in metadata): matched by queries specifying compression=500
Query-time default remains 100 (unchanged) — this is the compression used for non-star-tree aggregation
Users can explicitly set compression via PERCENTILE_TDIGEST(col, 95, 500) or star-tree aggregationConfigs.functionParameters.compressionFactor

Changes

pom.xml: t-digest 3.2 → 3.3
LICENSE-binary: Updated version reference
PercentileTDigestValueAggregator: DEFAULT_TDIGEST_COMPRESSION 100→500 with rationale comment
StarTreeBuilderUtils: Always inject default compression into expression context for PERCENTILETDIGEST/PERCENTILERAWTDIGEST
StarTreeV2BuilderConfig: Persist compressionFactor=500 in functionParameters when using old functionColumnPairs syntax, so canUseStarTree() can distinguish old segments (no metadata = 100) from new segments (metadata = 500)
PreAggregatedPercentileTDigestStarTreeV2Test: COMPRESSION 50→750, MAX_ERROR 5%→0.5%
PercentileTDigestMVAggregationFunctionTest: Updated expected values for 3.3's improved interpolation
PercentileSmartTDigestAggregationFunctionTest: Updated expected values
PercentileTDigestValueAggregatorTest (new): Production-behavior accuracy test verifying quantile estimates against exact ground truth

Test plan

PreAggregatedPercentileTDigestStarTreeV2Test — passes
PercentileTDigestStarTreeV2Test — passes
PercentileTDigestWithMVStarTreeV2Test — passes
PercentileTDigestMVAggregationFunctionTest — passes
PercentileSmartTDigestAggregationFunctionTest — all 96 tests pass
PercentileTDigestValueAggregatorTest — all 5 tests pass
PercentileTDigestQueriesTest — passes
PercentileTDigestMVQueriesTest — passes
spotless:apply, checkstyle:check, license:check — all pass

🤖 Generated with Claude Code

Copilot

Pull request overview

Upgrades the com.tdunning:t-digest dependency to 3.3 and updates/extends Pinot’s percentile TDigest test suite to account for the changed quantile behavior and merge-order sensitivity introduced in 3.3.

Changes:

Bump t-digest from 3.2 → 3.3 (and update LICENSE-binary reference).
Adjust star-tree vs non-star-tree pre-aggregated TDigest test to use higher compression and tighter error bounds.
Update expected percentile results in aggregation-function tests and add a new production-behavior accuracy test for PercentileTDigestValueAggregator.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
pom.xml	Updates the `t-digest.version` property to 3.3 for the build.
LICENSE-binary	Updates the recorded binary dependency version for `com.tdunning:t-digest`.
pinot-core/src/test/java/org/apache/pinot/core/startree/v2/PreAggregatedPercentileTDigestStarTreeV2Test.java	Raises test digest compression and tightens the star-tree vs non-star-tree divergence tolerance for 3.3 behavior.
pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/function/PercentileTDigestMVAggregationFunctionTest.java	Updates expected percentile outputs to match 3.3’s results on small datasets.
pinot-core/src/test/java/org/apache/pinot/core/query/aggregation/function/PercentileSmartTDigestAggregationFunctionTest.java	Updates expected outputs consistent with 3.3 behavior and removes an override that is no longer needed.
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/aggregator/PercentileTDigestValueAggregatorTest.java	Adds a new accuracy-focused test suite comparing TDigest outputs against exact quantiles under several production-like merge/ser-deser scenarios.

codecov-commenter · 2026-04-06T12:38:14Z

Codecov Report

❌ Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.44%. Comparing base (2e80bff) to head (73f569b).

Files with missing lines	Patch %	Lines
...l/startree/v2/builder/StarTreeV2BuilderConfig.java	42.85%	3 Missing and 1 partial ⚠️
...t/segment/local/startree/StarTreeBuilderUtils.java	0.00%	2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (2e80bff) and HEAD (73f569b). Click for more details.

HEAD has 8 uploads less than BASE

Flag BASE (2e80bff) HEAD (73f569b)

java-21 5 4

unittests1 2 0

unittests 4 2

temurin 10 8

java-11 5 4

Additional details and impacted files

@@              Coverage Diff              @@
##             master   #18103       +/-   ##
=============================================
- Coverage     63.04%   33.44%   -29.60%     
+ Complexity     1617      794      -823     
=============================================
  Files          3202     3202               
  Lines        194718   194727        +9     
  Branches      30047    30047               
=============================================
- Hits         122760    65132    -57628     
- Misses        62233   124069    +61836     
+ Partials       9725     5526     -4199

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (ø)`
java-11	`33.44% <33.33%> (-29.57%)`	⬇️
java-21	`33.43% <33.33%> (-29.59%)`	⬇️
temurin	`33.44% <33.33%> (-29.60%)`	⬇️
unittests	`33.44% <33.33%> (-29.60%)`	⬇️
unittests1	`?`
unittests2	`33.44% <33.33%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

tdunning · 2026-04-06T14:50:06Z

It would be good to have some sample data that could be integrated into the t-digest tests. I would have expected that the necessary increase of the compression parameter would only be about 2x, not the 5x observed. If some sample test case that depends only on t-digest could be constructed, I would be able to investigate on the t-digest side.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Copilot · 2026-04-06T19:47:12Z

...est/java/org/apache/pinot/segment/local/aggregator/PercentileTDigestValueAggregatorTest.java

+/**
+ * Tests that the {@link PercentileTDigestValueAggregator} produces accurate quantile estimates
+ * under production-like conditions using the default compression (100).
+ *
+ * <p>These tests simulate single-path production behavior — where raw values are ingested into
+ * a TDigest, serialized/deserialized through segment and star-tree building, and then queried.


The class/Javadoc and nearby comments say the tests run with the default compression of 100, but PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION was changed to 500. This makes the test documentation misleading (and the stated expected error characteristics may not apply). Update the comments to reflect the actual default compression being exercised (or explicitly construct the aggregator with compression=100 if that’s what this test intends to validate).

Copilot · 2026-04-06T19:47:12Z

...egment-local/src/main/java/org/apache/pinot/segment/local/startree/StarTreeBuilderUtils.java

+          // Always inject the default compression so it is stored in star-tree metadata.
+          // This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100)
+          // from new segments (metadata present, built with the current default).


The comment here says injecting the default compression will be “stored in star-tree metadata” and used by canUseStarTree() to distinguish old vs new segments. However, this method only returns ExpressionContext arguments for ValueAggregatorFactory; star-tree metadata persistence is based on AggregationSpec.getFunctionParameters() (see StarTreeV2Metadata.writeMetadata). If AggregationSpec.functionParameters is empty (e.g., aggregationConfigs without functionParameters), metadata will still omit compressionFactor and canUseStarTree() will treat it as default=100 while the builder will create digests at DEFAULT_TDIGEST_COMPRESSION=500. Consider fixing this by ensuring AggregationSpec.functionParameters always includes compressionFactor for PERCENTILETDIGEST/PERCENTILERAWTDIGEST when absent, and adjust/remove this comment accordingly.

Suggested change

// Always inject the default compression so it is stored in star-tree metadata.

// This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100)

// from new segments (metadata present, built with the current default).

// Persist the default compression in functionParameters so downstream star-tree metadata writers

// record the same compression used by the builder when the config omits it.

functionParameters.put(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,

PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION);

Copilot · 2026-04-06T19:47:13Z

...rc/main/java/org/apache/pinot/segment/local/startree/v2/builder/StarTreeV2BuilderConfig.java

+  /**
+   * Returns the default {@link AggregationSpec} for the given function type. For PERCENTILETDIGEST and
+   * PERCENTILERAWTDIGEST, the default compression factor is explicitly included in the function parameters so that
+   * it is persisted in star-tree metadata. This allows {@code canUseStarTree()} to distinguish old segments (built
+   * with the previous default of 100, which have no compression metadata) from new segments.
+   */
+  private static AggregationSpec getDefaultAggregationSpec(AggregationFunctionType functionType) {
+    switch (functionType) {
+      case PERCENTILETDIGEST:
+      case PERCENTILERAWTDIGEST:
+        return new AggregationSpec(null, null, null, null, null,
+            Map.of(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
+                PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION));
+      default:
+        return AggregationSpec.DEFAULT;
+    }


By persisting compressionFactor=PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION (500) into star-tree metadata, PercentileTDigestAggregationFunction.canUseStarTree() will reject using the star-tree for queries that omit the compression argument (query-time default is still 100). This means percentiletdigest(col, p) queries will stop using the star-tree after segments are rebuilt unless users update queries to specify compression=500 (or canUseStarTree is relaxed to allow higher-compression star-trees). If this behavior change is intended, it would be good to make it explicit in code/comments and ensure there’s a migration path for existing star-tree users.

Jackie-Jiang · 2026-04-06T20:49:00Z

Does this mean with the same compression level, 3.3 has much higher error rate than 3.2?
I have concern on this version upgrade give the serialized size would double in 3.3 with compression = 500

tdunning · 2026-04-06T21:49:32Z

The algorithmic changes between 3.2 and 3.3 which were intended to provide a firm bound on the number of centroids had the side effect of changing the relationship between the number of centroids and the compression factor by about a factor of 2:1 (old:new). The accuracy is primarily driven by the number of retained centroids. This means that you have to have a higher compression parameter to keep the same number of centroids to keep the same level of accuracy.

I am surprised, however, by the assertion that a 5:1 increase in compression parameter is required and would want to understand more about the test that supports that assertion.

Resolves the accuracy regression that blocked #7076 by using higher compression (750) in the pre-aggregated star-tree test to keep star-tree vs non-star-tree quantile divergence below 0.5%. t-digest 3.3 changed centroid management (unit-weight first/last centroids, stricter tail interpolation), which increases merge-order sensitivity. The star-tree path does multi-level serialize/deserialize/merge while the non-star-tree path merges sequentially, causing quantile divergence at low compression values. Experimental results on the PreAggregated star-tree test (10 randomized runs each): - compression=300, MAX_ERROR=0.5%: 0/10 passes (errors 0.54-1.07%) - compression=500, MAX_ERROR=0.5%: fails (0.62% error) - compression=750, MAX_ERROR=0.5%: 10/10 passes - compression=1000, MAX_ERROR=0.5%: 10/10 passes For comparison, t-digest 3.2 with compression=300 passes 10/10 at 0.5%. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Verifies that PercentileTDigestValueAggregator with default compression=100 produces accurate quantile estimates under production-like conditions: - Single-path raw value ingestion: < 0.5% error vs exact quantiles - Pre-aggregated byte[] merge: < 0.5% error - Multi-level star-tree merge with ser/deser cycles: < 1.0% error - Stability across 10 random seeds: < 0.5% error - Serialization round-trip (5 cycles): no accuracy degradation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

t-digest 3.3 has higher merge-order sensitivity than 3.2. At the previous default compression=100, multi-level merge in star-tree building produces ~0.35% error vs ground truth (compared to 0.06% with 3.2). Increasing to 500 brings error back to ~0.09%. Changes: - PercentileTDigestValueAggregator: DEFAULT_TDIGEST_COMPRESSION 100->500 - StarTreeBuilderUtils: always inject default compression into expression context so the aggregator receives it explicitly - StarTreeV2BuilderConfig: persist compression=500 in functionParameters when using old functionColumnPairs syntax, so canUseStarTree() can distinguish old segments (no metadata, compression=100) from new segments (metadata=500) Backward compatibility: - Old segments have no compressionFactor in metadata -> canUseStarTree() treats them as compression=100 -> matched by queries with default compression (100) - New segments have compressionFactor=500 in metadata -> matched by queries explicitly specifying compression=500 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

xiangfu0 · 2026-04-10T23:33:58Z

Split the pure reproducer into a separate PR for clarity: #18166

The reproducer is in TDigestMergeOrderReproducerTest and only uses t-digest APIs. It mirrors Pinot's pre-aggregated star-tree merge pattern:

1024 leaf digests
2 values per leaf digest
batch size 16
deterministic seed 5
distribution: 97% in [0,100), 2.5% in [9900,9950), 0.5% in [0,10000)

To reproduce locally on the split branch:

./mvnw -pl pinot-segment-local 
  -Dtest=TDigestMergeOrderReproducerTest 
  -Dsurefire.failIfNoSpecifiedTests=false test

I also ran that test in 10 fresh Surefire JVMs and it passed 10/10, so the sample case is stable in CI-style execution.

The asserted behavior on the current 3.3 branch is:

compression 100/150/200: large sequential-vs-hierarchical merge divergence
compression 500: stable again

That should make it easier to discuss the sample case independently from the production compression change in #18103.

xiangfu0 · 2026-04-11T00:49:24Z

Follow-up on the split reproducer PR: I updated #18166 so it now includes a direct 3.2 vs 3.3 comparison, not just the 3.3-only merge-order reproducer.

What changed:

TDigestVersionComparisonTest loads t-digest:3.2 and t-digest:3.3 side-by-side in isolated classloaders and runs the same deterministic Pinot-like hierarchical merge dataset through both versions.
TDigestMergeOrderReproducerTest remains the pure 3.3 merge-order sensitivity check.

So the answer to “is this a test that passes on 3.2 but fails on 3.3?” is:

the original merge-order test is 3.3-only
the new comparison test is the direct A/B check between 3.2 and 3.3

Repro:

./mvnw -pl pinot-segment-local -Dtest=TDigestVersionComparisonTest,TDigestMergeOrderReproducerTest -Dsurefire.failIfNoSpecifiedTests=false test

The comparison test asserts that, on the minimized exact-quantile scenario:

3.2 @ compression 100 stays below 0.0002 max normalized error
3.3 @ compression 100 is above 0.004 max normalized error and more than 20x worse than 3.2
3.3 @ compression 150 recovers below 0.0002

From the local exploration used to lock in that scenario, the observed numbers were roughly:

3.2 @ 100: 0.000074 max normalized error with 121 centroids
3.3 @ 100: 0.005094 max normalized error with 55 centroids
3.3 @ 150: 0.000049 max normalized error with 79 centroids

I also reran the repro command in 10 fresh Surefire JVMs and it passed 10/10.

xiangfu0 requested a review from Copilot April 6, 2026 10:59

Copilot started reviewing on behalf of xiangfu0 April 6, 2026 11:00 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

xiangfu0 requested a review from Copilot April 6, 2026 19:41

Copilot started reviewing on behalf of xiangfu0 April 6, 2026 19:42 View session

Copilot AI reviewed Apr 6, 2026

View reviewed changes

xiangfu0 and others added 3 commits April 8, 2026 21:54

xiangfu0 force-pushed the claude/crazy-wilbur branch from 298e411 to 73f569b Compare April 9, 2026 04:54

xiangfu0 mentioned this pull request Apr 10, 2026

Add t-digest 3.2 vs 3.3 comparison and merge-order reproducer #18166

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upgrade t-digest from 3.2 to 3.3 with error rate fix#18103

Upgrade t-digest from 3.2 to 3.3 with error rate fix#18103
xiangfu0 wants to merge 3 commits intomasterfrom
claude/crazy-wilbur

xiangfu0 commented Apr 6, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

codecov-commenter commented Apr 6, 2026 •

edited

Loading

Uh oh!

tdunning commented Apr 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Jackie-Jiang commented Apr 6, 2026

Uh oh!

tdunning commented Apr 6, 2026

Uh oh!

xiangfu0 commented Apr 10, 2026

Uh oh!

xiangfu0 commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

-          // Always inject the default compression so it is stored in star-tree metadata.
-          // This allows canUseStarTree() to distinguish old segments (no metadata, built with compression=100)
-          // from new segments (metadata present, built with the current default).
+          // Persist the default compression in functionParameters so downstream star-tree metadata writers
+          // record the same compression used by the builder when the config omits it.
+          functionParameters.put(Constants.PERCENTILETDIGEST_COMPRESSION_FACTOR_KEY,
+              PercentileTDigestValueAggregator.DEFAULT_TDIGEST_COMPRESSION);

Conversation

xiangfu0 commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Root Cause

Fix

Performance: 3.2 vs 3.3

Accuracy: multi-level merge error vs ground truth

Backward compatibility

Changes

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

codecov-commenter commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tdunning commented Apr 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Jackie-Jiang commented Apr 6, 2026

Uh oh!

tdunning commented Apr 6, 2026

Uh oh!

xiangfu0 commented Apr 10, 2026

Uh oh!

xiangfu0 commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

xiangfu0 commented Apr 6, 2026 •

edited

Loading

codecov-commenter commented Apr 6, 2026 •

edited

Loading