Skip to content

[Bug] Fix distributed TopNAggregation accuracy for SUM, COUNT, and MEAN aggregations #13811

@eye-gu

Description

@eye-gu

Search before asking

  • I had searched in the issues and found no similar issues.

Apache SkyWalking Component

BanyanDB (apache/skywalking-banyandb)

What happened

Problem

In distributed TopN queries, each data node sets agg to AGGREGATION_FUNCTION_UNSPECIFIED and applies the TopN limit locally before sending results to the liaison node. This produces incorrect results for SUM, COUNT, and MEAN aggregations in two ways:

  1. Cross-node truncation: An entity that ranks low on one data node might rank high globally after merging partial results across all nodes.

  2. Intra-node truncation: A single data node may have multiple data points for the same entity (e.g., across different shards), but only the largest one survives the local TopN truncation. For example, with COUNT aggregation and TopN=3, if a node has 2 data points for entity5, only the larger one is sent to liaison, causing the count to be 1 instead of 2.

MIN and MAX are not affected since they are monotonic — the global MIN/MAX always appears in at least one node's local TopN, and truncation within a node doesn't discard the extremum.

Proposed Solution

  • For MIN/MAX: keep applying TopN limit at data nodes (safe to truncate)
  • For SUM/COUNT/MEAN: push the TopN query to all data nodes without truncation (TopN=0) and with the actual agg function and emit_partial=true, so data nodes return partial aggregated values (e.g., [sum, count] for MEAN) along with shard_id metadata
  • At the liaison node, merge partial results by entity using a min-heap aggregator, then compute the final TopN
  • Include shard_id in the group-by key for non-MIN/MAX to prevent incorrect cross-shard merging

What you expected to happen

Distributed TopN queries should return accurate results for all aggregation types (MIN, MAX, SUM, COUNT, MEAN) regardless of the number of data nodes.

For SUM/COUNT/MEAN, the final TopN should reflect the true aggregated value computed from all data points across all data nodes, not just the top-N per-node subset.

How to reproduce

  1. Deploy BanyanDB in distributed mode with at least 2 data nodes.
  2. Ingest data points where the same entity has data on multiple data nodes or across multiple shards.
  3. Issue a TopN query with COUNT, SUM, or MEAN aggregation.
  4. Expected: entity with true aggregated value rank N should appear in TopN.
    Actual: entity is missing because its per-node/sub-shard partial value was below the local TopN threshold and was truncated.

Anything else

No response

Are you willing to submit a pull request to fix on your own?

  • Yes I am willing to submit a pull request on my own!

Code of Conduct

Metadata

Metadata

Labels

bugSomething isn't working and you are sure it's a bug!databaseBanyanDB - SkyWalking native database

Type

No type
No fields configured for issues without a type.

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions