Skip to content

HIVE-29554: Refactor MTableColumnStatistics and MPartitionColumnStatistics to avoid code duplication#6420

Open
thomasrebele wants to merge 1 commit intoapache:masterfrom
thomasrebele:tr/HIVE-29554
Open

HIVE-29554: Refactor MTableColumnStatistics and MPartitionColumnStatistics to avoid code duplication#6420
thomasrebele wants to merge 1 commit intoapache:masterfrom
thomasrebele:tr/HIVE-29554

Conversation

@thomasrebele
Copy link
Copy Markdown
Contributor

HIVE-29554

What changes were proposed in this pull request?

Extract a common superclass MColumnStatistics from MPartitionColumnStatistics and MColumnStatistics and combine some methods that use the latter two.

Why are the changes needed?

There is a lot of code duplication around column statistics, which increases the risk of diverging code and partially applied fixes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. I've run Hive from the master branch (13f3208), created a table with some values, and executed analyze table tab1 compute statistics for columns;. I verified the statistics with describe formatted tab1 a;.
  2. I stopped Hive, and run Hive with the changes of the PR. I checked that the statistics are shown correctly. I inserted more values into the table and executed the ANALYZE TABLE ... command. I verified the statistics.
  3. I stopped Hive, and run Hive from the master branch again. I verified the statistics. I inserted more values into the table and executed the ANALYZE TABLE ... command. I verified the statistics.

I started and run the DESCRIBE FORMATTED, ANALYZE TABLE, DESCRIBE FORMATTED sequence for

  • a metastore server with the changes from the PR, and
  • a hiveserver and beeline from the master branch

and vice versa, i.e.,

  • a metastore server from the master branch, and
  • a hiveserver and beeline with the changes from the PR

All commands worked as expected.

@sonarqubecloud
Copy link
Copy Markdown

this.avgColLen = avgColLen;
}

public void setDateStats(Long numNulls, Long numNDVs, byte[] bitVector, byte[] histogram, Long lowValue, Long highValue) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are Sonar warnings about the length of the line. The original methods in MPartitionColumnStatistics and MTableColumnStatistics had them as well. In case I make an update to the PR, I'll fix it, otherwise I would just leave that line (and setTimestampStats) as-is.

Copy link
Copy Markdown
Contributor Author

@thomasrebele thomasrebele Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sonar warns about the "cognitive complexity" limit. The limit has been exceeded in the original implementation as well. I think refactoring the methods is out-of-scope for this PR, so I'll just leave them as-is.

@thomasrebele thomasrebele marked this pull request as ready for review April 13, 2026 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants