Improve accuracy of compression stats #7901
Draft
This change improves the accuracy of the compression chunk size stats. These stats are computed at compression time but can be inaccurate for two reasons.
First, the "after" compression size only included the size of the compressed relation. Although the non-compressed relation is empty after compression in most cases, it still occupies some space. In other cases, data might remain in the non-compressed relation after compression, or the relation might contain garbage. In particular, Hypercore TAM retains indexes on the non-compressed relation, so that relation should be included in the post-compression size.
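As a rough illustration of the first point, the sketch below models the post-compression size as the sum of both relations rather than the compressed relation alone. The `RelationSize` type, its heap/TOAST/index split, and the function name are hypothetical and are not taken from the TimescaleDB source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationSize:
    """Hypothetical on-disk size breakdown of one relation, in bytes."""

    heap: int = 0
    toast: int = 0
    index: int = 0

    def total(self) -> int:
        # Sum every component so that indexes are not overlooked.
        return self.heap + self.toast + self.index


def post_compression_size(compressed: RelationSize, noncompressed: RelationSize) -> int:
    """Count both relations in the "after" size, not only the compressed one."""
    return compressed.total() + noncompressed.total()
```

With this accounting, a non-compressed relation that holds no rows but still carries indexes contributes its index bytes to the "after" size.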
Second, segmentwise compression didn't update the compression stats at all. As a consequence, the stats can become out of date when backfill and/or deletes increase or decrease the amount of data. An extreme case occurs when Hypercore TAM is set as the default on the hypertable: new chunks are technically "compressed" by default, but empty, and all inserts are akin to backfill. In that case, no stats are created at all.
However, updating compression stats on segmentwise recompression is challenging because it is not possible to distinguish between backfilled tuples (which increase the size) and tuples that were decompressed due to updates (which don't increase size). Therefore, this change currently only updates stats when the compressed relation is empty.
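The resulting rule can be sketched as follows. This is a simplified model, not the actual implementation: `ChunkStats`, the function name, and the `compressed_rel_was_empty` flag are hypothetical, and the sketch assumes that emptiness is checked before the recompression writes any data.

```python
from typing import NamedTuple, Optional


class ChunkStats(NamedTuple):
    """Hypothetical per-chunk size stats, in bytes."""

    before_bytes: int
    after_bytes: int


def stats_after_segmentwise_recompression(
    existing: Optional[ChunkStats],
    measured: ChunkStats,
    compressed_rel_was_empty: bool,
) -> Optional[ChunkStats]:
    """Return the stats to store after a segmentwise recompression."""
    if compressed_rel_was_empty:
        # Nothing was compressed before, so every recompressed tuple counts
        # as new data and the freshly measured sizes can be trusted.
        return measured
    # Otherwise backfilled tuples cannot be told apart from tuples that were
    # decompressed by updates, so the existing stats are left untouched.
    return existing
```

Under this rule, a chunk that starts out empty (for example, when Hypercore TAM is the default) gets stats on its first recompression, while chunks that already hold compressed data keep their existing stats.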
Fixes: #7713