
Improve accuracy of compression stats #7901

Draft · wants to merge 1 commit into base: main
Conversation

@erimatnor (Contributor) commented on Apr 2, 2025

This change improves the accuracy of the compression chunk size stats. These stats are computed at compression time, but can be inaccurate for a number of reasons.

First, the "after" compression size only included the size of the compressed relation. While the non-compressed relation is, in most cases, empty after compression, it still occupies some space. In other cases, data might be left in the non-compressed relation after compression, or it might contain garbage. In particular, Hypercore TAM retains indexes on the non-compressed relation, so the non-compressed relation's size should be included in the post-compression size.
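To make the accounting concrete, here is a minimal plain-C sketch of the idea. The names (`RelSize`, `after_compression_size`, the byte counts) are illustrative assumptions, not TimescaleDB's actual internals: the point is simply that the "after" size sums heap, TOAST, and index sizes of both the compressed and the non-compressed relation.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a relation's on-disk footprint. */
typedef struct RelSize
{
	int64_t heap_bytes;  /* main heap size */
	int64_t toast_bytes; /* TOAST size */
	int64_t index_bytes; /* total size of all indexes */
} RelSize;

static int64_t
total_size(const RelSize *rel)
{
	return rel->heap_bytes + rel->toast_bytes + rel->index_bytes;
}

/*
 * "After" size as described above: include both the compressed relation and
 * the (usually small, but not free) non-compressed relation, whose indexes
 * Hypercore TAM keeps around.
 */
static int64_t
after_compression_size(const RelSize *compressed, const RelSize *noncompressed)
{
	return total_size(compressed) + total_size(noncompressed);
}

int
main(void)
{
	RelSize compressed = { .heap_bytes = 4 << 20, .toast_bytes = 1 << 20, .index_bytes = 512 << 10 };
	RelSize noncompressed = { .heap_bytes = 8 << 10, .toast_bytes = 0, .index_bytes = 256 << 10 };

	printf("after size: %lld bytes\n",
		   (long long) after_compression_size(&compressed, &noncompressed));
	return 0;
}
```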

Second, segmentwise compression didn't update the compression stats at all. As a consequence, the stats can become out of date after backfills and/or deletes that increase or decrease the amount of data. An extreme case occurs when Hypercore TAM is set as the default on a hypertable: new chunks are technically "compressed" by default, but empty, and all inserts are akin to backfill. In that case, no stats are created at all.

However, updating compression stats on segmentwise recompression is challenging because it is not possible to distinguish backfilled tuples (which increase the size) from tuples that were decompressed due to updates (which don't). Therefore, this change currently only updates the stats when the compressed relation is empty.
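A small sketch of that rule, again with hypothetical names (`CompressionStats`, `maybe_update_stats`) rather than the extension's real API: the stats are refreshed only when the compressed relation started out empty, since in that case the segmentwise recompression is effectively a full compression.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stats record; the real one lives in the TimescaleDB catalog. */
typedef struct CompressionStats
{
	int64_t before_total_bytes; /* size before compression */
	int64_t after_total_bytes;  /* size after compression (both relations) */
} CompressionStats;

/*
 * Backfilled tuples cannot be told apart from tuples decompressed by UPDATEs,
 * so only refresh the stats when the compressed relation was empty before the
 * recompression started.
 */
static bool
maybe_update_stats(bool compressed_rel_was_empty,
				   const CompressionStats *fresh, CompressionStats *stored)
{
	if (!compressed_rel_was_empty)
		return false; /* ambiguous case: keep the existing stats */

	*stored = *fresh;
	return true;
}

int
main(void)
{
	CompressionStats stored = { 0, 0 };
	CompressionStats fresh = { .before_total_bytes = 100 << 20, .after_total_bytes = 10 << 20 };

	if (maybe_update_stats(true, &fresh, &stored))
		printf("stats refreshed: before=%lld after=%lld\n",
			   (long long) stored.before_total_bytes,
			   (long long) stored.after_total_bytes);
	return 0;
}
```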

Fixes: #7713

Successfully merging this pull request may close these issues.

[Bug]: Invalid hypertable_compression_stats output until chunks are recompressed