
@ronaldngounou ronaldngounou (Member) commented Oct 11, 2025

Following etcd performance improvements, the storage size limit has been re-evaluated to 100GB instead of 8GB:
https://www.cncf.io/blog/2019/05/09/performance-optimization-of-etcd-in-web-scale-data-scenario/

Contributes to issue #588
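
For readers relating the documented figure to an actual setting: the storage size limit discussed in these docs corresponds to etcd's backend quota, set with the `--quota-backend-bytes` flag. Below is a minimal sketch of raising the quota on an embedded etcd server in Go; it assumes the `embed.Config` field `QuotaBackendBytes` and the default local listen addresses, and should be checked against the etcd release you run.

```go
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd"
	// Assumption: QuotaBackendBytes is the embed.Config field backing
	// --quota-backend-bytes; leaving it at 0 keeps the compiled-in default.
	cfg.QuotaBackendBytes = 16 * 1024 * 1024 * 1024 // 16GB, purely for illustration

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	select {
	case <-e.Server.ReadyNotify():
		log.Println("embedded etcd is ready with a raised backend quota")
	case <-time.After(30 * time.Second):
		log.Println("etcd took too long to start")
	}
}
```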

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ronaldngounou
Once this PR has been reviewed and has the lgtm label, please assign ivanvc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ronaldngounou ronaldngounou force-pushed the issue588-lift_etcd_GB_limit branch 2 times, most recently from ba37b27 to c93d626 on October 11, 2025 09:40
@ronaldngounou ronaldngounou force-pushed the issue588-lift_etcd_GB_limit branch from c93d626 to 49407a6 on October 11, 2025 18:36
@ronaldngounou (Member Author) commented:

Lint issues fixed:

content/en/docs/v3.4/faq.md:29:291 MD059/descriptive-link-text Link text should be descriptive [Context: "[here]"] (https://github.com/DavidAnson/markdownlint/blob/main/doc/md059.md)

@jberkus jberkus (Contributor) commented Oct 18, 2025

If you're doing this refactoring, I'd like to make it clear to users that the 100GB is a recommended maximum size, and not a hard limit. This would mean different text in a couple of places. I don't know what the actual hard limit is; probably need to look at the boltDB code.
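
To make the distinction concrete: a recommended maximum only produces a warning while the configured value is still honored, whereas a hard limit rejects the configuration outright. The sketch below is purely illustrative, not the actual etcd or bbolt source, and its constant and function names are invented:

```go
package main

import "log"

// Hypothetical number and name, for illustration only: a "recommended
// maximum" quota that etcd-like software would warn about but still honor.
const suggestedMaxQuotaBytes int64 = 100 * 1024 * 1024 * 1024 // 100GB

// warnIfQuotaExceedsSuggestedMax models a warn-and-continue check, as opposed
// to a hard limit, which would reject the configuration or refuse to start.
func warnIfQuotaExceedsSuggestedMax(quotaBytes int64) {
	if quotaBytes > suggestedMaxQuotaBytes {
		log.Printf("configured quota %d bytes exceeds suggested maximum %d bytes; continuing anyway",
			quotaBytes, suggestedMaxQuotaBytes)
	}
}

func main() {
	// A value above the suggested maximum only produces a warning.
	warnIfQuotaExceedsSuggestedMax(200 * 1024 * 1024 * 1024)
}
```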

@ronaldngounou (Member Author) commented:

Could you please suggest a wording that we should use in the meantime?

## Memory

- etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
+ etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
Contributor:

Suggested change:
- etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
+ etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly. 100GB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it.

@wendy-ha18 wendy-ha18 (Contributor) commented Nov 18, 2025:

Within the context of this doc, the wording "etcd has a relatively small memory ...... Typically 8GB is enough.... 100GB is a suggested maximum size for normal environments and etcd warns at startup if the configured value exceeds it" makes more sense to me.

Contributor:

Do we actually have a warning at 100GB? I don't have a machine I can test that on.

Member Author:

Reverted this change

## Memory

- etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
+ etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
Contributor:

Let's make this a limit, not a recommendation:

Suggested change:
- etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 100GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly.
+ etcd has a relatively small memory footprint but its performance still depends on having enough memory. An etcd server will aggressively cache key-value data and spends most of the rest of its memory tracking watchers. Typically 8GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16GB to 64GB memory accordingly, up to a recommended maximum of 100GB.

Member Author:

Addressed

@jberkus jberkus (Contributor) commented Nov 19, 2025

For content/en/blog/2023/how_to_debug_large_db_size_issue.md, let's take it out of this PR and open a separate effort to convert the blog post into an Operations doc.

@ToSuperGod commented:

May I ask whether data compression and fragmentation affect the cluster after etcd has stored 50GB of data? And how long do large-scale insert/query operations take after those operations complete?

@ronaldngounou ronaldngounou force-pushed the issue588-lift_etcd_GB_limit branch from 49407a6 to 8f3c651 on December 5, 2025 05:44
@ronaldngounou (Member Author) commented:

@ToSuperGod

May I ask whether data compression and fragmentation affect the cluster after etcd has stored 50GB of data? And how long do large-scale insert/query operations take after those operations complete?

When etcd stores large amounts of data (like 50GB, which is quite large for etcd), several things happen:
The main issue is fragmentation in the backend BoltDB database. As etcd performs updates and deletes, it creates "holes" in the database file. This fragmentation affects both disk usage and read performance since the database file becomes much larger than the actual data it contains.
etcd does not actually compress data at the storage layer (BoltDB stores pages uncompressed); the bigger concern is the growth of the database file itself. With heavy fragmentation, you might see a 50GB logical database consuming significantly more disk space, and reads having to traverse more pages.
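
One way to gauge this fragmentation is to compare the physical file size with the logically in-use size reported by the maintenance API. A minimal sketch with the Go client, assuming an etcd v3.4+ member reachable at localhost:2379 (older releases do not report `DbSizeInUse`):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumption: local single member
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Status reports the on-disk size (DbSize) and, on v3.4+, the size of the
	// data actually in use (DbSizeInUse); a large gap suggests fragmentation.
	st, err := cli.Status(ctx, "localhost:2379")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("db size: %d bytes, in use: %d bytes\n", st.DbSize, st.DbSizeInUse)
}
```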

Cluster Impact:
Performance degradation occurs in several ways: slower reads from the apiserver (affecting all kubectl commands and controller reconciliation loops), increased latency for watch operations (which controllers depend on), and potential leader election issues if followers fall too far behind during compaction.

The most critical impact is on write performance - if etcd is struggling, the entire Kubernetes control plane slows down because every resource change goes through etcd.

Compaction and Defragmentation:
After compaction (which removes old revisions) and defragmentation (which rebuilds the database file), you should see significant improvement. Defragmentation is particularly important - it rewrites the database to eliminate holes.
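
A rough sketch of those two steps with the Go client, assuming a single member at localhost:2379; in a real cluster you would defragment members one at a time, since defragmentation blocks the member while the file is rewritten:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// Read any key just to learn the current store revision from the header.
	resp, err := cli.Get(ctx, "sample-key")
	if err != nil {
		log.Fatal(err)
	}

	// Compact away revisions older than the current one.
	if _, err := cli.Compact(ctx, resp.Header.Revision); err != nil {
		log.Fatal(err)
	}

	// Defragmentation rewrites the backend database file to reclaim the
	// space freed by compaction; it is issued per endpoint.
	if _, err := cli.Defragment(ctx, "localhost:2379"); err != nil {
		log.Fatal(err)
	}
	log.Println("compaction and defragmentation completed")
}
```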

Timing for Large Operations:
For insertion/query timing after these operations, it really depends on your specific setup, but generally:

  • After proper compaction and defrag, query latency should drop significantly (potentially 50-90% improvement if fragmentation was severe)

  • Large-scale insertions should also improve, but etcd has hard limits - it's designed for storing configuration/state, not as a general-purpose database

  • etcd has historically recommended keeping the database under 8GB in practice (the figure this PR raises to 100GB in the docs)

If you're consistently storing 50GB in etcd, that's a red flag - you might need to rethink what you're storing there. Consider if you're inadvertently storing large ConfigMaps/Secrets or have resource leaks.

@ronaldngounou ronaldngounou force-pushed the issue588-lift_etcd_GB_limit branch from 8f3c651 to 1faefe1 on December 5, 2025 06:02