What would you like to be added?
I want to collect ideas for high-impact improvements. While v3.6 targeted stability and testing, v3.7 should be a release that shows there is still a lot of room for performance improvement in etcd. The goal is to inspire more ambitious work and start working towards it in v3.7. This is not an exclusive list, and I'm happy to add ideas:
- RangeStream - Issue: Large range queries from api-server could take down etcd due to OOM #12342. Bring KEP-5116 to etcd; more about it in https://www.youtube.com/watch?v=SdLLOcNZN5E. Based on my initial research, we cannot introduce streaming encoders for gRPC, so the only option seems to be adding a separate streaming range method. The PoC in Draft implementation of streaming range request #19766 is very promising. Older attempt: Yxjetcd rangestream #12343.
- Async raft - Bring the improvements from raft: support asynchronous storage writes raft#8 to etcd. Issue: Async raft #18027. Draft implementation: Async raft #18027.
- Reduce memory usage of etcd member catchup mechanism - Introduce a mechanism to propagate the slowest member's raft index, so other members can compact their raft log based on it. Reduce memory usage of etcd member catchup mechanism #17098
- Incremental defrag - Perform minimal defrag operations as part of normal transactions, reducing the need to run a full defrag, which locks the whole database for tens of seconds. Incremental etcd defrag bbolt#694
- Rewrite watch to ensure stable memory usage - Watch starvation can cause OOMs #16839 showed how fragile watch is. Like K8s, etcd should have a mechanism that drops slow watchers and distributes watch events in a way that keeps memory usage stable.
- Rewrite watch to prevent Put&Watch from impacting others' performance - Slow watchers impact PUT latency #18109, Lower throughput while having more and more watchers #19064
- Async log writing - It would be good to confirm the performance impact of logging in etcd; I know synchronous info logging is an issue for K8s. If there is an impact, we might consider implementing Support asynchronous writing of log file. #17071
- Improve performance for range with limit - One approach would be to rewrite the index to store a precomputed count within the btree: Experiments with replacing index implementation of btree with key -> revisions with revision list of immutable trees #18184. Another is to provide an option to disable count calculation: Optimize RangeRequest Revision count #16510. My preference would be to optimize, to avoid overloading the API with too many options.
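To make the RangeStream idea above more concrete, here is a minimal sketch of the core loop of a separate streaming range method: results are sent in bounded chunks instead of one large response, so server memory depends on the chunk size rather than on the size of the range. It assumes a sorted in-memory slice in place of etcd's MVCC store; `KV`, `rangeStream`, and the `send` callback are hypothetical names, not the API from #19766. In a real gRPC server-streaming handler, `send` would wrap the stream's `Send` method.

```go
package main

import (
	"fmt"
	"sort"
)

// KV is a hypothetical key-value pair used only for this illustration.
type KV struct {
	Key, Value string
}

// rangeStream (hypothetical) walks keys in [start, end) of a sorted slice and
// hands them to send in chunks of at most chunkSize, so the full result is
// never materialized in a single message.
func rangeStream(sorted []KV, start, end string, chunkSize int, send func([]KV) error) error {
	if chunkSize <= 0 {
		return fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	// Binary search for the first key >= start.
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i].Key >= start })
	chunk := make([]KV, 0, chunkSize)
	for ; i < len(sorted) && sorted[i].Key < end; i++ {
		chunk = append(chunk, sorted[i])
		if len(chunk) == chunkSize {
			if err := send(chunk); err != nil {
				return err
			}
			// Start a fresh chunk; the receiver may still hold the sent one.
			chunk = make([]KV, 0, chunkSize)
		}
	}
	// Flush the final partial chunk, if any.
	if len(chunk) > 0 {
		return send(chunk)
	}
	return nil
}

func main() {
	data := []KV{{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}, {"e", "5"}}
	err := rangeStream(data, "b", "e", 2, func(c []KV) error {
		fmt.Println(c)
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

The chunk size is the knob that trades per-message overhead against peak memory; a real implementation would also need to pin the read revision so all chunks come from one consistent snapshot.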
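For the member catchup idea above, once each member knows how far the slowest member has progressed, the compaction decision reduces to taking a minimum over the reported indexes. The sketch below assumes such indexes are already available locally; `safeCompactIndex` and the `keep` safety margin are hypothetical and not taken from #17098.

```go
package main

import "fmt"

// safeCompactIndex (hypothetical) returns the highest raft index up to which
// the local log can be compacted without discarding entries that some member
// still needs. applied is the local applied index, peerIndexes holds the
// propagated raft index of each peer (by member ID), and keep is an extra
// number of entries to retain as a safety margin.
func safeCompactIndex(applied uint64, peerIndexes map[uint64]uint64, keep uint64) uint64 {
	slowest := applied
	for _, idx := range peerIndexes {
		if idx < slowest {
			slowest = idx
		}
	}
	if slowest <= keep {
		return 0 // nothing can be compacted yet
	}
	return slowest - keep
}

func main() {
	// Member 2 is the slowest at index 950, so with a margin of 10 entries
	// the log can be compacted up to index 940.
	fmt.Println(safeCompactIndex(1000, map[uint64]uint64{2: 950, 3: 990}, 10))
}
```

The trade-off is that one very slow or partitioned member pins the log for everyone, so a real design would likely still cap retention and fall back to sending a snapshot.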
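The watch rewrite idea above (dropping slow watchers to keep memory stable) can be illustrated with a bounded per-watcher buffer and a non-blocking send: the writer never waits on a slow consumer, and a watcher whose buffer is full is closed instead of queuing more events. This is a single-goroutine sketch under stated assumptions; `Event`, `watcher`, `broadcaster`, and their methods are hypothetical and do not reflect etcd's current watch implementation.

```go
package main

import "fmt"

// Event is a hypothetical watch event used only for this illustration.
type Event struct {
	Key string
	Rev int64
}

// watcher has a bounded buffer; its capacity caps the memory one watcher can hold.
type watcher struct {
	ch      chan Event
	dropped bool
}

// broadcaster (hypothetical) fans events out to watchers. It is not
// goroutine-safe; a real implementation would need locking.
type broadcaster struct {
	watchers map[int]*watcher
}

func newBroadcaster() *broadcaster {
	return &broadcaster{watchers: make(map[int]*watcher)}
}

// add registers a watcher with the given ID and buffer size.
func (b *broadcaster) add(id, bufSize int) *watcher {
	w := &watcher{ch: make(chan Event, bufSize)}
	b.watchers[id] = w
	return w
}

// publish delivers ev to every watcher without blocking and returns the IDs
// of watchers that were dropped because their buffer was full.
func (b *broadcaster) publish(ev Event) []int {
	var dropped []int
	for id, w := range b.watchers {
		select {
		case w.ch <- ev:
		default:
			// Too slow: close the watcher instead of buffering without bound.
			// The client is expected to re-establish the watch.
			w.dropped = true
			close(w.ch)
			delete(b.watchers, id)
			dropped = append(dropped, id)
		}
	}
	return dropped
}

func main() {
	bc := newBroadcaster()
	bc.add(1, 10) // fast watcher with room for 10 events
	bc.add(2, 1)  // slow watcher that never drains its single-slot buffer
	for rev := int64(1); rev <= 3; rev++ {
		if dropped := bc.publish(Event{Key: "k", Rev: rev}); len(dropped) > 0 {
			fmt.Println("dropped watchers:", dropped)
		}
	}
}
```

Because the sender never blocks, a slow watcher cannot add latency to the write path, which is also relevant to the Put&Watch interference item; the cost is that dropped clients must resync.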
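The "precomputed count" approach for range with limit can be illustrated with an order-statistic tree: each node stores the size of its subtree, so counting the keys in a range visits one root-to-leaf path per bound instead of every matching key. The sketch below uses a plain unbalanced binary search tree instead of a btree to stay short, and ignores revisions entirely; `node`, `insert`, `countLess`, and `countRange` are hypothetical names, not code from #18184.

```go
package main

import "fmt"

// node is a hypothetical BST node augmented with its subtree size.
type node struct {
	key         string
	left, right *node
	size        int // number of keys in this subtree, maintained on insert
}

func size(n *node) int {
	if n == nil {
		return 0
	}
	return n.size
}

// insert adds key (ignoring duplicates) and refreshes subtree sizes on the way back up.
func insert(n *node, key string) *node {
	if n == nil {
		return &node{key: key, size: 1}
	}
	switch {
	case key < n.key:
		n.left = insert(n.left, key)
	case key > n.key:
		n.right = insert(n.right, key)
	default:
		return n // duplicate key, tree unchanged
	}
	n.size = 1 + size(n.left) + size(n.right)
	return n
}

// countLess returns how many keys are < key using the stored subtree sizes.
func countLess(n *node, key string) int {
	count := 0
	for n != nil {
		if key <= n.key {
			n = n.left // n and its right subtree are all >= key
		} else {
			count += size(n.left) + 1 // left subtree and n are all < key
			n = n.right
		}
	}
	return count
}

// countRange returns the number of keys in [start, end).
func countRange(root *node, start, end string) int {
	if end <= start {
		return 0
	}
	return countLess(root, end) - countLess(root, start)
}

func main() {
	var root *node
	for _, k := range []string{"c", "a", "e", "b", "d"} {
		root = insert(root, k)
	}
	fmt.Println(countRange(root, "b", "e")) // 3 (keys b, c, d)
}
```

With a balanced tree, the total count for a limited range becomes O(log n) rather than O(n), which is what makes this option attractive compared to adding an API flag; the real index would additionally have to account for key revisions and tombstones.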
Why is this needed?
Give the community ideas to drive more interest in harder but higher-impact contributions.