[opt](memory) Reduce CloudReplica per-instance memory footprint#61289
dataroaring wants to merge 2 commits into master
Conversation
Replace eager ConcurrentHashMap initialization with null defaults and lazy allocation via double-checked locking. For millions of CloudReplica instances, this saves ~120 bytes per replica when maps are empty.

- primaryClusterToBackend: null by default, allocated on first put
- secondaryClusterToBackends: null by default, allocated on first put
- Add volatile for thread-safe lazy initialization
- Add null-safe access patterns throughout all methods
- Use initial capacity 2 (vs default 16) for small-map optimization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Optimizes CloudReplica memory usage by avoiding eager allocations and deduplicating repeated cluster ID String instances across replicas.
Changes:
- Lazy-initializes `primaryClusterToBackend` and `secondaryClusterToBackends` via double-checked locking and a smaller `ConcurrentHashMap` initial capacity.
- Adds a static cluster ID intern pool and interns keys during updates and post-deserialization.
- Updates call sites to be null-safe when maps are uninitialized.
```java
// Intern pool for cluster ID strings to avoid millions of duplicate String instances.
// Typically only a handful of distinct cluster IDs exist in the system.
private static final ConcurrentHashMap<String, String> CLUSTER_ID_POOL = new ConcurrentHashMap<>();

private static String internClusterId(String clusterId) {
    if (clusterId == null) {
        return null;
    }
    String existing = CLUSTER_ID_POOL.putIfAbsent(clusterId, clusterId);
    return existing != null ? existing : clusterId;
}
```
The static CLUSTER_ID_POOL is unbounded and will retain all distinct cluster IDs for the lifetime of the JVM. If cluster IDs can be unbounded (e.g., come from external inputs or can churn over time), this becomes a memory leak and can negate the intended savings. Consider using a bounded cache (e.g., max size + eviction), or weak-value/weak-key interning (if available in the codebase) so unused IDs can be reclaimed; at minimum, document/enforce that cluster IDs are from a small, fixed set.
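One way to bound the pool, sketched here under assumptions: the `BoundedClusterIdPool` class name and `MAX_POOL_SIZE` cap are illustrative and not part of the PR; once the cap is reached, new IDs are simply passed through uninterned rather than retained.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative size-capped intern pool: bounds worst-case retention so
// externally supplied or churning cluster IDs cannot grow the pool forever.
final class BoundedClusterIdPool {
    private static final int MAX_POOL_SIZE = 1024; // assumed cap, not from the PR
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    static String intern(String clusterId) {
        if (clusterId == null) {
            return null;
        }
        String existing = POOL.get(clusterId);
        if (existing != null) {
            return existing; // fast path: already interned
        }
        if (POOL.size() >= MAX_POOL_SIZE) {
            return clusterId; // pool full: skip interning instead of growing unboundedly
        }
        existing = POOL.putIfAbsent(clusterId, clusterId);
        return existing != null ? existing : clusterId;
    }
}
```

A weak-value interner (e.g. Guava's `Interners.newWeakInterner()`) would instead let unused IDs be reclaimed by GC, at the cost of an extra dependency.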
```diff
 // clusterId, secondaryBe, changeTimestamp
-private Map<String, Pair<Long, Long>> secondaryClusterToBackends
-        = new ConcurrentHashMap<String, Pair<Long, Long>>();
+private volatile Map<String, Pair<Long, Long>> secondaryClusterToBackends;
```
secondaryClusterToBackends is typed as Map, so after Gson deserialization it may be a non-concurrent implementation (e.g., LinkedTreeMap). In that case it will be non-null and getOrCreateSecondaryMap() will not replace it, and subsequent concurrent reads/writes can become unsafe. A concrete fix is to (1) declare the field as ConcurrentHashMap<String, Pair<Long, Long>> (and have getOrCreateSecondaryMap() return ConcurrentHashMap), and/or (2) in gsonPostProcess() normalize any deserialized map into a ConcurrentHashMap (similar to what’s done for primaryClusterToBackend).
Suggested change:

```diff
-private volatile Map<String, Pair<Long, Long>> secondaryClusterToBackends;
+private volatile ConcurrentHashMap<String, Pair<Long, Long>> secondaryClusterToBackends;
```
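The normalization half of the suggestion can be sketched as a small helper; the `MapNormalizer` class and `normalize` method here are hypothetical stand-ins for logic that would live in `gsonPostProcess()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical post-deserialization normalization: whatever Map implementation
// the deserializer produced (e.g. Gson's LinkedTreeMap), replace it with a
// ConcurrentHashMap, or drop it when empty to keep the lazy-allocation savings.
final class MapNormalizer {
    @SuppressWarnings("unchecked")
    static <K, V> ConcurrentHashMap<K, V> normalize(Map<K, V> deserialized) {
        if (deserialized == null || deserialized.isEmpty()) {
            return null; // empty map: keep the field null to save memory
        }
        if (deserialized instanceof ConcurrentHashMap) {
            return (ConcurrentHashMap<K, V>) deserialized; // already thread-safe
        }
        return new ConcurrentHashMap<>(deserialized); // copy into a concurrent map
    }
}
```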
```java
private Map<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
    if (map == null) {
        synchronized (this) {
            map = secondaryClusterToBackends;
            if (map == null) {
                map = new ConcurrentHashMap<>(2);
                secondaryClusterToBackends = map;
            }
        }
    }
    return map;
}
```
Suggested change:

```diff
-private Map<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
-    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
-    if (map == null) {
-        synchronized (this) {
-            map = secondaryClusterToBackends;
-            if (map == null) {
-                map = new ConcurrentHashMap<>(2);
-                secondaryClusterToBackends = map;
-            }
-        }
-    }
-    return map;
-}
+private ConcurrentHashMap<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
+    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
+    if (map instanceof ConcurrentHashMap) {
+        return (ConcurrentHashMap<String, Pair<Long, Long>>) map;
+    }
+    synchronized (this) {
+        map = secondaryClusterToBackends;
+        if (map instanceof ConcurrentHashMap) {
+            return (ConcurrentHashMap<String, Pair<Long, Long>>) map;
+        }
+        ConcurrentHashMap<String, Pair<Long, Long>> concurrentMap = new ConcurrentHashMap<>(2);
+        if (map != null) {
+            concurrentMap.putAll(map);
+        }
+        secondaryClusterToBackends = concurrentMap;
+        return concurrentMap;
+    }
+}
```
```java
if (!replicaEnough && !allowColdRead && priMap != null && priMap.containsKey(clusterId)) {
    backendId = priMap.get(clusterId);
```
This does two hash lookups (containsKey then get). Since ConcurrentHashMap disallows null values, you can do a single get and check for null (or use getOrDefault) to reduce overhead on a hot path.
Suggested change:

```diff
-if (!replicaEnough && !allowColdRead && priMap != null && priMap.containsKey(clusterId)) {
-    backendId = priMap.get(clusterId);
+if (!replicaEnough && !allowColdRead && priMap != null) {
+    Long primaryBackendId = priMap.get(clusterId);
+    if (primaryBackendId != null) {
+        backendId = primaryBackendId;
+    }
```
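The single-lookup pattern can also be factored into a small helper; `SingleLookup` and `resolveBackend` below are illustrative names, not code from the PR.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative single-lookup variant: ConcurrentHashMap never stores null
// values, so a null result from get() already means "key absent" -- no
// separate containsKey() probe is needed on the hot path.
final class SingleLookup {
    static long resolveBackend(ConcurrentHashMap<String, Long> priMap,
                               String clusterId, long fallback) {
        Long primary = (priMap == null) ? null : priMap.get(clusterId);
        return (primary != null) ? primary : fallback;
    }
}
```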
```diff
 if (primaryClusterToBackends != null) {
     ConcurrentHashMap<String, Long> map = getOrCreatePrimaryMap();
     for (Map.Entry<String, List<Long>> entry : primaryClusterToBackends.entrySet()) {
         String clusterId = entry.getKey();
         List<Long> beIds = entry.getValue();
         if (beIds != null && !beIds.isEmpty()) {
-            primaryClusterToBackend.put(clusterId, beIds.get(0));
+            map.put(internClusterId(clusterId), beIds.get(0));
         }
     }
     this.primaryClusterToBackends = null;
 }
```
This eagerly allocates primaryClusterToBackend whenever primaryClusterToBackends is non-null, even if all entries have empty/null backend lists (no effective data to migrate). To preserve the memory-saving goal, consider deferring getOrCreatePrimaryMap() until the first time you actually encounter a non-empty beIds (i.e., allocate only when you’re about to put).
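A sketch of the deferred-allocation variant under assumptions: the `LazyMigration` class and `migrate` signature are hypothetical (the real code works on fields in `gsonPostProcess()`, not parameters), but the allocation timing matches the suggestion.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative deferred allocation: only create the target map when the
// first non-empty backend list is actually encountered, so replicas whose
// legacy entries are all empty never pay for a ConcurrentHashMap.
final class LazyMigration {
    static ConcurrentHashMap<String, Long> migrate(Map<String, List<Long>> legacy) {
        ConcurrentHashMap<String, Long> target = null;
        if (legacy != null) {
            for (Map.Entry<String, List<Long>> entry : legacy.entrySet()) {
                List<Long> beIds = entry.getValue();
                if (beIds != null && !beIds.isEmpty()) {
                    if (target == null) {
                        target = new ConcurrentHashMap<>(2); // allocate only now
                    }
                    target.put(entry.getKey(), beIds.get(0));
                }
            }
        }
        return target; // stays null when there was nothing to migrate
    }
}
```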
Add a static intern pool for cluster ID strings to eliminate millions of duplicate String instances across CloudReplica objects. Each replica stores cluster IDs as map keys; without interning, Gson deserialization creates a separate String instance per replica (~40-70 bytes each).

- Add ConcurrentHashMap-based intern pool (heap-safe, unlike String.intern)
- Intern strings at all write points (updateClusterToPrimaryBe/Secondary)
- Intern keys during gsonPostProcess for deserialized maps
- For 1M replicas with 3 clusters: saves ~40-70 MB of duplicate Strings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 0d10541 to 8698ad6.
Summary
Replaces eager `new ConcurrentHashMap<>()` initialization with null defaults and lazy allocation via double-checked locking. For millions of CloudReplica instances, this saves ~120 bytes per replica when maps are empty (most replicas never use `secondaryClusterToBackends`).

Memory savings estimate (1M tablets):
Changes:
- `primaryClusterToBackend`: `volatile`, null by default, lazy-allocated with `new ConcurrentHashMap<>(2)`
- `secondaryClusterToBackends`: `volatile`, null by default, lazy-allocated with `new ConcurrentHashMap<>(2)`
- `getOrCreatePrimaryMap()` / `getOrCreateSecondaryMap()`: double-checked locking helpers
- `internClusterId()`: heap-based intern pool using a static `ConcurrentHashMap`
- `gsonPostProcess()`: interns keys from deserialized maps

Test plan
- `updateClusterToPrimaryBe` / `updateClusterToSecondaryBe` work correctly with lazy init
- `clearClusterToBe` handles null maps gracefully

🤖 Generated with Claude Code