[opt](memory) Reduce CloudReplica per-instance memory footprint#61289
dataroaring wants to merge 2 commits into master
Conversation
Replace eager ConcurrentHashMap initialization with null defaults and lazy allocation via double-checked locking. For millions of CloudReplica instances, this saves ~120 bytes per replica when maps are empty.

- primaryClusterToBackend: null by default, allocated on first put
- secondaryClusterToBackends: null by default, allocated on first put
- Add volatile for thread-safe lazy initialization
- Add null-safe access patterns throughout all methods
- Use initial capacity 2 (vs default 16) for small-map optimization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Optimizes CloudReplica memory usage by avoiding eager allocations and deduplicating repeated cluster ID String instances across replicas.
Changes:
- Lazy-initializes `primaryClusterToBackend` and `secondaryClusterToBackends` via double-checked locking and a smaller `ConcurrentHashMap` initial capacity.
- Adds a static cluster ID intern pool and interns keys during updates and post-deserialization.
- Updates call sites to be null-safe when maps are uninitialized.
```java
// Intern pool for cluster ID strings to avoid millions of duplicate String instances.
// Typically only a handful of distinct cluster IDs exist in the system.
private static final ConcurrentHashMap<String, String> CLUSTER_ID_POOL = new ConcurrentHashMap<>();

private static String internClusterId(String clusterId) {
    if (clusterId == null) {
        return null;
    }
    String existing = CLUSTER_ID_POOL.putIfAbsent(clusterId, clusterId);
    return existing != null ? existing : clusterId;
}
```
The static CLUSTER_ID_POOL is unbounded and will retain all distinct cluster IDs for the lifetime of the JVM. If cluster IDs can be unbounded (e.g., come from external inputs or can churn over time), this becomes a memory leak and can negate the intended savings. Consider using a bounded cache (e.g., max size + eviction), or weak-value/weak-key interning (if available in the codebase) so unused IDs can be reclaimed; at minimum, document/enforce that cluster IDs are from a small, fixed set.
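One way to bound the pool, sketched here under assumptions: the `BoundedClusterIdPool` class name and `MAX_POOL_SIZE` cap are illustrative and not part of the PR; once the cap is reached, new IDs are simply passed through uninterned rather than retained.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative size-capped intern pool: bounds worst-case retention so
// externally supplied or churning cluster IDs cannot grow the pool forever.
final class BoundedClusterIdPool {
    private static final int MAX_POOL_SIZE = 1024; // assumed cap, not from the PR
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    static String intern(String clusterId) {
        if (clusterId == null) {
            return null;
        }
        String existing = POOL.get(clusterId);
        if (existing != null) {
            return existing; // fast path: already interned
        }
        if (POOL.size() >= MAX_POOL_SIZE) {
            return clusterId; // pool full: skip interning instead of growing unboundedly
        }
        existing = POOL.putIfAbsent(clusterId, clusterId);
        return existing != null ? existing : clusterId;
    }
}
```

A weak-value interner (e.g. Guava's `Interners.newWeakInterner()`) would instead let unused IDs be reclaimed by GC, at the cost of an extra dependency.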
```diff
 // clusterId, secondaryBe, changeTimestamp
-private Map<String, Pair<Long, Long>> secondaryClusterToBackends
-        = new ConcurrentHashMap<String, Pair<Long, Long>>();
+private volatile Map<String, Pair<Long, Long>> secondaryClusterToBackends;
```
secondaryClusterToBackends is typed as Map, so after Gson deserialization it may be a non-concurrent implementation (e.g., LinkedTreeMap). In that case it will be non-null and getOrCreateSecondaryMap() will not replace it, and subsequent concurrent reads/writes can become unsafe. A concrete fix is to (1) declare the field as ConcurrentHashMap<String, Pair<Long, Long>> (and have getOrCreateSecondaryMap() return ConcurrentHashMap), and/or (2) in gsonPostProcess() normalize any deserialized map into a ConcurrentHashMap (similar to what’s done for primaryClusterToBackend).
Suggested change:

```diff
-private volatile Map<String, Pair<Long, Long>> secondaryClusterToBackends;
+private volatile ConcurrentHashMap<String, Pair<Long, Long>> secondaryClusterToBackends;
```
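The normalization half of the suggestion can be sketched as a small helper; the `MapNormalizer` class and `normalize` method here are hypothetical stand-ins for logic that would live in `gsonPostProcess()`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical post-deserialization normalization: whatever Map implementation
// the deserializer produced (e.g. Gson's LinkedTreeMap), replace it with a
// ConcurrentHashMap, or drop it when empty to keep the lazy-allocation savings.
final class MapNormalizer {
    @SuppressWarnings("unchecked")
    static <K, V> ConcurrentHashMap<K, V> normalize(Map<K, V> deserialized) {
        if (deserialized == null || deserialized.isEmpty()) {
            return null; // empty map: keep the field null to save memory
        }
        if (deserialized instanceof ConcurrentHashMap) {
            return (ConcurrentHashMap<K, V>) deserialized; // already thread-safe
        }
        return new ConcurrentHashMap<>(deserialized); // copy into a concurrent map
    }
}
```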
```java
private Map<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
    if (map == null) {
        synchronized (this) {
            map = secondaryClusterToBackends;
            if (map == null) {
                map = new ConcurrentHashMap<>(2);
                secondaryClusterToBackends = map;
            }
        }
    }
    return map;
}
```
Suggested change:

```diff
-private Map<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
-    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
-    if (map == null) {
-        synchronized (this) {
-            map = secondaryClusterToBackends;
-            if (map == null) {
-                map = new ConcurrentHashMap<>(2);
-                secondaryClusterToBackends = map;
-            }
-        }
-    }
-    return map;
-}
+private ConcurrentHashMap<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
+    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
+    if (map instanceof ConcurrentHashMap) {
+        return (ConcurrentHashMap<String, Pair<Long, Long>>) map;
+    }
+    synchronized (this) {
+        map = secondaryClusterToBackends;
+        if (map instanceof ConcurrentHashMap) {
+            return (ConcurrentHashMap<String, Pair<Long, Long>>) map;
+        }
+        ConcurrentHashMap<String, Pair<Long, Long>> concurrentMap = new ConcurrentHashMap<>(2);
+        if (map != null) {
+            concurrentMap.putAll(map);
+        }
+        secondaryClusterToBackends = concurrentMap;
+        return concurrentMap;
+    }
+}
```
```java
if (!replicaEnough && !allowColdRead && priMap != null && priMap.containsKey(clusterId)) {
    backendId = priMap.get(clusterId);
```
This does two hash lookups (containsKey then get). Since ConcurrentHashMap disallows null values, you can do a single get and check for null (or use getOrDefault) to reduce overhead on a hot path.
Suggested change:

```diff
-if (!replicaEnough && !allowColdRead && priMap != null && priMap.containsKey(clusterId)) {
-    backendId = priMap.get(clusterId);
+if (!replicaEnough && !allowColdRead && priMap != null) {
+    Long primaryBackendId = priMap.get(clusterId);
+    if (primaryBackendId != null) {
+        backendId = primaryBackendId;
+    }
```
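The single-lookup pattern can also be factored into a small helper; `SingleLookup` and `resolveBackend` below are illustrative names, not code from the PR.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative single-lookup variant: ConcurrentHashMap never stores null
// values, so a null result from get() already means "key absent" -- no
// separate containsKey() probe is needed on the hot path.
final class SingleLookup {
    static long resolveBackend(ConcurrentHashMap<String, Long> priMap,
                               String clusterId, long fallback) {
        Long primary = (priMap == null) ? null : priMap.get(clusterId);
        return (primary != null) ? primary : fallback;
    }
}
```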
```diff
 if (primaryClusterToBackends != null) {
     ConcurrentHashMap<String, Long> map = getOrCreatePrimaryMap();
     for (Map.Entry<String, List<Long>> entry : primaryClusterToBackends.entrySet()) {
         String clusterId = entry.getKey();
         List<Long> beIds = entry.getValue();
         if (beIds != null && !beIds.isEmpty()) {
-            primaryClusterToBackend.put(clusterId, beIds.get(0));
+            map.put(internClusterId(clusterId), beIds.get(0));
         }
     }
     this.primaryClusterToBackends = null;
 }
```
This eagerly allocates primaryClusterToBackend whenever primaryClusterToBackends is non-null, even if all entries have empty/null backend lists (no effective data to migrate). To preserve the memory-saving goal, consider deferring getOrCreatePrimaryMap() until the first time you actually encounter a non-empty beIds (i.e., allocate only when you’re about to put).
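A sketch of the deferred-allocation variant under assumptions: the `LazyMigration` class and `migrate` signature are hypothetical (the real code works on fields in `gsonPostProcess()`, not parameters), but the allocation timing matches the suggestion.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative deferred allocation: only create the target map when the
// first non-empty backend list is actually encountered, so replicas whose
// legacy entries are all empty never pay for a ConcurrentHashMap.
final class LazyMigration {
    static ConcurrentHashMap<String, Long> migrate(Map<String, List<Long>> legacy) {
        ConcurrentHashMap<String, Long> target = null;
        if (legacy != null) {
            for (Map.Entry<String, List<Long>> entry : legacy.entrySet()) {
                List<Long> beIds = entry.getValue();
                if (beIds != null && !beIds.isEmpty()) {
                    if (target == null) {
                        target = new ConcurrentHashMap<>(2); // allocate only now
                    }
                    target.put(entry.getKey(), beIds.get(0));
                }
            }
        }
        return target; // stays null when there was nothing to migrate
    }
}
```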
Add a static intern pool for cluster ID strings to eliminate millions of duplicate String instances across CloudReplica objects. Each replica stores cluster IDs as map keys; without interning, Gson deserialization creates a separate String instance per replica (~40-70 bytes each).

- Add ConcurrentHashMap-based intern pool (heap-safe, unlike String.intern)
- Intern strings at all write points (updateClusterToPrimaryBe/Secondary)
- Intern keys during gsonPostProcess for deserialized maps
- For 1M replicas with 3 clusters: saves ~40-70 MB of duplicate Strings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 0d10541 to 8698ad6.
Summary
Replaces eager `new ConcurrentHashMap<>()` initialization with null defaults and lazy allocation via double-checked locking. For millions of CloudReplica instances, this saves ~120 bytes per replica when maps are empty (most replicas never use `secondaryClusterToBackends`).

Memory savings estimate (1M tablets):
Changes:
- `primaryClusterToBackend`: `volatile`, null by default, lazy-allocated with `new ConcurrentHashMap<>(2)`
- `secondaryClusterToBackends`: `volatile`, null by default, lazy-allocated with `new ConcurrentHashMap<>(2)`
- `getOrCreatePrimaryMap()` / `getOrCreateSecondaryMap()`: double-checked locking helpers
- `internClusterId()`: heap-based intern pool using a static `ConcurrentHashMap`
- `gsonPostProcess()`: interns keys from deserialized maps

Test plan
- `updateClusterToPrimaryBe` / `updateClusterToSecondaryBe` work correctly with lazy init
- `clearClusterToBe` handles null maps gracefully

🤖 Generated with Claude Code