
[opt](memory) Reduce CloudReplica per-instance memory footprint#61289

Open
dataroaring wants to merge 2 commits into master from feature/cloud-replica-memory-opt



@dataroaring
Contributor

Summary

  • Lazy-init ConcurrentHashMaps: Replace eager new ConcurrentHashMap<>() initialization with null defaults and lazy allocation via double-checked locking. For millions of CloudReplica instances, this saves ~120 bytes per replica when maps are empty (most replicas never use secondaryClusterToBackends).
  • Intern cluster ID strings: Add a static intern pool for cluster ID strings to eliminate duplicate String instances across CloudReplica objects. During Gson deserialization, each replica gets its own String copy of the same cluster ID (~40-70 bytes each). Interning shares a single instance.
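A minimal demonstration of why interning helps, assuming each deserialized replica holds its own `String` copy of the same cluster ID. The class name `ClusterIdInternDemo` is hypothetical; the pool mirrors the `putIfAbsent` approach that `internClusterId()` uses in this PR.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; the pool logic follows the PR's internClusterId().
final class ClusterIdInternDemo {
    // One canonical instance per distinct cluster ID (assumed to be a small set).
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    static String intern(String id) {
        if (id == null) {
            return null;
        }
        // putIfAbsent returns the instance already in the pool, or null if
        // this call just stored `id` itself as the canonical instance.
        String existing = POOL.putIfAbsent(id, id);
        return existing != null ? existing : id;
    }

    public static void main(String[] args) {
        // Simulate two replicas that each received their own copy from Gson.
        String fromReplicaA = new String("cluster_0");
        String fromReplicaB = new String("cluster_0");
        System.out.println(fromReplicaA == fromReplicaB);                 // false: two objects
        System.out.println(intern(fromReplicaA) == intern(fromReplicaB)); // true: one shared object
    }
}
```

After interning, every replica's map key points at the same object, so the per-replica cost drops to a single reference.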

Memory savings estimate (1M tablets):

| Optimization | Per-replica savings | Total (1M) |
|---|---|---|
| Lazy-init secondaryClusterToBackends | ~56 bytes | ~53 MB |
| Lazy-init primaryClusterToBackend (unassigned) | ~56 bytes | varies |
| Smaller ConcurrentHashMap initial capacity (2 vs default 16) | ~112 bytes | ~107 MB |
| Cluster ID string interning | ~40-70 bytes | ~40-70 MB |
| Total | ~160-230 bytes | ~150-230 MB |
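The "Total (1M)" column is per-replica bytes multiplied by 1,000,000 replicas. The sketch below redoes that arithmetic with the table's own per-replica estimates (estimates, not measurements). Dividing by 2^20 reproduces the ~53 and ~107 figures, which suggests those rows are in MiB, while the interning row (~40-70) matches decimal megabytes.

```java
// Back-of-the-envelope check of the savings table; SavingsEstimate is a
// hypothetical helper and the byte figures are the PR's estimates.
final class SavingsEstimate {
    static final long REPLICAS = 1_000_000L;

    // bytes-per-replica * replicas, expressed in MiB (2^20 bytes)
    static double mib(long bytesPerReplica, long replicas) {
        return bytesPerReplica * (double) replicas / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("lazy-init secondary map: %.1f MiB%n", mib(56, REPLICAS));  // ~53.4
        System.out.printf("smaller capacity:        %.1f MiB%n", mib(112, REPLICAS)); // ~106.8
        System.out.printf("interning (low-high):    %.1f-%.1f MiB%n",
                mib(40, REPLICAS), mib(70, REPLICAS));                               // ~38.1-66.8
    }
}
```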

Changes:

  • primaryClusterToBackend: volatile, null by default, lazy-allocated with new ConcurrentHashMap<>(2)
  • secondaryClusterToBackends: volatile, null by default, lazy-allocated with new ConcurrentHashMap<>(2)
  • getOrCreatePrimaryMap() / getOrCreateSecondaryMap(): double-checked locking helpers
  • internClusterId(): heap-based intern pool using static ConcurrentHashMap
  • All access sites updated with null-safe patterns
  • gsonPostProcess(): interns keys from deserialized maps
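The volatile-field plus double-checked-locking pattern listed above can be sketched as follows, reduced to a single map with `Long` values. `LazyClusterMap` and its methods are hypothetical; only the overall shape (volatile field, locked allocation, readers that treat null as empty) follows the PR description.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reduction of CloudReplica's lazily allocated primary map.
final class LazyClusterMap {
    // null until the first write; volatile so a fully constructed map is
    // safely published to threads that never enter the synchronized block.
    private volatile ConcurrentHashMap<String, Long> primary;

    private ConcurrentHashMap<String, Long> getOrCreatePrimaryMap() {
        ConcurrentHashMap<String, Long> map = primary; // one volatile read
        if (map == null) {
            synchronized (this) {
                map = primary; // re-check under the lock
                if (map == null) {
                    map = new ConcurrentHashMap<>(2); // small initial-capacity hint
                    primary = map;
                }
            }
        }
        return map;
    }

    // Writers allocate on demand.
    void put(String clusterId, long backendId) {
        getOrCreatePrimaryMap().put(clusterId, backendId);
    }

    // Readers never allocate: a missing map behaves like an empty one.
    Long get(String clusterId) {
        ConcurrentHashMap<String, Long> map = primary;
        return map == null ? null : map.get(clusterId);
    }

    // Null-safe clear, in the spirit of clearClusterToBe.
    void clear() {
        ConcurrentHashMap<String, Long> map = primary;
        if (map != null) {
            map.clear();
        }
    }

    boolean isAllocated() {
        return primary != null;
    }

    public static void main(String[] args) {
        LazyClusterMap m = new LazyClusterMap();
        System.out.println(m.isAllocated());                     // false
        System.out.println(m.get("cluster_0"));                  // null, still unallocated
        m.put("cluster_0", 10001L);
        System.out.println(m.isAllocated() + " " + m.get("cluster_0")); // true 10001
    }
}
```

Copying the volatile field into a local before testing it avoids a second volatile read and guarantees the null check and the use see the same map.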

Test plan

  • Verify FE compilation passes
  • Run existing CloudReplica-related tests
  • Verify backward compatibility: old checkpoint/editlog with eager-initialized maps deserializes correctly
  • Verify updateClusterToPrimaryBe / updateClusterToSecondaryBe work correctly with lazy init
  • Verify clearClusterToBe handles null maps gracefully
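For the lazy-init items in the test plan, a stress check along these lines could help: many threads race on the first write, and all of them should observe the same map instance with no lost writes. `LazyInitRaceCheck` and its nested `LazyHolder` are hypothetical stand-ins, not the real `CloudReplica`, and a passing run shows the absence of an observed race rather than proving there is none.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stress check for double-checked lazy initialization.
final class LazyInitRaceCheck {
    static final class LazyHolder {
        private volatile ConcurrentHashMap<String, Long> map;

        ConcurrentHashMap<String, Long> getOrCreate() {
            ConcurrentHashMap<String, Long> m = map;
            if (m == null) {
                synchronized (this) {
                    m = map;
                    if (m == null) {
                        m = new ConcurrentHashMap<>(2);
                        map = m;
                    }
                }
            }
            return m;
        }
    }

    /** Returns {distinct map instances observed, entries in the final map}. */
    static int[] run(int threads) {
        LazyHolder holder = new LazyHolder();
        // Identity-based set: counts map objects, not maps with equal contents.
        Set<Map<String, Long>> identitySet =
                Collections.newSetFromMap(new IdentityHashMap<Map<String, Long>, Boolean>());
        Set<Map<String, Long>> seen = Collections.synchronizedSet(identitySet);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            String key = "cluster_" + i;
            pool.execute(() -> {
                try {
                    start.await(); // release every worker at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                ConcurrentHashMap<String, Long> m = holder.getOrCreate();
                seen.add(m);
                m.put(key, 1L); // each worker writes its own key
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {seen.size(), holder.getOrCreate().size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(32))); // expected: [1, 32]
    }
}
```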

🤖 Generated with Claude Code

Replace eager ConcurrentHashMap initialization with null defaults and
lazy allocation via double-checked locking. For millions of CloudReplica
instances, this saves ~120 bytes per replica when maps are empty.

- primaryClusterToBackend: null by default, allocated on first put
- secondaryClusterToBackends: null by default, allocated on first put
- Add volatile for thread-safe lazy initialization
- Add null-safe access patterns throughout all methods
- Use initial capacity 2 (vs default 16) for small map optimization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings on March 12, 2026 13:19
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?


Copilot AI left a comment


Pull request overview

Optimizes CloudReplica memory usage by avoiding eager allocations and deduplicating repeated cluster ID String instances across replicas.

Changes:

  • Lazy-initializes primaryClusterToBackend and secondaryClusterToBackends via double-checked locking and smaller ConcurrentHashMap initial capacity.
  • Adds a static cluster ID intern pool and interns keys during updates and post-deserialization.
  • Updates call sites to be null-safe when maps are uninitialized.


Comment on lines +76 to +86
// Intern pool for cluster ID strings to avoid millions of duplicate String instances.
// Typically only a handful of distinct cluster IDs exist in the system.
private static final ConcurrentHashMap<String, String> CLUSTER_ID_POOL = new ConcurrentHashMap<>();

private static String internClusterId(String clusterId) {
    if (clusterId == null) {
        return null;
    }
    String existing = CLUSTER_ID_POOL.putIfAbsent(clusterId, clusterId);
    return existing != null ? existing : clusterId;
}

Copilot AI Mar 12, 2026


The static CLUSTER_ID_POOL is unbounded and will retain all distinct cluster IDs for the lifetime of the JVM. If cluster IDs can be unbounded (e.g., come from external inputs or can churn over time), this becomes a memory leak and can negate the intended savings. Consider using a bounded cache (e.g., max size + eviction), or weak-value/weak-key interning (if available in the codebase) so unused IDs can be reclaimed; at minimum, document/enforce that cluster IDs are from a small, fixed set.

Copilot uses AI. Check for mistakes.
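One of the alternatives suggested above is weak interning, where the pool does not keep a cluster ID alive once no replica references it. The sketch below is an illustration of that idea, not what the PR implements; `WeakClusterIdPool` is hypothetical, and its single lock trades throughput for simplicity. If Guava is available, `Interners.newWeakInterner()` provides a concurrent weak interner.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical weak-interning pool: entries disappear after the last strong
// reference to a cluster ID is dropped and the GC clears it.
final class WeakClusterIdPool {
    // WeakHashMap holds keys weakly; the value is also a WeakReference so the
    // entry does not strongly pin its own key.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // WeakHashMap is not thread-safe, so access is serialized.
    synchronized String intern(String id) {
        if (id == null) {
            return null;
        }
        WeakReference<String> ref = pool.get(id);
        String canonical = ref == null ? null : ref.get();
        if (canonical == null) {
            // First sighting, or the previous canonical copy was collected.
            pool.put(id, new WeakReference<>(id));
            canonical = id;
        }
        return canonical;
    }

    synchronized int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        WeakClusterIdPool pool = new WeakClusterIdPool();
        String a = new String("cluster_0");
        String b = new String("cluster_0");
        System.out.println(pool.intern(a) == pool.intern(b)); // true
    }
}
```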
  // clusterId, secondaryBe, changeTimestamp
- private Map<String, Pair<Long, Long>> secondaryClusterToBackends
-         = new ConcurrentHashMap<String, Pair<Long, Long>>();
+ private volatile Map<String, Pair<Long, Long>> secondaryClusterToBackends;

Copilot AI Mar 12, 2026


secondaryClusterToBackends is typed as Map, so after Gson deserialization it may be a non-concurrent implementation (e.g., LinkedTreeMap). In that case it will be non-null and getOrCreateSecondaryMap() will not replace it, and subsequent concurrent reads/writes can become unsafe. A concrete fix is to (1) declare the field as ConcurrentHashMap<String, Pair<Long, Long>> (and have getOrCreateSecondaryMap() return ConcurrentHashMap), and/or (2) in gsonPostProcess() normalize any deserialized map into a ConcurrentHashMap (similar to what’s done for primaryClusterToBackend).

Suggested change
- private volatile Map<String, Pair<Long, Long>> secondaryClusterToBackends;
+ private volatile ConcurrentHashMap<String, Pair<Long, Long>> secondaryClusterToBackends;

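Option (2) above, normalizing in `gsonPostProcess()`, could look roughly like this. `ReplicaPostProcessSketch` is a hypothetical stand-in; values are simplified to `Long` instead of the PR's `Pair<Long, Long>`, a `TreeMap` plays the role of Gson's non-concurrent map, and turning an empty deserialized map back into null is an assumption chosen to match the PR's memory goal.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for CloudReplica's post-deserialization hook.
final class ReplicaPostProcessSketch {
    private static final ConcurrentHashMap<String, String> CLUSTER_ID_POOL = new ConcurrentHashMap<>();

    // Declared as Map because a deserializer may fill it with any Map type.
    volatile Map<String, Long> secondaryClusterToBackends;

    static String internClusterId(String clusterId) {
        String existing = CLUSTER_ID_POOL.putIfAbsent(clusterId, clusterId);
        return existing != null ? existing : clusterId;
    }

    void gsonPostProcess() {
        Map<String, Long> deserialized = secondaryClusterToBackends;
        if (deserialized == null || deserialized.isEmpty()) {
            // Old images may carry an eagerly created empty map; drop it so
            // the replica stays allocation-free until the first real write.
            secondaryClusterToBackends = null;
            return;
        }
        ConcurrentHashMap<String, Long> normalized = new ConcurrentHashMap<>(2);
        for (Map.Entry<String, Long> e : deserialized.entrySet()) {
            // ConcurrentHashMap rejects null keys and values, so skip them.
            if (e.getKey() != null && e.getValue() != null) {
                normalized.put(internClusterId(e.getKey()), e.getValue());
            }
        }
        secondaryClusterToBackends = normalized.isEmpty() ? null : normalized;
    }

    public static void main(String[] args) {
        ReplicaPostProcessSketch r = new ReplicaPostProcessSketch();
        Map<String, Long> fromDisk = new TreeMap<>(); // non-concurrent stand-in
        fromDisk.put("cluster_0", 10001L);
        r.secondaryClusterToBackends = fromDisk;
        r.gsonPostProcess();
        System.out.println(r.secondaryClusterToBackends.getClass().getSimpleName()); // ConcurrentHashMap
    }
}
```

With this in place, any later `getOrCreateSecondaryMap()` only ever sees null or a `ConcurrentHashMap`.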
Comment on lines +127 to +138
private Map<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
    Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
    if (map == null) {
        synchronized (this) {
            map = secondaryClusterToBackends;
            if (map == null) {
                map = new ConcurrentHashMap<>(2);
                secondaryClusterToBackends = map;
            }
        }
    }
    return map;

Copilot AI Mar 12, 2026


secondaryClusterToBackends is typed as Map, so after Gson deserialization it may be a non-concurrent implementation (e.g., LinkedTreeMap). In that case it will be non-null and getOrCreateSecondaryMap() will not replace it, and subsequent concurrent reads/writes can become unsafe. A concrete fix is to (1) declare the field as ConcurrentHashMap<String, Pair<Long, Long>> (and have getOrCreateSecondaryMap() return ConcurrentHashMap), and/or (2) in gsonPostProcess() normalize any deserialized map into a ConcurrentHashMap (similar to what’s done for primaryClusterToBackend).

Suggested change
- private Map<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
-     Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
-     if (map == null) {
-         synchronized (this) {
-             map = secondaryClusterToBackends;
-             if (map == null) {
-                 map = new ConcurrentHashMap<>(2);
-                 secondaryClusterToBackends = map;
-             }
-         }
-     }
-     return map;
+ private ConcurrentHashMap<String, Pair<Long, Long>> getOrCreateSecondaryMap() {
+     Map<String, Pair<Long, Long>> map = secondaryClusterToBackends;
+     if (map instanceof ConcurrentHashMap) {
+         return (ConcurrentHashMap<String, Pair<Long, Long>>) map;
+     }
+     synchronized (this) {
+         map = secondaryClusterToBackends;
+         if (map instanceof ConcurrentHashMap) {
+             return (ConcurrentHashMap<String, Pair<Long, Long>>) map;
+         }
+         ConcurrentHashMap<String, Pair<Long, Long>> concurrentMap = new ConcurrentHashMap<>(2);
+         if (map != null) {
+             concurrentMap.putAll(map);
+         }
+         secondaryClusterToBackends = concurrentMap;
+         return concurrentMap;
+     }

Comment on lines +360 to +361
if (!replicaEnough && !allowColdRead && priMap != null && priMap.containsKey(clusterId)) {
    backendId = priMap.get(clusterId);

Copilot AI Mar 12, 2026


This does two hash lookups (containsKey then get). Since ConcurrentHashMap disallows null values, you can do a single get and check for null (or use getOrDefault) to reduce overhead on a hot path.

Suggested change
- if (!replicaEnough && !allowColdRead && priMap != null && priMap.containsKey(clusterId)) {
-     backendId = priMap.get(clusterId);
+ if (!replicaEnough && !allowColdRead && priMap != null) {
+     Long primaryBackendId = priMap.get(clusterId);
+     if (primaryBackendId != null) {
+         backendId = primaryBackendId;
+     }

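The single-lookup form suggested above, isolated into a small helper. Because `ConcurrentHashMap` never stores null values, a null from `get()` can only mean "absent", so `containsKey()` is redundant. A single `get()` also closes the window in which another thread could remove the key between `containsKey()` and `get()`. `PrimaryLookup.chooseBackend` and its `fallback` parameter are hypothetical names for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper showing the one-lookup pattern on a possibly-null map.
final class PrimaryLookup {
    static long chooseBackend(ConcurrentHashMap<String, Long> priMap, String clusterId, long fallback) {
        if (priMap != null) {
            Long primary = priMap.get(clusterId); // one hash lookup; null means absent
            if (primary != null) {
                return primary;
            }
        }
        return fallback; // map not allocated yet, or no primary for this cluster
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> priMap = new ConcurrentHashMap<>();
        priMap.put("cluster_0", 10001L);
        System.out.println(chooseBackend(priMap, "cluster_0", -1L)); // 10001
        System.out.println(chooseBackend(priMap, "cluster_1", -1L)); // -1
    }
}
```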
Comment on lines 718 to 728
  if (primaryClusterToBackends != null) {
+     ConcurrentHashMap<String, Long> map = getOrCreatePrimaryMap();
      for (Map.Entry<String, List<Long>> entry : primaryClusterToBackends.entrySet()) {
          String clusterId = entry.getKey();
          List<Long> beIds = entry.getValue();
          if (beIds != null && !beIds.isEmpty()) {
-             primaryClusterToBackend.put(clusterId, beIds.get(0));
+             map.put(internClusterId(clusterId), beIds.get(0));
          }
      }
      this.primaryClusterToBackends = null;
  }

Copilot AI Mar 12, 2026


This eagerly allocates primaryClusterToBackend whenever primaryClusterToBackends is non-null, even if all entries have empty/null backend lists (no effective data to migrate). To preserve the memory-saving goal, consider deferring getOrCreatePrimaryMap() until the first time you actually encounter a non-empty beIds (i.e., allocate only when you’re about to put).

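The deferral suggested above can be sketched as a migration loop that allocates the target map only when it meets the first non-empty backend list. `LegacyMigrationSketch` is a hypothetical stand-in modelling just the two fields from the snippet; key interning is left out to keep the example focused.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for CloudReplica's legacy-format migration.
final class LegacyMigrationSketch {
    Map<String, List<Long>> primaryClusterToBackends;                 // legacy multi-BE form
    volatile ConcurrentHashMap<String, Long> primaryClusterToBackend; // lazily allocated

    private ConcurrentHashMap<String, Long> getOrCreatePrimaryMap() {
        ConcurrentHashMap<String, Long> map = primaryClusterToBackend;
        if (map == null) {
            synchronized (this) {
                map = primaryClusterToBackend;
                if (map == null) {
                    map = new ConcurrentHashMap<>(2);
                    primaryClusterToBackend = map;
                }
            }
        }
        return map;
    }

    void migrate() {
        if (primaryClusterToBackends == null) {
            return;
        }
        ConcurrentHashMap<String, Long> target = null; // not allocated yet
        for (Map.Entry<String, List<Long>> entry : primaryClusterToBackends.entrySet()) {
            List<Long> beIds = entry.getValue();
            if (beIds == null || beIds.isEmpty()) {
                continue; // nothing to carry over, so no reason to allocate
            }
            if (target == null) {
                target = getOrCreatePrimaryMap(); // first real entry triggers allocation
            }
            // The PR additionally interns the key here via internClusterId().
            target.put(entry.getKey(), beIds.get(0));
        }
        primaryClusterToBackends = null;
    }

    public static void main(String[] args) {
        LegacyMigrationSketch empty = new LegacyMigrationSketch();
        Map<String, List<Long>> legacy = new HashMap<>();
        legacy.put("cluster_0", Collections.emptyList());
        empty.primaryClusterToBackends = legacy;
        empty.migrate();
        System.out.println(empty.primaryClusterToBackend == null); // true

        LegacyMigrationSketch used = new LegacyMigrationSketch();
        Map<String, List<Long>> legacy2 = new HashMap<>();
        legacy2.put("cluster_0", Arrays.asList(10001L, 10002L));
        used.primaryClusterToBackends = legacy2;
        used.migrate();
        System.out.println(used.primaryClusterToBackend); // {cluster_0=10001}
    }
}
```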
Add a static intern pool for cluster ID strings to eliminate millions
of duplicate String instances across CloudReplica objects. Each replica
stores cluster IDs as map keys; without interning, Gson deserialization
creates a separate String instance per replica (~40-70 bytes each).

- Add ConcurrentHashMap-based intern pool (a plain map on the Java heap rather than the JVM-wide String.intern() table)
- Intern strings at all write points (updateClusterToPrimaryBe/Secondary)
- Intern keys during gsonPostProcess for deserialized maps
- For 1M replicas with 3 clusters: saves ~40-70 MB of duplicate Strings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dataroaring force-pushed the feature/cloud-replica-memory-opt branch from 0d10541 to 8698ad6 on March 13, 2026 01:40


3 participants