couchbaselabs
diff --git a/‎.agents/profiles/Architect.md‎
Lines changed: 6 additions & 0 deletions b/‎.agents/profiles/Architect.md‎
diff --git a/‎.agents/profiles/CBRestLoader.md‎
Lines changed: 89 additions & 16 deletions b/‎.agents/profiles/CBRestLoader.md‎
diff --git a/‎.agents/profiles/MongoCoder.md‎
Lines changed: 5 additions & 0 deletions b/‎.agents/profiles/MongoCoder.md‎
diff --git a/‎AGENTS.md‎
Lines changed: 1 addition & 0 deletions b/‎AGENTS.md‎
diff --git a/‎Makefile‎
Lines changed: 7 additions & 0 deletions b/‎Makefile‎
diff --git a/‎src/main/java/RestServer/CollectionLoadBatcher.java‎
Lines changed: 143 additions & 0 deletions b/‎src/main/java/RestServer/CollectionLoadBatcher.java‎
@@ -24,17 +24,23 @@ graph TD
   elasticsearch[src/main/java/elasticsearch] -->|Defines Requirements| ARCH[The Architect]
   Mongo[src/main/java/mongo] -->|Defines Requirements| ARCH[The Architect]
   Utils-->|Defines Requirements| ARCH[The Architect]
+  RestServer-->|Defines Requirements| ARCH[The Architect]
   LoaderJava[src/main/java/Loader.java] -->|Invokes| Couchbase
   MongoLoaderJava[src/main/java/MongoLoader.java] -->|Invokes| Mongo
   SIFTLoaderJava[src/main/java/SIFTLoader.java] -->|Invokes| elasticsearch
   RestServer-->|Utilizes| Couchbase
   RestServer-->|Utilizes| Mongo
   RestServer-->|Utilizes| Utils
+  Couchbase/sdk/SharedClusterManager -->|Manages| Couchbase/sdk
+  RestServer/CollectionLoadBatcher -->|Coordinates| RestServer/TaskRequest
   Couchbase-->|Uses| Utils
+  Couchbase/sdk/SDKClientPool -->|Uses| Couchbase/sdk/SharedClusterManager
   Mongo-->|Uses| Utils
   elasticsearch-->|Uses| Utils
   Utils-->|Utilized by| Couchbase
   Utils-->|Utilized by| Mongo
   Utils-->|Utilized by| elasticsearch
   Utils-->|Utilized by| RestServer
+  Couchbase/sdk/SharedClusterManager -->|Optimizes| ClusterConnections[Cluster Connections]
+  RestServer/CollectionLoadBatcher -->|Optimizes| MultiCollectionLoads[Multi-Collection Loads]
 ```
@@ -11,13 +11,23 @@ To generate high-performance, thread-safe, and efficient REST-based document loa
 graph TD
   RestApplication[src/main/java/RestServer/RestApplication.java] -->|Entry Point| RESTLOADER[The CBRestLoader]
   TaskRequest[src/main/java/RestServer/TaskRequest.java] -->|Business Logic| RESTLOADER
+  CollectionLoadBatcher[src/main/java/RestServer/CollectionLoadBatcher.java] -->|Batch Processing| RESTLOADER
   RESTLOADER-->|Utilizes| Couchbase[src/main/java/couchbase]
-  Couchbase-->|Utilizes| Utils[src/main/java/utils]
+  Couchbase-->|Uses| Utils[src/main/java/utils]
+  Couchbase/sdk/SDKClientPool -->|Uses| SharedClusterManager[src/main/java/couchbase/sdk/SharedClusterManager.java]
+  SharedClusterManager -->|Manages| ClusterInstances[Cluster Instances]
+  CollectionLoadBatcher -->|Coordinates| TaskManager
   Utils-->|Utilized by| Couchbase
+  Utils-->|Utilized by| RestServer
 ```
 
 ### Logic & Constraints
 * **Step-Zero:** Always scan `./src/main/java/couchbase` and `./src/main/java/RestServer` to understand existing SDK and REST patterns before proposing new code.
+* **Component Selection:**
+  - **Single Collection Workloads**: Use standard `SDKClientPool` → `SDKClient` → `Cluster` pattern
+  - **Multi-Collection Workloads (100-1000 collections)**: Use `SharedClusterManager` + dynamic collection switching
+  - **Massive Collection Loads (1000+ collections)**: Use `CollectionLoadBatcher` + `SharedClusterManager`
+  - **High-Throughput Operations**: Leverage shared ClusterEnvironment with 500+ KV connections
 * **REST API Focus:** Modifications target Spring Boot REST endpoints (RestHandlers) and TaskRequest business logic for HTTP-based document loading.
 * **SDK Precision:** Default to the latest Couchbase SDK (v3.x) unless specified otherwise.
 * **N1QL Mastery:** Must prioritize Indexing strategies and GSI (Global Secondary Index) awareness when writing queries.
@@ -26,6 +36,37 @@ graph TD
   - Always include error handling for DocumentNotFound and CasMismatch.
 * **Tone:** Technical, efficiency-focused, and precise.
 
+### Core Architecture Components
+
+**SharedClusterManager** (`couchbase/sdk/SharedClusterManager.java`)
+- **Purpose**: Singleton pattern managing shared Cluster instances per server connection to avoid connection exhaustion
+- **Key Features**:
+  - Shared ClusterEnvironment with optimized KV connections (default: 500 for massively parallel loads)
+  - Thread-safe reference counting for Cluster instances
+  - Automatic environment recreation post-shutdown for long-running workloads
+  - Supports both TLS and non-TLS connections
+- **Usage Pattern**:
+  ```java
+  Cluster cluster = SharedClusterManager.getCluster(server);
+  try {
+      // Perform operations; the finally block releases the reference even on failure
+  } finally {
+      SharedClusterManager.releaseCluster(server);
+  }
+  ```
+- **Performance Benefits**: Eliminates connection thrashing for multi-collection workloads, reduces memory overhead from per-collection Cluster instances
+
+**CollectionLoadBatcher** (`RestServer/CollectionLoadBatcher.java`)
+- **Purpose**: Java-side batch processing for massive collection loads (thousands of collections)
+- **Key Features**:
+  - Fixed batch size of 50 collections (the `BATCH_SIZE` constant)
+  - Thread-safe batch state tracking with progress monitoring
+  - Prevents worker starvation and queue overhead
+  - Integrates with the REST API through the static `submitToBatch()` method
+- **Usage Pattern**:
+  ```java
+  ResponseEntity<Map<String, Object>> result = 
+      CollectionLoadBatcher.submitToBatch(requestBody);
+  ```
+- **Performance Benefits**: Sequential Python calls become batched Java operations, maximizing throughput for massive collection loads
+
 ### Work flow of loading
 sequenceDiagram
     participant C as Client (REST)
@@ -55,9 +96,18 @@ sequenceDiagram
 
 ### Performance Optimization Guidelines
 * **Multi-Collection Strategy**: Prefer bucket-level clients with dynamic collection switching over per-collection client instances. Workers should call `selectCollection()` dynamically per operation instead of creating dedicated clients per collection.
-* **Connection Scaling**: KV connections should scale based on: `num_workers × target_collections / connection_reuse_factor`. Default of 5 connections per SDKClient may be insufficient for high-concurrency multi-collection workloads.
+* **Shared Cluster Management**: Use `SharedClusterManager` for all multi-collection workloads. It provides:
+  - Single Cluster instance per server connection to avoid connection exhaustion
+  - Optimized KV connections (default: 500) for massively parallel collection loads
+  - Thread-safe reference counting and automatic resource cleanup
+  - Environment recreation capability for long-running workloads
+* **Connection Scaling**: KV connections should scale based on: `num_workers × target_collections / connection_reuse_factor`. Default of 5 connections per SDKClient may be insufficient for high-concurrency multi-collection workloads. SharedClusterManager defaults to 500 KV connections for large-scale loads.
 * **Thread Pool Sizing**: Set `num_workers` based on concurrent task throughput needs, not total collections. Example: 60 workers efficiently handle 5000 collections with proper batching, rather than allocating 20 workers per collection.
-* **Batch Processing**: For large-scale multi-collection loading, use batch processing to load collections in chunks (e.g., 60-100 collections per batch) to avoid client pool exhaustion.
+* **Batch Processing**: For large-scale multi-collection loading (1000+ collections), use `CollectionLoadBatcher` to:
+  - Process collections in batches (default: 50 per batch)
+  - Prevent worker starvation and reduce queue overhead
+  - Monitor batch progress and completion status
+  - Automatically start next batch after current completion
 * **Client Pool Optimization**: SDKClientPool should cache clients at bucket level and support dynamic scope/collection switching, not create separate client instances per (scope+collection) combination.
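
The connection-scaling rule in the guidelines above can be worked through numerically. The sketch below is only an illustration: the class name `KvConnectionSizing` is hypothetical, rounding up is an assumption, and the reuse factor of 600 is an assumed value chosen so that the 60-worker, 5000-collection example lands on the 500-connection default mentioned in the guidelines.

```java
final class KvConnectionSizing {
    /**
     * Applies num_workers x target_collections / connection_reuse_factor.
     * Rounding up is an assumption of this sketch, not something the guideline states.
     */
    static long kvConnections(int numWorkers, int targetCollections, int connectionReuseFactor) {
        if (numWorkers <= 0 || targetCollections <= 0 || connectionReuseFactor <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        // Multiply as long to avoid int overflow for large worker/collection counts
        long product = (long) numWorkers * targetCollections;
        return (product + connectionReuseFactor - 1) / connectionReuseFactor;
    }

    public static void main(String[] args) {
        // 60 workers, 5000 collections, assumed reuse factor of 600
        System.out.println(kvConnections(60, 5000, 600)); // prints 500
    }
}
```

With a smaller reuse factor the same workload needs proportionally more connections, which is why the per-client default of 5 can fall short.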
 
 ### Architecture Anti-Patterns
@@ -75,30 +125,53 @@ Client → TaskManager → WorkLoadGenerate → SDKClientPool → Specific Colle
 ```
 Suitable for: Single collection workloads with static configuration.
 
-**Multi-Collection Optimized (Recommended):**
+**Multi-Collection Optimized (SharedClusterManager):**
 ```
-Client → TaskManager → WorkLoadTasks → SDKClientPool (Bucket-Level)
-                                   ↓
+Client → TaskManager → WorkLoadTasks → SDKClientPool → SharedClusterManager
+                                                 ↓
+                                            Single Cluster per Server
+                                                 ↓
                             Dynamic Collection Switching per Worker
-                                   ↓
-                         Worker cycles through multiple collections
+                                                 ↓
+                                         Worker cycles through collections
 ```
-Suitable for: Large-scale multi-collection loading (hundreds/thousands of collections).
+Suitable for: Large-scale multi-collection loading (hundreds to thousands of collections) with optimized connection management.
 
-**Batched Multi-Collection:**
+**Batched Multi-Collection (CollectionLoadBatcher):**
 ```
-Client → TaskManager → BatchManager → WorkLoadGenerate (per batch)
-                         ↓
-                    60 workers load 60 collections concurrently
-                         ↓
-                    Next batch starts after completion
+Client → CollectionLoadBatcher → (Batch 1: 50 collections)
+                               → WorkLoadGenerate per collection
+                               → Progress Tracking
+                               → (Batch 2: 50 collections) after completion
 ```
-Suitable for: Very large collections (1000+) with controlled resource usage.
+Suitable for: Very large numbers of collections (1000+), where sequential Python calls would cause worker starvation. Uses SharedClusterManager internally for connection optimization.
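
The batched flow above (a batch of collections loads concurrently, and the next batch starts only after the current one completes) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `BatchedCollectionLoadSketch`, `partition`, `loadInBatches`, and the `loader` callback (a stand-in for the per-collection `WorkLoadGenerate` step) are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class BatchedCollectionLoadSketch {
    /** Splits the collection names into consecutive batches of at most batchSize. */
    static List<List<String>> partition(List<String> collections, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < collections.size(); i += batchSize) {
            int end = Math.min(i + batchSize, collections.size());
            batches.add(new ArrayList<>(collections.subList(i, end)));
        }
        return batches;
    }

    /** Loads one batch at a time on a fixed worker pool; returns the number of collections loaded. */
    static int loadInBatches(List<String> collections, int batchSize, int workers,
                             Consumer<String> loader) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        int loaded = 0;
        try {
            for (List<String> batch : partition(collections, batchSize)) {
                List<Callable<Void>> jobs = new ArrayList<>();
                for (String name : batch) {
                    jobs.add(() -> { loader.accept(name); return null; });
                }
                // invokeAll blocks until every job of this batch has finished,
                // so the next batch cannot start early
                for (Future<Void> done : pool.invokeAll(jobs)) {
                    done.get(); // surfaces a failure from the loader, if any
                    loaded++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("collection load failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return loaded;
    }
}
```

Keeping the worker count independent of the batch size matches the thread-pool guidance: a small pool can drain a batch of 50 without allocating workers per collection.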
 
 ### Key Performance Metrics to Monitor
+* **SharedClusterManager Metrics**:
+  - Cluster reference count and reuse rate
+  - KV connection utilization vs capacity (default: 500)
+  - Environment shutdown/recreation events
+  - Per-server cluster instance count
+* **CollectionLoadBatcher Metrics**:
+  - Active batch count and batch progress percentage
+  - Collections loaded per batch vs batch size (default: 50)
+  - Batch completion rate and queue depth
+  - Batch processor thread pool utilization
 * **Connection Pool Utilization**: Monitor KV connection count vs capacity
 * **Client Pool Efficiency**: Track client reuse rate vs new client creation
 * **Thread Wait Time**: Measure worker idle time waiting for tasks vs clients
 * **Task Queue Depth**: Monitor pending tasks in TaskManager
 * **Collection Throughput**: Track collections loaded per time unit
 * **Document Success Rate**: Monitor failedMutations and retry patterns
+
+### Hard Constraints Integration
+* **SharedClusterManager**: Must use `SharedClusterManager.getCluster(server)` and `releaseCluster(server)` for all multi-collection operations. Never create standalone Cluster instances for large-scale workloads.
+* **Environment Lifecycle**: Must follow proper ClusterEnvironment lifecycle - use shared environment with automatic recreation capability, never manually manage environment shutdown/reactivation.
+* **Batch Processing Threshold**: For workloads with 1000+ collections, use `CollectionLoadBatcher.submitToBatch()` instead of direct REST calls to prevent worker starvation.
+* **Thread Safety**: SharedClusterManager uses synchronized methods and volatile shutdown flag - ensure thread-safe access patterns when dealing with reference counting and environment state.
+* **Error Handling**: Always handle `AuthenticationFailureException` and cluster connection errors with proper logging and retries in both SharedClusterManager and CollectionLoadBatcher.
+
+### Build Verification
+```
+mvn clean compile package
+```
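
The reference-counting contract in the hard constraints above (one shared instance per server, acquired and released in pairs) can be illustrated with a small generic sketch. `SharedClusterManager` itself is not part of this diff, so the code below is not its implementation: `RefCountedPool`, `acquire`, `release`, and `refCount` are hypothetical names, and the resource type is left generic instead of using the Couchbase `Cluster` class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

final class RefCountedPool<R> {
    private static final class Entry<T> {
        final T resource;
        int refs;
        Entry(T resource) { this.resource = resource; }
    }

    private final Map<String, Entry<R>> entries = new HashMap<>();
    private final Function<String, R> opener;
    private final Consumer<R> closer;

    RefCountedPool(Function<String, R> opener, Consumer<R> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Returns the shared resource for the server, opening it on first use. */
    synchronized R acquire(String server) {
        Entry<R> entry = entries.get(server);
        if (entry == null) {
            entry = new Entry<>(opener.apply(server));
            entries.put(server, entry);
        }
        entry.refs++;
        return entry.resource;
    }

    /** Drops one reference; the resource is closed when the last reference goes away. */
    synchronized void release(String server) {
        Entry<R> entry = entries.get(server);
        if (entry == null) {
            return; // unmatched release: nothing to do
        }
        if (--entry.refs == 0) {
            entries.remove(server);
            closer.accept(entry.resource);
        }
    }

    /** Current number of outstanding references for the server (0 if none). */
    synchronized int refCount(String server) {
        Entry<R> entry = entries.get(server);
        return entry == null ? 0 : entry.refs;
    }
}
```

Callers would typically pair `acquire` with `release` in a `try`/`finally` block so that a failed operation cannot leak a reference and keep the shared instance open indefinitely.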
@@ -25,3 +25,8 @@ graph TD
   - Always include error handling for DocumentNotFound and DuplicateKey errors.
   - Ensure proper connection pooling and MongoClient management.
 * **Tone:** Technical, efficiency-focused, and precise.
+
+### Build Verification
+```
+mvn clean compile package
+```
@@ -11,6 +11,7 @@ This project uses specialized AI agents to maintain code quality and architectur
 ### Orchestration Logic
 * **If** the user asks for thread, doc_key. document generator related code -> **Handoff to:** `The Architect`.
 * **If** the user asks for Couchbase Sirius or REST based loader related code → **Handoff to:** `The CBRestLoader`.
+* **If** the user asks for batch processing, shared cluster management, or massive collection load optimization → **Handoff to:** `The CBRestLoader` with focus on `SharedClusterManager` and `CollectionLoadBatcher`.
 * **If** the user asks for Couchbase command line loader related code → **Handoff to:** `The CBCmdlineLoader`.
 * **If** the user asks for a Mongo related code → **Handoff to:** `The MongoCoder`.
 
 
@@ -0,0 +1,7 @@
+.PHONY: all rest_server
+
+all:
+	mvn clean compile package
+
+rest_server: all
+	java -cp ./target/magmadocloader/magmadocloader.jar RestServer.RestApplication --server.port=8080 --server.name="sirius_java_rest_loader"
@@ -0,0 +1,143 @@
+package RestServer;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+
+import org.springframework.http.HttpStatus;
+import org.springframework.http.ResponseEntity;
+
+/**
+ * CollectionLoadBatcher implements Java-side batching for massive collection loads.
+ * When Python calls doc_load() sequentially for many collections, this batches them
+ * to prevent worker starvation and queue overhead.
+ */
+public class CollectionLoadBatcher {
+    private static final Logger logger = LogManager.getLogger(CollectionLoadBatcher.class);
+    
+    private static final int BATCH_SIZE = 50;  // Number of collection loads tracked per batch
+    private static ExecutorService batchExecutor;
+    private static final Map<String, BatchState> batchStates = new ConcurrentHashMap<>();
+    private static final Object batchLock = new Object();
+    
+    static {
+        batchExecutor = Executors.newFixedThreadPool(5);  // 5 concurrent batch processors
+        logger.info("CollectionLoadBatcher initialized with batch size: " + BATCH_SIZE);
+    }
+    
+    public static class BatchState {
+        String batchId;
+        List<String> tasknames = new ArrayList<>();
+        int totalCollections;
+        int completedCollections;
+        long startTime;
+        
+        public BatchState(String batchId, int totalCollections) {
+            this.batchId = batchId;
+            this.totalCollections = totalCollections;
+            this.completedCollections = 0;
+            this.startTime = System.currentTimeMillis();
+        }
+        
+        public synchronized void addTask(String taskname) {
+            tasknames.add(taskname);
+            completedCollections++;
+        }
+        
+        public synchronized boolean isComplete() {
+            return completedCollections >= totalCollections;
+        }
+        
+        public synchronized double getProgress() {
+            return (double)completedCollections / totalCollections;
+        }
+    }
+    
+    /**
+     * Submit a collection load request to the batch processor
+     */
+    public static ResponseEntity<Map<String, Object>> submitToBatch(Map<String, Object> requestBody) {
+        try {
+            TaskRequest taskRequest = TaskRequest.fromJson(requestBody.toString());
+            
+            // Get current batch or create new one
+            String batchId = getCurrentBatchId();
+            BatchState batchState = batchStates.computeIfAbsent(batchId, k -> 
+                new BatchState(batchId, BATCH_SIZE));
+            
+            // Process the doc_load normally
+            ResponseEntity<Map<String, Object>> result = taskRequest.doc_load();
+            
+            // Add to batch
+            batchState.addTask(result.getBody().get("tasks").toString());
+            
+            // Check if batch is complete and start next batch
+            if (batchState.isComplete()) {
+                logger.info("Batch " + batchId + " complete (" + batchState.totalCollections + " collections)");
+                batchStates.remove(batchId);
+                
+                // Start processing next batch if there are pending loads
+                startNextBatch();
+            }
+            
+            return result;
+            
+        } catch (Exception e) {
+            Map<String, Object> body = new HashMap<>();
+            body.put("error", "Batch processing failed: " + e.getMessage());
+            body.put("status", false);
+            return new ResponseEntity<>(body, HttpStatus.INTERNAL_SERVER_ERROR);
+        }
+    }
+    
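+    private static synchronized String getCurrentBatchId() {
+        // Find current batch with capacity
+        for (Map.Entry<String, BatchState> entry : batchStates.entrySet()) {
+            if (!entry.getValue().isComplete()) {
+                return entry.getKey();
+            }
+        }
+        
+        // Create and register the new batch while still holding the lock, so that
+        // concurrent callers cannot each start a separate batch
+        String batchId = "batch_" + System.currentTimeMillis();
+        batchStates.put(batchId, new BatchState(batchId, BATCH_SIZE));
+        return batchId;
+    }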
+    
+    private static void startNextBatch() {
+        // Could implement proactive batch starting if needed
+        logger.debug("Ready for next batch of collection loads");
+    }
+    
+    public static void shutdown() {
+        if (batchExecutor != null) {
+            batchExecutor.shutdownNow();
+            logger.info("CollectionLoadBatcher shutdown complete");
+        }
+    }
+    
+    public static Map<String, Object> getStats() {
+        Map<String, Object> stats = new HashMap<>();
+        stats.put("active_batches", batchStates.size());
+        stats.put("total_capacity", BATCH_SIZE);
+        
+        List<String> batchProgress = new ArrayList<>();
+        for (Map.Entry<String, BatchState> entry : batchStates.entrySet()) {
+            BatchState state = entry.getValue();
+            batchProgress.add(String.format("%s: %.1f%% (%d/%d)", 
+                entry.getKey(), 
+                state.getProgress() * 100,
+                state.completedCollections,
+                state.totalCollections));
+        }
+        stats.put("batch_progress", batchProgress);
+        
+        return stats;
+    }
+}
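
In `submitToBatch()` the capacity check (`getCurrentBatchId()`) and the slot consumption (`addTask()`) are separate steps with `doc_load()` running in between, so concurrent requests could in principle place more than `BATCH_SIZE` loads into one batch. One possible way to make the assignment atomic is to reserve the slot and pick the batch id in a single synchronized step. The sketch below shows that idea under stated assumptions: `BatchSlotAssigner` and `reserveSlot` are hypothetical names, and the sequential `batch_N` ids differ from the timestamp-based ids used in the class above.

```java
final class BatchSlotAssigner {
    private final int batchSize;
    private long batchNumber = 0;
    private int usedSlots = 0;

    BatchSlotAssigner(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /**
     * Reserves one slot and returns the id of the batch that owns it.
     * The check and the reservation happen under one lock, so a batch
     * can never receive more than batchSize reservations.
     */
    synchronized String reserveSlot() {
        if (usedSlots == batchSize) {
            // Current batch is full: roll over to a fresh one
            batchNumber++;
            usedSlots = 0;
        }
        usedSlots++;
        return "batch_" + batchNumber;
    }
}
```

A caller would reserve the slot before invoking the load and record completion separately, which also keeps "submitted" and "completed" counts distinct.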