diff --git a/docs/source/tutorials/scaling.rst b/docs/source/tutorials/scaling.rst
Lines changed: 35 additions & 91 deletions
diff --git a/src/cpp/scaler/utility/pymod/indexed_queue.cpp b/src/cpp/scaler/utility/pymod/indexed_queue.cpp
Lines changed: 1 addition & 6 deletions
diff --git a/src/cpp/scaler/utility/pymod/one_to_many_dict.cpp b/src/cpp/scaler/utility/pymod/one_to_many_dict.cpp
Lines changed: 3 additions & 13 deletions
diff --git a/src/cpp/scaler/utility/pymod/stable_priority_queue.cpp b/src/cpp/scaler/utility/pymod/stable_priority_queue.cpp
Lines changed: 6 additions & 0 deletions
diff --git a/src/cpp/scaler/uv_ymq/CMakeLists.txt b/src/cpp/scaler/uv_ymq/CMakeLists.txt
Lines changed: 4 additions & 12 deletions
diff --git a/src/cpp/scaler/uv_ymq/address.cpp b/src/cpp/scaler/uv_ymq/address.cpp
Lines changed: 23 additions & 8 deletions
diff --git a/src/cpp/scaler/uv_ymq/address.h b/src/cpp/scaler/uv_ymq/address.h
Lines changed: 5 additions & 0 deletions
@@ -11,6 +11,8 @@ The scaling system consists of two main components:
 1. **Scaling Controller**: A policy that monitors task queues and worker availability to make scaling decisions.
 2. **Worker Adapter**: A component that handles the actual creation and destruction of worker groups (e.g., starting containers, launching processes).
 
+The Scaling Controller runs within the Scheduler and communicates with Worker Adapters via Cap'n Proto messages. Worker Adapters connect to the Scheduler and receive scaling commands directly.
+
 The scaling policy is configured via the ``policy_content`` setting in the scheduler configuration:
 
 .. code:: bash
@@ -72,8 +74,7 @@ This policy is straightforward and works well for homogeneous workloads where al
 .. code:: bash
 
     scaler_scheduler tcp://127.0.0.1:8516 \
-        --policy-content "allocate=even_load; scaling=vanilla" \
-        --adapter-webhook-urls "http://localhost:8080/webhook"
+        --policy-content "allocate=even_load; scaling=vanilla"
 
 
 Capability Scaling (``capability``)
@@ -88,6 +89,7 @@ The capability scaling controller is designed for heterogeneous workloads where
 * Scales worker groups per capability set independently
 * Ensures tasks are matched to workers that can handle them
 * Prevents scaling down the last worker group capable of handling pending tasks
+* Prevents thrashing by checking if scale-down would immediately trigger scale-up
 
 **How It Works:**
 
@@ -107,8 +109,7 @@ The capability scaling controller is designed for heterogeneous workloads where
 .. code:: bash
 
     scaler_scheduler tcp://127.0.0.1:8516 \
-        --policy-content "allocate=capability; scaling=capability" \
-        --adapter-webhook-urls "http://localhost:8080/webhook"
+        --policy-content "allocate=capability; scaling=capability"
 
 **Example Scenario:**
 
@@ -138,127 +139,69 @@ With the capability scaling policy:
 3. Idle GPU workers can be shut down without affecting CPU task processing.
 
 
-**Worker Adapter Integration:**
-
-The capability scaling controller communicates with the worker adapter via HTTP webhooks. When requesting a new worker group, it includes the required capabilities:
-
-.. code:: json
-
-    {
-        "action": "start_worker_group",
-        "capabilities": {"gpu": 1}
-    }
-
-The worker adapter should provision workers with the requested capabilities and return:
-
-.. code:: json
-
-    {
-        "worker_group_id": "group-abc123",
-        "worker_ids": ["worker-1", "worker-2"],
-        "capabilities": {"gpu": 1}
-    }
-
-
 Fixed Elastic Scaling (``fixed_elastic``)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The fixed elastic scaling controller supports hybrid scaling with two worker adapters:
+The fixed elastic scaling controller supports hybrid scaling with multiple worker adapters:
 
-* **Primary Adapter**: Limited number of worker groups (e.g., on-premise resources)
-* **Secondary Adapter**: Overflow capacity (e.g., cloud burst)
+* **Primary Adapter**: A single worker group (identified by ``max_worker_groups == 1``) that starts once and never shuts down
+* **Secondary Adapter**: Elastic capacity (``max_worker_groups > 1``) that scales based on demand
 
-This is useful for scenarios where you have a fixed pool of dedicated resources but want to burst to cloud resources during peak demand.
+This is useful for scenarios where you have a fixed pool of dedicated resources but want to burst to additional resources during peak demand.
 
 .. code:: bash
 
     scaler_scheduler tcp://127.0.0.1:8516 \
-        --policy-content "allocate=even_load; scaling=fixed_elastic" \
-        --adapter-webhook-urls "http://localhost:8080/primary" "http://localhost:8081/secondary"
+        --policy-content "allocate=even_load; scaling=fixed_elastic"
 
 **Behavior:**
 
-* New worker groups are created from the primary adapter until its limit is reached
-* Once primary is at capacity, new groups are created from the secondary adapter
-* When scaling down, secondary adapter groups are shut down first
+* The primary adapter's worker group is started once and never shut down
+* Secondary adapter groups are created when demand exceeds primary capacity
+* When scaling down, only secondary adapter groups are shut down
 
 
 Worker Adapter Protocol
 -----------------------
 
-Scaling controllers communicate with worker adapters via HTTP POST requests to a webhook URL. The adapter must implement the following actions:
-
-**Get Adapter Info:**
-
-Request:
-
-.. code:: json
-
-    {"action": "get_worker_adapter_info"}
-
-Response:
-
-.. code:: json
-
-    {
-        "max_worker_groups": 10
-    }
-
-**Start Worker Group:**
-
-Request:
-
-.. code:: json
-
-    {
-        "action": "start_worker_group",
-        "capabilities": {"gpu": 1}
-    }
-
-Response (success - HTTP 200):
-
-.. code:: json
-
-    {
-        "worker_group_id": "group-abc123",
-        "worker_ids": ["worker-1", "worker-2"],
-        "capabilities": {"gpu": 1}
-    }
-
-Response (capacity exceeded - HTTP 429):
+Scaling controllers, running within the scheduler process, communicate with worker adapters using Cap'n Proto messages through the connection that worker adapters use to communicate with the scheduler. The protocol uses the following message types:
 
-.. code:: json
+**WorkerAdapterHeartbeat (Adapter -> Scheduler):**
 
-    {"error": "Capacity exceeded"}
+Worker adapters periodically send heartbeats to the scheduler containing their capacity information:
 
-**Shutdown Worker Group:**
+* ``max_worker_groups``: Maximum number of worker groups this adapter can manage
+* ``workers_per_group``: Number of workers in each group
+* ``capabilities``: Default capabilities for workers from this adapter
 
-Request:
+**WorkerAdapterCommand (Scheduler -> Adapter):**
 
-.. code:: json
+The scheduler sends commands to worker adapters:
 
-    {
-        "action": "shutdown_worker_group",
-        "worker_group_id": "group-abc123"
-    }
+* ``StartWorkerGroup``: Request to start a new worker group
 
-Response (success - HTTP 200):
+  * ``worker_group_id``: Empty for new groups (adapter assigns ID)
+  * ``capabilities``: Required capabilities for the worker group
 
-.. code:: json
+* ``ShutdownWorkerGroup``: Request to shut down an existing worker group
 
-    {"status": "shutdown"}
+  * ``worker_group_id``: ID of the group to shut down
 
-Response (not found - HTTP 404):
+**WorkerAdapterCommandResponse (Adapter -> Scheduler):**
 
-.. code:: json
+Worker adapters respond to commands with status and details:
 
-    {"error": "Worker group not found"}
+* ``worker_group_id``: ID of the affected worker group
+* ``command``: The command type this response is for
+* ``status``: Result status (``Success``, ``WorkerGroupTooMuch``, ``WorkerGroupIDNotFound``)
+* ``worker_ids``: List of worker IDs in the group (for start commands)
+* ``capabilities``: Actual capabilities of the started workers
 
 
 Example Worker Adapter
 ----------------------
 
-Here is an example of a simple worker adapter using the ECS (Amazon Elastic Container Service) integration:
+Here is an example of a worker adapter using the ECS (Amazon Elastic Container Service) integration:
 
 .. literalinclude:: ../../../src/scaler/worker_adapter/ecs.py
    :language: python
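
The command/response exchange described in the "Worker Adapter Protocol" section above can be illustrated with a minimal in-memory sketch. It is not Scaler's adapter API: the real messages are Cap'n Proto structures, while this sketch uses plain dataclasses. The names ``InMemoryAdapter``, ``CommandResponse``, and ``Status``, the generated group ID format, and the merging of default with requested capabilities are all assumptions made for illustration. Only the field names and the three status values come from the documentation.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    # Status values named in the protocol documentation.
    SUCCESS = "Success"
    WORKER_GROUP_TOO_MUCH = "WorkerGroupTooMuch"
    WORKER_GROUP_ID_NOT_FOUND = "WorkerGroupIDNotFound"


@dataclass
class CommandResponse:
    # Hypothetical stand-in for WorkerAdapterCommandResponse.
    worker_group_id: str
    command: str
    status: Status
    worker_ids: List[str] = field(default_factory=list)
    capabilities: Dict[str, int] = field(default_factory=dict)


class InMemoryAdapter:
    """Hypothetical adapter that tracks worker groups in a dict."""

    def __init__(self, max_worker_groups: int, workers_per_group: int,
                 capabilities: Dict[str, int]) -> None:
        self.max_worker_groups = max_worker_groups
        self.workers_per_group = workers_per_group
        self.capabilities = capabilities
        self._groups: Dict[str, List[str]] = {}

    def heartbeat(self) -> Dict[str, object]:
        # Mirrors the capacity fields carried by WorkerAdapterHeartbeat.
        return {
            "max_worker_groups": self.max_worker_groups,
            "workers_per_group": self.workers_per_group,
            "capabilities": self.capabilities,
        }

    def start_worker_group(self, capabilities: Dict[str, int]) -> CommandResponse:
        if len(self._groups) >= self.max_worker_groups:
            return CommandResponse("", "StartWorkerGroup", Status.WORKER_GROUP_TOO_MUCH)
        # The adapter assigns the ID; the ID format here is an assumption.
        group_id = f"group-{uuid.uuid4().hex[:8]}"
        worker_ids = [f"{group_id}-worker-{i}" for i in range(self.workers_per_group)]
        self._groups[group_id] = worker_ids
        # Assumption: requested capabilities override the adapter defaults.
        actual = {**self.capabilities, **capabilities}
        return CommandResponse(group_id, "StartWorkerGroup", Status.SUCCESS,
                               worker_ids, actual)

    def shutdown_worker_group(self, worker_group_id: str) -> CommandResponse:
        if worker_group_id not in self._groups:
            return CommandResponse(worker_group_id, "ShutdownWorkerGroup",
                                   Status.WORKER_GROUP_ID_NOT_FOUND)
        del self._groups[worker_group_id]
        return CommandResponse(worker_group_id, "ShutdownWorkerGroup", Status.SUCCESS)
```

With ``max_worker_groups=1``, a second ``start_worker_group`` call returns ``WorkerGroupTooMuch``, which takes over the role of the HTTP 429 response in the removed webhook protocol. Shutting down an unknown ID returns ``WorkerGroupIDNotFound``, replacing the former HTTP 404.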
@@ -276,3 +219,4 @@ Tips
 
 3. **Monitor scaling events**: Use Scaler's monitoring tools (``scaler_top``) to observe scaling behavior and tune policies.
 
+4. **Worker Adapter Placement**: Run worker adapters on machines that can provision the required resources (e.g., run the ECS adapter where it has AWS credentials, run the native adapter on the target machine).
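
The ``fixed_elastic`` rules documented above (an adapter reporting ``max_worker_groups == 1`` is primary, one reporting ``max_worker_groups > 1`` is secondary, and only secondary groups are ever shut down) can be sketched as two small helpers. The function names, the adapter IDs, and the dictionary shapes are hypothetical and not taken from the Scaler source. This models the documented rules only, not the controller's actual code.

```python
from typing import Dict, List, Tuple


def split_adapters(max_groups_by_adapter: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Classify adapters by the max_worker_groups value from their heartbeat."""
    primary = sorted(a for a, n in max_groups_by_adapter.items() if n == 1)
    secondary = sorted(a for a, n in max_groups_by_adapter.items() if n > 1)
    return primary, secondary


def shutdown_candidates(group_owner: Dict[str, str], primary: List[str]) -> List[str]:
    """Return worker groups that may be shut down: those not owned by a primary adapter."""
    return sorted(g for g, owner in group_owner.items() if owner not in primary)
```

For example, with heartbeats reporting ``{"onprem": 1, "cloud": 8}``, ``onprem`` is classified as primary, and its single group never appears among the shutdown candidates.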
@@ -77,13 +77,8 @@ static PyObject* PyIndexedQueueRemove(PyIndexedQueue* self, PyObject* args)
     Py_RETURN_NONE;
 }
 
-static int PyIndexedQueueContains(PyObject* self, PyObject* args)
+static int PyIndexedQueueContains(PyObject* self, PyObject* item)
 {
-    PyObject* item {};
-    if (!PyArg_Parse(args, "O", &item)) {
-        return -1;
-    }
-
     return ((PyIndexedQueue*)self)->queue.contains(OwnedPyObject<>::fromBorrowed(item));
 }
 
 
@@ -255,24 +255,13 @@ static PyObject* PyOneToManyDictItems(PyOneToManyDict* self, PyObject* args)
     return itemList.take();
 }
 
-// called when using the 'in' operator (__contains__)
-static PyObject* PyOneToManyDictContains(PyOneToManyDict* self, PyObject* args)
+static int PyOneToManyDictContains(PyObject* self, PyObject* key)
 {
-    PyObject* key {};
-    if (!PyArg_ParseTuple(args, "O", &key)) {
-        return nullptr;  // Invalid arguments
-    }
-
-    if (self->dict.hasKey(OwnedPyObject<>::fromBorrowed(key))) {
-        Py_RETURN_TRUE;
-    } else {
-        Py_RETURN_FALSE;
-    }
+    return ((PyOneToManyDict*)self)->dict.hasKey(OwnedPyObject<>::fromBorrowed(key));
 }
 
 // Define the methods for the OneToManyDict Python class
 static PyMethodDef PyOneToManyDictMethods[] = {
-    {"__contains__", (PyCFunction)PyOneToManyDictContains, METH_VARARGS, "__contains__ method"},
     {"keys", (PyCFunction)PyOneToManyDictKeys, METH_VARARGS, "Get Keys from the dictionary"},
     {"values", (PyCFunction)PyOneToManyDictValues, METH_VARARGS, "Get Values from the dictionary"},
     {"items", (PyCFunction)PyOneToManyDictItems, METH_VARARGS, "Get Items from the dictionary"},
@@ -310,6 +299,7 @@ static PyType_Slot PyOneToManyDictSlots[] = {
     {Py_tp_new, (void*)PyOneToManyDictNew},
     {Py_tp_methods, PyOneToManyDictMethods},
     {Py_tp_iter, (void*)PyOneToManyDictIteratorIter},
+    {Py_sq_contains, (void*)PyOneToManyDictContains},
     {0, nullptr},
 };
 
 
@@ -132,12 +132,18 @@ static Py_ssize_t PyStablePriorityQueueSize(PyObject* self)
     return ((PyStablePriorityQueue*)self)->queue.size();
 }
 
+static int PyStablePriorityQueueContains(PyObject* self, PyObject* item)
+{
+    return ((PyStablePriorityQueue*)self)->queue._locator.count(OwnedPyObject<>::fromBorrowed(item)) > 0;
+}
+
 static PyType_Slot PyStablePriorityQueueSlots[] = {
     {Py_tp_new, (void*)PyStablePriorityQueueNew},
     {Py_tp_init, (void*)PyStablePriorityQueueInit},
     {Py_tp_dealloc, (void*)PyStablePriorityQueueDealloc},
     {Py_tp_methods, (void*)PyStablePriorityQueueMethods},
     {Py_sq_length, (void*)PyStablePriorityQueueSize},
+    {Py_sq_contains, (void*)PyStablePriorityQueueContains},
     {Py_tp_doc, (void*)"StablePriorityQueue"},
     {0, nullptr},
 };
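
Registering ``Py_sq_contains`` is what lets Python evaluate ``item in queue`` against the C++ container. The behavior can be modeled in pure Python, assuming (as the ``_locator.count(...)`` call above suggests) that the queue keeps a map from each item to its entry. The class below is a hypothetical model written for illustration; its name, its ``put``/``get`` methods, and its internals are not taken from Scaler's C++ implementation.

```python
import heapq
import itertools


class StablePriorityQueueModel:
    """Hypothetical pure-Python model of a stable priority queue with a locator map."""

    def __init__(self):
        self._heap = []                    # entries: (priority, insertion_order, item)
        self._locator = {}                 # item -> entry, used for membership tests
        self._counter = itertools.count()  # tie-breaker: equal priorities stay FIFO

    def put(self, priority, item):
        # Assumes each item is queued at most once.
        entry = (priority, next(self._counter), item)
        self._locator[item] = entry
        heapq.heappush(self._heap, entry)

    def get(self):
        priority, _, item = heapq.heappop(self._heap)
        del self._locator[item]
        return priority, item

    def __len__(self):
        # Python-level counterpart of the Py_sq_length slot.
        return len(self._heap)

    def __contains__(self, item):
        # Python-level counterpart of the Py_sq_contains slot.
        return item in self._locator
```

In this model the membership test is a dictionary lookup rather than a scan of the heap, which is the same idea as counting the item in the locator on the C++ side.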
 
@@ -1,25 +1,17 @@
 add_library(uv_ymq_objs OBJECT
-    accept_server.h
-    accept_server.cpp
-
     address.h
     address.cpp
 
     binder_socket.h
     binder_socket.cpp
 
-    connect_client.h
-    connect_client.cpp
-
     connector_socket.h
     connector_socket.cpp
 
-    event_loop_thread.h
-    event_loop_thread.cpp
-
     io_context.h
     io_context.cpp
-
-    message_connection.h
-    message_connection.cpp
 )
+
+add_subdirectory(internal)
+add_subdirectory(future)
+add_subdirectory(sync)
@@ -1,6 +1,7 @@
 #include "scaler/uv_ymq/address.h"
 
 #include <cassert>
+#include <utility>
 
 namespace scaler {
 namespace uv_ymq {
@@ -72,21 +73,35 @@ const scaler::wrapper::uv::SocketAddress& Address::asTCP() const noexcept
 
 const std::string& Address::asIPC() const noexcept
 {
-    assert(type() == Type::TCP);
+    assert(type() == Type::IPC);
     return std::get<std::string>(_value);
 }
 
-std::expected<Address, scaler::ymq::Error> Address::fromString(const std::string& address) noexcept
+std::expected<std::string, scaler::ymq::Error> Address::toString() const noexcept
 {
-    static constexpr std::string_view tcpPrefix = "tcp://";
-    static constexpr std::string_view ipcPrefix = "ipc://";
+    switch (type()) {
+        case Type::TCP: {
+            auto tcpAddrStr = asTCP().toString();
+            if (!tcpAddrStr.has_value()) {
+                return std::unexpected {scaler::ymq::Error {
+                    scaler::ymq::Error::ErrorCode::InvalidAddressFormat, "Failed to convert TCP address to string"}};
+            }
+
+            return std::string(_tcpPrefix) + tcpAddrStr.value();
+        }
+        case Type::IPC: return std::string(_ipcPrefix) + asIPC();
+        default: std::unreachable();
+    };
+}
 
-    if (address.starts_with(tcpPrefix)) {
-        return details::fromTCPString(address.substr(tcpPrefix.size()));
+std::expected<Address, scaler::ymq::Error> Address::fromString(const std::string& address) noexcept
+{
+    if (address.starts_with(_tcpPrefix)) {
+        return details::fromTCPString(address.substr(_tcpPrefix.size()));
     }
 
-    if (address.starts_with(ipcPrefix)) {
-        return Address(address.substr(ipcPrefix.size()));
+    if (address.starts_with(_ipcPrefix)) {
+        return Address(address.substr(_ipcPrefix.size()));
     }
 
     return std::unexpected {scaler::ymq::Error {
 
@@ -34,6 +34,8 @@ class Address {
 
     const std::string& asIPC() const noexcept;
 
+    std::expected<std::string, scaler::ymq::Error> toString() const noexcept;
+
     // Try to parse a string to an Address instance.
     //
     // Example of string values are:
@@ -45,6 +47,9 @@ class Address {
     static std::expected<Address, scaler::ymq::Error> fromString(const std::string& address) noexcept;
 
 private:
+    static constexpr std::string_view _tcpPrefix = "tcp://";
+    static constexpr std::string_view _ipcPrefix = "ipc://";
+
     std::variant<scaler::wrapper::uv::SocketAddress, std::string> _value;
 };
`89`	`84`