docs/source/tutorials/scaling.rst
35 additions & 91 deletions
@@ -11,6 +11,8 @@ The scaling system consists of two main components:
 1. **Scaling Controller**: A policy that monitors task queues and worker availability to make scaling decisions.
 2. **Worker Adapter**: A component that handles the actual creation and destruction of worker groups (e.g., starting containers, launching processes).

+The Scaling Controller runs within the Scheduler and communicates with Worker Adapters via Cap'n Proto messages. Worker Adapters connect to the Scheduler and receive scaling commands directly.
+
 The scaling policy is configured via the ``policy_content`` setting in the scheduler configuration:

 .. code:: bash
@@ -72,8 +74,7 @@ This policy is straightforward and works well for homogeneous workloads where al
@@ -138,127 +139,69 @@ With the capability scaling policy:
 3. Idle GPU workers can be shut down without affecting CPU task processing.


-**Worker Adapter Integration:**
-
-The capability scaling controller communicates with the worker adapter via HTTP webhooks. When requesting a new worker group, it includes the required capabilities:
-
-.. code:: json
-
-    {
-        "action": "start_worker_group",
-        "capabilities": {"gpu": 1}
-    }
-
-The worker adapter should provision workers with the requested capabilities and return:
-
-.. code:: json
-
-    {
-        "worker_group_id": "group-abc123",
-        "worker_ids": ["worker-1", "worker-2"],
-        "capabilities": {"gpu": 1}
-    }
-
-
 Fixed Elastic Scaling (``fixed_elastic``)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The fixed elastic scaling controller supports hybrid scaling with two worker adapters:
+The fixed elastic scaling controller supports hybrid scaling with multiple worker adapters:

-* **Primary Adapter**: Limited number of worker groups (e.g., on-premise resources)
* New worker groups are created from the primary adapter until its limit is reached
-* Once primary is at capacity, new groups are created from the secondary adapter
-* When scaling down, secondary adapter groups are shut down first
+* The primary adapter's worker group is started once and never shut down
+* Secondary adapter groups are created when demand exceeds primary capacity
+* When scaling down, only secondary adapter groups are shut down


 Worker Adapter Protocol
 -----------------------

-Scaling controllers communicate with worker adapters via HTTP POST requests to a webhook URL. The adapter must implement the following actions:
-
-**Get Adapter Info:**
-
-Request:
-
-.. code:: json
-
-    {"action": "get_worker_adapter_info"}
-
-Response:
-
-.. code:: json
-
-    {
-        "max_worker_groups": 10
-    }
-
-**Start Worker Group:**
-
-Request:
-
-.. code:: json
-
-    {
-        "action": "start_worker_group",
-        "capabilities": {"gpu": 1}
-    }
-
-Response (success - HTTP 200):
-
-.. code:: json
-
-    {
-        "worker_group_id": "group-abc123",
-        "worker_ids": ["worker-1", "worker-2"],
-        "capabilities": {"gpu": 1}
-    }
-
-Response (capacity exceeded - HTTP 429):
+Scaling controllers, running within the scheduler process, communicate with worker adapters using Cap'n Proto messages through the connection that worker adapters use to communicate with the scheduler. The protocol uses the following message types:
3. **Monitor scaling events**: Use Scaler's monitoring tools (``scaler_top``) to observe scaling behavior and tune policies.

+4. **Worker Adapter Placement**: Run worker adapters on machines that can provision the required resources (e.g., run the ECS adapter where it has AWS credentials, run the native adapter on the target machine).
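
The revised ``fixed_elastic`` bullets in this diff (the primary group is started once and never shut down, secondary groups are created when demand exceeds primary capacity, and only secondary groups are shut down) can be illustrated with a minimal Python sketch. This is not Scaler's implementation: ``FixedElasticState``, ``plan``, the command tuples, and the capacity parameters are all hypothetical names chosen for illustration, and demand is assumed to be measured in pending tasks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical state holder; not part of Scaler's API.
@dataclass
class FixedElasticState:
    primary_started: bool = False
    secondary_groups: List[str] = field(default_factory=list)
    next_id: int = 0


def plan(state: FixedElasticState, demand: int, primary_capacity: int,
         group_capacity: int) -> List[Tuple[str, str]]:
    """Return (action, target) commands for one scaling decision.

    demand: pending tasks (assumed unit).
    primary_capacity: tasks the primary group can absorb (assumption).
    group_capacity: tasks one secondary group can absorb; must be > 0.
    """
    commands: List[Tuple[str, str]] = []

    # The primary adapter's worker group is started once and never shut down.
    if not state.primary_started:
        state.primary_started = True
        commands.append(("start", "primary"))

    # Secondary groups cover only the demand the primary cannot absorb.
    overflow = max(0, demand - primary_capacity)
    wanted = -(-overflow // group_capacity)  # ceiling division

    # Scale up: create secondary groups until the overflow is covered.
    while len(state.secondary_groups) < wanted:
        group_id = f"secondary-{state.next_id}"
        state.next_id += 1
        state.secondary_groups.append(group_id)
        commands.append(("start", group_id))

    # Scale down: only secondary groups are ever shut down.
    while len(state.secondary_groups) > wanted:
        commands.append(("shutdown", state.secondary_groups.pop()))

    return commands
```

Calling ``plan`` repeatedly with falling demand eventually shuts down every secondary group, while the primary group is never the target of a shutdown command.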