Commits (67)
4506147 Implement ORB Worker Adapter (magniloquency, Feb 10, 2026)
783c10c Add submit_tasks example to documentation and CI skip list (magniloquency, Feb 10, 2026)
9133f52 Merge branch 'main' into orb (magniloquency, Feb 11, 2026)
a1ff776 move orb/ and ami/ to driver/ (magniloquency, Feb 11, 2026)
a46af96 remove superfluous checks in orb config (magniloquency, Feb 11, 2026)
e8ae3fc move import (magniloquency, Feb 11, 2026)
2312d66 add a check for self._orb before returning machines (magniloquency, Feb 11, 2026)
89d964b adjust security group rules (magniloquency, Feb 11, 2026)
248f2f7 move method (magniloquency, Feb 11, 2026)
0333ee7 rename no random worker ids to deterministic worker ids (magniloquency, Feb 11, 2026)
ce17847 add comment (magniloquency, Feb 11, 2026)
2796ead run submit tasks in ci (magniloquency, Feb 11, 2026)
389a7b6 fix help text (magniloquency, Feb 11, 2026)
9eae58d make _filter_data a static method (magniloquency, Feb 11, 2026)
d333899 refactor orb worker adapter polling to use constants (magniloquency, Feb 11, 2026)
562ec6a flake8 (magniloquency, Feb 11, 2026)
7a61c0e don't touch skip examples file (magniloquency, Feb 11, 2026)
594342d Merge branch 'main' of https://github.com/finos/opengris-scaler into orb (magniloquency, Feb 12, 2026)
350e840 bump minor version (magniloquency, Feb 12, 2026)
8ad0a61 Merge branch 'main' into orb (magniloquency, Feb 13, 2026)
a2d3856 docs: add worker adapter tutorials and update ORB integration details (magniloquency, Feb 13, 2026)
0329716 Merge branch 'orb' of https://github.com/magniloquency/scaler into orb (magniloquency, Feb 13, 2026)
c50fddb refactor: move orb and ami drivers to src/scaler/drivers (magniloquency, Feb 13, 2026)
90f608a Delete test.toml (magniloquency, Feb 13, 2026)
711e9e9 Merge branch 'main' into orb (magniloquency, Feb 18, 2026)
f789743 Refactor ORB worker adapter to use ZMQ-based protocol (magniloquency, Feb 18, 2026)
287cd7c fix type error (magniloquency, Feb 18, 2026)
1235fae fix type error (magniloquency, Feb 18, 2026)
8fda46c Output ORB templates in snake_case (magniloquency, Feb 20, 2026)
5c5a76e Simplify ORB driver config files (magniloquency, Feb 20, 2026)
2130431 Document default no-scaling policy and vanilla scaler example in work… (magniloquency, Feb 20, 2026)
2b54a3a Merge branch 'main' into orb (sharpener6, Feb 23, 2026)
4956370 Merge branch 'main' into orb (sharpener6, Feb 23, 2026)
31de2fd import documentation changes from #574 (magniloquency, Feb 24, 2026)
8d893fe Merge branch 'main' into orb (magniloquency, Feb 25, 2026)
759f21a Merge main into orb, adopting worker_manager naming convention (magniloquency, Mar 4, 2026)
2928bef Refactor ORB worker adapter to worker_manager_adapter naming and impr… (magniloquency, Mar 4, 2026)
d31f8ad Refactor ORB config handling and simplify worker manager (magniloquency, Mar 4, 2026)
830d35f Rename run_worker_adapter_orb to run_worker_manager_orb (magniloquency, Mar 4, 2026)
da6338c Fix ORB template missing instance_types and broken region injection (magniloquency, Mar 4, 2026)
940f0d0 Use subnet_ids list field instead of subnet_id in ORB template (magniloquency, Mar 4, 2026)
ef54411 Add name field to ORBMachine to fix TypeError on deserialization (magniloquency, Mar 4, 2026)
e90f85c Filter unknown keys when deserializing ORBMachine from dict (magniloquency, Mar 4, 2026)
15f3c42 Fix duplicate commands sent to adapter while previous command is in-f… (magniloquency, Mar 4, 2026)
1e931e1 Merge origin/main into orb, resolving conflicts (magniloquency, Mar 13, 2026)
31af0c7 upgrade to orb 1.2 (magniloquency, Mar 13, 2026)
2e591d8 Migrate ORB worker manager from CLI subprocess to Python SDK (magniloquency, Mar 13, 2026)
1d50e13 Remove unused ORBMachine and ORBRequest types (magniloquency, Mar 13, 2026)
9948bf6 Fix WorkerAdapterConfig -> WorkerManagerConfig rename (magniloquency, Mar 13, 2026)
342735e Work around ORB SDK app_config timing bug by writing temp config file (magniloquency, Mar 13, 2026)
983589f Use ORB_CONFIG_DIR env var to inject config into ORB singleton (magniloquency, Mar 13, 2026)
c086599 Add template_id, image_id, provider_api to configuration dict for ORB… (magniloquency, Mar 13, 2026)
721acc5 Switch ORB storage from sql to json (SQLQueryBuilder is abstract in i… (magniloquency, Mar 13, 2026)
b2235e7 Monkey-patch ORB TemplateRepositoryImpl.get_by_id to accept plain str (magniloquency, Mar 13, 2026)
380831d Fix ORB 1.2.2 missing add() method on TemplateRepositoryImpl (magniloquency, Mar 13, 2026)
67d1710 Patch Template.get_domain_events/clear_domain_events missing in ORB 1… (magniloquency, Mar 13, 2026)
d77718c Merge origin/main into orb, resolving conflicts (magniloquency, Mar 16, 2026)
071d4bc Upgrade ORB dependency to 1.3 and adopt context-manager SDK API (magniloquency, Mar 17, 2026)
feee439 Update ORB worker manager adapter for post-WorkerGroup protocol (magniloquency, Mar 17, 2026)
e785ca9 Add opengris-scaler 1.15.0 AMI and move packer files to orb adapter d… (magniloquency, Mar 17, 2026)
af01940 Fix ORB create_template call to use flat kwargs instead of nested con… (magniloquency, Mar 17, 2026)
89e01a7 Add validate_template call and logging after create_template in ORB s… (magniloquency, Mar 17, 2026)
1c3821e Rename _worker_groups to _workers in ORB worker adapter (magniloquency, Mar 17, 2026)
6b69334 Remove hardcoded --num-of-workers from ORB cluster launch script (magniloquency, Mar 17, 2026)
949348e Remove inaccurate worker ID tracking comment from ORB cluster launch … (magniloquency, Mar 17, 2026)
321ed08 Remove hardcoded attribute metadata from ORB create_template call (magniloquency, Mar 17, 2026)
90633b7 Merge main into orb branch (magniloquency, Mar 17, 2026)
6 changes: 6 additions & 0 deletions .github/actions/run-test/action.yml
@@ -55,7 +55,13 @@ runs:
run: |
uv pip install --system -r examples/applications/requirements_applications.txt
uv pip install --system -r examples/ray_compat/requirements.txt
readarray -t skip_examples < examples/skip_examples.txt
for example in "./examples"/*.py; do
filename=$(basename "$example")
if [[ " ${skip_examples[*]} " =~ [[:space:]]"$filename"[[:space:]] ]]; then
echo "Skipping $example"
continue
fi
echo "Running $example"
python $example
done
3 changes: 3 additions & 0 deletions .gitignore
@@ -34,3 +34,6 @@ CMakeFiles/
.vs/
src/scaler/protocol/capnp/*.c++
src/scaler/protocol/capnp/*.h

orb/logs/
orb/metrics/
32 changes: 27 additions & 5 deletions README.md
@@ -250,6 +250,8 @@ The following table maps each Scaler command to its corresponding section name i
| `scaler_worker_adapter_native` | `[native_worker_adapter]` |
| `scaler_worker_adapter_fixed_native` | `[fixed_native_worker_adapter]` |
| `scaler_worker_adapter_symphony` | `[symphony_worker_adapter]` |
| `scaler_worker_adapter_orb` | `[orb_worker_adapter]` |
| `scaler_worker_adapter_ecs` | `[ecs_worker_adapter]` |

### Practical Scenarios & Examples

@@ -381,7 +383,7 @@ might be added in the future.
A Scaler scheduler can interface with IBM Spectrum Symphony to provide distributed computing across Symphony clusters.

```bash
$ scaler_worker_adapter_symphony tcp://127.0.0.1:2345 --service-name ScalerService --base-concurrency 4 --host 127.0.0.1 --port 8080
$ scaler_worker_adapter_symphony tcp://127.0.0.1:2345 --service-name ScalerService --base-concurrency 4 --adapter-web-host 127.0.0.1 --adapter-web-port 8080
```

This will start a Scaler worker that connects to the Scaler scheduler at `tcp://127.0.0.1:2345` and uses the Symphony
@@ -466,6 +468,26 @@ where `deepest_nesting_level` is the deepest nesting level a task has in your wo
workload that has
a base task that calls a nested task that calls another nested task, then the deepest nesting level is 2.
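The nesting-level count described above can be sketched in plain Python. This is not Scaler API, just an illustration of how the depth of a call graph is measured; the task names and graph are made up for the example:

```python
# Illustrative only: compute the deepest nesting level of a workload,
# where each task may call further nested tasks.
def deepest_nesting_level(call_graph: dict, task: str, level: int = 0) -> int:
    children = call_graph.get(task, [])
    if not children:
        return level
    # Depth of a task is one more than the deepest of its nested tasks.
    return max(deepest_nesting_level(call_graph, child, level + 1) for child in children)

# A base task that calls a nested task that calls another nested task:
calls = {"base": ["nested"], "nested": ["inner"], "inner": []}
print(deepest_nesting_level(calls, "base"))  # 2
```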

## ORB (AWS EC2) integration

A Scaler scheduler can interface with ORB (Open Resource Broker) to dynamically provision and manage workers on AWS EC2 instances.

```bash
$ scaler_worker_adapter_orb tcp://127.0.0.1:2345 --adapter-web-host 0.0.0.0 --adapter-web-port 8080 --image-id ami-0528819f94f4f5fa5
```

This will start an ORB worker adapter that connects to the Scaler scheduler at `tcp://127.0.0.1:2345`. The scheduler can then request new workers from this adapter, which will be launched as EC2 instances.

### Configuration

The ORB adapter requires `orb-py` and `boto3` to be installed. You can install them with:

```bash
$ pip install "opengris-scaler[orb]"
```

For more details on configuring ORB, including AWS credentials and instance templates, please refer to the [ORB Worker Adapter documentation](https://finos.github.io/opengris-scaler/tutorials/worker_adapters/orb.html).

## Worker Adapter usage

> **Note**: This feature is experimental and may change in future releases.
@@ -481,13 +503,13 @@ specification [here](https://github.com/finos/opengris).
Starting a Native Worker Adapter server at `http://127.0.0.1:8080`:

```bash
$ scaler_worker_adapter_native tcp://127.0.0.1:2345 --host 127.0.0.1 --port 8080
$ scaler_worker_adapter_native tcp://127.0.0.1:2345 --adapter-web-host 127.0.0.1 --adapter-web-port 8080
```

Pass the `--adapter-webhook-url` argument to the Scaler scheduler to connect to the Worker Adapter:
Pass the `--adapter-webhook-urls` argument to the Scaler scheduler to connect to the Worker Adapter:

```bash
$ scaler_scheduler tcp://127.0.0.1:2345 --adapter-webhook-url http://127.0.0.1:8080
$ scaler_scheduler tcp://127.0.0.1:2345 --adapter-webhook-urls http://127.0.0.1:8080
```

To check that the Worker Adapter is working, you can bring up `scaler_top` to see workers spawning and terminating as
@@ -567,7 +589,7 @@ W|Linux|15943|a7fe8b5e+ 0.0% 30.7m 0.0% 28.3m 1000 0 0 |
`scaler_ui` provides a web monitoring interface for Scaler.

```bash
$ scaler_ui tcp://127.0.0.1:2347 --port 8081
$ scaler_ui tcp://127.0.0.1:2347 --web-port 8081
```

This will open a web server on port `8081`.
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -73,3 +73,5 @@


# -- Extension configuration -------------------------------------------------

autosectionlabel_prefix_document = True
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -27,6 +27,7 @@ Content
tutorials/quickstart
tutorials/features
tutorials/scaling
tutorials/worker_adapters/index
tutorials/compatibility/ray
tutorials/configuration
tutorials/examples
2 changes: 1 addition & 1 deletion docs/source/tutorials/compatibility/ray.rst
@@ -6,7 +6,7 @@ Ray Compatibility Layer
Scaler is a lightweight distributed computation engine similar to Ray. Scaler supports many of the same concepts as Ray including
remote functions (known as tasks in Scaler), futures, cluster object storage, labels (known as capabilities in Scaler), and it comes with comparable monitoring tools.

Unlike Ray, Scaler supports both local clusters and also easily integrates with multiple cloud providers out of the box, including AWS EC2 and IBM Symphony,
Unlike Ray, Scaler supports both local clusters and also easily integrates with multiple cloud providers out of the box, including ORB (AWS EC2) and IBM Symphony,
with more providers planned for the future. You can view our `roadmap on GitHub <https://github.com/finos/opengris-scaler/discussions/333>`_
for details on upcoming cloud integrations.

9 changes: 9 additions & 0 deletions docs/source/tutorials/configuration.rst
@@ -140,6 +140,13 @@ Or through the programmatic API:
death_timeout_seconds=300,
)

Worker Adapter Settings
-----------------------

Worker adapters share many common configuration settings for networking, worker behavior, and logging.

For a full list of these settings, see the :doc:`Worker Adapter Common Parameters <worker_adapters/common_parameters>` documentation.

Configuring with TOML Files
---------------------------

@@ -192,6 +199,8 @@ The following table maps each Scaler command to its corresponding section name i
- ``[fixed_native_worker_adapter]``
* - ``scaler_worker_adapter_symphony``
- ``[symphony_worker_adapter]``
* - ``scaler_worker_adapter_orb``
- ``[orb_worker_adapter]``
* - ``scaler_worker_adapter_ecs``
- ``[ecs_worker_adapter]``

8 changes: 8 additions & 0 deletions docs/source/tutorials/examples.rst
@@ -15,6 +15,14 @@ Shows how to send a basic task to scheduler
.. literalinclude:: ../../../examples/simple_client.py
:language: python

Submit Tasks
~~~~~~~~~~~~

Shows various ways to submit tasks (submit, map, starmap)

.. literalinclude:: ../../../examples/submit_tasks.py
:language: python

Client Mapping Tasks
~~~~~~~~~~~~~~~~~~~~

45 changes: 45 additions & 0 deletions docs/source/tutorials/worker_adapters/common_parameters.rst
@@ -0,0 +1,45 @@
Common Worker Adapter Parameters
================================

All worker adapters in Scaler share a set of common configuration parameters for connecting to the cluster, configuring the internal web server, and managing worker behavior.

.. note::
For more details on how to configure Scaler, see the :doc:`../configuration` section.

Worker Adapter Common Configuration
-----------------------------------

* ``scheduler_address`` (Positional, Required): The address of the scheduler that workers should connect to (e.g., ``tcp://127.0.0.1:8516``).
* ``--max-workers`` (``-mw``): Maximum number of workers that can be started (default: ``CPU_COUNT - 1``). Set to ``-1`` for no limit.
* ``--object-storage-address`` (``-osa``): Address of the object storage server (e.g., ``tcp://127.0.0.1:8517``).

Web Configuration
-----------------

Each worker adapter runs an internal HTTP server that the scheduler uses to send scaling requests.

* ``--adapter-web-host`` (Required): Host address for the worker adapter HTTP server (e.g., ``0.0.0.0``).
* ``--adapter-web-port`` (``-p``, Required): Port for the worker adapter HTTP server.
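The wire protocol between the scheduler and this HTTP server is internal to Scaler and not specified here. The sketch below is purely illustrative, using the Python standard library, a made-up ``/scale`` endpoint, and an assumed JSON body, only to show the role the adapter web server plays (receive a scaling request, acknowledge it):

```python
# Purely illustrative: NOT the real Scaler adapter protocol. The "/scale"
# endpoint name and the JSON payload shape are assumptions for this demo.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class DemoAdapterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the (hypothetical) scaling request sent by the scheduler.
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"acknowledged": True, "requested": request}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# Port 0 asks the OS for an ephemeral port, like --adapter-web-port would be set.
server = HTTPServer(("127.0.0.1", 0), DemoAdapterHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Play the scheduler's role: POST a scaling request to the adapter.
url = f"http://127.0.0.1:{server.server_port}/scale"
payload = json.dumps({"workers": 2}).encode()
reply = json.loads(urlopen(Request(url, data=payload)).read())
server.shutdown()
print(reply)  # {'acknowledged': True, 'requested': {'workers': 2}}
```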

Worker Configuration (Passed to Workers)
----------------------------------------

These parameters are passed to the individual worker processes started by the adapter.

* ``--per-worker-task-queue-size`` (``-wtqs``): Set the task queue size per worker (default: 1000).
* ``--heartbeat-interval-seconds`` (``-his``): The interval at which workers send heartbeats to the scheduler (default: 2).
* ``--task-timeout-seconds`` (``-tts``): Number of seconds before a task is considered timed out. 0 means no timeout (default: 0).
* ``--death-timeout-seconds`` (``-dts``): Number of seconds before a worker is considered dead if no heartbeat is received (default: 300).
* ``--garbage-collect-interval-seconds`` (``-gc``): Interval at which the worker runs its garbage collector (default: 30).
* ``--trim-memory-threshold-bytes`` (``-tm``): Threshold for trimming libc's memory (default: 1073741824, i.e., 1GB).
* ``--hard-processor-suspend`` (``-hps``): When set, suspends worker processors using the SIGTSTP signal instead of a synchronization event.
* ``--worker-io-threads`` (``-wit``): Set the number of IO threads for the IO backend per worker (default: 1).
* ``--per-worker-capabilities`` (``-pwc``): A comma-separated list of capabilities provided by the workers (e.g., ``"linux,cpu=4"``).

Logging and Event Loop
----------------------

* ``--event-loop`` (``-el``): Select the event loop type (e.g., ``builtin``, ``uvloop``).
* ``--logging-level`` (``-ll``): Set the logging level (e.g., ``DEBUG``, ``INFO``, ``WARNING``, ``ERROR``, ``CRITICAL``).
* ``--logging-paths`` (``-lp``): List of paths where the logs should be written. Defaults to ``/dev/stdout``.
* ``--logging-config-file`` (``-lcf``): Path to a Python logging configuration file (``.conf`` format).
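``--logging-config-file`` accepts the standard Python ``logging.config.fileConfig`` format. A minimal sketch (the handler, level, and format choices below are illustrative, not Scaler defaults):

.. code-block:: ini

   [loggers]
   keys=root

   [handlers]
   keys=console

   [formatters]
   keys=simple

   [logger_root]
   level=INFO
   handlers=console

   [handler_console]
   class=StreamHandler
   level=INFO
   formatter=simple
   args=(sys.stdout,)

   [formatter_simple]
   format=%(asctime)s [%(levelname)s] %(message)s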
73 changes: 73 additions & 0 deletions docs/source/tutorials/worker_adapters/fixed_native.rst
@@ -0,0 +1,73 @@
Fixed Native Worker Adapter
===========================

The Fixed Native worker adapter spawns a fixed number of worker subprocesses at startup. Unlike other worker adapters, it does **not** support dynamic scaling. It is useful for environments where you want a static pool of workers to be available immediately and do not want the system to dynamically adjust the number of processes.

Getting Started
---------------

To start the Fixed Native worker adapter, use the ``scaler_worker_adapter_fixed_native`` command.

Example command:

.. code-block:: bash

scaler_worker_adapter_fixed_native tcp://<SCHEDULER_IP>:8516 \
--adapter-web-host 0.0.0.0 \
--adapter-web-port 8080 \
--max-workers 8 \
--logging-level INFO \
--task-timeout-seconds 60

Equivalent configuration using a TOML file:

.. code-block:: bash

scaler_worker_adapter_fixed_native tcp://<SCHEDULER_IP>:8516 --config config.toml

.. code-block:: toml

# config.toml

[fixed_native_worker_adapter]
adapter_web_host = "0.0.0.0"
adapter_web_port = 8080
max_workers = 8
logging_level = "INFO"
task_timeout_seconds = 60

* ``tcp://<SCHEDULER_IP>:8516`` is the address workers will use to connect to the scheduler.
* The adapter HTTP server (webhook) will listen on port 8080.
* The adapter will immediately spawn 8 worker subprocesses at startup and maintain them.

How it Works
------------

Upon startup, the Fixed Native worker adapter spawns the number of workers specified by ``--max-workers``. It reports its capacity to the scheduler as 0 to prevent the scheduler from attempting to scale it up or down dynamically.

If a worker process terminates, the adapter does not automatically restart it (in the current implementation).

Integration with Cluster Classes
--------------------------------

The Fixed Native worker adapter is the underlying component used by the high-level ``Cluster`` and ``SchedulerClusterCombo`` classes. When you use ``SchedulerClusterCombo(n_workers=N)``, Scaler starts a ``Cluster`` process, which in turn uses a ``FixedNativeWorkerAdapter`` to spawn and manage ``N`` local worker subprocesses.

Supported Parameters
--------------------

.. note::
For more details on how to configure Scaler, see the :doc:`../configuration` section.

The Fixed Native worker adapter supports the following specific configuration parameters in addition to the common worker adapter parameters.

Fixed Native Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``--max-workers`` (``-mw``): The exact number of worker subprocesses to spawn at startup. Must be a non-negative integer.
* ``--preload``: Python module or script to preload in each worker process before it starts accepting tasks.
* ``--worker-io-threads`` (``-wit``): Number of IO threads for the IO backend per worker (default: ``1``).

Common Parameters
~~~~~~~~~~~~~~~~~

For a full list of common parameters including networking, worker configuration, and logging, see :doc:`common_parameters`.
40 changes: 40 additions & 0 deletions docs/source/tutorials/worker_adapters/index.rst
@@ -0,0 +1,40 @@
Worker Adapters
===============

Worker Adapters are components in Scaler that handle the actual provisioning and destruction of worker resources. They act as the bridge between Scaler's scaling policies and various infrastructure providers (e.g., local processes, cloud instances, container orchestrators).

.. note::
For more details on how to configure Scaler, see the :doc:`../configuration` section.

.. toctree::
:maxdepth: 1

native
fixed_native
orb
common_parameters

Adapter Overviews
-----------------

Scaler provides several worker adapters to support different execution environments.

Native
~~~~~~

The :doc:`Native <native>` worker adapter allows Scaler to dynamically provision workers as local subprocesses on the same machine. It is the simplest way to scale workloads across multiple CPU cores locally and supports dynamic auto-scaling.

Fixed Native
~~~~~~~~~~~~

The :doc:`Fixed Native <fixed_native>` worker adapter spawns a static number of worker subprocesses at startup and does not support dynamic scaling. It is the underlying component used by the high-level ``Cluster`` and ``SchedulerClusterCombo`` classes.

ORB (AWS EC2)
~~~~~~~~~~~~~

The :doc:`ORB <orb>` worker adapter allows Scaler to dynamically provision workers on AWS EC2 instances. This is ideal for scaling workloads that require significant cloud compute resources or specialized hardware like GPUs.

Common Parameters
~~~~~~~~~~~~~~~~~

All worker adapters share a set of :doc:`common configuration parameters <common_parameters>` for networking, worker behavior, and logging.
70 changes: 70 additions & 0 deletions docs/source/tutorials/worker_adapters/native.rst
@@ -0,0 +1,70 @@
Native Worker Adapter
=====================

The Native worker adapter allows Scaler to dynamically provision workers as local subprocesses on the same machine where the adapter is running. This is the simplest way to scale Scaler workloads across multiple CPU cores on a single machine or a group of machines.

Getting Started
---------------

To start the Native worker adapter, use the ``scaler_worker_adapter_native`` command.

Example command:

.. code-block:: bash

scaler_worker_adapter_native tcp://<SCHEDULER_IP>:8516 \
--adapter-web-host 0.0.0.0 \
--adapter-web-port 8080 \
--max-workers 4 \
--logging-level INFO \
--task-timeout-seconds 60

Equivalent configuration using a TOML file:

.. code-block:: bash

scaler_worker_adapter_native tcp://<SCHEDULER_IP>:8516 --config config.toml

.. code-block:: toml

# config.toml

[native_worker_adapter]
adapter_web_host = "0.0.0.0"
adapter_web_port = 8080
max_workers = 4
logging_level = "INFO"
task_timeout_seconds = 60

* ``tcp://<SCHEDULER_IP>:8516`` is the address workers will use to connect to the scheduler.
* The adapter HTTP server (webhook) will listen on port 8080. The scheduler connects to this.
* The adapter can spawn up to 4 worker subprocesses.

How it Works
------------

When the scheduler determines that more capacity is needed, it sends a request to the Native worker adapter. The adapter then spawns a new worker process using the same Python interpreter and environment that started the adapter.

Each worker group managed by the Native adapter contains exactly one worker process.

Unlike the Fixed Native worker adapter, which spawns a static number of workers at startup, the Native worker adapter is designed to be used with Scaler's auto-scaling features to dynamically grow and shrink the local worker pool based on demand.

Supported Parameters
--------------------

.. note::
For more details on how to configure Scaler, see the :doc:`../configuration` section.

The Native worker adapter supports the following specific configuration parameters in addition to the common worker adapter parameters.

Native Configuration
~~~~~~~~~~~~~~~~~~~~

* ``--max-workers`` (``-mw``): Maximum number of worker subprocesses that can be started. Set to ``-1`` for no limit (default: ``-1``).
* ``--preload``: Python module or script to preload in each worker process before it starts accepting tasks.
* ``--worker-io-threads`` (``-wit``): Number of IO threads for the IO backend per worker (default: ``1``).

Common Parameters
~~~~~~~~~~~~~~~~~

For a full list of common parameters including networking, worker configuration, and logging, see :doc:`common_parameters`.