
Remove Cluster class#582

Open
magniloquency wants to merge 11 commits into finos:main from magniloquency:remove-cluster-class


@magniloquency
Contributor

@magniloquency magniloquency commented Mar 3, 2026

Summary

  • Removes the Cluster class (multiprocessing.Process wrapping FixedNativeWorkerAdapter) and its associated ClusterConfig, eliminating an unnecessary intermediate subprocess layer
  • SchedulerClusterCombo, tests, and examples updated to use FixedNativeWorkerAdapter directly; workers are now direct children of the combo process instead of grandchildren
  • scaler_cluster entry point and run_cluster.py preserved for backwards compatibility, redirecting to the fixed native worker adapter; --num-of-workers accepted as an alias for --max-workers
  • SIGINT/SIGTERM handling added to the fixed native adapter entry point
  • ECS adapter updated to use scaler_cluster with corrected parameters (--max-workers replaces --num-of-workers/--worker-names; worker IDs are no longer pre-announced since workers self-assign UUIDs)
    • Even before this PR, the cluster did not use the worker names to actually name the workers; it only used the list to control the number of workers spawned
  • Documentation updated: removed stale ClusterProcess/Worker[N] startup log output from quickstart, updated --num-of-workers references to canonical --max-workers, and corrected the Fixed Native adapter description
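The summary mentions adding SIGINT/SIGTERM handling to the fixed native adapter entry point. Below is a minimal sketch of that pattern under stated assumptions: FixedWorkerPool is a hypothetical stand-in for FixedNativeWorkerAdapter, and its start/shutdown methods are illustrative rather than the adapter's real API. Both signals only flip a flag, and the entry point tears its workers down in a finally block.

```python
import signal
import time
from typing import List


class FixedWorkerPool:
    """Hypothetical stand-in for FixedNativeWorkerAdapter: a fixed-size set of workers."""

    def __init__(self, max_workers: int) -> None:
        self.max_workers = max_workers
        self.workers: List[str] = []
        self.stopped = False

    def start(self) -> None:
        # Placeholder: a real adapter would spawn worker processes here,
        # as direct children of the current process.
        self.workers = [f"worker-{i}" for i in range(self.max_workers)]

    def shutdown(self) -> None:
        # Placeholder: a real adapter would terminate and join its workers here.
        self.workers.clear()
        self.stopped = True


class ShutdownFlag:
    """Plain flag set by the signal handler; avoids taking locks inside a handler."""

    def __init__(self) -> None:
        self.requested = False


def install_shutdown_handlers(flag: ShutdownFlag) -> None:
    """Route SIGINT and SIGTERM to the same flag (must run in the main thread)."""

    def _handler(signum: int, frame: object) -> None:
        flag.requested = True

    signal.signal(signal.SIGINT, _handler)
    signal.signal(signal.SIGTERM, _handler)


def main(max_workers: int, flag: ShutdownFlag, poll_interval: float = 0.1) -> FixedWorkerPool:
    pool = FixedWorkerPool(max_workers)
    install_shutdown_handlers(flag)
    pool.start()
    try:
        # Poll until a signal handler (or the caller) requests shutdown.
        while not flag.requested:
            time.sleep(poll_interval)
    finally:
        pool.shutdown()
    return pool
```

Polling a flag is used here instead of setting a threading.Event from the handler, because Event.set takes a lock that the interrupted main thread may already hold.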

Breaking Changes

  • --worker-names CLI flag dropped: The scaler_cluster entry point no longer accepts --worker-names. Passing it will now result in an unrecognized argument error. This flag was previously accepted but had no effect on actual worker naming (workers were identified by generated IDs regardless). Users relying on it should simply remove it from their invocations.
  • Cluster and ClusterConfig removed from public API: from scaler import Cluster and from scaler.config.section.cluster import ClusterConfig will no longer work. Use FixedNativeWorkerAdapter and FixedNativeWorkerAdapterConfig directly.
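The CLI side of these changes (accepting --num-of-workers as an alias for --max-workers and rejecting --worker-names) maps onto standard argparse behavior. The parser below is a hypothetical sketch; only the flag names come from this PR, not the actual scaler_cluster parser.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the flag names are taken from the PR description.
    parser = argparse.ArgumentParser(prog="scaler_cluster")
    parser.add_argument(
        "--max-workers",
        "--num-of-workers",  # legacy spelling accepted as an alias
        dest="max_workers",
        type=int,
        required=True,
    )
    # --worker-names is deliberately not registered, so argparse reports
    # "unrecognized arguments" and exits with status 2.
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Both spellings populate the same max_workers destination, so downstream code only ever sees one setting.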

ECS adapter: worker_ids removal

The old ECS adapter pre-announced worker IDs in the StartWorkerGroup response by generating random names (ECS|{uuid}) and computing IDs from them. However, ECS workers self-assign their own UUIDs when they connect and are never told which names to use, so the pre-announced IDs never matched any real worker in the scheduler's information_snapshot. As a result, the scheduler's load-based group selection on shutdown (which sums queued_tasks across the group's worker IDs) always saw 0 tasks per group, which made group selection effectively arbitrary.
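The effect described above can be reproduced with a small sketch. The dict-based snapshot and the function names are hypothetical simplifications of the scheduler's information_snapshot and selection logic. Summing queued tasks over IDs that never appear in the snapshot yields 0 for every group, so the least-loaded choice degenerates into a tie.

```python
from typing import Dict, List


def group_load(worker_ids: List[str], queued_tasks: Dict[str, int]) -> int:
    """Sum queued tasks over a group's workers; IDs missing from the snapshot count as 0."""
    return sum(queued_tasks.get(worker_id, 0) for worker_id in worker_ids)


def pick_group_to_shut_down(groups: Dict[str, List[str]], queued_tasks: Dict[str, int]) -> str:
    """Pick the least-loaded group; on a tie, min() returns the first group in dict order."""
    return min(groups, key=lambda group_id: group_load(groups[group_id], queued_tasks))


# Snapshot keyed by the UUIDs the workers assigned themselves (hypothetical values).
snapshot = {"uuid-a": 7, "uuid-b": 0}
# Groups holding pre-announced IDs that never match a connected worker.
phantom_groups = {"group-1": ["ECS|phantom-1"], "group-2": ["ECS|phantom-2"]}
```

With phantom_groups, both groups report a load of 0 and pick_group_to_shut_down returns "group-1" purely because it comes first.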

Sending worker_ids=[] is functionally equivalent to the old behavior for all three scaling policies (vanilla, fixed_elastic, capability_scaling). For CapabilityScalingController it is slightly more correct: workers_in_group = 0 accurately reflects that no phantom workers are subtracted from remaining_worker_count when evaluating shutdown safety guards. The underlying limitation, namely that the ECS adapter cannot do per-group load-informed shutdown selection, is inherent to the architecture and predates this PR.
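To illustrate how workers_in_group feeds into such a guard, here is a hypothetical sketch. The min_workers threshold and the function name are assumptions for illustration, not CapabilityScalingController's actual logic; only the relationship "remaining_worker_count = total - workers_in_group" is taken from the description above.

```python
def shutdown_is_safe(total_workers: int, workers_in_group: int, min_workers: int) -> bool:
    """Hypothetical guard: allow shutdown only if enough workers would remain."""
    remaining_worker_count = total_workers - workers_in_group
    return remaining_worker_count >= min_workers


# Old behavior: 3 pre-announced phantom IDs are subtracted even though
# none of them correspond to a connected worker.
old_result = shutdown_is_safe(total_workers=4, workers_in_group=3, min_workers=2)
# New behavior: worker_ids=[] means nothing phantom is subtracted.
new_result = shutdown_is_safe(total_workers=4, workers_in_group=0, min_workers=2)
```

Here old_result is False and new_result is True: the phantom count alone changes the guard's decision, which is the distortion removed by sending an empty list.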

Fixed native vs native

The fixed native (FN) and native adapters are quite similar, except that the FN adapter always spawns a fixed number of workers and does not expose any API for scaling. I think this makes the FN adapter incompatible with scaling controllers (managers?), and it also means that the FN entry point needs its own signal-handling logic.

We could refactor the FN adapter to be more similar to the native adapter; however, we have already discussed merging the two into one, so I will refrain from doing so, knowing that the FN adapter is likely to be replaced soon.

Cluster was a multiprocessing.Process subclass that wrapped
FixedNativeWorkerAdapter in a subprocess with an asyncio event loop
solely to handle signals. This intermediate process layer is removed;
workers are now direct children of SchedulerClusterCombo.

- Delete Cluster, ClusterConfig, and the cluster entry point module
- Redirect scaler_cluster and run_cluster.py to the fixed native
  worker adapter entry point (with --num-of-workers alias for compat)
- Add SIGINT/SIGTERM handling to the fixed native adapter entry point
- Update SchedulerClusterCombo, tests, and examples to use
  FixedNativeWorkerAdapter directly
- Update ECS adapter to use scaler_cluster with updated parameters
  (--max-workers replaces --num-of-workers/--worker-names; worker IDs
  are no longer pre-announced since workers self-assign UUIDs)
@magniloquency magniloquency force-pushed the remove-cluster-class branch from a9f8f10 to b4ebe01 March 3, 2026 01:39
Refactor fixed native entry point to accept a configurable section
name, so scaler_cluster reads [cluster] while
scaler_worker_adapter_fixed_native continues to read
[fixed_native_worker_adapter].
- Remove ClusterProcess/Worker startup console output from quickstart (no longer produced by FixedNativeWorkerAdapter)
- Update worker_adapters/index: Fixed Native is used by SchedulerClusterCombo, not Cluster
- Replace --num-of-workers/num_of_workers with canonical --max-workers/max_workers in configuration examples
@magniloquency magniloquency force-pushed the remove-cluster-class branch from fb59c21 to 575f361 March 3, 2026 15:30
@sharpener6
Collaborator

--worker-names was meant to let the cluster distinguish between different clusters. Let's say you start 2 clusters of workers locally and connect them to the scheduler: in scaler_top you can easily see which cluster each worker belongs to. Did you test that this works with multiple fixed native clusters on the same machine?

@magniloquency
Contributor Author

@sharpener6, the worker names aren't actually used for anything other than computing the number of workers. I'm not sure whether that was intended, but this PR doesn't lose any behavior: https://github.com/search?q=repo%3Afinos%2Fopengris-scaler+worker_names+path%3A%2F%5Esrc%5C%2Fscaler%5C%2Fcluster%5C%2Fcluster.py%2F&type=code

@magniloquency magniloquency marked this pull request as ready for review March 3, 2026 15:58
sharpener6 and others added 2 commits March 3, 2026 15:45
Entry points were renamed from worker_adapter_* to worker_manager_*,
and run_cluster.py now imports from the cluster entry point directly.
magniloquency and others added 4 commits March 5, 2026 20:22
9668545 accidentally reverted the task_capabilities.py changes from
b4ebe01, re-introducing Cluster/ClusterConfig which this branch removes.
Restore the FixedNativeWorkerAdapter-based version from b4ebe01.
