refactor(BA-5650): propagate owner_id through scheduler and sokovan [2/4]#11048
Closed
jopemachine wants to merge 17 commits into
This was referenced Apr 14, 2026
3b7a216 to 18a8ec2
07a5509 to 89df267
Pull request overview
This PR propagates the owner_id / main_access_key renames through the Sokovan scheduling stack by updating Sokovan data classes and all in-repo call sites that construct or consume them.
Changes:
- Rename Sokovan session/keypair identifiers across data models (access_key → main_access_key, user_uuid → owner_id) and update preparers/validators/provisioner/sequencers accordingly.
- Update scheduler lifecycle handlers / controller / launcher / post-processors to use the renamed fields for hooks, env injection, cache invalidation, and history recording.
- Adjust route health-record initialization logic and Prometheus preset label injection in deployment components.
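To make the two identifier scopes concrete, here is a minimal sketch of a session DTO after the rename. The class `SessionWorkloadSketch` and both helper functions are hypothetical; only the field names `main_access_key` and `owner_id` come from this PR. `main_access_key` is typed as optional on the assumption, suggested by the predicate changes later in this stack, that some users have no main keypair.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionWorkloadSketch:
    """Hypothetical, trimmed-down session DTO using the renamed fields."""

    session_id: uuid.UUID
    # Was ``access_key``: the owner's main keypair, used for keypair-scoped checks.
    main_access_key: Optional[str]
    # Was ``user_uuid``: the owning user, used for user-scoped checks.
    owner_id: uuid.UUID


def keypair_scope_key(workload: SessionWorkloadSketch) -> Optional[str]:
    """Key for keypair policy / concurrency lookups (None if there is no main keypair)."""
    return workload.main_access_key


def user_scope_key(workload: SessionWorkloadSketch) -> uuid.UUID:
    """Key for user policy / occupancy lookups."""
    return workload.owner_id
```

Keeping the two scopes as distinct fields makes it explicit at each call site whether a check is per-keypair or per-user.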
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/ai/backend/manager/sokovan/scheduling_controller/scheduling_controller.py | Pass main_access_key through enqueue hook dispatch/notify. |
| src/ai/backend/manager/sokovan/scheduling_controller/preparers/preparer.py | Populate prepared session data with main_access_key and owner_id. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/validators/user_resource_limit.py | Switch user policy/occupancy lookups to owner_id. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/validators/pending_session_resource_limit.py | Switch keypair policy/pending lookups to main_access_key. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/validators/pending_session_count_limit.py | Switch keypair policy/pending lookups to main_access_key. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/validators/keypair_resource_limit.py | Switch keypair policy/occupancy lookups to main_access_key. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/validators/concurrency.py | Switch concurrency policy/count lookups to main_access_key. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/sequencers/fair_share.py | Switch fairness grouping/sorting to owner_id. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/sequencers/drf.py | Switch DRF sequencing key to main_access_key. |
| src/ai/backend/manager/sokovan/scheduler/provisioner/provisioner.py | Update snapshot occupancy/concurrency accounting to main_access_key + owner_id. |
| src/ai/backend/manager/sokovan/scheduler/post_processors/cache_invalidation.py | Invalidate cache based on main_access_key rather than access_key. |
| src/ai/backend/manager/sokovan/scheduler/launcher/launcher.py | Update launcher logging/env/agent payload to use main_access_key + owner_id. |
| src/ai/backend/manager/sokovan/scheduler/handlers/maintenance/sweep_sessions.py | Use main_access_key when emitting session transition info. |
| src/ai/backend/manager/sokovan/scheduler/handlers/lifecycle/start_sessions.py | Source main_access_key from SessionDataForStart for transitions. |
| src/ai/backend/manager/sokovan/scheduler/handlers/lifecycle/schedule_sessions.py | Use main_access_key in transition info; allow access_key=None for skipped sessions. |
| src/ai/backend/manager/sokovan/scheduler/handlers/lifecycle/deprioritize_sessions.py | Resolve main_access_key via repository for cache invalidation consumer. |
| src/ai/backend/manager/sokovan/scheduler/handlers/lifecycle/check_precondition.py | Source main_access_key from SessionDataForPull for transitions. |
| src/ai/backend/manager/sokovan/scheduler/fair_share/aggregator.py | Thread owner_id into kernel usage record creation spec. |
| src/ai/backend/manager/sokovan/scheduler/coordinator.py | Resolve main_access_key for cache invalidation on promotion transitions. |
| src/ai/backend/manager/sokovan/deployment/route/handlers/observer/health_check.py | Remove warning when no Valkey records exist for checkable routes. |
| src/ai/backend/manager/sokovan/deployment/route/executor.py | Adjust route replica/health record initialization flow and initial delay computation. |
| src/ai/backend/manager/sokovan/deployment/route/coordinator.py | Remove RUNNING-transition Valkey running_at marking logic. |
| src/ai/backend/manager/sokovan/deployment/executor.py | Change Prometheus preset label injection for deployment-scoped querying. |
| src/ai/backend/manager/sokovan/data/workload.py | Rename workload identifiers to main_access_key + owner_id. |
| src/ai/backend/manager/sokovan/data/lifecycle.py | Rename lifecycle DTO fields to main_access_key + owner_id. |
| src/ai/backend/manager/sokovan/data/allocation.py | Rename allocation DTO field to main_access_key and thread through allocator. |
| changes/11048.misc.md | Changelog entry for the propagation refactor. |
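The table splits lookups by scope: user-level validators and the fair-share sequencer key on owner_id, while keypair-level validators and concurrency key on main_access_key. Below is a minimal sketch of how a provisioner snapshot could account for both scopes after placing a session. All class names are hypothetical, resources are collapsed to a single integer, and skipping keypair accounting when main_access_key is None is an assumption of the sketch.

```python
from __future__ import annotations

import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AllocatedSession:
    """Hypothetical record of a session the provisioner has just placed."""

    main_access_key: Optional[str]
    owner_id: uuid.UUID
    slots: int  # simplified: one scalar instead of a full resource slot


@dataclass
class OccupancySnapshot:
    """Hypothetical in-memory snapshot updated during one scheduling pass."""

    slots_by_keypair: Counter[str] = field(default_factory=Counter)
    slots_by_user: Counter[uuid.UUID] = field(default_factory=Counter)
    sessions_by_keypair: Counter[str] = field(default_factory=Counter)

    def record(self, session: AllocatedSession) -> None:
        # User-scoped occupancy is always keyed by the owner UUID.
        self.slots_by_user[session.owner_id] += session.slots
        # Keypair-scoped occupancy and concurrency need a main keypair;
        # skipping them when it is None is an assumption of this sketch.
        if session.main_access_key is not None:
            self.slots_by_keypair[session.main_access_key] += session.slots
            self.sessions_by_keypair[session.main_access_key] += 1
```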
18a8ec2 to 17d3c3a
89df267 to 203e3d2
jopemachine commented Apr 14, 2026
3ece8ba to 81cdac3
bd97fdd to 9d05db6
7ab7788 to e10c89d
a2e1502 to 350294e
``SessionRepository`` and the underlying ``SessionDBSource`` now take ``owner_id: UUID`` on every method that previously accepted ``owner_access_key: AccessKey``. Affects:

- ``get_session_validated``
- ``match_sessions``
- ``update_session_name``
- ``find_dependency_sessions`` / ``_find_dependent_sessions``
- ``get_target_session_ids``
- ``get_session_with_group``

The matching ``dependency_graph`` helpers and ``creators`` are updated in lockstep. Service-layer callers still pass ``owner_access_key`` temporarily; they will be migrated in the next slice.
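A minimal sketch of what scoping by ``owner_id`` means for a lookup such as ``match_sessions``. The in-memory repository, the ``SessionRecord`` type, and the ``id_or_name_prefix`` parameter are hypothetical; the real methods query the database and have their own signatures.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SessionRecord:
    id: uuid.UUID
    name: str
    owner_id: uuid.UUID


class InMemorySessionRepository:
    """Hypothetical stand-in for SessionRepository; only the owner_id scoping mirrors the PR."""

    def __init__(self, sessions: Iterable[SessionRecord]) -> None:
        self._sessions = list(sessions)

    def match_sessions(self, id_or_name_prefix: str, owner_id: uuid.UUID) -> list[SessionRecord]:
        # Before: filtered on owner_access_key (an AccessKey string).
        # After: filtered on owner_id, the owning user's UUID.
        return [
            s
            for s in self._sessions
            if s.owner_id == owner_id
            and (s.name.startswith(id_or_name_prefix) or str(s.id).startswith(id_or_name_prefix))
        ]
```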
Scheduler / predicates / scheduler-type collapse of the owner key:

- ``scheduler/predicates.py``: predicates now take SessionRow and resolve ``main_access_key`` from the owner via a helper when a keypair-scoped lookup (Redis concurrency, keypair resource policy) is required. Renames ``SessionRow.user_uuid`` references throughout.
- ``scheduler/drf.py``: per-user fairness tracking keyed by the ``owner_id``/``main_access_key`` pair.
- ``repositories/scheduler/options.py``: drop the duplicated ``by_access_key_*`` factories; session filters go through ``SessionConditions`` helpers instead.
- ``repositories/scheduler/types/*``: rename ``access_key`` to ``main_access_key`` on ``ScheduledSessionData``, ``TerminatingSessionData``, ``SweptSessionInfo``, ``KernelEnqueueData``, and ``SessionEnqueueData``.
- ``repositories/events/db_source/db_source.py`` and ``repositories/stream/db_source/db_source.py``: resolve the owner UUID from ``main_access_key`` via a sub-select shim while the schema still exposes ``sessions.access_key``.
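Dominant Resource Fairness (DRF) orders users by their dominant share: the largest fraction of any single resource they currently hold. Below is a minimal sketch of that ordering keyed by the ``owner_id``/``main_access_key`` pair. Resource names, capacities, and function names are illustrative, and the real ``scheduler/drf.py`` may differ in detail.

```python
from __future__ import annotations

import uuid
from collections import defaultdict
from typing import Iterable, Mapping, Optional

# Fairness key: the (owner_id, main_access_key) pair.
FairnessKey = tuple[uuid.UUID, Optional[str]]


def dominant_shares(
    running: Iterable[tuple[FairnessKey, Mapping[str, float]]],
    capacity: Mapping[str, float],
) -> dict[FairnessKey, float]:
    """Return each key's dominant share: max over resources of used / capacity."""
    used: dict[FairnessKey, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for key, slots in running:
        for resource, amount in slots.items():
            used[key][resource] += amount
    return {
        key: max(
            # Resources with missing or zero capacity are ignored.
            (amount / capacity[r] for r, amount in per_resource.items() if capacity.get(r)),
            default=0.0,
        )
        for key, per_resource in used.items()
    }


def pick_next(pending: Iterable[FairnessKey], shares: Mapping[FairnessKey, float]) -> FairnessKey:
    """Pick the pending key with the smallest dominant share (0.0 if nothing is running)."""
    return min(pending, key=lambda k: shares.get(k, 0.0))
```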
- repositories/scheduler/types/session.py: rename PendingSessionData.access_key -> main_access_key (and drop the outdated resolved-main_access_key comment).
- repositories/scheduler/db_source/db_source.py: update the PendingSessionData call site to use the main_access_key and owner_id keyword names matching the dataclass.
- scheduler/drf.py: use existing_sess.user_uuid (SessionRow stores the owner UUID there, not in owner_id).
- scheduler/predicates.py: guard every _resolve_main_access_key consumer (check_concurrency, check_keypair_resource_limit, check_pending_session_count_limit, check_pending_session_resource_limit) with an early return when main_ak is None, so that users with a NULL main_access_key don't fall through to keypair policy lookups that would match against NULL.
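The guard in predicates.py can be sketched as: resolve the owner's main keypair first, and return before any keypair-scoped lookup when it is None. The function name, the mapping-based inputs, and the choice to treat a missing key as a pass are assumptions for illustration; the source only states that each consumer returns early. The ``None`` entry in the test's limits mapping mimics a policy that would otherwise be matched on NULL.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PredicateResult:
    passed: bool
    message: str = ""


def check_concurrency_sketch(
    owner_id: uuid.UUID,
    main_keys: Mapping[uuid.UUID, Optional[str]],
    running: Mapping[str, int],
    limits: Mapping[Optional[str], int],
) -> PredicateResult:
    # Stand-in for _resolve_main_access_key: look up the owner's main keypair.
    main_ak = main_keys.get(owner_id)
    if main_ak is None:
        # Early return: a None key never reaches the keypair policy lookup.
        # Treating this case as a pass is an assumption of the sketch.
        return PredicateResult(True, "no main_access_key; keypair check skipped")
    limit = limits.get(main_ak)
    if limit is not None and running.get(main_ak, 0) >= limit:
        return PredicateResult(False, f"concurrency limit {limit} reached for {main_ak}")
    return PredicateResult(True)
```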
- repositories/scheduler/types/allocation.py: rename SessionAllocation's ``access_key`` field to ``main_access_key`` and drop the stale explanatory comment.
- repositories/stream/db_source/db_source.py: raise UserNotFound when no user owns the supplied ``main_access_key``; SessionNotFound was misleading since the failure is about the user lookup, not the session.
- Drop the leftover ``changes/BA-5650-D.misc.md`` file.
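A minimal sketch of the error choice in the stream db_source: the owner is resolved from ``main_access_key``, and a miss is reported as a failed user lookup. The mapping stands in for the users table, and the ``UserNotFound`` class defined here is a local placeholder, not the project's actual exception type.

```python
from __future__ import annotations

import uuid
from typing import Mapping


class UserNotFound(Exception):
    """Local placeholder for the repository's UserNotFound error."""


def resolve_owner_id(main_access_key: str, owner_by_main_key: Mapping[str, uuid.UUID]) -> uuid.UUID:
    """Map a main_access_key to its owning user's UUID.

    A miss means no user owns the key, so the error names the user lookup
    instead of reporting a missing session.
    """
    try:
        return owner_by_main_key[main_access_key]
    except KeyError:
        raise UserNotFound(f"no user owns main_access_key {main_access_key!r}") from None
```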
- Scheduler db_source and preparer now use ``main_access_key`` / ``owner_id`` on ``SessionEnqueueData``, ``KernelEnqueueData``, ``TerminatingSessionData``, ``SweptSessionInfo``, and ``ScheduledSessionData``.
- ``PendingSessions.owner_ids`` replaces ``user_uuids`` in db_source call sites.
- ``scheduling_controller`` reads ``session_spec.access_key`` (SessionCreationSpec keeps this name until the sokovan rename).
- ``PendingSessionData.to_session_workload`` maps ``main_access_key`` → ``SessionWorkload.access_key`` and ``owner_id`` → ``SessionWorkload.user_uuid`` so the bridge works until slice F renames the sokovan SessionWorkload fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
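The temporary ``to_session_workload`` bridge amounts to a field-by-field mapping from the renamed repository type onto the not-yet-renamed sokovan type. A sketch with both types cut down to three fields; the real classes carry many more, and the ``id``/``session_id`` field names are assumptions.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionWorkload:
    """Trimmed stand-in for the sokovan type before slice F (old field names)."""

    session_id: uuid.UUID
    access_key: Optional[str]
    user_uuid: uuid.UUID


@dataclass(frozen=True)
class PendingSessionData:
    """Trimmed stand-in for the repository type after the rename."""

    id: uuid.UUID
    main_access_key: Optional[str]
    owner_id: uuid.UUID

    def to_session_workload(self) -> SessionWorkload:
        # Bridge: new repository names map onto the old sokovan field names.
        return SessionWorkload(
            session_id=self.id,
            access_key=self.main_access_key,
            user_uuid=self.owner_id,
        )
```

Once the sokovan fields are renamed, the bridge collapses to same-name assignments.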
e10c89d to b3067da
Rename access_key -> main_access_key on sokovan data types
(SessionAllocation, PreparedSessionData, SessionDataForPull,
SessionDataForStart, SessionWorkload) and update every sokovan caller
accordingly. Affects:
- sokovan/data/{allocation,lifecycle,workload}.py
- sokovan/scheduler/handlers/lifecycle/*
- sokovan/scheduler/handlers/maintenance/sweep_sessions.py
- sokovan/scheduler/provisioner/{provisioner,sequencers,validators}/*
- sokovan/scheduler/launcher/launcher.py
- sokovan/scheduler/post_processors/cache_invalidation.py
- sokovan/scheduler/fair_share/aggregator.py
- sokovan/scheduling_controller/{preparers,scheduling_controller}.py
- sokovan/deployment/{executor,route}.py
No external behavior change.
…comments

- Revert deployment/executor.py, deployment/route/coordinator.py, deployment/route/executor.py, and deployment/route/handlers/observer/health_check.py to the parent branch state. Those edits pre-dated main and are unrelated to BA-5650; they slipped in when this slice pulled the whole sokovan/ subtree.
- sokovan/data/allocation.py: restore the explanatory comment above ``main_access_key`` and fix ``from_agent_selections`` to pass ``main_access_key=`` (matching the renamed field).
- sokovan/data/workload.py: restore the explanatory comment above ``SessionWorkload.main_access_key``.
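Why ``from_agent_selections`` had to change: once a dataclass field is renamed, any constructor call that still passes ``access_key=`` fails at runtime with a ``TypeError``. A sketch under assumed signatures; the parameters and the ``agent_ids`` field are hypothetical, since the real ``from_agent_selections`` signature does not appear on this page.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SessionAllocation:
    """Trimmed stand-in; only the main_access_key field name comes from the PR."""

    session_id: uuid.UUID
    # Main keypair of the session owner (previously ``access_key``).
    main_access_key: Optional[str]
    agent_ids: tuple[str, ...]

    @classmethod
    def from_agent_selections(
        cls,
        session_id: uuid.UUID,
        main_access_key: Optional[str],
        selected_agent_ids: Iterable[str],
    ) -> SessionAllocation:
        # The keyword must match the renamed field; ``access_key=`` would raise TypeError.
        return cls(
            session_id=session_id,
            main_access_key=main_access_key,
            agent_ids=tuple(selected_agent_ids),
        )
```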
…ss_keys

- Resolve merge conflicts from cascading onto the updated slice E (drf.py keeps ``existing_sess.user_uuid``; predicates.py keeps the ``main_ak is None`` early returns; scheduler types stay on ``main_access_key``; stream db_source keeps ``UserNotFound``).
- Add ``SchedulerRepository.resolve_main_access_keys`` and the matching ``ScheduleDBSource.resolve_main_access_keys`` so the coordinator's cache-invalidation step can look up each session's owner main_access_key in a single query. Required by the sokovan/scheduler/coordinator.py and lifecycle/deprioritize_sessions.py call sites that previously read ``session_info.metadata.access_key``.
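The point of ``resolve_main_access_keys`` is to replace per-session lookups with one batched call before cache invalidation. A sketch of the shape of that step, with two in-memory mappings standing in for the single sessions-to-users query; the function signature, the ``None`` fallback for unknown sessions or owners without a main keypair, and ``cache_keys_to_invalidate`` are assumptions.

```python
from __future__ import annotations

import uuid
from typing import Iterable, Mapping, Optional


def resolve_main_access_keys(
    session_ids: Iterable[uuid.UUID],
    session_owner: Mapping[uuid.UUID, uuid.UUID],
    user_main_key: Mapping[uuid.UUID, Optional[str]],
) -> dict[uuid.UUID, Optional[str]]:
    """Resolve each session's owner main_access_key in one batched pass.

    Unknown sessions and owners without a main keypair resolve to None.
    """
    result: dict[uuid.UUID, Optional[str]] = {}
    for sid in session_ids:
        owner = session_owner.get(sid)
        result[sid] = user_main_key.get(owner) if owner is not None else None
    return result


def cache_keys_to_invalidate(resolved: Mapping[uuid.UUID, Optional[str]]) -> set[str]:
    """Deduplicate keys so each keypair cache entry is invalidated once."""
    return {ak for ak in resolved.values() if ak is not None}
```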
- ``SessionWorkload`` now uses ``main_access_key`` / ``owner_id``; the ``PendingSessionData.to_session_workload`` bridge is updated accordingly.
- Scheduler db_source constructions of ``SessionDataForPull`` / ``SessionDataForStart`` use the new kwargs.
- ``cache_invalidation`` reverts to ``info.access_key`` (``SessionTransitionInfo`` still carries the old field at slice F).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
350294e to efa2a91
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Squashed into PR #11051: stacked PRs cannot pass CI independently with cross-cutting type changes.
Resolves #10911 (BA-5650)
Summary
- `owner_id` instead of separate `access_key`/`user_uuid`
- `owner_id` rename into sokovan scheduler handlers and coordinator
- `SessionMetadata` to remove `access_key` attribute

Test plan
- `pants check` passes on this slice

Stack
🤖 Generated with Claude Code