refactor(BA-5715): resolve owner_id via current_user() in services #11049
Closed
jopemachine wants to merge 10 commits into
Pull request overview
Refactors session service flows to stop threading owner_id/owner_access_key through most read/control actions and instead derive the caller identity from the current_user() context, while keeping delegation support for a limited set of creation actions.
Changes:
- Replace many `action.owner_*` usages in `SessionService` with `_requester_user_id()` (via `current_user()`), and add helper to resolve delegated owner main access key.
- Inject `UserRepository` into `SessionLifecycleManager` and wire it through the AgentRegistry DI graph.
- Update session action dataclasses and template override validation to use `owner_id` (UUID) instead of `owner_access_key`.
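A minimal sketch of the context-variable pattern described above, for readers unfamiliar with it. Only `_requester_user_id()`, `current_user()` and `SessionService` are names from this PR; `UserData.user_id`, `with_user()` and the `rename_session` signature are hypothetical stand-ins, and the real service methods are likely async.

```python
from __future__ import annotations

import uuid
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class UserData:
    """Hypothetical stand-in for the authenticated user record."""

    user_id: uuid.UUID


# Hypothetical context variable; the project keeps its own equivalent.
_current_user: ContextVar[Optional[UserData]] = ContextVar("current_user", default=None)


def current_user() -> Optional[UserData]:
    """Return the user bound to the current execution context, if any."""
    return _current_user.get()


@contextmanager
def with_user(user: UserData) -> Iterator[None]:
    """Bind a user for the duration of a request (assumed helper)."""
    token = _current_user.set(user)
    try:
        yield
    finally:
        _current_user.reset(token)


class SessionService:
    def _requester_user_id(self) -> uuid.UUID:
        # Read the caller identity from the context instead of the action.
        user = current_user()
        if user is None:
            raise RuntimeError("no authenticated user in the current context")
        return user.user_id

    def rename_session(self, session_name: str, new_name: str) -> tuple[uuid.UUID, str, str]:
        # Previously: owner_id = action.owner_id
        owner_id = self._requester_user_id()
        return (owner_id, session_name, new_name)
```

With this shape, read/control actions no longer need to carry an owner field at all; the request layer binds the user once and every service method sees the same identity.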
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/ai/backend/manager/services/session/types.py | Switch template override schema field from owner_access_key to owner_id. |
| src/ai/backend/manager/services/session/service.py | Derive requester owner_id from current_user(), add main-access-key resolver, and update many service methods accordingly. |
| src/ai/backend/manager/services/session/lifecycle.py | Constructor-inject UserRepository and use it to resolve main access key for hook notifications. |
| src/ai/backend/manager/services/session/actions/*.py | Remove owner_access_key from many read/control action dataclasses; change create actions to carry owner_id. |
| src/ai/backend/manager/repositories/model_serving/repository.py | Adjust SessionRow.get_session call signature; remove model-definition refresh logic during modify_endpoint. |
| src/ai/backend/manager/registry.py | Wire UserRepository into SessionLifecycleManager; update SessionRow.get_session calls to use owner_id. |
| src/ai/backend/manager/dependencies/agents/registry.py | Add UserRepository to AgentRegistry dependency input and provider wiring. |
| src/ai/backend/manager/dependencies/agents/composer.py | Instantiate and provide UserRepository in the DI composition for AgentRegistry. |
| changes/11049.misc.md | Changelog entry for the refactor. |
Comments suppressed due to low confidence (1)
src/ai/backend/manager/repositories/model_serving/repository.py:889
- This change removes the model-definition refresh behavior during endpoint modification: the new revision is no longer populated with a refreshed `model_definition`, so updates to `model-definition.yaml` in the vfolder won't be picked up. There is an existing regression test (`tests/unit/manager/repositories/model_serving/test_modify_endpoint_model_definition.py`) that asserts this behavior. If this behavior is still required, reintroduce the refresh (or move it elsewhere); if it's intentionally being removed, the test suite and any docs/semantics around `modify_endpoint` should be updated accordingly.
    # If revision-level fields changed, create a new revision
    if spec.has_revision_changes():
        current_rev = endpoint_row._find_current_revision()
        if current_rev is None:
            raise InvalidAPIParameters("Endpoint has no current revision")
        # Resolve image if changed
        image_id = current_rev.image
        image_ref = spec.image.optional_value()
        if image_ref is not None:
            image_name = image_ref.name
            arch = image_ref.architecture.value()
            resolved_image = await ImageRow.resolve(
                db_session,
                [ImageIdentifier(image_name, arch), ImageAlias(image_name)],
            )
            image_id = resolved_image.id
        # Merge resource_slots: spec overrides current revision
        merged_slots: ResourceSlot = (
            spec.resource_slots.optional_value()
            or ResourceSlot({
                r.slot_name: r.quantity for r in current_rev.resource_slot_rows
            })
        )
        # Merge revision fields: current revision as base, spec overrides
        new_revision = DeploymentRevisionRow(
            endpoint=endpoint_row.id,
            revision_number=0,  # placeholder, will be set atomically below
            image=image_id,
            model=current_rev.model,
            model_mount_destination=current_rev.model_mount_destination,
            model_definition_path=(
                spec.model_definition_path.optional_value()
                if spec.model_definition_path.optional_value() is not None
                else current_rev.model_definition_path
            ),
            resource_group=endpoint_row.resource_group,
            resource_opts=(
                spec.resource_opts.optional_value()
                if spec.resource_opts.optional_value() is not None
                else current_rev.resource_opts
            ),
            cluster_mode=(
                spec.cluster_mode.value().value
                if spec.cluster_mode.optional_value() is not None
                else current_rev.cluster_mode
            ),
            cluster_size=(
                spec.cluster_size.optional_value() or current_rev.cluster_size
            ),
            startup_command=current_rev.startup_command,
            bootstrap_script=current_rev.bootstrap_script,
            environ=(
                spec.environ.optional_value()
                if spec.environ.optional_value() is not None
                else current_rev.environ
            ),
            callback_url=current_rev.callback_url,
            runtime_variant=(
                spec.runtime_variant.optional_value() or current_rev.runtime_variant
            ),
            extra_mounts=list(current_rev.extra_mounts),
        )
Rename access_key -> main_access_key on sokovan data types
(SessionAllocation, PreparedSessionData, SessionDataForPull,
SessionDataForStart, SessionWorkload) and update every sokovan caller
accordingly. Affects:
- sokovan/data/{allocation,lifecycle,workload}.py
- sokovan/scheduler/handlers/lifecycle/*
- sokovan/scheduler/handlers/maintenance/sweep_sessions.py
- sokovan/scheduler/provisioner/{provisioner,sequencers,validators}/*
- sokovan/scheduler/launcher/launcher.py
- sokovan/scheduler/post_processors/cache_invalidation.py
- sokovan/scheduler/fair_share/aggregator.py
- sokovan/scheduling_controller/{preparers,scheduling_controller}.py
- sokovan/deployment/{executor,route}.py
No external behavior change.
- Drop ``owner_id`` from 21 read/control session action dataclasses;
keep it only on ``create_from_template`` / ``create_from_params`` /
``create_cluster`` (the three delegation-capable creation paths).
- ``services/session/service.py``: add ``_requester_user_id()`` that
pulls the caller's UUID from the ``current_user()`` context var,
replacing 21 ``owner_id = action.owner_id`` sites.
- ``services/session/lifecycle.py``: inject ``UserRepository`` via the
constructor instead of instantiating it internally.
- ``registry.py``: accept ``UserRepository`` and forward it into
``SessionLifecycleManager``.
- ``dependencies/agents/{registry,composer}.py``: wire the repository
through the dependency graph.
- ``repositories/user/{repository,db_source/db_source}.py``: add
``get_main_access_key_by_id`` (renamed from ``_by_uuid``).
- Minor adapter/service touch-ups in ``services/export`` and
``repositories/model_serving``.
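For reviewers, the constructor-injection change in ``services/session/lifecycle.py`` can be pictured roughly as below. This is a sketch under assumptions: ``get_main_access_key_by_id`` is the name from this PR, but its synchronous signature, the ``MainAccessKeySource`` protocol, ``InMemoryUserRepository`` and ``main_access_key_for`` are hypothetical and only illustrate the wiring.

```python
import uuid
from typing import Dict, Optional, Protocol


class MainAccessKeySource(Protocol):
    """Hypothetical protocol covering the one repository call used here."""

    def get_main_access_key_by_id(self, user_id: uuid.UUID) -> Optional[str]: ...


class InMemoryUserRepository:
    """Test double standing in for the real UserRepository."""

    def __init__(self, keys: Dict[uuid.UUID, str]) -> None:
        self._keys = dict(keys)

    def get_main_access_key_by_id(self, user_id: uuid.UUID) -> Optional[str]:
        return self._keys.get(user_id)


class SessionLifecycleManager:
    def __init__(self, user_repository: MainAccessKeySource) -> None:
        # The repository is supplied by the caller (the DI graph) rather
        # than being instantiated inside the manager.
        self._user_repository = user_repository

    def main_access_key_for(self, owner_id: uuid.UUID) -> Optional[str]:
        # Resolve the owner's main access key from the user UUID.
        return self._user_repository.get_main_access_key_by_id(owner_id)
```

Passing the dependency in is what lets tests substitute a stub (the later commit injects ``user_repository=MagicMock()`` for the same reason).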
- Revert stray changes in repositories/model_serving/repository.py (DeploymentStorageSource import / model-definition fetch removal were unrelated to BA-5650).
- services/session/service.py: _resolve_owner_main_access_key now uses the narrower UserRepository.get_main_access_key_by_id helper instead of loading the entire UserData record.
- services/session/lifecycle.py: guard the POST_START_SESSION hook payload against a None main_access_key by logging a warning and skipping the hook rather than passing None into plugins that expect an AccessKey.
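The None guard on the POST_START_SESSION hook could look roughly like the following. It is an illustrative sketch only: ``notify_post_start``, the ``dispatch`` callable and the payload keys are hypothetical, not the project's hook API.

```python
import logging
import uuid
from typing import Any, Callable, Dict, Optional

log = logging.getLogger(__name__)


def notify_post_start(
    session_id: uuid.UUID,
    main_access_key: Optional[str],
    dispatch: Callable[[str, Dict[str, Any]], None],
) -> bool:
    """Dispatch the hook only when a main access key is available.

    Returns True if the hook was dispatched, False if it was skipped.
    """
    if main_access_key is None:
        # Skip rather than hand None to plugins that expect an access key.
        log.warning("skipping POST_START_SESSION for %s: no main access key", session_id)
        return False
    dispatch(
        "POST_START_SESSION",
        {"session_id": session_id, "access_key": main_access_key},
    )
    return True
```

Skipping with a warning keeps a missing key from surfacing as a type error inside third-party plugins, at the cost of the hook silently not firing for that session.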
Clean up the ``changes/BA-5650-{D,E,F}.misc.md`` files that were superseded when each slice's news fragment was renamed to the assigned PR number (e.g. ``changes/11046.enhance.md``). These stragglers made it onto downstream branches during the cascade rebases.
Update remaining call sites in scheduler db_source and test fixtures to use the renamed dataclass fields:
- ``main_access_key`` (previously ``access_key``) on ``ScheduledSessionData``, ``SessionDataForPull``, ``SessionDataForStart``, ``SessionWorkload``, ``SweptSessionInfo``, ``TerminatingSessionData``, ``PendingSessionData``.
- ``owner_id`` (previously ``user_uuid``) on ``SessionEnqueueData``, ``KernelEnqueueData``, ``SessionDataForStart``, ``SessionWorkload``.
- ``pending_sessions.owner_ids`` (previously ``user_uuids``) on ``PendingSessions``.

Joined ``users`` for SweptSessionInfo and ScheduledSessionData queries so ``main_access_key`` is sourced from the user record rather than the session column being dropped in slice K.

Drop unused ``target_main_access_key`` argument from ``EndpointRow.delegate_endpoint_ownership`` call (already unused by the ORM helper); keep the parameter on the repository facade for caller compatibility.

Inject ``user_repository=MagicMock()`` into ``AgentRegistry`` test construction in ``test_reconcile_agent_resources``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G).
- ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's ``access_key`` field is now an empty string until the read-time resolver lands.
- ``cache_invalidation`` uses ``info.access_key`` (the field name on SessionTransitionInfo).
- ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction`` no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter.
- ``ModelServingRepository.get_session_by_id`` drops the now-removed positional ``owner_access_key`` arg from ``SessionRow.get_session``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…abels Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidated into the merged PR stack (A+B+C+D, E+F, G+H, I).
📚 Stack
main
Merge from top to bottom. Intermediate slices may not build standalone; the final refactor tip is #11051, and #11040 is the schema-drop follow-up on top of it.
Summary
- Drop `owner_id` from 21 read/control session action dataclasses; keep it only on the three delegation-capable creation actions.
- `services/session/service.py`: `_requester_user_id()` resolves requester UUID from the `current_user()` context var, replacing 21 `action.owner_id` sites.
- `services/session/lifecycle.py`: inject `UserRepository` via constructor.
- `registry.py`, `dependencies/agents/*`: wire the repository through the DI graph.
- `UserRepository.get_main_access_key_by_id` finalised.

Resolves BA-5715. Part of epic BA-5650.
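To illustrate the split between the two kinds of actions, here is a rough sketch. `RenameSessionAction` is named in this PR; `CreateFromParamsAction`, the field layouts and `effective_owner()` are hypothetical and only show the intended shape: read/control actions carry no owner, while delegation-capable creation actions still name the owner explicitly.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameSessionAction:
    """Read/control action: no owner field; the requester is implied."""

    session_name: str
    new_name: str


@dataclass(frozen=True)
class CreateFromParamsAction:
    """Creation action (hypothetical fields): owner may differ from requester."""

    session_name: str
    owner_id: uuid.UUID


def effective_owner(action: object, requester_id: uuid.UUID) -> uuid.UUID:
    """Pick the user a given action operates on behalf of."""
    if isinstance(action, CreateFromParamsAction):
        # Delegation: the session is created for the named owner.
        return action.owner_id
    # Everything else acts as the requester taken from current_user().
    return requester_id
```

Permission checks for delegation (who may create sessions on behalf of whom) are out of scope for this sketch.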
BA-5650 Series: Split Rationale
Overall goal: migrate the session owner identifier from `access_key` (keypair) to `owner_id` (user UUID), and drop the `sessions.access_key` / `kernels.access_key` columns.

Split criteria: layer + dependency order. Bottom-up (DB helpers → service → API) so the destructive column drop can land safely at the end.

- `get_main_access_key_by_id` and related resolver helpers (everything else depends on this)
- `UserPermission.user_uuid` → `owner_id`; add `main_access_key` field
- `SessionData.user_uuid` → `owner_id`; Row adapters; GQL node
- `SessionRepository`, `SessionDBSource`, creators signatures
- `owner_id` from 21 read/control Actions; resolve via `current_user()`
- `owner_access_key` from REST v1 DTOs (breaking)

Why this split
`access_key` from `owner_id`. BA-5653 (destructive) only runs once every reader has migrated off the dropped column.