Skip to content

refactor(BA-5715): resolve owner_id via current_user() in services#11049

Closed
jopemachine wants to merge 10 commits into
refactor/BA-5650-F-sokovan-owner-idfrom
refactor/BA-5650-G-action-service-owner-id
Closed

refactor(BA-5715): resolve owner_id via current_user() in services#11049
jopemachine wants to merge 10 commits into
refactor/BA-5650-F-sokovan-owner-idfrom
refactor/BA-5650-G-action-service-owner-id

Conversation

@jopemachine
Copy link
Copy Markdown
Member

@jopemachine jopemachine commented Apr 14, 2026

📚 Stack

Merge from top to bottom. Intermediate slices may not build standalone; the final refactor tip is #11051, and #11040 is the schema-drop follow-up on top of it.

Summary

Drop owner_id from 21 read/control session action dataclasses; keep it only on the three delegation-capable creation actions. services/session/service.py: _requester_user_id() resolves requester UUID from the current_user() context var, replacing 21 action.owner_id sites. services/session/lifecycle.py: inject UserRepository via constructor. registry.py, dependencies/agents/*: wire the repository through the DI graph. UserRepository.get_main_access_key_by_id finalised.

Resolves BA-5715. Part of epic BA-5650.


BA-5650 Series: Split Rationale

Overall goal: migrate the session owner identifier from access_key (keypair) to owner_id (user UUID), and drop the sessions.access_key / kernels.access_key columns.

Split criteria: layer + dependency order. Bottom-up (DB helpers → service → API) so the destructive column drop can land safely at the end.

Order Issue Layer Role
1 BA-5709 Foundation Add get_main_access_key_by_id and related resolver helpers (everything else depends on this)
2 BA-5710 RBAC / Permission Rename UserPermission.user_uuid → owner_id; add main_access_key field
3 BA-5711 Data Rename SessionData.user_uuid → owner_id; Row adapters; GQL node
4 BA-5712 Repository Collapse SessionRepository, SessionDBSource, creators signatures
5 BA-5713 Scheduler predicates / drf / options; scheduler repo; stream/events shims
6 BA-5714 Sokovan allocation / lifecycle / workload, launcher, controller, executor
7 BA-5715 Service Drop owner_id from 21 read/control Actions; resolve via current_user()
8 BA-5716 API (v1) Remove owner_access_key from REST v1 DTOs (breaking)
9 BA-5717 Tests / Cleanup Test suite updates; remaining ORM / gql_legacy touch-ups
10 BA-5653 Schema Alembic migration — rename/drop columns (destructive, must land last)

Why this split

  • Dependencies: the BA-5709 helpers are required before 3–7 can recover access_key from owner_id. BA-5653 (destructive) only runs once every reader has migrated off the dropped column.
  • Review size: a single-shot change would touch hundreds of files. Slicing by layer keeps each PR focused on one concern.
  • Rollback safety: steps 1–7 are no-op on external behavior, step 8 is the API breaking change, step 10 is the schema drop. Each step is independently revertible.

@github-actions github-actions Bot added size:L 100~500 LoC comp:manager Related to Manager component labels Apr 14, 2026
@jopemachine jopemachine changed the title refactor(BA-5650-G): resolve owner_id via current_user() in services refactor(BA-5715): resolve owner_id via current_user() in services Apr 14, 2026
@jopemachine jopemachine marked this pull request as draft April 14, 2026 07:30
@jopemachine jopemachine force-pushed the refactor/BA-5650-F-sokovan-owner-id branch from 07a5509 to 89df267 Compare April 14, 2026 07:39
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from ba76760 to 4303676 Compare April 14, 2026 07:39
@jopemachine jopemachine requested a review from Copilot April 14, 2026 07:47
@jopemachine jopemachine force-pushed the refactor/BA-5650-F-sokovan-owner-id branch from 89df267 to 203e3d2 Compare April 14, 2026 07:52
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from 4303676 to 557b59a Compare April 14, 2026 07:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors session service flows to stop threading owner_id/owner_access_key through most read/control actions and instead derive the caller identity from the current_user() context, while keeping delegation support for a limited set of creation actions.

Changes:

  • Replace many action.owner_* usages in SessionService with _requester_user_id() (via current_user()), and add helper to resolve delegated owner main access key.
  • Inject UserRepository into SessionLifecycleManager and wire it through the AgentRegistry DI graph.
  • Update session action dataclasses and template override validation to use owner_id (UUID) instead of owner_access_key.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/ai/backend/manager/services/session/types.py Switch template override schema field from owner_access_key to owner_id.
src/ai/backend/manager/services/session/service.py Derive requester owner_id from current_user(), add main-access-key resolver, and update many service methods accordingly.
src/ai/backend/manager/services/session/lifecycle.py Constructor-inject UserRepository and use it to resolve main access key for hook notifications.
src/ai/backend/manager/services/session/actions/*.py Remove owner_access_key from many read/control action dataclasses; change create actions to carry owner_id.
src/ai/backend/manager/repositories/model_serving/repository.py Adjust SessionRow.get_session call signature; remove model-definition refresh logic during modify_endpoint.
src/ai/backend/manager/registry.py Wire UserRepository into SessionLifecycleManager; update SessionRow.get_session calls to use owner_id.
src/ai/backend/manager/dependencies/agents/registry.py Add UserRepository to AgentRegistry dependency input and provider wiring.
src/ai/backend/manager/dependencies/agents/composer.py Instantiate and provide UserRepository in the DI composition for AgentRegistry.
changes/11049.misc.md Changelog entry for the refactor.
Comments suppressed due to low confidence (1)

src/ai/backend/manager/repositories/model_serving/repository.py:889

  • This change removes the model-definition refresh behavior during endpoint modification: the new revision is no longer populated with a refreshed model_definition, so updates to model-definition.yaml in the vfolder won’t be picked up. There is an existing regression test (tests/unit/manager/repositories/model_serving/test_modify_endpoint_model_definition.py) that asserts this behavior. If this behavior is still required, reintroduce the refresh (or move it elsewhere); if it’s intentionally being removed, the test suite and any docs/semantics around modify_endpoint should be updated accordingly.
                # If revision-level fields changed, create a new revision
                if spec.has_revision_changes():
                    current_rev = endpoint_row._find_current_revision()
                    if current_rev is None:
                        raise InvalidAPIParameters("Endpoint has no current revision")

                    # Resolve image if changed
                    image_id = current_rev.image
                    image_ref = spec.image.optional_value()
                    if image_ref is not None:
                        image_name = image_ref.name
                        arch = image_ref.architecture.value()
                        resolved_image = await ImageRow.resolve(
                            db_session,
                            [ImageIdentifier(image_name, arch), ImageAlias(image_name)],
                        )
                        image_id = resolved_image.id

                    # Merge resource_slots: spec overrides current revision
                    merged_slots: ResourceSlot = (
                        spec.resource_slots.optional_value()
                        or ResourceSlot({
                            r.slot_name: r.quantity for r in current_rev.resource_slot_rows
                        })
                    )

                    # Merge revision fields: current revision as base, spec overrides
                    new_revision = DeploymentRevisionRow(
                        endpoint=endpoint_row.id,
                        revision_number=0,  # placeholder, will be set atomically below
                        image=image_id,
                        model=current_rev.model,
                        model_mount_destination=current_rev.model_mount_destination,
                        model_definition_path=(
                            spec.model_definition_path.optional_value()
                            if spec.model_definition_path.optional_value() is not None
                            else current_rev.model_definition_path
                        ),
                        resource_group=endpoint_row.resource_group,
                        resource_opts=(
                            spec.resource_opts.optional_value()
                            if spec.resource_opts.optional_value() is not None
                            else current_rev.resource_opts
                        ),
                        cluster_mode=(
                            spec.cluster_mode.value().value
                            if spec.cluster_mode.optional_value() is not None
                            else current_rev.cluster_mode
                        ),
                        cluster_size=(
                            spec.cluster_size.optional_value() or current_rev.cluster_size
                        ),
                        startup_command=current_rev.startup_command,
                        bootstrap_script=current_rev.bootstrap_script,
                        environ=(
                            spec.environ.optional_value()
                            if spec.environ.optional_value() is not None
                            else current_rev.environ
                        ),
                        callback_url=current_rev.callback_url,
                        runtime_variant=(
                            spec.runtime_variant.optional_value() or current_rev.runtime_variant
                        ),
                        extra_mounts=list(current_rev.extra_mounts),
                    )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ai/backend/manager/repositories/model_serving/repository.py
Comment thread src/ai/backend/manager/services/session/lifecycle.py Outdated
Comment thread src/ai/backend/manager/services/session/service.py
Comment thread src/ai/backend/manager/services/session/service.py Outdated
Comment thread src/ai/backend/manager/services/session/service.py
Comment thread changes/BA-5650-F.misc.md Outdated
Comment thread src/ai/backend/manager/repositories/model_serving/repository.py
Comment thread src/ai/backend/manager/repositories/model_serving/repository.py
@jopemachine jopemachine force-pushed the refactor/BA-5650-F-sokovan-owner-id branch from bd97fdd to 9d05db6 Compare April 14, 2026 08:13
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from 313f8ca to f870faa Compare April 14, 2026 08:16
@github-actions github-actions Bot added size:M 30~100 LoC and removed size:L 100~500 LoC labels Apr 14, 2026
@jopemachine jopemachine force-pushed the refactor/BA-5650-F-sokovan-owner-id branch from a2e1502 to 350294e Compare April 14, 2026 08:34
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from 94c9502 to c24db41 Compare April 14, 2026 08:36
@github-actions github-actions Bot added size:L 100~500 LoC and removed size:M 30~100 LoC labels Apr 14, 2026
@jopemachine jopemachine force-pushed the refactor/BA-5650-F-sokovan-owner-id branch from 350294e to efa2a91 Compare April 14, 2026 10:21
jopemachine and others added 9 commits April 14, 2026 19:21
Rename access_key -> main_access_key on sokovan data types
(SessionAllocation, PreparedSessionData, SessionDataForPull,
SessionDataForStart, SessionWorkload) and update every sokovan caller
accordingly. Affects:

- sokovan/data/{allocation,lifecycle,workload}.py
- sokovan/scheduler/handlers/lifecycle/*
- sokovan/scheduler/handlers/maintenance/sweep_sessions.py
- sokovan/scheduler/provisioner/{provisioner,sequencers,validators}/*
- sokovan/scheduler/launcher/launcher.py
- sokovan/scheduler/post_processors/cache_invalidation.py
- sokovan/scheduler/fair_share/aggregator.py
- sokovan/scheduling_controller/{preparers,scheduling_controller}.py
- sokovan/deployment/{executor,route}.py

No external behavior change.
- Drop ``owner_id`` from 21 read/control session action dataclasses;
  keep it only on ``create_from_template`` / ``create_from_params`` /
  ``create_cluster`` (the three delegation-capable creation paths).
- ``services/session/service.py``: add ``_requester_user_id()`` that
  pulls the caller's UUID from the ``current_user()`` context var,
  replacing 21 ``owner_id = action.owner_id`` sites.
- ``services/session/lifecycle.py``: inject ``UserRepository`` via the
  constructor instead of instantiating it internally.
- ``registry.py``: accept ``UserRepository`` and forward it into
  ``SessionLifecycleManager``.
- ``dependencies/agents/{registry,composer}.py``: wire the repository
  through the dependency graph.
- ``repositories/user/{repository,db_source/db_source}.py``: add
  ``get_main_access_key_by_id`` (renamed from ``_by_uuid``).
- Minor adapter/service touch-ups in ``services/export`` and
  ``repositories/model_serving``.
- Revert stray changes in repositories/model_serving/repository.py
  (DeploymentStorageSource import / model-definition fetch removal
  were unrelated to BA-5650).
- services/session/service.py: _resolve_owner_main_access_key now
  uses the narrower UserRepository.get_main_access_key_by_id helper
  instead of loading the entire UserData record.
- services/session/lifecycle.py: guard the POST_START_SESSION hook
  payload against a None main_access_key — log a warning and skip the
  hook rather than pass None into plugins that expect an AccessKey.
Clean up the ``changes/BA-5650-{D,E,F}.misc.md`` files that were
superseded when each slice's news fragment was renamed to the assigned
PR number (e.g. ``changes/11046.enhance.md``). These stragglers made it
onto downstream branches during the cascade rebases.
Update remaining call sites in scheduler db_source and test fixtures
to use the renamed dataclass fields:

- ``main_access_key`` (previously ``access_key``) on
  ``ScheduledSessionData``, ``SessionDataForPull``, ``SessionDataForStart``,
  ``SessionWorkload``, ``SweptSessionInfo``, ``TerminatingSessionData``,
  ``PendingSessionData``.
- ``owner_id`` (previously ``user_uuid``) on
  ``SessionEnqueueData``, ``KernelEnqueueData``, ``SessionDataForStart``,
  ``SessionWorkload``.
- ``pending_sessions.owner_ids`` (previously ``user_uuids``) on
  ``PendingSessions``.

Joined ``users`` for SweptSessionInfo and ScheduledSessionData queries
so ``main_access_key`` is sourced from the user record rather than the
session column being dropped in slice K.

Drop unused ``target_main_access_key`` argument from
``EndpointRow.delegate_endpoint_ownership`` call (already unused by the
ORM helper); keep the parameter on the repository facade for caller
compatibility.

Inject ``user_repository=MagicMock()`` into ``AgentRegistry`` test
construction in ``test_reconcile_agent_resources``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData
  now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G).
- ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and
  drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's
  ``access_key`` field is now an empty string until the read-time
  resolver lands.
- ``cache_invalidation`` uses ``info.access_key`` (the field name on
  SessionTransitionInfo).
- ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction``
  no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter.
- ``ModelServingRepository.get_session_by_id`` drops the now-removed
  positional ``owner_access_key`` arg from ``SessionRow.get_session``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from f3342b7 to cd02a14 Compare April 14, 2026 10:25
@github-actions github-actions Bot added size:M 30~100 LoC and removed size:L 100~500 LoC labels Apr 14, 2026
…abels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine
Copy link
Copy Markdown
Member Author

Consolidated into the merged PR stack (A+B+C+D, E+F, G+H, I)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp:manager Related to Manager component size:M 30~100 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants