Skip to content

refactor(BA-5650): unify session ownership to owner_id and main_access_key [1/2]#11050

Closed
jopemachine wants to merge 61 commits into
mainfrom
breaking/BA-5650-H-rest-v1-owner-id
Closed

refactor(BA-5650): unify session ownership to owner_id and main_access_key [1/2]#11050
jopemachine wants to merge 61 commits into
mainfrom
breaking/BA-5650-H-rest-v1-owner-id

Conversation

@jopemachine
Copy link
Copy Markdown
Member

@jopemachine jopemachine commented Apr 14, 2026

Resolves #10914 (BA-5650)

Summary

  • Add main_access_key resolver helpers and thread through UserPermission
  • Rename SessionData.user_uuidowner_id, collapse repository signatures to owner_id: UUID
  • Propagate owner_id rename through scheduler, sokovan handlers, and coordinator
  • Resolve owner_id via current_user() in service layer
  • Breaking: remove owner_access_key from REST v1 session API
  • Update related test fixtures for mypy compliance

Test plan

  • pants lint passes
  • pants check passes (mypy: no issues found)
  • All unit and component tests pass

Stack

  1. refactor(BA-5650): unify session ownership to owner_id and main_access_key [1/2] #11050 ← You are here — source code changes + related test fixes
  2. refactor(BA-5650): update tests and remaining ORM references [2/2] #11051 — remaining test updates and ORM cleanup

🤖 Generated with Claude Code

Two independent additions that other parts of the BA-5650 stack build
on:

- ``UserRepository.get_main_access_key_by_id`` (+ the matching
  ``UserDBSource`` method) returns ``main_access_key`` for a given
  user UUID with a single readonly query. Used by downstream refactors
  that need to resolve the keypair access_key on demand from the owner.
- ``SessionConditions.by_access_key_*`` filters in
  ``models/session/conditions.py`` now resolve through a subquery on
  ``UserRow.main_access_key`` keyed by ``SessionRow.user_uuid``, instead
  of the ``sessions.access_key`` column. A new
  ``_owners_where_main_access_key`` staticmethod centralises the
  subquery shape.

Behavior is unchanged on ``main`` where ``sessions.access_key`` still
mirrors the owner's main_access_key; this change just lets us drop the
redundant column later.
Slice A scope is only the new get_main_access_key_by_id helper and the
SessionConditions rewrite. Accidentally dropping the
target_main_access_key argument from delegate_endpoint_ownership
belongs to a later slice that also updates EndpointRow and the user
service; restoring the signature here so this PR builds standalone.
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from ba76760 to 4303676 Compare April 14, 2026 07:39
@jopemachine jopemachine force-pushed the breaking/BA-5650-H-rest-v1-owner-id branch from 0b74482 to 6180d2d Compare April 14, 2026 07:39
@jopemachine jopemachine requested a review from Copilot April 14, 2026 07:47
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from 4303676 to 557b59a Compare April 14, 2026 07:52
@jopemachine jopemachine force-pushed the breaking/BA-5650-H-rest-v1-owner-id branch from 6180d2d to 8bb85b0 Compare April 14, 2026 07:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the owner_access_key parameter from REST v1 session endpoints and shifts delegation to owner_id: UUID (only for session creation endpoints), while making read/control endpoints always operate as the authenticated caller.

Changes:

  • Remove owner_access_key from REST v1 session request DTOs and update handlers to use owner_id for delegated session creation only.
  • Drop AuthProcessors/resolve_access_key_scope dependency from the v1 session handler and update route wiring accordingly.
  • Update REST v2 session handler + session adapter signatures to stop threading access_key for shutdown/logs/update, and adjust tests + add breaking-change note.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
tests/unit/common/dto/manager/session/test_request.py Updates request DTO tests for removed fields (but now contains some no-op assertions).
tests/component/session/test_session_query.py Removes AuthProcessors wiring for session handler in component test server setup.
tests/component/session/test_session_create_delegation.py Switches delegation input from owner_access_key to owner_id in the component test.
tests/component/session/conftest.py Removes AuthProcessors wiring for session handler in component test fixtures.
src/ai/backend/manager/api/rest/v2/session/handler.py Removes user_ctx/access_key threading for shutdown/logs/update calls into adapter.
src/ai/backend/manager/api/rest/tree.py Updates REST v1 session handler construction after dropping auth dependency.
src/ai/backend/manager/api/rest/session/handler.py Core v1 session handler updates: remove access-key scope resolution; use owner_id for delegation on create endpoints; enforce caller-only behavior elsewhere.
src/ai/backend/manager/api/adapters/session.py Removes adapter access_key plumbing for shutdown/logs/update; updates node mapping fields to owner_id/main_access_key.
src/ai/backend/common/dto/manager/session/request.py Removes owner_access_key fields from v1 request DTOs; adds owner_id to creation DTOs.
changes/11050.breaking.md Documents the breaking API change for clients.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ai/backend/manager/api/rest/session/handler.py Outdated
Comment thread tests/unit/common/dto/manager/session/test_request.py Outdated
Comment thread tests/unit/common/dto/manager/session/test_request.py Outdated
Comment thread src/ai/backend/manager/api/rest/session/handler.py
Comment thread src/ai/backend/manager/api/rest/session/handler.py
Comment thread tests/component/session/test_session_create_delegation.py
Comment thread src/ai/backend/manager/api/rest/session/handler.py Outdated
Comment thread src/ai/backend/manager/api/rest/session/handler.py
Comment thread src/ai/backend/manager/api/rest/session/handler.py
Comment thread src/ai/backend/manager/api/rest/session/handler.py Outdated
Drop the redundant UserRow.main_access_key IS NOT NULL guard inside
SessionConditions._owners_where_main_access_key. The subquery selects
users.uuid (non-null PK), so NOT IN is well-defined; NULL
main_access_key fails the caller-supplied condition on its own. The
behaviour is unchanged — this just makes the helper contract clearer.

Also drop the stale misc news fragment (skip:changelog applies to this
purely additive refactor).
Rename UserPermission.user_uuid to owner_id and add main_access_key:
str | None. KernelRow.to_kernel_info populates the new field from the
eagerly-loaded user_row relationship; search_kernels now selectinload's
user_row so the scalar is available without N+1.

KernelRow.delegate_ownership drops its access_key argument (resolved
from the owner). recalc_concurrency_used switches from
KernelRow.access_key to a KernelRow.user_uuid subquery keyed by the
owner's main_access_key as a shim for the eventual access_key column
drop.

Test fixtures that construct UserPermission are updated.
Earlier the branch pulled test fixtures that also referenced later
slices' renames (SessionMetadata.owner_id, ScheduledSessionData
.main_access_key, SessionDataForPull/Start.main_access_key,
SessionData.owner_id). Revert those fixtures to main state and
re-apply only the UserPermission field renames (user_uuid -> owner_id,
access_key -> main_access_key) which are in this slice's scope.

Also update the two remaining live readers of UserPermission:
- sokovan/scheduler/fair_share/aggregator.py
- api/adapters/session.py (kernel_info_to_node)

And fix the SessionRow.delegate_ownership caller of
KernelRow.delegate_ownership to match the new single-argument signature.
jopemachine and others added 2 commits April 14, 2026 19:21
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData
  now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G).
- ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and
  drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's
  ``access_key`` field is now an empty string until the read-time
  resolver lands.
- ``cache_invalidation`` uses ``info.access_key`` (the field name on
  SessionTransitionInfo).
- ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction``
  no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter.
- ``ModelServingRepository.get_session_by_id`` drops the now-removed
  positional ``owner_access_key`` arg from ``SessionRow.get_session``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the refactor/BA-5650-G-action-service-owner-id branch from f3342b7 to cd02a14 Compare April 14, 2026 10:25
jopemachine and others added 13 commits April 14, 2026 19:26
``SessionRepository`` and the underlying ``SessionDBSource`` now take
``owner_id: UUID`` on every method that previously accepted
``owner_access_key: AccessKey``. Affects:

- ``get_session_validated``
- ``match_sessions``
- ``update_session_name``
- ``find_dependency_sessions`` / ``_find_dependent_sessions``
- ``get_target_session_ids``
- ``get_session_with_group``

The matching ``dependency_graph`` helpers and ``creators`` are updated
in lockstep. Service-layer callers still pass ``owner_access_key``
temporarily; they will be migrated in the next slice.
Rename access_key -> main_access_key on sokovan data types
(SessionAllocation, PreparedSessionData, SessionDataForPull,
SessionDataForStart, SessionWorkload) and update every sokovan caller
accordingly. Affects:

- sokovan/data/{allocation,lifecycle,workload}.py
- sokovan/scheduler/handlers/lifecycle/*
- sokovan/scheduler/handlers/maintenance/sweep_sessions.py
- sokovan/scheduler/provisioner/{provisioner,sequencers,validators}/*
- sokovan/scheduler/launcher/launcher.py
- sokovan/scheduler/post_processors/cache_invalidation.py
- sokovan/scheduler/fair_share/aggregator.py
- sokovan/scheduling_controller/{preparers,scheduling_controller}.py
- sokovan/deployment/{executor,route}.py

No external behavior change.
- Drop ``owner_id`` from 21 read/control session action dataclasses;
  keep it only on ``create_from_template`` / ``create_from_params`` /
  ``create_cluster`` (the three delegation-capable creation paths).
- ``services/session/service.py``: add ``_requester_user_id()`` that
  pulls the caller's UUID from the ``current_user()`` context var,
  replacing 21 ``owner_id = action.owner_id`` sites.
- ``services/session/lifecycle.py``: inject ``UserRepository`` via the
  constructor instead of instantiating it internally.
- ``registry.py``: accept ``UserRepository`` and forward it into
  ``SessionLifecycleManager``.
- ``dependencies/agents/{registry,composer}.py``: wire the repository
  through the dependency graph.
- ``repositories/user/{repository,db_source/db_source}.py``: add
  ``get_main_access_key_by_id`` (renamed from ``_by_uuid``).
- Minor adapter/service touch-ups in ``services/export`` and
  ``repositories/model_serving``.
Clean up the ``changes/BA-5650-{D,E,F}.misc.md`` files that were
superseded when each slice's news fragment was renamed to the assigned
PR number (e.g. ``changes/11046.enhance.md``). These stragglers made it
onto downstream branches during the cascade rebases.
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData
  now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G).
- ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and
  drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's
  ``access_key`` field is now an empty string until the read-time
  resolver lands.
- ``cache_invalidation`` uses ``info.access_key`` (the field name on
  SessionTransitionInfo).
- ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction``
  no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter.
- ``ModelServingRepository.get_session_by_id`` drops the now-removed
  positional ``owner_access_key`` arg from ``SessionRow.get_session``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scheduler / predicates / scheduler-type collapse of the owner key:

- ``scheduler/predicates.py``: predicates now take SessionRow and
  resolve ``main_access_key`` from the owner via a helper when a
  keypair-scoped lookup (Redis concurrency, keypair resource policy)
  is required. Renames ``SessionRow.user_uuid`` references throughout.
- ``scheduler/drf.py``: per-user fairness tracking keyed by
  ``owner_id``/``main_access_key`` pair.
- ``repositories/scheduler/options.py``: drop the duplicated
  ``by_access_key_*`` factories — session filters go through
  ``SessionConditions`` helpers instead.
- ``repositories/scheduler/types/*``: rename ``access_key`` to
  ``main_access_key`` on ``ScheduledSessionData``,
  ``TerminatingSessionData``, ``SweptSessionInfo``, ``KernelEnqueueData``
  and ``SessionEnqueueData``.
- ``repositories/events/db_source/db_source.py`` and
  ``repositories/stream/db_source/db_source.py``: resolve the owner
  UUID from ``main_access_key`` via a sub-select shim while the schema
  still exposes ``sessions.access_key``.
REST v1 session endpoints no longer accept owner_access_key. The
delegation field is replaced with owner_id (user UUID) and is honored
only on the three session-creation endpoints
(create_from_template / create_from_params / create_cluster). Read and
control endpoints always act as the authenticated caller.

- common/dto/manager/session/request.py: drop the owner_access_key
  field from Create/Destroy/Restart/GetContainerLogs/GetStatusHistory
  request DTOs; add owner_id to the three creation DTOs.
- api/rest/session/handler.py: remove all 26 resolve_access_key_scope
  calls, drop the AuthProcessors dependency, and renumber log format
  placeholders that dropped the owner argument.
- api/rest/v2/session/handler.py: drop the redundant user_ctx /
  access_key arguments from shutdown_service / get_logs / update.
- api/adapters/session.py: update adapter call sites accordingly.
- api/rest/tree.py: drop the auth= argument from SessionHandler().

Test updates for the corresponding DTO assertions and component
fixtures are included.
- api/rest/session/handler.py: fix log format mismatches. The
  GET_OR_CREATE and CREAT_CLUSTER logs now include both the requester
  access_key and the resolved owner_id. The remaining
  SYNC_AGENT_REGISTRY / TRANSIT_STATUS / COMMIT_SESSION /
  CONVERT_SESSION_TO_IMAGE / GET_COMMIT_STATUS / GET_ABUSING_REPORT /
  GET_STATUS_HISTORY logs drop the redundant second ``/{}`` slot so
  the format string matches the provided arguments.
- tests/unit/common/dto/manager/session/test_request.py: replace the
  ``assert req is not None`` placeholders with asserts that verify
  ``owner_access_key`` no longer exists and the empty model_dump()
  shape; gives us a regression guard for the removed field.
- tests/component/session/test_session_create_delegation.py: rename
  the test / class docstring from ``owner_access_key`` to ``owner_id``
  so they describe the new delegation shape.
Removed TestRestartSessionRequest and TestGetStatusHistoryRequest which
only checked that owner_access_key was no longer present. Trivial
attribute checks add no value over the schema definition itself.

Also clean up unused imports for the removed test classes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After collapsing service signatures and removing ``owner_access_key``
from action DTOs, propagate the rename to remaining call sites:

- Drop ``owner_access_key`` kwargs from session action constructors in
  service tests (MatchSessionsAction, GetStatusHistoryAction,
  DestroySessionAction, CompleteAction, GetSessionInfoAction,
  DownloadFilesAction, GetDirectAccessInfoAction, RenameSessionAction,
  GetContainerLogsAction, ListFilesAction, InterruptSessionAction).
- Rename ``user_uuid``/``access_key`` → ``owner_id`` (drop access_key)
  on ``SessionData`` constructions in tests.
- ``SessionTransitionInfo`` keeps ``access_key``; cache_invalidation
  reverted to ``info.access_key``.
- Terminator conftest now uses ``main_access_key`` for
  ``TerminatingSessionData``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- TerminatingSessionData uses ``main_access_key`` (was access_key) in
  test_terminate_sessions.
- SessionData fixture in test_session_adapter uses ``owner_id`` and
  drops the dropped ``access_key`` field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cascading rebase pulled in stray files from a developer working
copy: tmp-vllm-model/, server.py, vllm.sh, model-definition.yaml,
docs/BA-5608-* and docs/model-deployment-zero-downtime.*, and the
scripts/ collection of one-off E2E scripts. None of these belong on
the release branch.

Removing them also unblocks ``pants tailor`` in CI, which had begun
suggesting BUILD additions for the orphan source tree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the breaking/BA-5650-H-rest-v1-owner-id branch from f448f27 to 7077091 Compare April 14, 2026 10:28
…abels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine changed the base branch from refactor/BA-5650-G-action-service-owner-id to refactor/BA-5650-F-sokovan-owner-id April 15, 2026 01:41
@jopemachine jopemachine changed the title breaking(BA-5716): drop owner_access_key from REST v1 session API refactor(BA-5650): resolve owner_id in services and drop REST v1 owner_access_key [3/4] Apr 15, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine changed the title refactor(BA-5650): resolve owner_id in services and drop REST v1 owner_access_key [3/4] breaking(BA-5650): resolve owner_id in services and drop REST v1 owner_access_key [3/4] Apr 15, 2026
@jopemachine jopemachine added this to the 26.5 milestone Apr 15, 2026
@jopemachine
Copy link
Copy Markdown
Member Author

Squashed into PR #11051 — stacked PRs cannot pass CI independently with cross-cutting type changes.

@jopemachine jopemachine reopened this Apr 15, 2026
@jopemachine jopemachine changed the title breaking(BA-5650): resolve owner_id in services and drop REST v1 owner_access_key [3/4] refactor(BA-5650): unify session ownership to owner_id and main_access_key [1/2] Apr 15, 2026
@jopemachine jopemachine changed the base branch from refactor/BA-5650-F-sokovan-owner-id to main April 15, 2026 02:26
Cherry-pick test updates from slice I so that this branch
passes typecheck independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:docs Documentations comp:common Related to Common component comp:manager Related to Manager component size:XL 500~ LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename sessions/kernels.user_uuid to owner_id and drop access_key columns

2 participants