Skip to content

breaking(BA-5653): drop sessions/kernels access_key columns#11040

Closed
jopemachine wants to merge 28 commits into
refactor/BA-5650-I-remainingfrom
feat/BA-5653-drop-session-access-key-columns
Closed

breaking(BA-5653): drop sessions/kernels access_key columns#11040
jopemachine wants to merge 28 commits into
refactor/BA-5650-I-remainingfrom
feat/BA-5653-drop-session-access-key-columns

Conversation

@jopemachine
Copy link
Copy Markdown
Member

@jopemachine jopemachine commented Apr 14, 2026

📚 Stack

Merge from top to bottom. Intermediate slices may not build standalone; the final refactor tip is #11051, and #11040 is the schema-drop follow-up on top of it.

Summary

Breaking schema change. Alembic migration 8c1d2e3f4a5b_drop_session_kernel_access_key drops sessions.access_key and kernels.access_key columns. Downstream callers resolve the owner's main_access_key from the users table. user_uuid column stays. ORM drop of the access_key mapped column and call-site migration follow in subsequent commits on this branch.

Resolves BA-5653. Part of epic BA-5650.


📚 Documentation preview 📚: https://sorna--11040.org.readthedocs.build/en/11040/


📚 Documentation preview 📚: https://sorna-ko--11040.org.readthedocs.build/ko/11040/


BA-5650 Series: Split Rationale

Overall goal: migrate the session owner identifier from access_key (keypair) to owner_id (user UUID), and drop the sessions.access_key / kernels.access_key columns.

Split criteria: layer + dependency order. Bottom-up (DB helpers → service → API) so the destructive column drop can land safely at the end.

Order Issue Layer Role
1 BA-5709 Foundation Add get_main_access_key_by_id and related resolver helpers (everything else depends on this)
2 BA-5710 RBAC / Permission Rename UserPermission.user_uuid → owner_id; add main_access_key field
3 BA-5711 Data Rename SessionData.user_uuid → owner_id; Row adapters; GQL node
4 BA-5712 Repository Collapse SessionRepository, SessionDBSource, creators signatures
5 BA-5713 Scheduler predicates / drf / options; scheduler repo; stream/events shims
6 BA-5714 Sokovan allocation / lifecycle / workload, launcher, controller, executor
7 BA-5715 Service Drop owner_id from 21 read/control Actions; resolve via current_user()
8 BA-5716 API (v1) Remove owner_access_key from REST v1 DTOs (breaking)
9 BA-5717 Tests / Cleanup Test suite updates; remaining ORM / gql_legacy touch-ups
10 BA-5653 Schema Alembic migration — rename/drop columns (destructive, must land last)

Why this split

  • Dependencies: the BA-5709 helpers are required before 3–7 can recover access_key from owner_id. BA-5653 (destructive) only runs once every reader has migrated off the dropped column.
  • Review size: a single-shot change would touch hundreds of files. Slicing by layer keeps each PR focused on one concern.
  • Rollback safety: steps 1–7 are no-op on external behavior, step 8 is the API breaking change, step 10 is the schema drop. Each step is independently revertible.

@github-actions github-actions Bot added size:L 100~500 LoC area:docs Documentations comp:manager Related to Manager component comp:common Related to Common component comp:webserver Related to Web Server component require:db-migration Automatically set when alembic migrations are added or updated labels Apr 14, 2026
@jopemachine jopemachine changed the base branch from feat/BA-5650-session-owner-id-signatures to refactor/BA-5650-I-remaining April 14, 2026 07:25
@jopemachine jopemachine force-pushed the feat/BA-5653-drop-session-access-key-columns branch from ceb31a4 to c0b9566 Compare April 14, 2026 07:34
@github-actions github-actions Bot added size:M 30~100 LoC and removed size:L 100~500 LoC labels Apr 14, 2026
@jopemachine jopemachine force-pushed the refactor/BA-5650-I-remaining branch from d0c6db5 to 7cad928 Compare April 14, 2026 07:39
@jopemachine jopemachine force-pushed the feat/BA-5653-drop-session-access-key-columns branch from c0b9566 to 95d059c Compare April 14, 2026 07:40
@jopemachine jopemachine requested a review from Copilot April 14, 2026 07:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a breaking schema change to remove redundant access_key snapshot columns from the sessions and kernels tables, aligning persistence with the newer approach of resolving the owner’s main_access_key via the users table.

Changes:

  • Add an Alembic migration to drop sessions.access_key and kernels.access_key (with best-effort index cleanup).
  • Regenerate/update GraphQL schema reference docs (enum ordering change only).
  • Add a breaking-change note documenting the dropped columns.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/ai/backend/manager/models/alembic/versions/8c1d2e3f4a5b_drop_session_kernel_access_key.py Migration to drop access_key columns from sessions/kernels.
docs/manager/graphql-reference/v2-schema.graphql Documentation-only schema output change (enum value ordering).
docs/manager/graphql-reference/supergraph.graphql Documentation-only supergraph output change (enum value ordering).
changes/11040.breaking.md Release-note entry for the breaking schema change.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jopemachine jopemachine force-pushed the refactor/BA-5650-I-remaining branch from 5798c3f to e2a1508 Compare April 14, 2026 07:52
@jopemachine jopemachine force-pushed the feat/BA-5653-drop-session-access-key-columns branch from 5e98b29 to e620fed Compare April 14, 2026 07:52
Comment thread changes/11040.breaking.md Outdated
@jopemachine jopemachine force-pushed the refactor/BA-5650-I-remaining branch from 8cc4f1a to bf40534 Compare April 14, 2026 08:23
jopemachine and others added 9 commits April 14, 2026 19:28
Clean up the ``changes/BA-5650-{D,E,F}.misc.md`` files that were
superseded when each slice's news fragment was renamed to the assigned
PR number (e.g. ``changes/11046.enhance.md``). These stragglers made it
onto downstream branches during the cascade rebases.
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData
  now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G).
- ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and
  drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's
  ``access_key`` field is now an empty string until the read-time
  resolver lands.
- ``cache_invalidation`` uses ``info.access_key`` (the field name on
  SessionTransitionInfo).
- ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction``
  no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter.
- ``ModelServingRepository.get_session_by_id`` drops the now-removed
  positional ``owner_access_key`` arg from ``SessionRow.get_session``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
REST v1 session endpoints no longer accept owner_access_key. The
delegation field is replaced with owner_id (user UUID) and is honored
only on the three session-creation endpoints
(create_from_template / create_from_params / create_cluster). Read and
control endpoints always act as the authenticated caller.

- common/dto/manager/session/request.py: drop the owner_access_key
  field from Create/Destroy/Restart/GetContainerLogs/GetStatusHistory
  request DTOs; add owner_id to the three creation DTOs.
- api/rest/session/handler.py: remove all 26 resolve_access_key_scope
  calls, drop the AuthProcessors dependency, and renumber log format
  placeholders that dropped the owner argument.
- api/rest/v2/session/handler.py: drop the redundant user_ctx /
  access_key arguments from shutdown_service / get_logs / update.
- api/adapters/session.py: update adapter call sites accordingly.
- api/rest/tree.py: drop the auth= argument from SessionHandler().

Test updates for the corresponding DTO assertions and component
fixtures are included.
Final slice of the BA-5650 stack split. Contains:

- Remaining ORM touch-ups: models/endpoint/row.py,
  models/keypair/row.py, repositories/scheduler/{repository,db_source}.py,
  api/adapters/vfolder.py, api/gql_legacy/endpoint.py,
  api/gql_legacy/routing.py.
- Test updates that depend on the action/service/DTO renames already
  landed in earlier slices: adapter/session, scheduler repositories,
  session lifecycle/service, sokovan scheduler suite,
  compute_sessions handler, dependency injection tests.
- Autouse ``_user_context`` fixture under
  tests/unit/manager/services/session/conftest.py so service tests work
  without the auth middleware.
Co-authored-by: octodog <mu001@lablup.com>
This slice was carrying unrelated work that crept in during the
original PR split (prometheus client, valkey route health,
deployment auto_activate, web ssl_enabled, auth client_type_id,
gql_legacy routing health_status). All such files are reverted to
the slice-H base.

Also fixes:
- Use session.user.main_access_key (loaded via selectinload) instead
  of session.main_access_key in scheduler db_source — SessionRow has
  no main_access_key attribute.
- UserService delegate path now resolves target_main_access_key via
  UserRepository.get_main_access_key_by_id and forwards it to
  delegate_endpoint_ownership.
- Drop stale changes/BA-5650-H.breaking.md (slice H ships 11050.breaking.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Resolve conflict markers in scheduler db_source by keeping the
  user-table-joined ``main_access_key`` lookup (single source of truth)
  rather than the obsolete ``session.access_key`` snapshot.
- Drop duplicate ``resolve_main_access_keys`` definition introduced
  by the rebase.
- Use ``spec.access_key`` (SessionCreationSpec) where applicable —
  the spec dataclass keeps ``access_key`` for now.
- Test fixture ``_make_session_data`` now passes ``owner_id`` instead
  of relying on attribute assignment after construction.
- Use ``main_access_key`` on PendingSessionData in provisioner test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the refactor/BA-5650-I-remaining branch from 9fbd63d to d95ce09 Compare April 14, 2026 10:31
jopemachine and others added 16 commits April 14, 2026 19:31
``SessionRepository`` and the underlying ``SessionDBSource`` now take
``owner_id: UUID`` on every method that previously accepted
``owner_access_key: AccessKey``. Affects:

- ``get_session_validated``
- ``match_sessions``
- ``update_session_name``
- ``find_dependency_sessions`` / ``_find_dependent_sessions``
- ``get_target_session_ids``
- ``get_session_with_group``

The matching ``dependency_graph`` helpers and ``creators`` are updated
in lockstep. Service-layer callers still pass ``owner_access_key``
temporarily; they will be migrated in the next slice.
Rename access_key -> main_access_key on sokovan data types
(SessionAllocation, PreparedSessionData, SessionDataForPull,
SessionDataForStart, SessionWorkload) and update every sokovan caller
accordingly. Affects:

- sokovan/data/{allocation,lifecycle,workload}.py
- sokovan/scheduler/handlers/lifecycle/*
- sokovan/scheduler/handlers/maintenance/sweep_sessions.py
- sokovan/scheduler/provisioner/{provisioner,sequencers,validators}/*
- sokovan/scheduler/launcher/launcher.py
- sokovan/scheduler/post_processors/cache_invalidation.py
- sokovan/scheduler/fair_share/aggregator.py
- sokovan/scheduling_controller/{preparers,scheduling_controller}.py
- sokovan/deployment/{executor,route}.py

No external behavior change.
Clean up the ``changes/BA-5650-{D,E,F}.misc.md`` files that were
superseded when each slice's news fragment was renamed to the assigned
PR number (e.g. ``changes/11046.enhance.md``). These stragglers made it
onto downstream branches during the cascade rebases.
- Tests for SessionEnqueueData / KernelEnqueueData / TerminatingSessionData
  now use ``main_access_key`` and ``owner_id`` (renamed in slice F/G).
- ``SessionAdapter._session_data_to_node`` reads ``data.owner_id`` and
  drops the obsolete ``data.access_key``; the SessionMetadata GQL DTO's
  ``access_key`` field is now an empty string until the read-time
  resolver lands.
- ``cache_invalidation`` uses ``info.access_key`` (the field name on
  SessionTransitionInfo).
- ``ShutdownServiceAction``/``GetContainerLogsAction``/``RenameSessionAction``
  no longer accept ``owner_access_key``; drop the kwarg in the GQL adapter.
- ``ModelServingRepository.get_session_by_id`` drops the now-removed
  positional ``owner_access_key`` arg from ``SessionRow.get_session``.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final slice of the BA-5650 stack split. Contains:

- Remaining ORM touch-ups: models/endpoint/row.py,
  models/keypair/row.py, repositories/scheduler/{repository,db_source}.py,
  api/adapters/vfolder.py, api/gql_legacy/endpoint.py,
  api/gql_legacy/routing.py.
- Test updates that depend on the action/service/DTO renames already
  landed in earlier slices: adapter/session, scheduler repositories,
  session lifecycle/service, sokovan scheduler suite,
  compute_sessions handler, dependency injection tests.
- Autouse ``_user_context`` fixture under
  tests/unit/manager/services/session/conftest.py so service tests work
  without the auth middleware.
This slice was carrying unrelated work that crept in during the
original PR split (prometheus client, valkey route health,
deployment auto_activate, web ssl_enabled, auth client_type_id,
gql_legacy routing health_status). All such files are reverted to
the slice-H base.

Also fixes:
- Use session.user.main_access_key (loaded via selectinload) instead
  of session.main_access_key in scheduler db_source — SessionRow has
  no main_access_key attribute.
- UserService delegate path now resolves target_main_access_key via
  UserRepository.get_main_access_key_by_id and forwards it to
  delegate_endpoint_ownership.
- Drop stale changes/BA-5650-H.breaking.md (slice H ships 11050.breaking.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Resolve conflict markers in scheduler db_source by keeping the
  user-table-joined ``main_access_key`` lookup (single source of truth)
  rather than the obsolete ``session.access_key`` snapshot.
- Drop duplicate ``resolve_main_access_keys`` definition introduced
  by the rebase.
- Use ``spec.access_key`` (SessionCreationSpec) where applicable —
  the spec dataclass keeps ``access_key`` for now.
- Test fixture ``_make_session_data`` now passes ``owner_id`` instead
  of relying on attribute assignment after construction.
- Use ``main_access_key`` on PendingSessionData in provisioner test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
REST v1 session endpoints no longer accept owner_access_key. The
delegation field is replaced with owner_id (user UUID) and is honored
only on the three session-creation endpoints
(create_from_template / create_from_params / create_cluster). Read and
control endpoints always act as the authenticated caller.

- common/dto/manager/session/request.py: drop the owner_access_key
  field from Create/Destroy/Restart/GetContainerLogs/GetStatusHistory
  request DTOs; add owner_id to the three creation DTOs.
- api/rest/session/handler.py: remove all 26 resolve_access_key_scope
  calls, drop the AuthProcessors dependency, and renumber log format
  placeholders that dropped the owner argument.
- api/rest/v2/session/handler.py: drop the redundant user_ctx /
  access_key arguments from shutdown_service / get_logs / update.
- api/adapters/session.py: update adapter call sites accordingly.
- api/rest/tree.py: drop the auth= argument from SessionHandler().

Test updates for the corresponding DTO assertions and component
fixtures are included.
Final slice of the BA-5650 stack split. Contains:

- Remaining ORM touch-ups: models/endpoint/row.py,
  models/keypair/row.py, repositories/scheduler/{repository,db_source}.py,
  api/adapters/vfolder.py, api/gql_legacy/endpoint.py,
  api/gql_legacy/routing.py.
- Test updates that depend on the action/service/DTO renames already
  landed in earlier slices: adapter/session, scheduler repositories,
  session lifecycle/service, sokovan scheduler suite,
  compute_sessions handler, dependency injection tests.
- Autouse ``_user_context`` fixture under
  tests/unit/manager/services/session/conftest.py so service tests work
  without the auth middleware.
- Resolve conflict markers in scheduler db_source by keeping the
  user-table-joined ``main_access_key`` lookup (single source of truth)
  rather than the obsolete ``session.access_key`` snapshot.
- Drop duplicate ``resolve_main_access_keys`` definition introduced
  by the rebase.
- Use ``spec.access_key`` (SessionCreationSpec) where applicable —
  the spec dataclass keeps ``access_key`` for now.
- Test fixture ``_make_session_data`` now passes ``owner_id`` instead
  of relying on attribute assignment after construction.
- Use ``main_access_key`` on PendingSessionData in provisioner test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final slice of the BA-5650 stack split. Contains:

- Remaining ORM touch-ups: models/endpoint/row.py,
  models/keypair/row.py, repositories/scheduler/{repository,db_source}.py,
  api/adapters/vfolder.py, api/gql_legacy/endpoint.py,
  api/gql_legacy/routing.py.
- Test updates that depend on the action/service/DTO renames already
  landed in earlier slices: adapter/session, scheduler repositories,
  session lifecycle/service, sokovan scheduler suite,
  compute_sessions handler, dependency injection tests.
- Autouse ``_user_context`` fixture under
  tests/unit/manager/services/session/conftest.py so service tests work
  without the auth middleware.
- Resolve conflict markers in scheduler db_source by keeping the
  user-table-joined ``main_access_key`` lookup (single source of truth)
  rather than the obsolete ``session.access_key`` snapshot.
- Drop duplicate ``resolve_main_access_keys`` definition introduced
  by the rebase.
- Use ``spec.access_key`` (SessionCreationSpec) where applicable —
  the spec dataclass keeps ``access_key`` for now.
- Test fixture ``_make_session_data`` now passes ``owner_id`` instead
  of relying on attribute assignment after construction.
- Use ``main_access_key`` on PendingSessionData in provisioner test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final slice of the BA-5650 stack split. Contains:

- Remaining ORM touch-ups: models/endpoint/row.py,
  models/keypair/row.py, repositories/scheduler/{repository,db_source}.py,
  api/adapters/vfolder.py, api/gql_legacy/endpoint.py,
  api/gql_legacy/routing.py.
- Test updates that depend on the action/service/DTO renames already
  landed in earlier slices: adapter/session, scheduler repositories,
  session lifecycle/service, sokovan scheduler suite,
  compute_sessions handler, dependency injection tests.
- Autouse ``_user_context`` fixture under
  tests/unit/manager/services/session/conftest.py so service tests work
  without the auth middleware.
Schema-only step following the BA-5650 refactor stack. The access_key
column is removed from both the sessions and kernels tables;
downstream code resolves the owner's main_access_key from the users
table at read time (keypair-scoped concurrency tracking, resource
policy lookups, agent RPC payloads). The user_uuid column stays on
both tables as the canonical owner reference.

Adds alembic migration 8c1d2e3f4a5b_drop_session_kernel_access_key
on top of 2a531e0c528e.
- Drop ix_kernels_unique_sess_token index by name and drop columns
  directly, instead of probing inspector for index names.
- Tighten 11040.breaking.md to follow the project's single-sentence
  fragment convention.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop conflict markers in scheduler db_source and user/service.py
  introduced by rebase against the updated slice I (preserve the
  user-table-joined main_access_key resolution).
- Drop duplicate ``resolve_main_access_keys`` definition.
- Use ``spec.access_key`` for SessionCreationSpec in scheduler db_source.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the feat/BA-5653-drop-session-access-key-columns branch from 0a3d4b4 to 23aeac6 Compare April 14, 2026 10:32
Co-authored-by: octodog <mu001@lablup.com>
@jopemachine jopemachine force-pushed the refactor/BA-5650-I-remaining branch from 10229f8 to edba2e7 Compare April 15, 2026 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:docs Documentations comp:common Related to Common component comp:manager Related to Manager component comp:webserver Related to Web Server component require:db-migration Automatically set when alembic migrations are added or updated size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants