feat(auth,prompts,inference): multi-tenancy MVP for MaaS deployments #5614

Draft
franciscojavierarceo wants to merge 2 commits into ogx-ai:main from franciscojavierarceo:worktree-multi-tenancy-mvp

franciscojavierarceo (Collaborator) commented Apr 24, 2026

Summary

Implements Phase 1 multi-tenancy support for MaaS, llm-d, and vLLM deployments.

  • attribute_headers on UpstreamHeaderAuthConfig — maps multiple HTTP headers to attribute categories (e.g., X-MaaS-Group → teams, X-MaaS-Subscription → namespaces). Values merge with the existing attributes_header field. Enables MaaS Authorino integration where identity is spread across multiple upstream headers.
  • Prompts migrated from KVStore to AuthorizedSqlStore — prompts now have row-level access control via owner_principal and access_attributes, matching the pattern used by conversations, responses, and other stateful resources. Breaking change: existing KV-stored prompts must be recreated.
  • fairness_header_attribute on vLLM config — injects x-gateway-inference-fairness-id on outgoing API calls from the authenticated user's attributes. Used by llm-d EPP Flow Control for per-tenant fair scheduling. Implemented as a _get_extra_request_headers() hook on OpenAIMixin so the pattern is reusable by other providers.
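
To illustrate the first bullet, here is a minimal sketch of how several upstream headers could be folded into attribute categories and merged with a single attributes header. The helper name `collect_attributes`, the `X-User-Attributes` header name, the comma-separated value format, and the JSON encoding of the attributes header are all assumptions made for illustration; the PR's actual parsing may differ.

```python
import json


def collect_attributes(headers, attribute_headers, attributes_header=None):
    """Merge per-header identity values into attribute categories.

    Hypothetical helper: the name, comma-splitting, and JSON decoding are
    assumptions for illustration, not the PR's code.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    merged = {}

    # Start from the single JSON-encoded attributes header, if configured.
    if attributes_header:
        raw = lowered.get(attributes_header.lower())
        if raw:
            for category, values in json.loads(raw).items():
                merged.setdefault(category, []).extend(values)

    # Then fold in each mapped header, skipping empty and duplicate values.
    for header_name, category in attribute_headers.items():
        raw = lowered.get(header_name.lower())
        if not raw:
            continue
        bucket = merged.setdefault(category, [])
        for value in (part.strip() for part in raw.split(",")):
            if value and value not in bucket:
                bucket.append(value)
    return merged


request_headers = {
    "X-MaaS-Group": "ml-team, platform",
    "X-MaaS-Subscription": "gold",
    "X-User-Attributes": '{"teams": ["ml-team"], "roles": ["admin"]}',
}
attrs = collect_attributes(
    request_headers,
    attribute_headers={"X-MaaS-Group": "teams", "X-MaaS-Subscription": "namespaces"},
    attributes_header="X-User-Attributes",
)
print(attrs)
```

In this sketch the mapped headers extend, rather than replace, categories that already arrived through the attributes header, which is one plausible reading of "values merge".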

Files changed (12 files, +486 / -133)

| Area | Files |
|---|---|
| Auth | core/datatypes.py, core/server/auth_providers.py |
| Prompts | core/prompts/prompts.py, core/storage/datatypes.py, core/stack.py |
| Inference | providers/remote/inference/vllm/config.py, providers/remote/inference/vllm/vllm.py, providers/utils/inference/openai_mixin.py |
| Docs | docs/docs/providers/inference/remote_vllm.mdx (auto-generated) |
| Tests | tests/unit/server/test_auth_upstream_header.py, tests/unit/providers/inference/test_remote_vllm.py, tests/unit/prompts/prompts/conftest.py |

Design decisions

  1. _get_extra_request_headers() hook vs. direct injection — the RFC suggested injecting the fairness header directly in the vLLM adapter. Instead, this PR adds a hook on OpenAIMixin that any OpenAI-compatible provider can override, using the SDK's extra_headers kwarg on create() calls.

  2. Agent state KV key prefixing skipped: persistence_store in providers/inline/responses/builtin/impl.py is dead code (initialized, but never read or written). Actual state uses ResponsesStore, which is already backed by AuthorizedSqlStore.

  3. set_default_version crash safety — new default is set before clearing old defaults, so a crash mid-operation leaves two defaults (recoverable) rather than zero (data loss).
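
The ordering in the third decision can be sketched as follows. This is a simplified stand-in that uses an in-memory SQLite table; the `prompts` schema, the `is_default` column, and the `crash_midway` switch are hypothetical and exist only to simulate a failure between the two writes. The PR itself works against AuthorizedSqlStore, not raw SQLite.

```python
import sqlite3


def set_default_version(conn, prompt_id, version, crash_midway=False):
    """Hypothetical sketch: promote the new default first, demote the rest second."""
    # Step 1: mark the requested version as default.
    conn.execute(
        "UPDATE prompts SET is_default = 1 WHERE prompt_id = ? AND version = ?",
        (prompt_id, version),
    )
    conn.commit()
    if crash_midway:
        raise RuntimeError("simulated crash between the two writes")
    # Step 2: clear the flag on every other version of the same prompt.
    conn.execute(
        "UPDATE prompts SET is_default = 0 WHERE prompt_id = ? AND version != ?",
        (prompt_id, version),
    )
    conn.commit()


def default_versions(conn, prompt_id):
    """Return every version currently flagged as default, in ascending order."""
    rows = conn.execute(
        "SELECT version FROM prompts "
        "WHERE prompt_id = ? AND is_default = 1 ORDER BY version",
        (prompt_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (prompt_id TEXT, version INTEGER, is_default INTEGER)")
conn.executemany("INSERT INTO prompts VALUES (?, ?, ?)", [("p1", 1, 1), ("p1", 2, 0)])
conn.commit()

try:
    set_default_version(conn, "p1", 2, crash_midway=True)
except RuntimeError:
    pass
after_crash = default_versions(conn, "p1")  # both versions flagged, none lost

set_default_version(conn, "p1", 2)  # retrying converges to a single default
after_retry = default_versions(conn, "p1")
print(after_crash, after_retry)
```

With the opposite order, the same simulated crash would leave no row flagged as default, which is the state this decision is meant to avoid.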

Known limitations (Phase 2)

The following resources still use plain KVStore with no tenant isolation:

| Resource | Risk | Notes |
|---|---|---|
| Connectors | High | May store auth credentials; migrate to AuthorizedSqlStore |
| Batch state | High | Contains job results with user data |
| Vector store metadata | High | All vector_io providers (FAISS, Chroma, Qdrant, etc.) |
| Distribution registry | Medium | Leaks provider info; arguably admin-only |
| Quota tracking | Low | Per-client rate limiting; should not be tenant-scoped |

Test plan

  • Unit tests: uv run pytest tests/unit/ -x --tb=short (56 new + existing tests pass)
  • Pre-commit hooks: uv run pre-commit run --all-files
  • Integration tests (replay): uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --suite responses
  • Verify attribute_headers with MaaS Authorino headers
  • Verify fairness_header_attribute sends header to llm-d EPP
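
For the last check, a unit-level approximation of what "sends header" means might look like the sketch below. `FakeCompletions`, `MixinSketch`, and `VLLMSketch` are hypothetical stand-ins, and passing `user_attributes` into the hook is an assumption; the real `_get_extra_request_headers()` on OpenAIMixin may obtain the authenticated user differently. Only the header name and the SDK's `extra_headers` kwarg come from the PR description.

```python
class FakeCompletions:
    """Stand-in for an OpenAI SDK resource; records the kwargs it receives."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return {"ok": True}


class MixinSketch:
    """Hypothetical reduction of an OpenAI-compatible provider base class."""

    def __init__(self, completions):
        self.completions = completions

    def _get_extra_request_headers(self, user_attributes):
        # Default hook: providers add nothing unless they override this.
        return {}

    def chat(self, user_attributes, **params):
        headers = self._get_extra_request_headers(user_attributes)
        if headers:
            # Forward the headers through the SDK's extra_headers kwarg.
            params["extra_headers"] = headers
        return self.completions.create(**params)


class VLLMSketch(MixinSketch):
    """Hypothetical vLLM adapter that overrides the hook."""

    def __init__(self, completions, fairness_header_attribute=None):
        super().__init__(completions)
        self.fairness_header_attribute = fairness_header_attribute

    def _get_extra_request_headers(self, user_attributes):
        if not self.fairness_header_attribute:
            return {}
        values = (user_attributes or {}).get(self.fairness_header_attribute) or []
        if not values:
            return {}
        # Assumption: the first value in the category identifies the tenant.
        return {"x-gateway-inference-fairness-id": values[0]}


completions = FakeCompletions()
adapter = VLLMSketch(completions, fairness_header_attribute="namespaces")
adapter.chat({"namespaces": ["tenant-a"]}, model="m", messages=[])
adapter.chat(None, model="m", messages=[])  # unauthenticated: no header added
print(completions.calls[0]["extra_headers"])
```

An end-to-end check against llm-d EPP would still be needed to confirm that the gateway actually receives and honors the header.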

Generated with Claude Code

franciscojavierarceo and others added 2 commits April 23, 2026 23:02
…oyments

Add multi-header identity mapping for upstream gateway auth (attribute_headers),
migrate prompts from KVStore to AuthorizedSqlStore for tenant-scoped access
control, and add llm-d fairness header propagation through a per-request
header hook in OpenAIMixin.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…rness header tests

Reorder set_default_version to set the new default before clearing old ones,
preventing a crash from leaving zero defaults. Add unit tests for the vLLM
fairness header injection via _get_extra_request_headers covering all code paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
mergify Bot (Contributor) commented Apr 24, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify Bot added the needs-rebase label Apr 24, 2026