feat(auth,prompts,inference): multi-tenancy MVP for MaaS deployments #5614
Draft
franciscojavierarceo wants to merge 2 commits into ogx-ai:main
…oyments

Add multi-header identity mapping for upstream gateway auth (attribute_headers), migrate prompts from KVStore to AuthorizedSqlStore for tenant-scoped access control, and add llm-d fairness header propagation through a per-request header hook in OpenAIMixin.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

…rness header tests

Reorder set_default_version to set the new default before clearing old ones, so that a crash cannot leave zero defaults. Add unit tests for the vLLM fairness header injection via _get_extra_request_headers covering all code paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This pull request has merge conflicts that must be resolved before it can be merged. @franciscojavierarceo please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Summary
Implements Phase 1 multi-tenancy support for MaaS, llm-d, and vLLM deployments.
- attribute_headers on UpstreamHeaderAuthConfig: maps multiple HTTP headers to attribute categories (e.g., X-MaaS-Group → teams, X-MaaS-Subscription → namespaces). Values merge with the existing attributes_header field. Enables MaaS Authorino integration where identity is spread across multiple upstream headers.
- Prompts migrated from KVStore to AuthorizedSqlStore: prompts are now stored with owner_principal and access_attributes, matching the pattern used by conversations, responses, and other stateful resources. Breaking change: existing KV-stored prompts must be recreated.
- fairness_header_attribute on vLLM config: injects x-gateway-inference-fairness-id on outgoing API calls, taking its value from the authenticated user's attributes. Used by llm-d EPP Flow Control for per-tenant fair scheduling. Implemented as a _get_extra_request_headers() hook on OpenAIMixin so the pattern is reusable by other providers.

Files changed (12 files, +486 / -133)
- core/datatypes.py, core/server/auth_providers.py
- core/prompts/prompts.py, core/storage/datatypes.py, core/stack.py
- providers/remote/inference/vllm/config.py, providers/remote/inference/vllm/vllm.py, providers/utils/inference/openai_mixin.py
- docs/docs/providers/inference/remote_vllm.mdx (auto-generated)
- tests/unit/server/test_auth_upstream_header.py, tests/unit/providers/inference/test_remote_vllm.py, tests/unit/prompts/prompts/conftest.py

Design decisions
- _get_extra_request_headers() hook vs. direct injection: the RFC suggested injecting the fairness header directly in the vLLM adapter. Instead, this PR adds a hook on OpenAIMixin that any OpenAI-compatible provider can override, using the SDK's extra_headers kwarg on create() calls.
- Agent state KV key prefixing skipped: persistence_store in providers/inline/responses/builtin/impl.py is dead code (initialized, never read or written). Actual state uses ResponsesStore, which is already backed by AuthorizedSqlStore.
- set_default_version crash safety: the new default is set before old defaults are cleared, so a crash mid-operation leaves two defaults (recoverable) rather than zero (data loss).

Known limitations (Phase 2)
The following resources still use plain KVStore with no tenant isolation:

Test plan
- uv run pytest tests/unit/ -x --tb=short (56 new + existing tests pass)
- uv run pre-commit run --all-files
- uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --setup gpt --suite responses
- attribute_headers with MaaS Authorino headers
- fairness_header_attribute sends header to llm-d EPP

Generated with Claude Code
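
The set_default_version ordering from the design decisions can be illustrated with an in-memory sketch. The PromptVersion dataclass and the list-based storage here are hypothetical stand-ins for the real SQL-backed store. The point is only the order of the two writes: the new default is marked first, so an interruption between the steps leaves two defaults rather than none.

```python
from dataclasses import dataclass


@dataclass
class PromptVersion:
    """Hypothetical record; the real store keeps more fields."""
    version: int
    is_default: bool = False


def set_default_version(versions: list[PromptVersion], new_default: int) -> None:
    """Mark new_default as the default using a crash-tolerant write order."""
    target = next((v for v in versions if v.version == new_default), None)
    if target is None:
        # Fail before touching any state so the existing default survives.
        raise ValueError(f"version {new_default} not found")

    # Step 1: set the new default first. A crash after this step leaves
    # two defaults, which can be repaired later.
    target.is_default = True

    # Step 2: clear the old defaults. Doing this first would risk a
    # window with zero defaults.
    for v in versions:
        if v is not target and v.is_default:
            v.is_default = False


versions = [PromptVersion(1, is_default=True), PromptVersion(2), PromptVersion(3)]
set_default_version(versions, 3)
```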