docs: KubeAIRunway Hub - Multi-Instance Portal Implementation Plan #67

Open: sozercan wants to merge 2 commits into main from hub-implementation-plan

Conversation

@sozercan

Member

Summary

This PR adds the implementation plan for KubeAIRunway Hub — a central portal for managing multiple KubeAIRunway instances across clusters with OAuth authentication.

Key Features Planned

  • OAuth login — Azure Entra ID + GitHub (admin-configurable)
  • Portal-as-proxy — Users never see cluster credentials
  • Multi-instance management — Register and monitor multiple clusters
  • Role-based access — Admin assigns users/groups to instances + namespaces
  • Azure Entra group sync — Automatic access mapping from security groups
  • Instance health dashboard — GPU capacity, status, active deployments
  • Namespace isolation — Users restricted to assigned namespaces
  • Credential security — Azure Key Vault via Secrets Store CSI driver
  • Auto-refresh — Rotated credentials picked up without pod restart

Architecture

  • App database: PostgreSQL (HA-ready)
  • Credential storage: Azure Key Vault via Secrets Store CSI
  • Auth: OAuth 2.0 with PKCE (Entra) / Authorization Code (GitHub)
  • Sessions: JWT access + refresh tokens
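The PKCE part of the auth design can be sketched as follows. This is a minimal illustration of RFC 7636's S256 method using Node's built-in `crypto` module; the names `createPkcePair` and `challengeFor` are hypothetical and do not come from the plan.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical shape; the plan does not define these names.
interface PkcePair {
  verifier: string;  // kept server-side, sent later with the token exchange
  challenge: string; // sent with the authorization request
  method: "S256";
}

// code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), per RFC 7636.
export function challengeFor(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// 32 random bytes encode to a 43-character base64url verifier,
// which is the minimum length RFC 7636 allows.
export function createPkcePair(): PkcePair {
  const verifier = randomBytes(32).toString("base64url");
  return { verifier, challenge: challengeFor(verifier), method: "S256" };
}
```

The verifier would need to be stored alongside the OAuth `state` value until the callback arrives; where that lives (session row, signed cookie) is left open here.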

Implementation Phases

  1. Database & Data Model (PostgreSQL schema, ORM, repositories)
  2. OAuth Authentication (Entra + GitHub providers, session management)
  3. Instance Management (CSI credential loading, instance registry, cluster proxy)
  4. RBAC & Group Sync (roles, namespace isolation, Entra group mapping)
  5. Frontend Hub UI (OAuth login, instance dashboard, scoped views, admin pages)
  6. Deployment & Configuration (K8s manifests, credential auto-refresh)

Future Work (noted, not in scope)

  • Audit logging
  • API tokens / personal access tokens
  • GPU quotas per namespace/user
  • Cost visibility / chargeback per user

See docs/hub-implementation-plan.md for the full plan.

Add comprehensive implementation plan for extending KubeAIRunway with:
- Central portal/hub for multi-instance management
- OAuth authentication (Azure Entra ID + GitHub)
- Portal-as-proxy architecture (users never see cluster credentials)
- Azure Key Vault via Secrets Store CSI for credential storage
- PostgreSQL for app data (users, roles, sessions)
- Role-based access control with namespace isolation
- Azure Entra group sync for automatic access mapping
- Instance health dashboard with GPU capacity visibility
- Credential auto-refresh on rotation

@aramase aramase left a comment

Contributor

First pass.

Comment thread docs/hub-implementation-plan.md Outdated
- Auto-refresh of rotated credentials (CSI volume watch)

### Future Work (noted, not implemented)
- Audit logging (all user actions to PostgreSQL)

Contributor

IMO this should be v1 because it's a multi-tenant access proxy

Comment thread docs/hub-implementation-plan.md Outdated
- Create `backend/src/services/oauth/` directory with provider interface
- Define `OAuthProvider` interface: `getAuthUrl()`, `exchangeCode()`, `getUserInfo()`, `refreshToken()`
- Implement Azure Entra ID provider:
- OIDC discovery (`https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration`)

Contributor

Suggested change
Before: - OIDC discovery (`https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration`)
After: - OIDC discovery (`https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration`)
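To make the suggested URL concrete, here is a small sketch that builds the v2.0 discovery URL for a tenant. The function name `entraDiscoveryUrl` is hypothetical; the URL template is the one from the suggestion above.

```typescript
// Hypothetical helper: builds the v2.0 OIDC discovery URL for a tenant.
// `tenant` may be a tenant ID, a verified domain, or a value such as "common".
export function entraDiscoveryUrl(tenant: string): string {
  const trimmed = tenant.trim();
  if (trimmed === "") {
    throw new Error("tenant must not be empty");
  }
  // Encode the tenant so a stray "/" or "?" cannot change the URL structure.
  const segment = encodeURIComponent(trimmed);
  return `https://login.microsoftonline.com/${segment}/v2.0/.well-known/openid-configuration`;
}
```

The provider would then fetch this document once at startup and read standard OIDC metadata fields such as `authorization_endpoint`, `token_endpoint`, and `jwks_uri` from it rather than hard-coding them.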

Comment thread docs/hub-implementation-plan.md Outdated
- Token exchange, refresh token handling
- Extract user info + group memberships from ID token / `/me/memberOf` Graph API
- Implement GitHub provider:
- OAuth App flow (authorization code)

Contributor

PKCE for this flow as well?

Comment thread docs/hub-implementation-plan.md Outdated
- Implement GitHub provider:
- OAuth App flow (authorization code)
- Token exchange via `https://github.com/login/oauth/access_token`
- User info from `https://api.github.com/user`

Contributor

I think the email field will be null if the user has set their email to private. The plan should be:

  • Request the user:email scope
  • Call GET https://api.github.com/user/emails to retrieve verified emails
  • Select the primary verified email as the user identifier
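The selection step above could look roughly like this. The `GitHubEmail` shape mirrors the entries returned by `GET /user/emails`; the function name `pickPrimaryVerifiedEmail` is hypothetical, and fetching the list (with a token that has the `user:email` scope) is assumed to happen elsewhere.

```typescript
// One entry of the GET https://api.github.com/user/emails response.
interface GitHubEmail {
  email: string;
  primary: boolean;
  verified: boolean;
  visibility: string | null;
}

// Hypothetical helper: returns the primary verified address, or null when
// there is none, so the caller can reject the login instead of falling back
// to an unverified identifier.
export function pickPrimaryVerifiedEmail(emails: readonly GitHubEmail[]): string | null {
  const match = emails.find((entry) => entry.primary && entry.verified);
  return match ? match.email : null;
}
```

Returning `null` instead of picking some other verified address is a deliberate choice in this sketch: it keeps the user identifier stable and avoids silently binding an account to a secondary email.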

Comment thread docs/hub-implementation-plan.md Outdated
- OIDC discovery (`https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration`)
- Authorization code + PKCE flow
- Token exchange, refresh token handling
- Extract user info + group memberships from ID token / `/me/memberOf` Graph API

Contributor

Comment thread docs/hub-implementation-plan.md Outdated
- OAuth App flow (authorization code)
- Token exchange via `https://github.com/login/oauth/access_token`
- User info from `https://api.github.com/user`
- Org/team membership for group-based access (optional)

Contributor

there is "groups" support for Entra users, so don't we need this for parity for GitHub users?

Comment thread docs/hub-implementation-plan.md Outdated
- Parse kubeconfig files into usable K8s client configs
- File watcher (fs.watch) for auto-refresh when CSI driver rotates secrets
- In-memory cache of parsed credentials, invalidated on file change
- Convention: each file in the mount path = one cluster's credentials, filename = instance identifier

Contributor

Is filename = instance identifier the best approach here? This creates a tight coupling between the AKV secret name and instances.name.

I think we can rely on instances.credential_ref to map to the file name explicitly rather than relying on a naming convention, and add reconciliation checks.
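One way the explicit mapping plus reconciliation check might look is sketched below. The `InstanceRecord` shape and the function name `reconcileCredentials` are assumptions for illustration; only the `credential_ref` column name comes from the comment above.

```typescript
// Hypothetical row shape; credentialRef mirrors instances.credential_ref.
interface InstanceRecord {
  name: string;
  credentialRef: string;
}

interface ReconcileReport {
  resolved: Map<string, string>; // instance name -> mounted file name
  missing: string[];             // instances whose credential_ref has no mounted file
  orphaned: string[];            // mounted files that no instance references
}

export function reconcileCredentials(
  instances: readonly InstanceRecord[],
  mountedFiles: readonly string[],
): ReconcileReport {
  const files = new Set(mountedFiles);
  const referenced = new Set<string>();
  const resolved = new Map<string, string>();
  const missing: string[] = [];

  for (const instance of instances) {
    referenced.add(instance.credentialRef);
    if (files.has(instance.credentialRef)) {
      resolved.set(instance.name, instance.credentialRef);
    } else {
      missing.push(instance.name);
    }
  }

  const orphaned = mountedFiles.filter((file) => !referenced.has(file));
  return { resolved, missing, orphaned };
}
```

Running this on startup and after each file-change event would let the hub mark instances in `missing` as unhealthy and flag `orphaned` files to an admin, without requiring the Key Vault secret name to equal the instance name.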

Comment thread docs/hub-implementation-plan.md Outdated
- Create `backend/src/services/cluster-proxy.ts`:
- Accept requests with instance context (from user's session)
- Validate user has access to target instance + namespace (RBAC check)
- Forward API calls to target cluster's KubeAIRunway using stored credentials

Contributor

Is there an allowlist of API paths? What happens if the request is /api/v1/secrets and the stored credentials end up having broad permissions?
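A deny-by-default allowlist for the proxy could be sketched like this. The routes in `PROXY_ALLOWLIST` are purely illustrative assumptions, not KubeAIRunway's actual API surface, and `allowedProxyPath` is a hypothetical name.

```typescript
interface AllowRule {
  methods: ReadonlySet<string>;
  pattern: RegExp;
}

// Kubernetes-style resource name, used only for the illustrative routes below.
const NAME = "[a-z0-9]([a-z0-9-]*[a-z0-9])?";

// Assumed routes for illustration; anything not listed is rejected.
const PROXY_ALLOWLIST: readonly AllowRule[] = [
  { methods: new Set(["GET", "POST"]), pattern: /^\/api\/deployments$/ },
  { methods: new Set(["GET", "DELETE"]), pattern: new RegExp(`^/api/deployments/${NAME}$`) },
  { methods: new Set(["GET"]), pattern: /^\/api\/models$/ },
];

// Returns the normalized path to forward, or null when the request is denied.
export function allowedProxyPath(method: string, rawPath: string): string | null {
  // Reject inputs that the URL parser could interpret as carrying a host.
  if (!rawPath.startsWith("/") || rawPath.startsWith("//") || rawPath.includes("\\")) {
    return null;
  }
  let pathname: string;
  try {
    // Parsing resolves "." and ".." segments (including percent-encoded dots)
    // and drops the query string before matching.
    pathname = new URL(rawPath, "http://proxy.invalid").pathname;
  } catch {
    return null;
  }
  const verb = method.toUpperCase();
  const ok = PROXY_ALLOWLIST.some((rule) => rule.methods.has(verb) && rule.pattern.test(pathname));
  return ok ? pathname : null;
}
```

In this sketch the proxy forwards the returned normalized path rather than the raw input, so the check and the upstream request cannot disagree about which resource is being addressed. A request for `/api/v1/secrets` is denied regardless of what the stored credentials permit.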

- Move audit logging from Future Work to v1 scope (multi-tenant proxy requirement)
- Fix OIDC discovery URL to use v2.0 endpoint
- Add PKCE explicitly for GitHub OAuth flow
- Handle GitHub private emails via user:email scope + /user/emails API
- Add Entra group overage claim handling (>150 groups fallback to Graph API)
- Promote GitHub org/team sync from optional to required (parity with Entra)
- Use credential_ref for CSI file mapping instead of filename convention
- Add API path allowlist for cluster proxy to prevent credential over-privilege
- Add audit_log table schema in Technical Notes
- Update testing strategy to cover new security features

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>