@hiento09 hiento09 commented Oct 2, 2025

This pull request adds AI model management and Kubernetes integration to the API gateway. It introduces new domain models for AI models, a Kubernetes service for cluster and GPU management, and the dependency injection wiring needed to expose these features through HTTP APIs. The changes are grouped below by theme.

AI Model Management:

  • Added a new models domain package (apps/jan-api-gateway/application/app/domain/organization/models/model.go) defining types for model creation, resource requirements, model status, filtering, and cluster/GPU resource summaries. This enables structured management of AI models within organizations.
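As a rough illustration of what such a domain package might contain, here is a minimal sketch. The type names, the `ModelFilter` shape, and the `Matches` helper are assumptions made for this example and are not taken from `model.go`; the JSON field names follow the example response posted later in this conversation.

```go
package main

import "fmt"

// ModelStatus is a hypothetical status type. Only "creating" appears in
// this PR's example response; other values are not known from the source.
type ModelStatus string

const ModelStatusCreating ModelStatus = "creating"

// GPURequirements mirrors the "gpu" object in the example response.
type GPURequirements struct {
	MinVRAM       string `json:"min_vram"`
	PreferredVRAM string `json:"preferred_vram"`
	GPUType       string `json:"gpu_type"`
	MinGPUs       int    `json:"min_gpus"`
	MaxGPUs       int    `json:"max_gpus"`
}

// ResourceRequirements mirrors the "requirements" object in the example response.
type ResourceRequirements struct {
	CPU    string          `json:"cpu"`
	Memory string          `json:"memory"`
	GPU    GPURequirements `json:"gpu"`
}

// Model holds a subset of the fields shown in the example response.
type Model struct {
	ID           string               `json:"id"`
	Status       ModelStatus          `json:"status"`
	Tags         []string             `json:"tags"`
	Requirements ResourceRequirements `json:"requirements"`
}

// ModelFilter is an assumed filter shape: a nil Status or empty Tag
// means "do not filter on this field".
type ModelFilter struct {
	Status *ModelStatus
	Tag    string
}

// Matches reports whether m satisfies every criterion set on the filter.
func (f ModelFilter) Matches(m Model) bool {
	if f.Status != nil && m.Status != *f.Status {
		return false
	}
	if f.Tag == "" {
		return true
	}
	for _, t := range m.Tags {
		if t == f.Tag {
			return true
		}
	}
	return false
}

func main() {
	creating := ModelStatusCreating
	m := Model{ID: "jan-v1-4b", Status: ModelStatusCreating, Tags: []string{}}
	fmt.Println(ModelFilter{Status: &creating}.Matches(m))
}
```

Using a pointer for the status criterion lets the filter distinguish "no status filter" from an empty status string, which is one common way to model optional filters in Go.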

Kubernetes Integration:

  • Introduced a new KubernetesService (apps/jan-api-gateway/application/app/infrastructure/kubernetes/kubernetes_service.go) that provides cluster connectivity, CRD checks, GPU node/resource discovery, and storage class validation. This service is essential for deploying and managing models on Kubernetes clusters.

Dependency Injection Wiring:

  • Registered the new ModelService and Kubernetes-related services (NewKubernetesService, NewModelDeploymentManager) in the service and infrastructure provider sets, making them available for use throughout the application.

HTTP API Exposure:

  • Added new HTTP route providers for models and Kubernetes APIs, enabling external access to model management and cluster status endpoints.

General Integration:

  • Updated imports in the relevant provider and route files to include the new model and Kubernetes modules, ensuring all new functionality is properly wired into the application.

Issue: #128

@hiento09 hiento09 self-assigned this Oct 2, 2025
@hiento09 hiento09 force-pushed the feat/organization-models branch from 6a3e45a to b2defa0 Compare October 2, 2025 04:51

hiento09 commented Oct 2, 2025

Example request:

curl -X 'POST' \
  'http://localhost:64185/v1/organization/models' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer *****' \
  -H 'Content-Type: application/json' \
  -d '{
  "command": [
    "sh", "-c", "python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --uvicorn-log-level warning --model janhq/Jan-v1-2509 --served-model-name jan-v1-4b --max-num-batched-tokens 16384 --enable-auto-tool-choice --max-model-len 131072 --tool-call-parser hermes --reasoning-parser qwen3 --compilation-config '\''{\"cudagraph_mode\":\"FULL_AND_PIECEWISE\",\"compile_sizes\":[1,2,4]}'\'' --async-scheduling --api-server-count 4"
  ],
  "description": "jan-v1-4b",
  "display_name": "jan-v1-4b",
  "gpu_count": 1,
  "image": "registry.menlo.ai/dockerhub/vllm/vllm-openai:v0.10.2",
  "initial_delay_seconds": 600,
  "name": "jan-v1-4b",
  "replicas": 1,
  "storage_size": 30,
  "tags": []
}'

Response:

{
  "model": {
    "id": "jan-v1-4b",
    "organization_id": 1,
    "display_name": "jan-v1-4b",
    "description": "jan-v1-4b",
    "status": "creating",
    "version": "",
    "requirements": {
      "cpu": "1",
      "memory": "2Gi",
      "gpu": {
        "min_vram": "8Gi",
        "preferred_vram": "16Gi",
        "gpu_type": "nvidia",
        "min_gpus": 1,
        "max_gpus": 1
      }
    },
    "namespace": "jan-models",
    "deployment_name": "jan-v1-4b",
    "service_name": "jan-v1-4b",
    "tags": [],
    "managed": true,
    "created_at": "2025-10-02T04:24:27.173970526Z",
    "updated_at": "2025-10-02T04:24:27.173970831Z",
    "created_by_user_id": "user_igjyupzhnmh56bh9x3hj2v80"
  }
}
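A client could decode this response along the following lines. The struct below is a hypothetical subset chosen for the example, not the gateway's actual response type; it relies only on the field names visible in the JSON above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// createModelResponse captures a subset of the fields in the example
// response. Unlisted fields are ignored by encoding/json.
type createModelResponse struct {
	Model struct {
		ID        string    `json:"id"`
		Status    string    `json:"status"`
		Namespace string    `json:"namespace"`
		Managed   bool      `json:"managed"`
		CreatedAt time.Time `json:"created_at"` // RFC 3339 timestamp
	} `json:"model"`
}

// decodeCreateModelResponse parses the response body into the struct above.
func decodeCreateModelResponse(body []byte) (createModelResponse, error) {
	var resp createModelResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return resp, fmt.Errorf("decode create-model response: %w", err)
	}
	return resp, nil
}

func main() {
	// Trimmed copy of the example response.
	body := []byte(`{
	  "model": {
	    "id": "jan-v1-4b",
	    "status": "creating",
	    "namespace": "jan-models",
	    "managed": true,
	    "created_at": "2025-10-02T04:24:27.173970526Z"
	  }
	}`)
	resp, err := decodeCreateModelResponse(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Model.ID, resp.Model.Status, resp.Model.CreatedAt.UTC())
}
```

Since the example returns `"status": "creating"`, creation appears to be asynchronous, so a caller would likely need to check the model again later before sending inference traffic. How that status is queried is not shown in this conversation.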


hiento09 commented Oct 9, 2025

Closing due to a domain route conflict.

@hiento09 hiento09 closed this Oct 9, 2025