| Author | Bokeum Kim (bkkim@lablup.com) |
|---|---|
| Status | Accepted |
| Created | 2025-07-14 |
| Created-Version | |
| Target-Version | |
| Implemented-Version | |

GraphQL Schema for new Model Service

As a follow-up to BEP-1009 (Model Serving Registry), this BEP proposes a comprehensive GraphQL schema for the new model serving service in Backend.AI.

The GraphQL schema introduces two primary entities, together with their supporting types, to manage the model serving infrastructure:
- ModelDeployment: The top-level entity representing a model service deployment. It manages:
  - Active revision and revision history
  - Deployment strategy (Rolling, Blue-Green, Canary)
  - Public endpoint configuration and custom domain
  - Domain and project associations
  - Replica management with auto-scaling rules
- ModelRevision: Represents an immutable version of a model within a deployment. It handles:
  - Container image and runtime configuration
  - Model storage and mounting configuration
  - Cluster configuration and resource groups
  - Resource requirements and runtime variant
  - Service-specific configurations
  - Additional volume mounts
A deployment is the top-level concept that manages multiple revisions.
- id: Unique identifier
- name: Deployment name (unique within domain)
- endpointUrl: Public URL for service access (e.g., https://llama-3.model.ai)
- preferredDomainName (optional): Custom domain name preference
- status: Deployment status
  - ACTIVE, HIBERNATED, PROVISIONING, UPDATING, FAILED, DESTROYING, DESTROYED
- openToPublic: Whether accessible to public
- tags: Tags for model service
- replicaManagement:
  - desiredReplicaCount: Number of replicas to be achieved
  - replicas: List of ModelReplicas managed by deployment
  - routings: Routing information of each replica
- deploymentStrategy
  - type: Types of Model Deploy Strategy
  - config: Configs needed for deployment strategy (e.g., maxSurge)
- revision: Current active ModelRevision that deployment references
- revisionHistory: List of previous ModelRevisions deployment referenced
- domain: Backend.AI domain
- project: Owning project (group)
- createdUser: User who created the deployment
- revisions: All revisions belonging to this deployment
- accessTokens: Access Tokens for endpoint url
- createdAt: Creation timestamp
- updatedAt: Last update timestamp
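
The `deploymentStrategy` description above gives `maxSurge` as an example of a strategy setting. As a rough illustration of what such a setting could control, the following Python sketch computes the replica-count bounds during a rolling update. It assumes Kubernetes-style semantics (`maxSurge` is the number of extra replicas allowed above the desired count, `maxUnavailable` is the number allowed to be missing); this BEP does not define those semantics, and the Python `RollingConfig` class and the `rolling_bounds` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollingConfig:
    # Field names mirror the maxSurge / maxUnavailable keys used in this BEP's
    # example mutations; their meaning here is an assumption (Kubernetes-style).
    max_surge: int = 1
    max_unavailable: int = 0


def rolling_bounds(desired: int, cfg: RollingConfig) -> tuple[int, int]:
    """Return (min_available, max_total) replica counts allowed during a rollout."""
    if desired < 0 or cfg.max_surge < 0 or cfg.max_unavailable < 0:
        raise ValueError("counts must be non-negative")
    if cfg.max_surge == 0 and cfg.max_unavailable == 0:
        # With no headroom in either direction, no replica could ever be replaced.
        raise ValueError("rollout cannot progress when both limits are zero")
    min_available = max(desired - cfg.max_unavailable, 0)
    max_total = desired + cfg.max_surge
    return min_available, max_total
```

Under these assumptions, a deployment with three desired replicas, `maxSurge: 1`, and `maxUnavailable: 0` would keep at least three replicas serving and never run more than four at once.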
enum DeploymentStatus {
INACTIVE
ACTIVE
}
type ReplicaManagement {
desiredReplicaCount: Int!
replicas: [ModelReplica!]!
autoScalingRules: [AutoScalingRule!]!
}
type ModelReplica {
name: String!
revision: ModelRevision!
routings(first: Int, after: String): RoutingConnection!
}
enum DeploymentStrategyType {
ROLLING
BLUE_GREEN
CANARY
}
union StrategyConfig = RollingConfig | BlueGreenConfig | CanaryConfig
type DeploymentStrategy {
type: DeploymentStrategyType!
config: StrategyConfig
}
type ModelDeployment {
id: ID!
name: String!
endpointUrl: String
preferredDomainName: String
status: DeploymentStatus!
openToPublic: Boolean!
tags: [String!]!
revision: ModelRevision
revisionHistory: [ModelRevision!]!
replicaManagement: ReplicaManagement!
deploymentStrategy: DeploymentStrategy!
domain: Domain!
project: Project!
createdUser: User!
resourceGroup: ResourceGroup!
accessTokens: [AccessToken!]!
createdAt: DateTime!
updatedAt: DateTime!
}
type ReplicaMetric {
... # TODO: Fill up metric fields
}
type DeploymentMetrics {
replicaMetrics: [ReplicaMetric!]!
}
type Query {
# Deployment Queries
deployments(filter: DeploymentFilter, orderBy: DeploymentOrderBy, limit: Int, offset: Int): [ModelDeployment!]!
deployment(id: ID!): ModelDeployment
# Replica Queries
replica(id: ID!): ModelReplica
# Metrics
deploymentMetrics(id: ID!, filter: DeploymentMetricsFilter): [DeploymentMetrics!]!
}
type Subscription {
# Real-time Updates
deploymentStatusChanged(deploymentId: ID!): ModelDeployment!
replicaStatusChanged(revisionId: ID!): ModelReplica!
metricsUpdated(deploymentId: ID!): DeploymentMetrics!
}
type Mutation {
# Deployment Management
createModelDeployment(input: CreateModelServingDeploymentInput!): CreateModelDeploymentPayload!
updateModelDeployment(input: UpdateModelServingDeploymentInput!): UpdateModelDeploymentPayload!
deleteModelServingDeployment(id: ID!): ID!
}
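
For readers who want to try the queries above from a client, the following Python sketch builds the JSON body that GraphQL-over-HTTP servers conventionally accept (a `query` string plus a `variables` object). It deliberately stops short of sending the request, since this BEP does not specify the endpoint URL or authentication scheme; the helper name `build_request_body` and the selected fields are illustrative choices, not part of the proposal.

```python
import json

# The operation uses a variable instead of an inlined ID so the same
# document can be reused for any deployment.
DEPLOYMENT_QUERY = """
query GetDeployment($id: ID!) {
  deployment(id: $id) {
    id
    name
    status
    endpointUrl
  }
}
"""


def build_request_body(query: str, variables: dict) -> str:
    """Serialize a GraphQL operation into a JSON POST body."""
    return json.dumps({"query": query, "variables": variables})


body = build_request_body(DEPLOYMENT_QUERY, {"id": "deployment-uuid"})
```

Passing the ID through `variables` avoids string interpolation into the query text, which keeps the operation cacheable and sidesteps quoting mistakes.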
A revision represents a specific version of a model service. It contains the session launch configuration, which is immutable. To change any value of a revision, the user must publish a new revision.
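
The immutability rule can be pictured with a small Python sketch. This is a minimal illustration, not the Backend.AI implementation: the `Revision` and `Deployment` classes, their fields, and the `publish` and `activate` helpers are hypothetical stand-ins for `ModelRevision`, `ModelDeployment`, and the corresponding mutations.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Revision:
    id: str
    image: str
    runtime_variant: str


@dataclass
class Deployment:
    name: str
    active: Optional[Revision] = None
    history: list[Revision] = field(default_factory=list)

    def activate(self, revision: Revision) -> None:
        """Make `revision` active and keep the previous one in the history."""
        if self.active is not None:
            self.history.append(self.active)
        self.active = revision


def publish(base: Revision, new_id: str, **changes) -> Revision:
    """Derive a new revision from `base` instead of mutating it."""
    return replace(base, id=new_id, **changes)


v1 = Revision(id="rev-1", image="vllm:0.9.1", runtime_variant="VLLM")
v2 = publish(v1, "rev-2", image="vllm:0.9.2")
deployment = Deployment(name="llama-3-service")
deployment.activate(v1)
deployment.activate(v2)
```

In this sketch, changing the image never touches `v1`; the deployment simply points at `v2` and remembers `v1` in its history, which is the behavior that makes rollbacks straightforward.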
- id: Unique identifier
- name: Revision name (optional)
- tags: List of revision tags
- status: Revision status (ACTIVE, INACTIVE)
- clusterConfig:
  - mode: Cluster mode ('single-node' or 'multi-node')
  - size: Number of nodes in cluster
- resourceConfig: Resource requirements and additional options
  - resourceSlots: Required resource slot information
    - cpu
    - mem
    - extra
  - resourceOpts: Additional resource options (e.g., {"shmem": "64m"})
    - shmem
    - extra
  - resourceGroup: Resource group for revision
- modelRuntimeConfig: Runtime type and service configuration
  - runtimeVariant: Runtime type (VLLM, SGLANG, NVIDIA, MOJO, etc.)
  - serviceConfig: Service-specific configuration (e.g., for vLLM: max_model_length, parallelism, etc.)
  - environ: Container environment variables (JSON)
- modelVFolderConfig: Model file and mount information
  - vfolder: Virtual Folder where the model is stored
  - mountDestination: Mount path inside the container (default: /models)
  - definitionPath: Model definition file path (default: model-definition.yaml)
- mounts: List of additional volume mounts (each item: vfolderId, destination, type, permission)
- image: Container image information used
- errorData: Error information if failed
- createdAt: Creation timestamp
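
Size values such as `mem` and `shmem` are carried as strings (for example `"64m"`). The sketch below shows one way a client or validator might turn such strings into byte counts. It assumes the suffixes denote binary multiples (k = 2^10, m = 2^20, and so on) and that an optional `b` or `ib` tail is tolerated; this BEP does not define the accepted grammar, and `parse_size` is a hypothetical helper, not a Backend.AI API.

```python
import re

# Assumed binary multipliers; the BEP does not state which units apply.
_UNITS = {"": 1, "k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kmgt]?)(?:i?b)?\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a size string such as "64m" or "96g" into a byte count."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.lower()])
```

Normalizing sizes to integers early makes it easier to compare a revision's requested resources against what a resource group can offer.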
scalar JSONString
enum ClusterMode {
SINGLE_NODE
MULTI_NODE
}
type ClusterConfig {
mode: ClusterMode!
size: Int!
}
enum MountPermission {
READ_ONLY
READ_WRITE
}
type Mount {
vfolderId: ID!
destination: String!
type: MountType!
permission: MountPermission!
}
type ModelVFolderConfig {
vfolder: VirtualFolderNode!
mountDestination: String!
definitionPath: String!
}
type ResourceSlots {
cpu: Int!
mem: String!
extra: JSONString
}
type ResourceOpts {
shmem: String
extra: JSONString
}
type ResourceConfig {
resourceGroup: ResourceGroup!
resourceSlots: ResourceSlots!
resourceOpts: ResourceOpts
}
union ServiceConfig = VLLMServiceConfig | SGLANGServiceConfig | NVIDIAServiceConfig | MOJOServiceConfig | CustomServiceConfig
type ModelRuntimeConfig {
runtimeVariant: String!
serviceConfig: ServiceConfig
environ: JSONString
}
type ModelRevision {
id: ID!
name: String
tags: [String!]!
clusterConfig: ClusterConfig!
resourceConfig: ResourceConfig!
modelRuntimeConfig: ModelRuntimeConfig!
modelVFolderConfig: ModelVFolderConfig!
mounts: [Mount!]!
image: Image!
# Error and Metadata
errorData: JSONString
createdAt: DateTime!
}
type Query {
# Revision Queries
revision(id: ID!): ModelRevision
revisions(filter: ModelRevisionFilter, order: ModelRevisionOrder, first: Int, after: String): [ModelRevision!]!
}
type Subscription {
# Real-time Updates
deploymentStatusChanged(deploymentId: ID!): ModelDeployment!
replicaStatusChanged(revisionId: ID!): ModelReplica!
metricsUpdated(deploymentId: ID!): DeploymentMetrics!
}
type Mutation {
createModelRevision(input: CreateModelRevisionInput! ): CreateModelRevisionPayload!
}

Fields required for creating a new deployment:
- name: Deployment name (unique within domain)
- preferredDomainName (optional): Custom domain preference
- domain: Backend.AI domain name or ID
- project: Project (group) ID or name
- openToPublic: Whether publicly accessible
- tags (optional): Tags for model service
- clusterConfig: Cluster configuration
  - mode: Single-node or multi-node
  - size: Number of nodes in cluster
- deploymentStrategy: Deployment strategy configuration
  - type: ROLLING, BLUE_GREEN, or CANARY
  - config: Strategy-specific configuration
- initialRevision: Initial revision configuration
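
To make the `CANARY` strategy type more concrete, here is a hedged Python sketch of how a percentage-based traffic split could be decided. It borrows the `canaryPercentage` key from this BEP's example mutations, but the hashing scheme, the notion of a per-request key, and the function name `routes_to_canary` are assumptions for illustration only; the BEP does not prescribe how traffic is split.

```python
import hashlib


def routes_to_canary(request_key: str, canary_percentage: int) -> bool:
    """Deterministically decide whether a request key goes to the canary revision."""
    if not 0 <= canary_percentage <= 100:
        raise ValueError("canary_percentage must be within 0..100")
    # Hash the key so the same caller consistently lands on the same side.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < canary_percentage
```

Hashing a stable key (such as a user or session identifier) rather than drawing a random number keeps each caller on one revision for the whole canary window, which tends to make canary metrics easier to interpret.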
Fields required for creating a new revision:
- name (optional): Revision name
- tags (optional): List of revision tags
- image: Container image information
  - name: Image name with tag
  - architecture: CPU architecture
- modelRuntimeConfig: Runtime configuration
  - runtimeVariant: VLLM, SGLANG, NVIDIA, MOJO, or custom
  - serviceConfig: Service-specific configuration
  - environ (optional): Environment variables (JSON)
- modelVFolderConfig: Model folder configuration
  - vfolderId: Model VFolder ID
  - mountDestination: Model mount path (default: /models)
  - definitionPath: Model definition file path
- mounts (optional): Additional volume mounts
- resourceConfig: Resource configuration
  - resourceGroup: Resource group for deployment
  - resourceSlots: Resource requirements
    - cpu
    - mem
    - extra
  - resourceOpts (optional): Additional resource options
    - shmem
    - extra
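
As an illustration of how a client might assemble part of the revision input, the following Python sketch fills in the documented defaults (`/models` for `mountDestination`, `model-definition.yaml` for `definitionPath`) and serializes `environ` to a JSON string, since the schema types it as `JSONString`. The helper `build_revision_input` is hypothetical and covers only a subset of the fields listed above (it omits `resourceConfig`, `mounts`, `name`, and `tags`).

```python
import json
from typing import Optional

# Defaults taken from the field descriptions in this BEP.
DEFAULT_MOUNT_DESTINATION = "/models"
DEFAULT_DEFINITION_PATH = "model-definition.yaml"


def build_revision_input(
    image_name: str,
    architecture: str,
    runtime_variant: str,
    vfolder_id: str,
    environ: Optional[dict] = None,
    mount_destination: str = DEFAULT_MOUNT_DESTINATION,
    definition_path: str = DEFAULT_DEFINITION_PATH,
) -> dict:
    """Assemble a partial revision input as a plain dict of GraphQL variables."""
    runtime = {"runtimeVariant": runtime_variant}
    if environ is not None:
        # environ is a JSONString in the schema, so the mapping is sent as text.
        runtime["environ"] = json.dumps(environ, sort_keys=True)
    return {
        "image": {"name": image_name, "architecture": architecture},
        "modelRuntimeConfig": runtime,
        "modelVFolderConfig": {
            "vfolderId": vfolder_id,
            "mountDestination": mount_destination,
            "definitionPath": definition_path,
        },
    }
```

Letting `json.dumps` produce the `environ` string avoids the hand-escaped quotes seen in inline examples and guarantees the value is valid JSON.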
query GetDeploymentDetails {
deployment(id: "deployment-uuid") {
id
name
endpointUrl
preferredDomainName
status
openToPublic
tags
revision {
id
name
tags
status
}
revisionHistory(
first: 10
orderBy: { field: CREATED_AT, direction: DESC }
) {
edges {
node {
id
name
tags
createdAt
}
}
pageInfo {
hasNextPage
endCursor
}
}
replicaManagement {
desiredReplicaCount
replicas {
name
revision {
id
name
}
routings(first: 10) {
edges {
node {
id
status
}
}
}
}
autoScalingRules {
id
metricType
threshold
}
}
clusterConfig {
mode
size
}
deploymentStrategy {
type
config
}
domain {
name
}
project {
name
}
createdUser {
email
}
createdAt
updatedAt
}
}

enum OrderDirection {
ASC
DESC
}
enum DeploymentOrderField {
CREATED_AT
UPDATED_AT
}
input DeploymentOrderBy {
field: DeploymentOrderField!
direction: OrderDirection!
}
query ListDeployments {
deployments(
filter: {
status: ACTIVE
openToPublic: true
}
orderBy: { field: CREATED_AT, direction: DESC }
limit: 20
offset: 0
) {
id
name
endpointUrl
status
tags
openToPublic
revision {
id
name
tags
status
}
clusterConfig {
mode
size
}
replicaManagement {
desiredReplicaCount
replicas {
name
}
autoScalingRules {
id
metricType
threshold
}
}
}
}

query GetRevisionDetails {
revision(id: "revision-uuid") {
id
name
tags
status
resourceConfig {
resourceGroup {
name
}
resourceSlots {
cpu
mem
}
resourceOpts {
shmem
}
}
modelRuntimeConfig {
runtimeVariant
serviceConfig
environ
}
modelVFolderConfig {
vfolder {
id
name
}
mountDestination
definitionPath
}
mounts {
vfolderId
destination
type
permission
}
image {
name
architecture
}
errorData
createdAt
}
}

mutation CreateSimpleDeployment {
createModelDeployment(input: {
name: "llama-3-service"
domain: "default"
project: "ml-team-project-id"
openToPublic: true
tags: ["production", "llm"]
clusterConfig: {
mode: SINGLE_NODE
size: 1
}
deploymentStrategy: {
type: ROLLING
config: {
maxSurge: 1
maxUnavailable: 0
}
}
initialRevision: {
name: "initial"
tags: ["v1.0", "stable"]
image: {
name: "vllm:0.9.1"
architecture: "x86_64"
}
modelRuntimeConfig: {
runtimeVariant: "VLLM"
serviceConfig: {
maxModelLength: 4096
parallelism: {
ppSize: 1
tpSize: 2
}
}
environ: "{\"CUDA_VISIBLE_DEVICES\": \"0,1\"}"
}
modelVFolderConfig: {
vfolderId: "eeb8c377-15d2-4a16-8ed8-01215f3a5353"
mountDestination: "/models"
definitionPath: "model-definition.yaml"
}
mounts: [
{
vfolderId: "550e8400-e29b-41d4-a716-446655440001"
destination: "/data"
type: BIND
permission: READ_ONLY
}
]
resourceConfig: {
resourceGroup: {
name: "gpu-cluster"
}
resourceSlots: {
cpu: 8
mem: "8Gib"
}
resourceOpts: {
shmem: "64m"
}
}
}
}) {
deployment {
id
name
endpointUrl
status
tags
revision {
id
name
tags
status
}
}
}
}

mutation CreateExpertDeployment {
createModelDeployment(input: {
name: "falcon-7b-optimized"
preferredDomainName: "falcon.mycompany.ai"
domain: "default"
project: "research-team-id"
openToPublic: false
tags: ["experimental", "falcon"]
clusterConfig: {
mode: MULTI_NODE
size: 3
}
deploymentStrategy: {
type: CANARY
config: {
canaryPercentage: 10
canaryDuration: "30m"
successThreshold: 95
}
}
initialRevision: {
name: "baseline"
tags: ["v1.0", "baseline"]
image: {
name: "python-tcp-app:3.9-ubuntu20.04"
architecture: "x86_64"
}
modelRuntimeConfig: {
runtimeVariant: "CUSTOM"
serviceConfig: {
extraCliParameters: "--trust-remote-code --enable-lora --gpu-memory-utilization 0.95"
}
environ: "{\"CUDA_VISIBLE_DEVICES\": \"0,1,2,3\"}"
}
modelVFolderConfig: {
vfolderId: "550e8400-e29b-41d4-a716-446655440000"
mountDestination: "/models"
definitionPath: "model-definition.yaml"
}
mounts: [
{
vfolderId: "7a83e195-7410-4768-a338-a949cef6be83"
destination: "/home/work/datasets"
type: BIND
permission: READ_WRITE
}
]
resourceConfig: {
resourceGroup: {
name: "gpu-premium"
}
resourceSlots: {
mem: "96g"
cpu: 16
}
resourceOpts: {
shmem: "128m"
}
}
}
}) {
deployment {
id
name
endpointUrl
preferredDomainName
tags
clusterConfig {
mode
size
}
deploymentStrategy {
type
config
}
}
}
}

mutation CreateNewRevision {
createModelRevision(input: {
deploymentId: "deployment-uuid"
name: "optimized-version"
tags: ["v2.0", "optimized"]
image: {
name: "vllm:0.9.2"
architecture: "x86_64"
}
modelRuntimeConfig: {
runtimeVariant: "VLLM"
serviceConfig: {
maxModelLength: 8192
parallelism: {
ppSize: 2
tpSize: 4
}
extraCliParameters: "--enable-lora"
}
environ: "{\"CUDA_VISIBLE_DEVICES\": \"0,1,2,3\"}"
}
modelVFolderConfig: {
vfolderId: "550e8400-e29b-41d4-a716-446655440000"
mountDestination: "/models"
definitionPath: "model-definition.yaml"
}
mounts: []
resourceConfig: {
resourceGroup: {
name: "gpu-premium"
}
resourceSlots: {
mem: "96g"
cpu: 16
}
resourceOpts: {
shmem: "128m"
}
}
}) {
revision {
id
name
tags
status
createdAt
}
}
}

mutation UpdateDeployment {
updateModelDeployment(input: {
id: "deployment-uuid"
openToPublic: true
tags: ["production", "llm", "updated"]
deploymentStrategy: {
type: BLUE_GREEN
config: {
autoPromotionEnabled: true
terminationWaitTime: 300
}
}
}) {
deployment {
id
name
openToPublic
tags
deploymentStrategy {
type
config
}
}
}
}

mutation SwitchActiveRevision {
updateModelDeployment(input: {
id: "deployment-uuid"
activeRevisionId: "new-revision-uuid"
}) {
deployment {
id
name
revision {
id
name
tags
status
}
revisionHistory {
id
name
tags
}
}
}
}

mutation DeleteDeployment {
deleteModelServingDeployment(id: "deployment-uuid") {
deletedId
}
}

- For Users: Users will benefit from a more flexible and robust model serving experience, including easier deployment management, versioning, and scaling. The public endpoint and domain configuration features simplify access and integration.
- For Developers: Developers gain a well-structured GraphQL API for managing model deployments and revisions. The schema supports advanced deployment strategies (e.g., canary, blue-green), resource configuration, and traffic management, reducing operational complexity.