src/ai/backend/manager/sokovan/deployment/README.md
Lines changed: 104 additions & 8 deletions
@@ -74,7 +74,7 @@ DeploymentCoordinator acts as the top-level orchestrator of deployment lifecycle
### DeploymentController
-DeploymentController performs the actual control logic for deployments. It handles basic CRUD operations such as creating, updating, and deleting deployments, and manages the replica count for each deployment. It automatically generates session definitions tailored to deployment types (vLLM, TGI, SGLang, etc.) and applies configured auto-scaling policies to deployments.
+DeploymentController performs the actual control logic for deployments. It handles basic CRUD operations such as creating, updating, and deleting deployments, and manages the replica count for each deployment. It automatically generates the final revision spec for each deployment type (vLLM, TGI, SGLang, etc.) from the user's deployment request and the service-definition TOML file, and applies configured auto-scaling policies to deployments.
**Key Methods:**
- `create_deployment()`: Creates a new deployment
@@ -87,25 +87,121 @@ DeploymentController performs the actual control logic for deployments. It handl
```
1. Receive deployment request
↓
-2. Select Definition Generator
+2. Create draft deployment dataclass with draft model revision (DeploymentCreationDraft)
↓
-3. Create session creation request
+3. Select Revision Generator based on runtime variant
↓
-4. Request validation from SchedulingController
+4. Generate final deployment creator
+   ├─ Load service definition from vfolder (if exists)
+   ├─ Merge service definition with API request
+   └─ Validate the final revision
↓
-5. Request session creation from Scheduler
+5. Validate session specification with SchedulingController
↓
-6. Save deployment record (PENDING)
+6. Request session creation from Scheduler
↓
-7. Request route creation from RouteController
+7. Save deployment record (PENDING)
+↓
+8. Request route creation from RouteController
```
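The ordering in the revised flow can be sketched in a few lines of Python. This is an illustration only: `FakeBackend`, its method names, and the dictionary fields are hypothetical stand-ins for SchedulingController, Scheduler, the deployment record store, and RouteController, and revision generation is collapsed into a plain dictionary merge. Only the order of the steps is taken from the flow above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBackend:
    """Hypothetical stand-in that records which step ran, in order."""

    calls: list[str] = field(default_factory=list)

    def validate_session_spec(self, revision: dict) -> None:
        self.calls.append("validate_session_spec")
        if "image" not in revision:
            raise ValueError("revision has no image")

    def create_session(self, revision: dict) -> None:
        self.calls.append("create_session")

    def save_deployment(self, revision: dict, status: str) -> None:
        self.calls.append(f"save_deployment:{status}")

    def create_route(self, revision: dict) -> None:
        self.calls.append("create_route")


def create_deployment(request: dict, service_definition: dict, backend: FakeBackend) -> dict:
    # Steps 2-4 (simplified): service-definition defaults, overridden by the API request.
    final_revision = {**service_definition, **request}
    # Step 5: validate before any side effect happens.
    backend.validate_session_spec(final_revision)
    # Steps 6-8: session creation, PENDING record, route creation, in that order.
    backend.create_session(final_revision)
    backend.save_deployment(final_revision, status="PENDING")
    backend.create_route(final_revision)
    return final_revision


backend = FakeBackend()
create_deployment({"replicas": 2}, {"image": "example/vllm:latest", "replicas": 1}, backend)
print(backend.calls)
```

In this sketch a validation failure raises before a session, record, or route is requested, which is one plausible reading of why validation precedes those steps.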
## Definition Generators
-Definition Generator is a strategy pattern that generates appropriate session creation requests based on deployment type. Since each deployment type (vLLM, TGI, SGLang, NIM, etc.) has different images, environment variables, and resource requirements, it is designed to abstract these differences and handle them through a consistent interface.
+Definition Generators transform finalized model revisions into runtime-specific session configurations. Each deployment type (vLLM, TGI, SGLang, NIM, etc.) requires different container images, environment variables, and resource requirements. Definition Generators abstract these runtime-specific differences and provide a consistent interface for session creation.
> **Note**: Specific configurations for each deployment type will be managed as DB-based fixtures in the future.
+**Supported Runtime Variants:**
+- **vLLM**: Optimized for vLLM inference engine
+- **HUGGINGFACE_TGI**: Hugging Face Text Generation Inference
+- **SGLANG**: SGLang inference framework
+- **NIM**: NVIDIA Inference Microservices
+- **MODULAR_MAX**: Modular MAX inference engine
+- **CMD**: Custom command-based execution
+- **CUSTOM**: User-defined custom configurations
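A minimal sketch of how a consistent interface across runtime variants might look, assuming a simple registry keyed by variant. The enum string values, the generator functions, and the `EXAMPLE_MODEL_PATH` environment variable are all hypothetical and chosen for illustration; they are not taken from the actual implementation or from any inference engine's real settings.

```python
from enum import Enum
from typing import Callable


class RuntimeVariant(Enum):
    # Member names follow the list above; the string values are assumptions.
    VLLM = "vllm"
    HUGGINGFACE_TGI = "huggingface-tgi"
    SGLANG = "sglang"
    NIM = "nim"
    MODULAR_MAX = "modular-max"
    CMD = "cmd"
    CUSTOM = "custom"


# A definition generator maps a finalized revision to a session configuration.
DefinitionGenerator = Callable[[dict], dict]


def vllm_generator(revision: dict) -> dict:
    # Hypothetical: the environment variable name is illustrative only.
    return {
        "image": revision["image"],
        "environ": {"EXAMPLE_MODEL_PATH": revision["model_path"]},
    }


def passthrough_generator(revision: dict) -> dict:
    # Fallback for variants without a dedicated generator in this sketch.
    return {"image": revision["image"], "environ": {}}


GENERATORS: dict[RuntimeVariant, DefinitionGenerator] = {
    RuntimeVariant.VLLM: vllm_generator,
}


def generate_session_config(variant: RuntimeVariant, revision: dict) -> dict:
    """Pick the generator for the variant and apply it to the revision."""
    generator = GENERATORS.get(variant, passthrough_generator)
    return generator(revision)
```

Callers only see `generate_session_config()`, so supporting a new variant in this sketch means adding one entry to `GENERATORS`.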
+
+
+## Revision Generators
+
+Revision Generators process draft model revisions from API requests and produce validated, deployment-ready model revision specifications. They merge service definitions stored in vfolders with user-provided parameters through a prioritized override mechanism.
+
+**Key Responsibilities:**
+- Load and parse service definition files from vfolders
+- Merge service definitions with API request parameters
+- Validate final revision specifications
+- Support variant-specific validation rules
+
+**Service Definition Override Mechanism:**
+
+Service definitions are TOML files stored in model vfolders that provide default configurations for deployments. The override priority follows a three-level hierarchy:
+
+```
+1. Root-level service definition (lowest priority, base configuration)
+If the API request specifies `resource_slots.gpu = 4`, the final value will be `4` (API request overrides all).
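The "API request overrides all" rule can be illustrated with a layered merge. The sketch below is an assumption about how such a merge could work, not the project's implementation: it accepts any number of layers ordered from lowest to highest priority and merges nested tables key by key. The default values in `root_definition` are hypothetical, and the intermediate level of the hierarchy is not modeled because it is not shown in this excerpt.

```python
from copy import deepcopy
from functools import reduce


def merge_layer(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_layer(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(*layers: dict) -> dict:
    """Fold layers from lowest to highest priority into one configuration."""
    return reduce(merge_layer, layers, {})


root_definition = {"resource_slots": {"cpu": 8, "gpu": 1}}  # hypothetical defaults
api_request = {"resource_slots": {"gpu": 4}}  # user-supplied override

final = resolve(root_definition, api_request)
print(final)  # {'resource_slots': {'cpu': 8, 'gpu': 4}}
```

Only `gpu` changes here; `cpu` keeps its default because the request did not mention it.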
+
+**Revision Generation Process:**
+
+```
+1. Load service-definition.toml from vfolder (if exists)
+↓
+2. Merge service definition with API request (draft revision)
+   ├─ Service definition provides defaults
+   └─ API request overrides specific fields
+↓
+3. Validate final ModelRevisionSpec
+↓
+4. Return validated ModelRevisionSpec
+```
+
+> **Note**: Service definition files are optional. If no service definition exists, the API request must provide all required configuration. When a service definition is present, it serves as a template that users can selectively override through API parameters, reducing repetitive configuration for commonly deployed models.
+
## State-Specific Handlers
The deployment system provides specialized handlers for each deployment state, performing necessary operations for each state.