Commit 81af7f0

seedspirit, HyeockJinKim, and Copilot authored
docs(BA-3100): Document deployment revision generator in deployment README.md (#6872)
Co-authored-by: HyeockJinKim <hyeokjin@lablup.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 1e5b103 commit 81af7f0

2 files changed

Lines changed: 105 additions & 8 deletions

changes/6872.doc.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Document deployment revision generator in deployment README.md

src/ai/backend/manager/sokovan/deployment/README.md

Lines changed: 104 additions & 8 deletions
@@ -74,7 +74,7 @@ DeploymentCoordinator acts as the top-level orchestrator of deployment lifecycle
7474

7575
### DeploymentController
7676

77-
DeploymentController performs the actual control logic for deployments. It handles basic CRUD operations such as creating, updating, and deleting deployments, and manages the replica count for each deployment. It automatically generates session definitions tailored to deployment types (vLLM, TGI, SGLang, etc.) and applies configured auto-scaling policies to deployments.
77+
DeploymentController performs the actual control logic for deployments. It handles basic CRUD operations such as creating, updating, and deleting deployments, and manages the replica count for each deployment. It automatically generates the final revision spec for each deployment type (vLLM, TGI, SGLang, etc.) from the user's deployment request and the service-definition TOML file, and applies configured auto-scaling policies to deployments.
7878

7979
**Key Methods:**
8080
- `create_deployment()`: Creates a new deployment
@@ -87,25 +87,121 @@ DeploymentController performs the actual control logic for deployments. It handl
8787
```
8888
1. Receive deployment request
8989
90-
2. Select Definition Generator
90+
2. Create draft deployment dataclass with draft model revision (DeploymentCreationDraft)
9191
92-
3. Create session creation request
92+
3. Select Revision Generator based on runtime variant
9393
94-
4. Request validation from SchedulingController
94+
4. Generate final deployment creator
95+
├─ Load service definition from vfolder (if exists)
96+
├─ Merge service definition with API request
97+
└─ Validate the final revision
9598
96-
5. Request session creation from Scheduler
99+
5. Validate session specification with SchedulingController
97100
98-
6. Save deployment record (PENDING)
101+
6. Request session creation from Scheduler
99102
100-
7. Request route creation from RouteController
103+
7. Save deployment record (PENDING)
104+
105+
8. Request route creation from RouteController
101106
```
102107

103108
## Definition Generators
104109

105-
Definition Generator is a strategy pattern that generates appropriate session creation requests based on deployment type. Since each deployment type (vLLM, TGI, SGLang, NIM, etc.) has different images, environment variables, and resource requirements, it is designed to abstract these differences and handle them through a consistent interface.
110+
Definition Generators transform finalized model revisions into runtime-specific session configurations. Each deployment type (vLLM, TGI, SGLang, NIM, etc.) requires different container images, environment variables, and resource requirements. Definition Generators abstract these runtime-specific differences and provide a consistent interface for session creation.
106111

107112
> **Note**: Specific configurations for each deployment type will be managed as DB-based fixtures in the future.
108113
114+
**Supported Runtime Variants:**
115+
- **vLLM**: Optimized for vLLM inference engine
116+
- **HUGGINGFACE_TGI**: Hugging Face Text Generation Inference
117+
- **SGLANG**: SGLang inference framework
118+
- **NIM**: NVIDIA Inference Microservices
119+
- **MODULAR_MAX**: Modular MAX inference engine
120+
- **CMD**: Custom command-based execution
121+
- **CUSTOM**: User-defined custom configurations
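
As a rough illustration of how a generator might be selected per runtime variant, the sketch below uses a small registry with a generic fallback. Everything in it is hypothetical: the `RuntimeVariant` string values, the `GenericGenerator` and `VLLMGenerator` classes, the `select_generator()` helper, and the `EXAMPLE_VLLM_FLAG` variable are illustrative names, not the manager's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class RuntimeVariant(str, Enum):
    # Member names follow the list above; the string values are assumptions.
    VLLM = "vllm"
    HUGGINGFACE_TGI = "huggingface-tgi"
    SGLANG = "sglang"
    NIM = "nim"
    MODULAR_MAX = "modular-max"
    CMD = "cmd"
    CUSTOM = "custom"


@dataclass(frozen=True)
class GenericGenerator:
    """Hypothetical fallback used by variants without special handling."""

    variant: RuntimeVariant

    def generate(self, revision: dict[str, Any]) -> dict[str, Any]:
        # Copy the revision and tag it with the variant it was built for.
        return {**revision, "runtime_variant": self.variant.value}


class VLLMGenerator(GenericGenerator):
    """Hypothetical vLLM strategy that adds a variant-specific variable."""

    def generate(self, revision: dict[str, Any]) -> dict[str, Any]:
        session = super().generate(revision)
        # EXAMPLE_VLLM_FLAG is a placeholder, not a real vLLM setting.
        session["environ"] = {**revision.get("environ", {}), "EXAMPLE_VLLM_FLAG": "1"}
        return session


# Only variants that need special behaviour are registered explicitly.
_SPECIALIZED: dict[RuntimeVariant, type[GenericGenerator]] = {
    RuntimeVariant.VLLM: VLLMGenerator,
}


def select_generator(variant: RuntimeVariant) -> GenericGenerator:
    """Return the strategy object for the given runtime variant."""
    generator_cls = _SPECIALIZED.get(variant, GenericGenerator)
    return generator_cls(variant)
```

Callers only depend on `generate()`, so supporting another variant in this sketch means registering one more class rather than changing the call site.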
122+
123+
124+
## Revision Generators
125+
126+
Revision Generators process draft model revisions from API requests and produce validated, deployment-ready model revision specifications. They combine service definitions stored in vfolders with user-provided parameters through a three-level override mechanism in which more specific sources take precedence.
127+
128+
**Key Responsibilities:**
129+
- Load and parse service definition files from vfolders
130+
- Merge service definitions with API request parameters
131+
- Validate final revision specifications
132+
- Support variant-specific validation rules
133+
134+
**Service Definition Override Mechanism:**
135+
136+
Service definitions are TOML files stored in model vfolders that provide default configurations for deployments. The override priority follows a three-level hierarchy:
137+
138+
```
139+
1. Root-level service definition (lowest priority, base configuration)
140+
141+
2. Runtime variant-specific section (field-level override)
142+
143+
3. API request parameters (highest priority, explicit user input)
144+
```
145+
146+
**Service Definition Structure:**
147+
148+
```toml
149+
# service-definition.toml
150+
151+
# Root level - Default configuration for all variants
152+
[environment]
153+
image = "default-inference:latest"
154+
architecture = "x86_64"
155+
156+
[resource_slots]
157+
cpu = 4
158+
mem = "16gb"
159+
gpu = 1
160+
161+
[environ]
162+
LOG_LEVEL = "INFO"
163+
MAX_BATCH_SIZE = "32"
164+
165+
# vLLM variant - Overrides specific fields only
166+
[vllm.environment]
167+
image = "vllm-optimized:0.4.0"
168+
169+
[vllm.resource_slots]
170+
cpu = 8
171+
gpu = 2
172+
173+
[vllm.environ]
174+
VLLM_GPU_MEMORY_UTILIZATION = "0.95"
175+
```
176+
177+
**Override Resolution Example:**
178+
179+
For a vLLM deployment, the final configuration merges:
180+
- `environment.image`: `"vllm-optimized:0.4.0"` (from vllm variant)
181+
- `environment.architecture`: `"x86_64"` (from root, not overridden)
182+
- `resource_slots.cpu`: `8` (from vllm variant)
183+
- `resource_slots.mem`: `"16gb"` (from root, not overridden)
184+
- `resource_slots.gpu`: `2` (from vllm variant)
185+
- `environ`: `{LOG_LEVEL: "INFO", MAX_BATCH_SIZE: "32", VLLM_GPU_MEMORY_UTILIZATION: "0.95"}` (merged)
186+
187+
If the API request specifies `resource_slots.gpu = 4`, the final value will be `4` (API request overrides all).
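
The resolution above can be reproduced with a small field-level merge. This is a minimal sketch, not the actual implementation: `deep_merge()`, `resolve_revision()`, and the `VARIANT_SECTIONS` set are hypothetical names, and only the `vllm` section name is taken from the example; the other section names are assumed.

```python
from typing import Any

# Assumed section names for variants; only "vllm" appears in the example above.
VARIANT_SECTIONS = frozenset(
    {"vllm", "huggingface_tgi", "sglang", "nim", "modular_max", "cmd", "custom"}
)


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins field by field."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are tables: merge their fields instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_revision(
    service_definition: dict[str, Any],
    variant: str,
    api_request: dict[str, Any],
) -> dict[str, Any]:
    """Apply root -> variant section -> API request, lowest priority first."""
    root = {k: v for k, v in service_definition.items() if k not in VARIANT_SECTIONS}
    variant_section = service_definition.get(variant, {})
    return deep_merge(deep_merge(root, variant_section), api_request)


# The parsed form of the service-definition.toml shown above.
service_definition = {
    "environment": {"image": "default-inference:latest", "architecture": "x86_64"},
    "resource_slots": {"cpu": 4, "mem": "16gb", "gpu": 1},
    "environ": {"LOG_LEVEL": "INFO", "MAX_BATCH_SIZE": "32"},
    "vllm": {
        "environment": {"image": "vllm-optimized:0.4.0"},
        "resource_slots": {"cpu": 8, "gpu": 2},
        "environ": {"VLLM_GPU_MEMORY_UTILIZATION": "0.95"},
    },
}

final = resolve_revision(service_definition, "vllm", {"resource_slots": {"gpu": 4}})
```

Here `final["resource_slots"]["gpu"]` comes from the API request, `cpu` from the vllm section, and `mem` from the root level, matching the resolution described above.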
188+
189+
**Revision Generation Process:**
190+
191+
```
192+
1. Load service-definition.toml from vfolder (if exists)
193+
194+
2. Merge service definition with API request (draft revision)
195+
├─ Service definition provides defaults
196+
└─ API request overrides specific fields
197+
198+
3. Validate final ModelRevisionSpec
199+
200+
4. Return validated ModelRevisionSpec
201+
```
202+
203+
> **Note**: Service definition files are optional. If no service definition exists, the API request must provide all required configuration. When a service definition is present, it serves as a template that users can selectively override through API parameters, reducing repetitive configuration for commonly deployed models.
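
The generation steps and the optional service definition can be sketched as follows. The flat `image`, `cpu`, and `mem` fields and the `generate_revision()` helper are assumptions made for illustration; the real `ModelRevisionSpec` is likely richer and validated differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class ModelRevisionSpec:
    # Hypothetical, flattened fields; the real spec is not shown in this README.
    image: str
    cpu: int
    mem: str


def generate_revision(
    draft: dict[str, Any],
    service_definition: Optional[dict[str, Any]] = None,
) -> ModelRevisionSpec:
    """Merge an optional service definition with a draft, then validate."""
    # Steps 1-2: the service definition (if any) provides defaults, and
    # explicitly set draft values override them.
    defaults = service_definition or {}
    explicit = {key: value for key, value in draft.items() if value is not None}
    merged = {**defaults, **explicit}

    # Step 3: every required field must come from one of the two sources.
    required = [field.name for field in fields(ModelRevisionSpec)]
    missing = [name for name in required if name not in merged]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    # Step 4: return the validated spec.
    return ModelRevisionSpec(**{name: merged[name] for name in required})


definition = {"image": "default-inference:latest", "cpu": 4, "mem": "16gb"}
spec = generate_revision({"image": None, "cpu": 8}, definition)
```

Without a service definition, the same draft fails validation in this sketch because `image` and `mem` are never supplied, which mirrors the requirement that the API request then carry all required configuration.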
204+
109205
## State-Specific Handlers
110206

111207
The deployment system provides specialized handlers for each deployment state, performing necessary operations for each state.
