docs(model-serving): update for deployment-only create modal, revision UX, and detail page polish
Part of the recent main-branch docs catch-up plan (Work Item 1 + folded Work Item 9).
Covers numerous user-visible Model Serving changes merged to main between Nov 2025 and May 2026:
- Creating a Model Service: new deployment-only create modal flow (FR-2822) with resource-group auto-select; redesigned Add Revision modal exposing start command, env vars, runtime variant, resource group, preset, and auto-activate (FR-2826/2835/2836/2886/2888/2889/2891); revision name field removed.
- New H3: Preset Mode for Revision Creation (FR-2862/2863).
- Endpoint Detail Page: new deployment alerts — Deployment Ready with chat shortcut (FR-2830), Private Deployment with access-token shortcut (FR-2838), NoCurrentRevision warning (FR-2843).
- Deployment ID display + Visibility row with Public/Private BooleanTag (FR-2833/2834).
- More menu next to Edit, with DeleteFilled vs DeleteOutlined icon convention (FR-2846/2848).
- Replicas: Running/Terminated radio filter replaces enum status filter, new ModelReplica status fields (FR-2891/2904).
- Shared Memory (SHM) display in endpoint detail config (FR-2837).
- Revisions tab: revision number column, expanded filter/sort, unified "Apply" terminology (FR-2858/2902).
- Cross-reference to forthcoming Deployment Presets page (Work Item 11, lands in PR C).
Updated in all 4 languages (en/ko/ja/th). Screenshots are flagged with TODO markers for separate capture.
@@ -297,25 +297,66 @@ Click the `Start Service` button to open the service launcher and create a new m
297
297
298
298
## Creating a Model Service
299
299
300
-
### Service Launcher
300
+
Creating a model service is now a two-step flow:
301
301
302
-
Click the `Start Service` button on the Serving page to open the service launcher.
302
+
1. **Create the deployment**: a lightweight record that defines the deployment's identity (name, visibility, resource group, and other deployment metadata).
303
+
2. **Add a revision** — a configuration snapshot that defines what actually runs (start command, environment variables, runtime variant, image, resources, model storage).
303
304
304
-
#### Service Name and Basic Settings
305
+
Each deployment can hold many revisions. Only one revision is *current* (serving traffic) at a time, and you can switch between revisions from the Revisions tab on the Endpoint Detail Page.
305
306
306
-
First, provide a service name. The following fields are available:
307
+
### Create Deployment Modal
307
308
308
-
- **Service Name**: A unique name to identify the endpoint.
309
-
- **Open To Public**: This option allows access to the model service without any separate token. By default, it is disabled.
310
-
- **Model Storage Folder to Mount**: Select the storage folder containing the model files.
311
-
- **Inference Runtime Variant**: Selects the runtime variant for the model service. The available variants are dynamically loaded from the backend and may include `vLLM`, `SGLang`, `NVIDIA NIM`, `Modular MAX`, `Custom`, and others depending on your installation.
312
-
- **Environment / Version**: Configure the execution environment for the model service. Selecting a runtime variant automatically filters the environment images.
309
+
Click the `New Deployment` button on the Serving page to open the **Create Deployment** modal. The modal collects only deployment-level metadata; no revision is created at this point.
For runtime variants such as `vLLM`, `SGLang`, `NVIDIA NIM`, or `Modular MAX`, there is no need to configure a `model-definition` file in your model folder. Instead, the system handles the model configuration automatically based on the selected variant.
- **Deployment Name**: A unique name used to identify the deployment across the dashboard, API, and the endpoint URL.
317
+
- **Open To Public**: When enabled, the endpoint is reachable without an access token. When disabled, every request must carry a token. See [Access Tokens](#generating-tokens).
318
+
- **Resource Group**: The resource group where the deployment will run. If only one resource group is available to your project, the field is auto-selected and you can proceed without choosing one manually.
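
As a rough illustration of the **Open To Public** setting above, the sketch below builds a request for a public deployment and for a private one. The `BackendAI` authorization scheme, the endpoint URL, and the token value are assumptions made for this example; confirm the header format your installation expects before relying on it.

```python
import urllib.request
from typing import Optional


def build_request(endpoint_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for a deployment endpoint.

    A private deployment needs a token on every request; a public one does not.
    """
    req = urllib.request.Request(endpoint_url, method="GET")
    if token is not None:
        # Assumed header scheme; adjust to match your installation.
        req.add_header("Authorization", f"BackendAI {token}")
    return req


# Hypothetical endpoint URL and token, for illustration only.
public_req = build_request("https://example.invalid/v1/models")
private_req = build_request("https://example.invalid/v1/models", token="my-token")
```

The request is only constructed here, not sent, so the sketch can be run without a live deployment.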
319
+
320
+
Click `Create Deployment` to create the deployment. You are then taken to the Endpoint Detail Page, where the **No Current Revision** warning is shown until you add the first revision.
321
+
322
+
### Add Revision
323
+
324
+
A revision captures every setting needed to run the inference server — image, start command, resources, model mounts, and environment variables. From the Endpoint Detail Page, click `Add Revision` to open the modal.
- **Runtime Variant**: The serving runtime that runs the model (for example, `vLLM`, `SGLang`, `NVIDIA NIM`, `Modular MAX`, or `Custom`). Choose `Custom` to define your own start command. Available variants are loaded dynamically from the backend.
332
+
- **Environment / Version**: The container image used for the inference server. Selecting a runtime variant filters this list down to compatible images.
333
+
- **Model Storage Folder to Mount**: The storage folder that contains the model files.
334
+
- **Start Command** (Custom variant): The command executed to launch the inference server. For non-Custom variants the runtime's default start command is applied automatically.
335
+
- **Environment Variables**: Key/value pairs passed to the inference server container.
336
+
- **Resource Preset**: A pre-configured bundle of CPU, memory, and accelerator allocations. Available presets are filtered by the deployment's resource group.
337
+
- **Auto-activate after adding**: When enabled, the new revision is applied immediately after it is created and replaces the current revision. When disabled, the revision is added in an inactive state and you can apply it later from the Revisions tab.
338
+
339
+
:::note
340
+
The revision **name** field has been removed. Each revision is identified by its auto-assigned revision number; see the [Revisions Tab](#revisions-tab) below.
341
+
:::
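
The relationship between revision numbers and the auto-activate option can be pictured with a small in-memory model. This is a sketch only, not the product's implementation: the `Deployment` and `Revision` classes, the `add_revision` method, and the assumption that numbering starts at 1 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    number: int   # auto-assigned, replaces the removed name field
    config: dict  # start command, env vars, image, resources, ...


@dataclass
class Deployment:
    name: str
    revisions: List[Revision] = field(default_factory=list)
    current: Optional[int] = None  # number of the revision serving traffic

    def add_revision(self, config: dict, auto_activate: bool = False) -> Revision:
        # Assumption: numbers increase by one in creation order, starting at 1.
        rev = Revision(number=len(self.revisions) + 1, config=config)
        self.revisions.append(rev)
        if auto_activate:
            # The new revision replaces the current one right away.
            self.current = rev.number
        return rev


dep = Deployment("demo")
first = dep.add_revision({"runtime_variant": "vLLM"}, auto_activate=True)
second = dep.add_revision({"runtime_variant": "SGLang"})  # added but inactive
```

In this model the second revision exists but does not serve traffic until it is applied separately.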
342
+
343
+
### Preset Mode for Revision Creation
344
+
345
+
When deployment presets are available, the Add Revision modal can run in **Preset mode**. Preset mode lets you start from a curated deployment preset instead of filling every field manually.
346
+
347
+
In Preset mode:
348
+
349
+
- A preset selector lists every deployment preset compatible with the deployment's runtime variant and resource group.
350
+
- Selecting a preset pre-fills the runtime variant, image, start command, environment variables, resource preset, and model storage selection from the preset's defaults.
351
+
- You can still adjust any field after the preset is applied — your edits are not written back to the preset.
352
+
- A **Deployment Preset Detail** link opens a side panel showing the preset's full configuration so you can verify what will be applied.
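
One way to read the pre-fill behavior above: the form starts as a copy of the preset's defaults, and later edits touch only that copy. The sketch below assumes a preset can be represented as a plain dictionary; the field names and the `prefill_from_preset` helper are hypothetical.

```python
import copy


def prefill_from_preset(preset: dict, edits: dict) -> dict:
    """Return form values: preset defaults overlaid with the user's edits."""
    form = copy.deepcopy(preset)  # deep copy so nested values are not shared
    form.update(edits)
    return form


# Hypothetical preset contents, for illustration only.
preset = {
    "runtime_variant": "vLLM",
    "environ": {"LOG_LEVEL": "info"},
    "resource_preset": "small",
}

form = prefill_from_preset(preset, {"resource_preset": "large"})
form["environ"]["LOG_LEVEL"] = "debug"  # editing the form leaves the preset as-is
```

The deep copy is the design choice that matters: a shallow copy would let an edit to a nested value such as the environment variables leak back into the preset.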
353
+
354
+
If your project has no compatible presets, the modal falls back to manual mode and you fill the fields directly. See [Deployment Presets](deployment_presets.md) for how to create and manage presets.
355
+
<!-- TODO: Cross-reference Work Item 11 — Deployment Presets page will be added in PR C -->
356
+
357
+
### Service Launcher (Legacy Fields)
358
+
359
+
The following subsections describe revision-level fields in detail. They apply both when adding a revision manually and when reviewing a preset before applying it.
319
360
320
361
#### Model Definition Mode (Custom Runtime Only)
321
362
@@ -481,36 +522,113 @@ follows:
481
522
482
523
## Endpoint Detail Page
483
524
484
-
Click on an endpoint name in the serving list to view detailed information about the model service.
525
+
Click on an endpoint name in the serving list to view detailed information about the deployment.
526
+
527
+
### Deployment Alerts
528
+
529
+
The Endpoint Detail Page shows contextual alert banners at the top, reflecting the current state of the deployment:
530
+
531
+
- **Deployment is ready**: Shown when the deployment is `HEALTHY`. Includes a **Start Chat** button as a shortcut to the LLM Chat Test interface so you can test the model without leaving the page.
- **Private deployment — use an access token to access the endpoint.**: Shown when **Open To Public** is disabled. Includes a shortcut to **Manage Access Tokens** so you can issue or copy a token. See [Access Tokens](#generating-tokens).
- **No revision is deployed — add a revision to activate this service.**: Shown when the deployment has no current revision. Click `Add Revision` to create the first revision and activate the service.
542
+
543
+
- **Preparing your service**: Shown while the deployment is being created or transitioning between states. Indicates the service is not yet ready to handle requests.
544
+
545
+

546
+
547
+
- **Not In Project**: Shown when the endpoint belongs to a different project than the currently selected one. The Edit button is disabled while this alert is active. Click the **Switch Project** button in the alert to switch to the correct project and manage the endpoint.
485
548
486
549
### Service Information
487
550
488
551
The Service Info card displays the following details:
489
552
490
-
- **Endpoint Name** and **Status**
491
-
- **Endpoint ID** and **Session Owner**
553
+
- **Deployment Name** and **Status**
554
+
- **Deployment ID** and **Session Owner**
555
+
- **Visibility**: Shown as a Public / Private tag. **Public** means the endpoint is reachable without an access token; **Private** means callers must supply a valid access token.
492
556
- **Number of Replicas**
493
-
- **Service Endpoint**: The URL for accessing the model service. For LLM services, an `LLM Chat Test` button is available.
494
-
- **Open To Public**: Whether the service is publicly accessible.
495
-
- **Resources**: The resource group and allocated CPU/Memory/GPU.
557
+
- **Service Endpoint**: The URL for accessing the deployment. For LLM deployments, a `Test in Chat` button is available.
558
+
- **Resource Group**: The resource group the deployment runs in. Resource group is now part of the deployment metadata (set once when the deployment is created) rather than per-revision.
559
+
- **Resources**: Allocated CPU, memory, accelerator, and **Shared Memory (SHM)**. The shared memory value is taken from the current revision and represents the size of `/dev/shm` available to the inference server — important for multi-GPU and multi-process inference workloads.
496
560
- **Model Storage**: The mounted model storage folder and mount destination.
497
561
- **Additional Mounts**: Any extra storage folders mounted.
498
562
- **Environment Variables**: Displayed as a code block.
499
563
- **Image**: The container image used for the service.
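
To cross-check the Shared Memory value listed above from inside a running inference container, something like the following could be used. It is a minimal sketch that assumes a Linux container where `/dev/shm` is mounted; the `shm_size_bytes` helper is a name invented for this example.

```python
import os
import shutil


def shm_size_bytes(path: str = "/dev/shm") -> int:
    """Return the total size in bytes of the filesystem mounted at `path`."""
    return shutil.disk_usage(path).total


if __name__ == "__main__":
    # Only report when the mount exists (it may be absent on non-Linux hosts).
    if os.path.exists("/dev/shm"):
        print(f"/dev/shm size: {shm_size_bytes() / 2**30:.2f} GiB")
```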
500
564
501
-
Click the `Edit` button on the Service Info card to navigate to the update launcher and modify the service settings.
565
+

566
+
<!-- TODO: Capture screenshot — Visibility row with Public/Private tag and Deployment ID -->
502
567
503
-
The Endpoint Detail Page displays contextual alert banners at the top, depending on the current state of the service:
- **Preparing your service**: Shown while the service is being deployed or transitioning between states. Indicates the service is not yet ready to handle requests.
571
+
#### More Menu (Edit and Delete)
506
572
507
-

573
+
The Service Info card's header exposes an **Edit** button alongside a **More** menu. The More menu currently contains the **Delete Deployment** action.
508
574
509
-
- **Service is ready**: Shown when the service is `HEALTHY`. Includes a **Start Chat** button as a shortcut to the LLM Chat Test interface.
575
+

576
+
<!-- TODO: Capture screenshot — More menu containing Delete action -->
510
577
511
-

578
+
The delete and trash icons across the Model Serving pages follow a strict convention:
512
579
513
-
- **Not In Project**: Shown when the endpoint belongs to a different project than the currently selected one. The Edit button is disabled while this alert is active. Click the **Switch Project** button in the alert to switch to the correct project and manage the endpoint.
580
+
- **Filled trash icon (`DeleteFilled`)** — *permanent delete*. Confirming opens a typed-confirmation modal where you must type the deployment's name before the OK button is enabled. There is no undo path.
581
+
- **Outlined trash icon (`DeleteOutlined`)** — *move to trash* (soft delete). Confirming sends the item to a trash bin from which it can be restored.
582
+
583
+
Check whether the trash icon is filled or outlined before confirming a delete action.
584
+
585
+
### Replicas
586
+
587
+
The Replicas tab shows the routing nodes that make up the deployment. Replica entries are filtered by a **Running / Terminated** radio control at the top of the tab, which replaced the previous enum-based status filter.
588
+
589
+

590
+
<!-- TODO: Capture screenshot — Running/Terminated radio filter -->
591
+
592
+
- **Running**: Shows replicas that are currently provisioning, running, or otherwise active.
593
+
- **Terminated**: Shows replicas that have completed their lifecycle.
594
+
595
+
Each replica row carries three independent status fields:
596
+
597
+
- **Lifecycle Status**: Where the replica is in its lifecycle (for example, *Provisioning*, *Running*, *Terminating*).
598
+
- **Health Status**: The current health of the replica process (for example, *Healthy*, *Unhealthy*).
599
+
- **Traffic Status**: Whether the replica is currently serving requests.
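
The Running / Terminated filter and the three status fields above can be sketched as a simple grouping over the lifecycle status. The status strings, the `Replica` class, and the choice to count a terminating replica as still active are assumptions for illustration; the actual values come from the backend.

```python
from dataclasses import dataclass
from typing import List

# Assumed lifecycle values that count as "active" for the Running view.
ACTIVE_LIFECYCLES = {"PROVISIONING", "RUNNING", "TERMINATING"}


@dataclass
class Replica:
    id: str
    lifecycle: str        # where the replica is in its lifecycle
    health: str           # health of the replica process
    traffic_active: bool  # whether it currently serves requests


def filter_replicas(replicas: List[Replica], view: str) -> List[Replica]:
    """Return the replicas shown for the 'running' or 'terminated' view."""
    if view not in ("running", "terminated"):
        raise ValueError(f"unknown view: {view!r}")
    want_active = view == "running"
    return [r for r in replicas if (r.lifecycle in ACTIVE_LIFECYCLES) == want_active]


replicas = [
    Replica("r1", "RUNNING", "HEALTHY", True),
    Replica("r2", "PROVISIONING", "UNHEALTHY", False),
    Replica("r3", "TERMINATED", "UNHEALTHY", False),
]
running = filter_replicas(replicas, "running")
terminated = filter_replicas(replicas, "terminated")
```

Note that only the lifecycle field drives the filter here; health and traffic status stay independent, so an unhealthy replica that is still provisioning appears under Running.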
600
+
601
+
Click a replica node to open the session detail drawer for that replica's session.
602
+
603
+
If a replica has encountered an error, clicking the error indicator on the row opens a JSON viewer modal that displays the raw error data. This is useful for diagnosing issues with individual replicas.
604
+
605
+

606
+
607
+
<a id="revisions-tab"></a>
608
+
609
+
### Revisions Tab
610
+
611
+
The Revisions tab lists every revision that has been added to the deployment, ordered by revision number.
612
+
613
+

614
+
<!-- TODO: Capture screenshot — revision history with revision number column + filter/sort -->
615
+
616
+
Columns include:
617
+
618
+
- **Revision Number**: An incrementing integer assigned in creation order. Lower numbers are older revisions. Each row also shows the underlying Revision ID for reference.
619
+
- **Status**: The current state of the revision (for example, *Active*, *Inactive*, *Applying*).
620
+
- **Runtime Variant**, **Image**, and **Resource Preset**: Summary of the revision's configuration.
621
+
- **Created At**
622
+
623
+
You can filter and sort the list by every visible column, including revision number, status, runtime variant, and creation timestamp.
624
+
625
+
#### Applying a Revision
626
+
627
+
Every row carries an **Apply** action. Clicking `Apply` makes that revision the **current** revision; the deployment begins serving traffic with the new configuration and the previously active revision becomes inactive. While the new revision is rolling out, the deployment shows a *The next revision is being applied.* alert and the apply action remains disabled to prevent overlapping applies.
628
+
629
+
:::note
630
+
The action is named **Apply** in every revision-related UI surface (row action, modal confirmation, alert text). Earlier terms such as *Activate* or *Promote* have been unified to **Apply**.
631
+
:::
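
The rule that Apply stays disabled during a rollout can be modeled as a small state holder. This is a hedged sketch rather than the actual implementation: the `RevisionState` class, its method names, and the choice to switch `current` only once the rollout finishes are assumptions.

```python
from typing import Optional


class ApplyInProgress(Exception):
    """Raised when Apply is requested while another apply is rolling out."""


class RevisionState:
    def __init__(self, current: Optional[int] = None) -> None:
        self.current = current               # revision serving traffic
        self.applying: Optional[int] = None  # revision being rolled out

    def apply(self, number: int) -> None:
        if self.applying is not None:
            # Mirrors the disabled Apply action during a rollout.
            raise ApplyInProgress(f"revision {self.applying} is being applied")
        self.applying = number

    def finish_rollout(self) -> None:
        if self.applying is None:
            return
        # The previously current revision becomes inactive at this point.
        self.current = self.applying
        self.applying = None


state = RevisionState(current=1)
state.apply(2)          # rollout of revision 2 starts
state.finish_rollout()  # revision 2 is now current
```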
514
632
515
633
<a id="revision-info"></a>
516
634
@@ -658,11 +776,11 @@ Click the `Edit` button on the endpoint detail page to modify a model service. T
658
776
The model service periodically runs a scheduler to adjust the routing
659
777
count to match the desired session count. However, this puts a burden on
660
778
the Backend.AI scheduler. Therefore, it is recommended to terminate the
661
-
model service if it is no longer needed. To terminate the model service,
662
-
click on the `Delete` button in the Controls column. A modal will appear asking
663
-
for confirmation to terminate the model service. Clicking `Delete`
664
-
will terminate the model service. The terminated model service will
665
-
appear in the **Destroyed** filter view.
779
+
deployment if it is no longer needed. To terminate the deployment, open
780
+
the **More** menu on the Service Info card and select **Delete Deployment**.
781
+
A typed-confirmation modal appears — type the deployment name to enable the
782
+
**Permanently Delete** button. The terminated deployment then appears in the
783
+
**Destroyed** filter view.
666
784
667
785

668
786
@@ -723,6 +841,8 @@ To use the model, you will need the following information:
723
841
724
842

725
843
844
+
<a id="model-store"></a>
845
+
726
846
## Model Store
727
847
728
848
The Model Store provides a card-based gallery of pre-configured models that you can browse, search, and deploy. You can access the Model Store from the sidebar menu.