Commit c8166a7

docs(ai): add external containers docs (#687)
* docs(ai): add external container manager docs

  This commit adds documentation for the external container manager feature
  now that it has been thoroughly tested and some issues have been fixed.

Co-authored-by: 0xb79 <[email protected]>
1 parent 40e3f58 commit c8166a7

File tree

1 file changed: +51 −2 lines changed

ai/orchestrators/models-config.mdx

+51 −2
````diff
@@ -40,7 +40,10 @@ currently **recommended** models and their respective prices.
   {
     "pipeline": "audio-to-text",
     "model_id": "openai/whisper-large-v3",
-    "price_per_unit": 12882811
+    "price_per_unit": 12882811,
+    "url": "<CONTAINER_URL>:<PORT>",
+    "token": "<OPTIONAL_BEARER_TOKEN>",
+    "capacity": 1
   },
   {
     "pipeline": "segment-anything-2",
````
````diff
@@ -65,7 +68,7 @@ currently **recommended** models and their respective prices.
     "model_id": "parler-tts/parler-tts-large-v1",
     "price_per_unit": 11,
     "pixels_per_unit": 1e2,
-    "currency": "USD",
+    "currency": "USD"
   }
 ]
 ```
````
````diff
@@ -93,6 +96,18 @@ currently **recommended** models and their respective prices.
 <ParamField path="optimization_flags" type="object">
   Optional flags to enhance performance (details below).
 </ParamField>
+<ParamField path="url" type="string" optional="true">
+  Optional URL and port where the model container or custom container manager
+  software is running. [See External Containers](#external-containers)
+</ParamField>
+<ParamField path="token" type="string">
+  Optional token required to interact with the model container or custom
+  container manager software. [See External Containers](#external-containers)
+</ParamField>
+<ParamField path="capacity" type="integer">
+  Optional capacity of the model: the number of inference tasks the container
+  can handle at the same time. Defaults to 1.
+  [See External Containers](#external-containers)
+</ParamField>

 ### Optimization Flags
````
````diff
@@ -134,3 +149,37 @@ are available:
   loss**. The speedup becomes more pronounced as the number of inference steps
   increases. Cannot be used simultaneously with `SFAST`.
 </ParamField>
+
+### External Containers
+
+<Warning>
+  This feature is intended for advanced users. Incorrect setup can lead to a
+  lower orchestrator score and reduced fees. If external containers are used,
+  it is the Orchestrator's responsibility to ensure the correct container with
+  the correct endpoints is running behind the specified `url`.
+</Warning>
+
+External containers can range from a single extra model container stacked on
+top of the managed model containers to an auto-scaling GPU cluster behind a
+load balancer, or anything in between. Orchestrators can use them to extend
+the models served, or to fully replace the managed model containers that the
+AI Worker starts and stops at startup using the
+[Docker client Go library](https://pkg.go.dev/github.com/docker/docker/client).
+
+External containers are enabled by setting the `url`, `capacity`, and `token`
+fields in the model configuration. The only requirement is that the specified
+`url` responds to the AI Worker the same way a managed container would
+(including HTTP error codes). As long as the container management software
+acts as a pass-through to the model container, any container management
+software can be used to manage the runner containers, including
+[Kubernetes](https://kubernetes.io/), [Podman](https://podman.io/),
+[Docker Swarm](https://docs.docker.com/engine/swarm/),
+[Nomad](https://www.nomadproject.io/), or custom scripts that manage
+container lifecycles based on request volume.
+
+- The `url` is used at AI Worker startup to confirm that a model container is
+  running, via its `/health` endpoint. After startup, inference requests are
+  forwarded to the `url` the same way they are forwarded to managed containers.
+- The `capacity` should be set to the maximum number of requests that can be
+  processed concurrently for the pipeline/model ID (default is 1). If
+  auto-scaling containers, keep startup time fast when setting `warm: true`,
+  because slow responses will negatively impact your selection by Gateways
+  for future requests.
+- The `token` field secures the model container `url` against unauthorized
+  access; using it is strongly suggested if the containers are exposed to
+  external networks.
+
+We welcome feedback to improve this feature, so please reach out to us if you
+have suggestions for a better experience running external containers.
````
