* docs(ai): add external container manager docs
This commit adds documentation for the external container manager
feature now that it has been thoroughly tested and some issues have been
fixed.
---------
Co-authored-by: 0xb79 <[email protected]>
Optional URL and port where the model container or custom container manager software is running.
[See External Containers](#external-containers)

</ParamField>

<ParamField path="token" type="string">
Optional token required to interact with the model container or custom container
manager software.

[See External Containers](#external-containers)

</ParamField>

<ParamField path="capacity" type="integer">
Optional capacity of the model. This is the number of inference tasks the model
can handle concurrently. Defaults to `1`.

[See External Containers](#external-containers)

</ParamField>
### Optimization Flags
loss**. The speedup becomes more pronounced as the number of inference steps
increases. Cannot be used simultaneously with `SFAST`.
</ParamField>

### External Containers

<Warning>
This feature is intended for advanced users. Incorrect setup can lead to a
lower orchestrator score and reduced fees. If external containers are used, it
is the Orchestrator's responsibility to ensure the correct container with the
correct endpoints is running behind the specified `url`.
</Warning>

External containers can range from a single model container stacked on top of
the managed model containers, to an auto-scaling GPU cluster behind a load
balancer, or anything in between. Orchestrators can use external containers to
extend the models served, or to fully replace the managed model containers that
the AI Worker starts and stops via the
[Docker client Go library](https://pkg.go.dev/github.com/docker/docker/client)
based on the models specified at startup.

External containers can be used by specifying the `url`, `capacity` and `token`
fields in the model configuration. The only requirement is that the specified
`url` responds to the AI Worker the same way a managed container would,
including HTTP error codes. As long as the container management software acts
as a pass-through to the model container, any container management software can
be used to implement custom management of the runner containers, including
[Kubernetes](https://kubernetes.io/), [Podman](https://podman.io/),
[Docker Swarm](https://docs.docker.com/engine/swarm/),
[Nomad](https://www.nomadproject.io/), or custom scripts that manage container
lifecycles based on request volume.

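For illustration, an external container could be wired into a model entry in
the Orchestrator's model configuration like the sketch below. Only the `url`,
`token`, and `capacity` fields are described on this page; the surrounding
fields (`pipeline`, `model_id`, `warm`) and the exact file layout are
assumptions modeled on the managed-container configuration, and the values are
placeholders.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "warm": true,
    "url": "http://localhost:9000",
    "token": "<EXTERNAL_CONTAINER_TOKEN>",
    "capacity": 2
  }
]
```

With an entry like this, the AI Worker would direct health checks and inference
requests for the model to the configured `url` instead of launching its own
container.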
- The `url` set will be used to confirm a model container is running at startup
  of the AI Worker using the `/health` endpoint. After startup, inference
  requests will be forwarded to the `url` the same way they are forwarded to
  the managed containers.
- The `capacity` should be set to the maximum number of requests that can be
  processed concurrently for the pipeline/model id (default is 1). If
  auto-scaling containers, take care that the startup time is fast when setting
  `warm: true`, because slow response times will negatively impact your
  selection by Gateways for future requests.
- The `token` field is used to secure the model container `url` from
  unauthorized access. Using a token is strongly suggested if the containers
  are exposed to external networks.
We welcome feedback to improve this feature, so please reach out to us if you
have suggestions to enable a better experience running external containers.