Optional capacity of the model. This is the number of inference tasks the
model can handle at the same time. This defaults to 1. [See External
Containers](#external-containers)
</ParamField>
### Optimization Flags
### External Containers
<Warning>
This feature is intended for **advanced** users. Misconfiguration can reduce
orchestrator scores and earnings. Orchestrators are responsible for ensuring
the specified `url` points to a properly configured and operational container
with the correct endpoints.
</Warning>

External containers can range from a single model container stacked on top of
the managed model containers to an auto-scaling GPU cluster behind a load
balancer, or anything in between. Orchestrators can use external containers to
extend the models they serve, or to fully replace the AI Worker's managed model
containers, which the AI Worker starts and stops via the
[Docker client Go library](https://pkg.go.dev/github.com/docker/docker/client)
based on the models specified at startup.

External containers are enabled by specifying the `url`, `capacity`, and
`token` fields in the model configuration. The only requirement is that the
specified `url` responds to the AI Worker exactly as a managed container would
(including HTTP error codes). As long as the container management software acts
as a pass-through to the model container, any tool can be used to manage the
runner container lifecycle, including [Kubernetes](https://kubernetes.io/),
[Podman](https://podman.io/), [Docker Swarm](https://docs.docker.com/engine/swarm/),
[Nomad](https://www.nomadproject.io/), or custom scripts that scale containers
based on request volume.

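As a sketch, a model configuration entry pointing at an external container
might look like the following. The `pipeline` and `model_id` values, hostname,
and token are placeholders, and the exact file layout should match your AI
Worker's model configuration file:

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "ByteDance/SDXL-Lightning",
    "warm": true,
    "capacity": 2,
    "url": "https://runner.example.com:8000",
    "token": "long-random-secret"
  }
]
```

With `url` set, the AI Worker will not start a managed container for this
entry; it forwards requests to the external endpoint instead.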
- The `url` is used to confirm that a model container is running when the AI
  Worker starts, via the `/health` endpoint. After startup, inference requests
  are forwarded to the `url` the same way they are to managed containers.
- The `capacity` should be set to the maximum number of requests that can be
  processed concurrently for the pipeline/model ID (default is 1). If the
  containers auto-scale, make sure startup time is fast when setting
  `warm: true`, because slow response times will negatively impact your
  selection by Gateways for future requests.
- The `token` field secures the model container `url` against unauthorized
  access; using it is strongly recommended if the containers are exposed to
  external networks.

We welcome feedback on improving this feature, so please reach out if you have
suggestions for a better experience running external containers.