currently **recommended** models and their respective prices.

Optional flags to enhance performance (details below).
</ParamField>

<ParamField path="url" type="string" optional="true">
  Optional URL and port where the model container or custom container manager software is running.

  [See External Containers](#external-containers)
</ParamField>

<ParamField path="token" type="string">
  Optional token required to interact with the model container or custom container manager software.

  [See External Containers](#external-containers)
</ParamField>

<ParamField path="capacity" type="integer">
  Optional capacity of the model, i.e., the number of inference tasks it can handle at the same time. Defaults to `1`.

  [See External Containers](#external-containers)
</ParamField>
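
For reference, a single model-configuration entry combining these optional fields might look like the sketch below. The `pipeline`, `model_id`, and `warm` fields are assumed from the surrounding documentation, and all values are placeholders; treat the exact shape as illustrative rather than canonical.

```json
[
  {
    "pipeline": "text-to-image",
    "model_id": "example/model-id",
    "warm": true,
    "url": "http://localhost:9000",
    "token": "some-secret-token",
    "capacity": 2
  }
]
```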

are available:

<Warning>
  This feature is intended for advanced users. Incorrect setup can lead to a
  lower orchestrator score and reduced fees. If external containers are used,
  it is the Orchestrator's responsibility to ensure the correct container with
  the correct endpoints is running behind the specified `url`.
</Warning>

External containers can range from a single model container stacked on top of the managed model containers to
an auto-scaling GPU cluster behind a load balancer, or anything in between. Orchestrators can use external
containers to extend the models served, or to fully replace the managed model containers that the AI Worker
starts and stops via the [Docker client Go library](https://pkg.go.dev/github.com/docker/docker/client) based
on the models specified at its startup.

External containers can be used by specifying the `url`, `capacity` and `token` fields in the model
configuration. The only requirement is that the specified `url` responds to the AI Worker the same way the
managed containers would respond (including HTTP error codes). As long as the container management software
acts as a pass-through to the model container, you can use any container management software to implement
custom management of the runner containers, including [Kubernetes](https://kubernetes.io/),
[Podman](https://podman.io/), [Docker Swarm](https://docs.docker.com/engine/swarm/),
[Nomad](https://www.nomadproject.io/), or custom scripts that manage container lifecycles based on request
volume.
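
As an illustration, container-manager software that satisfies this pass-through requirement can be very small. The sketch below (in Go, like the AI Worker itself) forwards every request, including `/health`, to a single model container and rejects callers that lack a bearer token. The environment variable names and the bearer-token scheme are assumptions made for this example, not part of the AI Worker.

```go
// passthrough.go: minimal sketch of pass-through container-manager software.
// The `url` field in the model configuration would point at this listener.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	// Hypothetical env var: address of the actual model runner container,
	// e.g. "http://127.0.0.1:8000".
	target, err := url.Parse(os.Getenv("MODEL_CONTAINER_ADDR"))
	if err != nil {
		log.Fatalf("invalid MODEL_CONTAINER_ADDR: %v", err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Hypothetical env var: shared secret matching the `token` field in the
	// model configuration.
	token := os.Getenv("WORKER_TOKEN")

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Reject callers without the expected token so the exposed `url` is
		// protected from unauthorized access.
		if token != "" && r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		// Everything else, including `/health` and the inference routes, is
		// passed through unchanged, so the AI Worker sees the same responses
		// and HTTP error codes a managed container would return.
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":9000", nil))
}
```

Anything more elaborate, such as an auto-scaler or a load balancer in front of a GPU cluster, only has to preserve the same contract: respond at the configured `url` exactly as a managed container would.
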
- The `url` set will be used to confirm a model container is running at startup of the AI Worker using the
  `/health` endpoint. After startup, inference requests will be forwarded to the `url` the same way they are
  to the managed containers.
- The `capacity` should be set to the maximum number of requests that can be processed concurrently for the
  pipeline/model ID (default is 1). If auto-scaling containers, take care that startup time is fast when
  setting `warm: true`, because slow response times will negatively impact your selection by Gateways for
  future requests.
- The `token` field is used to secure the model container `url` from unauthorized access; using it is
  strongly recommended if the containers are exposed to external networks. A small pre-flight check is
  sketched after this list.
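
As a sketch, an Orchestrator might confirm that the external container answers on `/health` before starting the AI Worker with a short Go program like the one below. The environment variable names and the bearer-token header are assumptions for illustration, matching the pass-through sketch above.

```go
// healthcheck.go: pre-flight check against an external model container.
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"time"
)

func main() {
	base := os.Getenv("MODEL_URL")    // hypothetical: the `url` from the model configuration
	token := os.Getenv("MODEL_TOKEN") // hypothetical: the `token` from the model configuration

	req, err := http.NewRequest(http.MethodGet, base+"/health", nil)
	if err != nil {
		log.Fatal(err)
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		log.Fatalf("container unreachable: %v", err)
	}
	defer resp.Body.Close()

	// A non-200 status here suggests the AI Worker's own startup check
	// against this `url` would also fail.
	if resp.StatusCode != http.StatusOK {
		log.Fatalf("unexpected /health status: %s", resp.Status)
	}
	fmt.Println("external container healthy")
}
```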