
Commit 026c8e3

mjh1 authored and rickstaa committed
docs(ai): add Image to Text pipeline docs
This commit adds the image-to-text pipeline docs and updates the API reference.
1 parent bb2dd42 commit 026c8e3

File tree

6 files changed: +145 -16 lines changed


ai/api-reference/image-to-text.mdx

+21
@@ -0,0 +1,21 @@
+---
+openapi: post /image-to-text
+---
+
+<Info>
+The default Gateway used in this guide is the public
+[Livepeer.cloud](https://www.livepeer.cloud/) Gateway. It is free to use but
+not intended for production-ready applications. For production-ready
+applications, consider using the [Livepeer Studio](https://livepeer.studio/)
+Gateway, which requires an API token. Alternatively, you can set up your own
+Gateway node or partner with one via the `ai-video` channel on
+[Discord](https://discord.gg/livepeer).
+</Info>
+
+<Note>
+Please note that the exact parameters, default values, and responses may vary
+between models. For more information on model-specific parameters, please
+refer to the respective model documentation available in the [image-to-text
+pipeline](/ai/pipelines/image-to-text). Not all parameters might be available
+for a given model.
+</Note>
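
For reference, the request below sketches how this endpoint could be called through the Livepeer Studio Gateway mentioned in the `<Info>` block. The `/api/beta/generate/image-to-text` path is taken from the other API reference page added in this commit; combining it with the `livepeer.studio` host and using a `Bearer` authorization header are assumptions to confirm against the Studio documentation.

```bash
# Hedged sketch: image-to-text via the Livepeer Studio Gateway (requires an API token).
# The exact host and auth-header scheme are assumptions; confirm them in the Studio docs.
curl -X POST "https://livepeer.studio/api/beta/generate/image-to-text" \
  -H "Authorization: Bearer $LIVEPEER_STUDIO_API_TOKEN" \
  -F model_id=Salesforce/blip-image-captioning-large \
  -F image=@example.png
```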

ai/orchestrators/models-config.mdx

+14 -14 (the removed and added lines read identically here, so the changes appear to be whitespace-only; each hunk is therefore shown once, without +/- markers)

@@ -74,15 +74,15 @@ currently **recommended** models and their respective prices.
Optional flags to enhance performance (details below).
</ParamField>
<ParamField path="url" type="string" optional="true">
Optional URL and port where the model container or custom container manager software is running.
[See External Containers](#external-containers)
</ParamField>
<ParamField path="token" type="string">
Optional token required to interact with the model container or custom container manager software.
[See External Containers](#external-containers)
</ParamField>
<ParamField path="capacity" type="integer">
Optional capacity of the model. This is the number of inference tasks the model can handle at the same time. This defaults to 1.
[See External Containers](#external-containers)
</ParamField>

@@ -131,30 +131,30 @@ are available:

<Warning>
This feature is intended for advanced users. Incorrect setup can lead to a
lower orchestrator score and reduced fees. If external containers are used,
it is the Orchestrator's responsibility to ensure the correct container with
the correct endpoints is running behind the specified `url`.
</Warning>

External containers can range from a single model container stacked on top of the managed model containers
to an auto-scaling GPU cluster behind a load balancer, or anything in between. Orchestrators
can use external containers to extend the models served, or to fully replace the AI Worker managed model containers
(which are started and stopped with the [Docker client Go library](https://pkg.go.dev/github.com/docker/docker/client)
at AI Worker startup).

External containers can be used by specifying the `url`, `capacity` and `token` fields in the
model configuration. The only requirement is that the specified `url` responds to the AI Worker in the same way
the managed containers would (including HTTP error codes). As long as the container management software
acts as a pass-through to the model container, you can use any container management software to implement custom
management of the runner containers, including [Kubernetes](https://kubernetes.io/), [Podman](https://podman.io/),
[Docker Swarm](https://docs.docker.com/engine/swarm/), [Nomad](https://www.nomadproject.io/), or custom scripts to
manage container lifecycles based on request volume.


- The `url` is used at AI Worker startup to confirm a model container is running, via the `/health` endpoint.
  After startup, inference requests are forwarded to the `url` just as they are to the managed containers.
- The `capacity` should be set to the maximum number of requests that can be processed concurrently for the pipeline/model ID (default is 1).
  If you auto-scale containers, make sure startup time is fast when setting `warm: true`, because slow response times will
  negatively impact your selection by Gateways for future requests.
- The `token` field is used to secure the model container `url` from unauthorized access and is strongly
  recommended if the containers are exposed to external networks.
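
To make the external-container contract above concrete, here is a hedged sketch of a pre-flight check an Orchestrator might run before pointing the AI Worker at an external `url`. The `/health` endpoint and the role of the `token` come from the text above; the example address and the use of an `Authorization` header to carry the token are assumptions about one particular setup.

```bash
# Hedged sketch: verify an external model container before AI Worker startup.
# EXTERNAL_URL and the Authorization-header scheme are assumptions; adapt to your setup.
EXTERNAL_URL="http://10.0.0.5:9000"        # hypothetical container manager address (the `url` field)
EXTERNAL_TOKEN="replace-with-your-token"   # value of the `token` field

# The AI Worker hits /health at startup to confirm the container is running;
# this reproduces that check manually.
if curl -fsS -H "Authorization: Bearer ${EXTERNAL_TOKEN}" "${EXTERNAL_URL}/health" > /dev/null; then
  echo "External container is reachable and healthy."
else
  echo "External container did not answer /health; the AI Worker's startup check would fail." >&2
fi
```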

ai/pipelines/image-to-text.mdx

+95

@@ -0,0 +1,95 @@
+---
+title: Image-to-Text
+---
+
+## Overview
+
+The `image-to-text` pipeline converts images into text captions. This pipeline is powered by the latest models in the HuggingFace [image-to-text](https://huggingface.co/models?pipeline_tag=image-to-text) pipeline.
+
+<div align="center">
+
+</div>
+
+## Models
+
+### Warm Models
+
+The model currently kept warm for the `image-to-text` pipeline is:
+
+- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
+
+<Tip>
+For faster responses with other
+[image-to-text](https://huggingface.co/models?pipeline_tag=image-to-text)
+models, ask Orchestrators to load them on their GPUs via the `ai-video`
+channel on [Discord](https://discord.gg/livepeer).
+</Tip>
+
+### On-Demand Models
+
+The following models have been tested and verified for the `image-to-text`
+pipeline:
+
+<Note>
+If a specific model you wish to use is not listed, please submit a [feature
+request](https://github.com/livepeer/ai-worker/issues/new?assignees=&labels=enhancement%2Cmodel&projects=&template=model_request.yml)
+on GitHub to get the model verified and added to the list.
+</Note>
+
+{/* prettier-ignore */}
+<Accordion title="Tested and Verified Models">
+- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
+</Accordion>
+
+## Basic Usage Instructions
+
+<Tip>
+For a detailed understanding of the `image-to-text` endpoint and to experiment
+with the API, see the [Livepeer AI API
+Reference](/ai/api-reference/image-to-text).
+</Tip>
+
+To create an image caption using the `image-to-text` pipeline, submit a
+`POST` request to the Gateway's `image-to-text` API endpoint:
+
+```bash
+curl -X POST "https://<GATEWAY_IP>/image-to-text" \
+  -F model_id=Salesforce/blip-image-captioning-large \
+  -F image=@<PATH_TO_FILE>
+```
+
+In this command:
+
+- `<GATEWAY_IP>` should be replaced with your AI Gateway's IP address.
+- `model_id` is the image-to-text model to use.
+- `image` is the path to the image file to be captioned.
+
+<Note>
+Maximum request size: 50 MB
+</Note>
+
+For additional optional parameters, refer to the
+[Livepeer AI API Reference](/ai/api-reference/image-to-text).
+
+## Orchestrator Configuration
+
+To configure your Orchestrator to serve the `image-to-text` pipeline, refer to
+the [Orchestrator Configuration](/ai/orchestrators/get-started) guide.
+
+### System Requirements
+
+The following system requirements are recommended for optimal performance:
+
+- [NVIDIA GPU](https://developer.nvidia.com/cuda-gpus) with **at least 12GB** of
+  VRAM.
+
+## API Reference
+
+<Card
+  title="API Reference"
+  icon="rectangle-terminal"
+  href="/ai/api-reference/image-to-text"
+>
+  Explore the `image-to-text` endpoint and experiment with the API in the
+  Livepeer AI API Reference.
+</Card>
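
As a usage note on the guide above: the Gateway presumably returns the caption as JSON, so response handling could look like the sketch below. The optional `prompt` form field (mentioned in the pipeline overview card added in this commit) and the `text` response key are assumptions to confirm against the API reference before relying on them.

```bash
# Hedged sketch: request a caption and print it, assuming the response carries a "text" field.
# "prompt" is the optional guiding prompt mentioned in the pipeline overview; verify the exact
# parameter and response field names in the API reference.
curl -s -X POST "https://<GATEWAY_IP>/image-to-text" \
  -F model_id=Salesforce/blip-image-captioning-large \
  -F prompt="a photo of" \
  -F image=@<PATH_TO_FILE> \
  | jq -r '.text'
```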

ai/pipelines/overview.mdx

+7

@@ -89,4 +89,11 @@ pipelines:
>
The text-to-speech pipeline generates high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc).
</Card>
+<Card
+  title="Image-to-Text"
+  icon="message-dots"
+  href="/ai/pipelines/image-to-text"
+>
+  The image-to-text pipeline generates captions for input images, with an optional prompt to guide the process.
+</Card>
</CardGroup>
+4

@@ -0,0 +1,4 @@
+---
+title: "Image To Text"
+openapi: "POST /api/beta/generate/image-to-text"
+---

mint.json

+4 -2

@@ -539,7 +539,8 @@
"ai/pipelines/segment-anything-2",
"ai/pipelines/text-to-image",
"ai/pipelines/text-to-speech",
-"ai/pipelines/upscale"
+"ai/pipelines/upscale",
+"ai/pipelines/image-to-text"
]
},
{

@@ -605,7 +606,8 @@
"ai/api-reference/image-to-video",
"ai/api-reference/segment-anything-2",
"ai/api-reference/upscale",
-"ai/api-reference/text-to-speech"
+"ai/api-reference/text-to-speech",
+"ai/api-reference/image-to-text"
]
}
]
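
A quick local sanity check one might run on the navigation change above, assuming `jq` is installed and `mint.json` is in the working directory:

```bash
# Should print the two entries added above:
#   ai/pipelines/image-to-text
#   ai/api-reference/image-to-text
jq -r '.. | strings | select(endswith("image-to-text"))' mint.json
```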
