
Commit b45add5

Add Image to Text docs
1 parent 5cf8673 commit b45add5


6 files changed, +136 -2 lines changed


ai/api-reference/image-to-text.mdx

+21
@@ -0,0 +1,21 @@
+---
+openapi: post /image-to-text
+---
+
+<Info>
+The default Gateway used in this guide is the public
+[Livepeer.cloud](https://www.livepeer.cloud/) Gateway. It is free to use but
+not intended for production-ready applications. For production-ready
+applications, consider using the [Livepeer Studio](https://livepeer.studio/)
+Gateway, which requires an API token. Alternatively, you can set up your own
+Gateway node or partner with one via the `ai-video` channel on
+[Discord](https://discord.gg/livepeer).
+</Info>
+
+<Note>
+The exact parameters, default values, and responses may vary between
+models. For more information on model-specific parameters, refer to the
+respective model documentation available in the [image-to-text
+pipeline](/ai/pipelines/image-to-text). Some parameters may not be
+available for a given model.
+</Note>

ai/orchestrators/models-config.mdx

+5
@@ -56,6 +56,11 @@ currently **recommended** models and their respective prices.
 "price_per_unit": 11,
 "pixels_per_unit": 1e2,
 "currency": "USD",
+},
+{
+"pipeline": "image-to-text",
+"model_id": "Salesforce/blip-image-captioning-large",
+"price_per_unit": 4768371
 }
 ]
 ```

ai/pipelines/image-to-text.mdx

+95
@@ -0,0 +1,95 @@
+---
+title: Image-to-Text
+---
+
+## Overview
+
+The `image-to-text` pipeline converts images into text captions. This pipeline is powered by models from the Hugging Face [image-to-text](https://huggingface.co/models?pipeline_tag=image-to-text) task.
+
+<div align="center">
+
+</div>
+
+## Models
+
+### Warm Models
+
+The current warm model requested for the `image-to-text` pipeline is:
+
+- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
+
+<Tip>
+For faster responses with other
+[image-to-text](https://huggingface.co/models?pipeline_tag=image-to-text)
+models, ask Orchestrators to load them on their GPUs via the `ai-video`
+channel on the Livepeer [Discord server](https://discord.gg/livepeer).
+</Tip>
+
+### On-Demand Models
+
+The following models have been tested and verified for the `image-to-text`
+pipeline:
+
+<Note>
+If a specific model you wish to use is not listed, please submit a [feature
+request](https://github.com/livepeer/ai-worker/issues/new?assignees=&labels=enhancement%2Cmodel&projects=&template=model_request.yml)
+on GitHub to get the model verified and added to the list.
+</Note>
+
+{/* prettier-ignore */}
+<Accordion title="Tested and Verified Image-to-Text Models">
+- [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
+</Accordion>
+
+## Basic Usage Instructions
+
+<Tip>
+For a detailed understanding of the `image-to-text` endpoint and to experiment
+with the API, see the [Livepeer AI API
+Reference](/ai/api-reference/image-to-text).
+</Tip>
+
+To create an image caption using the `image-to-text` pipeline, submit a
+`POST` request to the Gateway's `image-to-text` API endpoint:
+
+```bash
+curl -X POST "https://<GATEWAY_IP>/image-to-text" \
+-F model_id=Salesforce/blip-image-captioning-large \
+-F image=@<PATH_TO_FILE>
+```
+
+In this command:
+
+- `<GATEWAY_IP>` should be replaced with your AI Gateway's IP address.
+- `model_id` is the image-to-text model to use.
+- `image` is the path to the image file to be captioned.
+
+<Note>
+Maximum request size: 50 MB
+</Note>
+
+For additional optional parameters, refer to the
+[Livepeer AI API Reference](/ai/api-reference/image-to-text).
+
+## Orchestrator Configuration
+
+To configure your Orchestrator to serve the `image-to-text` pipeline, refer to
+the [Orchestrator Configuration](/ai/orchestrators/get-started) guide.
+
+### System Requirements
+
+The following system requirements are recommended for optimal performance:
+
+- [NVIDIA GPU](https://developer.nvidia.com/cuda-gpus) with **at least 12 GB** of
+VRAM.
+
+## API Reference
+
+<Card
+title="API Reference"
+icon="rectangle-terminal"
+href="/ai/api-reference/image-to-text"
+>
+Explore the `image-to-text` endpoint and experiment with the API in the
+Livepeer AI API Reference.
+</Card>
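The curl command in the pipeline page above can be wrapped in a small shell helper that checks the image before uploading it. This is a minimal sketch, assuming a hypothetical `GATEWAY_URL` environment variable that holds the Gateway base URL (e.g. `https://<GATEWAY_IP>`), reading the documented "50 MB" limit as 50 MiB, and checking only the image file size rather than the full multipart request. The function names `check_image_size` and `caption_image` are hypothetical, and because the response format is not described here, the helper simply prints the raw response.

```bash
#!/usr/bin/env bash
# Hypothetical helpers wrapping the documented curl call to the image-to-text endpoint.
# Assumptions: GATEWAY_URL holds the Gateway base URL (e.g. https://<GATEWAY_IP>),
# "50 MB" is read as 50 MiB, and only the image file size is checked, not the
# full multipart request (so a file just under the limit may still be rejected).

MAX_IMAGE_BYTES=$((50 * 1024 * 1024))

# Succeeds if $1 is a regular file no larger than MAX_IMAGE_BYTES.
check_image_size() {
  local file="$1" size
  if [ ! -f "$file" ]; then
    echo "error: '$file' is not a regular file" >&2
    return 1
  fi
  # wc -c may pad its output with spaces on some platforms; strip them.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_IMAGE_BYTES" ]; then
    echo "error: '$file' is $size bytes (assumed limit: $MAX_IMAGE_BYTES)" >&2
    return 1
  fi
  return 0
}

# Posts the same multipart fields as the curl example and prints the raw response.
# Usage: caption_image <PATH_TO_FILE> [model_id]
caption_image() {
  local file="$1"
  local model_id="${2:-Salesforce/blip-image-captioning-large}"
  if [ -z "${GATEWAY_URL:-}" ]; then
    echo "error: set GATEWAY_URL, e.g. https://<GATEWAY_IP>" >&2
    return 1
  fi
  check_image_size "$file" || return 1
  # --fail makes curl exit non-zero on HTTP errors (status >= 400).
  curl -sS --fail -X POST "${GATEWAY_URL}/image-to-text" \
    -F "model_id=${model_id}" \
    -F "image=@${file}"
}
```

For example, after sourcing the functions, run `export GATEWAY_URL=https://<GATEWAY_IP>` and then `caption_image ./photo.jpg`. The size check fails fast locally instead of uploading a file the Gateway would reject anyway.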

ai/pipelines/overview.mdx

+7
@@ -89,4 +89,11 @@ pipelines:
 >
 The text-to-speech pipeline generates high-quality, natural sounding speech in the style of a given speaker (gender, pitch, speaking style, etc).
 </Card>
+<Card
+title="Image-to-Text"
+icon="message-dots"
+href="/ai/pipelines/image-to-text"
+>
+The image-to-text pipeline generates captions for input images, with an optional prompt to guide the process.
+</Card>
 </CardGroup>
+4
@@ -0,0 +1,4 @@
+---
+title: "Image To Text"
+openapi: "POST /api/beta/generate/image-to-text"
+---

mint.json

+4 -2
@@ -539,7 +539,8 @@
 "ai/pipelines/segment-anything-2",
 "ai/pipelines/text-to-image",
 "ai/pipelines/text-to-speech",
-"ai/pipelines/upscale"
+"ai/pipelines/upscale",
+"ai/pipelines/image-to-text"
 ]
 },
 {
@@ -604,7 +605,8 @@
 "ai/api-reference/image-to-video",
 "ai/api-reference/segment-anything-2",
 "ai/api-reference/upscale",
-"ai/api-reference/text-to-speech"
+"ai/api-reference/text-to-speech",
+"ai/api-reference/image-to-text"
 ]
 }
 ]
