@@ -4,7 +4,10 @@ title: Image-to-Text

 ## Overview

-The `image-to-text` pipeline converts images into text captions. This pipeline is powered by the latest models in the HuggingFace [text-to-image](https://huggingface.co/models?pipeline_tag=text-to-image) pipeline.
+The `image-to-text` pipeline converts images into text captions. This pipeline
+is powered by the latest models in the HuggingFace
+[text-to-image](https://huggingface.co/models?pipeline_tag=text-to-image)
+pipeline.

 <div align="center">

@@ -19,10 +22,10 @@ The current warm model requested for the `image-to-text` pipeline is:
 - [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)

 <Tip>
-For faster responses with different
-[image-to-text](https://huggingface.co/models?pipeline_tag=text-to-image)
-diffusion models, ask Orchestrators to load it on their GPU via the `ai-video`
-channel in [Discord Server](https://discord.gg/livepeer).
+For faster responses with different
+[image-to-text](https://huggingface.co/models?pipeline_tag=text-to-image)
+diffusion models, ask Orchestrators to load it on their GPU via the `ai-video`
+channel in [Discord Server](https://discord.gg/livepeer).
 </Tip>

 ### On-Demand Models
@@ -31,9 +34,9 @@ The following models have been tested and verified for the `image-to-text`
 pipeline:

 <Note>
-If a specific model you wish to use is not listed, please submit a [feature
-request](https://github.com/livepeer/ai-worker/issues/new?assignees=&labels=enhancement%2Cmodel&projects=&template=model_request.yml)
-on GitHub to get the model verified and added to the list.
+If a specific model you wish to use is not listed, please submit a [feature
+request](https://github.com/livepeer/ai-worker/issues/new?assignees=&labels=enhancement%2Cmodel&projects=&template=model_request.yml)
+on GitHub to get the model verified and added to the list.
 </Note>

 {/* prettier-ignore */}
@@ -44,13 +47,13 @@ pipeline:
 ## Basic Usage Instructions

 <Tip>
-For a detailed understanding of the `image-to-text` endpoint and to experiment
-with the API, see the [Livepeer AI API
-Reference](/ai/api-reference/image-to-text).
+For a detailed understanding of the `image-to-text` endpoint and to experiment
+with the API, see the [Livepeer AI API
+Reference](/ai/api-reference/image-to-text).
 </Tip>

-To create an image caption using the `image-to-text` pipeline, submit a
-`POST` request to the Gateway's `image-to-text` API endpoint:
+To create an image caption using the `image-to-text` pipeline, submit a `POST`
+request to the Gateway's `image-to-text` API endpoint:

 ```bash
 curl -X POST "https://<GATEWAY_IP>/image-to-text" \
@@ -64,9 +67,7 @@ In this command:
 - `model_id` is the diffusion model to use.
 - `image` is the path to the image file to be captioned.

-<Note>
-Maximum request size: 50 MB
-</Note>
+<Note>Maximum request size: 50 MB</Note>

 For additional optional parameters, refer to the
 [Livepeer AI API Reference](/ai/api-reference/image-to-text).
@@ -80,16 +81,32 @@ the [Orchestrator Configuration](/ai/orchestrators/get-started) guide.

 The following system requirements are recommended for optimal performance:

-- [NVIDIA GPU](https://developer.nvidia.com/cuda-gpus) with **at least 12GB** of
-VRAM.
+- [NVIDIA GPU](https://developer.nvidia.com/cuda-gpus) with **at least 4GB** of
+VRAM.
+
+
+## Recommended Pipeline Pricing
+
+<Note>
+We are planning to simplify the pricing in the future so orchestrators can set
+one AI price per compute unit and have the system automatically scale based on
+the model's compute requirements.
+</Note>
+
+The pricing for the `image-to-text` pipeline is based on competitor pricing.
+However, we strongly encourage orchestrators to set their own pricing based on
+their costs and requirements. Setting a competitive price will help attract more
+jobs, as Gateways can set their maximum price for a job. The current recommended
+pricing for this pipeline is `2.5e-10 USD` per **input pixel**
+(`height * width`).

 ## API Reference

 <Card
-title="API Reference"
-icon="rectangle-terminal"
-href="/ai/api-reference/image-to-text"
+title="API Reference"
+icon="rectangle-terminal"
+href="/ai/api-reference/image-to-text"
 >
-Explore the `image-to-text` endpoint and experiment with the API in the
-Livepeer AI API Reference.
+Explore the `image-to-text` endpoint and experiment with the API in the
+Livepeer AI API Reference.
 </Card>
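
A quick way to sanity-check the recommended price added in the last hunk is to multiply it out for a few image sizes. The sketch below is only an illustration, assuming the cost of one job is simply `height * width` times the per-pixel price. The helper name `estimate_caption_cost_usd` and the sample dimensions are hypothetical and not part of the Livepeer codebase.

```python
# Recommended price quoted in the diff: 2.5e-10 USD per input pixel.
PRICE_PER_INPUT_PIXEL_USD = 2.5e-10


def estimate_caption_cost_usd(
    height: int,
    width: int,
    price_per_pixel: float = PRICE_PER_INPUT_PIXEL_USD,
) -> float:
    """Estimate the cost of captioning one image (hypothetical helper).

    Assumes the job is billed purely on input pixels (height * width).
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return height * width * price_per_pixel


if __name__ == "__main__":
    # Sample dimensions chosen for illustration only.
    for h, w in [(512, 512), (1024, 1024)]:
        cost = estimate_caption_cost_usd(h, w)
        print(f"{h}x{w}: {cost:.8f} USD")
```

Under that assumption, a 1024 x 1024 input comes to roughly 0.00026 USD per caption. An orchestrator setting its own price could pass a different `price_per_pixel` to compare against the recommendation.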