
Commit 40e3f58

refactor(ai): apply some small formatting changes (#686)
This commit applies some small formatting changes to clean up the codebase. It also simplifies the SAM2 pipeline Docker image URL.
1 parent 112b74e commit 40e3f58

6 files changed: +44 −24 lines

ai/pipelines/image-to-image.mdx

+2-1
@@ -126,7 +126,8 @@ curl -X POST https://<GATEWAY_IP>/image-to-image \
     -F loras='{ "nerijs/pixel-art-xl": 1.2 }'
 ```
 
-You can find a list of available LoRa models for various models on [lora-studio](https://huggingface.co/spaces/enzostvs/lora-studio).
+You can find a list of available LoRa models for various models on
+[lora-studio](https://huggingface.co/spaces/enzostvs/lora-studio).
 
 ## Orchestrator Configuration
 

ai/pipelines/image-to-video.mdx

+4-4
@@ -1,5 +1,5 @@
 ---
-title: Image-to-video
+title: Image-to-Video
 ---
 
 ## Overview
@@ -34,7 +34,7 @@ graph LR
 
 The current warm model requested for the `image-to-video` pipeline is:
 
-- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
+- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
   An updated version of the stable-video-diffusion-img2vid-xt model with
   enhanced performance
   ([limited-commercial use license](https://stability.ai/license)).
@@ -59,9 +59,9 @@ pipeline:
 
 {/* prettier-ignore */}
 <Accordion title="Tested and Verified Diffusion Models">
-- [stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt):
+- [stable-video-diffusion-img2vid-xt](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt):
   A model by Stability AI designed for stable video diffusion from images ([limited-commercial use license](https://stability.ai/license)).
-- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
+- [stabilityai/stable-video-diffusion-img2vid-xt-1-1](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1):
   An updated version of the stable-video-diffusion-img2vid-xt model with enhanced performance ([limited-commercial use license](https://stability.ai/license)).
 </Accordion>
 

ai/pipelines/segment-anything-2.mdx

+3-3
@@ -1,5 +1,5 @@
 ---
-title: Segment-anything-2
+title: Segment-Anything-2
 ---
 
 ## Overview
@@ -21,7 +21,7 @@ HuggingFace's
 
 The current warm model requested for the `segment-anything-2` pipeline is:
 
-- [facebook/sam2-hiera-large](https://huggingface.co/facebook/sam2-hiera-large):
+- [facebook/sam2-hiera-large](https://huggingface.co/facebook/sam2-hiera-large):
   The largest model in the Segment Anything 2 model suite, designed for the most
   accurate image segmentation.
 
@@ -108,7 +108,7 @@ The following system requirements are recommended for optimal performance:
 
 To serve the `segment-anything-2` pipeline, you must use a pipeline specific AI
 Runner container. Pull the required container from
-[Docker Hub](https://hub.docker.com/layers/livepeer/ai-runner/segment-anything-2/images/sha256-b47b04e31907670db673152c38221373e5d749173ed54f932f8d9f8ad5959a33?context=explore)
+[Docker Hub](https://hub.docker.com/r/livepeer/ai-runner/tags?name=segment-anything-2-latest)
 using the following command:
 
 ```bash

ai/pipelines/text-to-image.mdx

+2-2
@@ -30,14 +30,14 @@ graph LR
 
 The current warm model requested for the `text-to-image` pipeline is:
 
-- [SG161222/RealVisXL_V4.0_Lightning](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning):
+- [SG161222/RealVisXL_V4.0_Lightning](https://huggingface.co/SG161222/RealVisXL_V4.0_Lightning):
   A streamlined version of RealVisXL_V4.0, designed for faster inference while
   still aiming for photorealism.
 
 Furthermore, several Orchestrators are currently maintaining the following model
 in a ready state:
 
-- [ByteDance/SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning):
+- [ByteDance/SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning):
   A high-performance diffusion model developed by ByteDance.
 
 <Tip>

ai/pipelines/text-to-speech.mdx

+32-13
@@ -4,17 +4,22 @@ title: Text-to-Speech
 
 ## Overview
 
-The text-to-speech endpoint in Livepeer utilizes [Parler-TTS](https://github.com/huggingface/parler-tts), specifically `parler-tts/parler-tts-large-v1`. This model can generate speech with customizable characteristics such as voice type, speaking style, and audio quality.
+The text-to-speech endpoint in Livepeer utilizes
+[Parler-TTS](https://github.com/huggingface/parler-tts), specifically
+`parler-tts/parler-tts-large-v1`. This model can generate speech with
+customizable characteristics such as voice type, speaking style, and audio
+quality.
 
 ## Basic Usage Instructions
 
 <Tip>
-For a detailed understanding of the `text-to-speech` endpoint and to experiment
-with the API, see the [Livepeer AI API
+For a detailed understanding of the `text-to-speech` endpoint and to
+experiment with the API, see the [Livepeer AI API
 Reference](/ai/api-reference/text-to-speech).
 </Tip>
 
-To use the text-to-speech feature, submit a POST request to the `/text-to-speech` endpoint. Here's an example of how to structure your request:
+To use the text-to-speech feature, submit a POST request to the
+`/text-to-speech` endpoint. Here's an example of how to structure your request:
 
 ```bash
 curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
@@ -28,29 +33,43 @@ curl -X POST "http://<GATEWAY_IP>/text-to-speech" \
 
 ### Request Parameters
 
-- `model_id`: The ID of the text-to-speech model to use. Currently, this should be set to `"parler-tts/parler-tts-large-v1"`.
+- `model_id`: The ID of the text-to-speech model to use. Currently, this should
+  be set to `"parler-tts/parler-tts-large-v1"`.
 - `text`: The text you want to convert to speech.
-- `description`: A description of the desired voice characteristics. This can include details about the speaker's voice, speaking style, and audio quality.
+- `description`: A description of the desired voice characteristics. This can
+  include details about the speaker's voice, speaking style, and audio quality.
 
 ### Voice Customization
 
-You can customize the generated voice by adjusting the `description` parameter. Some aspects you can control include:
+You can customize the generated voice by adjusting the `description` parameter.
+Some aspects you can control include:
 
 - Speaker identity (e.g., "Jon's voice")
 - Speaking style (e.g., "monotone", "expressive")
 - Speaking speed (e.g., "slightly fast")
 - Audio quality (e.g., "very close recording", "no background noise")
 
-The checkpoint was trained on 34 speakers. The full list of available speakers includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan, Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa, Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce, and Emily.
+The checkpoint was trained on 34 speakers. The full list of available speakers
+includes: Laura, Gary, Jon, Lea, Karen, Rick, Brenda, David, Eileen, Jordan,
+Mike, Yann, Joy, James, Eric, Lauren, Rose, Will, Jason, Aaron, Naomie, Alisa,
+Patrick, Jerry, Tina, Jenna, Bill, Tom, Carol, Barbara, Rebecca, Anna, Bruce,
+and Emily.
 
-However, the models performed better with certain speakers. A list of the top 20 speakers for each model variant, ranked by their average speaker similarity scores can be found [here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)
+However, the models performed better with certain speakers. A list of the top 20
+speakers for each model variant, ranked by their average speaker similarity
+scores can be found
+[here](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md#speaker-consistency)
 
 ## Limitations and Considerations
 
-- The maximum length of the input text may be limited. For long-form content, you will need to split your text into smaller chunks. The training default configuration in parler-tts is max 30sec, max text length 600 characters.
-https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training
-- While the model supports various voice characteristics, the exact replication of a specific speaker's voice is not guaranteed.
-- The quality of the generated speech can vary based on the complexity of the input text and the specificity of the voice description.
+- The maximum length of the input text may be limited. For long-form content,
+  you will need to split your text into smaller chunks. The training default
+  configuration in parler-tts is max 30sec, max text length 600 characters.
+  https://github.com/huggingface/parler-tts/blob/main/training/README.md#3-training
+- While the model supports various voice characteristics, the exact replication
+  of a specific speaker's voice is not guaranteed.
+- The quality of the generated speech can vary based on the complexity of the
+  input text and the specificity of the voice description.
 
 ## Orchestrator Configuration
 
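For context, the request documented in the hunk above can be sketched in Python. This is an illustrative sketch, not part of the commit: the JSON body shape, the function names, and the gateway placeholder are assumptions based on the parameters listed in the docs.

```python
import json
import urllib.request


def build_payload(text: str, description: str) -> dict:
    """Assemble the three documented /text-to-speech parameters."""
    return {
        "model_id": "parler-tts/parler-tts-large-v1",
        "text": text,
        "description": description,
    }


def text_to_speech(gateway: str, text: str, description: str) -> bytes:
    """POST the payload to the gateway (placeholder address) and return the raw response body."""
    req = urllib.request.Request(
        f"http://{gateway}/text-to-speech",
        data=json.dumps(build_payload(text, description)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Usage would look like `text_to_speech("<GATEWAY_IP>", "Hello world", "Jon's voice is expressive, with a very close recording.")`, mirroring the curl example in the diff context.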

ai/pipelines/upscale.mdx

+1-1
@@ -30,7 +30,7 @@ graph LR
 
 The current warm model requested for the `upscale` pipeline is:
 
-- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler):
+- [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler):
   A text-guided upscaling diffusion model trained on large LAION images,
   offering enhanced resolution and controlled noise addition.
