
latent-to-image #469


Merged

merged 6 commits into huggingface:main on Feb 12, 2025

Conversation

hlky
Contributor

@hlky commented Feb 7, 2025

@@ -196,8 +198,12 @@ def check_inputs(inputs, tag):
IMAGE_OUTPUTS = {
    "image-to-image",
    "text-to-image",
    "latent-to-image",
Contributor

Not sure about the new tag, as it maps to existing tasks, and I think we probably don't want to create this as an official task while it's experimental. But maybe it's harmless for this to exist only in the inference API.

Contributor Author

Was added for

elif task in IMAGE_OUTPUTS:
    image = outputs
    image_format = parse_accept(accept, IMAGE)
    buffer = io.BytesIO()
    image.save(buffer, format=image_format.upper())
    buffer.seek(0)
    img_bytes = buffer.read()
    return Response(
        img_bytes,
        headers=headers,
        status_code=200,
        media_type=f"image/{image_format}",
    )

Seems to be the only usage of IMAGE_OUTPUTS

We also need the task in KNOWN_TASKS (via TENSOR_INPUTS)

if task not in KNOWN_TASKS:
    msg = f"The task `{task}` is not recognized by api-inference-community"
    logger.error(msg)
    # Special case: despite the fact that the task comes from environment (which could be considered a service
    # config error, thus triggering a 500), this var indirectly comes from the user
    # so we choose to have a 400 here
    return JSONResponse({"error": msg}, status_code=400)

Should be harmless for this to exist but I agree we don't want it to be an official task while this is experimental.
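
For context, the registration itself is small. A rough sketch of the idea, using hypothetical stand-ins for the existing task sets (only TENSOR_INPUTS and KNOWN_TASKS are names that actually appear in this discussion, and how KNOWN_TASKS is really composed may differ):

# hypothetical stand-ins for the existing task sets in api-inference-community
IMAGE_INPUTS = {"image-to-image"}
TEXT_INPUTS = {"text-to-image"}

# the new set referenced above: tasks whose inputs arrive as raw tensor data
TENSOR_INPUTS = {"latent-to-image"}

# KNOWN_TASKS has to include the new tag so the 400 check above passes
KNOWN_TASKS = IMAGE_INPUTS | TEXT_INPUTS | TENSOR_INPUTS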

Contributor

We can maybe find ways to "override" another task for the time being (without introducing a new task).

@apolinario requested a review from Narsil on February 7, 2025, 15:13
@@ -128,7 +129,9 @@ async def pipeline_route(request: Request) -> Response:
         return JSONResponse({"error": str(e)}, status_code=500)

     try:
-        inputs, params = normalize_payload(payload, task, sampling_rate=sampling_rate)
+        inputs, params = normalize_payload(
+            payload, task, sampling_rate=sampling_rate, headers=headers
Contributor

Probably not a good place to put the tensor shape in the headers. Why not just put it in the payload?

I know base64 isn't perfect, but I'd rather go with multipart upload than do something like this.
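
To make the multipart suggestion concrete, a client-side sketch of what such a request could look like (purely illustrative; this is not a format the API implements, and the part names are made up):

import json

import requests
import torch
from safetensors.torch import _tobytes

inputs = torch.randn([1, 4, 64, 64])
files = {
    # raw tensor bytes as a binary part, instead of headers or base64
    "inputs": ("inputs.bin", _tobytes(inputs, "inputs"), "application/octet-stream"),
}
data = {"parameters": json.dumps({"shape": list(inputs.shape), "dtype": "float32"})}
response = requests.post("http://127.0.0.1:8000/", files=files, data=data)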

Contributor

Also, changing this requires a release of the api-inference-community package, which I think will slow the whole process down (I will be much pickier about what actually gets merged, since it's published as a package and therefore harder to take down).

Can we start with base64 and switch to something more optimal afterwards?

Contributor Author

Changed to use base64.

Minimal example

import io
import base64
import requests
import torch
from PIL import Image
from safetensors.torch import _tobytes

# random latents with the SD1.5 latent shape [batch, channels, height/8, width/8]
inputs = torch.randn([1, 4, 64, 64], generator=torch.Generator().manual_seed(0))
shape = list(inputs.shape)
dtype = str(inputs.dtype).split(".")[-1]
# raw tensor bytes, base64-encoded so they can travel inside the JSON payload
tensor_data = base64.b64encode(_tobytes(inputs, "inputs")).decode("utf-8")
parameters = {"shape": shape, "dtype": dtype}
data = {"inputs": tensor_data, "parameters": parameters}

response = requests.post("http://127.0.0.1:8000/", json=data)
Image.open(io.BytesIO(response.content))

[attached: generated image]
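
For reference, reconstructing the tensor from such a payload on the server side could look roughly like this (a sketch only; the helper name is made up and the actual decoding inside normalize_payload may differ):

import base64

import numpy as np
import torch

def tensor_from_payload(inputs: str, shape: list, dtype: str) -> torch.Tensor:
    # invert the client-side encoding: base64 -> raw bytes -> typed array -> tensor
    raw = base64.b64decode(inputs)
    arr = np.frombuffer(raw, dtype=np.dtype(dtype)).reshape(shape)
    # copy because np.frombuffer returns a read-only view
    return torch.from_numpy(arr.copy())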

Generation example

Using the Lykon/dreamshaper-8 finetune here for generation; stable-diffusion-v1-5/stable-diffusion-v1-5 can be used for the VAE.

import io
import base64
import requests
import torch
from PIL import Image
from safetensors.torch import _tobytes

from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8",
    safety_checker=None,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "a cute cat sitting beside the sea beach."
inputs = pipeline(
    prompt=prompt,
    negative_prompt="bad quality, worse quality, degenerate quality",
    output_type="latent",
).images
shape = list(inputs.shape)
dtype = str(inputs.dtype).split(".")[-1]
tensor_data = base64.b64encode(_tobytes(inputs, "inputs")).decode("utf-8")
parameters = {"shape": shape, "dtype": dtype}
data = {"inputs": tensor_data, "parameters": parameters}

response = requests.post("http://127.0.0.1:8000/", json=data)
Image.open(io.BytesIO(response.content))

[attached: generated image]
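
Since the route picks the output format from the Accept header via parse_accept (see the IMAGE_OUTPUTS excerpt above), the client can presumably ask for a specific format; a small sketch, continuing from the snippet above and assuming image/png is honored:

# continuing from the generation example; assumption: parse_accept honors an explicit Accept header
response = requests.post(
    "http://127.0.0.1:8000/",
    json=data,
    headers={"Accept": "image/png"},
)
Image.open(io.BytesIO(response.content)).save("decoded.png")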

Comment on lines 432 to 433
if sys.byteorder == "big":
    arr = torch.from_numpy(arr.numpy().byteswap(inplace=False))
Contributor

We can definitely assume byteorder is little.
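
If that assumption is made explicit, the byteswap branch can be dropped in favor of a guard; a possible sketch (not necessarily what was merged):

import sys

# assumption: only little-endian hosts are supported, so fail loudly instead of byteswapping
if sys.byteorder != "little":
    raise RuntimeError("big-endian hosts are not supported")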

hlky and others added 3 commits February 10, 2025 15:37
@Narsil merged commit 79e3e17 into huggingface:main Feb 12, 2025
2 checks passed