api inference mini fork #109
base: main
Conversation
Force-pushed from b93b802 to 7f17bb6
    if default_num_steps:
        kwargs["num_inference_steps"] = int(default_num_steps)

    if "guidance_scale" not in kwargs:
Useful for SD 3.5 Turbo: we want a guidance scale of 0 by default (i.e. when not specified by the user) because the number of steps is too low otherwise, so that the generated images are ok.
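A minimal sketch of how that defaulting could fit together, assuming the env var name `DEFAULT_NUM_STEPS` and the `<= 4` threshold mentioned in a commit message further down (the exact placement inside the handler is also an assumption):

```python
import os

kwargs = {}  # user-supplied pipeline arguments in the real handler

default_num_steps = os.environ.get("DEFAULT_NUM_STEPS")  # env var name assumed
if default_num_steps:
    kwargs["num_inference_steps"] = int(default_num_steps)

if "guidance_scale" not in kwargs:
    # For few-step models such as SD 3.5 Turbo, classifier-free guidance
    # degrades output quality, so default guidance_scale to 0.
    if kwargs.get("num_inference_steps", 50) <= 4:  # threshold assumed
        kwargs["guidance_scale"] = 0
```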
@@ -20,3 +23,7 @@ def strtobool(val: str) -> bool:
        raise ValueError(
            f"Invalid truth value, it should be a string but {val} was provided instead."
        )


    def api_inference_compat():
With this env var we intend to handle the small response differences between the api inference widgets on the Hub and on the Endpoints UI. TODO: we should probably unify both widgets instead.
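A sketch of what such a helper could look like, reusing the module's `strtobool` shown in the diff above (the env var name `API_INFERENCE_COMPAT` is an assumption; `strtobool` is restated here in simplified form only to make the snippet self-contained):

```python
import os


def strtobool(val: str) -> bool:
    # Simplified stand-in for the module's strtobool from the diff above.
    val = val.lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return True
    if val in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(
        f"Invalid truth value, it should be a string but {val} was provided instead."
    )


def api_inference_compat() -> bool:
    # Toggles api-inference style responses; env var name assumed.
    return strtobool(os.environ.get("API_INFERENCE_COMPAT", "false"))
```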
Route("/predict", predict, methods=["POST"]), | ||
Route("/metrics", metrics, methods=["GET"]), | ||
] | ||
if api_inference_compat(): |
I only activated multi-task for api inference (as a test), but we may want to remove this condition and just always support it if we're satisfied with it.
Actually, thinking about it: we may want a separate env var instead (kept deactivated by default for regular users, with an option for it in Endpoints), because with this route the pod may consume more RAM than expected due to the pipeline duplications.
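Under that proposal, the route registration might look like the following sketch. The `MULTI_TASK` env var name, the `/pipeline/{task}` path, and the `multi_task` handler are hypothetical; `predict`, `metrics`, and `strtobool` are assumed to come from the surrounding module:

```python
import os

from starlette.routing import Route

routes = [
    Route("/predict", predict, methods=["POST"]),
    Route("/metrics", metrics, methods=["GET"]),
]
# Hypothetical: gate the multi-task route behind its own opt-in env var,
# off by default, since duplicated pipelines can raise the pod's RAM usage.
if strtobool(os.environ.get("MULTI_TASK", "false")):
    routes.append(Route("/pipeline/{task}", multi_task, methods=["POST"]))
```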
Force-pushed from 459f3b6 to 39db7c6
Force-pushed from c71a4c5 to c5565c2
Force-pushed from 67491ac to 0818705
Commits (each signed off by Raphael Glon <[email protected]>):

- …cale defaults to 0 when num steps <= 4
- More flexibility than an exact string match, since there can be some additional params
- …ard compat
- …ion for sentence transformers
- No reason not to accept it
- …tch size dim
- Return an error instead
- … we do not know what to do with
- Backported and adapted from https://github.com/huggingface/api-inference-community/blob/main/docker_images/diffusers/app/idle.py: 1. add gunicorn instead of uvicorn so that wsgi/asgi workers can easily be suppressed when idle without stopping the entire service -> an easy way to release memory without digging into the depths of the imported modules; 2. lazy-load the memory-consuming libs (transformers, diffusers, sentence_transformers); 3. lazy-load the pipeline as well. The first "cold start" request tends to be a bit slower than the others, but the footprint is reduced to the minimum when idle
- To optimize build time and enhance layer reuse
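A minimal sketch of the lazy-load idea from the idle commit above (the `MODEL_ID` env var and the `get_pipeline` helper are illustrative names, not the actual implementation, which lives in app/idle.py of the linked repo):

```python
import os

_pipeline = None


def get_pipeline():
    # Lazy-load: heavy imports and pipeline construction are deferred to the
    # first request, so idle workers keep a minimal footprint and gunicorn
    # can reap them without tearing down the whole service.
    global _pipeline
    if _pipeline is None:
        from diffusers import DiffusionPipeline  # deferred heavy import
        _pipeline = DiffusionPipeline.from_pretrained(
            os.environ["MODEL_ID"]  # model id env var assumed
        )
    return _pipeline
```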
Force-pushed from 1ea58f3 to 2eda42a