api inference mini fork #109

Open
wants to merge 18 commits into
base: main

oOraph
Contributor

@oOraph oOraph commented Apr 16, 2025

  • Allow overriding some diffusion-related inference params, so that default inference gives good results when the user does not specify them
  • Multi-task support within a single deployment (example: sentence-similarity + sentence-embeddings)
  • Env var for api-inference compatibility

@oOraph oOraph changed the title from "Dev/api inference mini fork" to "api inference mini fork" Apr 17, 2025
@oOraph oOraph force-pushed the dev/api-inference-mini-fork branch 5 times, most recently from b93b802 to 7f17bb6 May 2, 2025 14:05
if default_num_steps:
    kwargs["num_inference_steps"] = int(default_num_steps)

if "guidance_scale" not in kwargs:
Contributor Author

Useful for SD 3.5 Turbo: the number of steps is too low there, so we want guidance scale to default to 0 (i.e. when the user does not specify it) so that the generated images are OK.
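
The defaulting described in this comment could be sketched roughly as below. This is only an illustration, not the PR's actual code: the helper name `apply_diffusion_defaults` and the env var `DEFAULT_NUM_INFERENCE_STEPS` are hypothetical, and the `<= 4` threshold is taken from the commit message further down ("…cale defaults to 0 when num steps <=4"). It also assumes the threshold applies to the effective step count, whether it comes from the user or from the default.

```python
import os
from typing import Any, Dict, Mapping, Optional


def apply_diffusion_defaults(
    kwargs: Dict[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return a copy of kwargs with deployment-level defaults filled in.

    Values supplied by the caller always win over defaults.
    """
    env = os.environ if env is None else env
    out = dict(kwargs)

    # Hypothetical env var name: default number of denoising steps.
    default_num_steps = env.get("DEFAULT_NUM_INFERENCE_STEPS")
    if "num_inference_steps" not in out and default_num_steps:
        out["num_inference_steps"] = int(default_num_steps)

    # Assumption: with very few steps (<= 4), guidance is switched off
    # unless the caller explicitly asked for a guidance_scale.
    if "guidance_scale" not in out:
        steps = out.get("num_inference_steps")
        if steps is not None and int(steps) <= 4:
            out["guidance_scale"] = 0.0

    return out
```

With this shape, a request that sets `guidance_scale` itself is left untouched, and a deployment without the env var behaves exactly as before.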

@@ -20,3 +23,7 @@ def strtobool(val: str) -> bool:
    raise ValueError(
        f"Invalid truth value, it should be a string but {val} was provided instead."
    )


def api_inference_compat():
Contributor Author

With this env var we intend to handle the small response differences between the api-inference widgets on the Hub and those in the Endpoints UI. TODO: we should probably unify both widgets instead.
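
For readers who have not opened the diff, a helper like this might look roughly as follows. This is a sketch under assumptions: the env var name `API_INFERENCE_COMPAT` is hypothetical (the snippet above does not show the real one), and the set of accepted truthy/falsy spellings in `strtobool` is guessed rather than copied from the repository.

```python
import os


def strtobool(val: str) -> bool:
    """Parse a truthy/falsy string. The accepted spellings are an assumption."""
    if not isinstance(val, str):
        raise ValueError(
            f"Invalid truth value, it should be a string but {val} was provided instead."
        )
    lowered = val.strip().lower()
    if lowered in ("1", "true", "yes", "y", "on"):
        return True
    if lowered in ("0", "false", "no", "n", "off", ""):
        return False
    raise ValueError(f"Invalid truth value: {val!r}")


def api_inference_compat() -> bool:
    # Hypothetical env var name; compat mode is off unless explicitly enabled.
    return strtobool(os.environ.get("API_INFERENCE_COMPAT", "false"))
```

Reading the variable on each call, rather than once at import time, keeps the flag easy to toggle in tests.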

@oOraph oOraph requested review from co42 and alvarobartt May 5, 2025 07:57
    Route("/predict", predict, methods=["POST"]),
    Route("/metrics", metrics, methods=["GET"]),
]
if api_inference_compat():
Contributor Author

@oOraph oOraph May 5, 2025

I only activated multi-task for api-inference (as a test), but we may want to remove this condition and always support it if we're satisfied with it.

Contributor Author

Actually, thinking about it: we may want a separate env var (keeping this deactivated by default for regular users, and providing an option for it in Endpoints instead), because with this route the pod may consume more RAM than expected due to the pipeline duplication.
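
One possible shape for that separate opt-in flag is sketched below. Everything beyond the `/predict` and `/metrics` routes is an assumption made for illustration: the env var `MULTI_TASK_ENABLED`, the `/{task}` path and the `multi_task` handler are hypothetical, and the small `Route` dataclass is only a stand-in for the web framework's route class so the sketch runs without extra dependencies.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Route:
    """Minimal stand-in for the framework's route object."""
    path: str
    endpoint: Callable
    methods: Sequence[str]


def predict(request): ...
def metrics(request): ...
def multi_task(request): ...  # hypothetical handler for the multi-task route


def build_routes(env: Optional[Mapping[str, str]] = None) -> List[Route]:
    env = os.environ if env is None else env
    routes = [
        Route("/predict", predict, methods=["POST"]),
        Route("/metrics", metrics, methods=["GET"]),
    ]
    # Hypothetical flag, off by default: the extra route can load additional
    # pipelines and so increase the pod's RAM usage.
    flag = env.get("MULTI_TASK_ENABLED", "").strip().lower()
    if flag in ("1", "true", "yes"):
        routes.append(Route("/{task}", multi_task, methods=["POST"]))
    return routes
```

Keeping the flag independent from the compat env var means regular deployments never pay the memory cost unless they opt in.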

@oOraph oOraph force-pushed the dev/api-inference-mini-fork branch 5 times, most recently from 459f3b6 to 39db7c6 May 9, 2025 09:27
oOraph added 14 commits May 13, 2025 18:16
Signed-off-by: Raphael Glon <[email protected]>
Signed-off-by: Raphael Glon <[email protected]>
Signed-off-by: Raphael Glon <[email protected]>
…cale defaults to 0 when num steps <=4

Signed-off-by: Raphael Glon <[email protected]>
Signed-off-by: Raphael Glon <[email protected]>
More flexibility than an exact string match since there can be some additional params

Signed-off-by: Raphael Glon <[email protected]>
…ion for sentence transformers

Signed-off-by: Raphael Glon <[email protected]>
no reason not to accept it

Signed-off-by: Raphael Glon <[email protected]>
oOraph added 4 commits May 13, 2025 18:16
Signed-off-by: Raphael Glon <[email protected]>
Signed-off-by: Raphael Glon <[email protected]>
return an error instead

Signed-off-by: Raphael Glon <[email protected]>
… we do not know what to do with

Signed-off-by: Raphael Glon <[email protected]>
@oOraph oOraph force-pushed the dev/api-inference-mini-fork branch from c71a4c5 to c5565c2 May 13, 2025 16:16
1 participant