api inference mini fork #109
base: main
Conversation
Force-pushed from b93b802 to 7f17bb6
    if default_num_steps:
        kwargs["num_inference_steps"] = int(default_num_steps)

    if "guidance_scale" not in kwargs:
Useful for SD 3.5 Turbo: we want a guidance scale of 0 by default (i.e. when not specified by the user) because the number of steps is too low otherwise, so that the generated images are ok.
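A minimal sketch of how that defaulting could fit together, assuming the env var name `DEFAULT_NUM_STEPS` and the `<= 4` threshold mentioned in a commit message further down (the exact placement inside the handler is also an assumption):

```python
import os

kwargs = {}  # user-supplied pipeline arguments in the real handler

default_num_steps = os.environ.get("DEFAULT_NUM_STEPS")  # env var name assumed
if default_num_steps:
    kwargs["num_inference_steps"] = int(default_num_steps)

if "guidance_scale" not in kwargs:
    # For few-step models such as SD 3.5 Turbo, classifier-free guidance
    # degrades output quality, so default guidance_scale to 0.
    if kwargs.get("num_inference_steps", 50) <= 4:  # threshold assumed
        kwargs["guidance_scale"] = 0
```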
@@ -20,3 +23,7 @@ def strtobool(val: str) -> bool:
        raise ValueError(
            f"Invalid truth value, it should be a string but {val} was provided instead."
        )


    def api_inference_compat():
With this env var we intend to handle the small response differences between the api inference widgets on the Hub and on the Endpoints UI. TODO: we should probably unify both widgets instead.
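A sketch of what such a helper could look like, reusing the module's `strtobool` shown in the diff above (the env var name `API_INFERENCE_COMPAT` is an assumption; `strtobool` is restated here in simplified form only to make the snippet self-contained):

```python
import os


def strtobool(val: str) -> bool:
    # Simplified stand-in for the module's strtobool from the diff above.
    val = val.lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return True
    if val in ("n", "no", "f", "false", "off", "0"):
        return False
    raise ValueError(
        f"Invalid truth value, it should be a string but {val} was provided instead."
    )


def api_inference_compat() -> bool:
    # Toggles api-inference style responses; env var name assumed.
    return strtobool(os.environ.get("API_INFERENCE_COMPAT", "false"))
```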
Route("/predict", predict, methods=["POST"]), | ||
Route("/metrics", metrics, methods=["GET"]), | ||
] | ||
if api_inference_compat(): |
I only activated multi-task for api inference (as a test), but we may want to remove this condition and just always support it if we're satisfied with it.
Actually, thinking about it: we may want a separate env var instead (kept deactivated by default for regular users, with an option for it in Endpoints), because with this route the pod may consume more RAM than expected due to the pipeline duplications.
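Under that proposal, the route registration might look like the following sketch. The `MULTI_TASK` env var name, the `/pipeline/{task}` path, and the `multi_task` handler are hypothetical; `predict`, `metrics`, and `strtobool` are assumed to come from the surrounding module:

```python
import os

from starlette.routing import Route

routes = [
    Route("/predict", predict, methods=["POST"]),
    Route("/metrics", metrics, methods=["GET"]),
]
# Hypothetical: gate the multi-task route behind its own opt-in env var,
# off by default, since duplicated pipelines can raise the pod's RAM usage.
if strtobool(os.environ.get("MULTI_TASK", "false")):
    routes.append(Route("/pipeline/{task}", multi_task, methods=["POST"]))
```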
Force-pushed from 459f3b6 to 39db7c6
Force-pushed from c71a4c5 to c5565c2
Force-pushed from 67491ac to 0818705
Commits (each signed off by Raphael Glon <[email protected]>):

- …cale defaults to 0 when num steps <= 4
- More flexibility than an exact string match, since there can be some additional params
- …ard compat
- …ion for sentence transformers
- No reason not to accept it
- …tch size dim
- Return an error instead
- … we do not know what to do with
- Backported and adapted from https://github.com/huggingface/api-inference-community/blob/main/docker_images/diffusers/app/idle.py: 1. add gunicorn instead of uvicorn so that wsgi/asgi workers can easily be suppressed when idle without stopping the entire service -> an easy way to release memory without digging into the depths of the imported modules; 2. lazy-load the memory-consuming libs (transformers, diffusers, sentence_transformers); 3. lazy-load the pipeline as well. The first "cold start" request tends to be a bit slower than the others, but the footprint is reduced to the minimum when idle
- To optimize build time and enhance layer reuse
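A minimal sketch of the lazy-load idea from the idle commit above (the `MODEL_ID` env var and the `get_pipeline` helper are illustrative names, not the actual implementation, which lives in app/idle.py of the linked repo):

```python
import os

_pipeline = None


def get_pipeline():
    # Lazy-load: heavy imports and pipeline construction are deferred to the
    # first request, so idle workers keep a minimal footprint and gunicorn
    # can reap them without tearing down the whole service.
    global _pipeline
    if _pipeline is None:
        from diffusers import DiffusionPipeline  # deferred heavy import
        _pipeline = DiffusionPipeline.from_pretrained(
            os.environ["MODEL_ID"]  # model id env var assumed
        )
    return _pipeline
```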
Force-pushed from 1ea58f3 to 2eda42a