api inference mini fork #109
base: main
Conversation
oOraph commented Apr 16, 2025 (edited)
- Possibility to override some inference params (related to diffusion) so that the default inference works even when the user does not specify any of those params
- Multi-task support within one deployment (example: sentence-similarity + sentence-embeddings); see the sketch below
- api-inference compat env var
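To make the multi-task point concrete, here is a minimal sketch (hypothetical helper names, not this PR's code) of one deployment answering both sentence-embeddings and sentence-similarity requests from a single SentenceTransformer instance:

```python
# Sketch only: one model instance shared by several task handlers.
# Assumes sentence-transformers is installed; all names are illustrative.
from functools import lru_cache

from sentence_transformers import SentenceTransformer, util


@lru_cache(maxsize=None)
def get_model(model_id: str) -> SentenceTransformer:
    # Loaded once per process and reused by every task handler.
    return SentenceTransformer(model_id)


def sentence_embeddings(model_id: str, sentences: list[str]) -> list[list[float]]:
    return get_model(model_id).encode(sentences).tolist()


def sentence_similarity(model_id: str, source: str, others: list[str]) -> list[float]:
    model = get_model(model_id)
    src = model.encode(source, convert_to_tensor=True)
    cand = model.encode(others, convert_to_tensor=True)
    return util.cos_sim(src, cand)[0].tolist()
```

The idea is that both tasks share the same loaded weights, so a single pod can serve either route without loading the model twice.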
Force-pushed from b93b802 to 7f17bb6.
if default_num_steps:
    kwargs["num_inference_steps"] = int(default_num_steps)

if "guidance_scale" not in kwargs:
Useful for SD 3.5 Turbo: we want guidance_scale to default to 0 (i.e. when the user does not specify it) because the number of steps is very low, so that the generated images come out ok.
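For readers of this thread, a hedged sketch of the defaulting logic being discussed (the env var name is an assumption; the <=4 step threshold mirrors the commit message further down):

```python
import os


def apply_diffusion_defaults(kwargs: dict) -> dict:
    # Illustrative only; the env var name DEFAULT_NUM_INFERENCE_STEPS is hypothetical.
    default_num_steps = os.environ.get("DEFAULT_NUM_INFERENCE_STEPS")
    if default_num_steps and "num_inference_steps" not in kwargs:
        kwargs["num_inference_steps"] = int(default_num_steps)

    # Turbo-style checkpoints (e.g. SD 3.5 Turbo) run with very few steps,
    # where classifier-free guidance hurts quality, so default guidance_scale
    # to 0 unless the caller explicitly set it.
    if "guidance_scale" not in kwargs and kwargs.get("num_inference_steps", 50) <= 4:
        kwargs["guidance_scale"] = 0.0
    return kwargs
```

User-supplied values always win; the environment defaults only fill in the gaps.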
@@ -20,3 +23,7 @@ def strtobool(val: str) -> bool:
    raise ValueError(
        f"Invalid truth value, it should be a string but {val} was provided instead."
    )


def api_inference_compat():
With this env var we intend to handle the small response differences between the api-inference widgets on the Hub and on the Endpoints UI. TODO: we should probably unify both widgets instead.
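A plausible body for this helper, reusing the strtobool defined just above (the exact env var name is an assumption):

```python
import os


def api_inference_compat() -> bool:
    # Sketch: strtobool is the helper defined above in this module;
    # the env var name API_INFERENCE_COMPAT is assumed, not confirmed.
    return strtobool(os.environ.get("API_INFERENCE_COMPAT", "false"))
```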
    Route("/predict", predict, methods=["POST"]),
    Route("/metrics", metrics, methods=["GET"]),
]
if api_inference_compat():
I only activated multi-task for api-inference (as a test), but we may want to remove this condition and always support it if we're satisfied with it.
Actually, thinking about it: we may want a separate env var (kept deactivated by default for regular users, with an option for it in Endpoints instead), because the pod may consume more RAM than expected with this route due to the pipeline duplication.
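A sketch of what that separate opt-in flag could look like (the flag name MULTI_TASK_ROUTES and the /pipeline/{task} path are hypothetical), keeping the extra routes off by default so regular deployments don't pay the pipeline-duplication RAM cost:

```python
import os

from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.routing import Route


def multi_task_enabled() -> bool:
    # Dedicated opt-in flag, disabled by default; Endpoints could expose it
    # as a deployment option instead of tying it to api-inference compat.
    return os.environ.get("MULTI_TASK_ROUTES", "0").lower() in ("1", "true", "yes")


async def predict(request: Request) -> JSONResponse:
    # Placeholder handler standing in for the toolkit's real predict endpoint.
    return JSONResponse({"ok": True})


routes = [Route("/predict", predict, methods=["POST"])]
if multi_task_enabled():
    # One extra route per supported task on the same deployment.
    routes.append(Route("/pipeline/{task}", predict, methods=["POST"]))
```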
Force-pushed from 459f3b6 to 39db7c6.
Commits (all Signed-off-by: Raphael Glon <[email protected]>; titles truncated in places):
- …cale defaults to 0 when num steps <=4
- More flexibility than an exact string match since there can be some additional params
- …ard compat
- …ion for sentence transformers
- no reason not to accept it
- …tch size dim
- return an error instead
- … we do not know what to do with
Force-pushed from c71a4c5 to c5565c2.