feat: introduce APIs for retrieving chat completion requests #2145
Conversation
Sorry for my ignorance, but how is this supposed to work in a multi-user environment to return only user-specific completions and not every log for all users? Or is the assumption that only one user per llama-stack server is supported? (Sorry if this is a stupid question, still making my way through lls.)
I think this is the key argument and it makes sense. There are a few things to discuss here:
An alternate design could be to double down on Telemetry itself and say (in the spec, somehow) what each API call must do as Telemetry side-effects, with standardized attributes. Then, you can lean on the more general Telemetry APIs for querying.
no stupid questions :) We'll follow up with attribute-based auth support, a la what was done in the agents API: #1737
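A rough sketch of how attribute-based scoping could work, loosely following the agents-API approach referenced above; `User`, `access_attributes`, and the matching rule here are illustrative assumptions, not the actual llama-stack implementation:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    principal: str
    # e.g. {"roles": ["admin"], "teams": ["ml-infra"]}
    attributes: dict[str, list[str]] = field(default_factory=dict)


def is_accessible(row_attributes: dict[str, list[str]], user: User) -> bool:
    # A stored completion is visible when the user shares at least one value
    # for every attribute recorded on the row.
    return all(
        set(values) & set(user.attributes.get(key, []))
        for key, values in row_attributes.items()
    )


def filter_completions(rows: list[dict], user: User) -> list[dict]:
    # Rows with no recorded attributes stay visible to everyone (all() over an
    # empty dict is True); everything else is filtered per user.
    return [r for r in rows if is_accessible(r.get("access_attributes", {}), user)]
```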
I used … We could have it under the corresponding API's namespace, but I guess the path wouldn't be RESTful given the above? I.e., for inference, we could have GET /inference/openai_chat_completions for 'list openai chat completions'? Maybe this is fine, as it's not that different from …
Yes, agreed.
@ehhuang yeah I don't think we should use a new namespace (…). Relatedly, assuming we standardize on the OpenAI inference APIs, do we need …
Ok, I can move it to /inference and get rid of the openai_ prefix, assuming the stack's chat_completions will soon be deprecated.
Thanks for getting back to me :-). I understand the authentication part. What I still don't get is how a user gets back only her own chat completions, and not the chat completions of every user. Looking at …

The difference between telemetry and conversational logs, IMO, is that telemetry is a global concern, with access at an admin level, whereas conversational logs target the end-user persona and need to be specific to each user. For that (and other reasons), I wouldn't put user-specific logs under the telemetry category. Global logs, though, are definitely observability features (like traces and metrics).
In general, the name “log” makes me think of server logs, but that might just be my perspective. Because of that, this feels more like an admin-level API. While I fully understand the UI use case, could we also explore other scenarios that would justify making this API “public” rather than keeping it as an internal-only interface?
I get that using SQLite to store logs is a simple starting point, but in a real-world scenario, this new API would be a strong candidate for Redis or MongoDB, as the data will likely grow quickly.
Also, I thought we had agreed to have one PR for the API design and a separate PR for the implementation. :)
Lastly, we still need tests! 😄 Thanks!
class LogConfig(BaseModel):
    log_db_path: str = Field(
Suggested change:
-    log_db_path: str = Field(
+    log_db_path: Path = Field(
Use Path instead of str?
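For what it's worth, Pydantic v2 coerces incoming strings for a `Path`-typed field, so existing callers passing plain strings keep working; a quick sketch (the field description and example path are made up):

```python
from pathlib import Path

from pydantic import BaseModel, Field


class LogConfig(BaseModel):
    log_db_path: Path = Field(description="Location of the SQLite database file")


cfg = LogConfig(log_db_path="~/.llama/chat_completions.db")  # str is coerced
print(type(cfg.log_db_path).__name__)   # PosixPath (on POSIX systems)
print(cfg.log_db_path.expanduser())     # Path methods now available directly
```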
llama_stack/apis/log/log.py (outdated)
    async def store_chat_completion(self, chat_completion: ChatCompletion) -> None: ...

    @webmethod(route="/log/openai_chat_completions", method="GET")
    async def list_chat_completions(self) -> ListChatCompletionsResponse:
I believe this should return a PaginatedResponse.
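To make the suggestion concrete, a sketch of what a cursor-paginated signature could look like; the `after`/`limit` parameter names and the response fields are assumptions modeled on common list APIs, not the PR's actual types:

```python
from typing import Protocol

from pydantic import BaseModel


class ListChatCompletionsResponse(BaseModel):
    data: list[dict]            # one page of stored chat completions
    has_more: bool              # whether another page exists
    last_id: str | None = None  # cursor to pass back as `after`


class ChatCompletionLog(Protocol):
    async def list_chat_completions(
        self,
        after: str | None = None,  # resume after this completion id
        limit: int = 20,           # page size
    ) -> ListChatCompletionsResponse: ...
```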
def convert_response_to_chat_completion(response: ChatCompletionResponse) -> ChatCompletion:
    from rich.pretty import pprint

    pprint(response)
leftover?
        conn.commit()
        cursor.close()

    async def list_chat_completions(self) -> ListChatCompletionsResponse:
We should paginate the output
+100
        conn.row_factory = sqlite3.Row
        cursor = conn.cursor()

        cursor.execute("SELECT * FROM chat_completions")
Use a range with pagination?
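One way to do that, sketched against the table from the snippet above (column names are assumed); keyset pagination stays efficient as the table grows, unlike a plain OFFSET:

```python
import sqlite3


def list_page(conn: sqlite3.Connection, after: str | None, limit: int) -> list[sqlite3.Row]:
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    if after is None:
        cursor.execute("SELECT * FROM chat_completions ORDER BY id LIMIT ?", (limit,))
    else:
        # Resume strictly after the cursor id rather than counting rows.
        cursor.execute(
            "SELECT * FROM chat_completions WHERE id > ? ORDER BY id LIMIT ?",
            (after, limit),
        )
    return cursor.fetchall()
```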
Not sure if a particular user should be able to see chat_completions coming from all users, except when LLS is operated in a single-user setup (but see my comments above, too).
Thank you all for the feedback so far. As discussed, this will not be a separate log API.
Force-pushed from 24b46cc to 966fc32.
Force-pushed from 96194a1 to dfdd9ef.
I've updated the PR to include only the API definition, and updated the description to reflect that. The API implementation will be in a follow-up PR.
@@ -759,7 +759,7 @@ def _build_operation(self, op: EndpointOperation) -> Operation:
         )
         return Operation(
-            tags=[op.defining_class.__name__],
+            tags=[op.defining_class.__name__ if op.defining_class.__name__ != "InferenceProvider" else "Inference"],
is there a way to make this general somehow?
could we stick some attribute in the protocol perhaps?
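Something like this, perhaps; it's the pattern that ultimately landed in the diffs below (a class attribute plus a getattr fallback in the generator):

```python
class InferenceProvider:
    """Protocol implemented by inference providers."""

    # The OpenAPI generator groups this class's endpoints under this tag.
    API_NAMESPACE: str = "Inference"


class Telemetry:
    """No override: the generator falls back to the class name."""


def tag_for(cls: type) -> str:
    return getattr(cls, "API_NAMESPACE", cls.__name__)


assert tag_for(InferenceProvider) == "Inference"
assert tag_for(Telemetry) == "Telemetry"
```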
Just a note as we think about OpenAI-API compatibility for inference: we'll need the corresponding API endpoints, along with the parameters and such from the API spec (for example https://platform.openai.com/docs/api-reference/chat/list), if we're aiming for 100% compatibility. If we're following that API spec, then it also makes decisions for us on things like pagination and sort keys. This isn't to say we have to follow it exactly, as these other methods are not as widely used in the wild, from what I've seen, compared to the endpoints that create chat completions and responses. It may be worth at least considering, since we want overlapping functionality.
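For reference, a hedged sketch of calling the list endpoint from the linked spec; the `after`/`limit`/`order` query parameters are taken from that reference page and should be double-checked against it:

```python
import httpx


async def list_openai_chat_completions(api_key: str, after: str | None = None) -> dict:
    # GET /v1/chat/completions lists stored completions with cursor pagination.
    params: dict[str, str | int] = {"limit": 20, "order": "desc"}
    if after is not None:
        params["after"] = after
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            params=params,
        )
        resp.raise_for_status()
        # Shape: {"object": "list", "data": [...], "has_more": ..., "last_id": ...}
        return resp.json()
```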
This is indeed a discussion point. Note the APIs here are not under /v1/openai/, and hence they're not compatible. I opted to have it not under openai since 1) OpenAI's list chat completions endpoint doesn't return input messages, and 2) it has fields that aren't immediately applicable to us (e.g. system_fingerprint, service_tier, etc.). An alternative is to make this OpenAI-compliant and add another field (input_messages). Thoughts? cc @ashwinb
Yes, let's just go for OpenAI compatibility at this point for the inference-related APIs. If they can make it work, we can make it work too. I had missed that angle when I looked at it first.
Force-pushed from c2ab26c to 6093af5.
I updated the PR to make it fully OAI-compliant and under openai/v1/. The returned data has an additional input_messages field.
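To illustrate (with made-up values), the stored object is OpenAI-shaped plus the one extra field:

```python
# An otherwise OpenAI-shaped chat completion object, with the request's
# input messages stored alongside it so a UI can render the full exchange.
completion_with_input = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "llama-3-8b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there!"},
            "finish_reason": "stop",
        }
    ],
    "input_messages": [{"role": "user", "content": "Hello"}],  # the addition
}
```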
@@ -759,7 +759,7 @@ def _build_operation(self, op: EndpointOperation) -> Operation:
         )
         return Operation(
-            tags=[op.defining_class.__name__],
+            tags=[getattr(op.defining_class, "API_NAMESPACE", op.defining_class.__name__)],
❤️
""" | ||
|
||
API_NAMESPACE: str = "Inference" |
yeah this is perfect
What does this PR do?
This PR introduces APIs to retrieve past chat completion requests, which will be used in the LS UI.
Our current Telemetry is ill-suited for this purpose, as it's untyped, so we'd need to filter by obscure attribute names, making it brittle.

Since these APIs are 'provided by stack' and don't need to be implemented by inference providers, we introduce a new InferenceProvider class, containing the existing inference protocol, which is implemented by inference providers.
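A rough sketch of that split, with simplified signatures (the method names here, other than the two class names, are assumptions):

```python
from typing import Protocol


class InferenceProvider(Protocol):
    """What each inference provider must implement."""

    async def openai_chat_completion(self, params: dict) -> dict: ...


class Inference(InferenceProvider, Protocol):
    """The full stack-facing API: everything providers implement, plus the
    retrieval endpoints, which the stack itself provides."""

    async def list_chat_completions(
        self, after: str | None = None, limit: int = 20
    ) -> dict: ...

    async def get_chat_completion(self, completion_id: str) -> dict: ...
```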
The APIs are OpenAI-compliant, with an additional input_messages field.

Test Plan
This PR just adds the APIs and marks them provided_by_stack. Start stack server -> doesn't crash.