About a standard request API for the package #111
Replies: 4 comments 5 replies
-
So on initial (cursory) review, the biggest bottleneck might be the prompt style, which is different from just the API request format. Let me give you an example. I am now training an R-tuned version of starchat (https://huggingface.co/HuggingFaceH4/starchat-beta), which itself would be a good default Hugging Face model to suggest to users, by the way. The prompt needs to be formatted as follows: `prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"`, and this is just one of many prompt formats that models behind the HF API might expect; a competing code model that is smaller and faster (replit: https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B) has an entirely different prompt format. So we need a way for the user to specify the prompt format, or ideally for devs to tell the package what prompt format to use. We want to prevent having to hardcode all this per model within a specific API... A sketch of what that could look like is below.
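For illustration only, here is one way the package could let devs register a prompt format per model instead of hardcoding it. The `format_prompt()` helper and the replit template are hypothetical, not anything the package currently provides:

```r
# Hypothetical registry of per-model prompt templates.
# The replit template below is illustrative; check the model card
# for the actual expected format.
prompt_formats <- list(
  "HuggingFaceH4/starchat-beta" =
    "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>",
  "teknium/Replit-v2-CodeInstruct-3B" =
    "### Instruction:\n{query}\n### Response:"
)

# Fill the {query} placeholder with glue(); fall back to the raw
# query when no template is registered for the model.
format_prompt <- function(model, query,
                          template = prompt_formats[[model]]) {
  if (is.null(template)) template <- "{query}"
  glue::glue(template, query = query, .trim = FALSE)
}

format_prompt("HuggingFaceH4/starchat-beta", "Write a hello world function in R")
```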
-
Thank you for your well-thought-out proposal! I appreciate the effort you've put into detailing and structuring your ideas. Here are a few things that might be worth considering:
-
This may deserve its own discussion thread, but what about local models? I don't know how to do that yet without taking on a
-
I have a (very much incomplete) start on the S3 structure over here: #109

Building on what you already have @calderonsamuel, I propose to use these class names, based on recommendations in the S3 chapter of Advanced R:

Perform Request:
Process Response:

Overall, I like where we're headed.
-
All right, it took a bit longer than I expected, but here it is. The flowcharts were made with mermaid; you can expand them to full width with the `<->` button in the top right section of the image. Keep in mind that this intends to start a discussion, so no function or class name is definitive. Also, for some reason GitHub added indentation in some code chunks, so you might notice that.

This is a high-level representation of how I think we should handle the process. It should cover every API that exists now, for every LLM task that uses a user text prompt. Keep in mind that this doesn't include the internal logic of the chat app, only how we get to the response.
```mermaid
flowchart LR
    classDef Starter fill:#fcc
    classDef Process fill:#fe9
    classDef Input fill:#def
    classDef Decision fill:#fc9
    classDef SubG fill:#fe94

    process_initialize([Start]):::Starter --> process_build("Build \nrequest \nskeleton"):::Process --> process_perform("Perform \nrequest"):::Process --> process_standarize("Standardize \nresponse"):::Process --> process_finalize([End]):::Starter
```

We can expand a little to include the expected output per step. See that the output of the whole process is a standard response (standard in the sense that we have a consistent structure). Once we get a standard response, we can assume that we can have a standard process for handling the response inside the chat app or inside any other service/addin.
```mermaid
flowchart LR
    classDef Starter fill:#fcc
    classDef Process fill:#fe9
    classDef Input fill:#def
    classDef Decision fill:#fc9
    classDef SubG fill:#fe94

    process_initialize([Start]):::Starter --> process_build("Build \nrequest \nskeleton"):::Process --> input_skeleton[/"standard \nrequest \nskeleton"/]:::Input --> process_perform("Perform \nrequest"):::Process --> input_response[/"raw \nresponse"/]:::Input --> process_standarize("Standardize \nresponse"):::Process --> input_standard[/"standard \nresponse"/]:::Input --> process_finalize([End]):::Starter
```

At this point, we could start considering that standardization could be enforced through an S3 class. This means that there should be a standard (and hopefully obvious) way of handling any necessary input for the request. For that, we need to establish a common list of possible inputs.
If we zoom in on every process, we get the following:
```mermaid
%%{ init: { 'flowchart': { 'curve': 'stepAfter' } } }%%
flowchart LR
    classDef Starter fill:#fcc
    classDef Process fill:#fe9
    classDef Input fill:#def
    classDef Decision fill:#fc9
    classDef SubG fill:#fe94

    subgraph sub_build ["Build request skeleton"]
        direction TB
        input_url[/base url/]
        input_model[/model/]
        input_prompt[/prompt/]
        input_api_key[/API KEY/]
        input_history[/"message history"/]
        input_extra[/"extra parameters"/]
        input_stream[/"stream"/]
        process_url("set url")
        process_auth("set \nauthentication")
        process_model("specify \nmodel")
        process_prompt("specify \nprompt")
        process_history("provide \nhistory")
        process_stream("set \nstream \noption")
        process_extra("add \nextra \nparams")
        process_url:::Process --> process_auth:::Process --> process_model:::Process --> process_prompt:::Process --> process_history:::Process --> process_stream:::Process --> process_extra:::Process
        input_url:::Input --> process_url
        input_api_key:::Input --> process_auth
        input_model:::Input --> process_model
        input_prompt:::Input --> process_prompt
        input_history:::Input --> process_history
        input_stream:::Input --> process_stream
        input_extra:::Input --> process_extra
    end

    subgraph sub_perform ["Perform request"]
        direction TB
        process_translate("translate request"):::Process
        process_callback("set callback"):::Process
        process_send_request("send request"):::Process
        input_callback[/callback/]:::Input
        dec_stream{"is stream?"}:::Decision
        process_translate --> dec_stream
        dec_stream -->|yes| process_callback --> process_send_request
        input_callback --> process_callback
        dec_stream -->|no| process_send_request
    end

    subgraph sub_standarize ["Standardize response"]
        direction TB
        process_resp_process("process response"):::Process
        process_resp_standarize("standardize"):::Process
        process_resp_process --> process_resp_standarize
    end

    process_initialize([Start])
    input_request_return[/"standard \nrequest \nskeleton"/]
    input_response[/"raw \nresponse"/]:::Input
    input_standard_response[/"standard \nresponse"/]:::Input
    process_finalize([End])

    process_initialize:::Starter --> sub_build:::SubG --> input_request_return:::Input --> sub_perform:::SubG --> input_response --> sub_standarize:::SubG --> input_standard_response --> process_finalize:::Starter
```

### Building the request
For building the request skeleton we need:

- a base URL
- an API key
- a model
- a prompt
- the message history
- a stream option
- any extra parameters
If we translate this to an R list, it would look like this for an OpenAI request:
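A sketch of that list; the field names and values here are illustrative, not definitive:

```r
# Standard request skeleton as a plain R list (illustrative values)
skeleton_openai <- list(
  url = "https://api.openai.com/v1/chat/completions",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  model = "gpt-3.5-turbo",
  prompt = "Write a hello world function in R",
  history = list(
    list(role = "system", content = "You are a helpful R coding assistant.")
  ),
  stream = FALSE,
  extra = list(temperature = 0.7)
)
```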
For a HuggingFace inference API request it would have the same structure:
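Again a sketch with assumed values; only the values and the accepted `extra` parameters change:

```r
# Same field names, different API (illustrative values)
skeleton_huggingface <- list(
  url = "https://api-inference.huggingface.co/models/HuggingFaceH4/starchat-beta",
  api_key = Sys.getenv("HF_API_KEY"),
  model = "HuggingFaceH4/starchat-beta",
  prompt = "Write a hello world function in R",
  history = list(),
  stream = FALSE,
  extra = list(max_new_tokens = 256)
)
```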
As you can see, the required parameters for the standard request are things that we can assume any LLM API would require. The `extra` parameter is open to any specific parameters the API accepts. An S3 class that wraps this would basically just be a constructor for the structure and names. For any specific API we then only need to create a subclass that checks the types of the `extra` parameters; it could also define sensible defaults for its skeleton. A sketch of such a constructor follows.
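Something like the following, where every name is provisional and the defaults are assumptions for the sake of the sketch:

```r
# Base constructor following the flowchart inputs (provisional names)
new_llm_request_skeleton <- function(url,
                                     api_key,
                                     model,
                                     prompt,
                                     history = list(),
                                     stream = FALSE,
                                     extra = list(),
                                     class = character()) {
  stopifnot(
    is.character(url), is.character(api_key), is.character(model),
    is.character(prompt), is.list(history), is.logical(stream), is.list(extra)
  )
  structure(
    list(
      url = url, api_key = api_key, model = model, prompt = prompt,
      history = history, stream = stream, extra = extra
    ),
    class = c(class, "llm_request_skeleton")
  )
}

# A subclass only adds `extra` type checks and sensible defaults
llm_skeleton_openai <- function(prompt,
                                history = list(),
                                stream = FALSE,
                                extra = list(temperature = 1)) {
  if (!is.null(extra$temperature)) stopifnot(is.numeric(extra$temperature))
  new_llm_request_skeleton(
    url = "https://api.openai.com/v1/chat/completions",
    api_key = Sys.getenv("OPENAI_API_KEY"),
    model = "gpt-3.5-turbo",
    prompt = prompt,
    history = history,
    stream = stream,
    extra = extra,
    class = "llm_skeleton_openai"
  )
}
```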
### Performing the request

Here is where the heavier work happens. Given a standard request skeleton, we need to:

1. Translate the skeleton into the request format that the specific API expects.
2. Perform the request, attaching a streaming callback when `stream = TRUE`.
For now, we need to do both in a single function (because we use `{curl}` for streaming and `{httr2}` for non-streaming requests, we can't just pipe the perform process). With this in mind, we could define a generic `llm_request_perform()`. Then, for the OpenAI API we could perform the request as in the following code chunk. When the request requires streaming and the `stream_handler` is not provided, it fails. We return both the skeleton and the response, to be able to standardize the response later.
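Something along these lines. The `{curl}` and `{httr2}` calls follow their documented interfaces, but the exact body translation and the `llm_response_openai` class tag are assumptions for the sake of the sketch:

```r
llm_request_perform <- function(skeleton, stream_handler = NULL) {
  UseMethod("llm_request_perform")
}

llm_request_perform.llm_skeleton_openai <- function(skeleton,
                                                    stream_handler = NULL) {
  # Translate the standard skeleton into the OpenAI chat format
  body <- c(
    list(
      model = skeleton$model,
      messages = c(
        skeleton$history,
        list(list(role = "user", content = skeleton$prompt))
      ),
      stream = skeleton$stream
    ),
    skeleton$extra
  )

  if (skeleton$stream) {
    if (is.null(stream_handler)) {
      stop("`stream_handler` is required when `stream = TRUE`")
    }
    # Streaming path: {curl} with a callback per chunk
    handle <- curl::new_handle()
    curl::handle_setheaders(
      handle,
      "Content-Type" = "application/json",
      "Authorization" = paste("Bearer", skeleton$api_key)
    )
    curl::handle_setopt(
      handle,
      postfields = as.character(jsonlite::toJSON(body, auto_unbox = TRUE))
    )
    response <- curl::curl_fetch_stream(
      skeleton$url, fun = stream_handler, handle = handle
    )
  } else {
    # Non-streaming path: {httr2}
    response <- httr2::request(skeleton$url) |>
      httr2::req_auth_bearer_token(skeleton$api_key) |>
      httr2::req_body_json(body) |>
      httr2::req_perform() |>
      httr2::resp_body_json()
  }

  # Return both pieces, tagged with a hypothetical response class,
  # so the response can be standardized later
  structure(
    list(skeleton = skeleton, response = response),
    class = "llm_response_openai"
  )
}
```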
It is worth noting that the stream handler would probably have to be specifically tailored to a subclass of `llm_request_skeleton` (e.g. `llm_skeleton_openai`). Also keep in mind that the return value of `llm_request_perform()` will have its `response` element in a non-standard format. I haven't written the necessary code yet, but I think it would be worth creating an S3 class for these responses too.
### Standardizing the response
Now we just need to grab the response and standardize it. That will mostly mean giving it the same structure as a standard request skeleton, so we can reuse it for new requests or for any service. Of course, it would be better to have an S3 generic to standardize the response (or to just get the last response). A minimal sketch of that generic is below.
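Assuming `llm_request_perform()` tags its return value with a hypothetical `llm_response_openai` class as sketched above, this could look like:

```r
# Hypothetical generic; all names are open to discussion
llm_response_standardize <- function(performed) {
  UseMethod("llm_response_standardize")
}

llm_response_standardize.llm_response_openai <- function(performed) {
  skeleton <- performed$skeleton
  answer <- performed$response$choices[[1]]$message$content

  # Fold the exchange into the history so the result has the same
  # structure as a request skeleton and can be reused directly;
  # a new prompt would simply replace skeleton$prompt
  skeleton$history <- c(
    skeleton$history,
    list(
      list(role = "user", content = skeleton$prompt),
      list(role = "assistant", content = answer)
    )
  )
  skeleton
}
```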
Of course, all of this is just a high-level overview, open to discussion and potentially buggy. I would love to read your feedback.