About a standard request API for the package #111
Replies: 4 comments 5 replies
-
So on initial (cursory) review, the biggest bottleneck might be the prompt style, which is different from just the API request format. Let me give you an example. I am now training an R-tuned version of starchat (https://huggingface.co/HuggingFaceH4/starchat-beta), which itself would be a good default Hugging Face model to suggest to users, by the way. The prompt needs to be formatted as follows: `prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"`, and this is just one of many prompt formats that models behind the HF API might expect; a competing code model that is smaller and faster (replit: https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B) has an entirely different prompt format. So we need a way for the user to specify the prompt format, or ideally for devs to tell the package what prompt format to use. We want to prevent having to hardcode all this per model within a specific API... A sketch of what that could look like is below.
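For illustration only, here is one way the package could let devs register a prompt format per model instead of hardcoding it. The `format_prompt()` helper and the replit template are hypothetical, not anything the package currently provides:

```r
# Hypothetical registry of per-model prompt templates.
# The replit template below is illustrative; check the model card
# for the actual expected format.
prompt_formats <- list(
  "HuggingFaceH4/starchat-beta" =
    "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>",
  "teknium/Replit-v2-CodeInstruct-3B" =
    "### Instruction:\n{query}\n### Response:"
)

# Fill the {query} placeholder with glue(); fall back to the raw
# query when no template is registered for the model.
format_prompt <- function(model, query,
                          template = prompt_formats[[model]]) {
  if (is.null(template)) template <- "{query}"
  glue::glue(template, query = query, .trim = FALSE)
}

format_prompt("HuggingFaceH4/starchat-beta", "Write a hello world function in R")
```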
-
Thank you for your well-thought-out proposal! I appreciate the effort you've put into detailing and structuring your ideas. Here are a few things that might be worth considering:
-
This may deserve its own discussion thread, but what about local models? I don't know how to do that yet without taking on a
-
I have a (very much incomplete) start on the S3 structure over here: #109

Building on what you already have @calderonsamuel, I propose to use these class names, based on recommendations in the S3 chapter of Advanced R:

Perform Request:
Process Response:

Overall, I like where we're headed.
-
All right, it took a bit longer than I expected, but here it is. The flowcharts were made with mermaid; you can expand them to full width with the `<->` button in the top right section of the image. Keep in mind that this intends to start a discussion, so no function or class name is definitive. Also, for some reason GitHub added indentation in some code chunks, so you might notice that.

This is a high-level representation of how I think we should handle the process. It should cover every API that exists now, for every LLM task that uses a user text prompt. Keep in mind that this doesn't include the internal logic of the chat app, only how we get to the response.
```mermaid
flowchart LR
    classDef Starter fill:#fcc
    classDef Process fill:#fe9
    classDef Input fill:#def
    classDef Decision fill:#fc9
    classDef SubG fill:#fe94

    process_initialize([Start]):::Starter --> process_build("Build \nrequest \nskeleton"):::Process --> process_perform("Perform \nrequest"):::Process --> process_standarize("Standardize \nresponse"):::Process --> process_finalize([End]):::Starter
```

We can expand a little to include the expected output per step. See that the output of the whole process is a standard response (standard in the sense that we have a consistent structure). Once we get a standard response, we can assume that we can have a standard process for handling the response inside the chat app or inside any other service/addin.
```mermaid
flowchart LR
    classDef Starter fill:#fcc
    classDef Process fill:#fe9
    classDef Input fill:#def
    classDef Decision fill:#fc9
    classDef SubG fill:#fe94

    process_initialize([Start]):::Starter --> process_build("Build \nrequest \nskeleton"):::Process --> input_skeleton[/"standard \nrequest \nskeleton"/]:::Input --> process_perform("Perform \nrequest"):::Process --> input_response[/"raw \nresponse"/]:::Input --> process_standarize("Standardize \nresponse"):::Process --> input_standard[/"standard \nresponse"/]:::Input --> process_finalize([End]):::Starter
```

At this point, we could start considering that standardization could be enforced through an S3 class. This means that there should be a standard (and hopefully obvious) way of handling any necessary input for the request. For that, we need to establish a common list of possible inputs.
If we zoom in on every process, we get the following:
```mermaid
%%{ init: { 'flowchart': { 'curve': 'stepAfter' } } }%%
flowchart LR
    classDef Starter fill:#fcc
    classDef Process fill:#fe9
    classDef Input fill:#def
    classDef Decision fill:#fc9
    classDef SubG fill:#fe94

    subgraph sub_build ["Build request skeleton"]
        direction TB
        input_url[/base url/]
        input_model[/model/]
        input_prompt[/prompt/]
        input_api_key[/API KEY/]
        input_history[/"message history"/]
        input_extra[/"extra parameters"/]
        input_stream[/"stream"/]
        process_url("set url")
        process_auth("set \nauthentication")
        process_model("specify \nmodel")
        process_prompt("specify \nprompt")
        process_history("provide \nhistory")
        process_stream("set \nstream \noption")
        process_extra("add \nextra \nparams")
        process_url:::Process --> process_auth:::Process --> process_model:::Process --> process_prompt:::Process --> process_history:::Process --> process_stream:::Process --> process_extra:::Process
        input_url:::Input --> process_url
        input_api_key:::Input --> process_auth
        input_model:::Input --> process_model
        input_prompt:::Input --> process_prompt
        input_history:::Input --> process_history
        input_stream:::Input --> process_stream
        input_extra:::Input --> process_extra
    end

    subgraph sub_perform ["Perform request"]
        direction TB
        process_translate("translate request"):::Process
        process_callback("set callback"):::Process
        process_send_request("send request"):::Process
        input_callback[/callback/]:::Input
        dec_stream{"is stream?"}:::Decision
        process_translate --> dec_stream
        dec_stream -->|yes| process_callback --> process_send_request
        input_callback --> process_callback
        dec_stream -->|no| process_send_request
    end

    subgraph sub_standarize ["Standardize response"]
        direction TB
        process_resp_process("process response"):::Process
        process_resp_standarize("standardize"):::Process
        process_resp_process --> process_resp_standarize
    end

    process_initialize([Start])
    input_request_return[/"standard \nrequest \nskeleton"/]
    input_response[/"raw \nresponse"/]:::Input
    input_standard_response[/"standard \nresponse"/]:::Input
    process_finalize([End])

    process_initialize:::Starter --> sub_build:::SubG --> input_request_return:::Input --> sub_perform:::SubG --> input_response --> sub_standarize:::SubG --> input_standard_response --> process_finalize:::Starter
```

### Building the request
For building the request skeleton we need:

- a base URL
- an API key
- a model
- a prompt
- the message history
- a stream option
- any extra parameters
If we translate this to an R list, it would look like this for an OpenAI request:
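A sketch of that list; the field names and values here are illustrative, not definitive:

```r
# Standard request skeleton as a plain R list (illustrative values)
skeleton_openai <- list(
  url = "https://api.openai.com/v1/chat/completions",
  api_key = Sys.getenv("OPENAI_API_KEY"),
  model = "gpt-3.5-turbo",
  prompt = "Write a hello world function in R",
  history = list(
    list(role = "system", content = "You are a helpful R coding assistant.")
  ),
  stream = FALSE,
  extra = list(temperature = 0.7)
)
```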
For a HuggingFace inference API request it would have the same structure:
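Again a sketch with assumed values; only the values and the accepted `extra` parameters change:

```r
# Same field names, different API (illustrative values)
skeleton_huggingface <- list(
  url = "https://api-inference.huggingface.co/models/HuggingFaceH4/starchat-beta",
  api_key = Sys.getenv("HF_API_KEY"),
  model = "HuggingFaceH4/starchat-beta",
  prompt = "Write a hello world function in R",
  history = list(),
  stream = FALSE,
  extra = list(max_new_tokens = 256)
)
```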
As you can see, the required parameters for the standard request are things that we can assume any LLM API would require. The `extra` parameter is open to any specific parameters the API accepts. An S3 class that wraps this would basically just be a constructor for the structure and names. For any specific API we then only need to create a subclass that checks the types of the `extra` parameters; it could also define sensible defaults for its skeleton. A sketch of such a constructor follows.
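Something like the following, where every name is provisional and the defaults are assumptions for the sake of the sketch:

```r
# Base constructor following the flowchart inputs (provisional names)
new_llm_request_skeleton <- function(url,
                                     api_key,
                                     model,
                                     prompt,
                                     history = list(),
                                     stream = FALSE,
                                     extra = list(),
                                     class = character()) {
  stopifnot(
    is.character(url), is.character(api_key), is.character(model),
    is.character(prompt), is.list(history), is.logical(stream), is.list(extra)
  )
  structure(
    list(
      url = url, api_key = api_key, model = model, prompt = prompt,
      history = history, stream = stream, extra = extra
    ),
    class = c(class, "llm_request_skeleton")
  )
}

# A subclass only adds `extra` type checks and sensible defaults
llm_skeleton_openai <- function(prompt,
                                history = list(),
                                stream = FALSE,
                                extra = list(temperature = 1)) {
  if (!is.null(extra$temperature)) stopifnot(is.numeric(extra$temperature))
  new_llm_request_skeleton(
    url = "https://api.openai.com/v1/chat/completions",
    api_key = Sys.getenv("OPENAI_API_KEY"),
    model = "gpt-3.5-turbo",
    prompt = prompt,
    history = history,
    stream = stream,
    extra = extra,
    class = "llm_skeleton_openai"
  )
}
```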
### Performing the request

Here is where the heavier work happens. Given a standard request skeleton, we need to:

1. Translate the skeleton into the request format that the specific API expects.
2. Perform the request, attaching a streaming callback when `stream = TRUE`.
For now, we need to do both in a single function (because we use `{curl}` for streaming and `{httr2}` for non-streaming requests, we can't just pipe the perform process). With this in mind, we could define a generic `llm_request_perform()`. Then, for the OpenAI API we could perform the request as in the following code chunk. When the request requires streaming and the `stream_handler` is not provided, it fails. We return both the skeleton and the response, to be able to standardize the response later.
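Something along these lines. The `{curl}` and `{httr2}` calls follow their documented interfaces, but the exact body translation and the `llm_response_openai` class tag are assumptions for the sake of the sketch:

```r
llm_request_perform <- function(skeleton, stream_handler = NULL) {
  UseMethod("llm_request_perform")
}

llm_request_perform.llm_skeleton_openai <- function(skeleton,
                                                    stream_handler = NULL) {
  # Translate the standard skeleton into the OpenAI chat format
  body <- c(
    list(
      model = skeleton$model,
      messages = c(
        skeleton$history,
        list(list(role = "user", content = skeleton$prompt))
      ),
      stream = skeleton$stream
    ),
    skeleton$extra
  )

  if (skeleton$stream) {
    if (is.null(stream_handler)) {
      stop("`stream_handler` is required when `stream = TRUE`")
    }
    # Streaming path: {curl} with a callback per chunk
    handle <- curl::new_handle()
    curl::handle_setheaders(
      handle,
      "Content-Type" = "application/json",
      "Authorization" = paste("Bearer", skeleton$api_key)
    )
    curl::handle_setopt(
      handle,
      postfields = as.character(jsonlite::toJSON(body, auto_unbox = TRUE))
    )
    response <- curl::curl_fetch_stream(
      skeleton$url, fun = stream_handler, handle = handle
    )
  } else {
    # Non-streaming path: {httr2}
    response <- httr2::request(skeleton$url) |>
      httr2::req_auth_bearer_token(skeleton$api_key) |>
      httr2::req_body_json(body) |>
      httr2::req_perform() |>
      httr2::resp_body_json()
  }

  # Return both pieces, tagged with a hypothetical response class,
  # so the response can be standardized later
  structure(
    list(skeleton = skeleton, response = response),
    class = "llm_response_openai"
  )
}
```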
It is worth noting that the stream handler would probably have to be specifically tailored to a subclass of `llm_request_skeleton` (e.g. `llm_skeleton_openai`). Also keep in mind that the return value of `llm_request_perform()` will have its `response` element in a non-standard format. I haven't written the necessary code yet, but I think it would be worth creating an S3 class for these responses too.
### Standardizing the response
Now we just need to grab the response and standardize it. That will mostly mean giving it the same structure as a standard request skeleton, so we can reuse it for new requests or for any service. Of course, it would be better to have an S3 generic to standardize the response (or to just get the last response). A minimal sketch of that generic is below.
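Assuming `llm_request_perform()` tags its return value with a hypothetical `llm_response_openai` class as sketched above, this could look like:

```r
# Hypothetical generic; all names are open to discussion
llm_response_standardize <- function(performed) {
  UseMethod("llm_response_standardize")
}

llm_response_standardize.llm_response_openai <- function(performed) {
  skeleton <- performed$skeleton
  answer <- performed$response$choices[[1]]$message$content

  # Fold the exchange into the history so the result has the same
  # structure as a request skeleton and can be reused directly;
  # a new prompt would simply replace skeleton$prompt
  skeleton$history <- c(
    skeleton$history,
    list(
      list(role = "user", content = skeleton$prompt),
      list(role = "assistant", content = answer)
    )
  )
  skeleton
}
```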
Of course, all of this is just a high-level overview, open to discussion and potentially buggy. I would love to read your feedback.