Labels: A-ai-api, C-enhancement, E-easy, javascript
# RFC: Inference API
Coming from PR #436, the Inference API is a user-friendly interface that lets developers easily run their own models using the power of the low-level ONNX Rust backend.
It's based on two core components: `RawSession` and `RawTensor`.
**`RawSession`**: A low-level `Supabase.ai.Session` that can execute any `.onnx` model. It's recommended for use cases that need more control over the pre/post-processing steps, as well as for executing `linear regression`, `tabular classification`, and self-made models. For common tasks like `nlp`, `audio`, or `computer-vision`, huggingface/transformers.js is recommended, since it already does all the pre/post-processing.
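For comparison, such a common task in Transformers.js needs no manual session or tensor handling at all. A minimal sketch (assuming the same `Supabase/gte-small` model used in the examples below):

```typescript
import { pipeline } from "@huggingface/transformers.js";

// The pipeline wraps tokenization, inference and pooling into a single call
const extractor = await pipeline("feature-extraction", "Supabase/gte-small");

const embedding = await extractor("Hello world", {
  pooling: "mean",
  normalize: true,
});

console.log(embedding.tolist());
```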
**`RawTensor`**: A low-level data representation of the model input/output. Inference API tensors are fully compatible with Transformers.js tensors, which means developers can keep using the high-level abstractions that `transformers.js` provides, like `.sum()`, `.normalize()`, `.min()`.
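For instance, since both tensor types expose the same `type`/`data`/`dims` shape, a `RawTensor` can be rebuilt as a Transformers.js `Tensor` to access those operations. A minimal sketch (one possible way to bridge the two; the RFC implies they can also be mixed directly):

```typescript
import { Tensor } from "@huggingface/transformers.js";

const { RawTensor } = Supabase.ai;

const raw = new RawTensor("float32", [3.0, 4.0], [1, 2]);

// Rebuild as a transformers.js Tensor to use its high-level operations
const hf = new Tensor(raw.type, raw.data, raw.dims);

console.log(hf.normalize().tolist()); // [[0.6, 0.8]]
```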
## Examples

### Simple usage

Loading a `RawSession`:
```typescript
const session = await RawSession.fromHuggingFace('Supabase/gte-small');

// or using the model file URL directly:
// const session = await RawSession.fromUrl("https://example.com/model.onnx");
```
Executing a `RawSession` with `RawTensor`:
```typescript
const session = await RawSession.fromUrl("https://example.com/model.onnx");

// Prepare the input tensors
const inputs = {
  input1: new RawTensor("float32", [1.0, 2.0, 3.0], [1, 3]),
  input2: new RawTensor("float32", [4.0, 5.0, 6.0], [1, 3]),
};

const outputs = await session.run(inputs);
console.log(outputs.output1); // Output tensor
```
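Since input and output names vary per model, the session's `inputs` and `outputs` properties (see the type definitions below) can be used to discover what a loaded model expects:

```typescript
const session = await RawSession.fromUrl("https://example.com/model.onnx");

// Both lists come straight from the loaded ONNX model's graph
console.log(session.inputs);  // e.g. ["input1", "input2"]
console.log(session.outputs); // e.g. ["output1"]
```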
### Generating embeddings from scratch

This example demonstrates how the Inference API can be used in complex scenarios while taking advantage of Transformers.js's high-level functions:
```typescript
import { Tensor } from "@huggingface/transformers.js";

const { RawTensor, RawSession } = Supabase.ai;

const session = await RawSession.fromHuggingFace('Supabase/gte-small');

// Example only: in a real 'feature-extraction' pipeline, these tensors come from the tokenizer step.
// Consider 'n' as the batch size
const inputs = {
  input_ids: new RawTensor('float32', [1, 2, 3...], [n, 2]),
  attention_mask: new RawTensor('float32', [...], [n, 2]),
  // @ts-ignore: mixing Tensors from both libraries
  token_types_ids: new Tensor('float32', [...], [n, 2])
};

const { last_hidden_state } = await session.run(inputs);

// Using `transformers.js` APIs
const hfTensor = Tensor.mean_pooling(last_hidden_state, inputs.attention_mask).normalize();

return hfTensor.tolist();
```
### Self-made models

This example illustrates how users can train their own model and execute it directly from edge-runtime. Here you can check a deployable example of it, using the current Supabase stack.

The model was trained to expect the following object payload:
```json
[
  {
    "Model_Year": 2021,
    "Engine_Size": 2.9,
    "Cylinders": 6,
    "Fuel_Consumption_in_City": 13.9,
    "Fuel_Consumption_in_City_Hwy": 10.3,
    "Fuel_Consumption_comb": 12.3,
    "Smog_Level": 3
  },
  {
    "Model_Year": 2023,
    "Engine_Size": 2.4,
    "Cylinders": 4,
    "Fuel_Consumption_in_City": 9.9,
    "Fuel_Consumption_in_City_Hwy": 7.0,
    "Fuel_Consumption_comb": 8.6,
    "Smog_Level": 3
  }
]
```
Then the model inference can be done inside a common Edge Function:
```typescript
const { RawTensor, RawSession } = Supabase.ai;

// Custom filename on Hugging Face, default: 'model_quantized.onnx'
const session = await RawSession.fromHuggingFace('kallebysantos/vehicle-emission', {
  path: {
    modelFile: 'model.onnx',
  },
});

Deno.serve(async (req: Request) => {
  const carsBatchInput = await req.json();

  // Parsing objects to tensor input
  const inputTensors: Supabase.TensorMap = {};
  session.inputs.forEach((inputKey) => {
    const values = carsBatchInput.map((item) => item[inputKey]);

    // This model uses `float32` tensors, but it could vary for mixed-type models
    inputTensors[inputKey] = new RawTensor('float32', values, [values.length, 1]);
  });

  const { emissions } = await session.run(inputTensors);

  return Response.json({ result: emissions }); // [289.01, 199.53]
});
```
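Such a function is then just a plain HTTP endpoint. A hypothetical client-side call (the function URL is a placeholder):

```typescript
// Hypothetical project URL: replace with your own function endpoint
const response = await fetch("https://example.supabase.co/functions/v1/vehicle-emission", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify([
    {
      "Model_Year": 2021,
      "Engine_Size": 2.9,
      "Cylinders": 6,
      "Fuel_Consumption_in_City": 13.9,
      "Fuel_Consumption_in_City_Hwy": 10.3,
      "Fuel_Consumption_comb": 12.3,
      "Smog_Level": 3,
    },
  ]),
});

const { result } = await response.json(); // e.g. [289.01]
```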
## Type definitions

These TypeScript definitions should be added to supabase/functions-js:
````typescript
declare namespace Supabase {
  /**
   * Provides AI related APIs
   */
  export interface Ai {
    /** Provides a user-friendly interface for the low level *onnx backend API*.
     * A `RawSession` can execute any *onnx* model, but we only recommend it for `tabular` or *self-made* models, where you need more control over model execution and pre/post-processing.
     * Consider a high-level implementation like `@huggingface/transformers.js` for generic tasks like `nlp`, `computer-vision` or `audio`.
     *
     * **Example:**
     * ```typescript
     * const session = await RawSession.fromHuggingFace('Supabase/gte-small');
     * // const session = await RawSession.fromUrl("https://example.com/model.onnx");
     *
     * // Prepare the input tensors
     * const inputs = {
     *   input1: new RawTensor("float32", [1.0, 2.0, 3.0], [3]),
     *   input2: new RawTensor("float32", [4.0, 5.0, 6.0], [3]),
     * };
     *
     * // Run the model
     * const outputs = await session.run(inputs);
     *
     * console.log(outputs.output1); // Output tensor
     * ```
     */
    readonly RawSession: typeof RawSession;

    /** A low level representation of model input/output.
     * Supabase's `RawTensor` is fully compatible with `@huggingface/transformers.js`'s `Tensor`, which means you can use its high-level API to apply common operations like `sum()`, `min()`, `max()`, `normalize()`, etc.
     *
     * **Example: Generating embeddings from scratch**
     * ```typescript
     * import { Tensor } from "@huggingface/transformers.js";
     * const { RawTensor, RawSession } = Supabase.ai;
     *
     * const session = await RawSession.fromHuggingFace('Supabase/gte-small');
     *
     * // Example only: in a real 'feature-extraction' pipeline, tensors come from the tokenizer step.
     * const inputs = {
     *   input_ids: new RawTensor('float32', [...], [n, 2]),
     *   attention_mask: new RawTensor('float32', [...], [n, 2]),
     *   token_types_ids: new Tensor('float32', [...], [n, 2]) // Hugging Face tensor
     * };
     *
     * const { last_hidden_state } = await session.run(inputs);
     *
     * // Using `transformers.js` APIs
     * const hfTensor = Tensor.mean_pooling(last_hidden_state, inputs.attention_mask).normalize();
     *
     * return hfTensor.tolist();
     * ```
     */
    readonly RawTensor: typeof RawTensor;
  }

  /**
   * Provides AI related APIs
   */
  export const ai: Ai;

  export type TensorDataTypeMap = {
    float32: Float32Array | number[];
    float64: Float64Array | number[];
    string: string[];
    int8: Int8Array | number[];
    uint8: Uint8Array | number[];
    int16: Int16Array | number[];
    uint16: Uint16Array | number[];
    int32: Int32Array | number[];
    uint32: Uint32Array | number[];
    int64: BigInt64Array | number[];
    uint64: BigUint64Array | number[];
    bool: Uint8Array | number[];
  };

  export type TensorMap = { [key: string]: RawTensor<keyof TensorDataTypeMap> };

  export class RawTensor<T extends keyof TensorDataTypeMap> {
    /** Type of the tensor. */
    type: T;

    /** The data stored in the tensor. */
    data: TensorDataTypeMap[T];

    /** Dimensions of the tensor. */
    dims: number[];

    /** The total number of elements in the tensor. */
    size: number;

    constructor(type: T, data: TensorDataTypeMap[T], dims: number[]);
  }

  export class RawSession {
    /** The underlying session's ID.
     * Session IDs are unique for each loaded model, meaning that even if a session is constructed twice it will share the same ID.
     */
    id: string;

    /** A list of all input keys the model expects. */
    inputs: string[];

    /** A list of all output keys the model will return. */
    outputs: string[];

    /** Loads an ONNX model session from a source URL.
     * Sessions are loaded once, then kept warm across worker requests.
     */
    static fromUrl(source: string | URL): Promise<RawSession>;

    /** Loads an ONNX model session from a **Hugging Face** repository.
     * Sessions are loaded once, then kept warm across worker requests.
     */
    static fromHuggingFace(repoId: string, opts?: {
      /**
       * @default 'https://huggingface.co'
       */
      hostname?: string | URL;
      path?: {
        /**
         * @default '{REPO_ID}/resolve/{REVISION}/onnx/{MODEL_FILE}?download=true'
         */
        template?: string;
        /**
         * @default 'main'
         */
        revision?: string;
        /**
         * @default 'model_quantized.onnx'
         */
        modelFile?: string;
      };
    }): Promise<RawSession>;

    /** Runs the current session with the given inputs.
     * Use the `inputs` and `outputs` properties to know the required inputs and expected results for the model session.
     *
     * @param inputs The input tensors required by the model.
     * @returns The output tensors generated by the model.
     *
     * @example
     * ```typescript
     * const session = await RawSession.fromUrl("https://example.com/model.onnx");
     *
     * // Prepare the input tensors
     * const inputs = {
     *   input1: new RawTensor("float32", [1.0, 2.0, 3.0], [3]),
     *   input2: new RawTensor("float32", [4.0, 5.0, 6.0], [3]),
     * };
     *
     * // Run the model
     * const outputs = await session.run(inputs);
     *
     * console.log(outputs.output1); // Output tensor
     * ```
     */
    run(inputs: TensorMap): Promise<TensorMap>;
  }
}
````
Some ideas that could also be implemented (a hypothetical sketch of the first two follows the list):
- Add Supabase Storage integration
- Possibility to edit request headers for external authentication
- Fine control of the Session ID
- Model size constraints: check the size before downloading the model
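For illustration only, the first two ideas could look something like this (purely hypothetical API, not part of this RFC):

```typescript
// Hypothetical: load a model file straight from a Supabase Storage bucket
const session = await RawSession.fromStorage("models", "vehicle-emission/model.onnx");

// Hypothetical: custom request headers for externally authenticated model URLs
const authed = await RawSession.fromUrl("https://example.com/private-model.onnx", {
  headers: { Authorization: `Bearer ${Deno.env.get("MODEL_TOKEN")}` },
});
```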