feat: OpenAI Files API Implementation (#1982)
sivanantha321 wants to merge 26 commits into envoyproxy:main
Conversation
Codecov Report: additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #1982      +/-   ##
==========================================
+ Coverage   84.38%   84.48%   +0.10%
==========================================
  Files         130      133       +3
  Lines       17985    18510     +525
==========================================
+ Hits        15176    15638     +462
- Misses       1867     1903      +36
- Partials      942      969      +27
```
/retest
Related Documentation: 3 document(s) may need updating based on files changed in this PR (Envoy's Space: openai).
Force-pushed from 65c9775 to c26796c
```go
if f.File == nil {
	return nil, "", fmt.Errorf("file is required")
}
filename := "anonymous_file"
```
Do we want the "anonymous_file" name? Should we reject the request when the filename is not provided by the user?
This behaviour is taken from the official OpenAI SDK: https://github.com/openai/openai-go/blob/cbf83a69541f646ec84071a0040f3ca524c2238f/internal/apiform/encoder.go#L366
| } | ||
| } | ||
|
|
||
| if f.ExpiresAfter.Seconds != 0 { |
May need an upper bound (30 days) check for `expiresAfter.Seconds`, per the OpenAI spec:

> The number of seconds after the anchor time that the file will expire. Must be between 3600 (1 hour) and 2592000 (30 days).
> minimum: 3600
> maximum: 2592000
The limit may differ across providers, so I haven't added the check here.
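If the check were added, it could look like the sketch below. The 3600–2592000 range is OpenAI's documented limit; since other providers may differ, a real implementation would presumably make the bounds provider-configurable. Names are illustrative.

```go
package main

import "fmt"

// OpenAI's documented bounds for expires_after.seconds.
const (
	minExpirySeconds = 3600    // 1 hour
	maxExpirySeconds = 2592000 // 30 days
)

// validateExpiresAfter rejects out-of-range expiry values; a zero value
// means the field was not set and expiry is left to the upstream default.
func validateExpiresAfter(seconds int64) error {
	if seconds == 0 {
		return nil
	}
	if seconds < minExpirySeconds || seconds > maxExpirySeconds {
		return fmt.Errorf("expires_after.seconds must be between %d and %d, got %d",
			minExpirySeconds, maxExpirySeconds, seconds)
	}
	return nil
}

func main() {
	fmt.Println(validateExpiresAfter(86400) == nil)
	fmt.Println(validateExpiresAfter(10) != nil)
}
```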
```go
// Any of "uploaded", "processed", "error".
//
// Deprecated: deprecated
Status FileObjectStatus `json:"status"`
```
`status` is deprecated; maybe we shouldn't add it here at all?
I have added it for compatibility reasons. I can remove it if you want.
```diff
 	debugLogEnabled:    debugLogEnabled,
 	enableRedaction:    enableRedaction,
-	processorFactories: make(map[string]ProcessorFactory),
+	processorFactories: make([]*RouteProcessorMapper, 0, 20),
```
Any particular reason to set the capacity to 20?
No particular reason; it is to avoid repeated reallocations, since the default capacity is usually low.
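For illustration: appends that stay within the initial capacity reuse the same backing array, while a zero-capacity slice grows and copies several times on the way to the same length. The 20 above is just an upfront guess, not a limit.

```go
package main

import "fmt"

func main() {
	// Preallocated: the backing array is never reallocated while we
	// stay within the initial capacity.
	s := make([]int, 0, 20)
	for i := 0; i < 20; i++ {
		s = append(s, i)
	}
	fmt.Println(len(s), cap(s)) // 20 20: capacity untouched, no growth
}
```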
| } | ||
|
|
||
| // RequestBody implements [OpenAIRetrieveFileTranslator.RequestBody]. | ||
| func (o *openAIToOpenAITranslatorV1RetrieveFile) RequestBody(reqHeaders map[string]string, original []byte, _ *struct{}, forceBodyMutation bool) ( |
Do we need to decode the ID so that subsequent retrieve-file requests are routed to the right backend?
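A sketch of the decode-for-routing step the question is about, assuming the `id:<id>;model:<model>` base64url payload implied by the example IDs in this PR (names and regex here are illustrative, not the PR's code): the encoded ID is taken from the request path and the model is recovered for backend selection.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
	"strings"
)

// Matches /v1/files/{file_id} and /v1/files/{file_id}/content.
var filePathRe = regexp.MustCompile(`^/v1/files/([^/]+?)(/content)?$`)

// modelFromPath extracts the encoded file ID from the request path and
// decodes the model name embedded in it.
func modelFromPath(path string) (string, bool) {
	m := filePathRe.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	raw, err := base64.RawURLEncoding.DecodeString(strings.TrimPrefix(m[1], "file-"))
	if err != nil {
		return "", false
	}
	_, model, ok := strings.Cut(string(raw), ";model:")
	return model, ok
}

func main() {
	model, ok := modelFromPath("/v1/files/file-aWQ6ZmlsZS1hYmMxMjM7bW9kZWw6Z3B0LTRvLW1pbmk")
	fmt.Println(model, ok) // gpt-4o-mini true
}
```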
Force-pushed from f1c2db2 to aca203a
Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
- Refactor server to support method-based routing for processors using regex.
- Introduce new tracing capabilities for OpenAI file operations including retrieval and deletion.
- Implement translators for OpenAI file API endpoints: retrieve, retrieve content and delete files.

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…penAI Files API Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
- Update Translator.RequestBody signature to include request headers in internal/translator/translator.go.
- Update translator implementations to accept the new signature in all OpenAI/Anthropic/Cohere translators.
- Update translator tests to match the new signature across those files.

File upload now requires model_name:
- Enforce presence of model_name in file upload multipart parsing in internal/endpointspec/endpointspec.go.
- Encode the model into file IDs on create responses and rewrite Content-Length in internal/translator/openai_file.go.

Decode file/batch IDs from request path (header-only requests):
- Add path-based decoding for file/batch requests and set model + original/decoded ID headers in internal/extproc/processor_impl.go.
- Add new header keys in internal/internalapi/internalapi.go.
- Update extproc mocks/tests for the new headers and decoding behavior.

Use decoded ID for routing + return original ID on retrieve/delete/content:
- Route retrieve/delete/content requests using decoded file IDs and echo original IDs in responses in internal/translator/openai_file.go

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…y management Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…res_after handling Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…erences Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…corders Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
- Implemented new processor filters for file operations including creation, retrieval, and deletion.
- Added tests for file processing routes and header manipulations.
- Introduced encoding and decoding functions for file IDs with model names.
- Enhanced tracing capabilities for file-related operations.
- Added comprehensive unit tests for the new functionality in the translator package.

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…ty; add unit tests for noOpMetrics Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…upstream Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…pstream Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Updates setup sequence to install AI Gateway after Envoy Gateway CRDs but before waiting for the Envoy Gateway deployment. This ensures required cluster roles are available for successful initialization and avoids startup issues related to missing permissions. Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Standardizes the extra parameter for file uploads by renaming the field from 'model_name' to 'model' across code, error messages, and tests. Improves consistency with API conventions and clarifies usage in documentation and validation logic. Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
…stants Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Force-pushed from aca203a to 0d46a2f
Description
This PR implements the OpenAI Files API (https://platform.openai.com/docs/api-reference/files) in the Envoy AI Gateway, adding support for four file operations. The Files API is a prerequisite for the Batch Processing API (proposal #7: https://github.com/envoyproxy/ai-gateway/blob/main/docs/proposals/007-batch-processing/proposal.md), as batch jobs reference files uploaded via this API.
Implemented Endpoints
- `POST /v1/files`: `multipart/form-data` with purpose, expiration, and a `model` extra field for routing
- `GET /v1/files/{file_id}`
- `GET /v1/files/{file_id}/content`
- `DELETE /v1/files/{file_id}`

File ID Encoding & Multi-Backend Routing
A core challenge of the Files API in a gateway context is routing stickiness: once a file is uploaded to a specific backend (e.g., OpenAI, Azure), all subsequent operations on that file (retrieve, retrieve content, delete) must be routed to the same backend. Otherwise, the backend will return a "file not found" error.
To solve this without requiring the client to pass extra routing headers on every request, the gateway encodes the model/backend information directly into the file ID returned to the client. This is inspired by LiteLLM's approach [1].
Encoding Format
Key properties:
- The upstream's native prefix (`file-`, `batch_`) is preserved, so the ID looks valid to clients and SDKs

Routing Flow — Example API Usage
**Step 1: Upload a file** — Client includes `model` as an extra multipart field. The gateway routes to the backend mapped to `gpt-4o-mini` and, on the response, encodes the file ID:

```json
{
  "id": "file-aWQ6ZmlsZS1hYmMxMjM7bW9kZWw6Z3B0LTRvLW1pbmk",
  "object": "file",
  "purpose": "fine-tune",
  "filename": "training_data.jsonl"
}
```

**Step 2: Retrieve the file** — Client uses the encoded ID as-is:

```shell
curl https://gateway.example.com/v1/files/file-aWQ6ZmlsZS1hYmMxMjM7bW9kZWw6Z3B0LTRvLW1pbmk \
  -H "Authorization: Bearer $API_KEY"
```

The gateway:
- Decodes `model: gpt-4o-mini` and `original ID: file-abc123`
- Forwards `GET /v1/files/file-abc123` to the upstream

**Step 3: Retrieve file content:**

```shell
curl https://gateway.example.com/v1/files/file-aWQ6ZmlsZS1hYmMxMjM7bW9kZWw6Z3B0LTRvLW1pbmk/content \
  -H "Authorization: Bearer $API_KEY"
```

**Step 4: Delete the file:**

```shell
curl -X DELETE https://gateway.example.com/v1/files/file-aWQ6ZmlsZS1hYmMxMjM7bW9kZWw6Z3B0LTRvLW1pbmk \
  -H "Authorization: Bearer $API_KEY"
```

The same decode → route → re-encode pattern applies for all operations.
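The encode/decode pair behind this flow can be sketched as follows. The payload layout (`id:<original>;model:<model>`, base64url without padding, re-prefixed) is inferred from the example ID in this description; the PR's real `EncodeIDWithModel`/`DecodeFileID` may differ in detail.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeIDWithModel joins the upstream ID and the model name, base64url-
// encodes them without padding, and re-applies the native prefix so the
// result still looks like a normal file ID to clients.
func encodeIDWithModel(id, model, prefix string) string {
	payload := fmt.Sprintf("id:%s;model:%s", id, model)
	return prefix + base64.RawURLEncoding.EncodeToString([]byte(payload))
}

// decodeFileID reverses the encoding, recovering the original upstream ID
// and the model name used for routing.
func decodeFileID(encoded, prefix string) (id, model string, err error) {
	b, err := base64.RawURLEncoding.DecodeString(strings.TrimPrefix(encoded, prefix))
	if err != nil {
		return "", "", err
	}
	idPart, model, ok := strings.Cut(string(b), ";model:")
	if !ok || !strings.HasPrefix(idPart, "id:") {
		return "", "", fmt.Errorf("malformed encoded ID %q", encoded)
	}
	return strings.TrimPrefix(idPart, "id:"), model, nil
}

func main() {
	enc := encodeIDWithModel("file-abc123", "gpt-4o-mini", "file-")
	fmt.Println(enc) // file-aWQ6ZmlsZS1hYmMxMjM7bW9kZWw6Z3B0LTRvLW1pbmk
	id, model, _ := decodeFileID(enc, "file-")
	fmt.Println(id, model) // file-abc123 gpt-4o-mini
}
```

Note this round-trips the exact example ID shown in Step 1 above.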
Implementation Details
The encoding/decoding is implemented via two functions in `internal/translator/util.go`:

- `EncodeIDWithModel(id, modelName, idType)` — used in `ResponseBody` of the `CreateFile` translator to encode the file ID returned by the upstream
- `DecodeFileID(encodedID)` — used in the processor/server layer to extract the model name and original ID from incoming requests

Internal headers (`OriginalFileIDHeaderKey`, `DecodedFileIDHeaderKey`) carry the original and decoded file IDs through the request processing pipeline to the Retrieve, RetrieveContent, and Delete translators.

Tracing: Non-Standard OpenInference Approach
The OpenInference specification [2] does not define semantic conventions for file operations. OpenInference has well-defined span types for LLM/chat completions [3], embeddings [4], etc., but file upload/retrieve/delete are not part of the specification.
This PR implements tracing for file operations using a non-standard adaptation of the OpenInference conventions:
| Recorder | Operation | Request type | Response type | Output attributes |
|---|---|---|---|---|
| `CreateFileRecorder` | `CreateFile` | `openai.FileNewParams` | `openai.FileObject` | `output.file_id` on success |
| `RetrieveFileRecorder` | `RetrieveFile` | `struct{}` (no body) | `openai.FileObject` | `output.file_id` on success |
| `RetrieveFileContentRecorder` | `RetrieveFileContent` | `struct{}` (no body) | `struct{}` (raw bytes) | |
| `DeleteFileRecorder` | `DeleteFile` | `struct{}` (no body) | `openai.FileDeleted` | `output.file_id` on success |
openinference.span.kind = "LLM"andllm.system = "openai"— borrowing from existing OpenInference conventions even though file operations are not LLM inference calls. This is because OpenInference has no dedicated span kind for file/storage operations.trace.SpanKindInternal(consistent with other OpenInference spans in the codebase).openinference.span.kindandllm.systemare set, since file operations don't have model/prompt/token semantics.output.file_idandoutput.mime_typewhich are not part of the OpenInference spec.RetrieveFileContentrecords no output attributes at all, since the response is raw binary file data.NoopChunkRecordersince file operations are never streamed.Key Changes
API Schema Types (`internal/apischema/openai/openai.go`):
- New types: `FileNewParams`, `FileObject`, `FileDeleted`, `FilePurpose`, `FileObjectPurpose`, `FileObjectStatus`, `FileNewParamsExpiresAfter`
- `UnmarshalMultipart`/`MarshalMultipart` methods on `FileNewParams` for handling `multipart/form-data` encoding

Endpoint Specs (`internal/endpointspec/endpointspec.go`):
- New specs: `CreateFileEndpointSpec`, `RetrieveFileEndpointSpec`, `RetrieveFileContentEndpointSpec`, `DeleteFileEndpointSpec`
- `ParseBody` signature extended with a `requestHeaders map[string]string` parameter to support Content-Type parsing for multipart requests
- `CreateFileEndpointSpec.ParseBody` validates the `multipart/form-data` Content-Type and boundary, then parses the body via `UnmarshalMultipart`
- The retrieve/delete specs parse into `struct{}` since these are body-less operations (GET/DELETE)

Server Routing (`internal/extproc/server.go`):
- Regex-based path matching (`/v1/files/{file_id}`) and method differentiation (GET vs DELETE on the same path)

Processor (`internal/extproc/processor_impl.go`):
- `ProcessRequestHeaders` support for body-less requests (GET/DELETE/HEAD) so routing can be initialized without waiting for a request body
- `initRequest` extracted as a shared initialization path for both header-phase and body-phase processing

Translators (`internal/translator/openai_file.go`):
- `EncodeIDWithModel`/`DecodeFileID` in `internal/translator/util.go` for multi-backend routing via encoded file IDs

Metrics (`internal/metrics/noop_metrics.go`):
- `NoopMetrics`/`NoopMetricsFactory` for endpoints that don't yet have dedicated metrics

Tests:
- `EncodeIDWithModel`/`DecodeFileID` round-trip encoding
- New `ParseBody` signature and regex-based routing

Related Issues/PRs (if applicable)
Related: Proposal #7 — Batch Processing API Support [5]
Special notes for reviewers (if applicable)
- The `ParseBody` interface change (added `requestHeaders` parameter) touches all existing endpoint spec implementations — each gains an unused `_ map[string]string` parameter to satisfy the interface.
- Moving server routing from `map[string]ProcessorFactory` to a `[]Route` with regex + HTTP method matching is a significant architectural change that affects all existing route registrations.
- File endpoints currently use `NoopMetricsFactory` — a TODO is noted for adding dedicated metrics support.
- `model` is required as an extra multipart field during file upload — this is a gateway-specific requirement not present in the standard OpenAI Files API.

1: https://github.com/envoyproxy/ai-gateway/blob/main/docs/proposals/007-batch-processing/proposal.md#introduce-fileid-and-batchid-encoding
2: https://github.com/Arize-ai/openinference/tree/main/spec
3: https://github.com/Arize-ai/openinference/blob/main/spec/semantic_conventions.md
4: https://github.com/Arize-ai/openinference/blob/main/spec/embedding_spans.md
5: https://github.com/envoyproxy/ai-gateway/blob/main/docs/proposals/007-batch-processing/proposal.md