
Commit a589c50

Authored by jesserobbinsclaude (co-authored, per the commit trailer, by Claude Opus 4.7)
feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage (#119)
* feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage

Adds an `afm embed` subcommand that serves an OpenAI-compatible embeddings endpoint on top of Apple's NaturalLanguage `NLContextualEmbedding`, letting local-first clients use the same OpenAI interface for RAG / semantic search / clustering workflows that they already use for chat completions. Entirely on-device: no model downloads beyond what macOS already ships, and no network dependency. Refs #118.

## HTTP surface

- POST /v1/embeddings
  - Accepts a string, an array of strings, or pre-tokenized IDs
  - `encoding_format`: `float` (default) or `base64`
  - `dimensions`: optional Matryoshka-style truncation + L2 renormalize (the NL backend rejects the request when it exceeds the native dimension)
  - `X-Embedding-Truncated` response header counts inputs that exceeded the backend's max sequence length
  - Malformed JSON, missing fields, and unknown enum values return 400 with a descriptive `EmbeddingError.invalidInput` reason
  - Other 4xx AbortErrors (e.g. 415 Unsupported Media Type) pass through with their original status
  - Oversized (>1 MiB) bodies return 413 with an embeddings-specific error message, not the chat server's "conversation too long" text
- OPTIONS /v1/embeddings and OPTIONS /v1/models register CORS preflight; Access-Control-Allow-Headers is reflected from Access-Control-Request-Headers (falling back to "Content-Type, Authorization, OpenAI-Organization, OpenAI-Project"), with `Vary: Origin, Access-Control-Request-Headers` so intermediary caches don't replay preflights across clients
- GET /v1/models advertises only the loaded backend's model id, so a client can't discover an id the server can't actually serve

## Shipped model ids

- `apple-nl-contextual-en` (English)
- `apple-nl-contextual-multi` (Latin-script multilingual; NL's multilingual contextual model is Latin-only, so non-Latin scripts are out of scope for this backend)

Native dimension and max sequence length come from `NLContextualEmbedding.dimension` / `maximumSequenceLength` at load time so values track OS updates.
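The Matryoshka-style `dimensions` behavior above (keep a prefix of the vector, then L2-renormalize so it is unit length again) can be sketched language-agnostically. This is an illustrative Python version, not the repository's Swift `EmbeddingMath` implementation; `vector` and `dims` are hypothetical inputs:

```python
import math

def truncate_and_normalize(vector, dims):
    """Matryoshka-style truncation: keep the first `dims` components,
    then L2-renormalize so the shortened vector is unit length again."""
    if not (0 < dims <= len(vector)):
        # Mirrors the server's 400 for dimensions outside 1..native
        raise ValueError(f"dims must be in 1..{len(vector)}")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # degenerate all-zero input; nothing to normalize
    return [x / norm for x in head]

v = truncate_and_normalize([3.0, 4.0, 12.0], 2)  # keep 2 of 3 dims
# the kept prefix [3, 4] has norm 5, so the result is [0.6, 0.8]
```

Renormalizing after truncation matters because downstream cosine-similarity code typically assumes unit-length embeddings.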
## CLI

- `afm embed -m <id>` starts the server (default port 9998)
- `afm embed --list-models` enumerates shipped ids

## Architecture

- `EmbeddingBackend` protocol + `NLContextualEmbeddingBackend` actor owning the `NLContextualEmbedding` handle
- `EmbeddingModelRegistry` maps ids to metadata; the CLI `--list-models` path surfaces the full registry while the HTTP surface exposes only the loaded model
- Shutdown uses `DispatchSourceSignal` rather than a raw C `signal()` handler, with the shutdown flag checked on both sides of `app.server.start()` so a SIGINT delivered during bind tears the just-bound listener down cleanly
- `EmbeddingsUsage` is scoped separately from the chat `Usage` struct so the Apple-only embeddings path never pulls in MLX's `MLXMetalLibrary.ensureAvailable()` side effects (which mutate the process cwd)

## Tests

21 XCTests under `Tests/MacLocalAPITests/`:

- 18 controller tests: single/array/tokenized input, dimensions truncation + renormalization, base64 encoding, unknown model, malformed JSON, missing field, empty/whitespace input, unknown encoding_format, truncation header, unsupported-media-type preserves status + body shape, oversized payload, CORS preflight on both routes, reflected allow-headers, list-models shape
- 3 registry tests

## Deliberately out of scope

- MLX embedding backend (sentence-transformers, BGE, etc.): follow-up
- Non-Latin script coverage for the multilingual backend
- Matryoshka behavior beyond the truncation + normalize described above
- `/v1/batch/embeddings` parity

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* embeddings: stable model created, NL truncation tracking, /health version

Addresses PR #119 review feedback from sourcery-ai and codex.

NL truncation (codex): NLContextualEmbedding silently truncates inputs beyond maximumSequenceLength, but the backend never set EmbedResult.truncatedInputCount, so X-Embedding-Truncated was dead code.
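For clients consuming `encoding_format: "base64"`, the payload described above is the vector's float32 values packed little-endian and then base64-encoded (the layout the Swift side's `EmbeddingEncoding.base64LittleEndian` name implies, and the same convention OpenAI's embeddings API uses). A minimal Python sketch of the round trip, with hypothetical values:

```python
import base64
import struct

def encode_base64_le(vector):
    # Pack each value as a little-endian float32, then base64 the raw bytes.
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")

def decode_base64_le(payload):
    # Reverse: base64-decode, then unpack 4-byte little-endian float32s.
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

encoded = encode_base64_le([0.5, -1.0, 0.25])
decoded = decode_base64_le(encoded)  # [0.5, -1.0, 0.25]
```

The values here are exactly representable in float32, so the round trip is lossless; arbitrary float64 inputs would be rounded to float32 precision on the wire.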
embed() now flags an input as truncated when the returned sequenceLength hits the backend cap. This slightly over-reports inputs that land exactly at the cap, but under-reporting (the previous silent behavior) is worse for the long-document workflows this header exists for.

Stable model created (sourcery): EmbeddingModelInfo.created was Int(Date().timeIntervalSince1970), so /v1/models returned a different value on every request. Adds createdEpoch to EmbeddingModelEntry and uses the macOS 14 GA date (2023-09-26 = 1_695_686_400) for both apple-nl-contextual-{en,multi}; these are OS-shipped assets, so the OS release is the right stable anchor. listModels now uses the entry's stable value and testListModelsReturnsOnlyLoadedModel pins it.

/health version (sourcery): the embeddings server's /health endpoint was hard-coding "1.0.0". It now reports BuildInfo.fullVersion like the rest of the app.

Empty token arrays (sourcery): createEmbeddings now rejects {"input":[[]]} and {"input":[[1,2],[]]} with 400 up front, before the backend sees them, so token-count accounting never has to reason about empty inner arrays.

Vary cleanup (sourcery): applyCORSHeaders drops Origin from the Vary list now that Access-Control-Allow-Origin is the wildcard. A `*` response is already origin-agnostic, so varying on Origin is meaningless; Access-Control-Request-Headers stays because the allow-headers value is still reflected per request.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
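The stable `created` anchor chosen above is easy to double-check: 1_695_686_400 is exactly midnight UTC on 2023-09-26, the macOS 14 GA date. A one-liner Python check:

```python
from datetime import datetime, timezone

# macOS 14 GA date at midnight UTC, expressed as a Unix epoch second count.
macos14_ga = datetime(2023, 9, 26, tzinfo=timezone.utc)
created_epoch = int(macos14_ga.timestamp())
print(created_epoch)  # 1695686400
```

Pinning `created` to a release date (rather than server start time) keeps /v1/models responses byte-stable across requests and restarts, which is what the review feedback asked for.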
1 parent b7ffe18 commit a589c50

10 files changed: 1605 additions & 1 deletion

File shown below: 224 additions & 0 deletions
@@ -0,0 +1,224 @@

```swift
import Vapor
import Foundation

struct EmbeddingsController: RouteCollection {
    private static let maxRequestBodySize: ByteCount = "1mb"

    private let modelEntry: EmbeddingModelEntry
    private let backend: any EmbeddingBackend

    init(modelEntry: EmbeddingModelEntry, backend: any EmbeddingBackend) {
        self.modelEntry = modelEntry
        self.backend = backend
    }

    func boot(routes: RoutesBuilder) throws {
        let v1 = routes.grouped("v1")
        v1.on(.POST, "embeddings", body: .collect(maxSize: Self.maxRequestBodySize), use: createEmbeddings)
        v1.on(.OPTIONS, "embeddings", use: handleOptions)
        v1.get("models", use: listModels)
        v1.on(.OPTIONS, "models", use: handleOptions)
    }

    private func handleOptions(req: Request) async throws -> Response {
        let response = Response(status: .ok)
        applyCORSHeaders(to: response, for: req)
        return response
    }

    private func listModels(req: Request) async throws -> Response {
        // Advertise only the model this server actually loaded. Advertising the
        // full shipped-model list would cause 404s when clients discovered and
        // then requested an ID the running backend can't serve.
        let model = EmbeddingModelInfo(
            id: modelEntry.id,
            created: modelEntry.createdEpoch,
            ownedBy: "apple"
        )
        let response = EmbeddingModelsResponse(data: [model])
        return try jsonResponse(for: response, request: req)
    }

    private func createEmbeddings(req: Request) async throws -> Response {
        do {
            let request = try req.content.decode(EmbeddingsRequest.self)
            let requestedModelID = (request.model ?? modelEntry.id)
                .trimmingCharacters(in: .whitespacesAndNewlines)
            guard requestedModelID == modelEntry.id else {
                throw EmbeddingError.modelNotFound(requestedModelID)
            }

            if request.input.isEmpty {
                throw EmbeddingError.invalidInput("Input must not be empty")
            }

            let nativeDimension = await backend.nativeDimension

            if let dimensions = request.dimensions {
                guard dimensions > 0, dimensions <= nativeDimension else {
                    throw EmbeddingError.invalidDimensions(requested: dimensions, native: nativeDimension)
                }
            }

            let embedResult: EmbedResult
            if request.input.isTokenized {
                let tokenIDArrays = request.input.tokenIDArrays
                if tokenIDArrays.contains(where: { $0.isEmpty }) {
                    throw EmbeddingError.invalidInput("Token-id inputs must not be empty arrays")
                }
                embedResult = try await backend.embedTokenIDs(tokenIDArrays)
            } else {
                embedResult = try await backend.embed(request.input.strings)
            }
            let targetDimensions = request.dimensions ?? nativeDimension
            let encodingFormat = request.resolvedEncodingFormat

            for (index, vector) in embedResult.vectors.enumerated() where vector.count != nativeDimension {
                req.logger.error(
                    "Embeddings backend dimension drift: expected \(nativeDimension), got \(vector.count) at index \(index)"
                )
                throw EmbeddingError.internalFailure
            }

            let data = embedResult.vectors.enumerated().map { index, vector in
                let outputVector = targetDimensions < vector.count
                    ? EmbeddingMath.truncateAndNormalize(vector, dimensions: targetDimensions)
                    : EmbeddingMath.l2Normalize(vector)
                let payload: EmbeddingVectorPayload
                switch encodingFormat {
                case .float:
                    payload = .float(outputVector)
                case .base64:
                    payload = .base64(EmbeddingEncoding.base64LittleEndian(from: outputVector))
                }
                return EmbeddingDataItem(index: index, embedding: payload)
            }

            let response = EmbeddingsResponse(
                model: modelEntry.id,
                data: data,
                promptTokens: embedResult.tokenCounts.reduce(0, +)
            )
            let httpResponse = try jsonResponse(for: response, request: req)
            if embedResult.truncatedInputCount > 0 {
                httpResponse.headers.replaceOrAdd(
                    name: "X-Embedding-Truncated",
                    value: "\(embedResult.truncatedInputCount)"
                )
            }
            return httpResponse
        } catch let embeddingError as EmbeddingError {
            return try errorResponse(for: embeddingError, request: req)
        } catch let decodingError as DecodingError {
            return try errorResponse(for: .invalidInput(Self.describeDecodingError(decodingError)), request: req)
        } catch let abortError as AbortError where abortError.status.code >= 400 && abortError.status.code < 500 {
            return try abortErrorResponse(for: abortError, request: req)
        } catch {
            req.logger.error("Embeddings request failed: \(String(reflecting: error))")
            return try errorResponse(for: .internalFailure, request: req)
        }
    }

    private func jsonResponse<T: Content>(for payload: T, request: Request) throws -> Response {
        let response = Response(status: .ok)
        response.headers.add(name: .contentType, value: "application/json")
        applyCORSHeaders(to: response, for: request)
        try response.content.encode(payload)
        return response
    }

    private func errorResponse(for error: EmbeddingError, request: Request) throws -> Response {
        let response = Response(status: Self.httpStatus(for: error))
        response.headers.add(name: .contentType, value: "application/json")
        applyCORSHeaders(to: response, for: request)
        try response.content.encode(OpenAIError(message: error.localizedDescription, type: "embedding_error"))
        return response
    }

    private func abortErrorResponse(for abortError: AbortError, request: Request) throws -> Response {
        let response = Response(status: abortError.status)
        response.headers.add(name: .contentType, value: "application/json")
        applyCORSHeaders(to: response, for: request)
        try response.content.encode(OpenAIError(message: abortError.reason, type: "embedding_error"))
        return response
    }

    private static let defaultAllowHeaders = "Content-Type, Authorization, OpenAI-Organization, OpenAI-Project"

    private func applyCORSHeaders(to response: Response, for request: Request) {
        response.headers.replaceOrAdd(name: .accessControlAllowOrigin, value: "*")
        response.headers.replaceOrAdd(name: .accessControlAllowMethods, value: "POST, GET, OPTIONS")
        // Reflect the browser's preflight request-headers list (which can include
        // SDK-specific headers like x-stainless-*) and fall back to a default set
        // that covers the common OpenAI-compatible clients.
        let requested = request.headers.first(name: "Access-Control-Request-Headers")
        let allowHeaders = requested.flatMap { $0.isEmpty ? nil : $0 } ?? Self.defaultAllowHeaders
        response.headers.replaceOrAdd(name: .accessControlAllowHeaders, value: allowHeaders)
        response.headers.replaceOrAdd(name: "Access-Control-Expose-Headers", value: "X-Embedding-Truncated")
        // Intermediary caches must vary on the requested-headers list so a
        // preflight response computed for one client's header set is not served
        // to another. Origin is omitted because Access-Control-Allow-Origin is
        // the wildcard `*`, which already implies the response is origin-agnostic.
        response.headers.replaceOrAdd(name: .vary, value: "Access-Control-Request-Headers")
    }

    private static func describeDecodingError(_ error: DecodingError) -> String {
        switch error {
        case .dataCorrupted(let ctx):
            return "Malformed request body: \(ctx.debugDescription)"
        case .keyNotFound(let key, _):
            return "Missing required field: \(key.stringValue)"
        case .typeMismatch(_, let ctx), .valueNotFound(_, let ctx):
            let path = ctx.codingPath.map(\.stringValue).joined(separator: ".")
            return path.isEmpty
                ? "Invalid field value: \(ctx.debugDescription)"
                : "Invalid value for field '\(path)': \(ctx.debugDescription)"
        @unknown default:
            return "Malformed request body"
        }
    }

    static func httpStatus(for error: EmbeddingError) -> HTTPStatus {
        switch error {
        case .modelNotFound:
            return .notFound
        case .invalidInput, .invalidDimensions, .tokenizationFailed:
            return .badRequest
        case .backendUnavailable, .assetDownloadRequired, .assetDownloadFailed:
            return .serviceUnavailable
        case .internalFailure:
            return .internalServerError
        }
    }
}

private struct EmbeddingModelsResponse: Content {
    let object: String
    let data: [EmbeddingModelInfo]

    init(data: [EmbeddingModelInfo]) {
        self.object = "list"
        self.data = data
    }
}

private struct EmbeddingModelInfo: Content {
    let id: String
    let object: String
    let created: Int
    let ownedBy: String

    enum CodingKeys: String, CodingKey {
        case id
        case object
        case created
        case ownedBy = "owned_by"
    }

    init(id: String, created: Int, ownedBy: String) {
        self.id = id
        self.object = "model"
        self.created = created
        self.ownedBy = ownedBy
    }
}
```
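As a usage sketch, the request body the controller decodes and the rough response shape it produces look like the following. The field names follow the OpenAI embeddings wire format that this endpoint is compatible with; the vector and token-count values are placeholders, not real model output, and a real call requires the server to be running via `afm embed`:

```python
import json

# Request body accepted by POST /v1/embeddings. `input` may be a single
# string, a list of strings (as here), or pre-tokenized id arrays.
request_body = {
    "model": "apple-nl-contextual-en",
    "input": ["local-first embeddings", "semantic search"],
    "encoding_format": "float",
    "dimensions": 256,  # optional Matryoshka-style truncation
}

# Approximate response shape (placeholder values), mirroring the
# OpenAI-compatible EmbeddingsResponse the controller encodes.
response_body = {
    "object": "list",
    "model": "apple-nl-contextual-en",
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.01, -0.02]},
        {"object": "embedding", "index": 1, "embedding": [0.03, 0.04]},
    ],
    "usage": {"prompt_tokens": 7, "total_tokens": 7},
}

# The request survives a JSON round trip unchanged, as the server expects.
wire = json.dumps(request_body)
assert json.loads(wire) == request_body
```

When inputs exceed the backend's max sequence length, the same response additionally carries the `X-Embedding-Truncated` header described in the commit message.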
