Commit a589c50
feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage (#119)
* feat(embeddings): add /v1/embeddings backed by Apple NaturalLanguage
Adds an `afm embed` subcommand that serves an OpenAI-compatible
embeddings endpoint on top of Apple's NaturalLanguage
`NLContextualEmbedding`, letting local-first clients use the same
OpenAI interface for RAG / semantic search / clustering workflows that
they already use for chat completions. Entirely on-device, no model
downloads beyond what macOS already ships, no network dependency.
Refs #118.
## HTTP surface
- POST /v1/embeddings
- Accepts a string, array of strings, or pre-tokenized IDs
- `encoding_format`: `float` (default) or `base64`
- `dimensions`: optional Matryoshka-style truncation + L2 renormalize
(the NL backend rejects the request when `dimensions` exceeds the
model's native dimension)
- `X-Embedding-Truncated` response header counts inputs that exceeded
the backend's max sequence length
- Malformed JSON, missing fields, and unknown enum values return 400
with a descriptive `EmbeddingError.invalidInput` reason
- Other 4xx AbortErrors (e.g. 415 Unsupported Media Type) pass through
with their original status
- Oversized (>1 MiB) bodies return 413 with an embeddings-specific
error message, not the chat server's "conversation too long" text
- OPTIONS /v1/embeddings and OPTIONS /v1/models register CORS preflight;
Access-Control-Allow-Headers is reflected from
Access-Control-Request-Headers (falling back to
"Content-Type, Authorization, OpenAI-Organization, OpenAI-Project"),
with `Vary: Origin, Access-Control-Request-Headers` so intermediary
caches don't replay preflights across clients
- GET /v1/models advertises only the loaded backend's model id so a
client can't discover an id the server can't actually serve
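
The `dimensions` and `encoding_format` options above can be illustrated from the client side. This is a minimal Python sketch, not the server's Swift implementation: the helper names are hypothetical, the lower bound of 1 on `dimensions` is an assumption, and the base64 layout (little-endian float32) is assumed to follow the OpenAI convention rather than confirmed by this commit.

```python
import base64
import math
import struct


def truncate_and_renormalize(vector, dimensions):
    """Keep the first `dimensions` components, then rescale to unit L2 norm."""
    # The commit says requests above the native dimension are rejected;
    # rejecting values below 1 is an assumption made for this sketch.
    if dimensions < 1 or dimensions > len(vector):
        raise ValueError("dimensions must be between 1 and the native dimension")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def encode_base64_embedding(vector):
    """Pack floats as little-endian float32, then base64-encode (assumed layout)."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def decode_base64_embedding(payload):
    """Inverse of encode_base64_embedding under the same assumed layout."""
    raw = base64.b64decode(payload)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Renormalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors, which is why the two steps are applied together.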
## Shipped model ids
- `apple-nl-contextual-en` (English)
- `apple-nl-contextual-multi` (Latin-script multilingual — NL's
multilingual contextual model is Latin-only; non-Latin scripts are
out of scope for this backend)
Native dimension and max sequence length come from
`NLContextualEmbedding.dimension` / `maximumSequenceLength` at load time
so values track OS updates.
## CLI
- `afm embed -m <id>` starts the server (default port 9998)
- `afm embed --list-models` enumerates shipped ids
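
For orientation, a client call against a server started with `afm embed -m apple-nl-contextual-en` might be built as in this Python sketch. The host `localhost` and the helper name `build_embeddings_request` are assumptions. Only the port, the route, and the model id come from this commit.

```python
import json
import urllib.request


def build_embeddings_request(texts, model="apple-nl-contextual-en",
                             base_url="http://localhost:9998"):
    """Build (but do not send) a POST /v1/embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send(build_embeddings_request(["hello world"]))` should return an OpenAI-style embeddings response if the endpoint behaves as described.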
## Architecture
- `EmbeddingBackend` protocol + `NLContextualEmbeddingBackend` actor
owning the `NLContextualEmbedding` handle
- `EmbeddingModelRegistry` maps ids to metadata; the CLI --list-models
path surfaces the full registry while the HTTP surface exposes only
the loaded model
- Shutdown uses `DispatchSourceSignal` rather than a raw C `signal()`
handler, with the shutdown flag checked on both sides of
`app.server.start()` so SIGINT delivered during bind tears the
just-bound listener down cleanly
- `EmbeddingsUsage` is scoped separately from the chat `Usage` struct so
the Apple-only embeddings path never pulls in MLX's
`MLXMetalLibrary.ensureAvailable()` side effects (which mutate the
process cwd)
## Tests
21 XCTests under `Tests/MacLocalAPITests/`:
- 18 controller tests: single/array/tokenized input, dimensions
truncation + renormalization, base64 encoding, unknown model,
malformed JSON, missing field, empty/whitespace input, unknown
encoding_format, truncation header, unsupported-media-type preserves
status + body shape, oversized payload, CORS preflight on both routes,
reflected allow-headers, list-models shape
- 3 registry tests
## Deliberately out of scope
- MLX embedding backend (sentence-transformers, BGE, etc.) — follow up
- Non-Latin script coverage for the multilingual backend
- Matryoshka beyond the truncation+normalize behavior above
- `/v1/batch/embeddings` parity
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* embeddings: stable model created, NL truncation tracking, /health version
Addresses PR #119 review feedback from sourcery-ai and codex.
NL truncation (codex): NLContextualEmbedding silently truncates inputs
beyond maximumSequenceLength, but the backend never set
EmbedResult.truncatedInputCount, so X-Embedding-Truncated was dead
code. embed() now flags an input as truncated when the returned
sequenceLength hits the backend cap. Slightly over-reports inputs that
land exactly at the cap, but under-reporting (the previous silent
behavior) is worse for the long-document workflows this header exists
for.
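
The flagging rule described above can be sketched in a few lines of Python. This illustrates the heuristic only, not the Swift code in this commit: the function name is hypothetical and the cap of 256 in the example is a made-up value.

```python
def count_truncated(sequence_lengths, max_sequence_length):
    """Count inputs whose returned sequence length reached the backend cap.

    An input that is exactly `max_sequence_length` tokens long is also
    counted, which is the deliberate over-report described above.
    """
    return sum(1 for n in sequence_lengths if n >= max_sequence_length)


# Hypothetical cap of 256: one short input, one at or above the cap twice.
truncated = count_truncated([12, 256, 256], max_sequence_length=256)
```

Here `truncated` is 2, the kind of count the `X-Embedding-Truncated` header is described as carrying.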
Stable model created (sourcery): EmbeddingModelInfo.created was
Int(Date().timeIntervalSince1970), so /v1/models returned a different
value on every request. Adds createdEpoch to EmbeddingModelEntry and
uses the macOS 14 GA date (2023-09-26 = 1_695_686_400) for both
apple-nl-contextual-{en,multi} — these are OS-shipped assets, so the
OS release is the right stable anchor. listModels now uses the entry's
stable value and testListModelsReturnsOnlyLoadedModel pins it.
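
As a quick check of the constant, the epoch value can be reproduced from the date, assuming midnight UTC. This is a Python sketch with hypothetical variable names; the model-entry dictionary follows the usual OpenAI `/v1/models` field names, which is an assumption about this server's exact output.

```python
from datetime import datetime, timezone

# macOS 14 GA date from the commit message, interpreted as midnight UTC.
MACOS_14_GA = datetime(2023, 9, 26, tzinfo=timezone.utc)
CREATED_EPOCH = int(MACOS_14_GA.timestamp())

# Illustrative entry: the same `created` value on every request.
model_entry = {
    "id": "apple-nl-contextual-en",
    "object": "model",
    "created": CREATED_EPOCH,
}
```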
/health version (sourcery): embeddings server's /health endpoint was
hard-coding "1.0.0". Now reports BuildInfo.fullVersion like the rest of
the app.
Empty token arrays (sourcery): createEmbeddings now rejects
{"input":[[]]} and {"input":[[1,2],[]]} with 400 up front, before the
backend sees them, so token-count accounting never has to reason about
empty inner arrays.
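
The up-front check can be pictured as follows. This is a Python sketch under stated assumptions: the function name and the error wording are hypothetical, and the real validation lives in the Swift `createEmbeddings` handler.

```python
def validate_token_input(token_arrays):
    """Return an error reason, or None when every inner array is non-empty."""
    for index, tokens in enumerate(token_arrays):
        if len(tokens) == 0:
            # The caller would map this reason to an HTTP 400 response.
            return f"input[{index}] must not be an empty token array"
    return None
```

Both payloads named above fail this check: `[[]]` at index 0 and `[[1, 2], []]` at index 1.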
Vary cleanup (sourcery): applyCORSHeaders drops Origin from the Vary
list now that Access-Control-Allow-Origin is the wildcard. A `*`
response is already origin-agnostic, so varying on Origin is
meaningless; Access-Control-Request-Headers stays because the allow-
headers value is still reflected per request.
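
The resulting preflight headers can be sketched like this in Python. It illustrates the described behavior and is not the Swift `applyCORSHeaders` code: the function name is hypothetical, and a plain dict with exact-case keys stands in for real case-insensitive HTTP header lookup.

```python
DEFAULT_ALLOW_HEADERS = (
    "Content-Type, Authorization, OpenAI-Organization, OpenAI-Project"
)


def preflight_headers(request_headers):
    """Build CORS preflight response headers from the request's headers."""
    requested = request_headers.get("Access-Control-Request-Headers", "").strip()
    return {
        # Wildcard origin: the response does not depend on Origin.
        "Access-Control-Allow-Origin": "*",
        # Reflect what the client asked for, else fall back to the default list.
        "Access-Control-Allow-Headers": requested or DEFAULT_ALLOW_HEADERS,
        # Only the reflected value varies per request, so only it is listed.
        "Vary": "Access-Control-Request-Headers",
    }
```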
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b7ffe18 commit a589c50
10 files changed
Lines changed: 1605 additions & 1 deletion
File tree
- Sources/MacLocalAPI
- Controllers
- Models
- Tests/MacLocalAPITests