refactor(backends): self-describing WrappedServer backends (#2287) #2320

Draft: jeremyfowers wants to merge 5 commits into main from feat/self-describing-backends

@jeremyfowers
Member

Implements the plan in #2287: each inference backend describes itself with a plain-data descriptor plus a server class, and every scattered if (recipe == "...") site is rewritten to read a registry built from those descriptors.
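
To make the idea concrete, here is a minimal sketch of a descriptor registry standing in for scattered string comparisons. It is not the PR's code: only the type names `BackendDescriptor` and `SlotPolicy` come from this description, while the field names, the `SlotPolicy` enumerators, the `kRegistry` table, the `demo-*` recipes, and both helper functions are assumptions made up for illustration.

```cpp
#include <array>
#include <string_view>

namespace sketch {

// Hypothetical enumerators; the PR only names the SlotPolicy type.
enum class SlotPolicy { Shared, ExclusiveNpu };

// Plain-data descriptor. Field names are illustrative assumptions.
struct BackendDescriptor {
    std::string_view recipe;        // key that code used to compare against
    std::string_view display_name;
    std::string_view device;
    SlotPolicy slot_policy;
};

// Made-up entries. In the PR the registry is generated from LEMON_BACKENDS.
inline constexpr std::array<BackendDescriptor, 2> kRegistry{{
    {"demo-cpu", "Demo CPU backend", "cpu", SlotPolicy::Shared},
    {"demo-npu", "Demo NPU backend", "npu", SlotPolicy::ExclusiveNpu},
}};

// A single lookup that call sites can share instead of `if (recipe == "...")`.
inline const BackendDescriptor* find_backend(std::string_view recipe) {
    for (const auto& d : kRegistry) {
        if (d.recipe == recipe) return &d;
    }
    return nullptr;  // unknown recipe
}

// Example of a policy question answered from data rather than recipe names.
inline bool is_npu_exclusive(std::string_view recipe) {
    const BackendDescriptor* d = find_backend(recipe);
    return d != nullptr && d->slot_policy == SlotPolicy::ExclusiveNpu;
}

}  // namespace sketch
```

With this shape, a call site asks `sketch::is_npu_exclusive(recipe)` and never names a backend, so adding a backend means adding one table row.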

What changed

Adding a backend is now one LEMON_BACKENDS line + a <stem>_descriptor.cpp (data) + a <stem>_factory.cpp (create()). No router, CLI, docs, or support-matrix edits — those are all derived.

  • Descriptor types: BackendDescriptor / BackendOption / SlotPolicy (backend_descriptor.h); RecipeBackendDef moved to a shared header.
  • Two-tier registry, generated from LEMON_BACKENDS at CMake configure time — a CLI-safe data registry (descriptors only, links into both lemonade and lemond) and a server-only factory registry (binds each descriptor to its class's create()). This split is what lets the CLI read recipe options/flags from descriptors without linking server classes.
  • All 9 backends carry a descriptor (display name, binary, device, slot policy, options, support matrix, labels) + a create().
  • Descriptor-driven sites (appendix rows 1–13): router creation, NPU/slot eviction & cloud LRU exemption (SlotPolicy), device type, recipe options / CLI flags / defaults, config-section identity, support matrix (RECIPE_DEFS deleted), recipe→label inference, cloud availability.
  • /system-info recipes entries enriched with display_name / selectable_backend / uses_ctx_size / options / support. The desktop app now reads recipe display names from /system-info instead of hardcoded TypeScript.
  • Docs generation: docs/tools/gen_backend_docs.py boots lemond, reads /system-info + server_models.json, and rewrites marker-delimited regions of docs/dev/backends-reference.md. A new CI job (backend-docs-drift) fails on drift. Authoring guide: docs/dev/adding-a-backend.md.
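
The marker-delimited rewrite and the drift check in the last bullet can be sketched as follows. The real generator is a Python script. This sketch uses C++ only to stay consistent with the rest of the codebase, and the function names and the marker strings passed to them are hypothetical.

```cpp
#include <optional>
#include <string>

namespace sketch {

// Replace the text between `begin` and `end` markers with `body`.
// Everything outside the markers, and the markers themselves, is preserved.
// Returns std::nullopt when either marker is missing.
inline std::optional<std::string> replace_region(const std::string& doc,
                                                 const std::string& begin,
                                                 const std::string& end,
                                                 const std::string& body) {
    const std::size_t b = doc.find(begin);
    if (b == std::string::npos) return std::nullopt;
    const std::size_t content_start = b + begin.size();
    const std::size_t e = doc.find(end, content_start);
    if (e == std::string::npos) return std::nullopt;
    return doc.substr(0, content_start) + "\n" + body + "\n" + doc.substr(e);
}

// Drift check: regenerating the region must leave the file unchanged.
// Missing markers are reported as drift as well.
inline bool has_drift(const std::string& doc,
                      const std::string& begin,
                      const std::string& end,
                      const std::string& body) {
    const std::optional<std::string> regenerated =
        replace_region(doc, begin, end, body);
    return !regenerated.has_value() || *regenerated != doc;
}

}  // namespace sketch
```

A CI job in this style would render the body from /system-info, call `has_drift` on the committed file, and fail when it returns true.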

Corner cases / cleanups

  • Unified ryzenai's config section to ryzenai everywhere (was inconsistent between s_backend_names and recipe_to_config_section).
  • FLM↔exclusive-NPU eviction now keys off "NPU holder that isn't FLM" (which correctly includes whisper-on-npu), fixing a latent recipe-string gap; victims are collected first and evicted afterwards to avoid iterator invalidation.
  • Generic ModelInfo::extras map (from unknown server_models.json keys) so new backends add per-model fields without editing shared structs.
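
The collect-then-evict pattern from the eviction bullet can be sketched like this. The `LoadedModel` struct, its fields, and the function name are hypothetical stand-ins, and the real eviction presumably does more than erase a map entry.

```cpp
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical view of a loaded model; not the project's actual struct.
struct LoadedModel {
    bool holds_npu;  // currently occupies the NPU
    bool is_flm;     // served by the FLM backend
};

// Evict every NPU holder that is not FLM. Returns the evicted model names.
inline std::vector<std::string> evict_non_flm_npu_holders(
    std::map<std::string, LoadedModel>& loaded) {
    // Pass 1: only read the container and remember the victims.
    std::vector<std::string> victims;
    for (const auto& [name, model] : loaded) {
        if (model.holds_npu && !model.is_flm) victims.push_back(name);
    }
    // Pass 2: mutate the container once iteration has finished, so no
    // live iterator can be invalidated by the erase.
    for (const std::string& name : victims) {
        loaded.erase(name);
    }
    return victims;
}

}  // namespace sketch
```

Erasing inside the first loop would invalidate the iterator that the range-for is still using, which is the hazard the two-pass form avoids.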

Verification

Local (this machine): lemond + lemonade CLI + web-app build green; tsc clean. Passing suites: server_endpoints (69), server_pinning (6), app-regression (37), test_model_name_normalization, test_cuda_arch_mapping. CLI run --help shows all descriptor-derived flags; /system-info carries the enriched fields; docs --check is clean.

Pre-existing failures unrelated to this change (reproduced identically on main): test_flm_status (stale message expectations, 16), test_llamacpp_system_backend (HIP plugin required on AMD-GPU hosts), test_multi_checkpoint_completeness (model pull/network), server_eviction (references phi-3-mini-4k-instruct-q4, absent from the registry), server_cli2 test_020 (built-in model name "Lite Collection" with a space breaks the test's whitespace parser). Relying on CI for clean-environment + cross-platform validation.

Notes for reviewers

  • recipeOptionsConfig.ts (the deeply TypeScript-typed per-recipe option forms) is intentionally left to maintainers per AGENTS.md — the schema is now exposed via /system-info for a future dynamic migration.
  • Backend install still goes through each backend's BackendSpec (install params are class-side behavior); the descriptor supplies the binary name.

🤖 Generated with Claude Code

Make each inference backend describe itself with a plain-data descriptor plus
a server class, and rewrite the scattered `if (recipe == "...")` sites to read
a registry built from those descriptors. Adding a backend becomes one
LEMON_BACKENDS line plus a descriptor + factory file — no router, CLI, docs, or
support-matrix edits.

- Descriptor types (BackendDescriptor/BackendOption/SlotPolicy) + a CLI-safe
  data registry and a server-only factory registry, generated from the
  LEMON_BACKENDS list at CMake configure time.
- All 9 backends carry a descriptor (device, slot policy, options, support
  matrix, labels, binary) and a create().
- Descriptor-driven: router creation, NPU/slot eviction, device type, recipe
  options/CLI flags, config-section identity, support matrix, recipe labels,
  cloud availability.
- /system-info recipes enriched with display_name/selectable_backend/options/
  support; the app reads recipe display names from it instead of hardcoded TS.
- docs/tools/gen_backend_docs.py generates docs/dev/backends-reference.md from
  /system-info; a CI step fails on drift. Authoring guide in
  docs/dev/adding-a-backend.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@github-actions (bot) added the enhancement label on Jun 19, 2026

@jeremyfowers
Member Author

CI status

All cross-platform builds pass (MSVC, AppleClang, GCC, Arch, openSUSE, Fedora rpm), confirming that the descriptor aggregate-init, the CMake LEMON_BACKENDS codegen, and the CLI-safe/server-only registry split compile everywhere. Functional jobs exercising this change pass: CLI/Endpoints (ubuntu + macOS), Test .exe (whisper, moonshine, stable-diffusion, text-to-speech), and backend-docs-drift, plus, locally, endpoints (69), pinning (6), and app-regression (37).

The single red — Test CLI/Endpoints (windows-latest) → test_026_anthropic_messages_tool_calling — is a pre-existing flaky timeout, not from this PR. It's a 500 s ReadTimeout on a tool-calling inference request that the Windows runner intermittently can't finish in time:

  • main run 27765794877: same job fails on the same test with the identical read timeout=500 signature.
  • main run 27795912134: same job passes.

This PR touches backend construction, not inference, anthropic_api.cpp, or the tool-calling loop, so it can't change that test's latency. Re-running the job.

jeremyfowers and others added 4 commits June 19, 2026 16:25
Restructure the self-describing backends to the layout the issue #2287 plan
specified — one folder per backend — instead of the flat file layout I used
before. This also folds the earlier _descriptor/_factory split into the spec's
cleaner shape: the descriptor is a header-only `inline const` and create() lives
with the server class.

Each backend now lives in its own folder, in namespace lemon::backends::<stem>:
  include/lemon/backends/<stem>/<stem>.h         inline const descriptor (CLI-safe data)
  include/lemon/backends/<stem>/<stem>_server.h  WrappedServer subclass + create() decl
  server/backends/<stem>/<stem>_server.cpp       implementation + create() def

Shared registry/util files stay at the top of backends/. The CMake foreach over
LEMON_BACKENDS compiles each <stem>/<stem>_server.cpp and generates the registry
headers from the folder paths. Removes the per-backend *_descriptor.{h,cpp} and
*_factory.{h,cpp} files. Behavior is unchanged (same descriptors, same create()).
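
A rough sketch of that per-backend shape, collapsed into one listing. The `demo` stem, the descriptor fields, `FactoryEntry`, `data_registry()`, `factory_registry()`, and the `lemon_sketch` namespace are hypothetical; `WrappedServer` here is a minimal stand-in, not the project's class.

```cpp
#include <memory>
#include <string>
#include <vector>

namespace lemon_sketch {

struct BackendDescriptor {
    std::string recipe;
    std::string display_name;
};

// Would live in include/lemon/backends/<stem>/<stem>.h (CLI-safe data).
// `inline` (C++17) lets every translation unit that includes the header
// share a single definition of the variable.
namespace backends::demo {
inline const BackendDescriptor descriptor{"demo", "Demo backend"};
}  // namespace backends::demo

// Stand-in for the server base class.
struct WrappedServer {
    virtual ~WrappedServer() = default;
    virtual std::string name() const = 0;
};

// Would live in <stem>_server.h / <stem>_server.cpp (server only).
namespace backends::demo {
struct DemoServer : WrappedServer {
    std::string name() const override { return descriptor.display_name; }
};
inline std::unique_ptr<WrappedServer> create() {
    return std::make_unique<DemoServer>();
}
}  // namespace backends::demo

// Tier 1: data only, so a CLI can link it without any server class.
inline std::vector<const BackendDescriptor*> data_registry() {
    return {&backends::demo::descriptor};
}

// Tier 2: binds each descriptor to its create() function.
struct FactoryEntry {
    const BackendDescriptor* descriptor;
    std::unique_ptr<WrappedServer> (*create)();
};
inline std::vector<FactoryEntry> factory_registry() {
    return std::vector<FactoryEntry>{
        FactoryEntry{&backends::demo::descriptor, &backends::demo::create}};
}

}  // namespace lemon_sketch
```

In the PR both registries are generated by CMake from LEMON_BACKENDS rather than written by hand as they are here.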

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Make the existing curated docs generate from the backend descriptors instead of
just shipping a separate reference file — closing appendix rows 14 and 22.

- Expand the descriptor with the editorial fields the curated docs need:
  `modality`, `experimental`, `web_display_name`, and a per-support-row
  `device_summary` (RecipeBackendDef). These keep the descriptor the single
  source of truth.
- /system-info exposes them plus a registry `order` index and `slot_policy`.
- gen_backend_docs.py now targets multiple docs and renders:
    * README.md "Supported Configurations" HTML matrix (grouped by modality,
      merged rows, rowspans, experimental tag) — wrapped in GENERATED markers;
    * docs/guide/configuration/multi-model.md NPU-exclusivity list.
  The backend-docs-drift CI job's --check now covers all three docs.

The generated README matrix is also more complete than the hand-written one
(it now includes whispercpp rocm/metal, kokoro metal, sd-cpp metal). Footnotes
and prose outside the markers are preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrap cli.md's "Recipe-Specific Options" tables in GENERATED markers and render
them from the descriptor options. This also fixes pre-existing drift: the section
documented `--steps`/`--cfg-scale`/`--width`/`--height` flags that the CLI no
longer registers, and omitted the moonshine and vllm recipes. Now covered by the
backend-docs-drift CI check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add inline-marker support to the generator and wrap the `--recipe` "Common
values" list in custom-models.md so it renders from the descriptor recipe set
(plus collection.omni). Now covered by the backend-docs-drift CI check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>