fix: --guided-json server flag and per-test spec extraction (#97, #98)
* fix: --guided-json server flag now applied to incoming requests (#97)
The --guided-json CLI flag was parsed and validated, but the resulting
schema was discarded, so server requests never received it. The flag
only worked in single-prompt CLI mode (afm mlx -s).
Fix: store the parsed schema in MLXModelService.defaultGuidedJsonSchema
and apply it as a fallback in both the chat completions and batch
completions controllers when the request body omits response_format.
A per-request response_format still takes precedence.
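A minimal Python sketch of the precedence rule described above, for illustration only. The function name effective_response_format and the OpenAI-style json_schema wrapper (including the "guided_json" name) are assumptions; the real MLXModelService code may represent the fallback differently.

```python
from typing import Any, Optional


def effective_response_format(
    request_body: dict[str, Any],
    default_guided_json_schema: Optional[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    """Pick the response_format to apply to one request.

    A response_format in the request body always wins; the server-wide
    --guided-json schema is only used when the body omits it.
    """
    per_request = request_body.get("response_format")
    if per_request is not None:
        return per_request
    if default_guided_json_schema is None:
        return None  # no flag given: unconstrained generation
    # Hypothetical wrapping of the CLI schema as an OpenAI-style
    # json_schema response_format.
    return {
        "type": "json_schema",
        "json_schema": {"name": "guided_json", "schema": default_guided_json_schema},
    }
```

Because the per-request value is checked first, a client can still override the server default on any individual call.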
Verified: `afm mlx --guided-json '{...}'` now produces valid JSON output
matching the schema for all test cases (color/hex, person, etc.).
Closes #97
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
* fix: per-test spec extraction for [@ label] template format
The regex for parsing test section headers failed on the [@ label]
template format used throughout test-llm-comprehensive.txt. The match
captured "@ label" as group 1 and None as group 2, and the code then read
the label from the empty group 2. Result: 0 specs were extracted, and
every per-test scoring run received an empty test context.
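The failure mode can be reproduced with a single combined pattern. The regex below is a hypothetical reconstruction (the commit does not show the original), chosen because it produces exactly the group 1 / group 2 split described above.

```python
import re

# Hypothetical combined header pattern: a name, then an optional
# " @ label" suffix. Not the project's actual regex.
COMBINED = re.compile(r"^\[([^\]]+?)(?: @ ([^\]]+))?\]$")


def buggy_label(header: str):
    m = COMBINED.match(header)
    # Bug: the label is always taken from group 2. For "[@ label]" there
    # is no space before "@", so the optional suffix never matches and the
    # whole "@ label" text lands in group 1 while group 2 stays None.
    return m.group(2) if m else None
```

With this pattern, "[model @ label]" yields "label", but "[@ label]" yields None, so no spec would be recorded for template-format headers.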
The judge then scored entries based only on the JSONL content, with no
knowledge of the test intent. This led to nonsensical reasoning in which
codex invented expectations from neighboring tests.
Fix: use separate regexes for [@ label] (template) and [model @ label]
(named variant). Verified: 91/91 labels are now extracted correctly.
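A sketch of the two-regex approach, assuming headers sit alone on a line. The pattern names TEMPLATE_HEADER and NAMED_HEADER and the parse_header helper are hypothetical; the actual patterns in the scoring script may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns, one per header form.
TEMPLATE_HEADER = re.compile(r"^\[@ ([^\]]+)\]$")          # [@ label]
NAMED_HEADER = re.compile(r"^\[([^\]@]+?) @ ([^\]]+)\]$")  # [model @ label]


def parse_header(line: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (model, label) for a section header, or None otherwise.

    model is None for the template form.
    """
    line = line.strip()
    m = TEMPLATE_HEADER.match(line)
    if m:
        return None, m.group(1)
    m = NAMED_HEADER.match(line)
    if m:
        return m.group(1), m.group(2)
    return None
```

Keeping one pattern per form means each label always comes from a fixed, always-populated group, instead of depending on which optional part of a combined pattern happened to match. NAMED_HEADER excludes "@" from the model name, so it cannot accidentally match the template form.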
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
* refactor: address PR #98 review feedback
- Centralize the response_format fallback logic in a
  MLXModelService.effectiveResponseFormat() helper, called from both the
  chat and batch controllers
- Make the effectiveResponseFormat parameter of createStreamingResponse
  required (no default value) so future callers cannot accidentally skip
  the guided-json fallback
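The effect of dropping the default can be illustrated in Python with a keyword-only parameter that has no default value. create_streaming_response and its placeholder body are hypothetical stand-ins for the real createStreamingResponse.

```python
from typing import Any, Optional


def create_streaming_response(
    prompt: str,
    *,
    effective_response_format: Optional[dict[str, Any]],
) -> list[str]:
    # Placeholder body: a real implementation would stream tokens,
    # constrained by effective_response_format when it is not None.
    mode = "guided" if effective_response_format is not None else "free"
    return [f"{mode}:{prompt}"]
```

A caller that forgets the argument now fails immediately with a TypeError instead of silently streaming unconstrained output; passing None explicitly remains possible, but it becomes a visible, deliberate choice at the call site.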
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>