fix: auto-install Python + Metal shader warmup (#43)
* fix: auto-install Python + Metal shader warmup on startup
P0, install.sh: if neither Python 3.10+ nor Homebrew is found, the
script now downloads a standalone Python from python-build-standalone
automatically (no sudo needed). This removes the #1 install blocker for
users without Homebrew.
P0, first-request hang: adds a warmup step after model load that
runs one forward pass to trigger Metal shader compilation. It prints
"Warming up (compiling Metal shaders)..." so users know what is
happening, and prevents the first real request from hanging for 5+
minutes.
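
A minimal sketch of the warmup idea, not the code from this commit. `warm_up` and its `forward_pass` and `log` parameters are hypothetical names; the only detail taken from the message above is the printed status line. The point is that one throwaway forward pass pays the shader compilation cost before any client is waiting.

```python
import time
from typing import Callable


def warm_up(forward_pass: Callable[[], object],
            log: Callable[[str], None] = print) -> float:
    """Run one throwaway forward pass and return its duration in seconds.

    `forward_pass` is a hypothetical zero-argument callable standing in
    for whatever the engine uses to run the model once.
    """
    log("Warming up (compiling Metal shaders)...")
    start = time.perf_counter()
    forward_pass()  # output is discarded; only the compile side effect matters
    elapsed = time.perf_counter() - start
    log(f"Warmup finished in {elapsed:.1f}s")
    return elapsed
```

Logging before the pass starts is the important part: the slow step is announced before it begins, so a long pause reads as progress and not as a hang.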
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: strip think tags from Anthropic endpoint + disk space check
P2: Think tags leaked through Anthropic /v1/messages endpoint because it
bypassed the reasoning parser entirely. Both streaming and non-streaming
paths now use the reasoning parser to separate reasoning from content,
emitting only content to Anthropic clients.
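
To illustrate what "separate reasoning from content" means for the non-streaming path, here is a regex-based stand-in. It is not the project's reasoning parser, whose interface is not shown here; `split_reasoning` is a hypothetical helper and the literal `<think>...</think>` tag format is an assumption.

```python
import re
from typing import Tuple

# Assumption: reasoning is wrapped in literal <think>...</think> tags.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, content) for a complete, non-streamed response."""
    reasoning = "\n".join(part.strip() for part in _THINK_BLOCK.findall(text))
    content = _THINK_BLOCK.sub("", text).strip()
    return reasoning, content
```

Only `content` would be returned to the Anthropic client. A streaming path needs a stateful parser instead, because a tag can be split across chunks; this sketch does not cover that case.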
P1: Add a disk space check before model download. It queries
HuggingFace for the model repo size and warns if available disk space
is insufficient. The check is skipped silently for local/cached models.
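
A sketch of the comparison step under stated assumptions: `check_disk_space`, its `margin` parameter, and the warning wording are all hypothetical. The required size is passed in as a number so the example stays offline; one way to obtain it would be to sum the file sizes reported by `huggingface_hub.HfApi().model_info(repo_id, files_metadata=True)`, though this commit may do it differently.

```python
import shutil
from typing import Optional


def check_disk_space(required_bytes: Optional[int],
                     path: str = ".",
                     margin: float = 1.1) -> Optional[str]:
    """Return a warning string if free space looks insufficient, else None.

    `required_bytes=None` models the local/cached case, where the size
    is unknown and the check is skipped silently.
    """
    if required_bytes is None:
        return None
    free = shutil.disk_usage(path).free
    needed = int(required_bytes * margin)  # hypothetical 10% headroom
    if free < needed:
        return (f"Warning: model needs ~{needed / 1e9:.1f} GB but only "
                f"{free / 1e9:.1f} GB is free at {path!r}")
    return None
```

Returning a message, not raising, matches the "warns" wording above: the download can still proceed if the user chooses to continue.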
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: standalone Python URL + move warmup to lifespan hook
P0: The hardcoded python-build-standalone URL pointed at the old
indygreg repo which now 404s. Updated to astral-sh/python-build-standalone
with cpython 3.12.13 (release 20260320), verified accessible.
P2: Metal shader warmup ran in CLI before batched/hybrid engines were
started (they start in the FastAPI lifespan hook). Moved warmup into
the lifespan hook so it runs after engine.start() for all engine types.
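
The ordering fix can be sketched with a plain async context manager, which is the shape FastAPI accepts through `FastAPI(lifespan=...)`. `FakeEngine`, `make_lifespan`, and `stop()` are hypothetical, and treating `generate_warmup()` as a coroutine is an assumption; only `engine.start()` followed by warmup comes from the message above.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeEngine:
    """Hypothetical stand-in that records the order of lifecycle calls."""

    def __init__(self):
        self.events = []

    async def start(self):
        self.events.append("start")

    async def generate_warmup(self):
        self.events.append("warmup")

    async def stop(self):
        self.events.append("stop")


def make_lifespan(engine):
    @asynccontextmanager
    async def lifespan(app):
        await engine.start()            # engine must be running first
        await engine.generate_warmup()  # then warm up, before serving
        yield                           # requests are handled here
        await engine.stop()

    return lifespan


async def demo():
    engine = FakeEngine()
    async with make_lifespan(engine)(None):  # `app` is unused in this sketch
        engine.events.append("serving")
    return engine.events
```

Running `asyncio.run(demo())` yields the order start, warmup, serving, stop. A warmup called from the CLI would instead run before `start`, which is the bug described above.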
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add generate_warmup() to BatchedEngine and HybridEngine
Both engines inherited the no-op base generate_warmup(), so Metal shader
warmup in the lifespan hook was silently skipped for --continuous-batching
and hybrid modes. Now both engines override it with a real forward pass,
matching SimpleEngine's implementation.
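
The failure mode is ordinary method inheritance, sketched below with hypothetical classes. Only the `generate_warmup()` name comes from the commit; the boolean return value is added here purely to make the difference observable.

```python
class BaseEngine:
    def generate_warmup(self) -> bool:
        """Default no-op. Returns False to signal that nothing ran."""
        return False


class InheritingEngine(BaseEngine):
    """Mirrors the bug: no override, so the caller gets the silent no-op."""


class OverridingEngine(BaseEngine):
    """Mirrors the fix: the override runs a real forward pass."""

    def __init__(self, forward):
        self._forward = forward  # hypothetical callable that runs the model once

    def generate_warmup(self) -> bool:
        self._forward()
        return True
```

Because the base method succeeds without doing anything, the lifespan hook has no way to tell that warmup was skipped. A base method that raised `NotImplementedError` would surface the gap, at the cost of requiring every engine to implement warmup.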
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>