Add serving-llms-on-instinct skill #58

Merged: danielholanda merged 5 commits into main from instinct-inference-skill on Jun 15, 2026

@Mahdi-CV

Collaborator

Adds the serving-llms-on-instinct skill: end-to-end LLM inference serving on AMD Instinct GPUs (MI300X/MI325X/MI350X/MI355X) with vLLM on ROCm. The skill handles GPU detection, environment validation, vLLM configuration, launch, and health verification, and refuses non-servable models (diffusion, audio, embeddings, rerankers) with an explanation.
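The refusal behavior described above could look roughly like the following sketch. It is not the skill's actual code: the deny-list entries, the `check_servable` helper, and the glob-pattern matching are all hypothetical, since the schema of `data/blacklist.json` is not shown in this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical deny-list entries (assumption): glob pattern plus model category.
# The real data/blacklist.json may use a different structure entirely.
DENYLIST = [
    {"pattern": "*diffusion*", "category": "diffusion"},
    {"pattern": "*whisper*", "category": "audio"},
    {"pattern": "*embed*", "category": "embeddings"},
    {"pattern": "*rerank*", "category": "reranker"},
]


def check_servable(model_id: str) -> tuple[bool, str]:
    """Return (servable, reason); reason is empty when the model is allowed."""
    lowered = model_id.lower()
    for entry in DENYLIST:
        # fnmatchcase keeps matching behavior identical across platforms.
        if fnmatchcase(lowered, entry["pattern"]):
            reason = (
                f"{model_id} looks like a {entry['category']} model, "
                "which cannot be served as an LLM text-generation endpoint."
            )
            return False, reason
    return True, ""


if __name__ == "__main__":
    for name in ("example-org/chat-7b", "example-org/tiny-reranker-v1"):
        print(name, check_servable(name))
```

Returning the reason alongside the verdict lets the caller show the explanation to the user instead of failing silently.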

What's included

  • SKILL.md and reference.md: skill definition and runtime guidance
  • scripts/detect.py: GPU detection via amd-smi (local or remote host)
  • scripts/validate.py: environment validation with auto-fix
  • scripts/sync_recipes.py: refresh recipes from vLLM recipes + Docker Hub
  • scripts/estimate_vram.py: weight + KV-cache VRAM estimation (handles quantized models)
  • data/recipes_cache.json: model configs synced from vllm-project/recipes
  • data/gpu_overrides.json: GPU-specific docker flags and legacy model configs
  • data/blacklist.json: models that cannot be served as LLM endpoints
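For readers unfamiliar with the weight + KV-cache estimate that `scripts/estimate_vram.py` is described as performing, here is a minimal sketch of the standard arithmetic. It is an assumption about the general approach, not the script's actual implementation: the function name, parameters, and the example model dimensions are hypothetical, and real serving needs extra headroom for activations and runtime overhead.

```python
GIB = 1024 ** 3


def estimate_vram_gib(
    num_params: float,
    weight_bits: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_len: int,
    batch_size: int = 1,
    kv_bytes_per_elem: int = 2,
) -> dict:
    """Rough VRAM estimate: model weights plus KV cache (hypothetical helper)."""
    # Weights: one value per parameter at the (possibly quantized) bit width.
    weight_bytes = num_params * weight_bits / 8
    # KV cache: a key and a value tensor (factor 2) per layer, per KV head,
    # per head dimension, for every cached token of every sequence.
    kv_cache_bytes = (
        2 * num_layers * num_kv_heads * head_dim
        * context_len * batch_size * kv_bytes_per_elem
    )
    return {
        "weights_gib": weight_bytes / GIB,
        "kv_cache_gib": kv_cache_bytes / GIB,
        "total_gib": (weight_bytes + kv_cache_bytes) / GIB,
    }


if __name__ == "__main__":
    # Illustrative dimensions for an 8B-parameter model (assumed, not from the PR).
    fp16 = estimate_vram_gib(8e9, 16, 32, 8, 128, 8192)
    int4 = estimate_vram_gib(8e9, 4, 32, 8, 128, 8192)
    print(f"16-bit weights: {fp16['total_gib']:.2f} GiB total")
    print(f"4-bit weights:  {int4['total_gib']:.2f} GiB total")
```

Quantization shrinks only the weight term in this sketch; the KV-cache term depends on the cache dtype, which is passed separately as `kv_bytes_per_elem`.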

Registration

  • Added the plugin entry to .claude-plugin/marketplace.json and .cursor-plugin/marketplace.json
  • Updated the README skills table: serving-llms-on-instinct moved from planned to in-repo

@Mahdi-CV requested a review from danielholanda on June 12, 2026 at 23:28

@danielholanda left a comment

Collaborator

PR looks great. Next step here is to add a quick walkthrough so other folks have a bit more guidance when trying this:
#58

@danielholanda merged commit f553383 into main on Jun 15, 2026
18 checks passed