
feat(aws-neuron): add mini-sglang-neuron LLM inference framework #8

Open
dadaism wants to merge 4 commits into main from feat/aws-neuron-mini-sglang

Conversation


@dadaism dadaism commented Mar 25, 2026

Summary

  • Clones yottalabsai/mini-sglang-neuron at pinned commit e984a5b during image build
  • Source remains at /opt/mini-sglang-neuron so server.sh and run_minisgl.sh are accessible at runtime
  • Installs ninja via pip (replaces conda install ninja from init_setup.sh)
  • Pins transformers>=4.56.0,<4.57.3 as required by minisgl
  • Adds build-time assertion: import minisgl
  • Bumps TAG_SUFFIX to 2026032502 to trigger a CI rebuild
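
The Dockerfile changes summarized above might look roughly like the following sketch. This is not the PR's actual diff: the clone URL layout, the `git checkout` sequence, and the use of `python3.11` are taken from this description, while everything else (stage placement, exact `RUN` grouping) is an assumption.

```dockerfile
# Sketch only -- not the PR's actual Dockerfile.
# Pin the framework commit; overridable via bake, per the commit message.
ARG MINI_SGLANG_NEURON_COMMIT=e984a5b

# Clone yottalabsai/mini-sglang-neuron at the pinned commit so the
# source stays at /opt/mini-sglang-neuron for server.sh / run_minisgl.sh.
RUN git clone https://github.com/yottalabsai/mini-sglang-neuron.git /opt/mini-sglang-neuron \
    && cd /opt/mini-sglang-neuron \
    && git checkout "${MINI_SGLANG_NEURON_COMMIT}"

# ninja via pip replaces the conda install from init_setup.sh;
# transformers is pinned to the range minisgl requires.
RUN pip install ninja "transformers>=4.56.0,<4.57.3" \
    && pip install /opt/mini-sglang-neuron

# Build-time assertion: fail the image build if minisgl is not importable.
RUN python3.11 -c "import minisgl"
```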

Test plan

  • CI — Build aws-neuron passes
  • docker run --rm $IMAGE python3.11 -c "import minisgl; print('ok')" succeeds
  • docker run --rm $IMAGE ls /opt/mini-sglang-neuron/server.sh shows the file

dadaism added 4 commits March 25, 2026 14:58
Clones and installs yottalabsai/mini-sglang-neuron at a pinned commit
(e984a5b) during image build. Source remains at /opt/mini-sglang-neuron
so server.sh and run_minisgl.sh are accessible at runtime.

- Add MINI_SGLANG_NEURON_COMMIT ARG (overridable via bake)
- Install ninja (replaces conda-based init_setup.sh dependency)
- Pin transformers to >=4.56.0,<4.57.3 as required by minisgl
- Add build-time assertion: import minisgl
- Bump TAG_SUFFIX to 2026032502
neuronx_distributed_inference (minisgl dependency) is only available
on pip.repos.neuron.amazonaws.com, not PyPI.
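
Because that dependency lives only on the Neuron repository, the install step needs an extra package index. One way to do this (a sketch of the general technique, not necessarily how this PR wires it up) is:

```shell
# Sketch: add the AWS Neuron repository as an extra index alongside PyPI,
# so pip can resolve neuronx_distributed_inference. The exact invocation
# and install target in the PR may differ.
pip install neuronx_distributed_inference \
    --extra-index-url https://pip.repos.neuron.amazonaws.com
```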
