Commit 839cfd9


[Docs][API] Add AWS Neuron/Trainium2 support for disaggregated inference (#1894)

- Add NIXL support for Neuron disaggregated inference
  - Add annotation fallback for model name in informers.go (supports model paths with '/')
  - Add AIBRIX_KV_CONNECTOR_TYPE env var for NIXL mode in pd_disaggregation.go
  - Use disagg_prefill_resp wrapper for NIXL mode (Neuron compatibility)
  - Fix: use LoadEnv instead of LoadEnvString (LoadEnvString doesn't exist)
  - Add Neuron/Trainium2 disaggregated inference setup documentation
- Address review feedback: reduce EBS volume to 512GB, timeout to 600s, use placeholder for envoy URL
- Address PR review comments: refactor NIXL/SHFS handling
  - Move SHFS code into else block (comment r2709292080)
  - Fix vLLM+NIXL condition: check connector type only, not engine (comment r2709300356)
- Add StormService sample for Neuron/Trainium2 disaggregated inference
  - Add samples/disaggregation/neuron/pool.yaml for Neuron PD deployment
  - Uses DRA for Neuron + EFA resource allocation
  - Includes prefill/decode roles with NIXL connector configuration
  - Parallel to existing vllm/pool.yaml sample structure
- Add backward compatibility check and unit tests for KV connector types
  - Restore llmEngine == VLLMEngine condition for SHFS kv_transfer_params
  - Add comprehensive unit tests for SHFS vs NIXL connector types
  - Add TestPreparePrefillPayloadBackwardCompatibility test
  - Ensures original behavior is preserved for GPU/SHFS deployments
- Add comprehensive unit tests for NIXL mode in updateRoutingContextWithKVTransferParams
  - Extended TestUpdateRoutingContextWithKVTransferParams with SHFS and NIXL mode tests
  - Added TestUpdateRoutingContextNIXLMode for detailed NIXL behavior validation
  - Tests verify disagg_prefill_resp wrapper for Neuron backend
  - Tests ensure backward compatibility for SHFS mode
- rm PR-neuron-disaggregated-inference.md
- style: fix gofmt alignment in test file
- Fix TestWithIPPods test by using non-empty model name

  The test was failing because the empty model name was not being properly handled by the cache. The getModelNameFromPod function rejects empty model names, causing pods not to be added to the cache properly. Changed the model name from '' to 'test-model' to ensure proper cache initialization and metric storage.

Signed-off-by: Yahav <yahavb@amazon.com>
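The connector dispatch described in the commit message can be sketched in Go. This is a minimal, hypothetical illustration, not the code in pd_disaggregation.go: only the AIBRIX_KV_CONNECTOR_TYPE variable, the disagg_prefill_resp and kv_transfer_params field names, and the rule ordering (NIXL selected by connector type alone; SHFS kv_transfer_params only when llmEngine == VLLMEngine) come from the message. The connector strings "shfs" and "nixl", the engine string "vllm", the SHFS default, the nesting of the whole prefill response under disagg_prefill_resp, and the buildDecodeRequest helper with its map-based request shape are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// KVConnectorType models the value of AIBRIX_KV_CONNECTOR_TYPE.
// The concrete strings "shfs" and "nixl" are assumptions for this sketch.
type KVConnectorType string

const (
	ConnectorSHFS KVConnectorType = "shfs"
	ConnectorNIXL KVConnectorType = "nixl"

	// VLLMEngine is an assumed engine identifier.
	VLLMEngine = "vllm"
)

// connectorTypeFromEnv reads AIBRIX_KV_CONNECTOR_TYPE. Any value other than
// "nixl" (including an unset variable) falls back to SHFS in this sketch.
func connectorTypeFromEnv() KVConnectorType {
	v := strings.ToLower(strings.TrimSpace(os.Getenv("AIBRIX_KV_CONNECTOR_TYPE")))
	if KVConnectorType(v) == ConnectorNIXL {
		return ConnectorNIXL
	}
	return ConnectorSHFS
}

// buildDecodeRequest (hypothetical helper) copies the client request body and
// attaches the prefill result in the shape the selected connector expects.
func buildDecodeRequest(body, prefillResp map[string]any, connector KVConnectorType, llmEngine string) map[string]any {
	out := make(map[string]any, len(body)+1)
	for k, v := range body {
		out[k] = v
	}
	switch {
	case connector == ConnectorNIXL:
		// NIXL: chosen by connector type alone, regardless of engine.
		// Nesting the whole prefill response here is an assumption.
		out["disagg_prefill_resp"] = prefillResp
	case llmEngine == VLLMEngine:
		// SHFS: keep the original vLLM-only kv_transfer_params handoff.
		if params, ok := prefillResp["kv_transfer_params"]; ok {
			out["kv_transfer_params"] = params
		}
	}
	// Any other engine in SHFS mode gets the body unchanged.
	return out
}

func main() {
	fmt.Println("configured connector:", connectorTypeFromEnv())

	body := map[string]any{"model": "test-model", "prompt": "hi"}
	prefill := map[string]any{"kv_transfer_params": map[string]any{"remote_host": "10.0.0.1"}}

	for _, c := range []KVConnectorType{ConnectorSHFS, ConnectorNIXL} {
		req := buildDecodeRequest(body, prefill, c, VLLMEngine)
		b, err := json.Marshal(req)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", c, b)
	}
}
```

Under this sketch's assumed SHFS default, a deployment that never sets AIBRIX_KV_CONNECTOR_TYPE keeps the original vLLM/SHFS handoff, which is the backward-compatibility property the added tests are described as checking.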

1 parent cc41a80 commit 839cfd9

6 files changed, +1033 -87 lines changed

docs
- neuron-disaggregated-inference-setup.md
pkg
- cache
  - informers.go
- plugins/gateway/algorithms
samples/disaggregation/neuron
- pool.yaml
