Commit fc0567a

fix makefile crds target and broken links in docs (#700)

Signed-off-by: bo.jiang <[email protected]>
Parent: aa903ee

3 files changed: +3 −2 lines

Makefile (1 addition, 0 deletions)

@@ -378,6 +378,7 @@ YQ = $(PROJECT_DIR)/bin/yq
 yq: ## Download yq locally if necessary.
 	GOBIN=$(PROJECT_DIR)/bin GO111MODULE=on $(GO_CMD) install github.com/mikefarah/yq/[email protected]
 
+.PHONY: crds
 crds: kustomize yq # update helm CRD files
 	$(KUSTOMIZE) build config/default \
 	| $(YQ) 'select(.kind == "CustomResourceDefinition")' \
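
Context on the fix (not part of the diff): without a .PHONY declaration, GNU make treats crds as a file target, so a file or directory named crds in the project root (should one ever exist) would make the target look up to date and the CRD regeneration would be silently skipped. A minimal sketch of that failure mode and the fix, using a hypothetical scratch directory:

  # A file named after a non-.PHONY target silently disables it.
  mkdir -p /tmp/demo && cd /tmp/demo
  printf 'crds:\n\techo regenerating CRDs\n' > Makefile
  make crds    # runs the recipe: echo regenerating CRDs
  touch crds   # create a stray file with the target's name
  make crds    # GNU make: "'crds' is up to date." -- recipe skipped
  printf '.PHONY: crds\ncrds:\n\techo regenerating CRDs\n' > Makefile
  make crds    # with .PHONY declared, the recipe runs regardless of the file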

site/content/en/docs/adoption/_index.md (1 addition, 1 deletion)

@@ -56,7 +56,7 @@ OptimizationJobs.
 it leverages LWS for multi-node inference, see documentation [here](https://docs.sglang.ai/ome/docs/concepts/inference_service/#multi-node-mode)
 
 [**SGLang**](https://github.com/sgl-project/sglang): SGLang, a fast serving framework for large language models and vision language models. It can be deployed with LWS on Kubernetes for
-distributed model serving, see documentation [here](https://docs.sglang.ai/references/deploy_on_k8s.html#deploy-on-kubernetes)
+distributed model serving, see documentation [here](https://docs.sglang.ai/references/multi_node_deployment/deploy_on_k8s.html)
 
 [**vLLM**](https://github.com/vllm-project/vllm): vLLM is a fast and easy-to-use library for LLM inference, it can be deployed with LWS on Kubernetes for distributed model serving, see documentation [here](https://docs.vllm.ai/en/stable/deployment/frameworks/lws.html).

site/content/en/docs/examples/sglang.md (1 addition, 1 deletion)

@@ -10,7 +10,7 @@ description: >
 
 In this example, we demonstrate how to deploy a distributed inference service using LeaderWorkerSet (LWS) with [SGLang](https://docs.sglang.ai/) on GPU clusters.
 
-SGLang provides native support for distributed tensor-parallel inference and serving, enabling efficient deployment of large language models (LLMs) such as DeepSeek-R1 671B and Llama-3.1-405B across multiple nodes. This example uses the [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model to demonstrate multi-node serving capabilities. For implementation details on distributed execution, see the SGLang docs [Run Multi-Node Inference](https://docs.sglang.ai/references/multi_node.html).
+SGLang provides native support for distributed tensor-parallel inference and serving, enabling efficient deployment of large language models (LLMs) such as DeepSeek-R1 671B and Llama-3.1-405B across multiple nodes. This example uses the [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model to demonstrate multi-node serving capabilities. For implementation details on distributed execution, see the SGLang docs [Run Multi-Node Inference](https://docs.sglang.ai/references/multi_node_deployment/multi_node.html).
 
 Since SGLang employs tensor parallelism for multi-node inference, which requires more frequent communications than pipeline parallelism, ensure high-speed bandwidth between nodes to avoid poor performance.
 
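
Aside (not part of the commit): once the LWS example from that doc is running, the endpoint can be smoke-tested against SGLang's OpenAI-compatible API. A minimal sketch, assuming a leader Service named sglang-leader on SGLang's default port 30000 (both names are assumptions here, not taken from this commit):

  # Forward the (assumed) leader Service locally, then send one request
  # to SGLang's OpenAI-compatible chat endpoint.
  kubectl port-forward svc/sglang-leader 30000:30000 &
  curl http://localhost:30000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
         "messages": [{"role": "user", "content": "Say hello"}]}'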
