Commit fc0567a

fix makefile crds target and broken links in docs (#700)

Signed-off-by: bo.jiang <[email protected]>
Parent: aa903ee

3 files changed: +3 −2 lines

Makefile (1 addition, 0 deletions)

@@ -378,6 +378,7 @@ YQ = $(PROJECT_DIR)/bin/yq
 yq: ## Download yq locally if necessary.
 	GOBIN=$(PROJECT_DIR)/bin GO111MODULE=on $(GO_CMD) install github.com/mikefarah/yq/[email protected]
 
+.PHONY: crds
 crds: kustomize yq # update helm CRD files
 	$(KUSTOMIZE) build config/default \
 	| $(YQ) 'select(.kind == "CustomResourceDefinition")' \
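
Context on the fix (not part of the diff): without a .PHONY declaration, GNU make treats crds as a file target, so a file or directory named crds in the project root (should one ever exist) would make the target look up to date and the CRD regeneration would be silently skipped. A minimal sketch of that failure mode and the fix, using a hypothetical scratch directory:

  # A file named after a non-.PHONY target silently disables it.
  mkdir -p /tmp/demo && cd /tmp/demo
  printf 'crds:\n\techo regenerating CRDs\n' > Makefile
  make crds    # runs the recipe: echo regenerating CRDs
  touch crds   # create a stray file with the target's name
  make crds    # GNU make: "'crds' is up to date." -- recipe skipped
  printf '.PHONY: crds\ncrds:\n\techo regenerating CRDs\n' > Makefile
  make crds    # with .PHONY declared, the recipe runs regardless of the file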

site/content/en/docs/adoption/_index.md (1 addition, 1 deletion)

@@ -56,7 +56,7 @@ OptimizationJobs.
 it leverages LWS for multi-node inference, see documentation [here](https://docs.sglang.ai/ome/docs/concepts/inference_service/#multi-node-mode)
 
 [**SGLang**](https://github.com/sgl-project/sglang): SGLang, a fast serving framework for large language models and vision language models. It can be deployed with LWS on Kubernetes for
-distributed model serving, see documentation [here](https://docs.sglang.ai/references/deploy_on_k8s.html#deploy-on-kubernetes)
+distributed model serving, see documentation [here](https://docs.sglang.ai/references/multi_node_deployment/deploy_on_k8s.html)
 
 [**vLLM**](https://github.com/vllm-project/vllm): vLLM is a fast and easy-to-use library for LLM inference, it can be deployed with LWS on Kubernetes for distributed model serving, see documentation [here](https://docs.vllm.ai/en/stable/deployment/frameworks/lws.html).

site/content/en/docs/examples/sglang.md (1 addition, 1 deletion)

@@ -10,7 +10,7 @@ description: >
 
 In this example, we demonstrate how to deploy a distributed inference service using LeaderWorkerSet (LWS) with [SGLang](https://docs.sglang.ai/) on GPU clusters.
 
-SGLang provides native support for distributed tensor-parallel inference and serving, enabling efficient deployment of large language models (LLMs) such as DeepSeek-R1 671B and Llama-3.1-405B across multiple nodes. This example uses the [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model to demonstrate multi-node serving capabilities. For implementation details on distributed execution, see the SGLang docs [Run Multi-Node Inference](https://docs.sglang.ai/references/multi_node.html).
+SGLang provides native support for distributed tensor-parallel inference and serving, enabling efficient deployment of large language models (LLMs) such as DeepSeek-R1 671B and Llama-3.1-405B across multiple nodes. This example uses the [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model to demonstrate multi-node serving capabilities. For implementation details on distributed execution, see the SGLang docs [Run Multi-Node Inference](https://docs.sglang.ai/references/multi_node_deployment/multi_node.html).
 
 Since SGLang employs tensor parallelism for multi-node inference, which requires more frequent communications than pipeline parallelism, ensure high-speed bandwidth between nodes to avoid poor performance.
 
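
Aside (not part of the commit): once the LWS example from that doc is running, the endpoint can be smoke-tested against SGLang's OpenAI-compatible API. A minimal sketch, assuming a leader Service named sglang-leader on SGLang's default port 30000 (both names are assumptions here, not taken from this commit):

  # Forward the (assumed) leader Service locally, then send one request
  # to SGLang's OpenAI-compatible chat endpoint.
  kubectl port-forward svc/sglang-leader 30000:30000 &
  curl http://localhost:30000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
         "messages": [{"role": "user", "content": "Say hello"}]}'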
