
Commit 0d72122

Added the README and script files for training sql_agent on NPU (#272)
Co-authored-by: Yuge Zhang <[email protected]>
1 parent e49b75b commit 0d72122

File tree

3 files changed (+62, −4 lines)


docs/how-to/train-sql-agent.md

Lines changed: 38 additions & 0 deletions

@@ -321,6 +321,44 @@ For the LLaMA profile, export an `HF_TOKEN` before running so VERL can download

```
env RAY_DEBUG=legacy HYDRA_FULL_ERROR=1 VLLM_USE_V1=1 ray start --head --dashboard-host=0.0.0.0
```

!!! note "Launching Training with NPUs"

    The example also supports running on **Huawei Ascend NPUs**. This feature was contributed by [teams from Huawei](https://github.com/microsoft/agent-lightning/pull/272). To use it, select the `config_train_npu` function in the script.

    **Supported hardware:** Atlas 200T A2 Box16, Atlas 900 A2 PODc, Atlas 800T A3. At least **one 40GB NPU** is required to run the **Qwen2.5-Coder-1.5B-Instruct** model.

    **Environment setup:** Python 3.11.13, CANN 8.2.RC1, torch 2.7.1+cpu, torch_npu 2.7.1.dev20250724. For basic environment preparation, refer to this [document](https://gitcode.com/Ascend/pytorch).

    Before installing dependencies, configure the following pip mirrors:

    ```bash
    pip config set global.index-url http://repo.huaweicloud.com/repository/pypi/simple
    pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"
    ```

    Then install vLLM, vLLM-Ascend, and VERL:

    ```bash
    pip install vllm==0.10.0 --trusted-host repo.huaweicloud.com
    pip install vllm-ascend==0.10.0rc1 --trusted-host repo.huaweicloud.com
    pip install verl==0.5.0
    ```

    To ensure the VERL framework runs correctly on NPU, add the following lines to `verl/utils/vllm_utils.py`:

    ```python
    from vllm_ascend.patch import platform
    from vllm_ascend.patch import worker
    ```

    See [vllm-ascend issue #1776](https://github.com/vllm-project/vllm-ascend/issues/1776) for more details.

    After the dependencies are installed, run the following command from [`examples/spider`]({{ src("examples/spider") }}):

    ```bash
    python train_sql_agent.py npu
    ```

### Debugging the Agent without VERL

[`sql_agent.py`]({{ src("examples/spider/sql_agent.py") }}) also provides a `debug_sql_agent()` helper to run the LangGraph workflow directly against a local or hosted OpenAI-compatible endpoint before using VERL.
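The two patch imports above make `vllm_utils.py` depend unconditionally on `vllm-ascend`. As a minimal sketch (the `try`/`except` guard is an assumption for illustration, not part of the commit), the imports could be made optional so the same file still loads on hosts without the Ascend stack:

```python
# Hypothetical guard around the NPU patch imports: on hosts where
# vllm-ascend is not installed, the patches are simply skipped.
try:
    from vllm_ascend.patch import platform  # noqa: F401
    from vllm_ascend.patch import worker  # noqa: F401

    ASCEND_PATCHED = True
except ImportError:
    ASCEND_PATCHED = False

print(ASCEND_PATCHED)
```

With this guard, the same `vllm_utils.py` works on both NPU and GPU machines; `ASCEND_PATCHED` records whether the Ascend patches were applied.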

examples/spider/README.md

Lines changed: 2 additions & 0 deletions

@@ -37,6 +37,8 @@ Train a SQL agent using the Qwen2.5-Coder-1.5B-Instruct model with the following

```
python train_sql_agent.py qwen
```

If you want to use an NPU for training, refer to the **Launching Training with NPUs** section in [How to Train a SQL Agent](../../docs/how-to/train-sql-agent.md).

### Debugging

To test and debug the SQL agent interactively:

examples/spider/train_sql_agent.py

Lines changed: 22 additions & 4 deletions

```diff
@@ -137,6 +137,20 @@ def config_train_qwen() -> Dict[str, Any]:
     return config


+def config_train_npu() -> Dict[str, Any]:
+    """A configuration for training with NPU."""
+
+    config = deepcopy(RL_TRAINING_CONFIG)
+    del config["actor_rollout_ref"]["rollout"]["engine_kwargs"]["vllm"]["enable_auto_tool_choice"]
+    del config["actor_rollout_ref"]["rollout"]["engine_kwargs"]["vllm"]["tool_call_parser"]
+    del config["trainer"]["logger"][1]
+    config["actor_rollout_ref"]["actor"]["use_torch_compile"] = False
+    config["trainer"]["val_before_train"] = False
+    config["trainer"]["save_freq"] = 256
+    config["trainer"]["device"] = "npu"
+    return config
+
+
 def config_train_llama() -> Dict[str, Any]:
     """A configuration for training with LLaMA-3.2-1B-Instruct.

@@ -171,8 +185,8 @@ def main() -> None:

     parser.add_argument(
         "config",
-        choices=["fast", "qwen", "llama"],
-        help="Training configuration: 'fast' (CI testing), 'qwen' (Qwen-2.5-Coder-1.5B), 'llama' (LLaMA-3.2-3B)",
+        choices=["fast", "qwen", "llama", "npu"],
+        help="Training configuration: 'fast' (CI testing), 'qwen' (Qwen-2.5-Coder-1.5B), 'llama' (LLaMA-3.2-3B), 'npu' (train with NPU)",
     )

     parser.add_argument(
@@ -182,8 +196,12 @@ def main() -> None:
     args = parser.parse_args()

     # Get the appropriate configuration
-    config_functions = {"fast": config_train_fast, "qwen": config_train_qwen, "llama": config_train_llama}
-
+    config_functions = {
+        "fast": config_train_fast,
+        "qwen": config_train_qwen,
+        "llama": config_train_llama,
+        "npu": config_train_npu,
+    }
     config = config_functions[args.config]()

     # Set active agent - use provided value or default based on config choice
```
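The `config_train_npu` changes amount to deleting the two vLLM tool-calling engine kwargs, dropping the second logger, and overriding a few trainer fields on a deep copy of the base config. The same edits can be sketched on a toy stand-in for `RL_TRAINING_CONFIG` (the nested key paths mirror the commit; the values here are illustrative placeholders, not the real defaults):

```python
from copy import deepcopy
from typing import Any, Dict

# Toy stand-in for RL_TRAINING_CONFIG; only the keys touched by
# config_train_npu are modeled, and the values are placeholders.
RL_TRAINING_CONFIG: Dict[str, Any] = {
    "actor_rollout_ref": {
        "actor": {"use_torch_compile": True},
        "rollout": {
            "engine_kwargs": {
                "vllm": {
                    "enable_auto_tool_choice": True,
                    "tool_call_parser": "hermes",
                }
            }
        },
    },
    "trainer": {
        "logger": ["console", "wandb"],
        "val_before_train": True,
        "save_freq": 64,
    },
}


def config_train_npu() -> Dict[str, Any]:
    """Mirror of the commit's NPU edits, applied to the toy config."""
    config = deepcopy(RL_TRAINING_CONFIG)  # leave the base config untouched
    vllm_kwargs = config["actor_rollout_ref"]["rollout"]["engine_kwargs"]["vllm"]
    del vllm_kwargs["enable_auto_tool_choice"]  # not used on NPU
    del vllm_kwargs["tool_call_parser"]
    del config["trainer"]["logger"][1]  # drop the second logger backend
    config["actor_rollout_ref"]["actor"]["use_torch_compile"] = False
    config["trainer"]["val_before_train"] = False
    config["trainer"]["save_freq"] = 256
    config["trainer"]["device"] = "npu"
    return config


cfg = config_train_npu()
print(cfg["trainer"]["device"])  # → npu
```

Because the function works on a `deepcopy`, the shared `RL_TRAINING_CONFIG` stays intact for the other profiles (`fast`, `qwen`, `llama`) selected via the same `config_functions` dispatch.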
