
【Hackathon 10th Spring No.46】Add Windows platform guards for Python runtime (Part 3/3)#7503

Open
bobby-cloudforge wants to merge 1 commit intoPaddlePaddle:developfrom
CloudForge-Solutions:task/h10-046-win-python-runtime-1

@bobby-cloudforge

Motivation

Windows lacks POSIX primitives (os.setsid, os.killpg, os.fork, /dev/shm). This PR adds platform-conditional guards so FastDeploy's Python runtime modules work on both Linux and Windows without breaking existing Linux behaviour.

Modifications

  • Shared-memory paths: Replace hardcoded /dev/shm with tempfile.gettempdir() on Windows via a _shm_base helper variable in cache_messager.py, common_engine.py, engine.py, async_expert_loader.py, fmq.py, zmq_client.py, zmq_server.py, worker_process.py
  • Process group cleanup: Wrap os.killpg(os.getpgid(...)) in a sys.platform check, falling back to p.terminate() on Windows in common_engine.py, engine.py, expert_service.py
  • Process creation: Skip preexec_fn=os.setsid on Windows via conditional kwargs in common_engine.py, engine.py, prefix_cache_manager.py
  • Multiprocessing context: Use "spawn" instead of "fork" on Windows in engine.py
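The four guard patterns above can be condensed into a few small helpers. This is a minimal sketch, not the PR's actual code: the helper names (shm_base, popen_session_kwargs, mp_context) and the port number are hypothetical, and it assumes each call site only needs a base directory, some extra Popen keyword arguments, or a multiprocessing context.

```python
import multiprocessing
import os
import sys
import tempfile

IS_WINDOWS = sys.platform == "win32"


def shm_base() -> str:
    """Base dir for shared-memory files and sockets: /dev/shm on Linux, the temp dir on Windows."""
    return tempfile.gettempdir() if IS_WINDOWS else "/dev/shm"


def popen_session_kwargs() -> dict:
    """Extra subprocess.Popen kwargs: only request os.setsid where it exists."""
    return {} if IS_WINDOWS else {"preexec_fn": os.setsid}


def mp_context():
    """'spawn' on Windows (fork is unavailable), 'fork' elsewhere, as in the engine.py change."""
    return multiprocessing.get_context("spawn" if IS_WINDOWS else "fork")


if __name__ == "__main__":
    # Hypothetical port, used only to show how a socket path is built.
    print(os.path.join(shm_base(), "fd_task_queue_8000.sock"))
```

The guards mirror the PR in keying only on sys.platform == "win32". As a design note, the subprocess documentation describes start_new_session=True as the thread-safe alternative to preexec_fn=os.setsid, which may be worth considering for the Linux branch.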

Usage or Command

No API changes. Guards activate automatically when sys.platform == "win32".

Accuracy Tests

No behavioural change on Linux — all guards are behind sys.platform == "win32" checks.

Checklist

  • 10 files, +74/−28 — minimal, focused changes
  • No new dependencies
  • All pre-commit hooks pass
  • No behavioural changes on Linux

@paddle-bot

paddle-bot bot commented Apr 20, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 20, 2026
@bobby-cloudforge bobby-cloudforge changed the title 【Hackathon 10th Spring No.46】Add Windows platform guards for Python runtime (Part 3/3) 【Hackathon 10th Spring No.46】[Build] Windows Python runtime guards -part Apr 20, 2026
@bobby-cloudforge bobby-cloudforge changed the title 【Hackathon 10th Spring No.46】[Build] Windows Python runtime guards -part [CI]【Hackathon 10th Spring No.46】Add Windows platform guards for Python runtime (Part 3/3) Apr 20, 2026
@bobby-cloudforge bobby-cloudforge changed the title [CI]【Hackathon 10th Spring No.46】Add Windows platform guards for Python runtime (Part 3/3) 【Hackathon 10th Spring No.46】Add Windows platform guards for Python runtime (Part 3/3) Apr 20, 2026

@PaddlePaddle-bot PaddlePaddle-bot left a comment


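🤖 AI Code Review | 2026-04-20 14:06 CST

## 📋 Review Summary

PR overview: Adds Windows platform-compatibility guards to FastDeploy's Python runtime modules, replacing POSIX-only primitives (/dev/shm, os.setsid, os.killpg, fork).
Scope: engine/, cache_manager/, inter_communicator/, worker/, eplb/
Affected tags: Engine, KVCache

### 📝 PR Convention Check

The [Build] tag in the PR title is not in the official tag list. Since this PR mainly adapts the Windows runtime for platform compatibility, [Feature] or [Others] is recommended.

Suggested title (copy as-is):
- [Feature] Add Windows Python runtime guards for platform compatibility

### Issues

| Level | File | Summary |
|------|------|------|
| 🟡 Suggestion | Multiple files | /dev/shm replacement is incomplete; about 8 files are missed |
| 🟡 Suggestion | common_engine.py | The _shm_base ternary is repeated 10+ times; extract a shared function |
| 🟡 Suggestion | common_engine.py | On Windows, p.terminate() with shell=True cannot kill the child process tree |
| ❓ Question | engine.py | The spawn context requires all arguments to be picklable; compatibility needs confirming |

### Overall Assessment

The direction is right: the guard logic is clear and does not affect Linux behaviour. The main issues are incomplete coverage (about 8 files still hardcode /dev/shm) and a fair amount of duplicated code that should be extracted into a shared utility function. Recommend merging once the missed files are covered.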

     address = (pod_ip, engine_worker_queue_port)
 else:
-    address = f"/dev/shm/fd_task_queue_{engine_worker_queue_port}.sock"
+    _shm_base = "/dev/shm" if sys.platform != "win32" else tempfile.gettempdir()

🟡 Suggestion: /dev/shm replacement coverage is incomplete

A repository-wide search shows that the following files still hardcode /dev/shm and are not covered by this PR:

  • fastdeploy/entrypoints/openai/utils.py (2 occurrences: L117, L143)
  • fastdeploy/cache_manager/multimodal_cache_manager.py (1 occurrence: L130)
  • fastdeploy/input/multimodal_processor.py
  • fastdeploy/input/ernie4_5_vl_processor/process.py
  • fastdeploy/input/qwen_vl_processor/process.py
  • fastdeploy/input/qwen3_vl_processor/process.py (2 occurrences)
  • fastdeploy/input/paddleocr_vl_processor/process.py

Please cover these in this PR or a follow-up; otherwise Windows will still fail whenever execution reaches these paths.
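The repository-wide search above can be reproduced with a short script, which also makes it easy to re-check coverage after a follow-up PR. A minimal sketch, assuming the goal is to list every line in a .py file that still contains the literal "/dev/shm"; find_hardcoded_shm is a hypothetical helper, and grep -rn "/dev/shm" --include="*.py" does the same job from a shell.

```python
import os
import sys


def find_hardcoded_shm(root: str, needle: str = "/dev/shm") -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every .py line under root that contains needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue  # only Python sources are of interest here
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "fastdeploy"
    for path, lineno, line in find_hardcoded_shm(root):
        print(f"{path}:{lineno}: {line}")
```

Lines that are already guarded (for example the "/dev/shm" if sys.platform != "win32" else ... ternary) also match, so the output still needs a quick manual pass.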

 else:
-    address = f"/dev/shm/fd_task_queue_{self.cfg.parallel_config.local_engine_worker_queue_port}.sock"
+    # Shared-memory base: /dev/shm on Linux, tempdir on Windows
+    _shm_base = "/dev/shm" if sys.platform != "win32" else tempfile.gettempdir()

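🟡 Suggestion: the _shm_base ternary expression is repeated 10+ times in this PR; consider extracting it into a shared utility

It could be defined as a constant or function in fastdeploy/utils/ or fastdeploy/envs.py, for example:

# fastdeploy/utils/platform_compat.py
import sys
import tempfile

SHM_BASE = "/dev/shm" if sys.platform != "win32" else tempfile.gettempdir()

Each call site can then simply use from fastdeploy.utils.platform_compat import SHM_BASE. This lowers maintenance cost and makes future extensions easier (such as supporting a custom path).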
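Building on that suggestion, the shared module could also expose a path helper so call sites stop formatting f"/dev/shm/..." strings by hand. This is only a sketch: the FD_SHM_BASE environment variable and the get_shm_base/shm_path helpers are hypothetical illustrations of the "custom path" extension mentioned above, not existing FastDeploy settings.

```python
import os
import sys
import tempfile


def get_shm_base() -> str:
    """Resolve the shared-memory base directory at call time."""
    # Hypothetical override (e.g. FD_SHM_BASE=/mnt/fast_tmp); not an existing FastDeploy env var.
    override = os.environ.get("FD_SHM_BASE")
    if override:
        return override
    return tempfile.gettempdir() if sys.platform == "win32" else "/dev/shm"


def shm_path(filename: str) -> str:
    """Join filename onto the shared-memory base using the platform's path separator."""
    return os.path.join(get_shm_base(), filename)


# Module-level constant for call sites that only need the default, as in the review's example.
SHM_BASE = get_shm_base()
```

A call site such as the task-queue socket would then read address = shm_path(f"fd_task_queue_{engine_worker_queue_port}.sock"). Keeping a function next to the constant means an override set after import (or in tests) still takes effect.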

    pgid = os.getpgid(self.worker_proc.pid)
    os.killpg(pgid, signal.SIGTERM)
else:
    self.worker_proc.terminate()

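🟡 Suggestion: on Windows, p.terminate() combined with shell=True may not kill the actual child process

When a child is started with shell=True, subprocess.Popen actually creates cmd.exe (Windows) or /bin/sh (Linux) as an intermediate process. On Windows, p.terminate() terminates only cmd.exe, not its child process tree, which can leave orphaned processes.

Consider launching with subprocess.Popen(..., creationflags=subprocess.CREATE_NEW_PROCESS_GROUP) and signalling with os.kill(p.pid, signal.CTRL_BREAK_EVENT), or using taskkill /F /T /PID to terminate the whole process tree:

import os
import signal
import subprocess
import sys

if sys.platform == "win32":
    subprocess.call(["taskkill", "/F", "/T", "/PID", str(p.pid)])
else:
    pgid = os.getpgid(p.pid)
    os.killpg(pgid, signal.SIGTERM)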
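A fuller sketch pairs the launch flags with the matching teardown, so each platform signals the whole tree rather than only the shell. It assumes children are started through hypothetical launch/kill_tree helpers; start_new_session=True is the standard-library equivalent of preexec_fn=os.setsid, and CREATE_NEW_PROCESS_GROUP is the flag the os.kill(..., signal.CTRL_BREAK_EVENT) route needs. This version takes the taskkill route on Windows.

```python
import os
import signal
import subprocess
import sys


def launch(cmd, shell: bool = False) -> subprocess.Popen:
    """Start cmd in its own process group so the whole tree can be signalled later."""
    if sys.platform == "win32":
        # New process group: isolates the tree and allows CTRL_BREAK_EVENT delivery.
        return subprocess.Popen(
            cmd, shell=shell, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP
        )
    # Same effect as preexec_fn=os.setsid, but safe in multi-threaded parents.
    return subprocess.Popen(cmd, shell=shell, start_new_session=True)


def kill_tree(p: subprocess.Popen, timeout: float = 10.0) -> int:
    """Terminate p and its descendants, then return p's exit code."""
    if p.poll() is None:
        if sys.platform == "win32":
            # /T also kills child processes, so cmd.exe's children are not orphaned.
            subprocess.call(["taskkill", "/F", "/T", "/PID", str(p.pid)])
        else:
            try:
                # p leads its own session, so this group contains only p's tree.
                os.killpg(os.getpgid(p.pid), signal.SIGTERM)
            except ProcessLookupError:
                pass  # p exited between poll() and killpg()
    return p.wait(timeout=timeout)
```

The two halves belong together: calling os.killpg on a child that was not started in its own session would signal the parent's process group as well.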

 )
-ctx = multiprocessing.get_context("fork")
+# Windows: "spawn" required since fork is unavailable
+ctx = multiprocessing.get_context("spawn" if sys.platform == "win32" else "fork")

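❓ Question: the spawn context requires every argument in Process(target=..., args=(...)) to be picklable

Unlike fork, spawn does not inherit the parent process's memory; instead, the new process re-imports the module and unpickles the arguments. Please confirm that the start_data_parallel_service function and the cfg object passed to it (even though it is deep-copied) can be pickled without failing, for example because they hold unpicklable locks, file handles, or CUDA contexts.

If Windows ever actually reaches this code path, this could cause a PicklingError when the spawned process starts.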
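One way to answer this question before Windows ever reaches the path is to try pickling the arguments up front. A minimal sketch, assuming the object passed in args=(...) is a plain object whose instance attributes are the fields of interest; find_unpicklable and DummyConfig are hypothetical stand-ins, not FastDeploy code. An unpicklable lock surfaces as TypeError rather than pickle.PicklingError, so the check catches both (plus AttributeError for local objects).

```python
import pickle
import threading


def find_unpicklable(obj) -> list[str]:
    """Return the names of obj's instance attributes that cannot be pickled."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            bad.append(name)
    return bad


class DummyConfig:
    """Hypothetical stand-in for a config object; not FastDeploy's real cfg."""

    def __init__(self):
        self.model_path = "/models/demo"
        self.tensor_parallel_size = 2
        self.lock = threading.Lock()  # locks cannot cross a spawn boundary


if __name__ == "__main__":
    print(find_unpicklable(DummyConfig()))  # ['lock']
```

The target itself is pickled by its qualified name under spawn, so start_data_parallel_service needs to stay a module-level function rather than a closure or lambda.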
