Current State
Ray workers are treated as homogeneous nodes. The scheduler does not distinguish between compute-heavy tasks, such as FHE bootstrapping (programmable bootstrapping, PBS), and memory-heavy tasks, such as linear layers and rescaling. This leads to inefficient resource utilization on mixed-hardware clusters (e.g., A100s mixed with T4s or CPUs).
Desired State
The Ray scheduler intelligently routes FHE operations based on hardware capabilities. Complex operations (PBS, ReLUs) are routed to deep nodes (high-compute GPUs), while lighter linear operations are routed to shallow nodes (CPUs or lighter GPUs), mimicking the FLASH-FHE hardware architecture in software.
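The routing described above can be sketched as a mapping from FHE operation type to a custom Ray resource demand, which the Ray scheduler then honors at placement time. This is a minimal illustration; the op names, resource labels, and `resources_for` helper are hypothetical, not part of any existing codebase.

```python
# Sketch of depth-based routing for FHE ops (all names are illustrative).
# Compute-heavy ops (bootstrapping/PBS, non-linear activations) are tagged
# for "deep" nodes; linear algebra is tagged for "shallow" nodes.
DEEP_OPS = {"pbs", "bootstrap", "relu"}
SHALLOW_OPS = {"matmul", "rescale", "add"}

def resources_for(op: str) -> dict:
    """Return the custom-resource demand to attach to this op's Ray task."""
    op = op.lower()
    if op in DEEP_OPS:
        return {"accelerator_type:deep": 1}
    if op in SHALLOW_OPS:
        return {"accelerator_type:shallow": 1}
    return {}  # no constraint: the scheduler may place it on any node

# With Ray installed, a task would then be submitted along the lines of:
#   ray.remote(run_op).options(resources=resources_for("pbs")).remote(args)
```

Ray only schedules a task onto a node whose declared resources cover the request, so tagging alone is enough to keep PBS off shallow workers.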
Actions
- Define custom Ray resources in cluster config (e.g., accelerator_type:deep for A100s, accelerator_type:shallow for T4/CPU).
- Update model parser to tag operations with resource requirements (e.g., ReLU requires deep, MatMul requires shallow).
- Implement a ray.workflow pipeline to handle asynchronous data handoff between different node types.
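The first action could look like the following, assuming the Ray autoscaler cluster-YAML format; the node-type names and worker counts are illustrative:

```yaml
# Illustrative Ray cluster config: tag node types with custom resources
# so the scheduler can tell deep and shallow workers apart.
available_node_types:
  deep_gpu_workers:            # e.g. A100 machines
    min_workers: 0
    max_workers: 4
    resources: {"accelerator_type:deep": 1}
  shallow_workers:             # e.g. T4 or CPU machines
    min_workers: 0
    max_workers: 8
    resources: {"accelerator_type:shallow": 1}
```

For manually started clusters, the same tags can be attached at node startup with `ray start --resources='{"accelerator_type:deep": 1}'`.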
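The asynchronous handoff in the last action can be sketched with stdlib thread pools standing in for the two worker groups; in Ray, each stage would be a remote task and the handoff an `ObjectRef` passed between them. The stage functions and pool names here are hypothetical placeholders.

```python
# Stdlib sketch of asynchronous handoff between "shallow" and "deep" stages.
# In Ray, the pools below would be node groups selected via custom resources,
# and the intermediate value would flow between tasks as an ObjectRef.
from concurrent.futures import ThreadPoolExecutor

shallow_pool = ThreadPoolExecutor(max_workers=2)  # stands in for shallow nodes
deep_pool = ThreadPoolExecutor(max_workers=2)     # stands in for deep nodes

def matmul_stage(x):
    # Linear op: cheap compute, runs on shallow workers.
    return x * 2

def pbs_stage(x):
    # Bootstrapping: expensive compute, runs on deep workers.
    return x + 1

def run_layer(x):
    """Pipeline one layer: linear part on shallow, then PBS on deep."""
    linear = shallow_pool.submit(matmul_stage, x)
    # Hand the intermediate result off to the deep pool.
    return deep_pool.submit(pbs_stage, linear.result())

results = [run_layer(i).result() for i in range(3)]
```

The same shape maps onto `ray.workflow` by chaining remote tasks, with Ray's object store replacing the explicit `.result()` handoff.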
Success Criteria
- Cluster configuration accepts custom resource tags for different GPU types.
- DAG execution logs show PBS tasks exclusively running on deep nodes.
- End-to-end inference completes successfully with tasks distributed across heterogeneous workers.