Multi-GPU scheduling by antoniupop · Pull Request #315 · zama-ai/fhevm

antoniupop · 2025-06-17T08:00:00Z

No description provided.

Copilot

Pull Request Overview

This PR refactors scheduling to support multi-GPU environments by replacing single GPU key instances with vectors and adding a new locality field, while also updating CI configurations and benchmark features for latency and throughput.

Changes data structures and constructors (e.g. in scheduler and keys modules) to use Vec<tfhe::CudaServerKey> instead of a single key.
Updates scheduling algorithms, benchmark configuration options, and CI workflows to accommodate new GPU scheduling policies and optimization targets.

Reviewed Changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 1 comment.

File	Description
coprocessor/fhevm-engine/scheduler/src/dfg/scheduler.rs	Updated GPU scheduling logic with a new locality field and changed csks field type to a vector.
coprocessor/fhevm-engine/scheduler/src/dfg.rs	Added a locality field to OpNode to support GPU scheduling.
coprocessor/fhevm-engine/fhevm-engine-common/src/keys.rs	Updated the GPU key storage to a vector with conditional configuration for latency versus non-latency modes.
coprocessor/fhevm-engine/executor/src/server.rs	Modified GPU key handling to index into the vector (using csks[0]) for setting the server key.
Various benchmark and CI files	Introduced new features (bench, latency, throughput) and updated workflow inputs for scheduling and optimization target.
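
As an illustration of how a per-node locality could be used to pick one key out of the vector, here is a minimal sketch. It is not the PR's code: `GpuKey` is a hypothetical stand-in for `tfhe::CudaServerKey`, the `OpNode` shown here models only the new field, the `Option<usize>` type of `locality` is an assumption (the PR does not state the field's type here), and the fallback to GPU 0 is an illustrative policy.

```rust
/// Hypothetical stand-in for `tfhe::CudaServerKey` (the real type lives in the tfhe crate).
#[derive(Debug, Clone, PartialEq)]
pub struct GpuKey {
    pub gpu_index: usize,
}

/// Simplified, hypothetical node: only the locality field is modelled.
pub struct OpNode {
    /// Assumed representation: `Some(i)` pins the node to GPU `i`, `None` means unassigned.
    pub locality: Option<usize>,
}

/// Return the key matching the node's locality.
/// Falls back to the first key when the node is unassigned or the index is out of range,
/// and returns `None` only when no keys are loaded at all.
pub fn key_for_node<'a>(node: &OpNode, csks: &'a [GpuKey]) -> Option<&'a GpuKey> {
    match node.locality {
        Some(gpu) if gpu < csks.len() => csks.get(gpu),
        _ => csks.first(),
    }
}
```

With two keys loaded, a node with `locality: Some(1)` resolves to the second key, while an unassigned node resolves to the first.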

Files not reviewed (2)

coprocessor/fhevm-engine/coprocessor/.sqlx/query-514d2ee9ff416b42cc48b33df6a2ef5bb25f3ed9abdd2a03957d42252812278a.json: Language not supported
coprocessor/fhevm-engine/coprocessor/.sqlx/query-9d7dd33f60a053cf0f7e9a480d32ae6aaf684011580f0534afb25e0dd39d50a9.json: Language not supported

Comments suppressed due to low confidence (2)

coprocessor/fhevm-engine/scheduler/src/dfg/scheduler.rs:58

Ensure that all code paths accessing the csks vector perform a runtime check or handle the case when the vector is empty to avoid potential panics when indexing.

    csks: Vec<tfhe::CudaServerKey>,
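
One way to address the empty-vector concern is to route every access through a checked helper instead of indexing with `csks[0]` directly. The sketch below is a suggestion only; `KeyError` and `select_key` are hypothetical names that do not appear in the PR, and the helper is generic so it does not depend on the tfhe crate.

```rust
/// Hypothetical error type for key lookup failures.
#[derive(Debug, PartialEq)]
pub enum KeyError {
    /// No GPU keys were loaded at all.
    NoGpuKeys,
    /// The requested GPU index has no matching key.
    GpuOutOfRange { requested: usize, available: usize },
}

/// Checked replacement for `csks[gpu]`: returns an error instead of panicking.
pub fn select_key<T>(csks: &[T], gpu: usize) -> Result<&T, KeyError> {
    if csks.is_empty() {
        return Err(KeyError::NoGpuKeys);
    }
    csks.get(gpu).ok_or(KeyError::GpuOutOfRange {
        requested: gpu,
        available: csks.len(),
    })
}
```

A caller that currently writes `csks[0]` could call `select_key(&csks, 0)?` and surface the failure as a regular error.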

coprocessor/fhevm-engine/fhevm-engine-common/src/keys.rs:84

[nitpick] Consider adding inline comments explaining the conditional compilation for 'latency' versus default GPU key decompression so future maintainers can easily understand the rationale behind the different initialization flows.

            gpu_server_key: (0..get_number_of_gpus() as u32)
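
The kind of inline commentary suggested above might look like the sketch below. Everything in it is an assumption made for illustration: `GpuKey` and `decompress_for_gpu` are hypothetical, `get_number_of_gpus` is stubbed locally rather than taken from any library, and the split shown (a single key in `latency` builds, one key per GPU otherwise) is a guess at the intent that the PR text does not confirm.

```rust
/// Hypothetical stand-in for a decompressed GPU server key.
#[derive(Debug, Clone, PartialEq)]
pub struct GpuKey {
    pub gpu_index: u32,
}

/// Local stub; a real build would query the GPU runtime instead.
pub fn get_number_of_gpus() -> usize {
    2
}

/// Hypothetical decompression step that places a key on the given GPU.
fn decompress_for_gpu(gpu_index: u32) -> GpuKey {
    GpuKey { gpu_index }
}

// Assumed rationale: with the `latency` feature, a single operation should finish as fast
// as possible, so only one key is kept and the vector has exactly one element.
#[cfg(feature = "latency")]
pub fn build_gpu_keys() -> Vec<GpuKey> {
    vec![decompress_for_gpu(0)]
}

// Assumed rationale: without `latency`, independent operations are spread over the GPUs,
// so each GPU gets its own copy of the key and the vector is indexed by GPU number.
#[cfg(not(feature = "latency"))]
pub fn build_gpu_keys() -> Vec<GpuKey> {
    (0..get_number_of_gpus() as u32)
        .map(decompress_for_gpu)
        .collect()
}
```

Keeping both variants behind one function name means callers do not need their own `cfg` attributes.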

antoniupop force-pushed the antoniu/multi-gpu branch 2 times, most recently from db67dbd to d80eaf4 Compare June 17, 2025 08:12

antoniupop marked this pull request as ready for review June 17, 2025 08:13

antoniupop force-pushed the antoniu/multi-gpu branch 10 times, most recently from 82ef26a to bb057ba Compare June 17, 2025 11:50

antoniupop requested review from goshawk-3 and rudy-6-4 June 17, 2025 12:51

antoniupop self-assigned this Jun 17, 2025

rudy-6-4 reviewed Jun 17, 2025

Comment thread coprocessor/fhevm-engine/coprocessor/src/tfhe_worker.rs

antoniupop requested a review from Copilot June 17, 2025 14:23

Copilot AI reviewed Jun 17, 2025

Comment thread coprocessor/fhevm-engine/executor/src/server.rs

rudy-6-4 approved these changes Jun 17, 2025

antoniupop force-pushed the antoniu/multi-gpu branch 4 times, most recently from 6c96f19 to 9f1b474 Compare June 18, 2025 08:02

feat(coprocessor): add new scheduling infrastructure for multi-GPU

16bb9b7

antoniupop force-pushed the antoniu/multi-gpu branch from 9f1b474 to 16bb9b7 Compare June 19, 2025 08:44

antoniupop merged commit 75a6abe into main Jun 19, 2025
70 checks passed

antoniupop deleted the antoniu/multi-gpu branch June 19, 2025 10:09

antoniupop merged 1 commit into main from antoniu/multi-gpu

3 participants