Multi-GPU scheduling #315

Merged
antoniupop merged 1 commit into main from antoniu/multi-gpu
Jun 19, 2025

Conversation

@antoniupop
Collaborator

No description provided.

@antoniupop antoniupop force-pushed the antoniu/multi-gpu branch 2 times, most recently from db67dbd to d80eaf4 Compare June 17, 2025 08:12
@antoniupop antoniupop marked this pull request as ready for review June 17, 2025 08:13
@antoniupop antoniupop force-pushed the antoniu/multi-gpu branch 10 times, most recently from 82ef26a to bb057ba Compare June 17, 2025 11:50
@antoniupop antoniupop requested review from goshawk-3 and rudy-6-4 June 17, 2025 12:51
@antoniupop antoniupop self-assigned this Jun 17, 2025
@antoniupop antoniupop requested a review from Copilot June 17, 2025 14:23

Copilot AI left a comment

Pull Request Overview

This PR refactors scheduling to support multi-GPU environments by replacing single GPU key instances with vectors and adding a new locality field, while also updating CI configurations and benchmark features for latency and throughput.

  • Changes data structures and constructors (e.g. in the scheduler and keys modules) to use Vec<tfhe::CudaServerKey> instead of a single key.
  • Updates scheduling algorithms, benchmark configuration options, and CI workflows to accommodate new GPU scheduling policies and optimization targets.
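
A minimal sketch of how the vector of keys and the locality field might fit together. Everything here is an assumption for illustration, not code from this PR: `GpuKey` stands in for `tfhe::CudaServerKey`, `pick_gpu` is a hypothetical helper, and `locality` is assumed to hold a preferred GPU index with a negative value meaning "no preference".

```rust
// Hypothetical stand-in for tfhe::CudaServerKey so the sketch compiles alone.
#[derive(Debug, Clone, PartialEq)]
pub struct GpuKey {
    pub gpu_index: usize,
}

/// A dataflow graph node. `locality` is ASSUMED to be the index of the GPU
/// the node prefers, or a negative value when it has no preference.
pub struct OpNode {
    pub locality: i32,
}

pub struct Scheduler {
    /// One key per GPU, replacing the former single key.
    pub csks: Vec<GpuKey>,
}

impl Scheduler {
    /// Pick a GPU for `node`: honour a valid locality hint, otherwise wrap
    /// `fallback` around the number of GPUs (e.g. a round-robin counter).
    /// Returns None when no GPU keys are loaded.
    pub fn pick_gpu(&self, node: &OpNode, fallback: usize) -> Option<usize> {
        let n = self.csks.len();
        if n == 0 {
            return None;
        }
        if node.locality >= 0 && (node.locality as usize) < n {
            Some(node.locality as usize)
        } else {
            Some(fallback % n)
        }
    }
}
```

With two keys loaded, a node with `locality: 1` would be placed on GPU 1, while a node with `locality: -1` and a fallback counter of 3 would also land on GPU 1 (3 mod 2). The actual policy in the PR may differ.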

Reviewed Changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| coprocessor/fhevm-engine/scheduler/src/dfg/scheduler.rs | Updated GPU scheduling logic with a new locality field and changed csks field type to a vector. |
| coprocessor/fhevm-engine/scheduler/src/dfg.rs | Added a locality field to OpNode to support GPU scheduling. |
| coprocessor/fhevm-engine/fhevm-engine-common/src/keys.rs | Updated the GPU key storage to a vector with conditional configuration for latency versus non-latency modes. |
| coprocessor/fhevm-engine/executor/src/server.rs | Modified GPU key handling to index into the vector (using csks[0]) for setting the server key. |
| Various benchmark and CI files | Introduced new features (bench, latency, throughput) and updated workflow inputs for scheduling and optimization target. |
Files not reviewed (2)
  • coprocessor/fhevm-engine/coprocessor/.sqlx/query-514d2ee9ff416b42cc48b33df6a2ef5bb25f3ed9abdd2a03957d42252812278a.json: Language not supported
  • coprocessor/fhevm-engine/coprocessor/.sqlx/query-9d7dd33f60a053cf0f7e9a480d32ae6aaf684011580f0534afb25e0dd39d50a9.json: Language not supported
Comments suppressed due to low confidence (2)

coprocessor/fhevm-engine/scheduler/src/dfg/scheduler.rs:58

  • Ensure that all code paths accessing the csks vector perform a runtime check or handle the case when the vector is empty to avoid potential panics when indexing.
    csks: Vec<tfhe::CudaServerKey>,
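
One way to act on this suggestion is to go through `slice::get` rather than direct indexing, so an empty or too-short vector becomes an error value instead of a panic. The sketch below assumes a placeholder `GpuKey` type in place of `tfhe::CudaServerKey`; `key_for_gpu` and `KeyError` are hypothetical names, not part of the PR.

```rust
// Hypothetical stand-in for tfhe::CudaServerKey.
#[derive(Debug, PartialEq)]
pub struct GpuKey {
    pub gpu_index: usize,
}

#[derive(Debug, PartialEq)]
pub enum KeyError {
    /// The key vector is empty (e.g. no GPU was detected at start-up).
    NoGpuKeys,
    /// The requested GPU index is beyond the loaded keys.
    OutOfRange { requested: usize, available: usize },
}

/// Return the key for `gpu` without panicking on an empty or short vector.
pub fn key_for_gpu(csks: &[GpuKey], gpu: usize) -> Result<&GpuKey, KeyError> {
    if csks.is_empty() {
        return Err(KeyError::NoGpuKeys);
    }
    csks.get(gpu).ok_or(KeyError::OutOfRange {
        requested: gpu,
        available: csks.len(),
    })
}
```

A call site that currently writes `csks[0]` could then use `key_for_gpu(&csks, 0)?` and propagate the error to its caller.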

coprocessor/fhevm-engine/fhevm-engine-common/src/keys.rs:84

  • [nitpick] Consider adding inline comments explaining the conditional compilation for 'latency' versus default GPU key decompression so future maintainers can easily understand the rationale behind the different initialization flows.
            gpu_server_key: (0..get_number_of_gpus() as u32)

@antoniupop antoniupop force-pushed the antoniu/multi-gpu branch 4 times, most recently from 6c96f19 to 9f1b474 Compare June 18, 2025 08:02
@antoniupop antoniupop merged commit 75a6abe into main Jun 19, 2025
70 checks passed
@antoniupop antoniupop deleted the antoniu/multi-gpu branch June 19, 2025 10:09
3 participants