About GPU time slicing

GPU time slicing enables multiple workloads to share a single physical GPU by dividing processing time in short, alternating time slots. This method improves resource utilization, reduces idle GPU time, and allows multiple users to run AI/ML workloads concurrently in {productname-short}. The NVIDIA GPU Operator manages this scheduling based on a time-slicing-config ConfigMap that defines the number of GPU slices for each physical GPU.

Time-slicing differs from Multi-Instance GPU (MIG) partitioning. While MIG provides memory and fault isolation, time-slicing shares the same GPU memory across workloads without strict isolation. Time-slicing is ideal for lightweight inference tasks, data preprocessing, and other scenarios where full GPU isolation is unnecessary.

Consider the following points when using GPU time slicing:

Memory sharing: All workloads share GPU memory. High memory usage by one workload can impact others.
Performance trade-offs: While time slicing allows multiple workloads to share a GPU, it does not provide strict resource isolation like MIG.
GPU compatibility: Time slicing is supported on specific NVIDIA GPUs.

Additional resources

NVIDIA GPU Sharing Documentation
OpenShift AI CLI - Configuring Time Slicing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

about-gpu-time-slicing.adoc

about-gpu-time-slicing.adoc

About GPU time slicing

Files

about-gpu-time-slicing.adoc

Latest commit

History

about-gpu-time-slicing.adoc

File metadata and controls

About GPU time slicing