daxmawal/StarPU-Inference-Server

StarPU Inference Server

⚠️ Project Status: In Development

This project is under active development. There are no releases yet, and the interface and features may change frequently.

Inference Scheduling with StarPU and LibTorch

This project combines StarPU and LibTorch to efficiently schedule deep learning inference tasks across the CPUs and GPUs of a compute node. The main goal is to maximize throughput while keeping latency under control by leveraging asynchronous, heterogeneous execution.
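StarPU is a C runtime and the server itself is built on LibTorch's C++ API, but the core idea — submitting inference tasks asynchronously and letting a scheduler place each one on whichever CPU or GPU worker is free — can be sketched in plain Python. Everything below (the device names, the greedy first-free policy, the stand-in task body) is illustrative only and is not part of this project's API:

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker pool; StarPU would discover real CPU cores and CUDA devices.
devices = queue.Queue()
for name in ["cpu0", "cpu1", "gpu0"]:
    devices.put(name)

def run_inference(request_id):
    # Grab whichever worker frees up first (a greedy scheduling policy),
    # run the task on it, then return the worker to the pool.
    device = devices.get()
    try:
        result = (request_id, device)  # stand-in for a LibTorch forward pass
    finally:
        devices.put(device)
    return result

# Submit tasks asynchronously; completion order need not match submission order.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run_inference, i) for i in range(8)]
    results = [f.result() for f in futures]

print(sorted(r[0] for r in results))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

StarPU generalizes this picture with data-aware scheduling policies and per-device codelet implementations, which is what lets the server keep both CPUs and GPUs busy at once.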

Goals

  • Perform inference of TorchScript models (e.g., ResNet, BERT) using LibTorch.
  • Dynamically schedule inference tasks between CPU and GPU using StarPU.
  • Optimize throughput while satisfying latency constraints.

Installation

See installation for setup instructions, including dependency lists and native build steps. See the Docker guide for Docker image build commands and execution.

Quickstart

Follow the Quickstart guide to:

  1. Build the gRPC inference server.
  2. Export the bert-base-uncased TorchScript model.
  3. Launch the server with the provided configuration.
  4. Drive it with the provided Python gRPC client, or author your own.
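Step 2 above exports the bert-base-uncased TorchScript model; the actual export script ships with the repository, but the general TorchScript export pattern looks like the following sketch. A tiny stand-in module is used here instead of BERT, and the class name and output file name are illustrative, not the project's:

```python
import torch

class TinyModel(torch.nn.Module):
    # Stand-in for bert-base-uncased; the real export scripts/traces the HF model.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
scripted = torch.jit.script(model)  # or torch.jit.trace(model, example_input)
scripted.save("model.pt")           # the servable TorchScript artifact

# The server (via LibTorch in C++) can then load and execute this file;
# the same round-trip works from Python:
loaded = torch.jit.load("model.pt")
out = loaded(torch.randn(1, 4))
print(out.shape)  # torch.Size([1, 2])
```

`torch.jit.script` compiles the module's Python source, while `torch.jit.trace` records one concrete execution; models with data-dependent control flow generally need scripting.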

Documentation

The documentation index lives in the docs folder.
