This project is under active development. There are no releases yet, and the interface and feature set may change frequently.
This project combines StarPU and LibTorch to efficiently schedule deep learning inference tasks across the CPUs and GPUs of a compute node. The main goal is to maximize throughput while keeping latency under control, by leveraging asynchronous, heterogeneous execution.
- Perform inference of TorchScript models (e.g., ResNet, BERT) using LibTorch.
- Dynamically schedule inference tasks between CPU and GPU using StarPU (a sketch follows this list).
- Optimize throughput while satisfying latency constraints.
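At runtime, each inference request can be wrapped in a StarPU task whose codelet carries both a CPU and a CUDA implementation, so StarPU's scheduler decides per task where it runs. Below is a minimal sketch of that pattern, not this project's actual code: the `model.pt` path, the input shape, and the single global module per device are illustrative assumptions (a real server would manage model replicas and thread safety).

```cpp
#include <starpu.h>
#include <torch/script.h>

// Illustrative simplification: one module per device, loaded once at startup.
static torch::jit::script::Module cpu_model;
static torch::jit::script::Module gpu_model;

// CPU implementation of the codelet: run the TorchScript forward pass on a CPU worker.
static void infer_cpu(void *buffers[], void *cl_arg) {
    (void)buffers; (void)cl_arg;
    torch::NoGradGuard no_grad;
    auto input = torch::randn({1, 3, 224, 224});          // placeholder input
    cpu_model.forward({input});
}

// CUDA implementation: same model, input moved to the GPU StarPU assigned.
static void infer_cuda(void *buffers[], void *cl_arg) {
    (void)buffers; (void)cl_arg;
    torch::NoGradGuard no_grad;
    auto input = torch::randn({1, 3, 224, 224}).to(torch::kCUDA);
    gpu_model.forward({input});
}

// A codelet exposing both implementations is what lets StarPU pick
// CPU or GPU per task at runtime.
static struct starpu_codelet inference_cl;

int main() {
    if (starpu_init(nullptr) != 0) return 1;

    inference_cl.cpu_funcs[0]  = infer_cpu;
    inference_cl.cuda_funcs[0] = infer_cuda;
    inference_cl.nbuffers      = 0;  // real code would pass tensors via StarPU data handles

    cpu_model = torch::jit::load("model.pt");             // assumed model path
    gpu_model = torch::jit::load("model.pt");
    gpu_model.to(torch::kCUDA);

    for (int i = 0; i < 16; ++i) {                        // pretend 16 requests arrived
        struct starpu_task *task = starpu_task_create();
        task->cl = &inference_cl;
        starpu_task_submit(task);                         // asynchronous: returns immediately
    }
    starpu_task_wait_for_all();
    starpu_shutdown();
    return 0;
}
```

Submission is asynchronous, which is what allows many requests to be in flight at once (throughput) while the scheduler's placement decisions bound how long any single request waits (latency).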
See the installation guide for setup instructions, including dependency lists and native build steps. See the Docker guide for image build commands and execution.
Follow the Quickstart guide to:
- Build the gRPC inference server.
- Export the `bert-base-uncased` TorchScript model.
- Launch the server with the provided configuration (a minimal load-and-forward sketch follows this list).
- Drive it with the Python gRPC client or write your own.
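Once the TorchScript file exists, what the server does per request reduces to a LibTorch load-and-forward. Here is a standalone sketch of that step, with assumptions flagged: the `bert-base-uncased.pt` file name, the sequence length, and the all-ones token ids are illustrative, and the tuple-shaped output assumes the model was exported via the usual HuggingFace TorchScript tracing path.

```cpp
#include <torch/script.h>
#include <iostream>

int main() {
    // Load the TorchScript module produced by the export step
    // (file name is an assumption, not the project's configured path).
    torch::jit::script::Module model = torch::jit::load("bert-base-uncased.pt");
    model.eval();

    // BERT expects integer token ids plus an attention mask of the same shape.
    // These ids are dummies; a real request carries tokenizer output.
    const int64_t seq_len = 16;
    auto input_ids      = torch::ones({1, seq_len}, torch::kInt64);
    auto attention_mask = torch::ones({1, seq_len}, torch::kInt64);

    torch::NoGradGuard no_grad;
    auto output = model.forward({input_ids, attention_mask});

    // Traced HuggingFace models typically return a tuple whose first element
    // is the last hidden state, shaped [batch, seq_len, hidden].
    auto hidden = output.toTuple()->elements()[0].toTensor();
    std::cout << hidden.sizes() << std::endl;
    return 0;
}
```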
The documentation index lives in the docs folder.