libcudf-rs

Rust bindings for libcudf, the GPU-accelerated DataFrame library from RAPIDS.

Overview

This project provides safe, idiomatic Rust bindings to cuDF using the cxx library for seamless C++/Rust interoperability. cuDF enables GPU-accelerated operations on DataFrames, offering significant performance improvements for data processing tasks.

Executing SQL workloads on GPU

For SQL execution, this project uses Apache DataFusion with a physical optimizer rule that replaces vanilla DataFusion nodes with GPU variants.

Take the following query, Q1 of the TPCH benchmark:

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-09-02'
group by
    l_returnflag,
    l_linestatus
order by
    l_returnflag,
    l_linestatus;

DataFusion will produce the following executable plan:

SortPreservingMergeExec: [...]
  SortExec: expr=[...], preserve_partitioning=[...]
    ProjectionExec: expr=[...]
      AggregateExec: mode=FinalPartitioned, gby=[...], aggr=[...]
        RepartitionExec: partitioning=Hash([...], 4), input_partitions=4
          AggregateExec: mode=Partial, gby=[...], aggr=[...]
            ProjectionExec: expr=[...]
              FilterExec: <expr>, projection=[...]
                DataSourceExec: file_groups={4 groups: [...]}, projection=[...]

This project inspects the plan and replaces nodes with their cuDF (GPU)-based variants, producing a different executable plan that looks like this:

CuDFUnloadExec, metrics=[...]
  CuDFSortExec: expr=[...], preserve_partitioning=[...]
    CuDFProjectionExec: expr=[...]
      CuDFAggregateExec: mode=Single, group_by=[...], aggr_expr=[...]
        CuDFProjectionExec: expr=[...]
          CuDFFilterExec: l_shipdate@6 <= 1998-09-02, projection=[...]
            CuDFLoadExec, metrics=[...]
              DataSourceExec: file_groups={4 groups: [...]}, projection=[...]
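
The node replacement shown above can be pictured with a small, self-contained sketch. This is a toy model and not the DataFusion API or this project's actual rule: `Node`, `has_gpu_variant`, `rewrite_for_gpu` and the other helpers are hypothetical names, and the set of operators with a GPU variant is simply read off the plan above. It only covers swapping nodes and inserting load/unload boundaries; it does not model the removal of SortPreservingMergeExec and RepartitionExec or the merging of the two aggregate stages into mode=Single.

```rust
// Toy model of a linear physical plan. This is NOT the DataFusion API:
// every type and function name here is hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    name: String,
    on_gpu: bool,
    input: Option<Box<Node>>,
}

impl Node {
    /// Builds a plain CPU node with an optional input.
    fn cpu(name: &str, input: Option<Node>) -> Node {
        Node { name: name.to_string(), on_gpu: false, input: input.map(Box::new) }
    }
}

/// Operators assumed to have a cuDF counterpart (taken from the plan above).
fn has_gpu_variant(name: &str) -> bool {
    matches!(name, "SortExec" | "ProjectionExec" | "AggregateExec" | "FilterExec")
}

/// Wraps `child` in a boundary node that moves data between host and device.
fn boundary(name: &str, on_gpu: bool, child: Node) -> Node {
    Node { name: name.to_string(), on_gpu, input: Some(Box::new(child)) }
}

/// Rewrites the plan bottom-up, swapping supported nodes for GPU variants.
fn rewrite_node(node: Node) -> Node {
    let input = node.input.map(|child| rewrite_node(*child));
    if has_gpu_variant(&node.name) {
        // A GPU node needs its input on the device: load it if it is on the host.
        let input = input.map(|c| {
            if c.on_gpu { c } else { boundary("CuDFLoadExec", true, c) }
        });
        Node { name: format!("CuDF{}", node.name), on_gpu: true, input: input.map(Box::new) }
    } else {
        // A CPU node needs its input on the host: unload it if it is on the device.
        let input = input.map(|c| {
            if c.on_gpu { boundary("CuDFUnloadExec", false, c) } else { c }
        });
        Node { name: node.name, on_gpu: false, input: input.map(Box::new) }
    }
}

/// Entry point: the plan's consumer lives on the host, so unload at the root.
fn rewrite_for_gpu(root: Node) -> Node {
    let root = rewrite_node(root);
    if root.on_gpu { boundary("CuDFUnloadExec", false, root) } else { root }
}

/// Lists operator names from root to leaf.
fn names(root: &Node) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = Some(root);
    while let Some(n) = cur {
        out.push(n.name.clone());
        cur = n.input.as_deref();
    }
    out
}
```

Feeding a chain of SortExec, ProjectionExec, AggregateExec, ProjectionExec, FilterExec and DataSourceExec through `rewrite_for_gpu` and printing `names(...)` gives the same operator order as the GPU plan above, with DataSourceExec left on the CPU and a load/unload pair marking the host/device boundary.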

The cuDF-based plan is both cheaper and faster to execute than the pure CPU one. This was measured by comparing the execution latency of TPCH Q1 on two different machines:

Instance | Specs | Approx. monthly cost | TPCH Q1 latency
m5.4xlarge | 16 vCPU, 64 GB RAM | ~$625 | 906 ms
g4dn.xlarge | 4 vCPU, 16 GB RAM, NVIDIA T4 | ~$423 | 813 ms

The GPU-based machine is cheaper because it has fewer vCPUs and less RAM, yet it still executes TPCH Q1 slightly faster. Multiplying the price ratio (625 / 423 ≈ 1.48) by the latency ratio (906 / 813 ≈ 1.11) gives ≈ 1.65, so once latency is normalized, executing on GPU is about 1.65x cheaper with the current state of this project.
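
One way to reproduce the 1.65x figure is to treat cost per query as monthly price times query latency and compare the two machines. The sketch below assumes that is how the number was derived (it matches the table to two decimals); `Machine`, `cost_per_query` and `gpu_cost_advantage` are hypothetical names used only for this illustration.

```rust
/// Monthly price and measured TPCH Q1 latency for one machine.
struct Machine {
    monthly_usd: f64,
    q1_latency_ms: f64,
}

// Figures copied from the comparison table above.
const M5_4XLARGE: Machine = Machine { monthly_usd: 625.0, q1_latency_ms: 906.0 };
const G4DN_XLARGE: Machine = Machine { monthly_usd: 423.0, q1_latency_ms: 813.0 };

/// Relative cost of one query: price per unit of time multiplied by time per query.
/// The units cancel when two machines are compared, so no conversion is needed.
fn cost_per_query(m: &Machine) -> f64 {
    m.monthly_usd * m.q1_latency_ms
}

/// How many times cheaper a query is on `gpu` than on `cpu`.
fn gpu_cost_advantage(cpu: &Machine, gpu: &Machine) -> f64 {
    cost_per_query(cpu) / cost_per_query(gpu)
}
```

Calling `gpu_cost_advantage(&M5_4XLARGE, &G4DN_XLARGE)` returns roughly 1.65. The price ratio alone accounts for about 1.48x and the latency ratio for about 1.11x.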

What's next?

This project is the result of a hackathon that lasted a couple of weeks, and several low-hanging fruits remain that could make GPU execution significantly more performant.

Although the focus of this project is to get TPCH Q1 running faster and cheaper on GPU than on CPU, it is capable of running the full TPCH suite on GPU. Rather than implementing a wide breadth of features, it focuses on laying the foundations for executing relational algebra on GPUs for a wide variety of use cases.

Follow-up work will bring further performance improvements and support for new relational algebra operations.

Project Structure

The project is organized as a Rust workspace with the following crates:

libcudf-sys: Low-level FFI bindings to libcudf using cxx
libcudf-rs: Safe, high-level Rust API wrapping the FFI bindings
libcudf-datafusion: Integration with Apache DataFusion

Prerequisites

Before building this project, you need:

CUDA Toolkit: Required for GPU operations
- Install from NVIDIA CUDA Downloads
libcudf: The cuDF C++ library
- Build from source: cuDF build instructions
- Or install via conda: conda install -c rapidsai -c conda-forge cudf

Rust toolchain:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

C++ compiler: GCC 9+ or Clang that supports C++17

Building

Once dependencies are installed:

# Build the project
cargo build

# Run tests (requires CUDA-capable GPU)
cargo test

# Build with release optimizations
cargo build --release

Usage

Add this to your Cargo.toml:

[dependencies]
libcudf-rs = { path = "path/to/libcudf-rs" }
