
[FEA] Add testing to help defend against thread-safety errors  #16584

Open
@GregoryKimball

Description


Is your feature request related to a problem? Please describe.
During the 24.08 release cycle, libcudf received reports of two thread-safety errors. In #16405, we learned that cuco's legacy static_map used a counter that was not thread-safe. In #16426, we learned that a std::map inside a singleton class was accessed in a way that was not thread-safe.
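Both reports come down to shared mutable state being touched by several host threads without synchronization. The sketch below is not the actual fix in cuco or libcudf. It uses a hypothetical Registry singleton (all names invented for illustration) to show the two standard remedies: a std::atomic counter for simple counts, and a std::mutex guarding every access to a std::map.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical singleton, loosely modeled on the "std::map in a singleton"
// pattern described above. Not libcudf code.
class Registry {
 public:
  static Registry& instance()
  {
    // Function-local statics are initialized exactly once, even when the
    // first calls race (guaranteed since C++11).
    static Registry r;
    return r;
  }

  // Return the id for `key`, inserting a new id if the key is absent.
  int get_or_insert(std::string const& key)
  {
    std::lock_guard<std::mutex> lock(mutex_);  // serialize all map access
    auto it = map_.find(key);
    if (it != map_.end()) { return it->second; }
    int id = next_id_++;
    map_.emplace(key, id);
    return id;
  }

  std::size_t size()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return map_.size();
  }

  // Atomic counter: safe to increment from many threads without a lock.
  void record_lookup() { lookups_.fetch_add(1, std::memory_order_relaxed); }
  long lookups() const { return lookups_.load(); }

 private:
  Registry() = default;
  std::mutex mutex_;
  std::map<std::string, int> map_;
  int next_id_ = 0;  // only read or written while holding mutex_
  std::atomic<long> lookups_{0};
};

// Hammer the registry from `num_threads` threads, each inserting the same
// `num_keys` keys. Returns the final number of distinct keys.
std::size_t hammer_registry(int num_threads, int num_keys)
{
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([num_keys] {
      for (int k = 0; k < num_keys; ++k) {
        Registry::instance().get_or_insert("key" + std::to_string(k));
        Registry::instance().record_lookup();
      }
    });
  }
  for (auto& th : threads) { th.join(); }
  return Registry::instance().size();
}
```

With the lock and the atomic in place, 8 threads inserting the same 100 keys always leave exactly 100 entries and 800 recorded lookups. Removing either one turns this into a data race whose symptoms (lost counts, corrupted tree nodes, segfaults) appear only intermittently.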

Describe the solution you'd like
libcudf should have some C++ multi-threaded tests that attempt to catch thread-safety issues before they impact our partners.
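One common shape for such a test is to launch many threads, hold them at a start gate so they enter the code under test at the same moment, run the same operation in each, and check that every thread produced the same answer as a single-threaded reference run. The harness below is a hypothetical sketch in standard C++ (run_concurrently and matches_serial are invented names). In a libcudf test the callable would wrap a libcudf API call, and building the test with ThreadSanitizer (-fsanitize=thread in GCC and Clang) could surface host-side races that do not happen to crash.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Run `op` once on each of `num_threads` threads. All threads wait on a
// shared start gate, which is opened only after every thread is launched,
// so they reach `op` at roughly the same time and maximize contention.
template <typename T>
std::vector<T> run_concurrently(int num_threads, std::function<T()> const& op)
{
  std::promise<void> start;
  std::shared_future<void> gate = start.get_future().share();
  std::vector<std::future<T>> results;
  results.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    results.push_back(std::async(std::launch::async, [gate, &op] {
      gate.wait();  // block until the gate is opened
      return op();
    }));
  }
  start.set_value();  // open the gate for all threads at once
  std::vector<T> out;
  out.reserve(num_threads);
  for (auto& f : results) { out.push_back(f.get()); }  // get() rethrows worker exceptions
  return out;
}

// True if every concurrent result equals a single-threaded reference result.
template <typename T>
bool matches_serial(int num_threads, std::function<T()> const& op)
{
  T const expected = op();
  for (T const& r : run_concurrently<T>(num_threads, op)) {
    if (!(r == expected)) { return false; }
  }
  return true;
}
```

Comparing against a serial run, rather than only checking for crashes, also catches races that silently corrupt results.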

In #16426, Spark-RAPIDS encountered segfaults in their test_avro integration test. This test appears to use PySpark to create a local Spark session and then read 50 files across multiple worker threads.
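A C++ analogue of that integration test might hand a fixed list of input files to a small pool of worker threads, with each worker claiming the next unread file index from a shared atomic counter. In this sketch, read_all is a hypothetical helper and read_one is a stand-in for a call into a libcudf reader; the usage below substitutes a trivial function of the path so the example runs without a GPU.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Read every path in `paths` using `num_workers` threads. Each worker claims
// the next unread index from a shared atomic counter, so no file is read
// twice and no lock is needed to distribute work. Each result goes to its
// own slot, so the output needs no lock either (avoid Result = bool, since
// std::vector<bool> packs elements into shared bytes). An exception thrown
// by `read_one` inside a worker would terminate the process.
template <typename Result>
std::vector<Result> read_all(std::vector<std::string> const& paths,
                             int num_workers,
                             std::function<Result(std::string const&)> const& read_one)
{
  std::vector<Result> results(paths.size());
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> workers;
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < paths.size(); i = next.fetch_add(1)) {
        results[i] = read_one(paths[i]);
      }
    });
  }
  for (auto& t : workers) { t.join(); }
  return results;
}
```

Because results are stored by file index, a test can compare the multi-threaded output against a serial pass file by file, which pinpoints the input that triggered a failure.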

I suggest that we build on recent benchmarking work for multi-threaded read_parquet, read_orc, and groupby_max to design a C++ integration test that we can run in CI or in nightly builds.

Perhaps we could run the libcudf TPC-H-derived examples at a tiny scale factor across multiple threads. That pattern would be useful for studying pipelining and might also shake out some threading problems.

Describe alternatives you've considered
Continue to rely on libcudf users to report thread-safety issues.

Metadata

Assignees

No one assigned

Labels

feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code.
tests: Unit testing for project

Type

No type

Projects

Status

To be revisited

Relationships

None yet

Development

No branches or pull requests
