Toloka

Data labeling platform for ML

Pinned

  1. tendem-evaluation (Public)

    Tendem hybrid AI+Human system benchmarking

    Language: Python · Stars: 2

  2. beemo (Public)

    Benchmark for fine-grained machine-generated text detection. 6.5k texts written by humans, generated by ten open-source instruction-finetuned LLMs and edited by expert annotators.

    Stars: 8 · Forks: 1

  3. u-math (Public)

    Official evaluation code for the U-MATH and μ-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of LLMs on university-level problems.

    Language: Python · Stars: 10 · Forks: 3

  4. crowd-kit (Public)

    Control the quality of your labeled data with the Python tools you already know.

    Language: Python · Stars: 236 · Forks: 19

Repositories

30 repositories
