Toloka

Data labeling platform for ML

Pinned

  1. tendem-evaluation Public

    Tendem hybrid AI+Human system benchmarking

    Python 2

  2. beemo Public

    Benchmark for fine-grained machine-generated text detection. 6.5k texts written by humans, generated by ten open-source instruction-finetuned LLMs and edited by expert annotators.

    9 2

  3. u-math Public

    Official evaluation code for the U-MATH and μ-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of LLMs on university-level problems.

    Python 11 3

  4. crowd-kit Public

    Control the quality of your labeled data with the Python tools you already know.

    Python 239 21

Repositories

Showing 10 of 31 repositories
  • template-builder
    TypeScript 4 Apache-2.0 1 1 2 Updated Mar 5, 2026
  • tendem-mcp Public

    Tendem MCP server

    Python 103 4 1 2 Updated Mar 4, 2026
  • CrowdSpeech Public

    Benchmark Dataset for Crowdsourced Audio Transcription

    Python 8 3 1 5 Updated Feb 20, 2026
  • dbt-af Public

    Distributed run of dbt models using Airflow

    Python 168 15 1 2 Updated Feb 11, 2026
  • u-math Public

    Official evaluation code for the U-MATH and μ-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of LLMs on university-level problems.

    Python 11 MIT 3 1 0 Updated Jan 30, 2026
  • pg-queue-playground Public

    Playground for transactional queues in PostgreSQL

    Java 5 0 2 1 Updated Jan 30, 2026
  • tendem-evaluation Public

    Tendem hybrid AI+Human system benchmarking

    Python 2 0 0 0 Updated Jan 27, 2026
  • crowd-kit Public

    Control the quality of your labeled data with the Python tools you already know.

    Python 239 21 4 2 Updated Dec 1, 2025
  • .github Public

    Niceties for GitHub

    0 0 0 0 Updated Nov 18, 2025
  • primeape Public

    Multilingual human preference prediction and explanation

    0 0 1 0 Updated Sep 8, 2025
