
    Repositories list

    • Code used for "ROC-n-reroll: How verifier imperfection affects test-time scaling" at ICLR 2026.
      Jupyter Notebook
      Updated Feb 20, 2026
    • folktexts

      Public
      Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data!
      Jupyter Notebook
      MIT License
      Updated Dec 14, 2025
    • Code to reproduce the paper "Monoculture or Multiplicity: Which Is It?"
      Jupyter Notebook
      MIT License
      Updated Oct 27, 2025
    • BenchBench is a Python package to evaluate multi-task benchmarks.
      Python
      MIT License
      Updated Oct 12, 2025
    • Jupyter Notebook
      MIT License
      Updated Sep 22, 2025
    • Python
      MIT License
      Updated Aug 30, 2025
    • Achieve error-rate fairness between societal groups for any score-based classifier.
      Python
      MIT License
      Updated Aug 21, 2025
    • A framework for few-shot evaluation of language models.
      Python
      MIT License
      Updated May 4, 2025
    • Code to reproduce the paper "Do causal predictors generalize better to new domains?"
      Python
      Other
      Updated Feb 7, 2025
    • Jupyter Notebook
      MIT License
      Updated Jan 22, 2025
    • Code to reproduce the paper "Questioning the Survey Responses of Large Language Models"
      Jupyter Notebook
      MIT License
      Updated Dec 8, 2024
    • Code to reproduce the experiments in the paper "Training on the Test Task Confounds Evaluation and Emergence."
      Jupyter Notebook
      Updated Dec 3, 2024
    • lawma

      Public
      Lawma: A lightly fine-tuned Llama model for legal classification tasks.
      Jupyter Notebook
      Updated Sep 14, 2024
    • Datasets derived from US census data
      Python
      MIT License
      Updated May 15, 2024
    • tttlm

      Public
      Test-time training on nearest neighbors for large language models
      Python
      MIT License
      Updated Apr 18, 2024
    • Code for "Is your model predicting the past?"
      Jupyter Notebook
      MIT License
      Updated Mar 10, 2024
    • whynot

      Public
      A Python sandbox for decision making in dynamics
      Python
      MIT License
      Updated Aug 21, 2023