[tts] Add sample-only reasoning TTS math slice #4774
taivu1998 wants to merge 3 commits into marin-community:main
🤖 Specification: Problem, Approach, Key code, Tests
Validation run locally on the PR branch: The targeted
This lands a standalone test_time_scaling package so candidate generation, selector replay, and artifact accounting live outside benchmark-specific evaluation code. It also adds the first math-focused runner and regression tests so we can iterate on sample-only reasoning TTS before moving on to code and verifier stages.
d1d4d2b to 9a0c247
Add the first marin.test_time_scaling vertical slice with replayable candidate generation, sample-only selectors, and stable artifact logging for math reasoning runs. This adds a standalone runner and focused tests so we can measure first-sample, majority-vote, and logprob baselines on a shared candidate pool.
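For readers skimming the thread, the three sample-only baselines named in the commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the code in this PR: the `Candidate` dataclass, its field names, the selector function names, and the length-normalized logprob scoring are all hypothetical and may differ from what `marin.test_time_scaling` actually implements.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One sampled completion. Hypothetical schema, not the PR's actual one."""

    answer: str      # extracted final answer, e.g. "12"
    logprob: float   # summed token log-probability of the completion
    num_tokens: int  # completion length, used for normalization


def select_first_sample(pool: list[Candidate]) -> Candidate:
    """Baseline: take the first candidate (equivalent to sampling once)."""
    return pool[0]


def select_majority_vote(pool: list[Candidate]) -> Candidate:
    """Pick the most frequent answer; ties go to the earliest candidate."""
    counts = Counter(c.answer for c in pool)
    top = max(counts.values())
    for c in pool:
        if counts[c.answer] == top:
            return c
    raise ValueError("empty candidate pool")


def select_mean_logprob(pool: list[Candidate]) -> Candidate:
    """Pick the candidate with the highest per-token log-probability.

    Length normalization is an assumption; a summed-logprob selector
    would be an equally plausible reading of "logprob baseline".
    """
    return max(pool, key=lambda c: c.logprob / max(c.num_tokens, 1))


# A shared pool: every selector replays the same candidates, so the
# baselines can be compared without regenerating samples.
pool = [
    Candidate(answer="12", logprob=-9.0, num_tokens=30),
    Candidate(answer="15", logprob=-4.0, num_tokens=20),
    Candidate(answer="12", logprob=-8.0, num_tokens=20),
    Candidate(answer="7", logprob=-2.0, num_tokens=20),
]

selected = {
    "first_sample": select_first_sample(pool).answer,
    "majority_vote": select_majority_vote(pool).answer,
    "mean_logprob": select_mean_logprob(pool).answer,
}
print(selected)
```

Because each selector is a pure function of the pool, a stored pool can be replayed under any number of selectors after the fact, which is the property the shared-candidate-pool design relies on.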