[Agentic SFT] Post-train Marin-32B on the TerminalCorpus

## Description

Post-train `marin-community/marin-32b-base` on 15% of Nemotron-Terminal-Corpus with the same SFT recipe used in `#4307`, then evaluate on Terminal-Bench 2.0 against the Qwen3-32B SFT checkpoint at the same training fraction.

## Hypothesis or Goal

Marin-32B base should absorb the same terminal-skills mix with sample efficiency comparable to or better than the Qwen3-32B baseline when both are trained on 15% of the full corpus.

### Links

* WandB Report:
* Issue #4307: https://github.com/marin-community/marin/issues/4307
* Marin 32B base: https://huggingface.co/marin-community/marin-32b-base

## Results


## Summary

This experiment fine-tuned `marin-community/marin-32b-base` for 859 steps on 15% of Nemotron-Terminal-Corpus and evaluated the resulting checkpoint on Terminal-Bench 2.0 (TB2) with the Harbor/Terminus-2 setup. Early failures were traced to an EOS metadata mismatch in the saved checkpoint rather than to the SFT weights themselves. After patching the checkpoint and adding pipeline safeguards, the apples-to-apples eval still rejected the hypothesis: at a comparable training budget, Marin-32B solved 2/87 tasks (~2.3%) versus 18/86 (~20.9%) for Qwen3-32B SFT, and both of Marin's solves were also solved by a Qwen3-32B SFT checkpoint trained with half that budget. The current recommendation is to treat the 15% Marin-32B SFT as a negative result and to continue only if testing full-corpus Marin SFT is judged worth the cost.
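Because the two runs were scored on slightly different task counts (87 vs 86), the comparison is clearest as per-run solve rates rather than raw counts. The snippet below is a minimal sketch that reproduces the percentages quoted above from the reported counts; `solve_rate` is a hypothetical helper, not part of the eval harness.

```python
def solve_rate(solved: int, attempted: int) -> float:
    """Percentage of Terminal-Bench 2.0 tasks solved in one eval run."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 100.0 * solved / attempted


# Counts from the summary above; the two runs were scored on 87 and 86 tasks.
marin = solve_rate(2, 87)
qwen = solve_rate(18, 86)
print(f"Marin-32B SFT: {marin:.1f}%, Qwen3-32B SFT: {qwen:.1f}%, gap: {qwen - marin:.1f} pts")
# prints "Marin-32B SFT: 2.3%, Qwen3-32B SFT: 20.9%, gap: 18.6 pts"
```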

### Helpful links
- Final closing result: https://github.com/marin-community/marin/issues/4760#issuecomment-4667077804
- EOS root cause and pipeline fix: https://github.com/marin-community/marin/issues/4760#issuecomment-4333044507
- Apples-to-apples post-fix eval: https://github.com/marin-community/marin/issues/4760#issuecomment-4334709841
- Raw TB2 trajectories: https://huggingface.co/datasets/AlienKevin/terminal-bench-2-sft-traces
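The exact EOS root cause and fix are described in the linked comment. As a hypothetical illustration of the kind of pipeline safeguard that can catch EOS metadata drift before an eval, the sketch below checks that an expected EOS token id appears in `eos_token_id` of both `config.json` and `generation_config.json`, assuming a Hugging Face-style checkpoint directory. The helper names (`eos_ids`, `find_eos_problems`, `load_configs`) and the choice of files to check are assumptions, and the caller is assumed to look up `expected_eos_id` (for example, the turn-ending token of the chat template) from the tokenizer.

```python
import json
from pathlib import Path

# Hypothetical choice of files to check; both can carry eos_token_id.
REQUIRED_FILES = ("config.json", "generation_config.json")


def eos_ids(config: dict) -> set[int]:
    """Normalize an eos_token_id field (int, list of ints, or missing) to a set."""
    value = config.get("eos_token_id")
    if value is None:
        return set()
    if isinstance(value, int):
        return {value}
    return set(value)


def find_eos_problems(configs: dict[str, dict], expected_eos_id: int) -> list[str]:
    """Return human-readable problems; an empty list means the check passed.

    `configs` maps a file name to its parsed JSON contents.
    """
    problems = []
    for name in REQUIRED_FILES:
        if name not in configs:
            problems.append(f"{name}: file missing")
            continue
        ids = eos_ids(configs[name])
        if not ids:
            problems.append(f"{name}: no eos_token_id set")
        elif expected_eos_id not in ids:
            problems.append(f"{name}: eos_token_id {sorted(ids)} lacks {expected_eos_id}")
    return problems


def load_configs(checkpoint_dir: str) -> dict[str, dict]:
    """Read whichever of the required config files exist in a checkpoint directory."""
    configs = {}
    for name in REQUIRED_FILES:
        path = Path(checkpoint_dir) / name
        if path.exists():
            configs[name] = json.loads(path.read_text())
    return configs
```

A pipeline could run `find_eos_problems(load_configs(checkpoint_dir), expected_eos_id)` after saving a checkpoint and fail the export step whenever the returned list is non-empty, so a metadata mismatch surfaces before any eval time is spent.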
