[datasets] Add Hermes trace support to the SFT pipeline #4858
taivu1998 wants to merge 2 commits into marin-community:main from
This adds minimal postprocessing and row-ID hook seams for SFT conversation transforms so Hermes-style trace normalization can land without reshaping the existing adapter stack. It also preserves transformed dataset hash stability when the new hooks are unset and adds regression tests for hook ordering and signature behavior.
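A minimal sketch of how optional hooks can leave the dataset hash unchanged when unset. Every name here (`TransformConfig`, `postprocess_messages`, `row_id_fn`, `config_signature`, `transform_row`) is hypothetical and chosen for illustration; none of it is taken from the actual diff. The idea is that hook fields enter the signature payload only when they are set, so existing configs keep hashing to the same value.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TransformConfig:
    """Hypothetical transform config; field names are assumptions."""
    source: str
    adapter_name: str
    # Optional hooks; both default to None so existing configs are unaffected.
    postprocess_messages: Optional[Callable[[list], list]] = None
    row_id_fn: Optional[Callable[[dict], str]] = None


def _hook_name(fn: Callable) -> str:
    # Identify a hook by its qualified name so the signature is reproducible.
    return f"{fn.__module__}.{fn.__qualname__}"


def config_signature(cfg: TransformConfig) -> str:
    # Start from the pre-existing fields only.
    payload = {"source": cfg.source, "adapter_name": cfg.adapter_name}
    # Add hook entries only when set, so an unset hook never alters the hash.
    if cfg.postprocess_messages is not None:
        payload["postprocess_messages"] = _hook_name(cfg.postprocess_messages)
    if cfg.row_id_fn is not None:
        payload["row_id_fn"] = _hook_name(cfg.row_id_fn)
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]


def transform_row(cfg: TransformConfig, row: dict, adapt: Callable[[dict], list]) -> dict:
    # Ordering: the adapter runs first, then the postprocessing hook.
    messages = adapt(row)
    if cfg.postprocess_messages is not None:
        messages = cfg.postprocess_messages(messages)
    if cfg.row_id_fn is not None:
        row_id = cfg.row_id_fn(row)
    else:
        # Fallback ID (an assumption): a content hash of the final messages.
        row_id = hashlib.sha256(
            json.dumps(messages, sort_keys=True).encode("utf-8")
        ).hexdigest()[:16]
    return {"id": row_id, "messages": messages}
```

With this shape, a regression test can pin the signature of a hook-free config against its previous value and separately check that the postprocessing hook sees the adapter's output rather than the raw row.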
Normalize Hermes tool responses into Marin's chat format while preserving raw think/tool-call assistant turns. Register the glm-5.1 and kimi splits, add focused fixtures and regression tests, and add a trace-focused pilot SFT experiment.
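The normalization described above could look roughly like the sketch below. It rests on assumptions not confirmed by this description: that a tool response arrives as a non-assistant message whose whole content is wrapped in a single `<tool_response>...</tool_response>` pair, and that the target chat format is a list of `{"role", "content"}` dicts with a `tool` role. The function name `normalize_hermes_messages` is hypothetical.

```python
import re

# Assumed wrapper for tool output; matches only when the whole message is wrapped.
_TOOL_RESPONSE = re.compile(
    r"^\s*<tool_response>\s*(.*?)\s*</tool_response>\s*$", re.DOTALL
)


def normalize_hermes_messages(messages: list) -> list:
    """Rewrite wrapped tool responses as role="tool"; leave other turns as-is."""
    normalized = []
    for message in messages:
        role, content = message["role"], message["content"]
        # Assistant turns are never rewritten, so raw think and tool-call
        # markup in them is preserved verbatim.
        match = None if role == "assistant" else _TOOL_RESPONSE.match(content)
        if match is not None:
            normalized.append({"role": "tool", "content": match.group(1)})
        else:
            normalized.append({"role": role, "content": content})
    return normalized
```

A function of this shape would plug in as the message postprocessing hook for the Hermes splits, which keeps the dataset-specific logic out of the shared adapter code.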
Add conversation transform hooks for dataset-specific message normalization and stable row IDs, then register the Hermes glm-5.1 and kimi trace splits for SFT. Adds regression fixtures/tests and a trace-focused pilot experiment.