fix: update the training data when only a subset of the training split is used #398

### Priority Level

Medium

### Task Summary

As @alexahaushalter pointed out [here](https://github.com/NVIDIA-NeMo/Safe-Synthesizer/pull/337#discussion_r3075557853), when the training split contains more than 25k records, the default settings do not use the entire split for training. For evaluation purposes, we should keep only the part of the training data that the model actually sees.

### Technical Details & Implementation Plan

- If the data fraction is <1, update the persisted training data after subsetting
- Only that subset gets used in evaluation
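
A minimal sketch of the plan above, assuming the training split is a list of records persisted as a JSON Lines file. All names here (`subset_training_records`, `persist_training_records`, `load_training_records`, `prepare_training_data`, `data_fraction`, `seed`) are hypothetical and not taken from the Safe-Synthesizer codebase; the real subsetting logic and storage format may differ.

```python
import json
import random
from pathlib import Path


def subset_training_records(records, data_fraction, seed=0):
    """Return the records the model would actually train on (hypothetical helper)."""
    if not 0 < data_fraction <= 1:
        raise ValueError("data_fraction must be in (0, 1]")
    if data_fraction == 1 or not records:
        return list(records)
    n_keep = max(1, int(len(records) * data_fraction))
    rng = random.Random(seed)
    # Sort the sampled indices so the subset keeps the original record order.
    keep_idx = sorted(rng.sample(range(len(records)), n_keep))
    return [records[i] for i in keep_idx]


def persist_training_records(records, path):
    """Write records to a JSON Lines file, replacing any previous contents."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def load_training_records(path):
    """Read back what evaluation would treat as the training data."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def prepare_training_data(records, data_fraction, path, seed=0):
    """Persist the split, then overwrite it with the subset when data_fraction < 1."""
    persist_training_records(records, path)  # assumed existing behavior: full split is saved
    used = subset_training_records(records, data_fraction, seed)
    if data_fraction < 1:
        # The fix: the persisted file now matches what the model sees.
        persist_training_records(used, path)
    return used
```

With this shape, evaluation can keep reading the same persisted file and automatically compares against only the records used in training.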

### Dependencies

_No response_
