Priority Level
Medium
Task Summary
As @alexahaushalter pointed out here, when the total training split is >25k records, by default not all the training split actually gets used in training. For evaluation purposes, we should only keep the part of the training data that's actually seen by the model.
Technical Details & Implementation Plan
- If the data fraction is <1, we update the persisted training data after subsetting
- The only subset gets used in evaluation
Dependencies
No response
Priority Level
Medium
Task Summary
As @alexahaushalter pointed out here, when the total training split is >25k records, by default not all the training split actually gets used in training. For evaluation purposes, we should only keep the part of the training data that's actually seen by the model.
Technical Details & Implementation Plan
Dependencies
No response