fix: update the training data when only a subset of the training split is used #398

@nina-xu

Description

Priority Level

Medium

Task Summary

As @alexahaushalter pointed out here, when the training split contains more than 25k records, by default only part of it is actually used in training. For evaluation purposes, we should keep only the portion of the training data that the model actually sees.

Technical Details & Implementation Plan

  • If the data fraction is <1, update the persisted training data after subsetting so that it contains only the subset
  • Only that subset gets used in evaluation
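
The plan above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `subset_and_persist`, the JSONL file format, the `seed` parameter, and the uniform random sampling are all assumptions made for the example.

```python
import json
import random
from pathlib import Path


def subset_and_persist(records, data_fraction, out_path, seed=0):
    """Return the records used for training and keep the persisted copy in sync.

    Hypothetical helper: the name, signature, and JSONL format are assumptions.
    """
    if not 0 < data_fraction <= 1:
        raise ValueError("data_fraction must be in (0, 1]")

    if data_fraction == 1:
        # The full split is used, so the persisted data is already correct.
        return list(records)

    # Sample a reproducible subset; sorting the indices preserves record order.
    rng = random.Random(seed)
    k = min(len(records), max(1, int(len(records) * data_fraction)))
    indices = sorted(rng.sample(range(len(records)), k))
    subset = [records[i] for i in indices]

    # Overwrite the persisted training data so evaluation sees only the subset.
    with Path(out_path).open("w", encoding="utf-8") as f:
        for record in subset:
            f.write(json.dumps(record) + "\n")

    return subset
```

Evaluation would then read the training data from `out_path` (or use the returned list) rather than the original full split, so that comparisons against the training data cover only records the model has seen.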

Dependencies

No response

Metadata

Assignees

No one assigned

    Labels

    task (Development task)
