
Improve the API for using a custom batch_sampler in SentenceTransformerTrainer #3152

Open
@alonme

Description


Currently, only a fixed set of `batch_sampler` values can be used (the options of the `BatchSamplers` enum).
There seems to be a need to let users plug in their own custom batch samplers.

Examples from existing issues:

  1. Batch sampler #3123
  2. Using DataLoader in v3? #2707
  3. MNRL with Multiple hard negatives per query and NoDuplicatesBatchSampler #2954
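For context, a batch sampler is just an iterable that yields lists of dataset indices, and PyTorch's `DataLoader` accepts any such iterable as its `batch_sampler` argument. The sketch below is a hypothetical `UniqueAnchorBatchSampler` (not part of sentence-transformers) in the spirit of the third example: it greedily fills each batch while deferring indices whose anchor text is already in that batch. It assumes the anchor texts are passed in as a plain list, and shuffling is omitted to keep it short.

```python
from typing import Iterator, List, Sequence


class UniqueAnchorBatchSampler:
    """Hypothetical batch sampler: no anchor text appears twice within a batch.

    DataLoader accepts any iterable of index lists as `batch_sampler`,
    so __iter__ and __len__ are all this sketch implements.
    """

    def __init__(self, anchors: Sequence[str], batch_size: int, drop_last: bool = False) -> None:
        self.anchors = anchors
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self) -> Iterator[List[int]]:
        remaining = list(range(len(self.anchors)))  # no shuffling, for brevity
        while remaining:
            batch: List[int] = []
            seen = set()
            leftover: List[int] = []
            for idx in remaining:
                anchor = self.anchors[idx]
                if len(batch) < self.batch_size and anchor not in seen:
                    batch.append(idx)
                    seen.add(anchor)
                else:
                    # Defer duplicates (and overflow) to a later batch.
                    leftover.append(idx)
            remaining = leftover
            # The first remaining index always joins the batch, so the loop terminates.
            if len(batch) == self.batch_size or not self.drop_last:
                yield batch

    def __len__(self) -> int:
        # Batching is deterministic here, so counting one pass gives the length.
        return sum(1 for _ in self)


sampler = UniqueAnchorBatchSampler(["a", "a", "b", "c", "b"], batch_size=2)
print(list(sampler))  # [[0, 2], [1, 3], [4]]
```

The greedy pass is quadratic in the worst case, which is fine for illustrating the interface but would need a smarter strategy for large datasets.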

I believe the current workaround suggested in these issues (which I found the hard way) is to subclass `SentenceTransformerTrainer` and override `get_batch_sampler`. This is not documented well enough and isn't straightforward.

Is there any reason not to simply accept anything that inherits from `DefaultBatchSampler` (for example) as a parameter?
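To make the proposal concrete, here is a framework-free sketch of how a trainer could resolve a `batch_sampler` setting that is either a built-in name or a user class inheriting from a base sampler. Everything in it is hypothetical: `BaseBatchSampler` stands in for a class such as `DefaultBatchSampler`, and `BUILTIN_SAMPLERS`, `ReversedBatchSampler`, and `resolve_batch_sampler` are illustrative names, not existing sentence-transformers APIs. It also assumes every sampler class shares a `(dataset, batch_size, drop_last)` constructor.

```python
from typing import Dict, Iterator, List, Sequence, Type, Union


class BaseBatchSampler:
    """Hypothetical base class, standing in for e.g. DefaultBatchSampler."""

    def __init__(self, dataset: Sequence, batch_size: int, drop_last: bool) -> None:
        self.dataset = dataset
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self) -> Iterator[List[int]]:
        # Plain sequential batching of dataset indices.
        batch: List[int] = []
        for idx in range(len(self.dataset)):
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch and not self.drop_last:
            yield batch

    def __len__(self) -> int:
        full, rest = divmod(len(self.dataset), self.batch_size)
        return full + int(bool(rest) and not self.drop_last)


class ReversedBatchSampler(BaseBatchSampler):
    """A user-defined sampler: same batches, emitted in reverse order."""

    def __iter__(self) -> Iterator[List[int]]:
        yield from reversed(list(super().__iter__()))


# Built-in samplers selectable by name (hypothetical registry).
BUILTIN_SAMPLERS: Dict[str, Type[BaseBatchSampler]] = {"batch_sampler": BaseBatchSampler}


def resolve_batch_sampler(
    batch_sampler: Union[str, Type[BaseBatchSampler]],
    dataset: Sequence,
    batch_size: int,
    drop_last: bool,
) -> BaseBatchSampler:
    """Instantiate a built-in sampler by name, or any BaseBatchSampler subclass."""
    if isinstance(batch_sampler, str):
        cls = BUILTIN_SAMPLERS[batch_sampler]
    elif isinstance(batch_sampler, type) and issubclass(batch_sampler, BaseBatchSampler):
        cls = batch_sampler
    else:
        raise TypeError(f"Unsupported batch_sampler: {batch_sampler!r}")
    return cls(dataset, batch_size=batch_size, drop_last=drop_last)


data = ["a", "b", "c", "d", "e"]
print(list(resolve_batch_sampler("batch_sampler", data, 2, False)))      # [[0, 1], [2, 3], [4]]
print(list(resolve_batch_sampler(ReversedBatchSampler, data, 2, False)))  # [[4], [2, 3], [0, 1]]
```

Requiring a shared constructor signature is what lets the trainer instantiate user classes exactly like built-in ones; accepting any callable with that signature would be a looser alternative.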


Labels: enhancement (New feature or request)
