Resampling with label uniformity and user uniformity

Hi,

I have a regression problem, so the label is a single floating-point number within a well-defined range (e.g. [0, 1]). The label distribution is non-uniform: there is markedly less data at the edges of the range, and also in the very middle. So far, this is a classic use case for SMOGN. However, I collect data from multiple users, and the amount of data per user is also hugely imbalanced. In addition to balancing the label distribution, I would like all users to be well represented in the training set. So I would prefer that the algorithm is aware of which user each sample comes from (a per-sample user ID), undersampling users with a lot of data and preserving or oversampling users with little data. Is this currently possible? Do you have suggestions?
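
To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written independently of SMOGN. It is plain weighted resampling with replacement, not synthetic sample generation. The function names `raking_weights` and `resample_indices`, the equal-width binning and the parameters `n_bins` and `n_iter` are all my own assumptions for illustration. The weights are obtained by alternately rescaling so that every label bin and then every user carries the same total weight (iterative proportional fitting, also called raking).

```python
import random
from collections import defaultdict


def raking_weights(labels, users, n_bins=10, n_iter=50):
    """Per-sample sampling probabilities that balance label bins and users.

    labels: sequence of floats (regression targets)
    users:  sequence of hashable user IDs, same length as labels
    """
    lo, hi = min(labels), max(labels)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant label
    # Equal-width bins over the observed label range (an assumption).
    bins = [min(int((y - lo) / width), n_bins - 1) for y in labels]

    w = [1.0] * len(labels)
    for _ in range(n_iter):
        # Rescale by label bin first, then by user, so the user totals
        # are exactly equal after the last pass and the bin totals are
        # as close to equal as the data allow.
        for groups in (bins, users):
            totals = defaultdict(float)
            for g, wi in zip(groups, w):
                totals[g] += wi
            w = [wi / totals[g] for g, wi in zip(groups, w)]

    total = sum(w)
    return [wi / total for wi in w]


def resample_indices(labels, users, n_samples, seed=0, **kwargs):
    """Draw sample indices with replacement according to the raking weights."""
    weights = raking_weights(labels, users, **kwargs)
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n_samples)
```

Drawing with replacement means users with little data are oversampled by duplication and users with a lot of data are effectively undersampled. If some users have no samples at all in some label bins, both margins may not be exactly balanceable at once, and the sketch then favours user balance because that rescaling runs last.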

Resampling with label uniformity and user uniformity #39