We have online hard keypoint mining (OHKM), which increases the loss weight on a per-node basis to encourage the optimization to focus on "hard" nodes even when the overall loss is low.
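For context, here is a minimal sketch of the kind of per-node reweighting OHKM does. This is illustrative only; the function name, arguments, and the top-k weighting scheme are assumptions, not SLEAP's actual implementation:

```python
import tensorflow as tf

def ohkm_weighted_loss(per_node_loss, hard_fraction=0.5, hard_weight=2.0):
    # per_node_loss: (batch, n_nodes) unweighted loss per node (keypoint).
    n_nodes = tf.shape(per_node_loss)[-1]
    k = tf.maximum(1, tf.cast(tf.cast(n_nodes, tf.float32) * hard_fraction, tf.int32))

    # Pick the k nodes with the highest loss in each example ("hard" nodes).
    _, hard_inds = tf.math.top_k(per_node_loss, k=k)

    # Upweight the hard nodes; every other node keeps weight 1.0.
    hard_mask = tf.reduce_sum(tf.one_hot(hard_inds, depth=n_nodes), axis=-2)
    weights = 1.0 + (hard_weight - 1.0) * hard_mask

    return tf.reduce_mean(weights * per_node_loss)
```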
Turning this on early in training often leads to instabilities, but turning it on in a second training run initialized with the weights from the first run tends to work well.
It would be great to have a "second phase" of training in which OHKM is enabled and the learning rate is reset after the first run converges.
It might be easier to set this up as a second training run that runs in sequence, at the cost of some orchestration complexity and re-initialization overhead. This is how multi-model training runs already work (e.g., centroid -> centered instance).
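A rough sketch of what that sequential option could look like (the config keys and the `run_training()` helper below are placeholders, not SLEAP's actual API):

```python
import copy

def run_two_phase_sequence(base_config, run_training):
    """Run two training jobs back to back: plain training, then OHKM fine-tuning.

    `run_training(config)` is a stand-in for however a single job is launched,
    and is assumed to return the path to the best checkpoint it produced.
    """
    # Phase 1: ordinary training with OHKM disabled.
    phase1 = copy.deepcopy(base_config)
    phase1["ohkm"] = False
    checkpoint = run_training(phase1)

    # Phase 2: a fresh job initialized from the phase 1 weights, with OHKM on.
    # Because it is a fresh job, the learning rate schedule restarts on its own.
    phase2 = copy.deepcopy(base_config)
    phase2["ohkm"] = True
    phase2["initial_weights"] = checkpoint
    return run_training(phase2)
```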
Alternatively, handling the restarting logic internally as part of the same training run would be cleaner on the frontend, but might require a soft layer of orchestration (above Trainer, but in the same process).
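And a sketch of the in-process alternative: a thin orchestration layer sitting above Trainer. The factory, attributes, and method names used here are assumptions for illustration, not the existing Trainer API:

```python
class TwoPhaseOrchestrator:
    """Soft orchestration layer: same process, two Trainer phases.

    Assumes a Trainer-like object that builds a Keras model at construction
    and exposes train(); the exact names are hypothetical.
    """

    def __init__(self, make_trainer, base_config, ohkm_config):
        self.make_trainer = make_trainer  # factory: config -> Trainer
        self.base_config = base_config    # phase 1: OHKM off
        self.ohkm_config = ohkm_config    # phase 2: same config, OHKM on

    def run(self):
        # Phase 1: converge without OHKM.
        trainer1 = self.make_trainer(self.base_config)
        trainer1.train()
        weights = trainer1.model.get_weights()

        # Phase 2: rebuild the trainer (which resets the optimizer and the
        # learning rate schedule), load phase 1 weights, continue with OHKM.
        trainer2 = self.make_trainer(self.ohkm_config)
        trainer2.model.set_weights(weights)
        trainer2.train()
        return trainer2
```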
Idea credit: @olinesn