The resume mechanism is broken and produces several errors:

- Weight names do not match.
- Important states are not saved (AMP scaler, scheduler, ...).
- Only the best model is saved, which makes stopping and resuming hard.
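As a rough illustration of the missing-state and best-only problems, here is a minimal sketch of a checkpoint helper that saves every training component in one file, written every epoch. It is not this project's code. It assumes each component (model, optimizer, scheduler, AMP scaler) follows the PyTorch-style `state_dict()` / `load_state_dict()` convention. `DummyState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and `pickle` stands in for `torch.save` / `torch.load` so the sketch runs without PyTorch.

```python
import os
import pickle


class DummyState:
    """Hypothetical stand-in for a model, optimizer, scheduler, or AMP scaler."""

    def __init__(self, **values):
        self.values = dict(values)

    def state_dict(self):
        return dict(self.values)

    def load_state_dict(self, state):
        self.values = dict(state)


def save_checkpoint(path, epoch, components):
    """Write the epoch and every component's state_dict to a single file."""
    state = {
        "epoch": epoch,
        "components": {name: obj.state_dict() for name, obj in components.items()},
    }
    # Write to a temporary file first so an interrupted save cannot
    # corrupt the previous checkpoint, then swap it into place.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path, components):
    """Restore every component in place and return the epoch to resume from."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    saved = state["components"]
    # Fail loudly instead of silently resuming with a fresh scaler or scheduler.
    missing = [name for name in components if name not in saved]
    if missing:
        raise KeyError(f"checkpoint is missing state for: {missing}")
    for name, obj in components.items():
        obj.load_state_dict(saved[name])
    return state["epoch"] + 1
```

In a training loop, the idea would be to call `save_checkpoint("last.ckpt", epoch, components)` at the end of every epoch and write a second copy (for example `best.ckpt`) only when the validation metric improves, so a stopped run can always resume from the latest epoch rather than from the best one.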