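As described in the DeepSeek-V3.2 technical report, when rollouts are generated with top-p or top-k sampling, the sampling mask (the set of tokens that were eligible to be sampled at each step) should be propagated to the trainer and applied to the trainer's logits. Otherwise the inference engine samples from a truncated distribution while the trainer computes log-probabilities over the full vocabulary, so the two policies are defined over different action spaces and training drifts off-policy relative to the inference engine.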
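A minimal pure-Python sketch of the idea, for a single decoding step. The helper names (`sampling_mask`, `masked_log_prob`) are hypothetical, not any library's API. It assumes top-k is applied before top-p (engines differ on this ordering) and that the keep-mask is computed once from the inference engine's logits, shipped with the rollout, and reused as-is by the trainer rather than recomputed from the trainer's own logits.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sampling_mask(logits: List[float],
                  top_k: Optional[int] = None,
                  top_p: Optional[float] = None) -> List[bool]:
    """Keep-mask over the vocabulary (assumed order: top-k, then top-p)."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    keep = [False] * len(logits)
    if top_p is None:
        for i in order:
            keep[i] = True
        return keep
    # Nucleus cut over the (possibly top-k-truncated) renormalized distribution.
    total = sum(probs[i] for i in order)
    cum = 0.0
    for i in order:
        keep[i] = True
        cum += probs[i] / total
        if cum >= top_p:
            break
    return keep


def masked_log_prob(logits: List[float], keep: List[bool], token: int) -> float:
    """Trainer-side log-prob of `token`, renormalized over the kept tokens only."""
    assert keep[token], "the sampled token must lie inside the sampling mask"
    kept = [x for x, k in zip(logits, keep) if k]
    m = max(kept)
    log_z = m + math.log(sum(math.exp(x - m) for x in kept))
    return logits[token] - log_z


def full_log_prob(logits: List[float], token: int) -> float:
    """Unmasked log-prob over the full vocabulary (what the trainer uses by default)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[token] - log_z


if __name__ == "__main__":
    inference_logits = [2.0, 1.0, 0.5, -1.0]
    keep = sampling_mask(inference_logits, top_k=2)   # recorded at rollout time
    trainer_logits = [2.0, 1.0, 0.5, -1.0]            # same weights, same step
    print(masked_log_prob(trainer_logits, keep, 0))   # log-prob under sampling policy
    print(full_log_prob(trainer_logits, 0))           # mismatched full-vocab log-prob
```

With `top_k=2` only tokens 0 and 1 were ever eligible, so the masked log-prob of token 0 is -log(1 + e^-1), about -0.313, while the unmasked value is about -0.495. Using the unmasked value in an importance ratio would misstate how likely the rollout actually was under the sampling policy.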