[skyrl-train][algorithm] Apply inference engine top-k mask to trainer

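As described in the DeepSeek-V3.2 technical report, when sampling with top_p or top_k, the sampling mask should be propagated to the trainer. Otherwise the trainer computes token probabilities over the full vocabulary while the inference engine sampled from a truncated one, so the trainer drifts off-policy relative to the inference engine.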

<img width="1015" height="301" alt="Image" src="https://github.com/user-attachments/assets/6302bc48-3c54-4cf2-aecb-56eba0fff532" />
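
To make the idea concrete, here is a minimal pure-Python sketch of top-k mask propagation for a single token position. It is an illustration only: `top_k_mask` and `masked_log_probs` are hypothetical helper names, not part of skyrl-train or any inference engine API, and it assumes the keep-mask is built on the inference side and shipped to the trainer alongside the sampled tokens.

```python
import math


def top_k_mask(logits, k):
    """Return a boolean keep-mask marking the k highest logits (hypothetical helper)."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    keep = [False] * len(logits)
    for i in top:
        keep[i] = True
    return keep


def masked_log_probs(logits, keep):
    """Log-softmax renormalized over the kept tokens; masked tokens get -inf."""
    kept = [x for x, m in zip(logits, keep) if m]
    mx = max(kept)  # subtract the max for numerical stability
    lse = mx + math.log(sum(math.exp(x - mx) for x in kept))
    return [x - lse if m else float("-inf") for x, m in zip(logits, keep)]


if __name__ == "__main__":
    # Assumption: the mask comes from the inference engine's own logits.
    sampler_logits = [2.0, 1.0, 0.5, -1.0]
    keep = top_k_mask(sampler_logits, k=2)

    # The trainer's logits differ slightly (e.g. numerics or stale weights),
    # but it reuses the sampler's mask rather than recomputing its own top-k.
    trainer_logits = [1.8, 1.1, 0.9, -0.5]
    with_mask = masked_log_probs(trainer_logits, keep)
    without_mask = masked_log_probs(trainer_logits, [True] * len(trainer_logits))

    sampled_token = 0
    print("trainer log-prob with sampler mask:", with_mask[sampled_token])
    print("trainer log-prob over full vocab:  ", without_mask[sampled_token])
```

Reusing the sampler's mask, instead of recomputing top-k from the trainer's logits, keeps both sides normalizing over the same token set even when their logits disagree about which tokens rank highest. A real implementation would operate on batched tensors and would need to handle top_p in the same way.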