@cmunley1 (Contributor) commented Jan 28, 2026

Add docs on how Gym and RL enforce monotonicity and perform on-policy token ID corrections. Add hypothetical docs on how to disable these checks for non-monotonic trajectories, e.g. Qwen3 thinking or agents with context management.

Disabling would be done by NVIDIA-NeMo/RL#1812, potentially in NVIDIA-NeMo/RL#1779.
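
For readers unfamiliar with the terms, here is a minimal sketch of what a monotonicity check and an on-policy token ID correction could look like. It is not the Gym or RL implementation: `is_monotonic`, `correct_prefix`, and `retokenized_prefix_len` are hypothetical names, and the sketch assumes each turn is represented as the full list of token IDs seen so far.

```python
from typing import List, Sequence


def is_monotonic(turns: Sequence[Sequence[int]]) -> bool:
    """Hypothetical check: each turn must extend the previous turn as a prefix."""
    for prev, curr in zip(turns, turns[1:]):
        # A turn that is shorter, or that rewrites earlier tokens, breaks monotonicity
        # (e.g. stripped thinking tokens or a context-managing agent).
        if len(curr) < len(prev) or list(curr[: len(prev)]) != list(prev):
            return False
    return True


def correct_prefix(
    sampled_prefix: Sequence[int],
    retokenized: Sequence[int],
    retokenized_prefix_len: int,
) -> List[int]:
    """Hypothetical correction: swap the re-tokenized prefix for the sampled token IDs.

    Assumes the caller already knows how many re-tokenized IDs cover the same
    text as `sampled_prefix`.
    """
    return list(sampled_prefix) + list(retokenized[retokenized_prefix_len:])


# Turn 2 extends turn 1, so the trajectory is monotonic.
print(is_monotonic([[1, 2, 3], [1, 2, 3, 4, 5]]))
# Turn 2 rewrites an earlier token, so it is not.
print(is_monotonic([[1, 2, 3], [1, 9, 3, 4]]))
# Re-tokenization merged IDs 2 and 3 into 23; restore the IDs the policy sampled.
print(correct_prefix([1, 2, 3], [1, 23, 4, 5], retokenized_prefix_len=2))
```

Disabling the checks, as proposed above, would amount to skipping the prefix comparison for trajectories that are expected to rewrite earlier context.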

Signed-off-by: Christian Munley <cmunley@nvidia.com>

2 participants