We need to learn Verl codebase and try to reproduce the result of Jan-v1, Jan-nano to ensure consistency. - Tools - multi turn Agent loop - roll out - custom reward with chat history - wandb logger for custom reward Verl Documentation: https://verl.readthedocs.io/en/latest/sglang_multiturn/multiturn.html#multi-turn-rollout-support