feat: Add CISPO (Clipped IS-weight Policy Optimization) #681
Open
kekmodel wants to merge 4 commits into THUDM:main