todo.md
Logprob / MC test based on vLLM

  • implement in the chat template (`ow.chat.logprobs.create(messages=blockwise)`) -> feed into eval -> 0-100 judge
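
A minimal sketch of the scoring side, assuming a raw vLLM model is queried directly with `prompt_logprobs`; `ow.chat.logprobs.create` above is the project's own wrapper, and the model name here is a placeholder:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder checkpoint

def option_logprob(prompt: str, option: str) -> float:
    """Score one MC option by summing the model's logprobs over the
    option tokens (prompt_logprobs returns logprobs of prompt tokens)."""
    params = SamplingParams(max_tokens=1, prompt_logprobs=0)
    out = llm.generate([prompt + option], params)[0]
    # Rough split point; token merges at the prompt/option boundary
    # are ignored in this sketch.
    n_prompt = len(llm.get_tokenizer().encode(prompt))
    total = 0.0
    for entry in out.prompt_logprobs[n_prompt:]:
        if entry is not None:  # the first prompt token has no logprob
            total += next(iter(entry.values())).logprob
    return total

def mc_predict(question: str, options: list[str]) -> str:
    """Pick the option the model assigns the highest total logprob."""
    prompt = f"{question}\nAnswer: "
    return max(options, key=lambda o: option_logprob(prompt, o))
```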

deploy checkpoint API

Use tag as color in dashboard plots
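
A possible implementation, assuming a matplotlib-based dashboard and made-up run data; the real dashboard stack may differ:

```python
import matplotlib.pyplot as plt

# Made-up run data: (run name, tag, metric values per step).
runs = [
    ("run-a", "rl", [1.0, 2.0, 3.0]),
    ("run-b", "sft", [2.0, 2.5, 4.0]),
    ("run-c", "rl", [1.0, 3.0, 5.0]),
]

# One stable color per tag, so same-tag runs share a color.
tags = sorted({tag for _, tag, _ in runs})
tag_color = {tag: plt.cm.tab10(i) for i, tag in enumerate(tags)}

fig, ax = plt.subplots()
for name, tag, ys in runs:
    ax.plot(ys, label=f"{name} [{tag}]", color=tag_color[tag])
ax.legend()
plt.show()
```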

RL jobs

https://www.reddit.com/r/LocalLLaMA/comments/1ijab77/train_your_own_reasoning_model_80_less_vram_grpo/

  • distill a reasoning model where the assistant prefix states the number of reasoning tokens, so that we can control reasoning length at inference time (assistant: {cot with 590 tokens}); see the formatting sketch after this list
    • optionally: prefix with a noisy version of the thinking length, to allow flexibility
  • make shorter reasoning chains:
    • v1: by adding a length penalty to the reward function
    • v2: by training the model on EY's "how could I have thought this faster?" task (reward sketch below).
      Format:
        U: Is this statement true: ...?
        A: yada yada
        U: How could you have thought that faster?
        A: ... I could have said: "yada"
      Reward: is the second CoT likely and short? logP("yada yada") - logP("yada") + alpha(len("yada"))
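
A formatting sketch for the length-prefix distillation data, assuming HF chat-style messages; the tokenizer name is a placeholder and the `{cot with N tokens}` tag format is taken from the example above:

```python
import random
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # assumed

def length_prefixed_example(question: str, cot: str, answer: str,
                            noise_frac: float = 0.0) -> dict:
    """Prefix the assistant turn with the CoT token count so the model
    learns to honor a requested reasoning length at inference time."""
    n = len(tok.encode(cot, add_special_tokens=False))
    if noise_frac > 0:
        # Optional noisy prefix: jitter the advertised length so the
        # model tolerates approximate length requests.
        n = max(1, round(n * random.uniform(1 - noise_frac, 1 + noise_frac)))
    return {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant",
         "content": f"{{cot with {n} tokens}} {cot} {answer}"},
    ]}
```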
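And a direct transcription of the v2 reward, reading alpha(len(·)) as a linear length term and assuming the two CoT logprobs are scored elsewhere (e.g. with vLLM prompt_logprobs as in the MC sketch above); the sign and scale of alpha are open tuning choices:

```python
def faster_thought_reward(logp_long: float, logp_short: float,
                          len_short: int, alpha: float) -> float:
    """logP("yada yada") - logP("yada") + alpha * len("yada"):
    scores whether the second, shorter CoT is likely under the model,
    with a length term weighted by alpha."""
    return logp_long - logp_short + alpha * len_short
```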

torchtune jobs

general

  • merge chat.py and temporary_api.pyx
  • add CPU instances
  • customisable "keep worker running for X mins" timeout
  • make deleting an API key revoke access