- implement blockwise logprobs in the chat template (ow.chat.logprobs.create(messages=blockwise)) -> run eval -> 0-100 judge (sketch below)
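A minimal sketch of that eval path, assuming `ow` is an OpenAI-style client. Only `ow.chat.logprobs.create(messages=blockwise)` comes from the note; the response shape, the `completions` call, and the judge prompt are all assumptions for illustration.

```python
import ow  # hypothetical client library named in the note above

def score_blocks(blockwise: list[dict]) -> list[float]:
    """Render messages through the chat template; sum token logprobs per block."""
    resp = ow.chat.logprobs.create(messages=blockwise)  # call from the note
    # Assumed response shape: one entry per block with per-token logprobs.
    return [sum(b["token_logprobs"]) for b in resp["blocks"]]

def judge_0_100(question: str, answer: str) -> int:
    """Ask a judge model for an integer grade in [0, 100] (prompt illustrative)."""
    prompt = f"Grade this answer from 0 to 100.\nQ: {question}\nA: {answer}\nGrade:"
    resp = ow.chat.completions.create(messages=[{"role": "user", "content": prompt}])
    return max(0, min(100, int(resp["choices"][0]["message"]["content"].strip())))
```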
- distill a reasoning model where the prefix states the number of reasoning tokens, so that reasoning length can be controlled at inference time (assistant: {cot with 590 tokens}); see the sketch below
- optionally: prefix with a noisy version of the thinking length, to allow flexibility at inference time
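A minimal sketch of building such distillation examples. The chat-message layout and the uniform multiplicative noise scheme are assumptions; `count_tokens` is a whitespace stand-in for the model's real tokenizer.

```python
import random

def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in: swap in the model's actual tokenizer

def make_example(question: str, cot: str, answer: str, noise_frac: float = 0.0) -> dict:
    n = count_tokens(cot)
    if noise_frac > 0:  # optional noisy length, for flexibility at inference time
        n = max(1, round(n * random.uniform(1 - noise_frac, 1 + noise_frac)))
    return {
        "messages": [
            {"role": "user", "content": question},
            # Prefix announces the reasoning budget, e.g. "{cot with 590 tokens}".
            {"role": "assistant", "content": f"{{cot with {n} tokens}}\n{cot}\n{answer}"},
        ]
    }
```

At inference time, prefilling the assistant turn with e.g. `{cot with 100 tokens}` then requests a roughly 100-token chain.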
- make shorter reasoning chains (reward sketches after this list):
  - v1: by adding a length penalty to the reward function
  - v2: by training the model on EY's "how could I have thought this faster?" task. Format:
    - U: Is this statement true: ...?
    - A: yada yada
    - U: How could you have thought that faster?
    - A: ... I could have said: " yada "
    Reward: is the second CoT likely and short? i.e. minimize logP(" yada yada ") - logP(" yada ") + alpha * len(" yada ")
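A minimal sketch of both rewards, assuming `logp_long` / `logp_short` are summed token logprobs of the two quoted CoTs under the model, and that the v2 penalty above is minimized (so its negation is the reward). `alpha` and `beta` are free length-penalty weights.

```python
def length_penalized_reward(base_reward: float, cot_len: int, beta: float = 0.01) -> float:
    """v1: subtract a length penalty from the task reward."""
    return base_reward - beta * cot_len

def thought_faster_reward(
    logp_long: float,   # logP(" yada yada "), the original CoT
    logp_short: float,  # logP(" yada "), the rewritten CoT
    short_len: int,     # token length of the rewritten CoT
    alpha: float = 0.01,
) -> float:
    """v2: reward the rewritten CoT for being likely and short."""
    penalty = logp_long - logp_short + alpha * short_len  # formula from the note
    return -penalty  # higher reward when the rewrite is likely and short
```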
- merge chat.py and temporary_api.pyx
- add CPU instances
- make "keep worker running for X mins" customisable
- deleting an API key revokes access