It would be an interesting experiment to see whether a model can improve the guidelines iteratively. Something like:
- Model runs evals with current guidelines
- Model observes the output to confirm all evals still pass
- Model makes a change to the guidelines (e.g. removing a rule, or rewording one to use fewer tokens)
- Model returns to step 1
It might be important to use a "reasoning" model like o1 or R1 for the "architect" step (proposing the change), then run the evals again on a smaller, cheaper model.
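The loop above could be sketched roughly as follows. This is a minimal illustration, not an implementation: `run_evals` and `propose_revision` are hypothetical stand-ins for the executor model and the "architect" reasoning model, stubbed here with toy logic so the control flow itself is runnable.

```python
def run_evals(guidelines: str) -> bool:
    """Stub: return True if all evals pass under these guidelines.
    In practice this would run the eval suite on the smaller model."""
    return "be concise" in guidelines  # toy pass criterion

def propose_revision(guidelines: str) -> str:
    """Stub for the 'architect' step: propose a shorter variant.
    Here we just drop the last sentence to simulate token reduction."""
    sentences = [s for s in guidelines.split(". ") if s]
    return ". ".join(sentences[:-1]) if len(sentences) > 1 else guidelines

def refine(guidelines: str, max_iters: int = 10) -> str:
    best = guidelines
    for _ in range(max_iters):
        candidate = propose_revision(best)
        if candidate == best:
            break  # architect has no further change to propose
        if run_evals(candidate):
            best = candidate  # keep the change only if all evals still pass
    return best

initial = "Always be concise. Prefer bullet lists. Avoid passive voice."
final = refine(initial)
```

The key design point is that a revision is only accepted when the full eval suite still passes, so the guidelines can shrink monotonically without regressing.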