feat: subset tool #2
Open
Looking for a way to iterate faster and be more targeted with the data we ingest so it doesn't take 10 hours. One solid session with Claude 4 Sonnet in Cursor came up with this locomo_tool.py script that helps us do that. Instructions on how to use it are in the README, but some example outputs:

Pretty neat. To subset, you'd just run something like

python3 locomo_tool.py subset --conversation 0 --category 1 --n 10 --output experiment_cat1.json

and it'd output something that follows the (super messy) existing data structure, so it should work downstream in all the evaluate scripts. Just change the data path in, e.g., the evaluate_honcho.sh script to point to your newly subsetted data file.
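
For anyone curious what a subset like the one above boils down to, here's a rough sketch. It is not the actual code in locomo_tool.py. It assumes the data file is a JSON list of conversation samples, each carrying a "qa" list whose entries have a "category" field. The function name subset_sample and the toy data are made up for illustration, and taking the first n matches is a guess at the selection rule.

```python
import copy


def subset_sample(data, conversation, category, n):
    """Keep one conversation and at most n of its questions in one category.

    Assumes `data` is a list of samples and each sample has a "qa" list
    whose entries carry a "category" field (an assumption, not confirmed
    against the real data file).
    """
    # Deep-copy so the original data is left untouched.
    sample = copy.deepcopy(data[conversation])
    matching = [qa for qa in sample["qa"] if qa.get("category") == category]
    # Take the first n matches; the real tool may select differently.
    sample["qa"] = matching[:n]
    # Return a one-element list so the top-level shape stays a list of samples.
    return [sample]


# Toy stand-in for the real file: one conversation, 30 questions,
# alternating between category 1 and category 2.
toy = [
    {
        "sample_id": "conv-0",
        "conversation": {},
        "qa": [
            {"question": f"q{i}", "answer": "a", "category": 1 + i % 2}
            for i in range(30)
        ],
    }
]

result = subset_sample(toy, conversation=0, category=1, n=10)
print(len(result[0]["qa"]))
```

Everything outside the "qa" list is copied through untouched, which is the point: the subset keeps the same shape as the full file, so downstream scripts shouldn't need to know the difference.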