@vintrocode

looking for a way to iterate faster and be more targeted with the data we ingest so it doesn't take 10 hours. one solid session with claude 4 sonnet in cursor produced this locomo_tool.py script, which helps us do that. usage instructions are in the readme, but here are some example outputs:

$ (locomo) ➜  locomo git:(vince/subset-tool) python3 locomo_tool.py explore --list-conversations

Available Conversations:
==================================================
 0: Caroline ↔ Melanie
 1: Jon ↔ Gina
 2: John ↔ Maria
 3: Joanna ↔ Nate
 4: Tim ↔ John
 5: Audrey ↔ Andrew
 6: James ↔ John
 7: Deborah ↔ Jolene
 8: Evan ↔ Sam
 9: Calvin ↔ Dave

$ (locomo) ➜  locomo git:(vince/subset-tool) python3 locomo_tool.py explore --conversation 0 --category 1 --n 5 --preview

Subset Preview: Caroline ↔ Melanie | Category 1 | Top 5 questions
================================================================================
✓ Found 5 questions (requested 5)

Selected Questions:
   1. "What did Caroline research?"
      Evidence: D2:8
   2. "What is Caroline's identity?"
      Evidence: D1:5
   3. "What is Caroline's relationship status?"
      Evidence: D3:13, D2:14
   4. "Where did Caroline move from 4 years ago?"
      Evidence: D3:13, D4:3
   5. "What career path has Caroline decided to persue?"
      Evidence: D4:13, D1:11

Latest Evidence: D4:13
Sessions to include: 1 to 4
  Session 1: All 18 messages
  Session 2: All 17 messages
  Session 3: All 23 messages
  Session 4: First 13 messages

Total messages in subset: 71

pretty neat. to subset, you'd run something like python3 locomo_tool.py subset --conversation 0 --category 1 --n 10 --output experiment_cat1.json. the output follows the (super messy) existing data structure, so it should work downstream in all the evaluate scripts: just change the data path in, e.g., the evaluate_honcho.sh script to point to your newly subsetted data file.
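
for anyone curious how the preview above arrives at "Sessions to include: 1 to 4" and 71 messages, here is a minimal sketch of the cutoff logic as it can be inferred from that output. it is not the code in locomo_tool.py: the function names parse_evidence and plan_subset are hypothetical, and the full length of session 4 (30) is a made-up value, since the preview only shows that its first 13 messages are kept.

```python
import re


def parse_evidence(ref):
    """Split an evidence ID such as 'D4:13' into (session, message) integers."""
    match = re.fullmatch(r"D(\d+):(\d+)", ref.strip())
    if match is None:
        raise ValueError(f"unrecognized evidence ID: {ref!r}")
    return int(match.group(1)), int(match.group(2))


def plan_subset(evidence_refs, session_lengths):
    """Return {session: messages_to_keep} covering everything up to the latest evidence."""
    # Tuples compare by session first, then message, so max() picks the latest evidence.
    last_session, last_message = max(parse_evidence(r) for r in evidence_refs)
    plan = {}
    for session in range(1, last_session + 1):
        total = session_lengths[session]
        # Earlier sessions are kept whole; the last one is cut at the latest evidence.
        plan[session] = total if session < last_session else min(last_message, total)
    return plan


# Evidence IDs copied from the preview output for conversation 0, category 1.
evidence = ["D2:8", "D1:5", "D3:13", "D2:14", "D3:13", "D4:3", "D4:13", "D1:11"]

# Session lengths 1-3 come from the preview; the length of session 4 is an assumption.
session_lengths = {1: 18, 2: 17, 3: 23, 4: 30}

plan = plan_subset(evidence, session_lengths)
print(plan)
print("Total messages in subset:", sum(plan.values()))
```

with the numbers from the preview this keeps sessions 1 to 3 whole and the first 13 messages of session 4, which matches the 71-message total reported by the tool.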