Open
Description
Comments from Jason:
Top-level Issues
- Docs are outdated: Following the README often results in broken runs due to missing imports, hard-coded paths, or incorrect instructions.
- Config handling is messy: Many configs are hard-coded or load other configs, making things hard to trace and brittle to edit.
- Tied to Carina: Deep dependencies on internal infra make it unusable for external collaborators (e.g., ARPA-H).
MEDS Demo – Tokenizer Training Broken
- Step 7 ("Train tokenizer with cookbook.py") fails due to missing required args.
- Tokenizer config logic is opaque: it seems to point to existing trained configs rather than provide a spec for training a new one.
- No clear minimal path to train a tokenizer (e.g., just vocab size + dataset). create_cookbook_k.py seems promising but fails due to import and config issues.
- Filesystem logic is unpredictable: for example, setting the path to cache/default results in cache_4k/default/.
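To make the "minimal path" request concrete, here is a rough sketch of what a training spec could look like: just a dataset, a vocab size, and an output directory that is used exactly as given. Everything here is hypothetical (the `TokenizerTrainSpec` name, its fields, the `data/meds_demo` placeholder path, and the vocab size of 4096); none of it is taken from the repo's actual API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TokenizerTrainSpec:
    """Hypothetical minimal spec for training a new tokenizer (not the repo's API)."""

    dataset_path: Path  # MEDS dataset to learn the vocabulary from
    vocab_size: int     # target vocabulary size
    output_dir: Path    # used exactly as given, with no implicit suffixes

    def __post_init__(self) -> None:
        # Fail early on obviously invalid input instead of deep inside training.
        if self.vocab_size <= 0:
            raise ValueError("vocab_size must be a positive integer")

    def tokenizer_dir(self) -> Path:
        # Return the path verbatim, so "cache/default" stays "cache/default".
        return self.output_dir


# Placeholder values for illustration only.
spec = TokenizerTrainSpec(
    dataset_path=Path("data/meds_demo"),
    vocab_size=4096,
    output_dir=Path("cache/default"),
)
print(spec.tokenizer_dir())
```

The point of the sketch is that the output location is a pure function of what the user passed in, so nothing like a vocab-size suffix gets added behind the scenes.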
What Works
- Inference and patient representation code work: I can tokenize MEDS eventstreams and get patient embeddings from the pretrained models.