Updating repo for collaborators #28

Comments from Jason:

Top-level Issues
- Docs are outdated: Following the README often results in broken runs due to missing imports, hard-coded paths, or incorrect instructions.
- Config handling is messy: Many configs are hard-coded or load other configs, which makes them hard to trace and brittle to edit.
- Tied to Carina: Deep dependencies on internal infra make it unusable for external collaborators (e.g., ARPA-H).
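One way to remove hard-coded, cluster-specific paths is to require the data location to be passed in explicitly and fail loudly when it is missing. The sketch below is hypothetical and is not the repo's code: `resolve_data_root` and the `MEDS_DATA_ROOT` environment variable are illustrative names. It checks an explicit argument first, then the environment variable, and never falls back to an internal default.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical environment variable name; the repo does not define it.
DATA_ROOT_ENV = "MEDS_DATA_ROOT"


def resolve_data_root(cli_value: Optional[str] = None) -> Path:
    """Resolve the data root from an explicit argument, then an env var.

    Raises instead of falling back to an internal, cluster-specific path,
    so external users get a clear error rather than a broken run.
    """
    if cli_value:
        return Path(cli_value).expanduser()
    env_value = os.environ.get(DATA_ROOT_ENV)
    if env_value:
        return Path(env_value).expanduser()
    raise RuntimeError(
        f"No data root given: pass a path explicitly or set ${DATA_ROOT_ENV}"
    )
```

The explicit argument takes precedence so that a single run can override whatever is set in the environment.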

MEDS Demo – Tokenizer Training Broken
- Step 7 ("Train tokenizer with cookbook.py") fails due to missing required args.
- Tokenizer config logic is opaque: the configs seem to point to existing trained tokenizers rather than serving as a spec for training a new one.
- There is no clear minimal path to train a tokenizer (e.g., from just a vocab size and a dataset). create_cookbook_k.py looks promising but fails due to import and config issues.
- Filesystem logic is unpredictable: e.g., setting cache/default resolves to cache_4k/default/.
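A minimal training entry point could accept only a vocab size, a dataset, and an output directory, validate them up front, and use the output directory exactly as given. The sketch below is hypothetical: `TokenizerTrainSpec` and its field names are illustrative and are not part of cookbook.py or create_cookbook_k.py. It only shows the input-validation and path-handling side, not tokenizer training itself.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TokenizerTrainSpec:
    """Hypothetical minimal spec for training a new tokenizer (not the repo's API)."""
    vocab_size: int
    dataset_path: Path
    output_dir: Path

    @classmethod
    def from_dict(cls, raw: dict) -> "TokenizerTrainSpec":
        # Fail early with a message naming every missing field, instead of
        # surfacing an opaque "missing required args" error later in the run.
        required = ("vocab_size", "dataset_path", "output_dir")
        missing = [key for key in required if key not in raw]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        vocab_size = int(raw["vocab_size"])
        if vocab_size <= 0:
            raise ValueError("vocab_size must be a positive integer")
        return cls(
            vocab_size=vocab_size,
            dataset_path=Path(raw["dataset_path"]),
            # Use the output directory exactly as given: no vocab-size
            # suffix such as "_4k" is appended to the user's path.
            output_dir=Path(raw["output_dir"]),
        )


spec = TokenizerTrainSpec.from_dict(
    {"vocab_size": 4096, "dataset_path": "data/meds", "output_dir": "cache/default"}
)
print(spec.output_dir.as_posix())  # cache/default
```

Keeping the spec this small makes it clear which inputs are required, and keeping the output path verbatim means setting cache/default writes to cache/default.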

What Works
- Inference and patient representation code works: I can tokenize MEDS event streams and get patient embeddings from the pretrained models.
