Milestones

Algorithm Development
No due date
• 4/5 issues closed, 80% complete (1 open, 4 closed)
Augmentation
The eICR Augmentation specifications, developed by APHL, detail the changes and tags that must be incorporated into an ‘augmented’ eICR for the TTC project, as well as for any other project that intends to modify an existing eICR. The specifications describe four main changes:
1. New Document Id, extension, version, effectiveTime, assigningAuthority, and setId tags with all-new values, marking this as a ‘new’ iteration of the eICR. All other existing Document Id tags are kept to retain history, including the ‘parentDocument’ tag pointing to the original eICR.
2. A new Author tag/section in the header indicating which application (‘text-to-code’ in our case) performed the augmentation of the eICR. All existing Author tags/sections are kept to maintain history.
3. A new Author tag/section within the specific ‘section.entry’ area where the change to the eICR occurs (e.g., Lab Results section: entry.Observation).
4. A Translation tag, under the ‘code’ element that contained the error, containing the newly mapped LOINC code from TTC.
No due date
• 13/15 issues closed, 86% complete (2 open, 13 closed)
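The Translation tag described in the Augmentation milestone can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal, hypothetical illustration rather than the TTC implementation: it assumes the eICR uses the HL7 CDA namespace `urn:hl7-org:v3` and that the translation carries the LOINC code system OID (2.16.840.1.113883.6.1), `codeSystemName`, and `displayName`; the exact required attributes are defined by the APHL specification. The function name `add_loinc_translation`, the local code `LOCAL123`, and the example code system OID are placeholders, and the new Document Id and Author changes are not shown.

```python
import xml.etree.ElementTree as ET

# HL7 CDA R2 namespace used by eICR documents (assumption for this sketch).
CDA_NS = "urn:hl7-org:v3"
# OID identifying the LOINC code system in HL7 V3/CDA.
LOINC_OID = "2.16.840.1.113883.6.1"

# Serialize CDA elements as the default namespace instead of "ns0:".
ET.register_namespace("", CDA_NS)


def add_loinc_translation(code_el: ET.Element, loinc_code: str, display_name: str) -> ET.Element:
    """Append a <translation> child to an existing <code> element.

    The original <code> attributes are left untouched, so the source value
    is preserved alongside the TTC-mapped LOINC code.
    """
    return ET.SubElement(
        code_el,
        f"{{{CDA_NS}}}translation",
        {
            "code": loinc_code,
            "codeSystem": LOINC_OID,
            "codeSystemName": "LOINC",
            "displayName": display_name,
        },
    )


# Usage on a minimal, hypothetical lab observation.
observation = ET.fromstring(
    f'<observation xmlns="{CDA_NS}">'
    '<code code="LOCAL123" codeSystem="2.16.840.1.113883.19" displayName="Hgb A1c"/>'
    "</observation>"
)
code_el = observation.find(f"{{{CDA_NS}}}code")
add_loinc_translation(code_el, "4548-4", "Hemoglobin A1c/Hemoglobin.total in Blood")
print(ET.tostring(observation, encoding="unicode"))
```

Appending rather than overwriting keeps the erroneous source code visible, which matches the spec's emphasis on retaining history.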
API
See the Eng Sync [notes](https://docs.google.com/document/d/1OMgW3Xw7I7azv2IiUjGVtJKPl9rIlce8ApKS-hHLhNU/edit?tab=t.0#heading=h.z4u29epgcfum), where we sketched out the API requirements.
No due date
• 5/5 issues closed, 100% complete (0 open, 5 closed)
Testing Site MVP
We need to create a simple frontend where users can enter lab test names and get back standardized LOINC codes. See the example mockup: https://drive.google.com/file/d/17czDxU1DsDpVCaYM8PcbgXerihYCGqU7/view?usp=drive_link
No due date
• 5/5 issues closed, 100% complete (0 open, 5 closed)
AWS <> TTC Infrastructure
We need to create functions that allow the TTC Lambda function to receive information about a TTC event, read and write data in the S3 bucket (the manifest and eICRs), query the vector DB, and push a success event to the APHL EventBridge.
No due date
• 23/33 issues closed, 69% complete (10 open, 23 closed)
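The flow described in the AWS <> TTC Infrastructure milestone can be sketched as a single handler. This is a hypothetical outline, not the project's Lambda code: the event fields (`bucket`, `manifest_key`), the manifest fields (`eicr_key`, `output_key`), the `augment` callable (which would wrap the vector DB query and the eICR augmentation), the bus name, and the event `Source`/`DetailType` values are all assumptions. The `s3` and `events` arguments are expected to behave like boto3 S3 and EventBridge clients (`get_object`, `put_object`, `put_events`); injecting them keeps the logic testable with the in-memory fakes at the bottom.

```python
import io
import json
from typing import Any, Callable


def handle_ttc_event(
    event: dict,
    s3: Any,
    events: Any,
    augment: Callable[[str], str],
    event_bus_name: str,
) -> dict:
    """Hypothetical TTC Lambda flow: read manifest + eICR, augment, write back, notify."""
    bucket = event["bucket"]  # assumed event field

    # 1. Read the manifest, which (by assumption) points at the eICR to process.
    manifest_obj = s3.get_object(Bucket=bucket, Key=event["manifest_key"])
    manifest = json.loads(manifest_obj["Body"].read())

    # 2. Read the original eICR XML.
    eicr_obj = s3.get_object(Bucket=bucket, Key=manifest["eicr_key"])
    eicr_xml = eicr_obj["Body"].read().decode("utf-8")

    # 3. Augment; vector DB lookups are assumed to happen inside `augment`.
    augmented_xml = augment(eicr_xml)

    # 4. Write the augmented eICR back to S3.
    s3.put_object(Bucket=bucket, Key=manifest["output_key"], Body=augmented_xml.encode("utf-8"))

    # 5. Publish a success event; put_events reports rejected entries in FailedEntryCount.
    detail = {"status": "success", "output_key": manifest["output_key"]}
    response = events.put_events(
        Entries=[
            {
                "Source": "ttc.lambda",  # hypothetical source name
                "DetailType": "TTCAugmentationSucceeded",  # hypothetical detail type
                "Detail": json.dumps(detail),
                "EventBusName": event_bus_name,
            }
        ]
    )
    if response.get("FailedEntryCount", 0):
        raise RuntimeError(f"EventBridge rejected the success event: {response}")
    return detail


# In-memory stand-ins for the boto3 clients, for local testing only.
class FakeS3:
    def __init__(self, objects: dict):
        self.objects = dict(objects)  # (bucket, key) -> bytes

    def get_object(self, Bucket: str, Key: str) -> dict:
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> dict:
        self.objects[(Bucket, Key)] = Body
        return {}


class FakeEvents:
    def __init__(self):
        self.entries = []

    def put_events(self, Entries: list) -> dict:
        self.entries.extend(Entries)
        return {"FailedEntryCount": 0, "Entries": [{"EventId": "fake"} for _ in Entries]}


s3 = FakeS3({
    ("ttc-bucket", "manifest.json"): json.dumps(
        {"eicr_key": "in/eicr.xml", "output_key": "out/eicr.xml"}
    ).encode(),
    ("ttc-bucket", "in/eicr.xml"): b"<ClinicalDocument/>",
})
events = FakeEvents()
result = handle_ttc_event(
    {"bucket": "ttc-bucket", "manifest_key": "manifest.json"},
    s3,
    events,
    augment=lambda xml: xml.replace("/>", " augmented='true'/>"),  # stand-in augmentation
    event_bus_name="aphl-bus",  # hypothetical bus name
)
```

In the deployed Lambda, `s3` and `events` would be real boto3 clients created once outside the handler.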
KNN Modeling
This phase builds a second classifier using k-nearest neighbors (kNN) with cosine similarity over document embeddings. It also generalizes the scoring and timing utilities to support more models, and wraps both the Naive Bayes and kNN models in a unified interface for easier training, evaluation, and deployment.
No due date
• 0% complete (0 open, 0 closed)
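A minimal, dependency-free sketch of the kNN step described in the KNN Modeling milestone: rank training embeddings by cosine similarity to a query embedding and take a majority vote over the top k labels. The function names, the toy 2-D vectors, and the placeholder labels are illustrative assumptions; in practice the vectors would be document embeddings and the labels LOINC codes, and a library implementation (for example scikit-learn's `KNeighborsClassifier` with `metric="cosine"`) would likely replace this.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(
    query: Sequence[float],
    train_vectors: Sequence[Sequence[float]],
    train_labels: Sequence[str],
    k: int = 3,
) -> str:
    """Majority vote over the k training examples most cosine-similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,  # most similar first
    )
    top_labels = [label for _, label in ranked[:k]]
    # Counter keeps first-seen order, so ties go to the label of the closer neighbor.
    return Counter(top_labels).most_common(1)[0][0]


# Toy 2-D "embeddings" with placeholder labels.
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["A1C", "A1C", "GLU", "GLU"]
print(knn_predict([1.0, 0.05], train_vectors, train_labels, k=3))
```

Cosine similarity ignores vector magnitude, which suits embeddings where direction carries the meaning; the brute-force ranking here is O(n) per query and would need an index for large label sets.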
Embedding Representation
This milestone shifts to semantic document embeddings as an alternative to BoW. It involves creating synthetic lab name data, using LOINC as a source of semantic context, and constructing vocabulary embeddings from LOINC metadata. Functions for persisting embeddings are also developed.
No due date
• 2/2 issues closed, 100% complete (0 open, 2 closed)
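One common way to build a document embedding from token-level vectors is mean pooling; the sketch below uses it together with a JSON round trip for persistence. This is an assumption for illustration, not the project's confirmed approach: the toy `word_vectors` table stands in for vocabulary embeddings that would be derived from LOINC metadata, and `embed_document`, `save_embeddings`, and `load_embeddings` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def embed_document(tokens: list[str], word_vectors: dict[str, list[float]], dim: int) -> list[float]:
    """Mean-pool the vectors of known tokens; unknown tokens are skipped."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


def save_embeddings(path: Path, embeddings: dict[str, list[float]]) -> None:
    """Persist label -> vector embeddings as JSON."""
    path.write_text(json.dumps(embeddings), encoding="utf-8")


def load_embeddings(path: Path) -> dict[str, list[float]]:
    """Load embeddings previously written by save_embeddings."""
    return json.loads(path.read_text(encoding="utf-8"))


# Toy vocabulary vectors standing in for embeddings derived from LOINC metadata.
word_vectors = {"hemoglobin": [1.0, 0.0], "a1c": [0.0, 1.0]}
doc_vec = embed_document(["hemoglobin", "a1c", "blood"], word_vectors, dim=2)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "embeddings.json"
    save_embeddings(path, {"hemoglobin a1c blood": doc_vec})
    restored = load_embeddings(path)
```

JSON keeps the persisted vectors human-readable and round-trips Python floats exactly; a binary format would be more compact for large vocabularies.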
Naive Bayes Modeling
This phase introduces the first classification approach using a multiclass Naive Bayes model trained on BoW features. Tasks include model training, evaluation, and validation via k-fold cross-validation. Functions for performance metrics and model persistence are also implemented.
No due date
• 0/6 issues closed, 0% complete (6 open, 0 closed)
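The Naive Bayes phase can be sketched without external libraries: a multiclass multinomial Naive Bayes with additive (Laplace) smoothing over bag-of-words tokens, plus contiguous k-fold cross-validation that reports mean accuracy. Class and function names and the toy data are hypothetical, and the folds here are neither shuffled nor stratified; real lab-name data should be shuffled (and ideally stratified by LOINC code) before splitting. A library such as scikit-learn (`MultinomialNB`, `cross_val_score`) would be the usual production choice.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multiclass multinomial Naive Bayes over bag-of-words token lists."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: list[str]) -> str:
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.total_tokens[c] + self.alpha * v
            for tok in doc:
                if tok in self.vocab:  # out-of-vocabulary tokens are ignored
                    score += math.log((self.token_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def k_fold_indices(n: int, k: int):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_accuracy(docs, labels, k: int = 5, alpha: float = 1.0) -> float:
    """Mean accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(docs), k):
        model = MultinomialNaiveBayes(alpha).fit(
            [docs[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(model.predict(docs[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


docs = [["hgb", "a1c"], ["hemoglobin", "a1c"], ["glucose", "serum"], ["glucose", "fasting"]]
labels = ["A1C", "A1C", "GLU", "GLU"]  # placeholder labels standing in for LOINC codes
model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict(["a1c"]))
```

On this sorted toy data, contiguous folds put each class entirely in one fold, so cross-validated accuracy collapses; that is exactly why shuffling or stratification matters.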
Model Preprocessing & Representation
This milestone establishes the core preprocessing pipeline to clean and normalize input text, preparing it for modeling. It includes standard NLP tasks such as tokenization, lemmatization, and whitespace/punctuation normalization. It also introduces the initial feature engineering approach using Bag-of-Words (BoW) and supports persistence of vocabulary representations.
No due date
• 4/7 issues closed, 57% complete (3 open, 4 closed)
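A minimal sketch of the preprocessing and BoW steps, using only the Python standard library. It lowercases, strips accents, replaces punctuation with spaces, collapses whitespace, tokenizes on spaces, and builds a sorted vocabulary and count vectors. Lemmatization is deliberately omitted (it would need a library such as NLTK's `WordNetLemmatizer` or spaCy), and replacing all punctuation is an assumption that may discard meaningful symbols in lab names (for example `+` or `/`). Function names are hypothetical.

```python
import re
import string
import unicodedata
from collections import Counter

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def normalize(text: str) -> str:
    """Lowercase, strip accents, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().translate(_PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    return normalize(text).split()


def build_vocabulary(docs: list[str]) -> dict[str, int]:
    """Map each distinct token to a column index (sorted for reproducibility)."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(text: str, vocab: dict[str, int]) -> list[int]:
    """Count vector over vocab; tokens not in the vocabulary are dropped."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


vocab = build_vocabulary(["Glucose, Serum", "Glucose (fasting)"])
print(vocab)  # {'fasting': 0, 'glucose': 1, 'serum': 2}
print(bag_of_words("GLUCOSE glucose serum potassium", vocab))  # [0, 2, 1]
```

Because the vocabulary is a plain `dict[str, int]`, it can be persisted with `json.dumps` and reloaded so that training and inference share the same column order.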
eICR Processing
Everything related to handling the eICR within the TTC workflow.
No due date
• 16/16 issues closed, 100% complete (0 open, 16 closed)
Model Exploration
This phase focuses on setting the foundation for modeling lab result data by identifying the appropriate NLP/ML tools and gaining familiarity with the structure and variability of real-world eCR data. Tasks include comparing available libraries, reviewing sample eCRs, and generating synthetic lab data for development and testing.
No due date
• 5/5 issues closed, 100% complete (0 open, 5 closed)
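Synthetic lab data for development and testing can be generated by perturbing clean lab names. This sketch is illustrative only: the abbreviation map, the 50% and 30% probabilities, and `make_variants` are hypothetical, chosen to mimic the kinds of variability (abbreviations, casing, typos) one might see in real-world eCRs. A fixed seed keeps the output reproducible.

```python
import random

# Hypothetical abbreviation map; real variants would come from reviewing sample eCRs.
ABBREVIATIONS = {"hemoglobin": "hgb", "glucose": "glu", "serum": "ser", "blood": "bld"}


def make_variants(lab_name: str, n: int, seed: int = 0) -> list[str]:
    """Generate n noisy variants of a lab test name for development/testing."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    variants = []
    for _ in range(n):
        # Abbreviate each known word with probability 0.5.
        words = [
            ABBREVIATIONS.get(w.lower(), w) if rng.random() < 0.5 else w
            for w in lab_name.split()
        ]
        text = " ".join(words)
        # Apply a random casing style (or leave the text unchanged).
        text = rng.choice([str.lower, str.upper, str.title, lambda s: s])(text)
        # With probability 0.3, drop one character to simulate a typo.
        if len(text) > 3 and rng.random() < 0.3:
            i = rng.randrange(len(text))
            text = text[:i] + text[i + 1:]
        variants.append(text)
    return variants


for variant in make_variants("Glucose Serum", 5, seed=1):
    print(variant)
```

Pairing each variant with the LOINC code of its clean source name yields labeled examples for the modeling phases.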