Automated Extraction of Fluoropyrimidine Treatment & Toxicities from Clinical Notes

Rule-based • Classical ML • Deep Learning (BERT/ClinicalBERT) • LLM (zero-shot & error-analysis prompting)

This repository hosts code accompanying the manuscript:

Automated Extraction of Fluoropyrimidine Treatment and Treatment-Related Toxicities from Clinical Notes Using Natural Language Processing
Xizhi Wu, Madeline S. Kreider, Philip E. Empey, Chenyu Li, Yanshan Wang

Overview

We compare four families of NLP methods for extracting fluoropyrimidine (FP) treatment and treatment-related toxicities from oncology clinical notes:

Rule-based with MedTagger (regex + custom context/negation rules).
Classical ML: Logistic Regression (LR), Linear SVM, Random Forest (RF) using bag-of-words / TF-IDF.
Deep Learning: BERT and ClinicalBERT sentence classifiers.
LLM prompting: LLaMA-3.1-8B (local) with zero-shot and error-analysis prompting.

All methods use the same 80:20 train–test split and are evaluated with precision, recall, and (weighted) F1.
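The evaluation setup above can be sketched in plain Python. This is a minimal illustration, not the repository's evaluation code: the function names are hypothetical, and the seeded random shuffle is an assumption, since the README does not say how the 80:20 split is drawn. Weighted F1 here means the per-class F1 scores averaged by class support.

```python
import random
from collections import Counter


def split_80_20(items, seed=0):
    """Shuffle with a fixed seed and cut at 80% (assumed splitting strategy)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        precision_recall_f1(y_true, y_pred, label)[2] * count / total
        for label, count in support.items()
    )


gold = [1, 1, 0, 0, 0]
pred = [1, 0, 0, 0, 1]
score = weighted_f1(gold, pred)
```

Because every method is scored on the same held-out 20%, the split should be drawn once and reused across all four pipelines.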

Tasks & Labels

Binary sentence-level classification across five categories:

Drug of interest (capecitabine, 5-FU; brand names; combination regimens & abbreviations)
Arrhythmia
Heart Failure (HF)
Valvular Complications
Hand-foot syndrome (HFS) treatment/prevention therapies (topicals & uridine triacetate)

See resources/keywords/ for curated terminology lists and resources/prompts/ for LLM prompts.

Pipelines

Rule-based with MedTagger

A deterministic NLP pipeline built with regular expressions and custom context rules (negation, uncertainty, experiencer). Clinical experts curated the terminology lists, and the rules were tuned on the 80% training split. MedTagger executes these rules to extract mentions of fluoropyrimidine treatment and toxicities.
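The general idea of keyword matching plus a negation check can be illustrated with Python's re module. This sketch does not use MedTagger or its rule syntax. The term list, the negation cues, the five-token window, and the function name are all assumptions chosen for illustration; the curated lists in the repository are the authoritative source.

```python
import re

# Illustrative drug terms only; not the expert-curated list.
DRUG_PATTERN = re.compile(
    r"\b(capecitabine|xeloda|fluorouracil|5-fu)\b", re.IGNORECASE
)

# A small, assumed set of negation cues.
NEGATION_CUES = re.compile(r"\b(no|not|never|denies|without)\b", re.IGNORECASE)


def mentions_drug(sentence, window=5):
    """Return True if a drug term appears without a negation cue
    among the `window` tokens that precede it."""
    for match in DRUG_PATTERN.finditer(sentence):
        preceding = sentence[: match.start()].split()[-window:]
        if not NEGATION_CUES.search(" ".join(preceding)):
            return True
    return False


positive = mentions_drug("Patient started capecitabine 1000 mg twice daily.")
negated = mentions_drug("Patient has never received 5-FU.")
```

A fixed token window is a crude stand-in for real context rules: a cue that negates a different concept earlier in the sentence would also suppress the match, which is one reason dedicated uncertainty and experiencer rules are needed.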

Classical ML (LR / SVM / RF)

A supervised classification approach with one binary model per toxicity category. Text is preprocessed (tokenization, normalization) and transformed into features:

TF-IDF with sublinear scaling for linear models (Logistic Regression, SVM).

Count-based n-grams for Random Forest.

Class weighting addresses class imbalance, and models are evaluated using precision, recall, and F1.
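To show what sublinear scaling does to the TF-IDF features, here is a small pure-Python sketch. It is an assumption-laden illustration rather than the repository's feature code: tokenization is a plain whitespace split, the smoothed IDF formula is one common choice, and the function name is hypothetical. Sublinear scaling replaces the raw term count tf with 1 + ln(tf), so repeated terms add weight more slowly.

```python
import math
from collections import Counter


def sublinear_tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (1 + ln(tf)) * (ln((1 + n_docs) / (1 + df)) + 1),
    using whitespace tokenization (an assumed simplification).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (1.0 + math.log(tf))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


vectors = sublinear_tfidf(["nausea nausea nausea vomiting", "vomiting"])
```

If scikit-learn is used, the `sublinear_tf=True` option of `TfidfVectorizer` applies the same 1 + ln(tf) scaling to term counts.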

Deep Learning (BERT / ClinicalBERT)

Neural classifiers using transformer-based embeddings. Input sentences are tokenized and passed through pretrained BERT or ClinicalBERT. Each category is trained independently, with ClinicalBERT providing domain-specific adaptation. Predictions are converted into binary outputs and evaluated against the gold standard.

LLM prompting (zero-shot & error-analysis)

Inference-only classification with LLaMA-3.1-8B, run locally, using prompt engineering:

Zero-shot prompting: binary classification prompts with terminology lists and explicit yes/no outputs.

Error-analysis prompting: refined prompts with chain-of-thought reasoning examples, written after analyzing systematic misclassifications. The refined prompts helped the model better capture indirect or nuanced clinical evidence.
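A zero-shot prompt of the kind described above, together with parsing of the yes/no answer into a binary label, might look like the following. The prompt wording, function names, and example terms are hypothetical; the actual prompts are those referenced under resources/prompts/. The model call itself is omitted, so the response strings below are stand-ins for model output.

```python
import re


def build_zero_shot_prompt(sentence, category, keywords):
    """Assemble a binary classification prompt (wording is illustrative)."""
    terms = ", ".join(keywords)
    return (
        "You are reviewing a sentence from an oncology clinical note.\n"
        f"Category: {category}\n"
        f"Relevant terms: {terms}\n"
        f"Sentence: {sentence}\n"
        "Does the sentence mention this category? Answer yes or no."
    )


def parse_yes_no(response):
    """Map the first standalone 'yes' or 'no' to 1 or 0; None if neither appears."""
    match = re.search(r"\b(yes|no)\b", response, re.IGNORECASE)
    if match is None:
        return None
    return 1 if match.group(1).lower() == "yes" else 0


prompt = build_zero_shot_prompt(
    "Telemetry showed atrial fibrillation overnight.",
    "Arrhythmia",
    ["atrial fibrillation", "arrhythmia"],
)
label = parse_yes_no("Yes, the sentence mentions atrial fibrillation.")
```

Returning None for unparseable responses keeps ambiguous model output visible during error analysis instead of silently counting it as a negative.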
