FDA Drug Label Ingestion & Parsing

Project Summary

This project ingests unstructured drug labeling data from the DailyMed public API, extracts the clinically meaningful sections, and structures them as clean JSON for downstream NLP/ML use.

Tech Stack

Python
requests, lxml, json, spacy
Basic CLI orchestration

Workflow

ingest.py: Downloads HTML files using the DailyMed SPL web service.
clean.py: Parses the downloaded HTML with lxml and extracts clinical sections such as "INDICATIONS" and "WARNINGS".
nlp.py: Analyzes the extracted sections and builds entity dictionaries.
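
As a rough illustration of the section-extraction step, the sketch below pulls the text that follows recognized headings out of an HTML document. It is not the code in clean.py: it swaps in the standard library's html.parser for lxml so that it runs without dependencies, and the names SectionExtractor, extract_sections, TARGET_SECTIONS, and the sample markup are all hypothetical. It also assumes section titles appear in h1-h3 heading tags, which may not match the real DailyMed pages.

```python
from html.parser import HTMLParser

# Hypothetical list of section titles to keep; the real list may differ.
TARGET_SECTIONS = {"INDICATIONS AND USAGE", "WARNINGS"}
# Assumption: section titles are marked up as h1-h3 headings.
HEADING_TAGS = {"h1", "h2", "h3"}


class SectionExtractor(HTMLParser):
    """Collect the text that follows each recognized heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}        # section title -> list of text fragments
        self._in_heading = False
        self._heading_parts = []
        self._current = None      # title of the section being collected

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            # Normalize whitespace and case before matching the title.
            title = " ".join("".join(self._heading_parts).split()).upper()
            self._current = title if title in TARGET_SECTIONS else None
            if self._current is not None:
                self.sections.setdefault(self._current, [])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self._current is not None and data.strip():
            self.sections[self._current].append(data.strip())


def extract_sections(html_text):
    """Return a dict mapping each target section title to its joined text."""
    parser = SectionExtractor()
    parser.feed(html_text)
    parser.close()
    return {title: " ".join(parts) for title, parts in parser.sections.items()}


sample_html = """
<html><body>
  <h2>Indications and Usage</h2>
  <p>This medication is used for...</p>
  <h2>Description</h2>
  <p>Ignored section text.</p>
  <h2>Warnings</h2>
  <p>Do not use if you are allergic to...</p>
</body></html>
"""

print(extract_sections(sample_html))
```

Headings that are not in TARGET_SECTIONS stop collection, so text under "Description" in the sample is dropped rather than appended to the previous section.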

Example Output

{
  "INDICATIONS AND USAGE": "This medication is used for...",
  "WARNINGS": "Do not use if you are allergic to...",
  ...
}
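
For downstream use, a structure like the one above can be written to disk and read back with the standard json module. The sketch below is only an assumption about how that might look: write_label, read_label, and the file name label.json are hypothetical, and the dictionary contains just the two keys shown in the example, since the remaining keys are elided.

```python
import json
import tempfile
from pathlib import Path

# Only the two sections shown in the example output; the rest are elided there.
sections = {
    "INDICATIONS AND USAGE": "This medication is used for...",
    "WARNINGS": "Do not use if you are allergic to...",
}


def write_label(data, path):
    """Serialize one label's sections as indented UTF-8 JSON."""
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def read_label(path):
    """Load a label's sections back into a dict."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "label.json"  # hypothetical file name
    write_label(sections, out_path)
    loaded = read_label(out_path)

print(sorted(loaded))
```

Setting ensure_ascii=False keeps any non-ASCII characters in the label text readable in the output file instead of escaping them.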

Repository: schampoux/ds-toolkit
