schampoux/ds-toolkit

FDA Drug Label Ingestion & Parsing

Project Summary

This project ingests unstructured drug labeling data from the DailyMed public API, extracts the clinically meaningful sections, and structures them as clean JSON for downstream NLP/ML use.

Tech Stack

  • Python
  • requests, lxml, json, spacy
  • Basic CLI orchestration

Workflow

  1. ingest.py: Downloads HTML files using the DailyMed SPL web service.
  2. clean.py: Parses the downloaded HTML with lxml and extracts clinical sections such as "INDICATIONS" and "WARNINGS".
  3. nlp.py: Analyzes the extracted sections and builds dictionaries of the entities found in them.
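
The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of clean.py: the function name extract_sections, the heading_tag parameter, and the sample markup are all assumptions, and real DailyMed pages may mark section headings differently.

```python
from lxml import html

# Hypothetical sample markup; real DailyMed pages may be structured differently.
SAMPLE_HTML = """
<html><body>
  <h1>INDICATIONS AND USAGE</h1>
  <p>This medication is used for pain relief.</p>
  <h1>WARNINGS</h1>
  <p>Do not use if you are allergic to</p>
  <p>any ingredient.</p>
</body></html>
"""


def extract_sections(markup, heading_tag="h1"):
    """Map each heading's text to the text of the elements that follow it."""
    tree = html.fromstring(markup)
    sections = {}
    for heading in tree.iter(heading_tag):
        # Collapse runs of whitespace in the heading text.
        title = " ".join(heading.text_content().split())
        parts = []
        # Walk following siblings until the next heading of the same level.
        for sibling in heading.itersiblings():
            if sibling.tag == heading_tag:
                break
            text = " ".join(sibling.text_content().split())
            if text:
                parts.append(text)
        sections[title] = " ".join(parts)
    return sections
```

Calling extract_sections(SAMPLE_HTML) returns a dict keyed by section title, which is the shape shown under Example Output. Matching on a single heading tag keeps the sketch short; a real label would likely need a more tolerant selector.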

Example Output

{
  "INDICATIONS AND USAGE": "This medication is used for...",
  "WARNINGS": "Do not use if you are allergic to...",
  ...
}
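
As a rough sketch of how output in this shape might be written and consumed downstream, the helpers below round-trip a section mapping through a JSON file. The names save_sections, load_sections, and section_lengths are hypothetical and do not come from the repository.

```python
import json
from pathlib import Path


def save_sections(sections, path):
    """Write a section-name -> text mapping as UTF-8 JSON."""
    Path(path).write_text(
        json.dumps(sections, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def load_sections(path):
    """Read the mapping back for downstream processing."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def section_lengths(sections):
    """Example downstream step: whitespace-token count per section."""
    return {name: len(text.split()) for name, text in sections.items()}
```

Setting ensure_ascii=False keeps any non-ASCII characters in the label text readable in the file instead of escaping them.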