Add source adapter layer for IBM MarketScan, IBM LCED, and CPRD GOLD #235

Open
xushenbo wants to merge 2 commits into theislab:main from xushenbo:feature/source-adapters-migration
@xushenbo

Summary

This PR migrates three EHR ETL pipelines into a new ehrdata.io.source Python layer, providing a unified adapter interface for IBM MarketScan, IBM LCED, and CPRD GOLD.

New modules

  • ehrdata.io.source.adapters.marketscan — US commercial claims (diagnosis, therapy, procedure, patinfo, insurance, provider)
  • ehrdata.io.source.adapters.lced — IBM LCED linked claims-EMR data (diagnosis, therapy, lab test, habit, patinfo)
  • ehrdata.io.source.adapters.cprd — CPRD GOLD UK primary-care data (diagnosis, therapy, lab test, patinfo); supports Read code and prodcode vocabulary translation
  • ehrdata.io.source.vocab — vocabulary loaders for NDC→ingredient, RxNorm→ingredient, LOINC, Read code (medcode→readcode), and prodcode mappings
  • ehrdata.io.source.to_ehrdata — bridge that converts any combination of canonical source tables into an EHRData object
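
To illustrate the pattern the modules above describe (adapter, vocabulary translation, bridge), here is a minimal, self-contained sketch. It does not use the API added in this PR: `DiagnosisRow`, `SourceAdapter`, `ToyCprdAdapter`, and `group_by_patient` are hypothetical names, the vocabulary values are placeholders rather than real Read codes, and the raw field names (`patid`, `medcode`, `eventdate`) are assumed to mirror CPRD GOLD columns.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class DiagnosisRow:
    """Hypothetical canonical diagnosis row (field names are assumptions)."""

    patient_id: str
    code: str
    vocabulary: str
    date: str  # ISO 8601 date string


class SourceAdapter(Protocol):
    """Hypothetical adapter interface: raw source records in, canonical rows out."""

    def diagnosis(self, raw_rows: Iterable[dict]) -> list[DiagnosisRow]: ...


class ToyCprdAdapter:
    """Toy CPRD-style adapter that translates medcode to Read code."""

    def __init__(self, medcode_to_readcode: dict[int, str]) -> None:
        self._vocab = medcode_to_readcode

    def diagnosis(self, raw_rows: Iterable[dict]) -> list[DiagnosisRow]:
        rows = []
        for raw in raw_rows:
            readcode = self._vocab.get(raw["medcode"])
            if readcode is None:
                # Unmapped medcode: dropped here; a real pipeline might log or keep it.
                continue
            rows.append(
                DiagnosisRow(str(raw["patid"]), readcode, "Read", raw["eventdate"])
            )
        return rows


def group_by_patient(*tables: Iterable[DiagnosisRow]) -> dict[str, list[DiagnosisRow]]:
    """Toy bridge: merge any number of canonical tables into per-patient lists."""
    grouped = defaultdict(list)
    for table in tables:
        for row in table:
            grouped[row.patient_id].append(row)
    return dict(grouped)


# Placeholder vocabulary and raw records, for illustration only.
adapter = ToyCprdAdapter({1: "READ_A", 2: "READ_B"})
raw = [
    {"patid": 10, "medcode": 1, "eventdate": "2001-01-05"},
    {"patid": 10, "medcode": 99, "eventdate": "2001-02-01"},  # not in the vocabulary
    {"patid": 11, "medcode": 2, "eventdate": "2002-03-09"},
]
rows = adapter.diagnosis(raw)
by_patient = group_by_patient(rows)
```

Keeping every adapter behind one interface means a bridge such as `group_by_patient` only needs to understand the canonical row shape, not the quirks of each source.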

Documentation

  • Reference pages for all three data sources under docs/sources/
  • Four tutorial notebooks under docs/tutorials/: CPRD overview and EDA, LOINC mapping, ICD version inference and mapping, LCED T2D cohort study

Tests

327 new tests covering all adapters, vocab loaders, normalisation utilities, and the to_ehrdata bridge.

Test plan

  • pytest tests/io/test_source_*.py — 327 tests, all passing
  • Tutorial notebooks run without errors
  • hatch run docs:build — docs build cleanly

- Three ETL adapters (marketscan, lced, cprd) producing canonical tables
- Vocabulary loaders: NDC, RxNorm, LOINC, Read code, prodcode, ICD GEM/CCS stubs
- to_ehrdata() bridge converting canonical tables to EHRData
- 327 tests across all modules
- Reference docs: docs/sources/{marketscan,lced,cprd}.md
- Tutorial notebooks: CPRD overview, LOINC mapping, ICD mapping, LCED cohort