Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping and pyomop-migrate CLI #260
Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>
@copilot
Done in dc34df0. Added
Example usage:
# SQLite → SQLite
pyomop --migrate \
--src-dbtype sqlite --src-name hospital.sqlite \
--dbtype sqlite --name omop.sqlite \
--mapping ehr_to_omop.json
# PostgreSQL → PostgreSQL
pyomop --migrate \
--src-dbtype pgsql --src-host srchost --src-user reader --src-pw secret --src-name ehr \
--dbtype pgsql --host omophost --user writer --pw secret --name omop \
--mapping ehr_to_omop.json --batch-size 500
Full CLI reference and examples added to
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           develop     #260      +/-   ##
=========================================
- Coverage     77.1%    75.3%    -1.8%
=========================================
  Files           19       21       +2
  Lines         3334     3982     +648
  Branches       268      384     +116
=========================================
+ Hits          2572     3002     +430
- Misses         667      836     +169
- Partials        95      144      +49
Done in e3a0515. Here's what was added: Schema extraction (
pyomop --extract-schema \
--src-dbtype sqlite --src-name hospital.sqlite \
--schema-output hospital_schema.md
Environment variable support
export SRC_DB_HOST=db.hospital.org
export SRC_DB_USER=readonly
export SRC_DB_PASSWORD=secret
export SRC_DB_NAME=ehr_db
pyomop --extract-schema --src-dbtype pgsql --schema-output ehr_schema.md
Tests: 9 new tests covering schema Markdown content, PK/FK rendering, the
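The fallback from --src-* flags to SRC_DB_* environment variables can be pictured as a small URL builder. This is a hypothetical sketch: build_url_sketch is not pyomop's build_source_url(), and it assumes that an explicitly passed flag takes precedence over the environment, that the async drivers are aiosqlite/asyncpg/aiomysql, that MySQL is selected with a "mysql" dbtype, and that the host defaults to localhost.

```python
import os
from urllib.parse import quote

# Assumed mapping from --src-dbtype values to async SQLAlchemy dialect+driver
# prefixes; the "mysql" key and the driver choices are illustrative assumptions.
ASYNC_DRIVERS = {
    "sqlite": "sqlite+aiosqlite",
    "pgsql": "postgresql+asyncpg",
    "mysql": "mysql+aiomysql",
}

# Environment variables documented in this PR for each --src-* option.
ENV_FALLBACKS = {
    "host": "SRC_DB_HOST",    # --src-host
    "port": "SRC_DB_PORT",    # --src-port
    "user": "SRC_DB_USER",    # --src-user
    "pw": "SRC_DB_PASSWORD",  # --src-pw
    "name": "SRC_DB_NAME",    # --src-name
}


def build_url_sketch(dbtype, env=None, **cli):
    """Build a source DB URL; a non-empty CLI value wins over the env variable."""
    env = os.environ if env is None else env
    opts = {key: cli.get(key) or env.get(var) for key, var in ENV_FALLBACKS.items()}
    driver = ASYNC_DRIVERS[dbtype]
    if dbtype == "sqlite":
        # SQLite only needs a file path, e.g. sqlite+aiosqlite:///hospital.sqlite
        return f"{driver}:///{opts['name']}"
    # Percent-encode credentials so characters such as '@' or ':' stay inside them.
    user = quote(opts["user"] or "", safe="")
    pw = quote(opts["pw"] or "", safe="")
    auth = f"{user}:{pw}@" if pw else (f"{user}@" if user else "")
    port = f":{opts['port']}" if opts["port"] else ""
    host = opts["host"] or "localhost"  # default host is an assumption
    return f"{driver}://{auth}{host}{port}/{opts['name']}"


# Usage: credentials come from the environment; host/port are overridden by flags.
env = {
    "SRC_DB_HOST": "db.hospital.org",
    "SRC_DB_USER": "readonly",
    "SRC_DB_PASSWORD": "p@ss",
    "SRC_DB_NAME": "ehr_db",
}
print(build_url_sketch("pgsql", env=env))
print(build_url_sketch("pgsql", env=env, host="replica.local", port="5433"))
```

Percent-encoding the password matters here: an unescaped "@" in p@ss would otherwise be parsed as the start of the host part.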
Done in 117a1b5. Here's what was changed: New module
pyomop-migrate --migrate --src-dbtype sqlite --src-name hospital.sqlite \
--dbtype sqlite --name omop.sqlite --mapping mapping.json
pyomop-migrate --extract-schema --src-dbtype sqlite --src-name hospital.sqlite \
--schema-output schema.md
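The --extract-schema command above writes a Markdown description of the source database. A minimal sketch of that kind of introspection, written against Python's built-in sqlite3 module rather than the async SQLAlchemy source engine the PR describes; schema_to_markdown is a hypothetical name and the Markdown layout is an assumption, not the format pyomop actually emits.

```python
import sqlite3


def schema_to_markdown(conn: sqlite3.Connection) -> str:
    """Render each table's columns, types, nullability, PK/FK info and row count."""
    lines = ["# Source schema", ""]
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        # Table names cannot be bound as parameters, so quote them as identifiers.
        quoted = '"' + table.replace('"', '""') + '"'
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {
            fk[3]: f"{fk[2]}.{fk[4]}" if fk[4] else fk[2]
            for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})")
        }
        lines += [
            f"## {table}", "", f"Rows: {row_count}", "",
            "| Column | Type | Nullable | Key |",
            "|---|---|---|---|",
        ]
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            keys = (["PK"] if pk else []) + ([f"FK → {fks[name]}"] if name in fks else [])
            # PRAGMA reports notnull=0 for INTEGER PRIMARY KEY, so treat PKs as NOT NULL.
            nullable = "NO" if notnull or pk else "YES"
            lines.append(f"| {name} | {col_type} | {nullable} | {', '.join(keys)} |")
        lines.append("")
    return "\n".join(lines)


# Usage: a tiny in-memory source database with one FK relationship.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (id INTEGER PRIMARY KEY, gender TEXT NOT NULL);
    CREATE TABLE encounters (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patients(id),
        start_date TEXT
    );
    INSERT INTO patients VALUES (1, 'F'), (2, 'M');
    INSERT INTO encounters VALUES (10, 1, '2024-01-01');
    """
)
md = schema_to_markdown(conn)
print(md)
```

Emitting explicit FK targets (here encounters.patient_id → patients.id) is what lets a reader, human or AI agent, infer join paths when drafting a mapping file.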
Renamed files:
CdmCsvLoader is coupled to a flat FHIR-derived CSV. This adds CdmGenericLoader, which reads from any async SQLAlchemy source database (SQLite/Postgres/MySQL) and loads into an OMOP CDM target, guided by the same JSON mapping convention.
Core implementation
- src/pyomop/migrate/pyomop_migrate.py: CdmGenericLoader class; create_source_engine(), load_mapping(), build_source_url(), and extract_schema_to_markdown() helpers, plus the pyomop-migrate CLI entry point
- src/pyomop/migrate/__init__.py: sub-package re-exporting all public symbols
- src/pyomop/generic_loader.py: thin backward-compatibility shim (deprecated; re-exports from pyomop.migrate)
- src/pyomop/mapping.generic.example.json: annotated example mapping (patients → person, encounters → visit_occurrence, diagnoses → condition_occurrence, lab results → measurement, medications → drug_exposure)
Mapping format
Extends the existing mapping.default.json convention with a source_table key per entry. Filters translate to SQL WHERE clauses (pushed to the source DB, not applied as pandas masks).
{
  "tables": [
    {
      "source_table": "patients",
      "name": "person",
      "filters": [{"column": "active", "equals": 1}],
      "columns": {
        "person_id": "id",
        "gender_source_value": "gender",
        "gender_concept_id": {"const": 0},
        "year_of_birth": {"const": 0},
        "race_concept_id": {"const": 0},
        "ethnicity_concept_id": {"const": 0}
      }
    }
  ]
}
Same 5 post-load steps as CdmCsvLoader: person_id FK normalisation, birth-date backfill, gender concept mapping, concept-code lookups. Missing source/target tables warn and skip rather than abort.
pyomop-migrate CLI script
All migration functionality lives in a dedicated pyomop-migrate entry point, keeping main.py (pyomop) unchanged. Source-database connection details use --src-* options; the target OMOP CDM database uses the standard connection options.
--extract-schema command
Introspects the source database and writes a Markdown document with full schema information (table names, column names, data types, nullable flags, PK/FK relationships, and row counts). This output is designed to be fed to AI agents to auto-generate a mapping JSON file.
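The mapping semantics described above (one source_table per entry, filters pushed down as SQL WHERE clauses, and {"const": …} literals for target columns) can be illustrated with a small synchronous sketch. build_select and iter_target_rows are hypothetical helpers using sqlite3 instead of the async SQLAlchemy engine CdmGenericLoader actually uses; only the "equals" filter from the example mapping is handled, and batch_size simply mirrors the --batch-size option.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier (identifiers cannot be bound as parameters)."""
    return '"' + name.replace('"', '""') + '"'


def build_select(entry):
    """Translate one mapping entry into a parameterised SELECT ... WHERE query."""
    clauses, params = [], []
    for flt in entry.get("filters", []):
        # Only the "equals" operator from the example mapping is handled here.
        clauses.append(f"{quote_ident(flt['column'])} = ?")
        params.append(flt["equals"])
    sql = f"SELECT * FROM {quote_ident(entry['source_table'])}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def iter_target_rows(conn, entry, batch_size=500):
    """Yield lists of target-table rows, batch_size source rows at a time."""
    sql, params = build_select(entry)
    cur = conn.execute(sql, params)
    names = [col[0] for col in cur.description]
    while batch := cur.fetchmany(batch_size):
        rows = []
        for values in batch:
            src = dict(zip(names, values))
            # A string spec names a source column; {"const": v} injects a literal.
            rows.append({
                target: spec["const"] if isinstance(spec, dict) else src[spec]
                for target, spec in entry["columns"].items()
            })
        yield rows


# Usage with a trimmed version of the person entry from the mapping example.
entry = {
    "source_table": "patients",
    "name": "person",
    "filters": [{"column": "active", "equals": 1}],
    "columns": {
        "person_id": "id",
        "gender_source_value": "gender",
        "gender_concept_id": {"const": 0},
    },
}
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE patients (id INTEGER, gender TEXT, active INTEGER);"
    "INSERT INTO patients VALUES (1, 'F', 1), (2, 'M', 0), (3, 'M', 1);"
)
batches = list(iter_target_rows(conn, entry, batch_size=1))
```

Binding filter values as parameters (rather than formatting them into the SQL string) keeps the pushed-down WHERE clause safe from injection, while identifiers are quoted separately because they cannot be parameterised.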
Environment variable support
All source database connection parameters can be supplied via environment variables instead of CLI flags, keeping credentials out of shell history:
Environment variable    CLI flag
SRC_DB_HOST             --src-host
SRC_DB_PORT             --src-port
SRC_DB_USER             --src-user
SRC_DB_PASSWORD         --src-pw
SRC_DB_NAME             --src-name
This applies to both --migrate and --extract-schema.
Tests & docs
- tests/test_pyomop_migrate.py: 26 unit tests covering row loading, gender/birth backfill, SQL filters, missing-table skip, multi-table mapping, batch correctness, CLI migrate, schema extraction (content, PK/FK, CLI end-to-end), URL building (all backends + env var override), and error cases
- docs/pyomop_migrate.md: full usage, API reference, CLI option tables with examples, schema extraction section, and environment variable reference
- notes/pyomop_migrate.md: design decisions, env var security model, schema extraction design, and future work
- README.md: new quick-start section, schema extraction example, env var note, and updated command-line reference
- __init__.py / mkdocs.yml / pyproject.toml: export, navigation, and entry point wired up
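Gender concept mapping and birth-date backfill are among the post-load steps the PR shares with CdmCsvLoader, and the tests cover both. A per-row sketch of one plausible reading of those two steps (filling gender_concept_id from gender_source_value, and deriving year/month/day_of_birth from birth_datetime), assuming person rows are plain dicts; post_load_person and the accepted gender spellings are hypothetical, and the real steps operate on the target database.

```python
from datetime import date

# OMOP standard gender concepts: 8507 = MALE, 8532 = FEMALE; 0 = no matching concept.
# Which source spellings count as male/female is an assumption for this sketch.
GENDER_CONCEPTS = {"m": 8507, "male": 8507, "f": 8532, "female": 8532}


def post_load_person(person):
    """Return a copy of an OMOP person row with gender and birth fields backfilled."""
    row = dict(person)
    # Fill gender_concept_id only when the mapping left it as a placeholder (0/None).
    if not row.get("gender_concept_id"):
        source = (row.get("gender_source_value") or "").strip().lower()
        row["gender_concept_id"] = GENDER_CONCEPTS.get(source, 0)
    # Derive year/month/day of birth from birth_datetime when year_of_birth is unset.
    birth = row.get("birth_datetime")
    if birth and not row.get("year_of_birth"):
        d = birth if isinstance(birth, date) else date.fromisoformat(str(birth)[:10])
        row.update(year_of_birth=d.year, month_of_birth=d.month, day_of_birth=d.day)
    return row


person = post_load_person({
    "person_id": 1,
    "gender_source_value": "F",
    "gender_concept_id": 0,
    "year_of_birth": 0,
    "birth_datetime": "1980-05-17T08:30:00",
})
print(person["gender_concept_id"], person["year_of_birth"], person["month_of_birth"])
```

This is why the example mapping can safely emit {"const": 0} for gender_concept_id and year_of_birth: a zero placeholder is treated as "not yet known" and replaced after loading.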