Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping and `pyomop-migrate` CLI by Copilot · Pull Request #260 · dermatologist/pyomop

Copilot · 2026-02-19T13:37:28Z

CdmCsvLoader is coupled to a flat FHIR-derived CSV. This PR adds CdmGenericLoader, which reads from any async SQLAlchemy source database (SQLite/Postgres/MySQL) and loads into an OMOP CDM target, guided by the same JSON mapping convention.

Core implementation

src/pyomop/migrate/pyomop_migrate.py — CdmGenericLoader class; create_source_engine(), load_mapping(), build_source_url(), and extract_schema_to_markdown() helpers, plus the pyomop-migrate CLI entry point
src/pyomop/migrate/__init__.py — sub-package re-exporting all public symbols
src/pyomop/generic_loader.py — thin backward-compatibility shim (deprecated; re-exports from pyomop.migrate)
src/pyomop/mapping.generic.example.json — annotated example mapping (patients → person, encounters → visit_occurrence, diagnoses → condition_occurrence, lab results → measurement, medications → drug_exposure)

Mapping format

Extends the existing mapping.default.json convention with a source_table key per entry. Filters translate to SQL WHERE clauses (pushed to the source DB, not pandas masks).

{
  "tables": [
    {
      "source_table": "patients",
      "name": "person",
      "filters": [{"column": "active", "equals": 1}],
      "columns": {
        "person_id":           "id",
        "gender_source_value": "gender",
        "gender_concept_id":   {"const": 0},
        "year_of_birth":       {"const": 0},
        "race_concept_id":     {"const": 0},
        "ethnicity_concept_id":{"const": 0}
      }
    }
  ]
}
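
To make the mapping semantics concrete, here is a minimal sketch of how one entry could be turned into a filtered SELECT and then into target rows. It is not the CdmGenericLoader code: `build_select` and `map_row` are hypothetical helper names, the stdlib `sqlite3` module stands in for the async SQLAlchemy engine, and table and column names are assumed to come from a trusted mapping file.

```python
import sqlite3

# One mapping entry, copied from the example above.
ENTRY = {
    "source_table": "patients",
    "name": "person",
    "filters": [{"column": "active", "equals": 1}],
    "columns": {
        "person_id": "id",
        "gender_source_value": "gender",
        "gender_concept_id": {"const": 0},
        "year_of_birth": {"const": 0},
        "race_concept_id": {"const": 0},
        "ethnicity_concept_id": {"const": 0},
    },
}


def build_select(entry):
    """Build a parameterised SELECT for one mapping entry (hypothetical helper)."""
    # Only plain strings name source columns; {"const": ...} needs no source data.
    source_cols = sorted({v for v in entry["columns"].values() if isinstance(v, str)})
    col_sql = ", ".join(f'"{c}"' for c in source_cols)
    sql = f'SELECT {col_sql} FROM "{entry["source_table"]}"'
    clauses, params = [], []
    for flt in entry.get("filters", []):
        # The filter value is bound as a parameter, so the source DB does the filtering.
        clauses.append(f'"{flt["column"]}" = ?')
        params.append(flt["equals"])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def map_row(entry, row):
    """Translate one source row into a dict keyed by target column names."""
    out = {}
    for target, spec in entry["columns"].items():
        if isinstance(spec, dict) and "const" in spec:
            out[target] = spec["const"]
        else:
            out[target] = row[spec]
    return out


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE patients (id INTEGER, gender TEXT, active INTEGER)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", [(1, "F", 1), (2, "M", 0)])

sql, params = build_select(ENTRY)
mapped = [map_row(ENTRY, r) for r in conn.execute(sql, params)]
print(sql)
print(mapped)
```

In this sketch the inactive patient never leaves the source database, which is the practical difference between a WHERE clause and a pandas mask applied after loading.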

The same post-load steps as CdmCsvLoader run afterwards: person_id FK normalisation, birth-date backfill, gender concept mapping, and concept-code lookups. Missing source or target tables produce a warning and are skipped rather than aborting the run.
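
The warn-and-skip behaviour for missing tables can be illustrated as follows. This is only a sketch under stated assumptions: `table_exists` and `plan_tables` are hypothetical names, the logger name is invented, and the lookup uses SQLite's `sqlite_master` catalogue rather than whatever inspection the loader actually performs.

```python
import logging
import sqlite3

logger = logging.getLogger("migrate_sketch")  # hypothetical logger name


def table_exists(conn, name):
    """Return True if a table with this name exists in a SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def plan_tables(source, target, entries):
    """Split mapping entries into runnable and skipped target table names."""
    runnable, skipped = [], []
    for entry in entries:
        if not table_exists(source, entry["source_table"]):
            logger.warning("source table %r not found, skipping", entry["source_table"])
            skipped.append(entry["name"])
            continue
        if not table_exists(target, entry["name"]):
            logger.warning("target table %r not found, skipping", entry["name"])
            skipped.append(entry["name"])
            continue
        runnable.append(entry["name"])
    return runnable, skipped


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE patients (id INTEGER)")
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE person (person_id INTEGER)")

entries = [
    {"source_table": "patients", "name": "person"},
    {"source_table": "encounters", "name": "visit_occurrence"},  # absent in source
]
runnable, skipped = plan_tables(source, target, entries)
print(runnable, skipped)
```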

`pyomop-migrate` CLI script

All migration functionality lives in a dedicated pyomop-migrate entry point, keeping main.py (pyomop) unchanged. Source-database connection details use --src-* options; the target OMOP CDM database uses the standard connection options.

# Migrate SQLite source → SQLite OMOP target
pyomop-migrate --migrate \
  --src-dbtype sqlite --src-name hospital.sqlite \
  --dbtype sqlite --name omop.sqlite \
  --mapping ehr_to_omop.json

# PostgreSQL source → PostgreSQL OMOP target
pyomop-migrate --migrate \
  --src-dbtype pgsql --src-host srchost --src-user reader --src-pw secret --src-name ehr \
  --dbtype pgsql --host omophost --user writer --pw secret --name omop \
  --mapping ehr_to_omop.json --batch-size 500

# Extract source database schema to Markdown
pyomop-migrate --extract-schema \
  --src-dbtype sqlite --src-name hospital.sqlite \
  --schema-output hospital_schema.md

`--extract-schema` command

Introspects the source database and writes a Markdown document with full schema information (table names, column names, data types, nullable flags, PK/FK relationships, and row counts). This output is designed to be fed to AI agents to auto-generate a mapping JSON file.
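
As a rough illustration of this kind of introspection, the sketch below renders a SQLite schema as Markdown using the stdlib `sqlite3` module and SQLite PRAGMAs. It is not `extract_schema_to_markdown()` itself: the function name `schema_markdown` and the output layout are assumptions, and a multi-backend implementation would need a backend-neutral inspection API instead.

```python
import sqlite3


def schema_markdown(conn):
    """Render tables, columns, PK/FK flags and row counts as Markdown (sketch)."""
    lines = ["# Source schema", ""]
    tables = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in tables:
        count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        # foreign_key_list rows: id, seq, table, from, to, on_update, on_delete, match
        fks = {
            r[3]: (r[2], r[4])
            for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        }
        lines += [
            f"## {table} ({count} rows)",
            "",
            "| Column | Type | Nullable | Key |",
            "| --- | --- | --- | --- |",
        ]
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        for _cid, name, ctype, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            keys = []
            if pk:
                keys.append("PK")
            if name in fks:
                keys.append(f"FK -> {fks[name][0]}.{fks[name][1]}")
            nullable = "no" if notnull else "yes"
            lines.append(f"| {name} | {ctype} | {nullable} | {' '.join(keys)} |")
        lines.append("")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, gender TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE encounters ("
    "id INTEGER PRIMARY KEY, patient_id INTEGER REFERENCES patients(id))"
)
conn.execute("INSERT INTO patients (id, gender) VALUES (1, 'F')")
md = schema_markdown(conn)
print(md)
```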

Environment variable support

All source database connection parameters can be supplied via environment variables instead of CLI flags, keeping credentials out of shell history:

| Environment variable | CLI option |
| --- | --- |
| `SRC_DB_HOST` | `--src-host` |
| `SRC_DB_PORT` | `--src-port` |
| `SRC_DB_USER` | `--src-user` |
| `SRC_DB_PASSWORD` | `--src-pw` |
| `SRC_DB_NAME` | `--src-name` |

This applies to both --migrate and --extract-schema.
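
A minimal sketch of the fallback logic, for illustration only. The function name `sketch_source_url`, its signature, the precedence rule (an explicit argument wins, the environment variable fills in when the argument is absent), the driver names, and the default ports are all assumptions here; the real `build_source_url()` may differ on each point.

```python
import os
from urllib.parse import quote

# Environment variable names from the table above.
ENV_NAMES = {
    "host": "SRC_DB_HOST",
    "port": "SRC_DB_PORT",
    "user": "SRC_DB_USER",
    "password": "SRC_DB_PASSWORD",
    "name": "SRC_DB_NAME",
}

# Assumed async driver names and default ports; not taken from the PR.
DRIVERS = {
    "pgsql": ("postgresql+asyncpg", "5432"),
    "mysql": ("mysql+aiomysql", "3306"),
}


def sketch_source_url(dbtype, env=None, **cli):
    """Build a SQLAlchemy-style URL, falling back to environment variables."""
    env = os.environ if env is None else env

    def pick(key, default=""):
        value = cli.get(key)
        if value is not None:
            return str(value)  # explicit argument wins (assumed precedence)
        return env.get(ENV_NAMES[key], default)

    if dbtype == "sqlite":
        return f"sqlite+aiosqlite:///{pick('name')}"
    if dbtype not in DRIVERS:
        raise ValueError(f"unsupported source dbtype: {dbtype!r}")
    driver, default_port = DRIVERS[dbtype]
    # Percent-encode credentials so characters like '@' do not break the URL.
    user = quote(pick("user"), safe="")
    password = quote(pick("password"), safe="")
    host, port, name = pick("host"), pick("port", default_port), pick("name")
    return f"{driver}://{user}:{password}@{host}:{port}/{name}"


env = {
    "SRC_DB_HOST": "db.hospital.org",
    "SRC_DB_USER": "readonly",
    "SRC_DB_PASSWORD": "p@ss",
    "SRC_DB_NAME": "ehr_db",
}
pg_url = sketch_source_url("pgsql", env=env)
sqlite_url = sketch_source_url("sqlite", env={}, name="hospital.sqlite")
print(pg_url)
print(sqlite_url)
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment.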

Tests & docs

tests/test_pyomop_migrate.py — 26 unit tests covering row loading, gender/birth backfill, SQL filters, missing-table skip, multi-table mapping, batch correctness, CLI migrate, schema extraction (content, PK/FK, CLI end-to-end), URL building (all backends + env var override), and error cases
docs/pyomop_migrate.md — full usage, API reference, CLI option tables with examples, schema extraction section, and environment variable reference
notes/pyomop_migrate.md — design decisions, env var security model, schema extraction design, and future work
README.md — new quick-start section, schema extraction example, env var note, and updated command-line reference
__init__.py / mkdocs.yml / pyproject.toml — export, navigation, and entry point wired up

Original prompt

This section details the original issue you should resolve

<issue_title>Generate a generic database loader for OMOP CDM.</issue_title>
<issue_description># Generate a generic database loader for OMOP CDM.

Description

Currently, loader.py loads a csv file generated from FHIR into any database with the OMOP CDM schema, using a mapping JSON file to map the CSV columns to the database columns.

Read loader.py carefully and understand how it works.

Now create a generic_loader.py that can connect to any database using SQLAlchemy and load the tables into the OMOP CDM schema, using a mapping JSON file. In other words, the source can be any database with any schema, and the target is any database with the OMOP CDM schema. The mapping JSON file will specify how to map the source database tables and columns to the target database tables and columns.

The generic_loader.py should be able to handle different types of databases (e.g. MySQL, PostgreSQL, SQLite) and should be able to load data in batches for efficiency.

The mapping JSON file should follow the same structure as the current mapping.default.json file.

The generic_loader.py should also have error handling and logging to track the loading process and any issues that arise.

Write unit tests for the generic_loader.py to ensure it works correctly.

Provide documentation on how to use the generic_loader.py, including how to create the mapping JSON file and how to run the loader. Brief instructions should go in the README.md file, and more detailed instructions should go in a separate documentation file, docs/generic_loader.md. Implementation details and future plans should go into notes/generic_loader.md.

Check if the new generic_loader.py aligns in functionality with the existing loader.py

Do a final pass to ensure the code is clean, of good quality, well-documented, and follows best practices.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Generate a generic database loader for OMOP CDM. #259


Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

dermatologist · 2026-02-19T19:52:35Z

@copilot
Provide a --migrate command line option that will migrate data from the source database to the target database. All options for source and target database connections should be provided as command-line arguments. The mapping JSON file should also be provided as a command-line argument. The loader should connect to the source database, read the data according to the mapping, and then connect to the target database and load the data into the OMOP CDM schema. Update documentation as needed. Do a final pass to ensure the code is clean, of good quality, well-documented, and follows best practices.

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot · 2026-02-19T19:57:51Z

@copilot
Provide a --migrate command line option that will migrate data from the source database to the target database. All options for source and target database connections shoul...

Done in dc34df0. Added --migrate to the pyomop CLI with:

--src-dbtype / --src-host / --src-port / --src-user / --src-pw / --src-name / --src-schema for the source database connection
--mapping FILE (-m) for the JSON mapping file (required with --migrate)
--batch-size (default 1000) for INSERT batch tuning
Existing --dbtype / --host / --port / --user / --pw / --name / --schema for the OMOP CDM target

Example usage:

# SQLite → SQLite
pyomop --migrate \
  --src-dbtype sqlite --src-name hospital.sqlite \
  --dbtype sqlite --name omop.sqlite \
  --mapping ehr_to_omop.json

# PostgreSQL → PostgreSQL
pyomop --migrate \
  --src-dbtype pgsql --src-host srchost --src-user reader --src-pw secret --src-name ehr \
  --dbtype pgsql --host omophost --user writer --pw secret --name omop \
  --mapping ehr_to_omop.json --batch-size 500
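
For readers unfamiliar with batch tuning, the sketch below shows the general idea behind an option like `--batch-size`: rows are consumed from an iterator in fixed-size chunks and each chunk is written with one bulk INSERT. The helper names `batched` and `insert_in_batches` are hypothetical, and stdlib `sqlite3` is used in place of the loader's async SQLAlchemy session.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows chunk by chunk and return the number of batches written."""
    n_batches = 0
    for chunk in batched(rows, batch_size):
        conn.executemany(
            "INSERT INTO person (person_id, gender_source_value) VALUES (?, ?)",
            chunk,
        )
        conn.commit()  # one commit per batch keeps memory and transaction size bounded
        n_batches += 1
    return n_batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, gender_source_value TEXT)")
rows = ((i, "F") for i in range(2500))  # a generator, never fully materialised
n_batches = insert_in_batches(conn, rows, batch_size=1000)
total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(n_batches, total)  # 3 2500
```

Smaller batches reduce peak memory at the cost of more round trips, which is presumably why the value is exposed as an option.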

Full CLI reference and examples added to docs/generic_loader.md and README.md. Three new CLI tests cover the end-to-end flow, the missing-mapping error, and the unsupported source DB type error — all 13 tests pass.

codecov · 2026-02-21T15:30:00Z

Codecov Report

❌ Patch coverage is 64.96914% with 227 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.3%. Comparing base (e9265ad) to head (b69cd5e).
⚠️ Report is 1 commit behind head on develop.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pyomop/migrate/pyomop_migrate.py | 64.8% | 187 Missing and 40 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff            @@
##           develop    #260     +/-   ##
=========================================
- Coverage     77.1%   75.3%   -1.8%     
=========================================
  Files           19      21      +2     
  Lines         3334    3982    +648     
  Branches       268     384    +116     
=========================================
+ Hits          2572    3002    +430     
- Misses         667     836    +169     
- Partials        95     144     +49

dermatologist · 2026-02-21T15:39:21Z

@copilot

Add a new feature to generic_loader.py that will introspect the source database and generate a markdown file with schema information, including table names, column names, data types and any relationships between tables. Also include PK/FK information if available. This will help AI agents to understand the source database schema and generate the appropriate mapping JSON file for the migration process.
Add an --extract-schema command line option that will trigger this functionality. The generated markdown file should be saved to a specified location provided as a command-line argument. All options for source database connection should be provided as command-line arguments.
Update generic_loader.py to look for environment variables for database connection parameters (e.g. DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME) first, as an alternative to command line arguments for better security and flexibility. Update the documentation to reflect this new option for database connection parameters. This should apply to both the --migrate and --extract-schema functionalities.
Generate appropriate unit tests for this new feature to ensure it works correctly. Improve coverage for the existing code as well.
Provide documentation on how to use the --extract-schema option, including how to specify the source database connection and the output location for the generated markdown file in the README.md file and in a separate documentation file docs/generic_loader.md. Implementation details and future plans should go into notes/generic_loader.md.
Do a final pass to ensure the code is clean, of good quality, well-documented, and follows best practices.

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot · 2026-02-21T15:45:33Z

@copilot

Add a new feature to generic_loader.py that will introspect the source database and generate a markdown file with schema information, including table names, column names, data types and an...

Done in e3a0515. Here's what was added:

Schema extraction (--extract-schema)

extract_schema_to_markdown(engine, output_path) in generic_loader.py introspects the source DB and writes a Markdown file containing a summary table (table name, row count, primary keys) and per-table sections with column name, type, nullable, default, PK/FK annotations, and a foreign key relationship table.
--extract-schema / --schema-output CLI options added to pyomop:

pyomop --extract-schema \
  --src-dbtype sqlite --src-name hospital.sqlite \
  --schema-output hospital_schema.md

Environment variable support

build_source_url() helper reads SRC_DB_HOST, SRC_DB_PORT, SRC_DB_USER, SRC_DB_PASSWORD, SRC_DB_NAME as fallbacks, so credentials never need to appear in shell history.
Click envvar= is also set on each --src-* option so they're resolved at the CLI level too.
Both --migrate and --extract-schema benefit from this.

export SRC_DB_HOST=db.hospital.org
export SRC_DB_USER=readonly
export SRC_DB_PASSWORD=secret
export SRC_DB_NAME=ehr_db

pyomop --extract-schema --src-dbtype pgsql --schema-output ehr_schema.md

Tests: 9 new tests covering schema Markdown content, PK/FK rendering, the --extract-schema CLI end-to-end, bad dbtype error, and all URL-building paths including env var override. Total: 22 → 26 tests, all passing.

dermatologist · 2026-02-22T17:47:58Z

@copilot

Create a new module in src/pyomop/migrate and move the generic_loader.py to that module. This will help to organize the code better and separate the migration functionality.
Rename generic_loader.py to pyomop_migrate.py.
Create a new command-line script called pyomop-migrate and move the --migrate and --extract-schema command line arguments from main.py to this script.
The main.py should be reverted to its original state before the migration functionality was added (revert main.py to the state in the develop branch).
The new commands will be pyomop-migrate --migrate and pyomop-migrate --extract-schema.
Rename generic_loader.md to pyomop_migrate.md to reflect the new module and command-line script name. Change links in mkdocs.yml to the new file name as well.
Update the documentation to reflect the new module and command-line script in docs/pyomop_migrate.md and notes/pyomop_migrate.md. Make required name changes only in README.md to reflect the new command-line script name.
Update unit tests as required.
Do a final pass to ensure the code is clean, of good quality, well-documented, and follows best practices.

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot · 2026-02-22T17:56:09Z

@copilot

Create a new module in src/pyomop/migrate and move the generic_loader.py to that module. This will help to organize the code better and separate the migration functionality.

Rename gene...

Done in 117a1b5. Here's what was changed:

New module src/pyomop/migrate/

pyomop_migrate.py — the full ETL code (previously generic_loader.py) plus the new migrate_cli Click command and main() entry point
__init__.py — re-exports CdmGenericLoader, build_source_url, create_source_engine, extract_schema_to_markdown, load_mapping

pyomop-migrate CLI script
Registered as an entry point in pyproject.toml. Commands:

pyomop-migrate --migrate --src-dbtype sqlite --src-name hospital.sqlite \
  --dbtype sqlite --name omop.sqlite --mapping mapping.json

pyomop-migrate --extract-schema --src-dbtype sqlite --src-name hospital.sqlite \
  --schema-output schema.md

main.py reverted to its original state (no --migrate/--extract-schema options).

src/pyomop/generic_loader.py replaced with a thin deprecation shim that re-exports from pyomop.migrate.pyomop_migrate to preserve backward compatibility.

Renamed files:

docs/generic_loader.md → docs/pyomop_migrate.md
notes/generic_loader.md → notes/pyomop_migrate.md
tests/test_generic_loader.py → tests/test_pyomop_migrate.py (imports updated to use pyomop.migrate.pyomop_migrate)

mkdocs.yml nav, __init__.py import, and README.md command references all updated accordingly. All 26 tests pass.

Initial plan

9837c2b

Copilot AI assigned Copilot and dermatologist Feb 19, 2026

Copilot started work on behalf of dermatologist February 19, 2026 13:37

Add CdmGenericLoader for database-to-OMOP ETL

241a13d

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add generic database loader for OMOP CDM~~ Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping Feb 19, 2026

Copilot AI requested a review from dermatologist February 19, 2026 13:49

Copilot finished work on behalf of dermatologist February 19, 2026 13:50

Copilot started work on behalf of dermatologist February 19, 2026 19:53

Add --migrate CLI option for database-to-OMOP ETL

dc34df0

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot AI changed the title ~~Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping~~ Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping and --migrate CLI Feb 19, 2026

Copilot finished work on behalf of dermatologist February 19, 2026 19:59

Copilot started work on behalf of dermatologist February 21, 2026 15:39

Add --extract-schema CLI, env var support, build_source_url helper

e3a0515

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot AI changed the title ~~Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping and --migrate CLI~~ Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping, --migrate and --extract-schema CLI Feb 21, 2026

Copilot finished work on behalf of dermatologist February 21, 2026 15:46

Revise README to include Agent Assisted ETL and schema extraction det…

3254d21

…ails for generic database loader

Copilot started work on behalf of dermatologist February 22, 2026 17:48

Reorganize into pyomop.migrate module with pyomop-migrate CLI script

117a1b5

Co-authored-by: dermatologist <6259644+dermatologist@users.noreply.github.com>

Copilot AI changed the title ~~Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping, --migrate and --extract-schema CLI~~ Add CdmGenericLoader: database-to-OMOP CDM ETL via JSON mapping and pyomop-migrate CLI Feb 22, 2026

Copilot finished work on behalf of dermatologist February 22, 2026 17:57

dermatologist added 2 commits February 22, 2026 17:18

Remove deprecated pyomop.generic_loader module and update import warn…

50c4b5e

…ings

Remove references to --migrate and --extract-schema commands in README

6ab664d

dermatologist marked this pull request as ready for review February 22, 2026 23:24

Remove installation and quick start sections from pyomop_migrate docu…

b69cd5e

…mentation

dermatologist approved these changes Feb 22, 2026

dermatologist approved these changes Feb 23, 2026

dermatologist merged commit 212437e into develop Feb 23, 2026
2 of 3 checks passed

dermatologist merged 9 commits into develop from copilot/add-generic-database-loader
