Skip to content

Optional gams.transfer backend for read + write (opt-in, v2.1.0)#111

Merged
elainethale merged 19 commits into
mainfrom
eh/gams-transfer
May 23, 2026
Merged

Optional gams.transfer backend for read + write (opt-in, v2.1.0)#111
elainethale merged 19 commits into
mainfrom
eh/gams-transfer

Conversation

@elainethale

Copy link
Copy Markdown
Member

What

Adds an opt-in, switchable gams.transfer backend alongside the existing gdxcc path for both reading and writing GDX files. gdxcc stays the default and the correctness oracle; nothing changes for existing callers.

Select per call or globally:

to_dataframes(path, backend="gams_transfer")
to_gdx(dfs, out, backend="gams_transfer")
# or set GDXPDS_BACKEND=gams_transfer

Why

On a real ReEDS-sized GDX, gams.transfer is ~87x faster for both read and write (read 776.6s -> 9.0s; write 1002.0s -> 11.5s in the spike). That magnitude justifies carrying a second backend behind a switch so the speedup can be documented/tested before any future default flip.

How (symmetric architecture)

Rather than bolt a transfer module onto a gdx.py that doubles as "the gdxcc backend", the gdxcc logic was first extracted behind a shared interface:

  • gdx.py -> backend-agnostic interface + data model (GdxFile/GdxSymbol delegate I/O to a _backend_impl)
  • _backend.py -> Backend enum, GdxBackend ABC, resolve_backend, make_backend, DEFAULT_BACKEND
  • _gdxcc_backend.py -> today's gdxcc read/write/handle logic, extracted
  • _transfer_backend.py -> the gams.transfer read + write translation

Staged across the commits on this branch: Phase 0 (extract gdxcc + unify set-text reads) -> Step 1 (backend selection plumbing + new to_dataframes(symbols=[...]) subset feature) -> Phase A (transfer read) -> Aliases read as Sets in both backends -> Phase B (transfer write).

Notable details:

  • Value columns derive from value_col_names (GamsValueType) on both backends -- one source of truth, no parallel hard-coded list.
  • Special values disambiguated via gt sentinels (isNA/isUndef/isEps) to preserve the gdxcc NA/UNDEF/EPS distinctions byte-for-byte.
  • v2.1.0 keeps strict parity: no set-text-write (empty element_text, matching gdxcc's membership-boolean behavior). Writing aliases via transfer raises clearly (to_gdx never infers one).
  • HAVE_GAMS_TRANSFER is a functional probe (constructs a Container), so transfer-dependent tests skip cleanly where gamsapi is present but cannot load the active GAMS libraries.

Testing

  • New tests/test_backend_parity.py: read parity over every in-tree fixture (x load_set_text), special-value parity, symbol-subset parity, and write parity over the full write x read backend matrix (both engines write, both read each output) -- including transfer-write/transfer-read.
  • R12 mixed-boolean set-write gate + an alias-write NotImplementedError test (the one write branch parity cannot reach).
  • New tests/test_backend_selection.py: enum / resolution / precedence / info() / exceptions.
  • Full suite green on two GAMS installs: GAMS 53 = 115 passed; GAMS 48 = 86 passed, 29 skipped. pyright: 0 errors.

Remaining v2.1.0 steps (follow-up, not in this PR)

  • Promote the spike to dev/benchmarks/ and capture the speedup table.
  • Bump __version__ -> 2.1.0, CHANGES.txt, docs note for the opt-in switch.

The v3.0.0 default-flip + set-text-write are deliberately deferred (one coordinated breaking release); the v2.1.0 code isolates the empty-element_text emission and the DEFAULT_BACKEND indirection so that flip is a near one-spot change.

🤖 Generated with Claude Code

elainethale and others added 15 commits May 21, 2026 10:07
…cstring

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e + v3.0.0

Expand the v2.1.0 plan from a read-only accelerator to a read+write
gams.transfer fast path behind an opt-in, switchable backend:

- Symmetric architecture: a GdxBackend ABC (read primitive load_symbols,
  with load_file/load_symbol conveniences; plus write_file/close); gdxcc
  logic extracted to _gdxcc_backend.py first (Phase 0), gams.transfer as a
  sibling _transfer_backend.py.
- Backend selection via a public Backend str-enum + GDXPDS_BACKEND env var;
  single DEFAULT_BACKEND constant (no 'auto').
- Set-text reads unified into load_file/load_symbol (Phase 0).
- New non-breaking feature: to_dataframes(symbols=[...]) subset read; one
  targeted c.read(symbols=...) on gams.transfer.
- Record the evaluation-spike gate as satisfied (~87x on a ReEDS-sized GDX).
- Add a v3.0.0 entry bundling the default-flip with set-text-write, and
  point the set-membership wart at it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First slice of the Phase 0 backend extraction. Introduces the engine-agnostic
seam without changing behavior (full suite still 65 passed):

- _backend.py: Backend StrEnum, DEFAULT_BACKEND, the GdxBackend ABC (abstract
  load_symbols primitive over symbol objects, with load_file/load_symbol
  conveniences), and make_backend.
- _gdxcc_backend.py: GdxccBackend.load_symbols, the gdxDataReadStr record loop
  (incl. set-text and special-value conversion) moved out of GdxSymbol.load.
- gdx.py: GdxFile builds self._backend_impl; GdxSymbol.load delegates to it.

The GDX handle and metadata reads stay on GdxFile for now; later slices move
open_read, write_file, and handle ownership/close behind the same ABC.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Second slice of Phase 0. Extracts file/symbol metadata reading behind the
backend ABC, leaving GdxSymbol.__init__ backend-agnostic. Behavior unchanged
(65 passed).

- _backend.py: add abstract open_read to GdxBackend.
- _gdxcc_backend.py: open_read (gdxOpenRead + gdxFileVersion/gdxSystemInfo +
  the symbol-creation loop + resolve_domain pass) and a _make_symbol helper
  that performs the per-symbol gdxSymbolInfoX/gdxSymbolGetDomainX reads that
  previously lived in GdxSymbol.__init__.
- gdx.py: GdxSymbol.__init__ no longer touches file.H (writing symbols load
  immediately; read symbols stay lazy until the backend populates them);
  _num_records is initialized in the constructor. GdxFile.read delegates to
  open_read, and the non-lazy branch now calls backend.load_file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final structural slice of Phase 0. The gdxcc handle lifecycle and the write
path now live in the backend; gdx.py is left as interface + data model.
Behavior unchanged (65 passed, incl. test_handle_lifecycle).

- _backend.py: add handle property (default None), abstract write_file,
  a default write_symbol that raises NotImplementedError, and abstract close.
- _gdxcc_backend.py: own the _GdxHandle (created in __init__ via load_gdxcc),
  expose handle + run-once close(); implement write_file (gdxOpenWrite ->
  per-symbol write -> gdxClose) and write_symbol (the former GdxSymbol.write
  body, incl. universal-set UEL register and convert_np_to_gdx_svs).
- gdx.py: GdxFile no longer holds _H/_handle/_create_gdx_object; the finalizer
  retargets to backend.close (still a non-self bound method, preserving
  GC-time finalization); GdxFile.H delegates to backend.handle and is documented
  as a gdxcc-specific escape hatch (v3.0.0 removal candidate). GdxFile.write and
  GdxSymbol.write are thin delegations. Drop now-unused imports (numbers.Number,
  the special alias, _GdxHandle, load_gdxcc).
- test_handle_lifecycle: assert the public f.H is None after cleanup (the
  private _H moved to the backend).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Set-text reads stop being a special lazy-loop in read_gdx.py and become an
ordinary parameter on the eager bulk-load path, finishing Phase 0.

- gdx.py: add GdxFile.load_all(*, load_set_text=False) delegating to
  backend.load_file; read()'s non-lazy branch now calls self.load_all() so
  there is a single 'load every symbol' entry point.
- read_gdx.py: collapse the two asymmetric to_dataframes paths (eager vs the
  set-text lazy per-symbol loop) into one eager _get_dataframes(load_set_text)
  that calls load_all; the dataframes property delegates to it. to_dataframe
  (single symbol) is unchanged.
- tests: add test_set_element_text_to_dataframes covering the plural
  to_dataframes(load_set_text=...) path, which previously had no coverage and
  is what this refactor rewrote. 66 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ring

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plumb the gams.transfer-era backend switch and the new subset feature, all
still defaulting to gdxcc (gams.transfer construction lands in Phase A).

- Capability: lazy HAVE_GAMS_TRANSFER probe in tools (PEP 562 __getattr__),
  re-exported from gdxpds; info() reports the gams.transfer binding and the
  resolved default backend.
- Selection: resolve_backend (kwarg -> GDXPDS_BACKEND -> DEFAULT_BACKEND;
  capability-gated; no silent fallback) and make_backend wired for both kinds
  (GAMS_TRANSFER raises 'not yet implemented' until Phase A; assert_never
  guards exhaustiveness).
- API: backend= kwarg threaded through GdxFile (+ clone), both Translators,
  and to_dataframes/to_dataframe/list_symbols/get_data_types/
  get_subset_relationships/to_gdx. Backend is public.
- Subset feature: GdxFile.load_symbols(names) + to_dataframes(symbols=[...]);
  one targeted read on gams.transfer later, a per-symbol loop on gdxcc now.
- Exceptions: add BackendError and SymbolNotFoundError (both subclass Error,
  so existing 'except Error' still catches them); SymbolNotFoundError used
  uniformly in GdxFile.load_symbols and the existing Translator.dataframe.
- Tests: new tests/test_backend_selection.py (capability, resolution+env
  precedence, info, exception types, subset, and backend= threading across
  every read helper and to_gdx). 85 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first behavioral use of the second backend: an opt-in gams.transfer read
fast path whose output matches the gdxcc oracle.

- _transfer_backend.py: TransferBackend(GdxBackend) implementing open_read
  (metadata-only Container, so list_symbols/get_data_types stay cheap),
  load_symbols (bulk reads + caches one full container; subset does a targeted
  c.read(symbols=...) in a fresh container), close, and per-symbol translation
  for Set/Parameter/Variable/Equation:
    * categorical domain columns -> str; positional column mapping to dims;
    * Set membership = (element_text != '') -> c_bool, or element text when
      load_set_text=True;
    * variable/equation .type strings -> GamsVariableType/GamsEquationType;
    * special values -> gdxpds canonical (EPS -> machine eps, NA/UNDEF -> nan,
      +/-inf pass through; genuine 0.0 preserved).
  write_file raises NotImplementedError (Phase B). Alias raises
  NotImplementedError pending a fixture (no GDX fixture has an alias and to_gdx
  can't create one).
- _backend.py: make_backend now constructs TransferBackend for GAMS_TRANSFER.
- tests/test_backend_parity.py: read parity (gdxcc vs gams_transfer) over every
  in-tree fixture x load_set_text (same symbols, same order, identical frames),
  a special-value round-trip, and the subset path. 101 passed overall.

R13 settled: gams.transfer exposes no file version/producer or pre-load record
count, so those stay None/0 and are not asserted cross-backend.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A gamsapi whose build does not match the active GAMS install imports fine but
fails to load its GAMS shared libraries (e.g. gmdcclib) when a gams.transfer
Container is constructed. The old import-only probe reported the fast path as
available in that case, so backend-parity tests ran and crashed instead of
skipping (seen on a GAMS 48 venv).

- _probe_gams_transfer now constructs a Container against the resolved GAMS
  directory to confirm the engine actually works; any failure (missing,
  version-skewed, or GAMS not found) reads as not available. Still cached and
  lazy.
- info() reports 'gams.transfer usable: yes/no' alongside the importability line
  so the version-skew case is diagnosable.
- resolve_backend's error message now says 'not usable here' rather than 'not
  importable'.

Net: HAVE_GAMS_TRANSFER-gated tests skip cleanly where gams.transfer is present
but unusable (GAMS 48 venv: 86 passed, 18 skipped), and stay green where it
works (GAMS 53: 104 passed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Legacy gdxpds read an Alias as a degenerate float column (raw stored values,
no membership fixup, no special-value handling) -- an untested, effectively
unused path. An alias is just another name for a set, so gdxpds now reads it
like the set it aliases (c_bool membership / element text), consistently across
both backends.

- gdx.py (shared data model): treat GamsDataType.Alias like Set in the
  dataframe setter, _init_dataframe, _fixup_set_value, and get_value_col_default
  -- so both backends inherit Set-like alias reads.
- _gdxcc_backend: aliases take the Set set-text path (the special-value bypass
  already excluded them); an alias now reads c_bool membership instead of 0.0.
- _transfer_backend: aliases route through the Set translation (gams.transfer
  delegates an alias's .records to its parent set); the prior NotImplementedError
  is gone.
- dev/build_alias_fixture.py + tests/data/alias_fixture.gdx: a parent Set t and
  an Alias at (built via raw gdxAddAlias, since to_gdx can't emit aliases).
- tests: test_alias_reads_like_its_set asserts at reads like t (membership,
  Alias data type); the globbed test_backend_parity guarantees the gams_transfer
  read matches gdxcc. 104 passed (GAMS 53).

Alias deliberately mirrors Set's *current* behavior, including the membership-
boolean/set-text wart, slated to be fixed for Set and Alias together in v3.0.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t in ROADMAP

- v2.1.0 status block: Phase 0, Step 1, and Phase A (read backend) landed on
  eh/gams-transfer; Phase B (write) remaining.
- Record the two read-side decisions: aliases read as Sets (both backends), and
  HAVE_GAMS_TRANSFER means usable (not just importable).
- Known-warts membership entry + v3.0.0 candidate now note aliases share the
  Set membership wart (fixed together).
- PR/branch map reflects read-path completion.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TransferBackend.write_file builds a gams.transfer Container from the loaded
gdxpds symbols (the inverse of the Phase A read translation) and writes it:

- value columns derive from value_col_names (GamsValueType), the same source
  the gdxcc backend uses -- no second hard-coded list to keep in sync
- inverse special-value mapping (eps -> EPS, NaN -> NA; +/-inf unchanged)
- empty element_text for Sets (strict parity with gdxcc: no set-text-write
  in v2.1.0; the membership-boolean wart collapses sets to all-False)
- strict/relaxed domain choice mirrors the gdxcc write path
- writing aliases is unsupported (to_gdx never infers one); raises clearly

Tests: write parity now covers the full write x read backend matrix (both
engines write, both read each output) including transfer-write/transfer-read;
plus the R12 mixed-boolean set-write gate and an alias-write NotImplementedError
test for the one write branch parity cannot reach.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds tests/test_backend_timing.py: times read+write for both backends over
every in-tree fixture (sub-3 KB up to ~1.9 MB) and records the results. A
pytest_terminal_summary hook in conftest renders a size-sorted table plus a
clear-winner / switchover note at the end of the run.

These are not pass/fail performance gates (timings are machine-dependent and
measured as a single min-of-3 burst); the only assertion is that both backends
ran. The robust finding: gdxcc wins on small files (transfer's Container
overhead is not amortized) while gams.transfer wins on large ones (~2x read,
~4x write at 1.9 MB), with the write switchover around 100-300 KB.

Runs by default; skipped only when gams.transfer is unavailable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a switchable backend architecture for GDX I/O, introducing an opt-in gams.transfer engine alongside the existing gdxcc path while keeping gdxcc as the default/correctness oracle.

Changes:

  • Refactors GdxFile/GdxSymbol to delegate read/write/handle lifecycle to pluggable backend implementations (gdxcc and new gams.transfer).
  • Adds backend selection plumbing (Backend enum, env var resolution, new exceptions) and a to_dataframes(symbols=[...]) subset-read feature.
  • Expands test coverage with cross-backend read/write parity tests plus backend selection and (non-gating) timing reporting.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_write.py Updates wording of the round-trip write test docstring.
tests/test_read.py Adds coverage for eager-path set text and alias-as-set read behavior.
tests/test_handle_lifecycle.py Updates lifecycle assertion to use the new GdxFile.H delegation.
tests/test_backend_timing.py Adds (skippable) timing collection for gdxcc vs gams.transfer.
tests/test_backend_selection.py Adds tests for backend resolution, exceptions, and symbol-subset reads.
tests/test_backend_parity.py Adds read/write parity tests across the full backend matrix.
tests/conftest.py Adds session timing aggregation + terminal summary rendering.
src/gdxpds/write_gdx.py Threads backend= through to_gdx() and writer GdxFile creation.
src/gdxpds/tools.py Adds lazy HAVE_GAMS_TRANSFER probing and reports transfer usability in info().
src/gdxpds/read_gdx.py Threads backend= into read helpers; adds symbols= subset support in to_dataframes().
src/gdxpds/gdx.py Extracts backend-agnostic model; adds backend delegation + SymbolNotFoundError; adds eager load helpers.
src/gdxpds/_transfer_backend.py Implements gams.transfer read/write translation backend.
src/gdxpds/_gdxcc_backend.py Extracts the existing gdxcc logic into a backend implementation.
src/gdxpds/_backend.py Introduces backend enum, resolution logic, and the GdxBackend interface.
src/gdxpds/init.py Re-exports backend API and lazily exposes HAVE_GAMS_TRANSFER.
dev/ROADMAP.md Updates roadmap to reflect read+write transfer backend plan/status.
dev/build_alias_fixture.py Adds fixture generator for alias behavior tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/gdxpds/_transfer_backend.py Outdated
Comment thread src/gdxpds/_transfer_backend.py Outdated
Comment thread src/gdxpds/tools.py Outdated
Comment thread src/gdxpds/read_gdx.py
Comment thread src/gdxpds/write_gdx.py
elainethale and others added 2 commits May 22, 2026 09:57
… together

Add a -b/--backend flag to csv_to_gdx and gdx_to_csv (choices: gdxcc,
gams_transfer), defaulting to None so the GDXPDS_BACKEND env var / gdxcc
fallback still applies. The flag routes to to_gdx / to_dataframes; a
parametrized round-trip test exercises it through both CLIs.

Docs: replace the standalone engine note with a single Configuration
section in the overview that explains the two parallel runtime choices --
GAMS install location (gams_dir) and I/O engine (backend) -- together, since
each is set the same three ways (keyword, environment variable, CLI flag).
Add a concise Configure subsection to the README install steps that points
to it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es) + ruff format

Read parity (Copilot #2): _convert_transfer_specials now preserves the gdxcc
oracle's UNDEF/NA distinction -- GDX UNDEF -> None (object column), GDX NA ->
np.nan -- instead of collapsing both to NaN. New test_read_parity_undef builds a
genuine UNDEF via gams.transfer (gdxcc cannot write one).

Write parity (Copilot #1): the comment's premise was wrong -- gdxcc's write path
drops UNDEF (None -> 0.0, since None isn't a Number), so it never preserves it.
For strict v2.1.0 parity, _np_to_transfer_specials now maps None -> 0.0 (matching
the oracle) rather than to NA. New test_write_parity_undef locks this in.
Documented as a v3.0.0 candidate in ROADMAP.md (gams.transfer could emit a real
UNDEF, but that diverges from the oracle).

Probe (Copilot #3): _probe_gams_transfer accepts an explicit gams_dir and probes
it fresh/uncached, so info(gams_dir=...) reports that install and a transient
failure can't poison the cached default.

Types (Copilot #4/#5): widen public backend annotations from str | None to
str | Backend | None across read_gdx/write_gdx.

ruff format: collapse hand-wrapped lines in _gdxcc_backend.py and gdx.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.

Comment thread src/gdxpds/gdx.py Outdated
Comment thread dev/build_alias_fixture.py Outdated
Comment thread src/gdxpds/_gdxcc_backend.py Outdated
Comment thread src/gdxpds/_backend.py Outdated
Comment thread tests/conftest.py
elainethale and others added 2 commits May 23, 2026 14:42
Resolve the five round-2 Copilot review comments, plus the deeper checks
they warranted:

- gdx.py: load_symbols docstring named the wrong exception
  (SymbolNotFoundError, not gdxpds.tools.Error). Also documented the same
  exception on to_dataframe (was undocumented, unlike to_dataframes).
- dev fixture builders: pin backend="gdxcc" so GDXPDS_BACKEND can't
  redirect the raw-f.H scripts (build_alias_fixture, build_set_text_fixture)
  to the transfer backend (where f.H is None); pin the high-level write
  builders too so committed fixtures stay deterministic vs. the oracle.
- _gdxcc_backend.py: rewrite stale module/class docstrings to the current
  handle-ownership + full-contract reality.
- _backend.py: open_read no longer overstates _version/_producer ("where
  available"); fix stale module + GdxBackend docstrings; resolve_backend
  documents BackendError; write_symbol "single-symbol path".
- _transfer_backend.py: title/class docstring now say read + write; drop
  the Phase A/B dev-sequence labels.
- conftest.py: _crossover_note guards first==0 / non-monotonic timings so
  it can't report a bogus switchover band (the read op actually hits this).

Docstrings now describe present behavior, not the refactor that produced
the code. ruff + pyright clean; full suite 127 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- __version__ 2.0.0 -> 2.1.0 (single source; pyproject reads it dynamically)
- CHANGES.txt: v2.1.0 entry (gams.transfer opt-in read/write backend,
  backend= / GDXPDS_BACKEND / --backend, to_dataframes(symbols=),
  Backend / BackendError / SymbolNotFoundError, HAVE_GAMS_TRANSFER)
- ROADMAP: mark v2.1.0 Phase B done and the release prep complete

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elainethale elainethale merged commit d40a2bd into main May 23, 2026
3 checks passed
@elainethale elainethale deleted the eh/gams-transfer branch May 23, 2026 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants