Optional gams.transfer backend for read + write (opt-in, v2.1.0)#111
Merged
Conversation
…cstring Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e + v3.0.0 Expand the v2.1.0 plan from a read-only accelerator to a read+write gams.transfer fast path behind an opt-in, switchable backend: - Symmetric architecture: a GdxBackend ABC (read primitive load_symbols, with load_file/load_symbol conveniences; plus write_file/close); gdxcc logic extracted to _gdxcc_backend.py first (Phase 0), gams.transfer as a sibling _transfer_backend.py. - Backend selection via a public Backend str-enum + GDXPDS_BACKEND env var; single DEFAULT_BACKEND constant (no 'auto'). - Set-text reads unified into load_file/load_symbol (Phase 0). - New non-breaking feature: to_dataframes(symbols=[...]) subset read; one targeted c.read(symbols=...) on gams.transfer. - Record the evaluation-spike gate as satisfied (~87x on a ReEDS-sized GDX). - Add a v3.0.0 entry bundling the default-flip with set-text-write, and point the set-membership wart at it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First slice of the Phase 0 backend extraction. Introduces the engine-agnostic seam without changing behavior (full suite still 65 passed): - _backend.py: Backend StrEnum, DEFAULT_BACKEND, the GdxBackend ABC (abstract load_symbols primitive over symbol objects, with load_file/load_symbol conveniences), and make_backend. - _gdxcc_backend.py: GdxccBackend.load_symbols, the gdxDataReadStr record loop (incl. set-text and special-value conversion) moved out of GdxSymbol.load. - gdx.py: GdxFile builds self._backend_impl; GdxSymbol.load delegates to it. The GDX handle and metadata reads stay on GdxFile for now; later slices move open_read, write_file, and handle ownership/close behind the same ABC. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Second slice of Phase 0. Extracts file/symbol metadata reading behind the backend ABC, leaving GdxSymbol.__init__ backend-agnostic. Behavior unchanged (65 passed). - _backend.py: add abstract open_read to GdxBackend. - _gdxcc_backend.py: open_read (gdxOpenRead + gdxFileVersion/gdxSystemInfo + the symbol-creation loop + resolve_domain pass) and a _make_symbol helper that performs the per-symbol gdxSymbolInfoX/gdxSymbolGetDomainX reads that previously lived in GdxSymbol.__init__. - gdx.py: GdxSymbol.__init__ no longer touches file.H (writing symbols load immediately; read symbols stay lazy until the backend populates them); _num_records is initialized in the constructor. GdxFile.read delegates to open_read, and the non-lazy branch now calls backend.load_file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Final structural slice of Phase 0. The gdxcc handle lifecycle and the write path now live in the backend; gdx.py is left as interface + data model. Behavior unchanged (65 passed, incl. test_handle_lifecycle). - _backend.py: add handle property (default None), abstract write_file, a default write_symbol that raises NotImplementedError, and abstract close. - _gdxcc_backend.py: own the _GdxHandle (created in __init__ via load_gdxcc), expose handle + run-once close(); implement write_file (gdxOpenWrite -> per-symbol write -> gdxClose) and write_symbol (the former GdxSymbol.write body, incl. universal-set UEL register and convert_np_to_gdx_svs). - gdx.py: GdxFile no longer holds _H/_handle/_create_gdx_object; the finalizer retargets to backend.close (still a non-self bound method, preserving GC-time finalization); GdxFile.H delegates to backend.handle and is documented as a gdxcc-specific escape hatch (v3.0.0 removal candidate). GdxFile.write and GdxSymbol.write are thin delegations. Drop now-unused imports (numbers.Number, the special alias, _GdxHandle, load_gdxcc). - test_handle_lifecycle: assert the public f.H is None after cleanup (the private _H moved to the backend). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Set-text reads stop being a special lazy-loop in read_gdx.py and become an ordinary parameter on the eager bulk-load path, finishing Phase 0. - gdx.py: add GdxFile.load_all(*, load_set_text=False) delegating to backend.load_file; read()'s non-lazy branch now calls self.load_all() so there is a single 'load every symbol' entry point. - read_gdx.py: collapse the two asymmetric to_dataframes paths (eager vs the set-text lazy per-symbol loop) into one eager _get_dataframes(load_set_text) that calls load_all; the dataframes property delegates to it. to_dataframe (single symbol) is unchanged. - tests: add test_set_element_text_to_dataframes covering the plural to_dataframes(load_set_text=...) path, which previously had no coverage and is what this refactor rewrote. 66 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ring Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Plumb the gams.transfer-era backend switch and the new subset feature, all still defaulting to gdxcc (gams.transfer construction lands in Phase A). - Capability: lazy HAVE_GAMS_TRANSFER probe in tools (PEP 562 __getattr__), re-exported from gdxpds; info() reports the gams.transfer binding and the resolved default backend. - Selection: resolve_backend (kwarg -> GDXPDS_BACKEND -> DEFAULT_BACKEND; capability-gated; no silent fallback) and make_backend wired for both kinds (GAMS_TRANSFER raises 'not yet implemented' until Phase A; assert_never guards exhaustiveness). - API: backend= kwarg threaded through GdxFile (+ clone), both Translators, and to_dataframes/to_dataframe/list_symbols/get_data_types/ get_subset_relationships/to_gdx. Backend is public. - Subset feature: GdxFile.load_symbols(names) + to_dataframes(symbols=[...]); one targeted read on gams.transfer later, a per-symbol loop on gdxcc now. - Exceptions: add BackendError and SymbolNotFoundError (both subclass Error, so existing 'except Error' still catches them); SymbolNotFoundError used uniformly in GdxFile.load_symbols and the existing Translator.dataframe. - Tests: new tests/test_backend_selection.py (capability, resolution+env precedence, info, exception types, subset, and backend= threading across every read helper and to_gdx). 85 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first behavioral use of the second backend: an opt-in gams.transfer read
fast path whose output matches the gdxcc oracle.
- _transfer_backend.py: TransferBackend(GdxBackend) implementing open_read
(metadata-only Container, so list_symbols/get_data_types stay cheap),
load_symbols (bulk reads + caches one full container; subset does a targeted
c.read(symbols=...) in a fresh container), close, and per-symbol translation
for Set/Parameter/Variable/Equation:
* categorical domain columns -> str; positional column mapping to dims;
* Set membership = (element_text != '') -> c_bool, or element text when
load_set_text=True;
* variable/equation .type strings -> GamsVariableType/GamsEquationType;
* special values -> gdxpds canonical (EPS -> machine eps, NA/UNDEF -> nan,
+/-inf pass through; genuine 0.0 preserved).
write_file raises NotImplementedError (Phase B). Alias raises
NotImplementedError pending a fixture (no GDX fixture has an alias and to_gdx
can't create one).
- _backend.py: make_backend now constructs TransferBackend for GAMS_TRANSFER.
- tests/test_backend_parity.py: read parity (gdxcc vs gams_transfer) over every
in-tree fixture x load_set_text (same symbols, same order, identical frames),
a special-value round-trip, and the subset path. 101 passed overall.
R13 settled: gams.transfer exposes no file version/producer or pre-load record
count, so those stay None/0 and are not asserted cross-backend.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A gamsapi whose build does not match the active GAMS install imports fine but fails to load its GAMS shared libraries (e.g. gmdcclib) when a gams.transfer Container is constructed. The old import-only probe reported the fast path as available in that case, so backend-parity tests ran and crashed instead of skipping (seen on a GAMS 48 venv). - _probe_gams_transfer now constructs a Container against the resolved GAMS directory to confirm the engine actually works; any failure (missing, version-skewed, or GAMS not found) reads as not available. Still cached and lazy. - info() reports 'gams.transfer usable: yes/no' alongside the importability line so the version-skew case is diagnosable. - resolve_backend's error message now says 'not usable here' rather than 'not importable'. Net: HAVE_GAMS_TRANSFER-gated tests skip cleanly where gams.transfer is present but unusable (GAMS 48 venv: 86 passed, 18 skipped), and stay green where it works (GAMS 53: 104 passed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Legacy gdxpds read an Alias as a degenerate float column (raw stored values, no membership fixup, no special-value handling) -- an untested, effectively unused path. An alias is just another name for a set, so gdxpds now reads it like the set it aliases (c_bool membership / element text), consistently across both backends. - gdx.py (shared data model): treat GamsDataType.Alias like Set in the dataframe setter, _init_dataframe, _fixup_set_value, and get_value_col_default -- so both backends inherit Set-like alias reads. - _gdxcc_backend: aliases take the Set set-text path (the special-value bypass already excluded them); an alias now reads c_bool membership instead of 0.0. - _transfer_backend: aliases route through the Set translation (gams.transfer delegates an alias's .records to its parent set); the prior NotImplementedError is gone. - dev/build_alias_fixture.py + tests/data/alias_fixture.gdx: a parent Set t and an Alias at (built via raw gdxAddAlias, since to_gdx can't emit aliases). - tests: test_alias_reads_like_its_set asserts at reads like t (membership, Alias data type); the globbed test_backend_parity guarantees the gams_transfer read matches gdxcc. 104 passed (GAMS 53). Alias deliberately mirrors Set's *current* behavior, including the membership- boolean/set-text wart, slated to be fixed for Set and Alias together in v3.0.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t in ROADMAP - v2.1.0 status block: Phase 0, Step 1, and Phase A (read backend) landed on eh/gams-transfer; Phase B (write) remaining. - Record the two read-side decisions: aliases read as Sets (both backends), and HAVE_GAMS_TRANSFER means usable (not just importable). - Known-warts membership entry + v3.0.0 candidate now note aliases share the Set membership wart (fixed together). - PR/branch map reflects read-path completion. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TransferBackend.write_file builds a gams.transfer Container from the loaded gdxpds symbols (the inverse of the Phase A read translation) and writes it: - value columns derive from value_col_names (GamsValueType), the same source the gdxcc backend uses -- no second hard-coded list to keep in sync - inverse special-value mapping (eps -> EPS, NaN -> NA; +/-inf unchanged) - empty element_text for Sets (strict parity with gdxcc: no set-text-write in v2.1.0; the membership-boolean wart collapses sets to all-False) - strict/relaxed domain choice mirrors the gdxcc write path - writing aliases is unsupported (to_gdx never infers one); raises clearly Tests: write parity now covers the full write x read backend matrix (both engines write, both read each output) including transfer-write/transfer-read; plus the R12 mixed-boolean set-write gate and an alias-write NotImplementedError test for the one write branch parity cannot reach. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds tests/test_backend_timing.py: times read+write for both backends over every in-tree fixture (sub-3 KB up to ~1.9 MB) and records the results. A pytest_terminal_summary hook in conftest renders a size-sorted table plus a clear-winner / switchover note at the end of the run. These are not pass/fail performance gates (timings are machine-dependent and measured as a single min-of-3 burst); the only assertion is that both backends ran. The robust finding: gdxcc wins on small files (transfer's Container overhead is not amortized) while gams.transfer wins on large ones (~2x read, ~4x write at 1.9 MB), with the write switchover around 100-300 KB. Runs by default; skipped only when gams.transfer is unavailable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a switchable backend architecture for GDX I/O, introducing an opt-in gams.transfer engine alongside the existing gdxcc path while keeping gdxcc as the default/correctness oracle.
Changes:
- Refactors
GdxFile/GdxSymbolto delegate read/write/handle lifecycle to pluggable backend implementations (gdxccand newgams.transfer). - Adds backend selection plumbing (
Backendenum, env var resolution, new exceptions) and ato_dataframes(symbols=[...])subset-read feature. - Expands test coverage with cross-backend read/write parity tests plus backend selection and (non-gating) timing reporting.
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_write.py | Updates wording of the round-trip write test docstring. |
| tests/test_read.py | Adds coverage for eager-path set text and alias-as-set read behavior. |
| tests/test_handle_lifecycle.py | Updates lifecycle assertion to use the new GdxFile.H delegation. |
| tests/test_backend_timing.py | Adds (skippable) timing collection for gdxcc vs gams.transfer. |
| tests/test_backend_selection.py | Adds tests for backend resolution, exceptions, and symbol-subset reads. |
| tests/test_backend_parity.py | Adds read/write parity tests across the full backend matrix. |
| tests/conftest.py | Adds session timing aggregation + terminal summary rendering. |
| src/gdxpds/write_gdx.py | Threads backend= through to_gdx() and writer GdxFile creation. |
| src/gdxpds/tools.py | Adds lazy HAVE_GAMS_TRANSFER probing and reports transfer usability in info(). |
| src/gdxpds/read_gdx.py | Threads backend= into read helpers; adds symbols= subset support in to_dataframes(). |
| src/gdxpds/gdx.py | Extracts backend-agnostic model; adds backend delegation + SymbolNotFoundError; adds eager load helpers. |
| src/gdxpds/_transfer_backend.py | Implements gams.transfer read/write translation backend. |
| src/gdxpds/_gdxcc_backend.py | Extracts the existing gdxcc logic into a backend implementation. |
| src/gdxpds/_backend.py | Introduces backend enum, resolution logic, and the GdxBackend interface. |
| src/gdxpds/init.py | Re-exports backend API and lazily exposes HAVE_GAMS_TRANSFER. |
| dev/ROADMAP.md | Updates roadmap to reflect read+write transfer backend plan/status. |
| dev/build_alias_fixture.py | Adds fixture generator for alias behavior tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… together Add a -b/--backend flag to csv_to_gdx and gdx_to_csv (choices: gdxcc, gams_transfer), defaulting to None so the GDXPDS_BACKEND env var / gdxcc fallback still applies. The flag routes to to_gdx / to_dataframes; a parametrized round-trip test exercises it through both CLIs. Docs: replace the standalone engine note with a single Configuration section in the overview that explains the two parallel runtime choices -- GAMS install location (gams_dir) and I/O engine (backend) -- together, since each is set the same three ways (keyword, environment variable, CLI flag). Add a concise Configure subsection to the README install steps that points to it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…es) + ruff format Read parity (Copilot #2): _convert_transfer_specials now preserves the gdxcc oracle's UNDEF/NA distinction -- GDX UNDEF -> None (object column), GDX NA -> np.nan -- instead of collapsing both to NaN. New test_read_parity_undef builds a genuine UNDEF via gams.transfer (gdxcc cannot write one). Write parity (Copilot #1): the comment's premise was wrong -- gdxcc's write path drops UNDEF (None -> 0.0, since None isn't a Number), so it never preserves it. For strict v2.1.0 parity, _np_to_transfer_specials now maps None -> 0.0 (matching the oracle) rather than to NA. New test_write_parity_undef locks this in. Documented as a v3.0.0 candidate in ROADMAP.md (gams.transfer could emit a real UNDEF, but that diverges from the oracle). Probe (Copilot #3): _probe_gams_transfer accepts an explicit gams_dir and probes it fresh/uncached, so info(gams_dir=...) reports that install and a transient failure can't poison the cached default. Types (Copilot #4/#5): widen public backend annotations from str | None to str | Backend | None across read_gdx/write_gdx. ruff format: collapse hand-wrapped lines in _gdxcc_backend.py and gdx.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve the five round-2 Copilot review comments, plus the deeper checks
they warranted:
- gdx.py: load_symbols docstring named the wrong exception
(SymbolNotFoundError, not gdxpds.tools.Error). Also documented the same
exception on to_dataframe (was undocumented, unlike to_dataframes).
- dev fixture builders: pin backend="gdxcc" so GDXPDS_BACKEND can't
redirect the raw-f.H scripts (build_alias_fixture, build_set_text_fixture)
to the transfer backend (where f.H is None); pin the high-level write
builders too so committed fixtures stay deterministic vs. the oracle.
- _gdxcc_backend.py: rewrite stale module/class docstrings to the current
handle-ownership + full-contract reality.
- _backend.py: open_read no longer overstates _version/_producer ("where
available"); fix stale module + GdxBackend docstrings; resolve_backend
documents BackendError; write_symbol "single-symbol path".
- _transfer_backend.py: title/class docstring now say read + write; drop
the Phase A/B dev-sequence labels.
- conftest.py: _crossover_note guards first==0 / non-monotonic timings so
it can't report a bogus switchover band (the read op actually hits this).
Docstrings now describe present behavior, not the refactor that produced
the code. ruff + pyright clean; full suite 127 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- __version__ 2.0.0 -> 2.1.0 (single source; pyproject reads it dynamically) - CHANGES.txt: v2.1.0 entry (gams.transfer opt-in read/write backend, backend= / GDXPDS_BACKEND / --backend, to_dataframes(symbols=), Backend / BackendError / SymbolNotFoundError, HAVE_GAMS_TRANSFER) - ROADMAP: mark v2.1.0 Phase B done and the release prep complete Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds an opt-in, switchable
gams.transferbackend alongside the existinggdxccpath for both reading and writing GDX files.gdxccstays the default and the correctness oracle; nothing changes for existing callers.Select per call or globally:
Why
On a real ReEDS-sized GDX,
gams.transferis ~87x faster for both read and write (read 776.6s -> 9.0s; write 1002.0s -> 11.5s in the spike). That magnitude justifies carrying a second backend behind a switch so the speedup can be documented/tested before any future default flip.How (symmetric architecture)
Rather than bolt a transfer module onto a
gdx.pythat doubles as "the gdxcc backend", the gdxcc logic was first extracted behind a shared interface:gdx.py-> backend-agnostic interface + data model (GdxFile/GdxSymboldelegate I/O to a_backend_impl)_backend.py->Backendenum,GdxBackendABC,resolve_backend,make_backend,DEFAULT_BACKEND_gdxcc_backend.py-> today's gdxcc read/write/handle logic, extracted_transfer_backend.py-> the gams.transfer read + write translationStaged across the commits on this branch: Phase 0 (extract gdxcc + unify set-text reads) -> Step 1 (backend selection plumbing + new
to_dataframes(symbols=[...])subset feature) -> Phase A (transfer read) -> Aliases read as Sets in both backends -> Phase B (transfer write).Notable details:
value_col_names(GamsValueType) on both backends -- one source of truth, no parallel hard-coded list.isNA/isUndef/isEps) to preserve the gdxcc NA/UNDEF/EPS distinctions byte-for-byte.element_text, matching gdxcc's membership-boolean behavior). Writing aliases via transfer raises clearly (to_gdxnever infers one).HAVE_GAMS_TRANSFERis a functional probe (constructs a Container), so transfer-dependent tests skip cleanly wheregamsapiis present but cannot load the active GAMS libraries.Testing
tests/test_backend_parity.py: read parity over every in-tree fixture (xload_set_text), special-value parity, symbol-subset parity, and write parity over the full write x read backend matrix (both engines write, both read each output) -- including transfer-write/transfer-read.NotImplementedErrortest (the one write branch parity cannot reach).tests/test_backend_selection.py: enum / resolution / precedence /info()/ exceptions.Remaining v2.1.0 steps (follow-up, not in this PR)
dev/benchmarks/and capture the speedup table.__version__-> 2.1.0,CHANGES.txt, docs note for the opt-in switch.The v3.0.0 default-flip + set-text-write are deliberately deferred (one coordinated breaking release); the v2.1.0 code isolates the empty-
element_textemission and theDEFAULT_BACKENDindirection so that flip is a near one-spot change.🤖 Generated with Claude Code