Merged
Changes from 15 commits
19 commits
23e94ed
test: trim release-planning note from test_symbol_types_round_trip do…
elainethale May 21, 2026
2923be7
docs: stage gams.transfer backend (v2.1.0) with symmetric architectur…
elainethale May 21, 2026
dd81362
refactor: extract gdxcc read primitive behind a GdxBackend ABC
elainethale May 21, 2026
c0fc00e
refactor: move GDX metadata reading into GdxccBackend.open_read
elainethale May 21, 2026
e72cbf0
refactor: move GDX write path + handle ownership into GdxccBackend
elainethale May 21, 2026
9cf687d
docs: record GdxFile.H as a v3.0.0 deprecation candidate in Known warts
elainethale May 21, 2026
c26fc13
refactor: unify set-text reads through GdxFile.load_all (Phase 0)
elainethale May 21, 2026
03f5762
style: drop trailing whitespace in test_symbol_types_round_trip docst…
elainethale May 21, 2026
0c64ffb
feat: opt-in backend selection + symbol-subset reads (Step 1)
elainethale May 21, 2026
ad70f9b
feat: gams.transfer read backend (Phase A)
elainethale May 21, 2026
927847c
fix: HAVE_GAMS_TRANSFER reflects usability, not just importability
elainethale May 21, 2026
2fc182e
feat: read Aliases as Sets in both backends
elainethale May 21, 2026
33ce9d2
docs: record v2.1.0 progress (Phase 0/Step 1/Phase A) and alias-as-Se…
elainethale May 21, 2026
e16aa44
feat: gams.transfer write backend (Phase B)
elainethale May 22, 2026
c640423
test: document gdxcc vs gams_transfer read/write speed across fixtures
elainethale May 22, 2026
4ee6467
feat: --backend flag for the CLIs; document GAMS/engine configuration…
elainethale May 22, 2026
8ddbcb1
fix: address Copilot PR #111 review (special-value parity, probe, typ…
elainethale May 22, 2026
e0f7de4
fix: address Copilot PR #111 review (round 2) + docstring cleanup
elainethale May 23, 2026
11f9dc3
release: prep v2.1.0 (version bump, CHANGES, ROADMAP)
elainethale May 23, 2026
160 changes: 138 additions & 22 deletions dev/ROADMAP.md
@@ -17,8 +17,13 @@ into releases. Update this file as releases ship.
 |
 | [test-gap groundwork lands on main: correctness oracle for the speedup]
 v
-2.1.0 gams.transfer read fast path (opt-in, non-breaking)
-greenfield accelerator, gated by an evaluation spike first
+2.1.0 gams.transfer fast path: read AND write, opt-in & non-breaking,
+behind a backend switch. Spike done (~87x); strict parity vs gdxcc.
+|
+v
+3.0.0 breaking, coordinated:
+- flip default backend to gams.transfer (when available)
+- add set-text-write (gams.transfer provides it natively)
 ```

 Versioning is strict SemVer. `__version__` is single-sourced in
@@ -36,8 +41,9 @@ Each PR branches off `main`, in order:
 |--------|---------|-------|
 | `eh/ruff-typing-tooling` | **v1.6.0** | typing + tooling; finish with the `to_dataframe` deprecation warning, version bump, CHANGES entry |
 | e.g. `eh/breaking-cleanup` | **v2.0.0** | remove `.py` CLI shims, gdx2py, and `to_dataframe(old_interface=...)` |
-| e.g. `eh/test-gaps` | _(no tag)_ | test-gap groundwork; the correctness oracle for the speedup |
-| e.g. `eh/gams-transfer` | **v2.1.0** | evaluation spike first (throwaway), then the read fast path |
+| `eh/test-gaps` | _(no tag)_ | test-gap groundwork; the correctness oracle for the speedup; **landed** (PR #109) |
+| `eh/gams-transfer` | **v2.1.0** | Phase 0 (gdxcc extracted) + Step 1 (backend switch) + Phase A (read fast path) **done**; Phase B (write) remaining |
+| e.g. `eh/gams-transfer-default` | **v3.0.0** | flip default to gams.transfer + add set-text-write (breaking, coordinated) |

 Ordering: the test-gap PR lands before the speedup PR; those tests are the
 correctness oracle for the backend swap. The breaking-cleanup PR is independent
@@ -87,6 +93,8 @@ consider folding into `GdxFile`/`GdxSymbol`.

 ## Test-gap groundwork (between 2.0.0 and 2.1.0, no release of its own)

+**Status: landed on `main` (PR #109).**
+
 Tests don't ship in the wheel, so they don't warrant a version bump. Their purpose
 is to be the **correctness oracle** for the 2.1.0 backend swap: gams.transfer
 output must match the current gdxcc output exactly. Priorities:
@@ -101,10 +109,37 @@ output must match the current gdxcc output exactly. Priorities:

 Reuse the existing fixtures and real `.gdx`/`.csv` files under `../tests/`.

-## v2.1.0: gams.transfer read fast path (greenfield accelerator)
-
-The headline performance feature: an opt-in, non-breaking read accelerator using
-`gams.transfer` (ships inside `gamsapi`, cross-platform).
+## v2.1.0: gams.transfer fast path, read + write (greenfield accelerator)
+
+The headline performance feature: an opt-in, non-breaking accelerator using
+`gams.transfer` (ships inside `gamsapi`, cross-platform), covering **both read and
+write**, selected at runtime by a backend switch. The current gdxcc path stays the
+default and the correctness oracle.
+
+**Spike done; gate satisfied.** Measured on a real ReEDS-sized GDX
+(`inputs-v20250926_mainK0_USA_defaults.gdx`, read into DataFrames, then written
+back): read 776.6 s → 9.0 s (~86x), write 1002.0 s → 11.5 s (~87x). The "proceed
+only if the speedup is material" gate is decisively met, which is why this release
+now also covers writes, not just reads.
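A quick arithmetic check of the quoted spike timings. This sketch only re-derives the ratios from the numbers above and re-measures nothing. `speedup` is a hypothetical helper name.

```python
# Timings quoted in the roadmap, in seconds: (gdxcc, gams.transfer).
TIMINGS = {
    "read": (776.6, 9.0),
    "write": (1002.0, 11.5),
}


def speedup(slow: float, fast: float) -> float:
    """Ratio of slow-path time to fast-path time."""
    return slow / fast


for op, (slow, fast) in TIMINGS.items():
    print(f"{op}: {speedup(slow, fast):.1f}x")
# read: 86.3x
# write: 87.1x
```

Both ratios round to the ~86x and ~87x figures in the text.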
+
+**Status (branch `eh/gams-transfer`, landed locally in order).** Phase 0 (gdxcc
+extracted behind a `GdxBackend` ABC; set-text reads unified), Step 1 (`Backend`
+enum + `backend=` kwarg + `GDXPDS_BACKEND` env var + `HAVE_GAMS_TRANSFER` +
+`to_dataframes(symbols=...)` subset + `BackendError`/`SymbolNotFoundError`), and
+**Phase A** (the gams.transfer read backend, parity-tested vs gdxcc over all
+fixtures incl. set text, special values, subset, and aliases) are done. Two
+read-side decisions firmed up during Phase A:
+
+- *Aliases read as Sets* (both backends). Legacy gdxpds read an alias as a
+  degenerate float column (an untested, unused path); it now reads like the set it
+  aliases (`c_bool` membership), so it shares the Set membership-boolean wart and
+  will be fixed together with it in v3.0.0.
+- *`HAVE_GAMS_TRANSFER` means usable, not merely importable.* The probe constructs
+  a Container, so a version-skewed gamsapi (imports but can't load the GAMS shared
+  libraries) reads as unavailable and transfer-gated tests skip cleanly rather
+  than crashing. `info()` reports `gams.transfer usable: yes/no`.
+
+**Remaining:** Phase B (the write path) + the v2.1.0 version bump / CHANGES entry.

 **Why this is riskier than gdx2py was.** gdx2py was a clean drop-in (returned a
 plain list). gams.transfer is not:
@@ -119,17 +154,98 @@ plain list). gams.transfer is not:
 - *Whole-container vs. lazy-per-symbol read.* gdxpds is lazy; gams.transfer's
   advantage is bulk reading. Reconciling the two touches the lazy-loading model.

-**Step 0: evaluation spike (gate; throwaway code, no release).** Read the
-`../tests/` fixtures with both the current gdxcc path and a gams.transfer
-prototype; diff the resulting DataFrames column-by-column; benchmark both. Proceed
-only if the speedup is material **and** every divergence is explainable. Otherwise
-keep the slow path and abandon; the only cost was the spike.
-
-**If it proceeds (conservative shape):** read path only, behind a
-`HAVE_GAMS_TRANSFER` capability flag in `GdxSymbol.load()`
-([../src/gdxpds/gdx.py](../src/gdxpds/gdx.py)), with the `gdxDataReadStr` slow path
-as the always-present fallback and correctness oracle (fast path must equal slow
-path). Keep the translation layer isolated and unit-tested against slow-path
-output. **Do not touch the write path**: it is tightly coupled to
-`gdxDataWriteStr*` and is where divergence risk is highest. Surface availability in
-`info()`.
+**Shape (symmetric backends; both phases opt-in, switchable, strict parity).**
+
+- *Phase 0: extract the gdxcc backend first.* Before adding gams.transfer,
+  refactor so [../src/gdxpds/gdx.py](../src/gdxpds/gdx.py) is a backend-agnostic
+  interface + data model that delegates I/O to a `GdxBackend` (an ABC whose read
+  primitive is `load_symbols(names | None)`, with `load_file`/`load_symbol` as
+  conveniences, plus `write_file`/`close`). The existing gdxcc logic moves
+  to `_gdxcc_backend.py`; gams.transfer lands as a sibling `_transfer_backend.py`.
+  Phase 0 also unifies set-text reads into the same `load_file`/`load_symbol`
+  paradigm (removing the `read_gdx.py` lazy-loop special case). It is a pure,
+  behavior-preserving refactor (existing tests are the oracle) and ships as its
+  own PR. Doing it first avoids a permanent gdxcc/transfer asymmetry.
+- *Backend switch.* A public `Backend` str-enum (`GDXCC`, `GAMS_TRANSFER`); a
+  `backend=` kwarg on `GdxFile` and the top-level read/write helpers, with a
+  `GDXPDS_BACKEND` env-var fallback (kwarg wins). A single `DEFAULT_BACKEND`
+  constant (= `GDXCC`); **no `"auto"` value**: an explicit gams.transfer request
+  that can't be satisfied raises. A `HAVE_GAMS_TRANSFER` capability flag and the
+  resolved default backend surface in `info()`.
+- *Phase A: read.* Translation layer from `gt.Container` records to the existing
+  DataFrame shape (column names, `c_bool` set membership, special-value mapping).
+  `load_file` is one bulk `c.read(records=True)`; lazy/subset reads map to a
+  targeted `c.read(symbols=...)`.
+- *New feature: symbol subset.* `to_dataframes(symbols=[...])` reads only the
+  named symbols (filling the gap between `to_dataframe` and `to_dataframes`).
+  Non-breaking (default = all); a single targeted read on gams.transfer, a loop
+  on gdxcc.
+- *Phase B: write.* Translation layer from gdxpds DataFrames to a `gt.Container`,
+  then `Container.write()`. Reuses the existing type-inference in
+  [../src/gdxpds/write_gdx.py](../src/gdxpds/write_gdx.py) unchanged; only the
+  serialize-to-disk step changes.
+- *Strict parity is the contract.* The fast path must equal the gdxcc path
+  exactly, verified by backend-parametrized parity tests over every fixture (fast
+  path == slow path, both directions). Keep the translation layer isolated and
+  unit-tested against gdxcc output.
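This is a minimal sketch of how the backend switch could resolve: the explicit kwarg wins, then `GDXPDS_BACKEND`, then `DEFAULT_BACKEND`. There is no `"auto"` value, and a gams.transfer request that can't be satisfied raises. Only the names `Backend`, `BackendError`, `DEFAULT_BACKEND` and `GDXPDS_BACKEND` come from the roadmap. The function `resolve_backend`, its `env`/`have_transfer` parameters, the error messages and the `"gams_transfer"` string value are assumptions for illustration, and the real definitions may differ.

```python
import enum
import os
from typing import Mapping, Optional, Union


class Backend(str, enum.Enum):
    # "gdxcc" matches backend="gdxcc" in the roadmap; "gams_transfer" is assumed.
    GDXCC = "gdxcc"
    GAMS_TRANSFER = "gams_transfer"


class BackendError(RuntimeError):
    """Stand-in: an unknown or unusable backend was requested."""


# The single constant a later default-flip would change.
DEFAULT_BACKEND = Backend.GDXCC


def resolve_backend(
    backend: Union[str, Backend, None] = None,
    *,
    env: Optional[Mapping[str, str]] = None,
    have_transfer: bool = False,
) -> Backend:
    """Pick a backend: explicit kwarg first, then GDXPDS_BACKEND, then the default."""
    env = os.environ if env is None else env
    choice = backend if backend is not None else env.get("GDXPDS_BACKEND")
    if not choice:
        resolved = DEFAULT_BACKEND
    elif isinstance(choice, Backend):
        resolved = choice
    else:
        try:
            # There is deliberately no "auto" member, so "auto" fails here.
            resolved = Backend(choice.strip().lower())
        except ValueError:
            valid = ", ".join(b.value for b in Backend)
            raise BackendError(f"unknown backend {choice!r}; expected one of: {valid}") from None
    if resolved is Backend.GAMS_TRANSFER and not have_transfer:
        # An explicit request that cannot be satisfied raises; no silent fallback.
        raise BackendError("gams.transfer was requested but is not usable here")
    return resolved


print(resolve_backend(env={}).value)  # gdxcc
print(resolve_backend("gdxcc", env={"GDXPDS_BACKEND": "gams_transfer"}).value)  # gdxcc
print(resolve_backend(env={"GDXPDS_BACKEND": "gams_transfer"}, have_transfer=True).value)  # gams_transfer
```

In this sketch, flipping the default to gams.transfer only "when `HAVE_GAMS_TRANSFER`" would need one more branch. An implicit default should fall back to gdxcc when gams.transfer is unusable, while an explicit request still raises.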
+
+**Deliberately preserved in v2.1.0:** Phase B does **not** write Set element text;
+it emits empty `element_text`, matching the current gdxcc path (which has no
+`gdxAddSetText` and silently drops text). This keeps the two backends byte-identical.
+Set-text-write is a behavior change, deferred to v3.0.0 (below). The detailed
+implementation design (translation per symbol kind, special-value bit-pattern
+handling, lazy/eager behavior, risk register) lives in the working plan file, not
+here.
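The sketch below uses plain tuples to contrast the v2.1.0 "emit empty `element_text`" rule with the planned v3.0.0 "write the loaded text column" behavior. `set_records` and the `(labels, text)` row shape are hypothetical and not gdxpds code. The one grounded detail is that gams.transfer Set records end in an `element_text` column, which the last tuple slot mimics.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# (element labels, loaded set text or None)
Row = Tuple[Sequence[str], Optional[str]]


def set_records(rows: Iterable[Row], write_text: bool) -> List[Tuple[str, ...]]:
    """Build (labels..., element_text) records for one Set.

    write_text=False mirrors v2.1.0 (text dropped, matching gdxcc);
    write_text=True mirrors the planned v3.0.0 behavior (loaded text written).
    """
    records = []
    for labels, text in rows:
        element_text = (text or "") if write_text else ""
        records.append((*labels, element_text))
    return records


rows = [(("a",), "first"), (("b",), None), (("c",), "third")]
print(set_records(rows, write_text=False))  # [('a', ''), ('b', ''), ('c', '')]
print(set_records(rows, write_text=True))   # [('a', 'first'), ('b', ''), ('c', 'third')]
```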
+
+## v3.0.0: default-flip to gams.transfer + set-text-write (breaking)
+
+One coordinated breaking release, once v2.1.0's parity tests have run green for a
+cycle. The ~87x speedup makes indefinite opt-in untenable (users will expect the
+speedup by default), and the set-text-write fix is breaking, so the two land
+together under one major bump (one migration note, not two).
+
+- **Flip the default backend** to gams.transfer when `HAVE_GAMS_TRANSFER`: an
+  honest one-line change of the `DEFAULT_BACKEND` constant. gdxcc-only
+  environments are unaffected; `backend="gdxcc"` remains available to pin the old
+  path.
+- **Add set-text-write.** gdxpds gains the ability to write Set element text, a
+  capability the gdxcc path never had (no `gdxAddSetText`). The gams.transfer write
+  path provides it natively: the v2.1.0 "emit empty `element_text`" step flips to
+  "write the loaded text column." Because the default is now gams.transfer, this
+  becomes default behavior. This is the breaking change the "Known warts" set-text
+  item calls for.
+
+Candidate third payload for the same release: fixing the membership-boolean wart
+for **Sets and Aliases** (`Value` reliably `True` for members), which is also
+breaking and also touches read + write + `load_set_text`. See "Known warts" below.
+
+## Known warts / deferred cleanups
+
+Decide deliberately when these are touched. (The membership-boolean entry below
+is now slated for v3.0.0, and `GdxFile.H` is a v3.0.0 deprecation candidate;
+future entries may be unscheduled.)
+
+- **A Set's membership boolean is the stored value's truthiness, not "is a
+  member".** The `Value` column reads `c_bool(False)` for plain membership
+  (gdxpds- and GAMS-written Sets store `0.0`) and only `c_bool(True)` when a
+  non-zero value happens to be stored (e.g. a set-text node index). Membership is
+  really conveyed by row *presence*, so the boolean is misleading.
+  `_fixup_set_value` ([../src/gdxpds/gdx.py](../src/gdxpds/gdx.py)) leaves the
+  written value at `0.0` because `isinstance(c_bool(True), Number)` is False. The
+  current behavior is now pinned by tests in
+  [../tests/test_read.py](../tests/test_read.py) so the gams.transfer backend
+  can't drift. **Aliases now read as Sets (v2.1.0), so they share this wart**;
+  the fix must cover Set and Alias together. Fixing it (membership reliably
+  `True`) is a deliberate behavior change touching the read path, the write path,
+  and `load_set_text`, so coordinate it with the gams.transfer work and treat it
+  as breaking. **Slated for v3.0.0** as a candidate payload alongside the
+  default-flip and set-text-write.
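The wart can be reproduced with `ctypes` alone. The sketch shows that the flag tracks the stored value rather than membership, and that a `numbers.Number` check does not recognize `c_bool`. Both function names are hypothetical. `fixed_member_flag` is one possible reading of "membership reliably `True`", not the planned v3.0.0 code.

```python
from ctypes import c_bool
from numbers import Number


def value_to_member_flag(stored: float) -> c_bool:
    """Current behavior: the flag is the truthiness of the stored value."""
    return c_bool(stored)


def fixed_member_flag(_stored: float) -> c_bool:
    """One possible fix: a present row is a member, whatever value it stores."""
    return c_bool(True)


print(value_to_member_flag(0.0).value)  # False: plain membership stores 0.0
print(value_to_member_flag(3.0).value)  # True: e.g. a set-text node index
print(fixed_member_flag(0.0).value)     # True

# ctypes scalars are not registered as numbers.Number, so a Number check
# skips c_bool(True) and the written value stays at 0.0.
print(isinstance(c_bool(True), Number))  # False
```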
+
+- **`GdxFile.H` is a gdxcc-specific escape hatch on an engine-agnostic
+  interface.** After the Phase 0 extraction it delegates to
+  `self._backend_impl.handle`: the gdxcc GDX pointer, or `None` for backends
+  without one (and after `cleanup`). It stays public and working in v2.1.0 (it's
+  used as a raw-`gdxcc` escape hatch, e.g. in
+  [../tests/test_specials.py](../tests/test_specials.py)). **Candidate for
+  deprecation/removal in v3.0.0**, when the default flips to gams.transfer and
+  `None` becomes the common return; power users would move to
+  `gdx_file._backend_impl.handle` or a documented accessor.
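The Phase 0 `GdxBackend` ABC and the `GdxFile.H` delegation fit together roughly as sketched below. Everything here is a hypothetical stand-in: the method names follow the roadmap, but the signatures, the record type, the `SymbolNotFoundError` base class and the toy `InMemoryBackend` are assumptions. The point is the shape. `load_symbols` is the one abstract read primitive, `load_file`/`load_symbol` are thin conveniences over it, and `handle` defaults to `None` so only a handle-based engine needs to override it.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional, Tuple

Records = List[Tuple[str, ...]]


class SymbolNotFoundError(KeyError):
    """Stand-in for the roadmap's SymbolNotFoundError (base class assumed)."""


class GdxBackend(ABC):
    """Hypothetical shape of the Phase 0 ABC."""

    @abstractmethod
    def load_symbols(self, names: Optional[Iterable[str]]) -> Dict[str, Records]:
        """Read primitive: load the named symbols, or every symbol if names is None."""

    def load_file(self) -> Dict[str, Records]:
        # Convenience: the whole file is just "no name filter".
        return self.load_symbols(None)

    def load_symbol(self, name: str) -> Records:
        # Convenience: a single symbol is a one-element subset.
        return self.load_symbols([name])[name]

    @abstractmethod
    def write_file(self, path: str, symbols: Dict[str, Records]) -> None:
        """Serialize symbols to a GDX file at path."""

    @abstractmethod
    def close(self) -> None:
        """Release engine resources."""

    @property
    def handle(self) -> Optional[object]:
        # Only a handle-based engine (gdxcc) would override this.
        return None


class InMemoryBackend(GdxBackend):
    """Toy backend for illustration: no raw handle, reads from a dict."""

    def __init__(self, data: Dict[str, Records]):
        self._data = dict(data)

    def load_symbols(self, names):
        wanted = list(self._data) if names is None else list(names)
        missing = [n for n in wanted if n not in self._data]
        if missing:
            raise SymbolNotFoundError(missing[0])
        return {n: self._data[n] for n in wanted}

    def write_file(self, path, symbols):
        raise NotImplementedError("the toy backend is read-only")

    def close(self):
        self._data.clear()


class GdxFile:
    """Engine-agnostic facade; H delegates to the backend's handle."""

    def __init__(self, backend_impl: GdxBackend):
        self._backend_impl = backend_impl

    @property
    def H(self) -> Optional[object]:
        return self._backend_impl.handle


data = {"t": [("a",), ("b",), ("c",)], "at": [("a",), ("b",), ("c",)]}
gdx_file = GdxFile(InMemoryBackend(data))
print(gdx_file._backend_impl.load_symbol("at"))  # [('a',), ('b',), ('c',)]
print(gdx_file.H)  # None: this backend has no raw handle
```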
62 changes: 62 additions & 0 deletions dev/build_alias_fixture.py
@@ -0,0 +1,62 @@
+"""Generate tests/data/alias_fixture.gdx.
+
+A 1D parent Set plus an Alias of it, used by the backend read-parity tests.
+gdxpds has no API for *writing* aliases (to_gdx infers types from DataFrame
+shape and never emits an alias), so this is built with the raw gdxcc bindings
+(gdxAddAlias), the same low-level approach used in build_set_text_fixture.py.
+Committed to the repo; only re-run this if the schema changes.
+
+Usage (from repo root, with the venv active and $env:GAMS_DIR set):
+
+    python dev\\build_alias_fixture.py
+
+Schema:
+    Set t : 1D, elements a / b / c
+    Alias at : alias of t
+"""
+
+import os
+
+import gdxpds.gdx
+
+try:
+    from gams.core import gdx as gdxcc
+except ImportError:
+    import gdxcc
+
+OUT_PATH = os.path.abspath(
+    os.path.join(os.path.dirname(__file__), "..", "tests", "data", "alias_fixture.gdx")
+)
+
+ELEMENTS = ["a", "b", "c"]
+
+
+def main():
+    with gdxpds.gdx.GdxFile() as f:
+        if not gdxcc.gdxOpenWrite(f.H, OUT_PATH, "gdxpds"):
+            raise gdxpds.gdx.GdxError(f.H, f"Could not open {OUT_PATH!r} for writing")
+        f.universal_set.write()
+
+        # Parent set t = {a, b, c}.
+        if not gdxcc.gdxDataWriteStrStart(
+            f.H, "t", "parent set", 1, gdxpds.gdx.GamsDataType.Set.value, 0
+        ):
+            raise gdxpds.gdx.GdxError(f.H, "Could not start writing data for symbol t")
+        gdxcc.gdxSymbolSetDomainX(f.H, 1, ["*"])
+        values = gdxcc.doubleArray(gdxcc.GMS_VAL_MAX)
+        values[gdxcc.GMS_VAL_LEVEL] = 0.0
+        for elem in ELEMENTS:
+            gdxcc.gdxDataWriteStr(f.H, [elem], values)
+        gdxcc.gdxDataWriteDone(f.H)
+
+        # Alias at -> t.
+        if not gdxcc.gdxAddAlias(f.H, "t", "at"):
+            raise gdxpds.gdx.GdxError(f.H, "Could not add alias at -> t")
+
+        gdxcc.gdxClose(f.H)
+
+    print(f"Wrote {OUT_PATH} ({os.path.getsize(OUT_PATH)} bytes)")
+
+
+if __name__ == "__main__":
+    main()
17 changes: 16 additions & 1 deletion src/gdxpds/__init__.py
@@ -1,6 +1,7 @@
 __version__ = "2.0.0"

-from gdxpds.gdx import GdxError
+from gdxpds._backend import Backend, BackendError
+from gdxpds.gdx import GdxError, SymbolNotFoundError
 from gdxpds.read_gdx import (
     get_data_types,
     get_subset_relationships,
@@ -25,10 +26,24 @@
     "GamsLoadError",
     "GdxError",
     "GamsDirFinder",
+    "Backend",
+    "BackendError",
+    "SymbolNotFoundError",
+    "HAVE_GAMS_TRANSFER",
     "to_dataframes",
     "to_dataframe",
     "list_symbols",
     "get_data_types",
     "get_subset_relationships",
     "to_gdx",
 ]
+
+
+def __getattr__(name: str):
+    # Expose HAVE_GAMS_TRANSFER lazily so ``import gdxpds`` does not pay the
+    # gams.transfer import cost; the probe runs on first access.
+    if name == "HAVE_GAMS_TRANSFER":
+        from gdxpds.tools import _probe_gams_transfer
+
+        return _probe_gams_transfer()
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
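Python consults a module-level `__getattr__` (PEP 562) only when normal lookup fails. This one never stores its result in the module namespace, so it runs on every access to `gdxpds.HAVE_GAMS_TRANSFER`. "Runs on first access" therefore holds only if `_probe_gams_transfer` caches its own result, and this diff does not show whether it does. Below is a standalone sketch of the pattern with an explicit `functools.lru_cache`. It uses a throwaway module object and a hypothetical counting probe (`_probe`, `HAVE_FEATURE`) in place of the real ones.

```python
import functools
import types

calls = []


@functools.lru_cache(maxsize=None)
def _probe() -> bool:
    # Stand-in for an expensive capability probe; records each real run.
    calls.append(1)
    return False


def _module_getattr(name: str):
    # PEP 562: called only for names not found in the module's namespace.
    if name == "HAVE_FEATURE":
        return _probe()
    raise AttributeError(f"module 'demo' has no attribute {name!r}")


demo = types.ModuleType("demo")
demo.__getattr__ = _module_getattr  # stored in demo.__dict__, where PEP 562 looks

print(demo.HAVE_FEATURE, demo.HAVE_FEATURE)  # False False
print(len(calls))  # 1: the cache means the probe body ran once
```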