Merged
28 changes: 24 additions & 4 deletions CHANGES.txt
@@ -1,14 +1,34 @@
+v3.0.0, 05/25/26 -- BREAKING: the default I/O engine is now gams.transfer when it is
+usable, falling back to gdxcc otherwise; pass engine="gdxcc" (or set
+GDXPDS_ENGINE) to pin the gdxcc engine
+BREAKING: a Set or Alias value column is now always the GAMS element
+text (a string; "" for a member with no text), and membership is
+conveyed by row presence. The load_set_text argument is removed: text
+is always read, and any non-empty text is always written
+aliases are now fully supported: they read with a populated
+GdxSymbol.alias_of (the parent Set), round-trip through both
+engines, and can be created via to_gdx(aliases={alias: parent}) or
+gdxpds.gdx.append_alias()
+BREAKING: GDX UNDEF (Python None) is now preserved on write by both
+engines, round-tripping as None (previously it collapsed to 0.0)
+BREAKING: removed GdxFile.H; reach the raw gdxcc handle, if needed, via
+gdx_file._engine_impl.handle
+import gdxpds no longer requires a GAMS binding to be installed; the
+bindings load on the first GDX operation, so gdxpds info / gdxpds test
+can diagnose a no-bindings environment
+new TransferError for gams.transfer I/O failures (a subclass of
+gdxpds.tools.Error, parallel to GdxError)
v2.1.0, 05/23/26 -- gams.transfer I/O engine for reading and writing GDX, opt-in
-and non-breaking; select it with the backend= keyword, the GDXPDS_BACKEND
-environment variable, or the --backend CLI flag. gdxcc stays the default,
+and non-breaking; select it with the engine= keyword, the GDXPDS_ENGINE
+environment variable, or the --engine CLI flag. gdxcc stays the default,
and the two engines produce identical DataFrames and GDX files
gams.transfer is much faster on large files (order-of-magnitude on
hundreds-of-MB GDX) but slower on very small ones; it is usable only with
a compatible gamsapi installed (check gdxpds.HAVE_GAMS_TRANSFER)
-gdxpds info reports the resolved backend and gams.transfer usability
+gdxpds info reports the resolved engine and gams.transfer usability
to_dataframes() gains a symbols= argument to read only the named symbols
new SymbolNotFoundError, raised for an unknown symbol name (subclass of
-gdxpds.tools.Error, like BackendError)
+gdxpds.tools.Error, like EngineError)
v2.0.0, 05/20/26 -- BREAKING: to_dataframe() now always returns a plain DataFrame;
the old_interface argument is removed. Migrate by dropping any
old_interface=False from calls (old_interface=True previously
17 changes: 9 additions & 8 deletions CLAUDE.md
@@ -7,11 +7,11 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
`gdxpds` translates between GDX (GAMS Data eXchange) files and pandas DataFrames. GDX is the binary file format used by [GAMS](https://www.gams.com/), a mathematical optimization modeling system. Two entry points:

- High-level functions: `to_dataframes()`, `to_dataframe()`, `list_symbols()`, `get_data_types()`, `to_gdx()` — exposed at package top level.
-- Backend classes: `GdxFile` and `GdxSymbol` in [src/gdxpds/gdx.py](src/gdxpds/gdx.py) for programmatic, lazy access.
+- Object-oriented API: `GdxFile` and `GdxSymbol` in [src/gdxpds/gdx.py](src/gdxpds/gdx.py) for programmatic, lazy access.

## Runtime dependency on GAMS

-This package **cannot function without a GAMS installation** — there is no mock layer. The SWIG-bound GDX bindings are imported at module load and talk to the GAMS shared library found at runtime. Two equivalent binding sources are supported via `try/except` imports in [src/gdxpds/__init__.py](src/gdxpds/__init__.py), [src/gdxpds/special.py](src/gdxpds/special.py), and [src/gdxpds/gdx.py](src/gdxpds/gdx.py):
+This package **cannot function without a GAMS installation** — there is no mock layer. The SWIG-bound GDX bindings talk to the GAMS shared library found at runtime, and are imported **lazily** (on the first GDX operation), so `import gdxpds` itself does not need a binding. Two equivalent binding sources are supported via `try/except` imports (inside the engine modules and the lazy-load helpers, not at package import):

- **Modern (recommended):** `from gams.core import gdx as gdxcc` — shipped inside `gamsapi`, which the user installs version-matched to their GAMS install (`pip install gamsapi[transfer]==xx.y.z`). Not a base dependency of gdxpds.
- **Legacy:** `import gdxcc` — the standalone PyPI package. Available via the `[legacy]` extra (`pip install gdxpds[legacy]`). Older but the SWIG C ABI is stable enough that it still works.
@@ -20,7 +20,7 @@ Other runtime notes:

- GAMS lookup order is implemented by `GamsDirFinder` in [src/gdxpds/tools.py](src/gdxpds/tools.py): `GAMS_DIR` env var → `GAMSDIR` env var → `where gams` / `which gams` → walk default install location (`C:\GAMS` on Windows; picks highest version). The Windows walk handles both the modern `C:\GAMS\<version>\` layout and the legacy `C:\GAMS\win64\<version>\` layout by looking for `gams.exe` to identify a GAMS root.
- `GAMS_DIR` remains mandatory at runtime even with pip-installed bindings, because the GDX shared library lives in the GAMS install directory, not in the wheel. The recommended pattern is one venv per GAMS install with `$Env:GAMS_DIR` pinned via `Activate.ps1` — see [dev/README.md](dev/README.md).
-- **Known issue: `import gdxpds` requires at least one binding installed.** [src/gdxpds/gdx.py](src/gdxpds/gdx.py) and [src/gdxpds/special.py](src/gdxpds/special.py) top-level-import `gdxcc` (modern or legacy); if neither is installed, `import gdxpds` raises `ModuleNotFoundError`. This means `gdxpds info` / `gdxpds test` cannot diagnose the exact "no bindings installed" environment they would be most useful for. Fix would defer the `gdxcc` imports (and the `gdxcc.GMS_*` constants currently referenced by the `GamsDataType` enum at module load).
+- **`import gdxpds` works with no binding installed.** The `gdxcc.GMS_*` type codes are hardcoded in the `GamsDataType`/`GamsVariableType`/`GamsEquationType`/`GamsValueType` enums (with `tests/test_imports.py::test_gms_constants_match_gdxcc` verifying them against the live binding when present), and the bindings load on the first GDX op. So `gdxpds info` / `gdxpds test` can diagnose the "no bindings installed" environment.

If tests fail with "cannot load gdxcc" or "no `_gdxcc` module," it's a GAMS environment problem (missing `GAMS_DIR`, missing bindings, or version skew between `gamsapi` and the GAMS install), not a code bug.
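
The lazy, try-in-order binding import described in this hunk can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `load_binding` and `_cached_binding` are hypothetical names, and it assumes the modern binding is importable as the submodule `gams.core.gdx`.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence

# Hypothetical names for illustration only; not gdxpds' actual helpers.
# Assumption: the modern binding can be imported as "gams.core.gdx",
# and the legacy one as "gdxcc".
_BINDING_CANDIDATES = ("gams.core.gdx", "gdxcc")
_cached_binding: Optional[ModuleType] = None


def load_binding(candidates: Sequence[str] = _BINDING_CANDIDATES) -> ModuleType:
    """Import the first available module in `candidates`, once per process."""
    global _cached_binding
    if _cached_binding is not None:
        # Idempotent: later calls reuse whatever the first call bound.
        return _cached_binding
    errors = []
    for name in candidates:
        try:
            _cached_binding = importlib.import_module(name)
            return _cached_binding
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("no GDX binding could be imported (" + "; ".join(errors) + ")")
```

Because nothing is imported until `load_binding()` is called, a package built this way can be imported in an environment with no binding at all, and a diagnostic command can catch the `ImportError` and report it.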

@@ -95,14 +95,15 @@ Things that aren't obvious from one file:

- **Lazy loading.** `GdxFile` (a `MutableSequence` of `GdxSymbol`) defaults to `lazy_load=True`. Symbol data is only pulled from the GDX file when `.dataframe` is accessed. Iterating symbol metadata is cheap; touching dataframes is not.
- **Symbol kinds drive column shape.** `GamsDataType` ([src/gdxpds/gdx.py](src/gdxpds/gdx.py)) — Set, Parameter, Variable, Equation, Alias. Variables and Equations get five value columns (Level, Marginal, Lower, Upper, Scale); Parameters and Sets get a single `Value` column. Write code in [src/gdxpds/write_gdx.py](src/gdxpds/write_gdx.py) infers the type from DataFrame shape and naming.
-- **Special values.** GAMS encodes NA/EPS/+Inf/-Inf/UNDEF as fixed magic floats (e.g. 1E300, 2E300, 3E300). [src/gdxpds/special.py](src/gdxpds/special.py) converts these to/from numpy equivalents (`np.nan`, `np.inf`) on read/write. Parameters and Sets bypass this conversion — keep that in mind when debugging value mismatches.
-- **Set text vs. set membership.** By default, Set values are booleans (membership). Pass `load_set_text=True` to surface the GAMS element text via `gdxGetElemText()`. The `_fixup_set_vals` flag controls boolean coercion on write.
-- **Lazy + idempotent GAMS bind.** `load_gdxcc()` in [src/gdxpds/tools.py](src/gdxpds/tools.py) binds the GAMS library and populates `gdxpds.special` dicts on the first GDX op (called by `GdxFile._create_gdx_object` before each handle, and by `info()` inside try/except for diagnostics). Process state: `tools._bindings_source`, `tools._loaded_gams_dir`.
+- **Special values.** GAMS encodes NA/EPS/+Inf/-Inf/UNDEF as fixed magic floats (e.g. 1E300, 2E300, 3E300). [src/gdxpds/special.py](src/gdxpds/special.py) converts these to/from numpy equivalents (`np.nan`, `np.inf`, and `None` for UNDEF) on read/write. Parameters get this conversion; Sets/Aliases do not (their value column is text, see below). UNDEF (`None`) is preserved on write by both engines; the gams.transfer write passes `eps_to_zero=False` so EPS survives too.
+- **Set value = element text; membership = row presence.** A Set/Alias has one `Value` column holding the GAMS element text (a string; `""` = a member with no text). Every row is a member — there is no boolean. `_fixup_set_value` ([src/gdxpds/gdx.py](src/gdxpds/gdx.py)) normalizes the column to text on assignment (a `bool`/`c_bool`/missing value → `""`), so a Set can be built from dims alone, from booleans, or from text. The read path always fetches text (`gdxGetElemText()` on gdxcc; the records frame on gams.transfer); there is no `load_set_text` flag.
+- **Aliases.** An Alias reads like the Set it aliases and records the parent in `GdxSymbol.alias_of` (a parent ref, or `None`). It carries no records of its own: `alias.dataframe` returns the parent's `dataframe` directly (a live view; mutating the parent shows through the alias), and direct assignment to `alias.dataframe` raises (it's not a mutable slot). Read paths just flip `_loaded`; write paths emit only the alias header (`gdxAddAlias` / `gt.Alias`). The parent must precede it (no relaxed fallback — `DomainError` otherwise). `to_gdx(aliases={alias: parent})` and `gdxpds.gdx.append_alias()` build them; ordering follows `reorder_for_strict_domains()`, which now adds the alias→parent edge. The parent is typically a Set, but an Alias is accepted too: GDX supports chained aliases, and the gdxcc engine preserves the chain on disk (`aat -> at -> t`) while gams.transfer flattens it to the root (`aat -> t`). On read both engines produce a same-file `alias_of` ref. *Universe* aliases (alias of `*`) are a documented edge: they read without error (`alias_of` resolves to `universal_set`) and round-trip within one engine, but the engines disagree on membership (gdxcc includes the `*` element, gams.transfer doesn't), so `universe_alias_fixture.gdx` is excluded from the cross-engine parity glob and covered by `tests/test_alias.py`.
+- **Lazy + idempotent GAMS bind.** `load_gdxcc()` in [src/gdxpds/tools.py](src/gdxpds/tools.py) binds the GAMS library and populates `gdxpds.special` dicts on the first GDX op (called by the gdxcc engine's `__init__` before it creates a handle, and by `info()` inside try/except for diagnostics). Process state: `tools._bindings_source`, `tools._loaded_gams_dir`.
- **`gams_dir=` on the first GDX op selects the bound install.** Once loaded, subsequent `gdxCreateD(H, <dir>, ...)` calls are no-ops against the bound library regardless of `<dir>`; `load_gdxcc()` warns when a caller passes a `gams_dir` that differs from `_loaded_gams_dir`. One GAMS library per process — multi-version testing is one-venv-per-GAMS. In-process swap is feasible via `gdxLibraryUnload()` but unimplemented (design notes tracked in a GitHub issue).
-- **GDX handle lifecycle** (SWIG-bound `gdxcc`). The full `new_gdxHandle_tp` → `gdxCreateD` → `gdxFree` → `delete_gdxHandle_tp` sequence lives in one place: the `_GdxHandle` RAII class in [src/gdxpds/tools.py](src/gdxpds/tools.py), used by all three create sites (`load_gdxcc` and `load_specials` via `with`; `GdxFile._create_gdx_object` keeps the instance). It encodes two SWIG hazards so callers don't have to:
+- **GDX handle lifecycle** (SWIG-bound `gdxcc`; gdxcc engine only — the gams.transfer engine holds no handle). The full `new_gdxHandle_tp` → `gdxCreateD` → `gdxFree` → `delete_gdxHandle_tp` sequence lives in one place: the `_GdxHandle` RAII class in [src/gdxpds/tools.py](src/gdxpds/tools.py), used by all three create sites (`load_gdxcc` and `load_specials` via `with`; `GdxccEngine.__init__` keeps the instance). It encodes two SWIG hazards so callers don't have to:
1. `gdxFree(H)` is **unsafe on a failed-create handle** — it dispatches through `XFree`, bound only on a successful `gdxCreateD`, so freeing after failure segfaults. `_GdxHandle` frees+deletes on success but on failure **deletes only** (the wrapper is a plain `calloc`/`free`, always safe) and never calls `gdxFree`; the create is validated by `_check_gdx_create_rc` (raises `GamsLoadError`).
2. `gdxFree` is also **unsafe to call twice** (double `XFree` + `objectCount` underflow), so `_GdxHandle.close()` is run-once/idempotent and every `new_gdxHandle_tp()` is paired with exactly one `delete_gdxHandle_tp()`.
-`GdxFile` owns its handle for its lifetime and frees it via `weakref.finalize(self, handle.close)` — fired at the first of `cleanup()`/`__exit__`, garbage collection (it sits in a cycle via `universal_set`, so *cyclic* GC reclaims it), or interpreter exit. **No class frees from `__del__`** (which would run at teardown after module state is partially gone); `close()` binds its gdxcc callables at construction so it stays valid at shutdown. The legacy `to_dataframes`/`to_gdx` `Translator`s call `GdxFile.cleanup()` (not the removed `__del__`). Regression coverage: [tests/test_handle_lifecycle.py](tests/test_handle_lifecycle.py).
+The gdxcc engine owns its handle for its lifetime; `GdxFile` schedules `weakref.finalize(self, self._engine_impl.close)` — fired at the first of `cleanup()`/`__exit__`, garbage collection (it sits in a cycle via `universal_set`, so *cyclic* GC reclaims it), or interpreter exit. **No class frees from `__del__`** (which would run at teardown after module state is partially gone); the engine's `close()` binds its gdxcc callables at construction so it stays valid at shutdown. The legacy `to_dataframes`/`to_gdx` `Translator`s call `GdxFile.cleanup()` (not the removed `__del__`). Regression coverage: [tests/test_handle_lifecycle.py](tests/test_handle_lifecycle.py).
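
The two hazards and the finalizer-based ownership described above can be illustrated with a small stand-in. This is only a sketch of the pattern under stated assumptions, not the `_GdxHandle` implementation: `FakeGdxApi`, `Handle`, and `Owner` are hypothetical names, and the fake API merely records calls in place of the real SWIG functions.

```python
import weakref


class FakeGdxApi:
    """Hypothetical stand-in for the SWIG calls; it only records what ran."""

    def __init__(self, create_ok=True):
        self.create_ok = create_ok
        self.calls = []

    def new_handle(self):
        self.calls.append("new")
        return object()

    def create(self, handle):
        self.calls.append("create")
        return self.create_ok

    def free(self, handle):
        self.calls.append("free")

    def delete(self, handle):
        self.calls.append("delete")


class Handle:
    """Run-once close(): free + delete after a good create, delete only otherwise."""

    def __init__(self, api):
        # Bind the callables now so close() needs no lookups at shutdown.
        self._free = api.free
        self._delete = api.delete
        self._closed = False
        self._created = False
        self.raw = api.new_handle()
        if not api.create(self.raw):
            self.close()  # failed create: the wrapper is deleted, never freed
            raise RuntimeError("handle creation failed")
        self._created = True

    def close(self):
        if self._closed:
            return  # idempotent, so a double free cannot happen
        self._closed = True
        if self._created:
            self._free(self.raw)
        self._delete(self.raw)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


class Owner:
    """Keeps a handle for its lifetime and releases it via weakref.finalize."""

    def __init__(self, api):
        self.handle = Handle(api)
        # The callback is bound to the handle, not to self, so it does not
        # keep the owner alive; it fires at most once.
        self._finalizer = weakref.finalize(self, self.handle.close)

    def cleanup(self):
        self._finalizer()
```

In this sketch a failed create leaves the call log as `new`, `create`, `delete`, and calling `Owner.cleanup()` twice still frees exactly once, which is the behaviour the two numbered hazards call for.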

## Conventions and gotchas

4 changes: 2 additions & 2 deletions README.md
@@ -42,11 +42,11 @@ https://github.com/NatLabRockies/gdx-pandas/releases.

### Configure

-gdxpds needs to know **where GAMS is**, and optionally **which I/O engine** to use. Set either once via an environment variable, or per call with the `gams_dir=` / `backend=` keywords (also `--gams_dir` / `--backend` on the CLIs):
+gdxpds needs to know **where GAMS is**, and optionally **which I/O engine** to use. Set either once via an environment variable, or per call with the `gams_dir=` / `engine=` keywords (also `--gams_dir` / `--engine` on the CLIs):

```bash
export GAMS_DIR=/path/to/gams # otherwise auto-discovered
-export GDXPDS_BACKEND=gams_transfer # default: gdxcc; gams_transfer is much faster on large files (needs gamsapi)
+export GDXPDS_ENGINE=gdxcc # default: gams_transfer when usable (much faster on large files), else gdxcc
```

See *Configuration* in the [documentation](https://NatLabRockies.github.io/gdx-pandas) for the full keyword / environment-variable / CLI matrix and the speed trade-offs.
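
For reviewers, the new default can be pictured as a small resolution function. This is a hedged sketch rather than the library's code: `resolve_engine` is a hypothetical name, the precedence of the keyword over `GDXPDS_ENGINE` is an assumption, and the sketch does not model what happens when `gams_transfer` is pinned but unusable.

```python
import os

VALID_ENGINES = ("gdxcc", "gams_transfer")


def resolve_engine(engine=None, environ=None, transfer_usable=False):
    """Pick an engine name: explicit keyword, then GDXPDS_ENGINE, then the default.

    Hypothetical helper. Assumption: a per-call keyword outranks the
    environment variable. `transfer_usable` stands in for whatever check
    the library performs on the installed gamsapi.
    """
    environ = os.environ if environ is None else environ
    choice = engine or environ.get("GDXPDS_ENGINE")
    if choice:
        if choice not in VALID_ENGINES:
            raise ValueError(f"unknown engine {choice!r}; expected one of {VALID_ENGINES}")
        return choice
    # v3.0.0 default: gams_transfer when usable, else gdxcc.
    return "gams_transfer" if transfer_usable else "gdxcc"
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.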