
backend: coerce all-numpy operands in promote_scalars under a forced backend #991

Open
jobovy wants to merge 2 commits into feat/backends from backend/coords-scalar-coerce

jobovy (Owner) commented Jun 20, 2026

What

promote_scalars passed its inputs through unchanged when none was a backend array to anchor on (ref is None), assuming "the namespace's functions handle scalars." That holds for jax but not torch: torch.cos/sqrt/… reject numpy.float64 / python floats. So under a forced torch backend, every promote_scalars caller — the coords transforms (cyl_to_rect/rect_to_cyl/…) and OblateStaeckelWrapperPotential — crashed on all-numpy inputs:

TypeError: cos(): argument 'input' (position 1) must be Tensor, not numpy.float64
   at galpy/util/coords.py:1164  (cyl_to_rect)

This is the root cause behind the migrated RotateAndTilt / Offset / OblateStaeckel / Kuzmin wrapper torch failures (they route coordinates through these transforms). The fix coerces the operands via coerce_coords in that branch:

 ref = next((v for v in vals if is_backend_array(v)), None)
 if ref is None:
-    # ... pass through, the namespace's functions handle scalars
-    return vals
+    # torch's functions REJECT numpy.float64/python floats, so coerce instead
+    return coerce_coords(xp, *vals)

The fix lives in one spot, the central coercion helper, rather than in a patch to each transform.
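
The changed branch can be illustrated with a minimal, self-contained sketch. Everything in it is a stand-in: `Tensor` and `StrictNamespace` are hypothetical torch-like objects that refuse plain floats, and the bodies of `is_backend_array`, `coerce_coords`, and `promote_scalars` are simplified guesses at the shape of the real helpers, not galpy's code. The `coerce_unanchored` flag exists only in this sketch, to switch between the old and new behaviour.

```python
import math


class Tensor:
    """Stand-in for a backend array (hypothetical; wraps one float)."""

    def __init__(self, value):
        self.value = float(value)


class StrictNamespace:
    """Torch-like stand-in: functions accept Tensor only, never raw floats."""

    @staticmethod
    def asarray(value):
        return value if isinstance(value, Tensor) else Tensor(value)

    @staticmethod
    def cos(x):
        if not isinstance(x, Tensor):
            raise TypeError(
                f"cos(): argument 'input' must be Tensor, not {type(x).__name__}"
            )
        return Tensor(math.cos(x.value))


def is_backend_array(value):
    return isinstance(value, Tensor)


def coerce_coords(xp, *vals):
    # Assumed behaviour: convert every operand to the namespace's array type.
    return tuple(xp.asarray(v) for v in vals)


def promote_scalars(xp, *vals, coerce_unanchored=True):
    """Simplified sketch; coerce_unanchored=False reproduces the old branch."""
    ref = next((v for v in vals if is_backend_array(v)), None)
    if ref is None and not coerce_unanchored:
        return vals  # old pass-through: raw floats reach xp.cos unchanged
    return coerce_coords(xp, *vals)


xp = StrictNamespace

# Old behaviour: the raw float is handed to a function that rejects it.
(phi,) = promote_scalars(xp, 0.0, coerce_unanchored=False)
try:
    xp.cos(phi)
    old_error = None
except TypeError as exc:
    old_error = str(exc)

# New behaviour: the operand is coerced first, so the call succeeds.
(phi,) = promote_scalars(xp, 0.0)
new_result = xp.cos(phi).value
```

In the sketch, the old branch produces a `TypeError` of the same kind as the traceback above, while the coerced operand evaluates normally.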

Safety (adversarially reviewed, two independent passes, CPU-forced)

  • numpy byte-identical: the `xp is numpy` guard short-circuits first; verified object-identical pass-through and identical SHA-256 output hashes for cyl_to_rect/rect_to_cyl/cyl_to_spher/Rz_to_uv/uv_to_Rz over a 2000-point grid.
  • jax value-identical: jax already tolerated raw scalars; the coerced float64 values match numpy exactly (difference of 0.0) and autodiff still flows (jax.grad through a coords transform is finite). jax failure sets are byte-identical (functional no-op for jax).
  • No regression: 1663 backend tests pass; test_coords torch failures drop 20→9 (fixes 11) and test_quantity torch failures drop 110→99 (fixes 11); the remaining failures are strict subsets of the previous failure sets; OblateStaeckelWrapper unaffected.
  • No caller-reliance break: the one by-reference caller (integrateFullOrbit.py) pins xp=numpy, hitting the guard; no caller depends on the old pass-through returning a raw scalar.
  • Also fixes a latent mixed numpy-array + scalar torch bug (the array was anchored-on but itself left un-coerced).
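
A byte-identity check of the kind described in the first bullet could look roughly like the following. This is a standard-library sketch, not the script used for the PR: `cyl_to_rect_reference` and `cyl_to_rect_candidate` are hypothetical stand-ins for the transform before and after the change, and the grid values are arbitrary (only the 2000-point size comes from the text above).

```python
import hashlib
import math
import struct


def cyl_to_rect_reference(R, phi, Z):
    # Stand-in for the transform before the change.
    return (R * math.cos(phi), R * math.sin(phi), Z)


def cyl_to_rect_candidate(R, phi, Z):
    # Stand-in for the same transform after the change under test.
    return (R * math.cos(phi), R * math.sin(phi), Z)


def output_digest(transform, grid):
    """SHA-256 over the raw IEEE-754 bytes of every output value."""
    digest = hashlib.sha256()
    for point in grid:
        for value in transform(*point):
            digest.update(struct.pack("<d", value))
    return digest.hexdigest()


# Arbitrary 2000-point grid of (R, phi, Z) triples.
grid = [(0.1 + 0.01 * i, 0.003 * i, -1.0 + 0.001 * i) for i in range(2000)]

reference_hash = output_digest(cyl_to_rect_reference, grid)
candidate_hash = output_digest(cyl_to_rect_candidate, grid)
identical = reference_hash == candidate_hash
```

Hashing the raw bytes is stricter than a tolerance comparison: any single-bit difference in any output, including the sign of a zero, changes the digest.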

Tests

New tests/test_backend_coerce.py: numpy object-identity pass-through; forced-jax/torch coercion to backend float64 (the fixed branch); the anchored path unchanged; and a coords-transform integration check (fails at coords.py:1164 without the fix). Lives in the test_backend* coverage shard.

File-disjoint from #990 (Potential.py) and the merged #989. Part of the torch potential burndown; the per-potential _anchor fixes (PowerSpherical/SpiralArms/EllipticalDisk/…) are a separate themed PR.

🤖 Generated with Claude Code

jobovy enabled auto-merge (squash) June 21, 2026 00:07
codecov (bot) commented Jun 21, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.87%. Comparing base (509ea17) to head (f6ecdca).
⚠️ Report is 1 commit behind head on feat/backends.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| galpy/backend/_namespaces.py | 0.00% | 4 Missing ⚠️ |
| galpy/backend/_coerce.py | 0.00% | 1 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (509ea17) and HEAD (f6ecdca).

HEAD has 26 fewer uploads than BASE.

| Flag | BASE (509ea17) | HEAD (f6ecdca) |
|---|---|---|
| | 28 | 2 |

Additional details and impacted files
@@                Coverage Diff                 @@
##           feat/backends     #991       +/-   ##
==================================================
- Coverage          99.93%   46.87%   -53.06%     
==================================================
  Files                254      254               
  Lines              39836    40668      +832     
  Branches             838      843        +5     
==================================================
- Hits               39810    19064    -20746     
- Misses                26    21604    +21578     


…backend

promote_scalars passed its inputs through unchanged when none was a backend
array to anchor on (`ref is None`), assuming "the namespace's functions handle
scalars". That holds for jax but NOT for torch: torch.cos/sqrt/... reject
numpy.float64 / python floats, so under a forced torch backend every
promote_scalars caller (the coords transforms cyl_to_rect / rect_to_cyl / ...,
OblateStaeckelWrapperPotential) crashed on all-numpy inputs. Coerce the operands
via coerce_coords in that branch instead.

Routing that branch through coerce_coords surfaced that #987's promote_scalars
refactor had silently dropped the device-reject fallback (the device-less
asarray retry when a namespace rejects the ref's .device value -- array-api jax
exposes .device as the string 'cpu' and jnp.asarray(device='cpu') raises
ValueError), leaving its test a no-op (the mock ref is no longer detected as a
backend array after #987's is_backend_array switch). Restore the fallback in
asarray_on_device (catch TypeError / ValueError -> device-less asarray; a
genuine dtype error re-raises from the fallback so it is not masked) and rewrite
the test to exercise asarray_on_device directly and deterministically.
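
The restored fallback can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `Array` and `PickyNamespace` are hypothetical stand-ins that mimic a namespace rejecting a string device value, and the signature of `asarray_on_device` here is a guess.

```python
class Array:
    """Hypothetical backend array that reports its device as a string."""

    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device


class PickyNamespace:
    """Stand-in namespace whose asarray rejects string device values."""

    @staticmethod
    def asarray(obj, dtype=float, device=None):
        if isinstance(device, str):
            raise ValueError(f"unsupported device value: {device!r}")
        if dtype is not float:
            raise TypeError(f"unsupported dtype: {dtype!r}")
        items = obj if isinstance(obj, (list, tuple)) else [obj]
        return Array([float(x) for x in items])


def asarray_on_device(xp, value, ref, dtype=float):
    """Convert value on ref's device, retrying without a device on rejection."""
    device = getattr(ref, "device", None)
    if device is None:
        return xp.asarray(value, dtype=dtype)
    try:
        return xp.asarray(value, dtype=dtype, device=device)
    except (TypeError, ValueError):
        # Retry without the device. A genuine dtype error is raised again
        # by this second call, so the fallback does not mask it.
        return xp.asarray(value, dtype=dtype)


ref = Array([0.0])  # its .device is the string "cpu"

# The device is rejected, and the device-less retry succeeds.
out = asarray_on_device(PickyNamespace, 2, ref)

# A bad dtype fails on the retry as well, so the error still surfaces.
try:
    asarray_on_device(PickyNamespace, 2, ref, dtype=int)
except TypeError:
    dtype_error_surfaced = True
else:
    dtype_error_surfaced = False
```

Calling the helper directly with a controlled stand-in, as here, is one way to keep such a test deterministic and independent of how backend arrays are detected.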

The numpy path is byte-identical (the `xp is numpy` guard short-circuits, and
asarray_on_device's device branch is only taken when a backend array supplies a
device). jax value-identical under x64. Fixes the migrated RotateAndTilt /
Offset / OblateStaeckel / Kuzmin wrapper torch entries that route coordinates
through these transforms, plus 11 test_coords and 11 test_quantity torch cases.
New tests/test_backend_coerce.py covers the coercion branch.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
jobovy force-pushed the backend/coords-scalar-coerce branch from 9713b4a to 0d3c7de June 21, 2026 03:09
github-actions (bot, Contributor) commented Jun 21, 2026

All-backend test status (jax / torch)

Commit b2a3704f6159e5cddc839c0dcf7c8c3ba21ac956

Green is achieved via the checked-in xfail-ledger (tests/backend_xfail.txt, applied xfail(strict=False)), so the metric to watch is the shrinking xfail count (burndown), not a raw pass count. A FAIL/ERR is an un-ledgered regression (reds the run). Because the ledger is non-strict, a now-passing ledgered test is a plain pass here (no per-push XPASS); burndown candidates -- in both directions -- are surfaced by the scheduled regen run, which rewrites the ledger from real outcomes. deferred is a separate burndown: tests skipped because they are unrunnable under the backend until the port is vectorized (see tests/backend_slow_skip.txt), e.g. the jax spherical-DF sampling/quadrature tests pending the Track F DF migration.
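
For readers unfamiliar with the pattern, a non-strict xfail ledger can be applied from a `conftest.py` roughly as sketched below. The ledger format (one pytest node ID per line, `#` for comments) and the helper name `parse_ledger` are assumptions made for illustration; the real tests/backend_xfail.txt also distinguishes jax from torch entries, which this sketch ignores.

```python
from pathlib import Path


def parse_ledger(text):
    """Return the set of ledgered test IDs (assumed: one node ID per line)."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def pytest_collection_modifyitems(config, items):
    # Collection hook for a conftest.py. pytest is imported here so that
    # parse_ledger stays usable without pytest installed.
    import pytest

    ledger = parse_ledger(Path("tests/backend_xfail.txt").read_text())
    for item in items:
        if item.nodeid in ledger:
            # strict=False: a ledgered test that now passes is reported as
            # a pass-like XPASS rather than failing the run.
            item.add_marker(
                pytest.mark.xfail(strict=False, reason="ledgered backend failure")
            )
```

With `strict=False`, a test that starts passing does not break the run, which is why shrinking the ledger depends on a separate regeneration step rather than on per-push results.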

Overall: jax: 1058 passed · 264 xfail · 725 deferred | torch: 998 passed · 1039 xfail · 1 deferred · 6 FAIL/ERR

Ledger size: 2357 entries (jax=284, torch=2073).

| Test shard | jax | torch |
|---|---|---|
| actionAngle | ✅ 112 pass · 89 xfail | ✅ 37 pass · 164 xfail |
| sphericaldf | ✅ 164 pass · 26 xfail · 28 deferred | ✅ 8 pass · 210 xfail |
| conversion + util + misc | ✅ 87 pass · 4 xfail · 1 deferred | ✅ 44 pass · 48 xfail |
| potential + scf + multipole | (no result) | (no result) |
| quantity + coords | ✅ 287 pass · 49 xfail | ✅ 228 pass · 108 xfail |
| orbit (energy/Jacobi + from_name) | ✅ 0 pass · 0 xfail · 115 deferred | ✅ 71 pass · 44 xfail |
| orbit + orbits (main) | ✅ 0 pass · 0 xfail · 578 deferred | ✅ 341 pass · 234 xfail |
| evolveddiskdf | ✅ 35 pass · 0 xfail | ✅ 34 pass · 1 xfail |
| jeans + dynamfric | ✅ 17 pass · 2 xfail · 2 deferred | ✅ 7 pass · 13 xfail · 1 deferred |
| qdf + pv2qdf + streamgapdf_impulse + noninertial | ✅ 57 pass · 75 xfail · 1 deferred | ✅ 15 pass · 118 xfail |
| streamgapdf | ✅ 28 pass · 2 xfail | ✅ 28 pass · 2 xfail |
| diskdf | ✅ 129 pass · 0 xfail | ✅ 116 pass · 13 xfail |
| streamdf + streamspraydf + streamTrack | ✅ 142 pass · 17 xfail | ❌ 69 pass · 84 xfail · 6 FAIL/ERR |

Per-shard counts

| Test shard | backend | pass | xfail | deferred | XPASS | fail | error |
|---|---|---|---|---|---|---|---|
| actionAngle | jax | 112 | 89 | 0 | 0 | 0 | 0 |
| actionAngle | torch | 37 | 164 | 0 | 0 | 0 | 0 |
| sphericaldf | jax | 164 | 26 | 28 | 0 | 0 | 0 |
| sphericaldf | torch | 8 | 210 | 0 | 0 | 0 | 0 |
| conversion + util + misc | jax | 87 | 4 | 1 | 0 | 0 | 0 |
| conversion + util + misc | torch | 44 | 48 | 0 | 0 | 0 | 0 |
| potential + scf + multipole | jax | | | | | | |
| potential + scf + multipole | torch | | | | | | |
| quantity + coords | jax | 287 | 49 | 0 | 0 | 0 | 0 |
| quantity + coords | torch | 228 | 108 | 0 | 0 | 0 | 0 |
| orbit (energy/Jacobi + from_name) | jax | 0 | 0 | 115 | 0 | 0 | 0 |
| orbit (energy/Jacobi + from_name) | torch | 71 | 44 | 0 | 0 | 0 | 0 |
| orbit + orbits (main) | jax | 0 | 0 | 578 | 0 | 0 | 0 |
| orbit + orbits (main) | torch | 341 | 234 | 0 | 0 | 0 | 0 |
| evolveddiskdf | jax | 35 | 0 | 0 | 0 | 0 | 0 |
| evolveddiskdf | torch | 34 | 1 | 0 | 0 | 0 | 0 |
| jeans + dynamfric | jax | 17 | 2 | 2 | 0 | 0 | 0 |
| jeans + dynamfric | torch | 7 | 13 | 1 | 0 | 0 | 0 |
| qdf + pv2qdf + streamgapdf_impulse + noninertial | jax | 57 | 75 | 1 | 0 | 0 | 0 |
| qdf + pv2qdf + streamgapdf_impulse + noninertial | torch | 15 | 118 | 0 | 0 | 0 | 0 |
| streamgapdf | jax | 28 | 2 | 0 | 0 | 0 | 0 |
| streamgapdf | torch | 28 | 2 | 0 | 0 | 0 | 0 |
| diskdf | jax | 129 | 0 | 0 | 0 | 0 | 0 |
| diskdf | torch | 116 | 13 | 0 | 0 | 0 | 0 |
| streamdf + streamspraydf + streamTrack | jax | 142 | 17 | 0 | 0 | 0 | 0 |
| streamdf + streamspraydf + streamTrack | torch | 69 | 84 | 0 | 0 | 6 | 0 |

…oses

The promote_scalars all-numpy coercion (this PR) makes the coords transforms
return backend arrays under a forced torch backend; the UNMIGRATED streamdf
(Track F) feeds that output into its numpy track-building, so these 6 tests --
which were accidentally passing because the old coords pass-through kept
streamdf's numpy path alive under forced torch -- now correctly fail (wrong
track / missing _interpolatedObsTrackAA). They join the 29 existing
streamdf-torch xfails (streamdf is unmigrated; the numpy / default path is
byte-identical and unaffected). They get un-ledgered when streamdf is migrated.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>