Skip to content

Rewrite xpublish-opendap with native DAP2/DAP4 protocol layer#100

Open
jhamman wants to merge 27 commits into
mainfrom
feature/dap4
Open

Rewrite xpublish-opendap with native DAP2/DAP4 protocol layer#100
jhamman wants to merge 27 commits into
mainfrom
feature/dap4

Conversation

@jhamman
Copy link
Copy Markdown
Contributor

@jhamman jhamman commented Mar 4, 2026

This is a ground-up rewrite of xpublish-opendap. It replaces the opendap-protocol dependency with a native in-repo protocol implementation, adds full DAP4 support, and introduces an async-first architecture with
memory-bounded streaming.

Note

This PR was built with Claude but not entirely by Claude. I wrote and edited much of the design docs and drove on the high degree of testing contained here. That said, there is a big diff in this PR. I don't expect any human to volunteer to review the whole thing. Instead, I think we should start using this in real world applications for a bit to build confidence in it before deciding if its worth merging.

What changed

  • Native protocol layer (xpublish_opendap/dap/): Clean-room implementation of DAP2 and DAP4 encoding, replacing the external opendap-protocol library. Includes a type system mapping 15 numpy dtypes to DAP wire types, a
    recursive-descent constraint expression parser, and xarray-to-DAP subsetting with memory estimation.
  • DAP4 support: DMR (Dataset Metadata Response) XML generation, chunked binary encoding with CRC32 checksums and native byte order, DAP4 constraint syntax, and Dataset Services Response. All four DAP4 response types
    (.dmr, .dap, .dsr, error) alongside all six DAP2 types.
  • Async streaming architecture: All route handlers are async def. Data loading runs in a thread pool executor with semaphore-bounded concurrency. Large arrays use slab streaming (split along the most-chunked dimension,
    yield per-slab) to bound HTTP response memory. A dask-graph fast path fuses dtype conversion into the computation graph for chunked data.
  • Constraint-first pipeline: Subsetting is applied before data loading (not after), and requests exceeding max_request_memory_bytes (default 512 MB) are rejected with 413.

Test suite

In my opinion, this is the best part of this PR. We now have a very thorough test suite

440 tests:

  • Unit tests: type mapping, constraint parsing, xarray adapter, DAP2 protocol (DDS/DAS/DODS/XDR encoding), DAP4 protocol (DMR/chunked binary/CRC32), slab streaming, dask-graph fast path, error responses, HTTP headers
  • Cross-language integration tests: netCDF4-python, PyDAP, xarray (multiple engines), NCO (ncks, ncdump), Julia (NCDatasets.jl), R (ncdf4)
  • 99% code coverage, strict mypy, ruff linting

Benchmarks

New benchmark suite (benchmarks/) with ~21 scenarios covering metadata generation, data responses, and constraint parsing across multiple dataset sizes. Includes an isolated IO benchmark (io_bench.py) for profiling the
encode pipeline without HTTP overhead, and a comparison workflow for regression detection between the release and feature branch.

Breaking changes

  • Removed opendap-protocol dependency
  • All handlers are now async (was sync)
  • New module structure (dap/ replaces dap_xarray.py)
  • DAP2 int8 promoted to Int16 (per OPeNDAP spec)

jhamman and others added 21 commits March 1, 2026 17:34
Replace the opendap-protocol dependency with an in-repo DAP2 implementation
providing full protocol conformance, async I/O, and memory management.

New modules (bottom-up dependency order):
- dap/types.py: DapType dataclass, numpy↔DAP2 type mapping (15 dtypes)
- errors.py: DapError hierarchy (5 error types with codes)
- dap/constraint.py: recursive-descent DAP2 constraint expression parser
- dap/xarray_adapter.py: constraint→xarray subsetting with memory estimation
- dap/dap2/{dds,das,dods}.py: DDS/DAS text and DODS XDR binary generators
- dap/dap2/{responses,headers}.py: Error/Version/Help responses, HTTP headers
- io.py: async infrastructure (thread pool, semaphores, load_dataset_async)
- plugin.py: fully async handlers with constraint-first pipeline

Key improvements over the old implementation:
- All six DAP2 response types (.dds, .das, .dods, .ver, .help, error)
- Proper HTTP headers (Content-Description, XDODS-Server)
- Constraint-first subsetting (applied before data loading)
- Memory threshold enforcement (413 for oversized requests)
- Async data loading off the event loop
- Streaming DODS responses
- DAP error responses for constraint/variable/index errors

104 tests passing (27 type, 15 parser, 13 adapter, 19 protocol, 24 route, 6 legacy).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement full DAP4 protocol layer alongside the existing DAP2 support:
- DMR (Dataset Metadata Response) XML generation with DAP4 type mapping
- DAP4 binary data responses with chunked serialization and CRC32 checksums
- DSR (Dataset Services Response) endpoint
- DAP4 constraint expression parsing (semicolons, leading slashes, filters)
- DAP4 error responses in XML format
- DAP4-specific HTTP headers (XDAP: 4.0)

New routes: .dmr, .dap, .dsr on the plugin router.
Includes 191 unit tests covering DMR generation, binary encoding,
constraint parsing, type mapping, and route handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test real DAP clients against a live xpublish server using the existing
xprocess-managed server fixture. Covers six client types:

- httpx: raw HTTP validation of DAP2 and DAP4 status codes, headers,
  content types, constraint expressions, and XML structure (14 tests)
- netCDF4-python: dimensions, variables, coordinates, attributes (8 tests)
- pydap: DAP2 and DAP4 client access with data validation (8 tests)
- xarray: netcdf4 and pydap engine integration (7 tests)
- ncdump: subprocess header, variable, and attribute inspection (3 tests)
- ncks (NCO): metadata listing, variable extraction, hyperslab (3 tests)

35 tests pass, 6 are xfail due to a known bug: the server does not
resolve DAP2 Grid member paths (e.g. air.air[0][0][0]) that clients
send when fetching data from Grid-typed variables.

Also registers the "integration" pytest marker and adds nco to CI deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DAP2 clients (netCDF4-C, pydap, ncks) send Grid-qualified constraint
paths like `air.air[0][0][0]` for variables declared as Grid types in
the DDS. The server rejected these with VariableNotFoundError because
it looked up `air.air` verbatim against xarray variable names.

Add `_resolve_grid_path()` helper that resolves dotted Grid member
paths back to xarray variable names (e.g. `air.air` → `air` for Array
members, `air.time` → `time` for Map members). Remove xfail markers
from the 5 integration tests that now pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Compare ncks output against both a local netCDF file and the OPeNDAP
endpoint to verify actual data values match, not just dimensions/exit
codes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add integration tests that exercise xpublish-opendap's DAP2 responses
through R (ncdf4) and Julia (NCDatasets.jl), both of which bind to the
netCDF-C library's OPeNDAP client — a completely separate code path from
PyDAP. Each language has 7 test cases (open, dimensions, variables,
variable_shape, read_lat, read_data_slice, attributes) dispatched via
subprocess with per-test granularity. Tests skip gracefully when
Rscript/julia are not on PATH. CI jobs added for both languages on
ubuntu-latest, independent of the main 3x3 matrix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… and error bodies

Adds ~59 new tests (249 total unit tests) covering previously unexercised
encoding paths for integer types, strings, booleans, and datetime coords
across DAP2 and DAP4 protocols. Also covers DAS attribute formatting for
arrays, booleans, NaN/Inf/empty strings, and validates DAP4 error XML
structure including httpcode attributes and error codes.

New test classes:
- TestDODSTypeEncoding: XDR encoding for int8/16/32, uint16/32, string, bool, scalar
- TestMultiTypeDDS/DAS/DODS: end-to-end with multi-dtype dataset fixture
- TestInt32/UInt32/Int8/UInt16/Bool/UInt64Max Encoding: DAP4 native binary
- TestEdgeCaseSubsetting: single-element hyperslabs, large strides, coord auto-inclusion
- TestDAP4ErrorResponseBody: error XML structure validation
- TestDAP4HeaderConstants: protocol header constants (new file)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e gaps

Add 61 new tests covering async data loading (io.py), streaming responses,
constraint/query edge cases, cross-protocol DAP2/DAP4 consistency, and
special data types (timedelta, NaT, bytes/unicode, empty arrays, lossy
int64 encoding). Enable pytest asyncio_mode=auto in pyproject.toml.

New files: tests/test_io.py (15 tests), tests/test_async_streaming.py (7 tests)
Extended: test_dap2_responses (+9), test_plugin_routes (+11),
test_dap4_routes (+6), test_dap4_data (+5), test_xarray_adapter (+2),
test_types (+1). Total: 310 tests, all passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
28 new tests covering DMR attribute formatting, constraint parser edge
cases, hyperslab validation branches, memory estimation fallbacks, DAS
type fallbacks, scalar coordinate DDS, exception handlers in plugin
routes, and float16 dtype resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove ~38 lines of unreachable code: unused parser methods in
constraint.py, integer fallback-by-itemsize branches in types.py,
unreachable try/except blocks in das.py and dmr.py, and an unreachable
else branch in dods.py. Add 3 tests covering the remaining defensive
fallbacks (complex ndarray attr, unknown type attr, version import).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docstring capitalization (D403), import reformatting, and blank line
removal from pre-commit formatter run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…asymmetry

- Add # noqa comments for acceptable ruff violations (PLW2901, PLR2004,
  PLR0911) in type-dispatch and parser functions
- Remove unnecessary # noqa: PLR0911, PLR0912 from resolve_dap_type
  after dead code removal reduced return/branch count
- Fix remaining ruff violations: D301 raw docstrings in dods.py, B905
  strict= in xarray_adapter.py, D107 noqa for error __init__ methods
- Replace assert dap_type.dap4_name with explicit ValueError guards in
  dmr.py (assert is stripped by python -O)
- Fix attrs asymmetry: use cf_encode_variable for data variables in
  das.py and dmr.py so datetime data vars get CF-encoded attrs
- Move TestVersionImportFallback from test_types.py to test_init.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add [tool.mypy] strict config to pyproject.toml with per-module
  overrides for untyped third-party libs (xpublish, pluggy, cachey)
- Update pre-commit to install type-checking dependencies (numpy,
  xarray, fastapi, pydantic, xpublish) and remove --ignore-missing-imports
- Fix Hashable→str casts for xarray dimension/variable names in
  xarray_adapter.py, dds.py, and dmr.py
- Fix Optional[str] return types in dmr._attribute_dap4_type by adding
  dap4_name None guards (fall back to 'String')
- Add type: ignore comments for untyped xpublish/pluggy in plugin.py
- Add 5 tests for all dap4_name-is-None defensive guards in dmr.py,
  restoring 100% test coverage

Verification: mypy --strict (0 errors), ruff (0 errors),
pytest (346 passed, 100% coverage)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable mypy strict checking on test files via pyproject.toml overrides
(disallow_untyped_defs/calls relaxed for tests). Add type narrowing
assertions for Optional fields (plan.variables, item.slices, XML
Element.find results). Expand ruff per-file-ignores for tests (D100-D103,
PLC0415, PLR2004, PLW1510) and exclude auto-generated _version.py.
Format all test files with ruff format and fix E741 ambiguous variable
names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…work

Add docs/FEATURE_COMPLETENESS.md comparing the implementation against the
DAP2/DAP4 specs and Hyrax reference implementation. Covers routes, data
types, container types, constraint expressions, binary encoding, error
handling, and client compatibility.

Rewrite TODO.txt to list only feasible future work items, excluding
features incompatible with xarray's data model (Sequence, nested
Structure, Enum, Opaque).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…licate coords

- Add dask threaded scheduler for parallel chunk loading (3-8x speedup
  for multi-chunk remote datasets)
- Refactor DODS/DAP4 generators to load+encode per-variable in thread
  pool executor, keeping all data loading and numpy encoding off the
  event loop. Coordinates are cached and reused for Grid Map vectors.
- Remove duplicate coordinate serialization in DDS/DODS: coordinates
  used as Grid Maps are only emitted inside Grid declarations, not
  also as top-level arrays
- Add load_variable() helper and dask_num_workers config to OpenDapPlugin
- Move benchmarks into xpublish-opendap repo with response size display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Slab streaming: for large dask-backed arrays, split along the most-chunked
dimension and yield encoded bytes per-slab via async generators. This bounds
HTTP response memory to slab size rather than full variable size. Includes
prefetch-1 pipeline for both DAP2 and DAP4.

Dask-graph fast path: for dask-backed numeric arrays, fuse dtype conversion
into the dask graph (ravel → rechunk 20 MB → block.astype(wire_dtype).compute)
instead of load-then-encode. Benchmarks show ~37% faster on chunked cloud data.
Falls back to the existing eager path for numpy, strings, datetime, and scalars.

Fix datetime slab streaming bug: CF encoding is data-dependent (each slab picks
its own reference time), so datetime64/timedelta64 arrays now bypass slab
streaming entirely. Also fixed pre-existing mypy errors in io.py and dods.py
(Hashable → str casts at xarray boundaries, generic type parameters).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Black quote normalization (single → double), trailing comma insertion,
and line wrapping. No logic changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ab fixes

- bench.sh: isolate benchmark runner in temp directory so release and dap4
  runs use their own installed packages instead of sharing the source tree;
  organize results into timestamped subdirectories
- scenarios.py: add expected_min_bytes validation per scenario; fix DAP2
  hyperslabs to two-value syntax for opendap-protocol compat; remove
  constraint-parsing-only scenarios
- run.py: validate response size against expected_min_bytes, flag failures
- results.py: add human-readable response size to comparison output
- compare.py: auto-formatting only (quote normalization)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- benchmarks/io_bench.py: standalone benchmark isolating the data loading +
  encoding pipeline without HTTP overhead; tests eager, slab-prefetch, and
  dask-graph strategies to profile where time is spent
- docs/design/streaming-io.md: design document exploring the data path for
  lazy loading and streaming cloud-backed datasets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reconcile all design documents with the actual codebase:

- DESIGN.md: Fix function signatures (generate_dds/das/dods/dmr/dap4_data),
  replace load_variables_async with actual io.py API, document slab streaming
  and dask-graph fast path, correct directory layout, update data flow diagrams
- FEATURE_COMPLETENESS.md: Fix XDAP header status, DAP4 string encoding
  (uint64 not int64), add R/Julia/httpx to client compatibility table
- streaming-io.md: Mark Strategy 2 as implemented, add dask-graph fast path
  section, update recommendation to reflect what was built
- benchmarks/README.md: Fix scenario count (16 not 21) and categories (3 not 4)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jhamman
Copy link
Copy Markdown
Contributor Author

jhamman commented Mar 4, 2026

Some examples from our the latest benchmarks: run on my laptop at home so all the caveats apply w.r.t. network bandwidth, etc.

=== air dataset ===

Metric: mean_latency_s
Baseline: feature/dap4@68babb6 (release_air) (release_air.json)

  [1] feature/dap4@68babb6 (release_air) (baseline)  2026-03-04T05:18:00.622551+00:00
  [2] feature/dap4@68babb6 (dap4_air)  2026-03-04T05:18:22.156148+00:00

Scenario                          [1]         [2]       delta
-------------------------------------------------------------
metadata-dds-full               5.4ms       2.1ms  -60.9% faster    [300B/215B]
metadata-das-full               8.1ms       2.0ms  -75.2% faster    [1.4KB/1.3KB]
metadata-ver                      N/A       0.6ms                [0B/37B]
metadata-dmr-full                 N/A       2.1ms                [0B/3.7KB]
metadata-dsr                      N/A       0.7ms                [0B/1.1KB]
metadata-dds-projected          4.7ms       2.0ms  -56.9% faster    [215B/215B]
data-dap2-scalar                6.1ms       3.6ms  -41.4% faster    [265B/264B]
data-dap2-single-timestep       6.1ms       3.6ms  -41.8% faster    [10.9KB/10.9KB]
data-dap2-10-timesteps          6.3ms       3.7ms  -41.8% faster    [104.1KB/104.1KB]
data-dap2-zonal-transect        6.2ms       3.4ms  -45.2% faster    [891B/890B]
data-dap2-strided                 N/A       3.6ms                [0B/14.1KB]
data-dap4-scalar                  N/A       3.8ms                [0B/3.8KB]
data-dap4-single-timestep         N/A       3.8ms                [0B/14.4KB]
data-dap4-10-timesteps            N/A       4.2ms                [0B/107.6KB]
-------------------------------------------------------------

=== ifs dataset === (icechunk data stored in cloudflare R2)

Metric: mean_latency_s
Baseline: feature/dap4@b2eca17 (release_ifs) (release_ifs.json)

  [1] feature/dap4@b2eca17 (release_ifs) (baseline)  2026-03-04T04:49:41.237493+00:00
  [2] feature/dap4@b2eca17 (dap4_ifs)  2026-03-04T05:08:19.784679+00:00

Scenario                          [1]         [2]       delta
-------------------------------------------------------------
metadata-dds-full              24.7ms       8.0ms  -67.7% faster    [3.2KB/3.1KB]
metadata-das-full              14.9ms       2.5ms  -83.3% faster    [5.5KB/5.4KB]
metadata-ver                      N/A       0.6ms                [0B/37B]
metadata-dmr-full                 N/A       3.0ms                [0B/11.9KB]
metadata-dsr                      N/A       0.6ms                [0B/1.1KB]
metadata-dds-projected          4.5ms       2.2ms  -50.0% faster    [301B/302B]
data-dap2-scalar              411.2ms     390.2ms  -5.1% faster    [377B/365B]
data-dap2-single-timestep    3367.4ms    3329.8ms  -1.1% ~same    [37.2KB/37.0KB]
data-dap2-10-timesteps      30376.1ms   30115.8ms  -0.9% ~same    [362.9KB/362.7KB]
data-dap2-zonal-transect      428.7ms     413.9ms  -3.5% ~same    [41.0KB/41.0KB]
data-dap2-two-var             778.7ms     864.4ms  +11.0% slower    [725B/706B]
data-dap2-strided                 N/A    8071.9ms                [0B/26.6KB]
data-dap4-scalar                  N/A     432.5ms                [0B/3.6KB]
data-dap4-single-timestep         N/A    3381.4ms                [0B/40.3KB]
data-dap4-10-timesteps            N/A   30107.4ms                [0B/365.9KB]
data-dap4-multi-var               N/A     832.3ms                [0B/4.4KB]
-------------------------------------------------------------

I would say these are not amazing but at least they are not a step backwards!

jhamman and others added 6 commits March 3, 2026 21:20
- TestSlabFallbackPaths: exercise _xdr_encode_slab_data (Byte, Int16,
  UInt16, float64, scalar branches), _load_and_encode_slab_xdr and
  _load_and_encode_slab_dap4 numpy fallback paths
- TestDaskGraphBytePadding: verify uint8 dask-graph fast path padding
  (both padding-needed and no-padding cases)
- TestNoDaskFallbacks: mock HAS_DASK=False to cover get_slab_boundaries,
  _is_dask_graph_eligible, load_variable, and load_dataset_async fallbacks

Coverage: 99% (1187 stmts, 3 missed — import guard and unreachable reshape)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Emit all coordinates as top-level arrays in DDS/DODS (standard DAP2
  behavior), not just orphan coordinates. Coordinates are still emitted
  as Grid Maps too, matching reference servers like Hyrax.
- Remove _get_grid_map_coords helper and grid_map_coords filtering.
- Add pytest-asyncio to dev dependencies (required by asyncio_mode=auto).
- Exclude tests/integration from main CI job (dedicated jobs exist).
- Remove xpublish from mypy additional_dependencies to fit 250MB tier.
- Update test binary offset assertions for new top-level coord data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Exclude benchmarks/ and noxfile.py from mypy and ruff (tooling scripts)
- Add test-appropriate ruff suppressions: A002, B905, E501
- Fix mypy issues in test_slab_streaming.py (type annotation, arg-type)
- Fix ruff issues in library code (unused vars, long docstring, noqa)
- Remove unused dap_type assignment in dap4/data.py slab streaming
- Add pydap to mypy ignore_missing_imports list
- Apply add-trailing-comma and ruff-format auto-fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add prune benchmarks/ and exclude DESIGN.md/PRD.md to MANIFEST.in
so check-manifest passes. Also exclude *.pyc globally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abkfenris
Copy link
Copy Markdown
Member

/Keanu Woah

This is gonna take a bit to poke through, but some thoughts so far:

  • Multi-language testing is great! That was definitely something that hurt previously, and hearing about weird errors in specific languages with no ways to replicate those tests wasn't fun (here and elsewhere in Xpublish).
  • I'm not sure what the current standard is for this, but it feels like some of the files for directing Claude/tracking it's progress through this work are mixed in where they might appear to apply for more general development. It feels like they are useful records, but maybe should be filed not at the top level.
    • Benchmarks are a little specific to this PR as well, but it would be cool to have them included as a part of the CI pipeline, even if slower PRs don't cause a failure.
  • I'd love to see the prompts/sessions. Both to understand how to drive agents better, but I'm also thinking about how to do some OGC spec driven development on Xpublish-EDR to get that up to snuff. Also for a place to start with datatree comparability once we figure out what the overall Xpublish story might be for it.
  • In some ways this feels like its so big that we're committing to all future development likely requiring at least some agentic assistance due to the amount that needs to be understood. Do we need AGENTS.md or similar files to help direct their development?
  • I did not realize Github PR files view could break this chaotically so much with big PRs.

Copy link
Copy Markdown
Member

@abkfenris abkfenris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some pondering through plugin.py and using that as a jumping board to start understanding the rest.

Comment on lines +70 to +122
def _extract_constraint(request: Request) -> str:
"""Extract the constraint expression from the request URL query."""
query = request.url.query or ""
return unquote(query)

def _parse_constraint(raw: str) -> Constraint:
"""Parse a raw constraint string into a Constraint object."""
if not raw:
return Constraint()
return parse_dap2_constraint(raw)

def _dap2_headers(response_type: str) -> dict[str, str]:
"""Build DAP2 response headers."""
headers = dict(DAP2_HEADERS)
if response_type in CONTENT_DESCRIPTIONS:
headers["Content-Description"] = CONTENT_DESCRIPTIONS[response_type]
return headers

def _error_response(error: DapError, status_code: int = 400) -> Response:
"""Build a DAP2 error response."""
body = "".join(generate_error(error.code, error.message))
return Response(
content=body,
media_type=CONTENT_TYPES["error"],
status_code=status_code,
headers=_dap2_headers("error"),
)

def _dap4_response_headers() -> dict[str, str]:
"""Build DAP4 response headers."""
return dict(DAP4_HEADERS)

def _dap4_error_response(error: DapError, status_code: int = 400) -> Response:
"""Build a DAP4 error response."""
body = generate_dap4_error(error.code, error.message, http_code=status_code)
return Response(
content=body,
media_type=DAP4_CONTENT_TYPES["error"],
status_code=status_code,
headers=_dap4_response_headers(),
)

def _extract_dap4_constraint(request: Request) -> str:
"""Extract the DAP4 CE from the dap4.ce query parameter."""
params = parse_qs(request.url.query or "")
ce_values = params.get("dap4.ce", [])
return ce_values[0] if ce_values else ""

def _parse_dap4_constraint(raw: str) -> Constraint:
"""Parse a raw DAP4 constraint string into a Constraint object."""
if not raw:
return Constraint()
return parse_dap4_constraint(raw)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be in the OpenDapPlugin class? They make it a little more confusing to read, and possibly test.

Comment on lines +152 to +169
@router.get(".das")
async def das_response(
request: Request,
ds: xr.Dataset = Depends(deps.dataset),
) -> Response:
"""DAP2 Dataset Attribute Structure response."""
try:
raw_constraint = _extract_constraint(request)
constraint = _parse_constraint(raw_constraint)
plan = plan_subsetting(ds, constraint)
subsetted = apply_plan(ds, plan)

return dataset
body = "".join(generate_das(subsetted))
return Response(
content=body,
media_type=CONTENT_TYPES["das"],
headers=_dap2_headers("das"),
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the responses could be streamed if the media types and headers are hoisted to the @router.get(), though maybe I'm reading where the errors are coming from wrong. I think this will also give more info for API docs clarity.

Suggested change
@router.get(".das")
async def das_response(
request: Request,
ds: xr.Dataset = Depends(deps.dataset),
) -> Response:
"""DAP2 Dataset Attribute Structure response."""
try:
raw_constraint = _extract_constraint(request)
constraint = _parse_constraint(raw_constraint)
plan = plan_subsetting(ds, constraint)
subsetted = apply_plan(ds, plan)
return dataset
body = "".join(generate_das(subsetted))
return Response(
content=body,
media_type=CONTENT_TYPES["das"],
headers=_dap2_headers("das"),
)
@router.get(".das", media_type=CONTENT_TYPES["das"], headers=_dap2_headers("das"), response_class=StreamingResponse)
async def das_response(
request: Request,
ds: xr.Dataset = Depends(deps.dataset),
) -> Response:
"""DAP2 Dataset Attribute Structure response."""
try:
raw_constraint = _extract_constraint(request)
constraint = _parse_constraint(raw_constraint)
plan = plan_subsetting(ds, constraint)
subsetted = apply_plan(ds, plan)
yield from generate_das(subsetted))

Comment on lines +124 to +150
@router.get(".dds")
async def dds_response(
request: Request,
dataset_id: str = "default",
ds: xr.Dataset = Depends(deps.dataset),
cache: cachey.Cache = Depends(deps.cache),
) -> dap.Dataset:
"""Get a dataset that has been translated to opendap."""
# get cached dataset if it exists
cache_key = f"opendap_dataset_{dataset_id}"
dataset = cache.get(cache_key)
) -> Response:
"""DAP2 Dataset Descriptor Structure response."""
try:
raw_constraint = _extract_constraint(request)
constraint = _parse_constraint(raw_constraint)
plan = plan_subsetting(ds, constraint)
subsetted = apply_plan(ds, plan)

# if not, convert the xarray dataset to opendap
if dataset is None:
dataset = dap_xarray.dap_dataset(ds, dataset_id)
body = "".join(generate_dds(subsetted, dataset_id))
return Response(
content=body,
media_type=CONTENT_TYPES["dds"],
headers=_dap2_headers("dds"),
)
except DapError as e:
return _error_response(e)
except Exception:
logger.exception("Unexpected error in DDS handler")
return _error_response(
DapError(code=500, message="Internal server error"),
status_code=500,
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be cleaner to split the shared planning steps off into their own dependency and include the exception handling they might trigger?

Suggested change
@router.get(".dds")
async def dds_response(
request: Request,
dataset_id: str = "default",
ds: xr.Dataset = Depends(deps.dataset),
cache: cachey.Cache = Depends(deps.cache),
) -> dap.Dataset:
"""Get a dataset that has been translated to opendap."""
# get cached dataset if it exists
cache_key = f"opendap_dataset_{dataset_id}"
dataset = cache.get(cache_key)
) -> Response:
"""DAP2 Dataset Descriptor Structure response."""
try:
raw_constraint = _extract_constraint(request)
constraint = _parse_constraint(raw_constraint)
plan = plan_subsetting(ds, constraint)
subsetted = apply_plan(ds, plan)
# if not, convert the xarray dataset to opendap
if dataset is None:
dataset = dap_xarray.dap_dataset(ds, dataset_id)
body = "".join(generate_dds(subsetted, dataset_id))
return Response(
content=body,
media_type=CONTENT_TYPES["dds"],
headers=_dap2_headers("dds"),
)
except DapError as e:
return _error_response(e)
except Exception:
logger.exception("Unexpected error in DDS handler")
return _error_response(
DapError(code=500, message="Internal server error"),
status_code=500,
)
def dap2_plan(request: Request, ds: xr.Dataset = Depends(deps.dataset)):
"""Plan a DAP2 query"""
try:
raw_constraint = _extract_constraint(request)
constraint = _parse_constraint(raw_constraint)
plan = plan_subsetting(ds, constraint)
except DapError as e:
return _error_response(e)
return plan
@router.get(".dds")
async def dds_response(
dataset_id: str = "default",
ds: xr.Dataset = Depends(deps.dataset),
plan = Depends(dap2_plan)
) -> Response:
"""DAP2 Dataset Descriptor Structure response."""
try:
subsetted = apply_plan(ds, plan)
body = "".join(generate_dds(subsetted, dataset_id))
return Response(
content=body,
media_type=CONTENT_TYPES["dds"],
headers=_dap2_headers("dds"),
)
except Exception:
logger.exception("Unexpected error in DDS handler")
return _error_response(
DapError(code=500, message="Internal server error"),
status_code=500,
)

Comment on lines +17 to +41
from xpublish_opendap.dap.constraint import (
Constraint,
parse_dap2_constraint,
parse_dap4_constraint,
)
from fastapi.responses import StreamingResponse
from xpublish import (
Dependencies,
Plugin,
hookimpl,
from xpublish_opendap.dap.dap2.das import generate_das
from xpublish_opendap.dap.dap2.dds import generate_dds
from xpublish_opendap.dap.dap2.dods import generate_dods
from xpublish_opendap.dap.dap2.headers import (
CONTENT_DESCRIPTIONS,
CONTENT_TYPES,
DAP2_HEADERS,
)
from xpublish_opendap.dap.dap2.responses import (
generate_error,
generate_help,
generate_version,
)
from xpublish_opendap.dap.dap4.data import generate_dap4_data
from xpublish_opendap.dap.dap4.dmr import generate_dmr
from xpublish_opendap.dap.dap4.headers import CONTENT_TYPES as DAP4_CONTENT_TYPES
from xpublish_opendap.dap.dap4.headers import DAP4_HEADERS
from xpublish_opendap.dap.dap4.responses import generate_dap4_error, generate_dsr
from xpublish_opendap.dap.xarray_adapter import apply_plan, plan_subsetting
from xpublish_opendap.errors import DapError, RequestTooLargeError
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier to keep track of related functionality if instead of importing individual functions, the namespaces were imported.

Suggested change
from xpublish_opendap.dap.constraint import (
Constraint,
parse_dap2_constraint,
parse_dap4_constraint,
)
from fastapi.responses import StreamingResponse
from xpublish import (
Dependencies,
Plugin,
hookimpl,
from xpublish_opendap.dap.dap2.das import generate_das
from xpublish_opendap.dap.dap2.dds import generate_dds
from xpublish_opendap.dap.dap2.dods import generate_dods
from xpublish_opendap.dap.dap2.headers import (
CONTENT_DESCRIPTIONS,
CONTENT_TYPES,
DAP2_HEADERS,
)
from xpublish_opendap.dap.dap2.responses import (
generate_error,
generate_help,
generate_version,
)
from xpublish_opendap.dap.dap4.data import generate_dap4_data
from xpublish_opendap.dap.dap4.dmr import generate_dmr
from xpublish_opendap.dap.dap4.headers import CONTENT_TYPES as DAP4_CONTENT_TYPES
from xpublish_opendap.dap.dap4.headers import DAP4_HEADERS
from xpublish_opendap.dap.dap4.responses import generate_dap4_error, generate_dsr
from xpublish_opendap.dap.xarray_adapter import apply_plan, plan_subsetting
from xpublish_opendap.errors import DapError, RequestTooLargeError
from xpublish_opendap.dap import constraint, dap2, dap4, xarray_adapter
from xpublish_opendap.errors import DapError, RequestTooLargeError

ianhi added a commit to earth-mover/icechunk that referenced this pull request Mar 6, 2026
## Description

Adds an AI usage policy so that we have something to refer to going
forward. Also a first effort at codifying where we feel the boundary
should be. I tried to strike a balance of recognizing that we use agents
quite successfully on this project, while trying to disallow
pathological applications.

I did want to leave an escape hatch of a large AI generated PR for use
in large restructuring PRs such as -
xpublish-community/xpublish-opendap#100 or for
example anythinng we may do for converting to mystmd. With a requirement
that there be discussion prior to opening.

Ultimately the core of this has to come down to honest an open
communication from the submitter and the reviewer.

## AI Usage Disclosure


- [x] AI tools were used in this contribution

I used claude to help gather and research other proejct AI policies. I
also used claude for spell and grammar checking as well stuff like the
mkdocs toctree.

- [Matplotlib — Restrictions on Generative AI
Usage](https://matplotlib.org/devdocs/devel/contribute.html#generative-ai)
-
[melissawm/open-source-ai-contribution-policies](https://github.com/melissawm/open-source-ai-contribution-policies)
- [Scientific Python blog — Community Considerations Around AI
Contributions](https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/)

---------

Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants