Rewrite xpublish-opendap with native DAP2/DAP4 protocol layer#100
Rewrite xpublish-opendap with native DAP2/DAP4 protocol layer#100jhamman wants to merge 27 commits into
Conversation
Replace the opendap-protocol dependency with an in-repo DAP2 implementation
providing full protocol conformance, async I/O, and memory management.
New modules (bottom-up dependency order):
- dap/types.py: DapType dataclass, numpy↔DAP2 type mapping (15 dtypes)
- errors.py: DapError hierarchy (5 error types with codes)
- dap/constraint.py: recursive-descent DAP2 constraint expression parser
- dap/xarray_adapter.py: constraint→xarray subsetting with memory estimation
- dap/dap2/{dds,das,dods}.py: DDS/DAS text and DODS XDR binary generators
- dap/dap2/{responses,headers}.py: Error/Version/Help responses, HTTP headers
- io.py: async infrastructure (thread pool, semaphores, load_dataset_async)
- plugin.py: fully async handlers with constraint-first pipeline
Key improvements over the old implementation:
- All six DAP2 response types (.dds, .das, .dods, .ver, .help, error)
- Proper HTTP headers (Content-Description, XDODS-Server)
- Constraint-first subsetting (applied before data loading)
- Memory threshold enforcement (413 for oversized requests)
- Async data loading off the event loop
- Streaming DODS responses
- DAP error responses for constraint/variable/index errors
104 tests passing (27 type, 15 parser, 13 adapter, 19 protocol, 24 route, 6 legacy).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement full DAP4 protocol layer alongside the existing DAP2 support: - DMR (Dataset Metadata Response) XML generation with DAP4 type mapping - DAP4 binary data responses with chunked serialization and CRC32 checksums - DSR (Dataset Services Response) endpoint - DAP4 constraint expression parsing (semicolons, leading slashes, filters) - DAP4 error responses in XML format - DAP4-specific HTTP headers (XDAP: 4.0) New routes: .dmr, .dap, .dsr on the plugin router. Includes 191 unit tests covering DMR generation, binary encoding, constraint parsing, type mapping, and route handlers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test real DAP clients against a live xpublish server using the existing xprocess-managed server fixture. Covers six client types: - httpx: raw HTTP validation of DAP2 and DAP4 status codes, headers, content types, constraint expressions, and XML structure (14 tests) - netCDF4-python: dimensions, variables, coordinates, attributes (8 tests) - pydap: DAP2 and DAP4 client access with data validation (8 tests) - xarray: netcdf4 and pydap engine integration (7 tests) - ncdump: subprocess header, variable, and attribute inspection (3 tests) - ncks (NCO): metadata listing, variable extraction, hyperslab (3 tests) 35 tests pass, 6 are xfail due to a known bug: the server does not resolve DAP2 Grid member paths (e.g. air.air[0][0][0]) that clients send when fetching data from Grid-typed variables. Also registers the "integration" pytest marker and adds nco to CI deps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DAP2 clients (netCDF4-C, pydap, ncks) send Grid-qualified constraint paths like `air.air[0][0][0]` for variables declared as Grid types in the DDS. The server rejected these with VariableNotFoundError because it looked up `air.air` verbatim against xarray variable names. Add `_resolve_grid_path()` helper that resolves dotted Grid member paths back to xarray variable names (e.g. `air.air` → `air` for Array members, `air.time` → `time` for Map members). Remove xfail markers from the 5 integration tests that now pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Compare ncks output against both a local netCDF file and the OPeNDAP endpoint to verify actual data values match, not just dimensions/exit codes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add integration tests that exercise xpublish-opendap's DAP2 responses through R (ncdf4) and Julia (NCDatasets.jl), both of which bind to the netCDF-C library's OPeNDAP client — a completely separate code path from PyDAP. Each language has 7 test cases (open, dimensions, variables, variable_shape, read_lat, read_data_slice, attributes) dispatched via subprocess with per-test granularity. Tests skip gracefully when Rscript/julia are not on PATH. CI jobs added for both languages on ubuntu-latest, independent of the main 3x3 matrix. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… and error bodies Adds ~59 new tests (249 total unit tests) covering previously unexercised encoding paths for integer types, strings, booleans, and datetime coords across DAP2 and DAP4 protocols. Also covers DAS attribute formatting for arrays, booleans, NaN/Inf/empty strings, and validates DAP4 error XML structure including httpcode attributes and error codes. New test classes: - TestDODSTypeEncoding: XDR encoding for int8/16/32, uint16/32, string, bool, scalar - TestMultiTypeDDS/DAS/DODS: end-to-end with multi-dtype dataset fixture - TestInt32/UInt32/Int8/UInt16/Bool/UInt64Max Encoding: DAP4 native binary - TestEdgeCaseSubsetting: single-element hyperslabs, large strides, coord auto-inclusion - TestDAP4ErrorResponseBody: error XML structure validation - TestDAP4HeaderConstants: protocol header constants (new file) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e gaps Add 61 new tests covering async data loading (io.py), streaming responses, constraint/query edge cases, cross-protocol DAP2/DAP4 consistency, and special data types (timedelta, NaT, bytes/unicode, empty arrays, lossy int64 encoding). Enable pytest asyncio_mode=auto in pyproject.toml. New files: tests/test_io.py (15 tests), tests/test_async_streaming.py (7 tests) Extended: test_dap2_responses (+9), test_plugin_routes (+11), test_dap4_routes (+6), test_dap4_data (+5), test_xarray_adapter (+2), test_types (+1). Total: 310 tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
28 new tests covering DMR attribute formatting, constraint parser edge cases, hyperslab validation branches, memory estimation fallbacks, DAS type fallbacks, scalar coordinate DDS, exception handlers in plugin routes, and float16 dtype resolution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove ~38 lines of unreachable code: unused parser methods in constraint.py, integer fallback-by-itemsize branches in types.py, unreachable try/except blocks in das.py and dmr.py, and an unreachable else branch in dods.py. Add 3 tests covering the remaining defensive fallbacks (complex ndarray attr, unknown type attr, version import). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Docstring capitalization (D403), import reformatting, and blank line removal from pre-commit formatter run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…asymmetry - Add # noqa comments for acceptable ruff violations (PLW2901, PLR2004, PLR0911) in type-dispatch and parser functions - Remove unnecessary # noqa: PLR0911, PLR0912 from resolve_dap_type after dead code removal reduced return/branch count - Fix remaining ruff violations: D301 raw docstrings in dods.py, B905 strict= in xarray_adapter.py, D107 noqa for error __init__ methods - Replace assert dap_type.dap4_name with explicit ValueError guards in dmr.py (assert is stripped by python -O) - Fix attrs asymmetry: use cf_encode_variable for data variables in das.py and dmr.py so datetime data vars get CF-encoded attrs - Move TestVersionImportFallback from test_types.py to test_init.py Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add [tool.mypy] strict config to pyproject.toml with per-module overrides for untyped third-party libs (xpublish, pluggy, cachey) - Update pre-commit to install type-checking dependencies (numpy, xarray, fastapi, pydantic, xpublish) and remove --ignore-missing-imports - Fix Hashable→str casts for xarray dimension/variable names in xarray_adapter.py, dds.py, and dmr.py - Fix Optional[str] return types in dmr._attribute_dap4_type by adding dap4_name None guards (fall back to 'String') - Add type: ignore comments for untyped xpublish/pluggy in plugin.py - Add 5 tests for all dap4_name-is-None defensive guards in dmr.py, restoring 100% test coverage Verification: mypy --strict (0 errors), ruff (0 errors), pytest (346 passed, 100% coverage) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable mypy strict checking on test files via pyproject.toml overrides (disallow_untyped_defs/calls relaxed for tests). Add type narrowing assertions for Optional fields (plan.variables, item.slices, XML Element.find results). Expand ruff per-file-ignores for tests (D100-D103, PLC0415, PLR2004, PLW1510) and exclude auto-generated _version.py. Format all test files with ruff format and fix E741 ambiguous variable names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…work Add docs/FEATURE_COMPLETENESS.md comparing the implementation against the DAP2/DAP4 specs and Hyrax reference implementation. Covers routes, data types, container types, constraint expressions, binary encoding, error handling, and client compatibility. Rewrite TODO.txt to list only feasible future work items, excluding features incompatible with xarray's data model (Sequence, nested Structure, Enum, Opaque). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…licate coords - Add dask threaded scheduler for parallel chunk loading (3-8x speedup for multi-chunk remote datasets) - Refactor DODS/DAP4 generators to load+encode per-variable in thread pool executor, keeping all data loading and numpy encoding off the event loop. Coordinates are cached and reused for Grid Map vectors. - Remove duplicate coordinate serialization in DDS/DODS: coordinates used as Grid Maps are only emitted inside Grid declarations, not also as top-level arrays - Add load_variable() helper and dask_num_workers config to OpenDapPlugin - Move benchmarks into xpublish-opendap repo with response size display Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Slab streaming: for large dask-backed arrays, split along the most-chunked dimension and yield encoded bytes per-slab via async generators. This bounds HTTP response memory to slab size rather than full variable size. Includes prefetch-1 pipeline for both DAP2 and DAP4. Dask-graph fast path: for dask-backed numeric arrays, fuse dtype conversion into the dask graph (ravel → rechunk 20 MB → block.astype(wire_dtype).compute) instead of load-then-encode. Benchmarks show ~37% faster on chunked cloud data. Falls back to the existing eager path for numpy, strings, datetime, and scalars. Fix datetime slab streaming bug: CF encoding is data-dependent (each slab picks its own reference time), so datetime64/timedelta64 arrays now bypass slab streaming entirely. Also fixed pre-existing mypy errors in io.py and dods.py (Hashable → str casts at xarray boundaries, generic type parameters). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Black quote normalization (single → double), trailing comma insertion, and line wrapping. No logic changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ab fixes - bench.sh: isolate benchmark runner in temp directory so release and dap4 runs use their own installed packages instead of sharing the source tree; organize results into timestamped subdirectories - scenarios.py: add expected_min_bytes validation per scenario; fix DAP2 hyperslabs to two-value syntax for opendap-protocol compat; remove constraint-parsing-only scenarios - run.py: validate response size against expected_min_bytes, flag failures - results.py: add human-readable response size to comparison output - compare.py: auto-formatting only (quote normalization) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- benchmarks/io_bench.py: standalone benchmark isolating the data loading + encoding pipeline without HTTP overhead; tests eager, slab-prefetch, and dask-graph strategies to profile where time is spent - docs/design/streaming-io.md: design document exploring the data path for lazy loading and streaming cloud-backed datasets Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reconcile all design documents with the actual codebase: - DESIGN.md: Fix function signatures (generate_dds/das/dods/dmr/dap4_data), replace load_variables_async with actual io.py API, document slab streaming and dask-graph fast path, correct directory layout, update data flow diagrams - FEATURE_COMPLETENESS.md: Fix XDAP header status, DAP4 string encoding (uint64 not int64), add R/Julia/httpx to client compatibility table - streaming-io.md: Mark Strategy 2 as implemented, add dask-graph fast path section, update recommendation to reflect what was built - benchmarks/README.md: Fix scenario count (16 not 21) and categories (3 not 4) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Some examples from our the latest benchmarks: run on my laptop at home so all the caveats apply w.r.t. network bandwidth, etc. I would say these are not amazing but at least they are not a step backwards! |
- TestSlabFallbackPaths: exercise _xdr_encode_slab_data (Byte, Int16, UInt16, float64, scalar branches), _load_and_encode_slab_xdr and _load_and_encode_slab_dap4 numpy fallback paths - TestDaskGraphBytePadding: verify uint8 dask-graph fast path padding (both padding-needed and no-padding cases) - TestNoDaskFallbacks: mock HAS_DASK=False to cover get_slab_boundaries, _is_dask_graph_eligible, load_variable, and load_dataset_async fallbacks Coverage: 99% (1187 stmts, 3 missed — import guard and unreachable reshape) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Emit all coordinates as top-level arrays in DDS/DODS (standard DAP2 behavior), not just orphan coordinates. Coordinates are still emitted as Grid Maps too, matching reference servers like Hyrax. - Remove _get_grid_map_coords helper and grid_map_coords filtering. - Add pytest-asyncio to dev dependencies (required by asyncio_mode=auto). - Exclude tests/integration from main CI job (dedicated jobs exist). - Remove xpublish from mypy additional_dependencies to fit 250MB tier. - Update test binary offset assertions for new top-level coord data. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
- Exclude benchmarks/ and noxfile.py from mypy and ruff (tooling scripts) - Add test-appropriate ruff suppressions: A002, B905, E501 - Fix mypy issues in test_slab_streaming.py (type annotation, arg-type) - Fix ruff issues in library code (unused vars, long docstring, noqa) - Remove unused dap_type assignment in dap4/data.py slab streaming - Add pydap to mypy ignore_missing_imports list - Apply add-trailing-comma and ruff-format auto-fixes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Add prune benchmarks/ and exclude DESIGN.md/PRD.md to MANIFEST.in so check-manifest passes. Also exclude *.pyc globally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/Keanu Woah This is gonna take a bit to poke through, but some thoughts so far:
|
abkfenris
left a comment
There was a problem hiding this comment.
Some pondering through plugin.py and using that as a jumping board to start understanding the rest.
| def _extract_constraint(request: Request) -> str: | ||
| """Extract the constraint expression from the request URL query.""" | ||
| query = request.url.query or "" | ||
| return unquote(query) | ||
|
|
||
| def _parse_constraint(raw: str) -> Constraint: | ||
| """Parse a raw constraint string into a Constraint object.""" | ||
| if not raw: | ||
| return Constraint() | ||
| return parse_dap2_constraint(raw) | ||
|
|
||
| def _dap2_headers(response_type: str) -> dict[str, str]: | ||
| """Build DAP2 response headers.""" | ||
| headers = dict(DAP2_HEADERS) | ||
| if response_type in CONTENT_DESCRIPTIONS: | ||
| headers["Content-Description"] = CONTENT_DESCRIPTIONS[response_type] | ||
| return headers | ||
|
|
||
| def _error_response(error: DapError, status_code: int = 400) -> Response: | ||
| """Build a DAP2 error response.""" | ||
| body = "".join(generate_error(error.code, error.message)) | ||
| return Response( | ||
| content=body, | ||
| media_type=CONTENT_TYPES["error"], | ||
| status_code=status_code, | ||
| headers=_dap2_headers("error"), | ||
| ) | ||
|
|
||
| def _dap4_response_headers() -> dict[str, str]: | ||
| """Build DAP4 response headers.""" | ||
| return dict(DAP4_HEADERS) | ||
|
|
||
| def _dap4_error_response(error: DapError, status_code: int = 400) -> Response: | ||
| """Build a DAP4 error response.""" | ||
| body = generate_dap4_error(error.code, error.message, http_code=status_code) | ||
| return Response( | ||
| content=body, | ||
| media_type=DAP4_CONTENT_TYPES["error"], | ||
| status_code=status_code, | ||
| headers=_dap4_response_headers(), | ||
| ) | ||
|
|
||
| def _extract_dap4_constraint(request: Request) -> str: | ||
| """Extract the DAP4 CE from the dap4.ce query parameter.""" | ||
| params = parse_qs(request.url.query or "") | ||
| ce_values = params.get("dap4.ce", []) | ||
| return ce_values[0] if ce_values else "" | ||
|
|
||
| def _parse_dap4_constraint(raw: str) -> Constraint: | ||
| """Parse a raw DAP4 constraint string into a Constraint object.""" | ||
| if not raw: | ||
| return Constraint() | ||
| return parse_dap4_constraint(raw) |
There was a problem hiding this comment.
Do these need to be in the OpenDapPlugin class? They make it a little more confusing to read, and possibly test.
| @router.get(".das") | ||
| async def das_response( | ||
| request: Request, | ||
| ds: xr.Dataset = Depends(deps.dataset), | ||
| ) -> Response: | ||
| """DAP2 Dataset Attribute Structure response.""" | ||
| try: | ||
| raw_constraint = _extract_constraint(request) | ||
| constraint = _parse_constraint(raw_constraint) | ||
| plan = plan_subsetting(ds, constraint) | ||
| subsetted = apply_plan(ds, plan) | ||
|
|
||
| return dataset | ||
| body = "".join(generate_das(subsetted)) | ||
| return Response( | ||
| content=body, | ||
| media_type=CONTENT_TYPES["das"], | ||
| headers=_dap2_headers("das"), | ||
| ) |
There was a problem hiding this comment.
I think the responses could be streamed if the media types and headers are hoisted to the @router.get(), though maybe I'm reading where the errors are coming from wrong. I think this will also give more info for API docs clarity.
| @router.get(".das") | |
| async def das_response( | |
| request: Request, | |
| ds: xr.Dataset = Depends(deps.dataset), | |
| ) -> Response: | |
| """DAP2 Dataset Attribute Structure response.""" | |
| try: | |
| raw_constraint = _extract_constraint(request) | |
| constraint = _parse_constraint(raw_constraint) | |
| plan = plan_subsetting(ds, constraint) | |
| subsetted = apply_plan(ds, plan) | |
| return dataset | |
| body = "".join(generate_das(subsetted)) | |
| return Response( | |
| content=body, | |
| media_type=CONTENT_TYPES["das"], | |
| headers=_dap2_headers("das"), | |
| ) | |
| @router.get(".das", media_type=CONTENT_TYPES["das"], headers=_dap2_headers("das"), response_class=StreamingResponse) | |
| async def das_response( | |
| request: Request, | |
| ds: xr.Dataset = Depends(deps.dataset), | |
| ) -> Response: | |
| """DAP2 Dataset Attribute Structure response.""" | |
| try: | |
| raw_constraint = _extract_constraint(request) | |
| constraint = _parse_constraint(raw_constraint) | |
| plan = plan_subsetting(ds, constraint) | |
| subsetted = apply_plan(ds, plan) | |
| yield from generate_das(subsetted)) |
| @router.get(".dds") | ||
| async def dds_response( | ||
| request: Request, | ||
| dataset_id: str = "default", | ||
| ds: xr.Dataset = Depends(deps.dataset), | ||
| cache: cachey.Cache = Depends(deps.cache), | ||
| ) -> dap.Dataset: | ||
| """Get a dataset that has been translated to opendap.""" | ||
| # get cached dataset if it exists | ||
| cache_key = f"opendap_dataset_{dataset_id}" | ||
| dataset = cache.get(cache_key) | ||
| ) -> Response: | ||
| """DAP2 Dataset Descriptor Structure response.""" | ||
| try: | ||
| raw_constraint = _extract_constraint(request) | ||
| constraint = _parse_constraint(raw_constraint) | ||
| plan = plan_subsetting(ds, constraint) | ||
| subsetted = apply_plan(ds, plan) | ||
|
|
||
| # if not, convert the xarray dataset to opendap | ||
| if dataset is None: | ||
| dataset = dap_xarray.dap_dataset(ds, dataset_id) | ||
| body = "".join(generate_dds(subsetted, dataset_id)) | ||
| return Response( | ||
| content=body, | ||
| media_type=CONTENT_TYPES["dds"], | ||
| headers=_dap2_headers("dds"), | ||
| ) | ||
| except DapError as e: | ||
| return _error_response(e) | ||
| except Exception: | ||
| logger.exception("Unexpected error in DDS handler") | ||
| return _error_response( | ||
| DapError(code=500, message="Internal server error"), | ||
| status_code=500, | ||
| ) |
There was a problem hiding this comment.
Might it be cleaner to split the shared planning steps off into their own dependency and include the exception handling they might trigger?
| @router.get(".dds") | |
| async def dds_response( | |
| request: Request, | |
| dataset_id: str = "default", | |
| ds: xr.Dataset = Depends(deps.dataset), | |
| cache: cachey.Cache = Depends(deps.cache), | |
| ) -> dap.Dataset: | |
| """Get a dataset that has been translated to opendap.""" | |
| # get cached dataset if it exists | |
| cache_key = f"opendap_dataset_{dataset_id}" | |
| dataset = cache.get(cache_key) | |
| ) -> Response: | |
| """DAP2 Dataset Descriptor Structure response.""" | |
| try: | |
| raw_constraint = _extract_constraint(request) | |
| constraint = _parse_constraint(raw_constraint) | |
| plan = plan_subsetting(ds, constraint) | |
| subsetted = apply_plan(ds, plan) | |
| # if not, convert the xarray dataset to opendap | |
| if dataset is None: | |
| dataset = dap_xarray.dap_dataset(ds, dataset_id) | |
| body = "".join(generate_dds(subsetted, dataset_id)) | |
| return Response( | |
| content=body, | |
| media_type=CONTENT_TYPES["dds"], | |
| headers=_dap2_headers("dds"), | |
| ) | |
| except DapError as e: | |
| return _error_response(e) | |
| except Exception: | |
| logger.exception("Unexpected error in DDS handler") | |
| return _error_response( | |
| DapError(code=500, message="Internal server error"), | |
| status_code=500, | |
| ) | |
| def dap2_plan(request: Request, ds: xr.Dataset = Depends(deps.dataset)): | |
| """Plan a DAP2 query""" | |
| try: | |
| raw_constraint = _extract_constraint(request) | |
| constraint = _parse_constraint(raw_constraint) | |
| plan = plan_subsetting(ds, constraint) | |
| except DapError as e: | |
| return _error_response(e) | |
| return plan | |
| @router.get(".dds") | |
| async def dds_response( | |
| dataset_id: str = "default", | |
| ds: xr.Dataset = Depends(deps.dataset), | |
| plan = Depends(dap2_plan) | |
| ) -> Response: | |
| """DAP2 Dataset Descriptor Structure response.""" | |
| try: | |
| subsetted = apply_plan(ds, plan) | |
| body = "".join(generate_dds(subsetted, dataset_id)) | |
| return Response( | |
| content=body, | |
| media_type=CONTENT_TYPES["dds"], | |
| headers=_dap2_headers("dds"), | |
| ) | |
| except Exception: | |
| logger.exception("Unexpected error in DDS handler") | |
| return _error_response( | |
| DapError(code=500, message="Internal server error"), | |
| status_code=500, | |
| ) |
| from xpublish_opendap.dap.constraint import ( | ||
| Constraint, | ||
| parse_dap2_constraint, | ||
| parse_dap4_constraint, | ||
| ) | ||
| from fastapi.responses import StreamingResponse | ||
| from xpublish import ( | ||
| Dependencies, | ||
| Plugin, | ||
| hookimpl, | ||
| from xpublish_opendap.dap.dap2.das import generate_das | ||
| from xpublish_opendap.dap.dap2.dds import generate_dds | ||
| from xpublish_opendap.dap.dap2.dods import generate_dods | ||
| from xpublish_opendap.dap.dap2.headers import ( | ||
| CONTENT_DESCRIPTIONS, | ||
| CONTENT_TYPES, | ||
| DAP2_HEADERS, | ||
| ) | ||
| from xpublish_opendap.dap.dap2.responses import ( | ||
| generate_error, | ||
| generate_help, | ||
| generate_version, | ||
| ) | ||
| from xpublish_opendap.dap.dap4.data import generate_dap4_data | ||
| from xpublish_opendap.dap.dap4.dmr import generate_dmr | ||
| from xpublish_opendap.dap.dap4.headers import CONTENT_TYPES as DAP4_CONTENT_TYPES | ||
| from xpublish_opendap.dap.dap4.headers import DAP4_HEADERS | ||
| from xpublish_opendap.dap.dap4.responses import generate_dap4_error, generate_dsr | ||
| from xpublish_opendap.dap.xarray_adapter import apply_plan, plan_subsetting | ||
| from xpublish_opendap.errors import DapError, RequestTooLargeError |
There was a problem hiding this comment.
It might be easier to keep track of related functionality if instead of importing individual functions, the namespaces were imported.
| from xpublish_opendap.dap.constraint import ( | |
| Constraint, | |
| parse_dap2_constraint, | |
| parse_dap4_constraint, | |
| ) | |
| from fastapi.responses import StreamingResponse | |
| from xpublish import ( | |
| Dependencies, | |
| Plugin, | |
| hookimpl, | |
| from xpublish_opendap.dap.dap2.das import generate_das | |
| from xpublish_opendap.dap.dap2.dds import generate_dds | |
| from xpublish_opendap.dap.dap2.dods import generate_dods | |
| from xpublish_opendap.dap.dap2.headers import ( | |
| CONTENT_DESCRIPTIONS, | |
| CONTENT_TYPES, | |
| DAP2_HEADERS, | |
| ) | |
| from xpublish_opendap.dap.dap2.responses import ( | |
| generate_error, | |
| generate_help, | |
| generate_version, | |
| ) | |
| from xpublish_opendap.dap.dap4.data import generate_dap4_data | |
| from xpublish_opendap.dap.dap4.dmr import generate_dmr | |
| from xpublish_opendap.dap.dap4.headers import CONTENT_TYPES as DAP4_CONTENT_TYPES | |
| from xpublish_opendap.dap.dap4.headers import DAP4_HEADERS | |
| from xpublish_opendap.dap.dap4.responses import generate_dap4_error, generate_dsr | |
| from xpublish_opendap.dap.xarray_adapter import apply_plan, plan_subsetting | |
| from xpublish_opendap.errors import DapError, RequestTooLargeError | |
| from xpublish_opendap.dap import constraint, dap2, dap4, xarray_adapter | |
| from xpublish_opendap.errors import DapError, RequestTooLargeError |
## Description Adds an AI usage policy so that we have something to refer to going forward. Also a first effort at codifying where we feel the boundary should be. I tried to strike a balance of recognizing that we use agents quite successfully on this project, while trying to disallow pathological applications. I did want to leave an escape hatch of a large AI generated PR for use in large restructuring PRs such as - xpublish-community/xpublish-opendap#100 or for example anythinng we may do for converting to mystmd. With a requirement that there be discussion prior to opening. Ultimately the core of this has to come down to honest an open communication from the submitter and the reviewer. ## AI Usage Disclosure - [x] AI tools were used in this contribution I used claude to help gather and research other proejct AI policies. I also used claude for spell and grammar checking as well stuff like the mkdocs toctree. - [Matplotlib — Restrictions on Generative AI Usage](https://matplotlib.org/devdocs/devel/contribute.html#generative-ai) - [melissawm/open-source-ai-contribution-policies](https://github.com/melissawm/open-source-ai-contribution-policies) - [Scientific Python blog — Community Considerations Around AI Contributions](https://blog.scientific-python.org/scientific-python/community-considerations-around-ai/) --------- Co-authored-by: Ryan Abernathey <ryan.abernathey@gmail.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
This is a ground-up rewrite of xpublish-opendap. It replaces the opendap-protocol dependency with a native in-repo protocol implementation, adds full DAP4 support, and introduces an async-first architecture with
memory-bounded streaming.
Note
This PR was built with Claude but not entirely by Claude. I wrote and edited much of the design docs and drove on the high degree of testing contained here. That said, there is a big diff in this PR. I don't expect any human to volunteer to review the whole thing. Instead, I think we should start using this in real world applications for a bit to build confidence in it before deciding if its worth merging.
What changed
xpublish_opendap/dap/): Clean-room implementation of DAP2 and DAP4 encoding, replacing the external opendap-protocol library. Includes a type system mapping 15 numpy dtypes to DAP wire types, arecursive-descent constraint expression parser, and xarray-to-DAP subsetting with memory estimation.
(.dmr, .dap, .dsr, error) alongside all six DAP2 types.
yield per-slab) to bound HTTP response memory. A dask-graph fast path fuses dtype conversion into the computation graph for chunked data.
Test suite
In my opinion, this is the best part of this PR. We now have a very thorough test suite
440 tests:
Benchmarks
New benchmark suite (benchmarks/) with ~21 scenarios covering metadata generation, data responses, and constraint parsing across multiple dataset sizes. Includes an isolated IO benchmark (io_bench.py) for profiling the
encode pipeline without HTTP overhead, and a comparison workflow for regression detection between the release and feature branch.
Breaking changes