Add gdal mdim get-refs algorithm #14677

Draft
mdsumner wants to merge 2 commits into OSGeo:master from mdsumner:feature/mdim-get-refs

mdsumner (Contributor) commented May 28, 2026

Adds the gdal mdim get-refs command.

It creates a vector layer (no geometry) that lists the path, offset, size and info of chunk byte references from a multidimensional raster array.

The emitted table also includes dim_0, ..., dim_n fields that identify the chunk's address (and allow filter queries), and a present boolean field indicating whether the chunk exists.

Metadata is also recorded in the layer for ARRAY_NAME, DTYPE, the per-dimension DIM_N_NAME, DIM_N_SIZE, DIM_N_BLOCK and DIM_N_CHUNKS items, and .papszInfo items in the form CODEC_%s, e.g. CODEC_COMPRESSION, CODEC_FILTER.
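
As a rough illustration of how the dim_0 ... dim_n address fields relate to the per-dimension size, block and chunk-count metadata, the sketch below enumerates chunk addresses for a given array shape and block shape. It assumes that the chunk count per dimension is the ceiling of size divided by block, and that addresses run in row-major order; both are consistent with the example output further down but are not stated guarantees of the command. chunk_grid is a hypothetical helper, not part of GDAL.

```python
import itertools
import math


def chunk_grid(sizes, blocks):
    """Return (chunks_per_dim, addresses) for an array of shape `sizes` cut into `blocks`."""
    if len(sizes) != len(blocks):
        raise ValueError("sizes and blocks must have the same length")
    # Assumed rule: a partial chunk at the edge still counts, hence the ceiling.
    counts = [math.ceil(size / block) for size, block in zip(sizes, blocks)]
    # itertools.product varies the last dimension fastest (row-major order).
    addresses = list(itertools.product(*(range(n) for n in counts)))
    return counts, addresses


# Values taken from the example layer metadata: 20x20 array, 10x10 blocks.
counts, addresses = chunk_grid(sizes=[20, 20], blocks=[10, 10])
print(counts)     # [2, 2]
print(addresses)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

For the 20x20 array with 10x10 blocks this gives four addresses, matching the feature count of 4 reported for the example layer.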

AI tool usage

  • AI Claude assisted

Tasklist

  • Make sure code is correctly formatted (cf pre-commit configuration)
  • Add test case(s)
  • Add documentation
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

Description

Usage: gdal mdim get-refs [OPTIONS] <INPUT> <OUTPUT>

Return byte references from a multidimensional raster source as vector/table layer.

Positional arguments:
  -i, --dataset, --input <INPUT>                       Input multidimensional raster dataset [required]
  -o, --output <OUTPUT>                                Output vector dataset [required]

Common Options:
  -h, --help                                           Display help message and exit
  --json-usage                                         Display usage as JSON document and exit
  --config <KEY>=<VALUE>                               Configuration option [may be repeated]
  -q, --quiet                                          Quiet mode (no progress bar)

Options:
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format [required]
  --array <ARRAY>                                      Name of the array, used to restrict the output to the specified array. [required]
  --overwrite                                          Whether overwriting existing output is allowed
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]

Advanced Options:
  --oo, --open-option <KEY>=<VALUE>                    Open options [may be repeated]
  --if, --input-format <INPUT-FORMAT>                  Input formats [may be repeated]

For more details, consult https://gdal.org/programs/gdal_mdim_get_refs.html

Example

gdal mdim get-refs --array /Band1 autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc \
    /tmp/Band1.parquet --of Parquet

gdal vector info /tmp/Band1.parquet --layer Band1 --features
INFO: Open of `/tmp/Band1.parquet'
      using driver `Parquet' successful.

Layer name: Band1
Metadata:
  ARRAY_NAME=/Band1
  DTYPE=Byte
  DIM_0_NAME=y
  DIM_0_SIZE=20
  DIM_0_BLOCK=10
  DIM_0_CHUNKS=2
  DIM_1_NAME=x
  DIM_1_SIZE=20
  DIM_1_BLOCK=10
  DIM_1_CHUNKS=2
  CODEC_COMPRESSION=DEFLATE
  CODEC_FILTER=SHUFFLE
Geometry: None
Feature Count: 4
Layer SRS WKT:
(unknown)
dim_0: Integer64 (0.0)
dim_1: Integer64 (0.0)
present: Integer(Boolean) (0.0)
path: String (0.0)
offset: Integer64 (0.0)
size: Integer64 (0.0)
info: String (0.0)
OGRFeature(Band1):0
  dim_0 (Integer64) = 0
  dim_1 (Integer64) = 0
  present (Integer(Boolean)) = 1
  path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc
  offset (Integer64) = 6108
  size (Integer64) = 86
  info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE

OGRFeature(Band1):1
  dim_0 (Integer64) = 0
  dim_1 (Integer64) = 1
  present (Integer(Boolean)) = 1
  path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc
  offset (Integer64) = 6194
  size (Integer64) = 74
  info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE

OGRFeature(Band1):2
  dim_0 (Integer64) = 1
  dim_1 (Integer64) = 0
  present (Integer(Boolean)) = 1
  path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc
  offset (Integer64) = 5959
  size (Integer64) = 74
  info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE

OGRFeature(Band1):3
  dim_0 (Integer64) = 1
  dim_1 (Integer64) = 1
  present (Integer(Boolean)) = 1
  path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc
  offset (Integer64) = 6033
  size (Integer64) = 75
  info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE
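
To show how a consumer might use the path, offset, size and info fields listed above, here is a minimal sketch that reads one chunk's raw bytes and parses the info string. It runs against a synthetic temporary file rather than the netCDF file in the example. The helper names parse_info and read_chunk_bytes are hypothetical, and the sketch assumes that DEFLATE denotes a zlib stream and that no other filter needs undoing (byte shuffling is a no-op for 1-byte data types such as Byte).

```python
import os
import tempfile
import zlib


def parse_info(info):
    """Split an info string such as 'COMPRESSION=DEFLATE; FILTER=SHUFFLE' into a dict."""
    pairs = (item.split("=", 1) for item in info.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def read_chunk_bytes(path, offset, size):
    """Read the raw (still encoded) bytes of one chunk from `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size)
    if len(data) != size:
        raise IOError(f"expected {size} bytes at offset {offset}, got {len(data)}")
    return data


# Synthetic stand-in for a chunked file: 64 header bytes, then one compressed chunk.
payload = bytes(range(100))
encoded = zlib.compress(payload)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 64)
    tmp.write(encoded)
    demo_path = tmp.name

codec = parse_info("COMPRESSION=DEFLATE; FILTER=SHUFFLE")
raw = read_chunk_bytes(demo_path, offset=64, size=len(encoded))
# Assumption: DEFLATE means a zlib stream; other codecs are left encoded here.
decoded = zlib.decompress(raw) if codec.get("COMPRESSION") == "DEFLATE" else raw
os.remove(demo_path)

print(codec)
print(decoded == payload)
```

With a real table, the path, offset and size values would come from one feature of the output layer instead of the synthetic file.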

mdsumner marked this pull request as draft May 28, 2026 00:51

rouault (Member) left a comment

Not a bad idea, but clearly this doesn't follow our LLM tool policy. The code has clearly not been read carefully. A lot of comments are hyper-verbose, with typical "LLM plan garbage" references. I feel some of the tests are also a bit more verbose than strictly needed (although it might just be my aversion to being exposed to prior verbosity).

/******************************************************************************
*
* Project: GDAL
* Purpose: gdal "mdim get-refs" subcommand

Member

get-refs is a bit mysterious until you read the doc.
brainstorming: get-chunk-location, chunk-loc

Member

or chunk-info ?

Contributor Author

list-chunks or extract-chunks - I don't have a really strong feeling here (get-chunk-location is absolutely fine, so I'll go with that unless a stronger candidate or position arrives)

Comment thread apps/gdalalg_mdim_get_refs.cpp Outdated
} // namespace

/************************************************************************/
/* GDALMdimGetRefsAlgorithm() */

Member

Formatting issue. Is your pre-commit not set up?

Contributor Author

It is set up; I will check.

Comment thread apps/gdalalg_mdim_get_refs.cpp Outdated
bool GDALMdimGetRefsAlgorithm::RunImpl(GDALProgressFunc pfnProgress,
void *pProgressData)
{
// ----------------------------------------------------------------------

Member

All those "stage", "A1", "D6", etc. are really LLM noise. Please remove them in general.

And you can pretty much remove that particular 5-line comment for 2 lines of effective code...

Comment thread apps/gdalalg_mdim_get_refs.cpp Outdated
auto poSrcDS = m_inputDataset.GetDatasetRef();
CPLAssert(poSrcDS);

// A2. GetRootGroup(). Null root => driver lacks mdim support => fail with a

Member

make comment more succinct (remove the ==> fail part .... that's obvious from the code)

Comment thread apps/gdalalg_mdim_get_refs.cpp Outdated
auto poRootGroup = poSrcDS->GetRootGroup();
CPLDebug("MDIM-GET-REFS", "input: %s, root group: %s",
poSrcDS->GetDescription(), poRootGroup ? "present" : "NULL");

Member

Suggested change


* ``ARRAY_NAME`` — the fullname of the input array
* ``DTYPE`` — the array's data type (e.g. ``Int16``, ``Float32``)
* ``DIM_N_NAME``, ``DIM_N_SIZE``, ``DIM_N_BLOCK``, ``DIM_N_CHUNKS`` — for

Member

make a global pass to replace UTF-8 dash by ASCII dash

* Only a single array can be emitted (``--array`` is required).
* Arrays without natural block size decline with a ``not chunk-enumerable`` error.
* The inline data payload is not extracted as a binary field.
* No geometry column is emitted.

Member

Suggested change
* No geometry column is emitted.

-----------

* Only a single array can be emitted (``--array`` is required).
* Arrays without natural block size decline with a ``not chunk-enumerable`` error.

Member

not really a limitation IMHO (in the sense there's nothing we can do about that)

Suggested change
* Arrays without natural block size decline with a ``not chunk-enumerable`` error.


.. code-block:: bash

ogrinfo ocean_chunks.parquet -sql \

Member

Suggested change
- ogrinfo ocean_chunks.parquet -sql \
+ gdal vector info ocean_chunks.parquet --features --sql \



@pytest.mark.require_driver("Parquet")
def test_gdalalg_mdim_get_refs_parquet_boolean_subtype(tmp_vsimem):

Member

Does it exercise any code path not already exercised? I don't think so.


2 participants