Add gdal mdim get-refs algorithm#14677
Conversation
rouault
left a comment
There was a problem hiding this comment.
not a bad idea, but clearly this doesn't follow our LLM tool policy. The code has clearly not been read carefully. A lot of comments are hyper verbose, with typical "LLM plan garbage" references. I feel some of the tests are also a bit more verbose than what would be strictly needed (although it might just be my aversion for being exposed to prior verbosity)
| /****************************************************************************** | ||
| * | ||
| * Project: GDAL | ||
| * Purpose: gdal "mdim get-refs" subcommand |
There was a problem hiding this comment.
get-refs is a bit mysterious until you read the doc.
brainstorming: get-chunk-location, chunk-loc
There was a problem hiding this comment.
list-chunks or extract-chunks - I don't have a really strong feel here (get-chunk-location is absolutely fine so I'll go with that unless a stronger candidate or position arrives)
| } // namespace | ||
|
|
||
| /************************************************************************/ | ||
| /* GDALMdimGetRefsAlgorithm() */ |
There was a problem hiding this comment.
formatting issue. your pre-commit is not setup ?
There was a problem hiding this comment.
it is setup, I will check
| bool GDALMdimGetRefsAlgorithm::RunImpl(GDALProgressFunc pfnProgress, | ||
| void *pProgressData) | ||
| { | ||
| // ---------------------------------------------------------------------- |
There was a problem hiding this comment.
all those "stage", "A1", "D6" etc are really LLM noise. Please remove in general.
And you can pretty much remove that particular 5-line comment for a 2 line effective code...
| auto poSrcDS = m_inputDataset.GetDatasetRef(); | ||
| CPLAssert(poSrcDS); | ||
|
|
||
| // A2. GetRootGroup(). Null root => driver lacks mdim support => fail with a |
There was a problem hiding this comment.
make comment more succinct (remove the ==> fail part .... that's obvious from the code)
| auto poRootGroup = poSrcDS->GetRootGroup(); | ||
| CPLDebug("MDIM-GET-REFS", "input: %s, root group: %s", | ||
| poSrcDS->GetDescription(), poRootGroup ? "present" : "NULL"); | ||
|
|
|
|
||
| * ``ARRAY_NAME`` — the fullname of the input array | ||
| * ``DTYPE`` — the array's data type (e.g. ``Int16``, ``Float32``) | ||
| * ``DIM_N_NAME``, ``DIM_N_SIZE``, ``DIM_N_BLOCK``, ``DIM_N_CHUNKS`` — for |
There was a problem hiding this comment.
make a global pass to replace UTF-8 dash by ASCII dash
| * Only a single array can be emitted (``--array`` is required). | ||
| * Arrays without natural block size decline with a ``not chunk-enumerable`` error. | ||
| * The inline data payload is not extracted as a binary field. | ||
| * No geometry column is emitted. |
There was a problem hiding this comment.
| * No geometry column is emitted. |
| ----------- | ||
|
|
||
| * Only a single array can be emitted (``--array`` is required). | ||
| * Arrays without natural block size decline with a ``not chunk-enumerable`` error. |
There was a problem hiding this comment.
not really a limitation IMHO (in the sense there's nothing we can do about that)
| * Arrays without natural block size decline with a ``not chunk-enumerable`` error. |
|
|
||
| .. code-block:: bash | ||
|
|
||
| ogrinfo ocean_chunks.parquet -sql \ |
There was a problem hiding this comment.
| ogrinfo ocean_chunks.parquet -sql \ | |
| gdal vector info ocean_chunks.parquet --features --sql \ |
|
|
||
|
|
||
| @pytest.mark.require_driver("Parquet") | ||
| def test_gdalalg_mdim_get_refs_parquet_boolean_subtype(tmp_vsimem): |
There was a problem hiding this comment.
does it exercice any code path not already exercised ? I don't think so
gdal mdim get-refs command.
Creates a vector layer (no geometry) that lists the
path,offset,sizeandinfoof chunk byte references from a multidimensional raster array.The emitted table also includes
dim_0, ...dim_nfields that identify the chunk's address (and allows filter queries), andpresenta boolean for whether the chunk exists.Metadata is also recorded in the layer for
ARRAY_NAME,DTYPE,DIM_0{:N}, and.papszInfoitems in the formCODEC_%seg.CODEC_COMPRESSION,CODEC_FILTER.AI tool usage
Tasklist
Description
Example
gdal mdim get-refs --array /Band1 autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc \ /tmp/Band1.parquet --of Parquet gdal vector info /tmp/Band1.parquet --layer Band1 --features INFO: Open of `/tmp/Band1.parquet' using driver `Parquet' successful. Layer name: Band1 Metadata: ARRAY_NAME=/Band1 DTYPE=Byte DIM_0_NAME=y DIM_0_SIZE=20 DIM_0_BLOCK=10 DIM_0_CHUNKS=2 DIM_1_NAME=x DIM_1_SIZE=20 DIM_1_BLOCK=10 DIM_1_CHUNKS=2 CODEC_COMPRESSION=DEFLATE CODEC_FILTER=SHUFFLE Geometry: None Feature Count: 4 Layer SRS WKT: (unknown) dim_0: Integer64 (0.0) dim_1: Integer64 (0.0) present: Integer(Boolean) (0.0) path: String (0.0) offset: Integer64 (0.0) size: Integer64 (0.0) info: String (0.0) OGRFeature(Band1):0 dim_0 (Integer64) = 0 dim_1 (Integer64) = 0 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 6108 size (Integer64) = 86 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE OGRFeature(Band1):1 dim_0 (Integer64) = 0 dim_1 (Integer64) = 1 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 6194 size (Integer64) = 74 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE OGRFeature(Band1):2 dim_0 (Integer64) = 1 dim_1 (Integer64) = 0 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 5959 size (Integer64) = 74 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE OGRFeature(Band1):3 dim_0 (Integer64) = 1 dim_1 (Integer64) = 1 present (Integer(Boolean)) = 1 path (String) = autotest/gdrivers/data/netcdf/byte_chunked_multiple.nc offset (Integer64) = 6033 size (Integer64) = 75 info (String) = COMPRESSION=DEFLATE; FILTER=SHUFFLE