gdal pipeline

.. versionadded:: 3.12

.. only:: html

    Process a dataset applying several steps.

.. Index:: gdal pipeline

Description

:program:`gdal pipeline` executes a pipeline: it reads a raster or vector input dataset, applies a series of steps, and finally writes a raster or vector dataset.

Unless otherwise stated in their documentation, most steps evaluate raster blocks or features on demand, without "materializing" the dataset that results from each step. For performance reasons, it is sometimes desirable to materialize an intermediate dataset to disk using :ref:`gdal_raster_materialize` or :ref:`gdal_vector_materialize`.

Synopsis

.. program-output:: gdal pipeline --help-doc=main

A pipeline chains several steps, separated with the ! (exclamation mark) character. Including a ! between gdal pipeline and the first step is optional. The first step must be read, calc, concat, mosaic or stack, and the last one info, tile or write. Each step has its own positional or non-positional arguments. Apart from read, calc, concat, mosaic, stack, info, tile, partition and write, all other steps can potentially be used several times in a pipeline.

.. example::
   :title: Compute the footprint of a raster and apply a buffer on the footprint

   .. code-block:: bash

        $ gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write out.gpkg --overwrite
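
The chaining rules above can be sketched in Python as a small helper that assembles the argument list of a pipeline and checks its first and last steps. This is only an illustrative sketch: ``build_pipeline`` is a hypothetical helper, not part of GDAL, and the accepted step names are copied from the paragraph above.

```python
# Hypothetical helper (not a GDAL API): assemble a "gdal pipeline" argument list.
# The allowed first/last step names are taken from the text above.
FIRST_STEPS = {"read", "calc", "concat", "mosaic", "stack"}
LAST_STEPS = {"info", "tile", "write"}


def build_pipeline(steps):
    """Return an argv list for ``gdal pipeline``.

    Each step is a list of strings: the step name followed by its arguments,
    for example ["buffer", "20"].
    """
    if len(steps) < 2:
        raise ValueError("a pipeline needs at least a first and a last step")
    if steps[0][0] not in FIRST_STEPS:
        raise ValueError(f"first step cannot be {steps[0][0]!r}")
    if steps[-1][0] not in LAST_STEPS:
        raise ValueError(f"last step cannot be {steps[-1][0]!r}")

    argv = ["gdal", "pipeline"]
    for i, step in enumerate(steps):
        if i > 0:
            argv.append("!")  # steps are separated by an exclamation mark
        argv.extend(step)
    return argv


argv = build_pipeline([
    ["read", "in.tif"],
    ["footprint"],
    ["buffer", "20"],
    ["write", "out.gpkg", "--overwrite"],
])
print(argv)
```

Passing the resulting list to ``subprocess.run(argv, check=True)`` would execute the pipeline, assuming a :program:`gdal` executable of version 3.12 or later is available on the ``PATH``.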

Steps

For steps whose input and output are both raster datasets, consult :ref:`gdal_raster_pipeline`. For steps whose input and output are both vector datasets, consult :ref:`gdal_vector_pipeline`.

The table below lists steps that convert between raster and vector data.

.. list-table::
   :header-rows: 1

   * - Step
     - Direction
   * - :ref:`contour <pipeline-contour>`
     - Raster → Vector
   * - :ref:`footprint <pipeline-footprint>`
     - Raster → Vector
   * - :ref:`pixel-info <pipeline-pixel-info>`
     - Raster → Vector
   * - :ref:`polygonize <pipeline-polygonize>`
     - Raster → Vector
   * - :ref:`grid <pipeline-grid>`
     - Vector → Raster
   * - :ref:`rasterize <pipeline-rasterize>`
     - Vector → Raster
   * - :ref:`tee <pipeline-tee>`
     - Vector → Raster

contour

.. program-output:: gdal pipeline --help-doc=contour

Details for options can be found in :ref:`gdal_raster_contour`.

footprint

.. program-output:: gdal pipeline --help-doc=footprint

Details for options can be found in :ref:`gdal_raster_footprint`.

grid

.. program-output:: gdal pipeline --help-doc=grid

Details for options can be found in :ref:`gdal_vector_grid`.

pixel-info

.. program-output:: gdal pipeline --help-doc=pixel-info

Details for options can be found in :ref:`gdal_raster_pixel_info`.

polygonize

.. program-output:: gdal pipeline --help-doc=polygonize

Details for options can be found in :ref:`gdal_raster_polygonize`.

rasterize

.. program-output:: gdal pipeline --help-doc=rasterize

Details for options can be found in :ref:`gdal_vector_rasterize`.

tee

.. program-output:: gdal pipeline --help-doc=tee-raster

Details for options can be found in :ref:`gdal_output_nested_pipeline`.

GDALG output (on-the-fly / streamed dataset)

A pipeline can be serialized as a JSON file using the GDALG output format. The resulting file can then be opened as a dataset using the :ref:`raster.gdalg` or :ref:`vector.gdalg` driver, which applies the specified pipeline on the fly, in a streamed way.

The command_line member of the JSON file should nominally be the whole command line without the final write step, and is what is generated by gdal pipeline ! .... ! write out.gdalg.json.

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! footprint ! buffer 20"
    }

The final write step can be included, but it must then explicitly specify the stream output format and a placeholder output dataset name, whose value is not significant.

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write --output-format=streamed streamed_dataset"
    }
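
Since the documents shown above are plain JSON with a ``type`` and a ``command_line`` member, such a file can presumably also be produced by a script. The following is a minimal sketch using only the Python standard library; ``write_gdalg`` and the output file name are hypothetical and chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_gdalg(path, command_line):
    """Write a GDALG JSON document (hypothetical helper, not a GDAL API).

    ``command_line`` is the pipeline without its final write step, as in the
    first JSON document shown above.
    """
    doc = {"type": "gdal_streamed_alg", "command_line": command_line}
    Path(path).write_text(json.dumps(doc, indent=4) + "\n", encoding="utf-8")
    return doc


# Write into a temporary directory so the sketch does not touch existing files.
out_dir = Path(tempfile.mkdtemp())
gdalg_path = out_dir / "footprint_buffer.gdalg.json"
write_gdalg(gdalg_path, "gdal pipeline ! read in.tif ! footprint ! buffer 20")
print(gdalg_path.read_text(encoding="utf-8"))
```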

Substitutions

:program:`gdal pipeline` can also run a pipeline already serialized in a .gdalg.json file and customize its existing steps, typically by changing an input filename, specifying an output filename, or adding or modifying step arguments.

The syntax is:

.. code-block:: bash

    gdal pipeline <filename.gdalg.json> --<step-name>.<arg-name>=value

When specifying an existing argument of a step of a pipeline, the value from the pipeline is overridden by the one specified on the :program:`gdal pipeline` command line.

Let's imagine we have a :file:`raster_reproject.gdalg.json` with the following content:

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! reproject --output-crs=EPSG:4326 ! edit --metadata=CHANGES=reprojected"
    }

It is possible to run it with the following command line, overriding the input argument of the read step, and implicitly adding a final write step with an output argument.

.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --read.input=other_input.tif --write.output=out.tif

When there is no ambiguity, it is also possible to omit the step name, and just specify the argument name (if there is an ambiguity, :program:`gdal pipeline` will emit an error, so this is safe to do):

.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --input=other_input.tif --output=out.tif --co COMPRESS=LZW --overwrite

When a step appears several times in the pipeline, it must be specified as <step-name>[<idx>], where <idx> is a zero-based index.

For example, given:

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! edit --metadata=before=value ! reproject --output-crs=EPSG:4326 ! edit --metadata=CHANGES=reprojected"
    }

the following command line may be used:

.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --edit[0].metadata=before=modified --output=out.tif

Execution of pipelines and argument substitutions can also be done in Python with:

.. code-block:: python

    gdal.Run("pipeline", pipeline="raster_reproject.gdalg.json", output="out.tif", arguments={"edit[0].metadata": "before=modified"})
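
The naming rule for repeated steps can be illustrated with a short sketch that derives the ``<step-name>`` or ``<step-name>[<idx>]`` prefix for each step of a pipeline. ``qualified_step_names`` is a hypothetical helper written for this illustration; it assumes, as the ``edit[0]`` example above suggests, that the index counts occurrences of the same step name only.

```python
from collections import Counter


def qualified_step_names(step_names):
    """Return the prefix to use for each step in a substitution.

    Steps that occur once keep their plain name; steps that occur several
    times get a zero-based [<idx>] suffix counted per step name.
    """
    totals = Counter(step_names)
    seen = Counter()
    result = []
    for name in step_names:
        if totals[name] > 1:
            result.append(f"{name}[{seen[name]}]")
            seen[name] += 1
        else:
            result.append(name)
    return result


# Steps of the pipeline with two "edit" steps shown above.
steps = ["read", "edit", "reproject", "edit"]
names = qualified_step_names(steps)
print(names)  # ['read', 'edit[0]', 'reproject', 'edit[1]']

# Build the dictionary passed as ``arguments`` in the gdal.Run() call above.
arguments = {f"{names[1]}.metadata": "before=modified"}
print(arguments)  # {'edit[0].metadata': 'before=modified'}
```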

Placeholder dataset name _PIPE_

.. versionadded:: 3.13

By default, in a pipeline step that accepts multiple input dataset arguments, the first positional argument, input, is implicitly set to the output dataset from the previous step. In some cases, it might be desirable to pipe the output dataset from the previous step into one of the other input dataset arguments instead.

This can be achieved by using the placeholder dataset name _PIPE_ (PIPE with a leading and a trailing underscore) as the value of the alternate dataset argument, while explicitly specifying the input positional dataset argument.

.. example::
   :title: Summarize mean elevation within 200m of points of interest

   .. code-block:: bash

      gdal pipeline read points.geojson ! buffer 200 ! \
          zonal-stats \
            --input dem.tif \
            --zones _PIPE_ \
            --stat mean ! \
          write \
            --output-format CSV \
            --output /vsistdout/


It is also possible to achieve the same result by using an input nested pipeline as described below.

Nested pipeline

Input nested pipeline

Wherever an input dataset is expected as an auxiliary dataset, it can be specified as the result of a nested pipeline. An input nested pipeline follows the same syntax as the outer pipeline, except that it must not end with an output-generating step such as info, tile or write.

.. example::
   :title: Combine the output of shaded relief map and hypsometric rendering on a DEM to create a colorized shaded relief map.

   .. code-block:: bash

        $ gdal pipeline read n43.tif ! \
                        color-map --color-map color_file.txt ! \
                        blend --operator=hsv-value --overlay \
                            [ read n43.tif ! hillshade -z 30 ] ! \
                        write out.tif --overwrite

In the above example, the value of the overlay argument of the blend step is set as the output of the nested pipeline read n43.tif ! hillshade -z 30.

.. only:: html

   .. image:: ../../images/programs/gdal_pipeline_input_nested.svg
      :width: 0
      :height: 0

   .. raw:: html

      <object type="image/svg+xml"
              data="../_images/gdal_pipeline_input_nested.svg">
      </object>

.. only:: not html

   .. image:: ../../images/programs/gdal_pipeline_input_nested.svg
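
When a pipeline is assembled programmatically, the bracket syntax of the input nested pipeline example above can be generated recursively. The sketch below is illustrative only: ``flatten_pipeline`` is a hypothetical helper, and it assumes that ``[`` and ``]`` are passed as separate arguments, as they are written in the shell example.

```python
def flatten_pipeline(steps):
    """Flatten a list of steps into command-line tokens.

    Each step is a list of tokens. A token that is itself a list of steps is
    treated as a nested pipeline and wrapped in square brackets.
    """
    argv = []
    for i, step in enumerate(steps):
        if i > 0:
            argv.append("!")  # separator between consecutive steps
        for token in step:
            if isinstance(token, list):
                argv.append("[")
                argv.extend(flatten_pipeline(token))
                argv.append("]")
            else:
                argv.append(token)
    return argv


argv = ["gdal", "pipeline"] + flatten_pipeline([
    ["read", "n43.tif"],
    ["color-map", "--color-map", "color_file.txt"],
    ["blend", "--operator=hsv-value", "--overlay",
     [["read", "n43.tif"], ["hillshade", "-z", "30"]]],
    ["write", "out.tif", "--overwrite"],
])
print(" ".join(argv))
```

The printed command matches the shell example above, written on a single line.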

Output nested pipeline

The tee step in a pipeline forwards the input dataset as its output, and additionally executes one or several nested pipelines that take this input dataset as input and do other processing to eventually write the output of that processing. The first step of a tee output nested pipeline must not be read, calc, concat, mosaic or stack, and its last step must be write or tile. The tee operator can be either used in the middle of a pipeline or as its last step.

The example below shows the tee operator executing two output nested pipelines.

.. example::
   :title: Split the content of a "cities" layer according to whether its population is below or above 1 million.

   .. code-block:: bash

      $ gdal pipeline read cities.gpkg ! \
              tee [ filter --where "pop < 1e6" ! write small_cities.gpkg ] \
                  [ filter --where "pop >= 1e6" ! write big_cities.gpkg ]


The example below shows a more complicated use case with two occurrences of tee, one of them being an output nested pipeline inside an input nested pipeline.

.. example::
   :title: Combine the output of shaded relief map and hypsometric rendering on
           a DEM to create a colorized shaded relief map, and write intermediate
           hillshade and colorized dataset

   .. code-block:: bash

        $ gdal pipeline read n43.tif ! \
                        color-map --color-map color_file.txt ! \
                        tee [ write colored.tif --overwrite ] ! \
                        blend --operator=hsv-value --overlay \
                            [ read n43.tif ! hillshade -z 30 ! tee [ write hillshade.tif --overwrite ] ] ! \
                        write colored-hillshade.tif --overwrite

.. only:: html

   .. image:: ../../images/programs/gdal_pipeline_output_nested.svg
      :width: 0
      :height: 0

   .. raw:: html

      <object type="image/svg+xml"
              data="../_images/gdal_pipeline_output_nested.svg">
      </object>

.. only:: not html

   .. image:: ../../images/programs/gdal_pipeline_output_nested.svg

Examples

.. example::
   :title: Compute the footprint of a raster and apply a buffer on the footprint

   .. code-block:: bash

        $ gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write out.gpkg --overwrite

.. example::
   :title: Rasterize and reproject

   .. code-block:: bash

        $ gdal pipeline ! read in.gpkg ! rasterize --size 1000,1000 ! reproject --output-crs EPSG:4326 ! write out.tif --overwrite

.. example::
   :title: Use an existing pipeline that rasterizes and reprojects, but change its input file and target CRS, and specify the output file

   .. code-block:: bash

        $ gdal pipeline raster_reproject.gdalg.json --input=my.gpkg --output=out.tif --output-crs=EPSG:32631

.. example::
   :title: Buffer a line dataset to create a new polygon dataset
   :id: gdal-pipeline-buffer-line

   This example uses a ``lines.gpkg`` dataset containing a single layer named ``lines``,
   with a geometry field named ``geom`` and an integer attribute named ``width``. The value
   of this attribute is used as the buffer distance for each feature.

   .. code-block:: bash

        gdal vector pipeline \
            ! read lines.gpkg \
            ! sql "SELECT fid, ST_Buffer(geom, width) AS geom FROM lines" \
            ! set-geom-type --geometry-type Polygon \
            ! write buffered-lines.gpkg --output-layer=BufferedLines --overwrite --overwrite-layer

   .. note::

      When creating derived geometries using SQL, avoid using ``SELECT *``.
      Including the original geometry field will result in multiple geometry
      columns in the output. Instead, explicitly list the required attributes
      and return a single geometry column.