.. versionadded:: 3.12
.. only:: html
Process a dataset applying several steps.
.. index:: gdal pipeline
:program:`gdal pipeline` executes a pipeline: it reads a raster or vector input dataset, applies a sequence of steps, and finally writes a raster or vector dataset.
Unless otherwise stated in their documentation, most steps evaluate raster blocks or features on demand, without "materializing" the dataset that results from each step. For performance reasons, it is sometimes desirable to materialize an intermediate dataset to disk using :ref:`gdal_raster_materialize` or :ref:`gdal_vector_materialize`.
.. program-output:: gdal pipeline --help-doc=main
A pipeline chains several steps, separated with the ! (exclamation mark) character.
Including a ! between gdal pipeline and the first step is optional.
The first step must be read, calc, concat, mosaic or stack,
and the last one info, tile or write.
Each step has its own positional or non-positional arguments.
Apart from read, calc, concat, mosaic, stack, info, tile, partition and write,
all other steps can potentially be used several times in a pipeline.
.. example::
:title: Compute the footprint of a raster and apply a buffer on the footprint
.. code-block:: bash
$ gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write out.gpkg --overwrite
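The chaining rules above can be sketched in Python. This helper is purely illustrative and not part of GDAL: ``build_pipeline`` is a hypothetical name, and it only checks the first-step and last-step rules before joining the steps with ``!``. It also accepts ``tee`` as a final step, as described in the section on output nested pipelines.

.. code-block:: python

    # Step names taken from the rules stated in this document.
    FIRST_STEPS = {"read", "calc", "concat", "mosaic", "stack"}
    LAST_STEPS = {"info", "tile", "write", "tee"}


    def build_pipeline(steps):
        """Join step strings with ' ! ' after checking the first and last step names."""
        if not steps:
            raise ValueError("a pipeline needs at least one step")
        first = steps[0].split()[0]
        last = steps[-1].split()[0]
        if first not in FIRST_STEPS:
            raise ValueError(f"first step cannot be {first!r}")
        if last not in LAST_STEPS:
            raise ValueError(f"last step cannot be {last!r}")
        return "gdal pipeline " + " ! ".join(steps)


    command = build_pipeline(
        ["read in.tif", "footprint", "buffer 20", "write out.gpkg --overwrite"]
    )
    print(command)

The resulting string is only assembled, not executed; it could be passed to a shell or split into an argument list for ``subprocess.run``.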
For steps whose input and output are both raster, consult :ref:`gdal_raster_pipeline`. For steps whose input and output are both vector, consult :ref:`gdal_vector_pipeline`.
The table below lists steps that convert between raster and vector data.
.. list-table::
   :header-rows: 1

   * - Step
     - Direction
   * - :ref:`contour <pipeline-contour>`
     - Raster → Vector
   * - :ref:`footprint <pipeline-footprint>`
     - Raster → Vector
   * - :ref:`pixel-info <pipeline-pixel-info>`
     - Raster → Vector
   * - :ref:`polygonize <pipeline-polygonize>`
     - Raster → Vector
   * - :ref:`grid <pipeline-grid>`
     - Vector → Raster
   * - :ref:`rasterize <pipeline-rasterize>`
     - Vector → Raster
   * - :ref:`tee <pipeline-tee>`
     - Vector → Raster
.. program-output:: gdal pipeline --help-doc=contour
Details for options can be found in :ref:`gdal_raster_contour`.
.. program-output:: gdal pipeline --help-doc=footprint
Details for options can be found in :ref:`gdal_raster_footprint`.
.. program-output:: gdal pipeline --help-doc=grid
Details for options can be found in :ref:`gdal_vector_grid`.
.. program-output:: gdal pipeline --help-doc=pixel-info
Details for options can be found in :ref:`gdal_raster_pixel_info`.
.. program-output:: gdal pipeline --help-doc=polygonize
Details for options can be found in :ref:`gdal_raster_polygonize`.
.. program-output:: gdal pipeline --help-doc=rasterize
Details for options can be found in :ref:`gdal_vector_rasterize`.
.. program-output:: gdal pipeline --help-doc=tee-raster
Details for options can be found in :ref:`gdal_output_nested_pipeline`.
A pipeline can be serialized as a JSON file using the GDALG output format.
The resulting file can then be opened as a dataset using the
:ref:`raster.gdalg` or :ref:`vector.gdalg` driver, which applies the specified pipeline in an on-the-fly /
streamed way.
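Such a file is normally produced by :program:`gdal pipeline` itself, but it is small enough to generate from a script. The sketch below assumes that the ``type`` and ``command_line`` members used in the examples of this section are sufficient; ``write_gdalg`` and the file name are hypothetical and not part of GDAL.

.. code-block:: python

    import json
    import tempfile
    from pathlib import Path


    def write_gdalg(path, command_line):
        """Write a GDALG-style JSON document with 'type' and 'command_line' members."""
        document = {"type": "gdal_streamed_alg", "command_line": command_line}
        Path(path).write_text(json.dumps(document, indent=2) + "\n", encoding="utf-8")
        return document


    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical file name, written to a temporary directory.
        target = Path(tmp_dir) / "footprint.gdalg.json"
        write_gdalg(target, "gdal pipeline ! read in.tif ! footprint ! buffer 20")
        # Read the file back to confirm that it is valid JSON.
        loaded = json.loads(target.read_text(encoding="utf-8"))
        print(loaded["command_line"])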
The command_line member of the JSON file should nominally be the whole command
line without the final write step, and is what is generated by
gdal pipeline ! .... ! write out.gdalg.json.
.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! footprint ! buffer 20"
    }

The final write step can be added but if so it must explicitly specify the
stream output format and a non-significant output dataset name.
.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write --output-format=streamed streamed_dataset"
    }

It is also possible to use :program:`gdal pipeline` to run a pipeline already
serialized in a .gdalg.json file and customize its existing steps, typically by
changing an input filename, specifying an output filename, or adding or modifying
arguments of steps.
The syntax is:

.. code-block:: bash

    gdal pipeline <filename.gdalg.json> --<step-name>.<arg-name>=value
When specifying an existing argument of a step of a pipeline, the value from the pipeline is overridden by the one specified on the :program:`gdal pipeline` command line.
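As a rough illustration of this syntax, the Python sketch below turns a dictionary of overrides into an argument list. ``override_args`` and the file names are hypothetical and chosen for illustration only; the code formats arguments and does not run GDAL.

.. code-block:: python

    def override_args(gdalg_file, overrides):
        """Return an argv list using the --<step-name>.<arg-name>=value syntax."""
        argv = ["gdal", "pipeline", gdalg_file]
        for name, value in overrides.items():
            # Each key is "<step-name>.<arg-name>", for example "read.input".
            argv.append(f"--{name}={value}")
        return argv


    argv = override_args(
        "pipeline.gdalg.json",  # hypothetical serialized pipeline
        {"read.input": "other_input.tif", "write.output": "out.tif"},
    )
    print(" ".join(argv))

Building a list rather than a single string avoids shell quoting issues if the result is later handed to ``subprocess.run``.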
Let's imagine we have a :file:`raster_reproject.gdalg.json` with the following content:
.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! reproject --output-crs=EPSG:4326 ! edit --metadata=CHANGES=reprojected"
    }

It is possible to run it with the following command line, overriding the
input argument of the read step, and implicitly adding a final write
step with an output argument.
.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --read.input=other_input.tif --write.output=out.tif

When there is no ambiguity, it is also possible to omit the step name, and just specify the argument name (if there is an ambiguity, :program:`gdal pipeline` will emit an error, so this is safe to do):
.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --input=other_input.tif --output=out.tif --co COMPRESS=LZW --overwrite

When a step appears several times in the pipeline, it must be specified as
<step-name>[<idx>], where <idx> is a zero-based index.
For example, given:
.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! edit --metadata=before=value ! reproject --output-crs=EPSG:4326 ! edit --metadata=CHANGES=reprojected"
    }

the following command line may be used:
.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --edit[0].metadata=before=modified --output=out.tif

Execution of pipelines and argument substitutions can also be done in Python with:
.. code-block:: python

    from osgeo import gdal

    gdal.Run("pipeline", pipeline="raster_reproject.gdalg.json", output="out.tif", arguments={"edit[0].metadata": "before=modified"})

.. versionadded:: 3.13
By default, in a pipeline step that accepts multiple input dataset arguments,
the first positional argument, input, is implicitly set to the output
dataset from the previous step. In some cases, it might be desirable to pipe
the output dataset from the previous step into one of the other input dataset
arguments instead.
This can be achieved by using the placeholder dataset name _PIPE_ (PIPE with
leading and trailing underscore character) as
the value for the alternate dataset argument, while explicitly specifying the
input positional dataset argument.
.. example::
:title: Summarize mean elevation within 200m of points of interest
.. code-block:: bash
gdal pipeline read points.geojson ! buffer 200 ! \
zonal-stats \
--input dem.tif \
--zones _PIPE_ \
--stat mean ! \
write \
--output-format CSV \
--output /vsistdout/
It is also possible to achieve the same result by using an input nested pipeline as described below.
Wherever an input dataset is expected as an auxiliary dataset, it is possible
to specify it as the result of a nested pipeline. The content of an input
nested pipeline is identical to the outer pipeline, except it must not end with
an output-generating step like info, tile or write.
.. example::
:title: Combine the output of shaded relief map and hypsometric rendering on a DEM to create a colorized shaded relief map.
.. code-block:: bash
$ gdal pipeline read n43.tif ! \
color-map --color-map color_file.txt ! \
blend --operator=hsv-value --overlay \
[ read n43.tif ! hillshade -z 30 ] ! \
write out.tif --overwrite
In the above example, the value of the overlay argument of the blend
step is set as the output of the nested pipeline read n43.tif ! hillshade -z 30.
.. only:: html
.. image:: ../../images/programs/gdal_pipeline_input_nested.svg
:width: 0
:height: 0
.. raw:: html
<object type="image/svg+xml"
data="../_images/gdal_pipeline_input_nested.svg">
</object>
.. only:: not html

    .. image:: ../../images/programs/gdal_pipeline_input_nested.svg
The tee step in a pipeline forwards the input dataset as its output,
and additionally executes one or several nested pipelines that take this input
dataset as input and do other processing to eventually write the output of that
processing. The first step of a tee output nested pipeline must not be
read, calc, concat, mosaic or stack, and its last step
must be write or tile. The tee operator can be either used in
the middle of a pipeline or as its last step.
The example below shows the tee operator executing two
output nested pipelines.
.. example::
:title: Split the content of a "cities" layer according to whether its population is below or above 1 million.
.. code-block:: bash
$ gdal pipeline read cities.gpkg ! \
tee [ filter --where "pop < 1e6" ! write small_cities.gpkg ] \
[ filter --where "pop >= 1e6" ! write big_cities.gpkg ]
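The constraints on a tee branch can be sketched in Python as follows. ``tee_branch`` is a hypothetical helper, not a GDAL function; it only checks the first-step and last-step rules stated above and wraps the steps in ``[ ... ]`` brackets, without running anything.

.. code-block:: python

    # Step names taken from the rules stated for output nested pipelines.
    INPUT_STEPS = {"read", "calc", "concat", "mosaic", "stack"}
    OUTPUT_STEPS = {"write", "tile"}


    def tee_branch(steps):
        """Validate one tee branch and return it wrapped in [ ... ] brackets."""
        first = steps[0].split()[0]
        last = steps[-1].split()[0]
        if first in INPUT_STEPS:
            raise ValueError(f"a tee branch cannot start with {first!r}")
        if last not in OUTPUT_STEPS:
            raise ValueError(f"a tee branch must end with write or tile, not {last!r}")
        return "[ " + " ! ".join(steps) + " ]"


    branches = [
        tee_branch(['filter --where "pop < 1e6"', "write small_cities.gpkg"]),
        tee_branch(['filter --where "pop >= 1e6"', "write big_cities.gpkg"]),
    ]
    command = "gdal pipeline read cities.gpkg ! tee " + " ".join(branches)
    print(command)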
The example below shows a more complicated use case with two occurrences of tee,
one of them being an output nested pipeline inside an input nested pipeline.
.. example::
:title: Combine the output of shaded relief map and hypsometric rendering on
a DEM to create a colorized shaded relief map, and write intermediate
hillshade and colorized dataset
.. code-block:: bash
$ gdal pipeline read n43.tif ! \
color-map --color-map color_file.txt ! \
tee [ write colored.tif --overwrite ] ! \
blend --operator=hsv-value --overlay \
[ read n43.tif ! hillshade -z 30 ! tee [ write hillshade.tif --overwrite ] ] ! \
write colored-hillshade.tif --overwrite
.. only:: html
.. image:: ../../images/programs/gdal_pipeline_output_nested.svg
:width: 0
:height: 0
.. raw:: html
<object type="image/svg+xml"
data="../_images/gdal_pipeline_output_nested.svg">
</object>
.. only:: not html

    .. image:: ../../images/programs/gdal_pipeline_output_nested.svg
.. example::
:title: Compute the footprint of a raster and apply a buffer on the footprint
.. code-block:: bash
$ gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write out.gpkg --overwrite
.. example::
:title: Rasterize and reproject
.. code-block:: bash
$ gdal pipeline ! read in.gpkg ! rasterize --size 1000,1000 ! reproject --output-crs EPSG:4326 ! write out.tif --overwrite
.. example::
:title: Use an existing pipeline that rasterizes and reprojects, but change its input file and target CRS, and specify the output file
.. code-block:: bash
$ gdal pipeline raster_reproject.gdalg.json --input=my.gpkg --output=out.tif --output-crs=EPSG:32631
.. example::
:title: Buffer a line dataset to create a new polygon dataset
:id: gdal-pipeline-buffer-line
This example uses a ``lines.gpkg`` dataset containing a single layer named ``lines``,
with a geometry field named ``geom`` and an integer attribute named ``width``. The value
of this attribute is used as the buffer distance for each feature.
.. code-block:: bash
gdal vector pipeline \
! read lines.gpkg \
! sql "SELECT fid, ST_Buffer(geom, width) AS geom FROM lines" \
! set-geom-type --geometry-type Polygon \
! write buffered-lines.gpkg --output-layer=BufferedLines --overwrite --overwrite-layer
.. note::
When creating derived geometries using SQL, avoid using ``SELECT *``.
Including the original geometry field will result in multiple geometry
columns in the output. Instead, explicitly list the required attributes
and return a single geometry column.