Commit 6649f2a

Merge pull request #598 from databrickslabs/fix/changelog-0.4.3
Final tweaks before release
2 parents 474afce + b89bf6f commit 6649f2a

4 files changed: +37 -19 lines changed

CHANGELOG.md

Lines changed: 3 additions & 2 deletions
@@ -6,8 +6,8 @@ We will continue to maintain Mosaic for the foreseeable future, including bug fi
 
 This release includes a number of enhancements and fixes, detailed below.
 
-### Raster checkpointing is enabled by default
-Fuse-based checkpointing for raster operations is now enabled by default and managed through:
+### Raster checkpointing functions
+Fuse-based checkpointing for raster operations is disabled by default but can be enabled and managed through:
 - spark configs `spark.databricks.labs.mosaic.raster.use.checkpoint` and `spark.databricks.labs.mosaic.raster.checkpoint`.
 - python: `mos.enable_gdal(spark, with_checkpoint_path=path)`.
 - scala: `MosaicGDAL.enableGDALWithCheckpoint(spark, path)`.
@@ -23,6 +23,7 @@ We plan further enhancements to this feature (including automatic cleanup of che
 - `RST_Clip` now exposes the GDAL Warp option `CUTLINE_ALL_TOUCHED` which determines whether or not any given pixel is included whether the clipping geometry crosses the centre point of the pixel (false) or any part of the pixel (true). The default is true but this is now configurable.
 - Within clipping operations such as `RST_Clip` we now correctly set the CRS in the generated Shapefile Feature Layer used for clipping. This means that the CRS of the input geometry will be respected when clipping rasters.
 - Added two new functions for getting and upcasting the datatype of a raster band: `RST_Type` and `RST_UpdateType`. Use these for ensuring that the datatype of a raster is appropriate for the operations being performed, e.g. upcasting the types of integer-typed input rasters before performing raster algebra like NDVI calculations where the result needs to be a float.
+- Added `RST_AsFormat`, a function that translates rasters between formats e.g. from NetCDF to GeoTIFF.
 - The logic underpinning `RST_MemSize` (and related operations) has been updated to fall back to estimating based on the raster dimensions and data types of each band if the raster is held in-memory.
 - `RST_To_Overlapping_Tiles` is renamed `RST_ToOverlappingTiles`. The original expression remains but is marked as deprecated.
 - `RST_WorldToRasterCoordY` now returns the correct `y` value (was returning `x`)
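
The `CUTLINE_ALL_TOUCHED` entry above distinguishes two pixel-selection rules. The following is a minimal sketch of that distinction only, assuming unit-square pixels and an axis-aligned rectangular clip geometry. It does not call GDAL or Mosaic, it is not GDAL's rasterisation algorithm, and the names `pixel_selected` and `selected_pixels` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical clip geometry for illustration: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def pixel_selected(col: int, row: int, clip: Box, all_touched: bool) -> bool:
    """Decide whether the unit pixel at (col, row) is kept by the clip box."""
    xmin, ymin, xmax, ymax = clip
    if all_touched:
        # "true": keep the pixel if any part of it overlaps the clip box.
        return col < xmax and col + 1 > xmin and row < ymax and row + 1 > ymin
    # "false": keep the pixel only if its centre point lies inside the clip box.
    cx, cy = col + 0.5, row + 0.5
    return xmin <= cx <= xmax and ymin <= cy <= ymax


def selected_pixels(width: int, height: int, clip: Box, all_touched: bool) -> List[Tuple[int, int]]:
    """List the (col, row) pixels of a width x height raster kept by the clip."""
    return [
        (col, row)
        for row in range(height)
        for col in range(width)
        if pixel_selected(col, row, clip, all_touched)
    ]


clip_box = (0.6, 0.6, 2.4, 2.4)
centre_only = selected_pixels(4, 4, clip_box, all_touched=False)
touched = selected_pixels(4, 4, clip_box, all_touched=True)
```

With this clip box, the centre-point rule keeps a single pixel while the all-touched rule keeps the surrounding 3 x 3 block, which is why the choice of setting can change the extent of a clipped raster at its edges.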

docs/source/api/raster-format-readers.rst

Lines changed: 33 additions & 15 deletions
@@ -98,17 +98,27 @@ The output of the reader is a DataFrame with the following columns:
 mos.read().format("raster_to_grid")
 ***********************************
 Reads a GDAL raster file and converts it to a grid.
+
 It uses a pattern similar to standard spark.read.format(*).option(*).load(*) pattern.
 The only difference is that it uses :code:`mos.read()` instead of :code:`spark.read()`.
+
 The raster pixels are converted to grid cells using specified combiner operation (default is mean).
 If the raster pixels are larger than the grid cells, the cell values can be calculated using interpolation.
 The interpolation method used is Inverse Distance Weighting (IDW) where the distance function is a k_ring
 distance of the grid.
+
+Rasters can be transformed into different formats as part of this process in order to overcome problems with bands
+being translated into subdatasets by some GDAL operations. Our recommendation is to specify :code:`GTiff` if you run into problems here.
+
+Raster checkpointing should be enabled to avoid memory issues when working with large rasters. See :doc:`Checkpointing </usage/raster-checkpointing>` for more information.
+
 The reader supports the following options:
 
 * fileExtension - file extension of the raster file (StringType) - default is *.*
 * vsizip - if the rasters are zipped files, set this to true (BooleanType)
 * resolution - resolution of the output grid (IntegerType)
+* sizeInMB - size of subdivided rasters in MB. Must be supplied, must be a positive integer (IntegerType)
+* convertToFormat - convert the raster to a different format (StringType)
 * combiner - combiner operation to use when converting raster to grid (StringType) - default is mean
 * retile - if the rasters are too large they can be re-tiled to smaller tiles (BooleanType)
 * tileSize - size of the re-tiled tiles, tiles are always squares of tileSize x tileSize (IntegerType)
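
The documentation text in this hunk describes Inverse Distance Weighting with a k-ring distance. A rough sketch of that idea follows, under stated assumptions: cells sit on a square integer grid, the k-ring distance is taken to be the Chebyshev distance, and the weight is `1 / distance ** power` with `power = 1`. None of these choices is confirmed by the source, and `k_ring_distance` and `idw_fill` are hypothetical names, not Mosaic functions.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int]  # assumed square-grid cell address (col, row)


def k_ring_distance(a: Cell, b: Cell) -> int:
    """Number of rings separating two cells on a square grid (Chebyshev distance)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def idw_fill(target: Cell, known: Dict[Cell, float], k: int, power: float = 1.0) -> Optional[float]:
    """Estimate a value for `target` from known cells within k rings of it."""
    if target in known:
        return known[target]
    weighted_sum = 0.0
    weight_total = 0.0
    for cell, value in known.items():
        distance = k_ring_distance(target, cell)
        if 0 < distance <= k:
            weight = 1.0 / distance ** power  # closer cells count for more
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        return None  # no known cell within k rings
    return weighted_sum / weight_total


known_cells = {(0, 0): 10.0, (2, 0): 30.0, (5, 5): 100.0}
estimate = idw_fill((1, 0), known_cells, k=2)
```

In this example the cell at `(5, 5)` lies five rings away from the target and is ignored when `k=2`, so the estimate depends only on the two adjacent known cells.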
@@ -131,14 +141,19 @@ The reader supports the following options:
 .. tabs::
     .. code-tab:: py
 
-        df = mos.read().format("raster_to_grid")\
-            .option("fileExtension", "*.tif")\
-            .option("resolution", "8")\
-            .option("combiner", "mean")\
-            .option("retile", "true")\
-            .option("tileSize", "1000")\
-            .option("kRingInterpolate", "2")\
+        df = (
+            mos.read()
+            .format("raster_to_grid")
+            .option("sizeInMB", "16")
+            .option("convertToFormat", "GTiff")
+            .option("resolution", "0")
+            .option("readSubdataset", "true")
+            .option("subdatasetName", "t2m")
+            .option("retile", "true")
+            .option("tileSize", "600")
+            .option("combiner", "avg")
             .load("dbfs:/path/to/raster.tif")
+        )
         df.show()
         +--------+--------+------------------+
         |band_id |cell_id |cell_value        |
@@ -151,14 +166,17 @@ The reader supports the following options:
 
     .. code-tab:: scala
 
-        val df = MosaicContext.read.format("raster_to_grid")
-            .option("fileExtension", "*.tif")
-            .option("resolution", "8")
-            .option("combiner", "mean")
-            .option("retile", "true")
-            .option("tileSize", "1000")
-            .option("kRingInterpolate", "2")
-            .load("dbfs:/path/to/raster.tif")
+        val df = MosaicContext.read
+            .format("raster_to_grid")
+            .option("sizeInMB", "16")
+            .option("convertToFormat", "GTiff")
+            .option("resolution", "0")
+            .option("readSubdataset", "true")
+            .option("subdatasetName", "t2m")
+            .option("retile", "true")
+            .option("tileSize", "600")
+            .option("combiner", "avg")
+            .load("dbfs:/path/to/raster.tif")
         df.show()
         +--------+--------+------------------+
         |band_id |cell_id |cell_value        |
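
The option list in this file states that `sizeInMB` must be supplied and must be a positive integer, and gives defaults for `fileExtension` and `combiner`. As an illustration of those stated rules only, a caller could check an options dictionary before building the reader chain. The helper `check_reader_options` below is hypothetical and is not part of Mosaic; the reader's own validation may differ.

```python
from typing import Dict

# Defaults as stated in the option list of the documentation diff.
DEFAULTS: Dict[str, str] = {"fileExtension": "*.*", "combiner": "mean"}


def check_reader_options(options: Dict[str, str]) -> Dict[str, str]:
    """Apply the documented defaults and enforce the documented sizeInMB rule."""
    merged = {**DEFAULTS, **options}
    if "sizeInMB" not in merged:
        raise ValueError("sizeInMB must be supplied")
    try:
        size = int(merged["sizeInMB"])
    except ValueError:
        raise ValueError("sizeInMB must be an integer") from None
    if size <= 0:
        raise ValueError("sizeInMB must be a positive integer")
    return merged


opts = check_reader_options(
    {"sizeInMB": "16", "convertToFormat": "GTiff", "resolution": "0"}
)
```

The returned dictionary could then be passed to the reader one `.option(key, value)` call at a time, as in the examples in the diff.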

docs/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@
 author = 'Milos Colic, Stuart Lynn, Michael Johns, Robert Whiffin'
 
 # The full version, including alpha/beta/rc tags
-release = "v0.4.2"
+release = "v0.4.3"
 
 
 # -- General configuration ---------------------------------------------------

src/main/scala/com/databricks/labs/mosaic/datasource/multiread/RasterAsGridReader.scala

Lines changed: 0 additions & 1 deletion
@@ -64,7 +64,6 @@ class RasterAsGridReader(sparkSession: SparkSession) extends MosaicDataFrameRead
         } else {
             lit(config("convertToFormat"))
         }
-
         val rasterToGridCombiner = getRasterToGridFunc(config("combiner"))
 
         val loadedDf = retiledDf
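
The context lines above look up a combiner function from the `combiner` option. The body of `getRasterToGridFunc` is not shown in this diff, so the following is only a sketch of the general name-to-function dispatch pattern, written in Python. The set of names is an assumption: `mean` and `avg` both appear in the documentation diff and are treated here as aliases, while `min`, `max` and `median` are added purely for illustration.

```python
from statistics import mean, median
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Assumed combiner names; only "mean" and "avg" appear in the source diff.
COMBINERS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "avg": mean,
    "min": min,
    "max": max,
    "median": median,
}


def get_combiner(name: str) -> Callable[[Sequence[float]], float]:
    """Resolve a combiner name to an aggregation function."""
    try:
        return COMBINERS[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported combiner: {name}") from None


def combine_by_cell(pixels: Iterable[Tuple[int, float]], combiner_name: str = "mean") -> Dict[int, float]:
    """Group (cell_id, pixel_value) pairs by cell and aggregate each group."""
    grouped: Dict[int, List[float]] = {}
    for cell_id, value in pixels:
        grouped.setdefault(cell_id, []).append(value)
    combine = get_combiner(combiner_name)
    return {cell_id: combine(values) for cell_id, values in grouped.items()}


pixels = [(1, 2.0), (1, 4.0), (2, 10.0)]
cell_means = combine_by_cell(pixels, "avg")
cell_maxima = combine_by_cell(pixels, "max")
```

Resolving the name once, before the aggregation runs, means an unsupported combiner fails early rather than partway through a large read.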
