All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Columns in the output file are now guaranteed to be in the same order as given by the `columns` input value. (#100)
- Valid values for the `output` option have changed such that a file extension is required, whereas previously this was optional since `.gpkg` was the only supported output format. (#97)
- It is now possible to specify alternative file formats for the `output` value: in addition to the file extension `.gpkg` for GeoPackage format, you may specify the extension `.parquet` for (Geo)Parquet format or `.fgb` for FlatGeobuf format. (#97)
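
To illustrate the extension rule described above, here is a minimal sketch of resolving an output format from a file name. The function name `output_format` and the `FORMATS` table are hypothetical and are not taken from this project's code. The sketch also assumes a case-sensitive match on the extension.

```python
from pathlib import Path

# Hypothetical table: the three extensions named in this changelog,
# mapped to the format each one selects.
FORMATS = {
    ".gpkg": "GeoPackage",
    ".parquet": "(Geo)Parquet",
    ".fgb": "FlatGeobuf",
}


def output_format(output: str) -> str:
    """Return the format implied by the extension of `output`.

    Raises ValueError when the extension is missing or unsupported,
    mirroring the rule that a file extension is now required.
    """
    suffix = Path(output).suffix  # "" when there is no extension
    if suffix not in FORMATS:
        supported = ", ".join(sorted(FORMATS))
        raise ValueError(
            f"output must end with one of: {supported} (got {output!r})"
        )
    return FORMATS[suffix]
```

Under these assumptions, `output_format("subset.fgb")` returns `"FlatGeobuf"`, while `output_format("subset")` raises `ValueError` because no extension is present.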
0.9.0 (2024-10-09)
- Add "L4C" as a valid value for the `doi` input, for convenience (#90)
0.8.0 (2024-08-13)
- Remove hard-coded MAAP API host value in the scripts `bin/algo/describe`, `bin/algo/delete`, and `bin/algo/register`, letting `maap-py` make use of the `MAAP_API_HOST` environment variable (#85)
- Obtain AWS S3 credentials via a role using the EC2 instance metadata rather than via the `maap-py` library (#14)
- Log messages with timestamps in ISO 8601 UTC combined date and time representations with milliseconds (#72)
- Read granule files directly from AWS S3 instead of downloading them (#54)
- Optimize AWS S3 read performance to provide ~10% speed improvement (on average) over downloading files by tuning the `default_cache_type`, `default_block_size`, and `default_fill_cache` keyword arguments to the `fsspec.url_to_fs` function (#77)
- Set default granule `limit` to 100000. Although this is not unlimited, it effectively behaves as such because all of the supported GEDI collections have fewer granules than this limit. (#69)
- Set default job queue to `maap-dps-worker-32vcpu-64gb` to improve performance by running on 32 CPUs (#78)
- Succeed even when the result is an empty subset (#79)
- Upgrade to Python 3.12
- Add `fsspec_kwargs` input to allow user to specify keyword arguments to the `fsspec.url_to_fs` method; see MAAP_USAGE.md for details. (#77)
- Add `processes` input to allow user to specify the number of processes to use, defaulting to the number of available CPUs (#77)
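
As an illustration of the timestamp format mentioned for #72, the following is a minimal sketch of a `logging.Formatter` that emits ISO 8601 UTC timestamps with milliseconds. The class name `UTCFormatter` and the trailing `Z` designator are assumptions made for this example, not necessarily what this project uses.

```python
import logging
from datetime import datetime, timezone


class UTCFormatter(logging.Formatter):
    """Hypothetical formatter: ISO 8601 UTC timestamps with milliseconds."""

    def formatTime(self, record, datefmt=None):
        # record.created is seconds since the epoch; interpret it as UTC.
        dt = datetime.fromtimestamp(record.created, tz=timezone.utc)
        # isoformat() renders the offset as "+00:00"; use "Z" instead
        # (an assumption of this sketch).
        return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


handler = logging.StreamHandler()
handler.setFormatter(UTCFormatter("%(asctime)s [%(levelname)s] %(message)s"))

logger = logging.getLogger("example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Subsetting started")
```

A record created at the Unix epoch would be stamped `1970-01-01T00:00:00.000Z` by this formatter.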
0.7.0 (2024-04-23)
- #57 Users may choose to profile their jobs by specifying command-line options for the `scalene` profiling tool. See `docs/MAAP_USAGE.md` for more information.
- #44 Granule download failures are now retried up to 10 times to reduce the likelihood that subsetting will fail due to a download failure.
- #56 The `bin/subset.sh` script now captures output to `stderr` and writes it to the log file named `gedi-subset.log`. When a job succeeds, the log file will appear in the job's output directory. Otherwise, it will appear in the job's triage directory.
- #65 All supported GEDI collections are now cloud-hosted, and granules are now downloaded from the cloud rather than from DAAC servers.
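
The retry behavior noted for #44 can be sketched as follows. This is only an illustration under stated assumptions: the helper name `retry`, the choice of `OSError` as the retryable error, and the fixed `delay` between attempts are all hypothetical and do not describe this project's actual implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], max_retries: int = 10, delay: float = 0.0) -> T:
    """Call `action`, retrying up to `max_retries` times on OSError.

    One initial attempt plus up to `max_retries` retries are made. If the
    final attempt also fails, its exception propagates to the caller.
    """
    for attempt in range(max_retries + 1):
        try:
            return action()
        except OSError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last failure
            time.sleep(delay)  # fixed pause (assumed) before the next try
    raise AssertionError("unreachable")
```

With this sketch, a download that fails three times and then succeeds is called four times in total, and a download that always fails is attempted eleven times before the error is raised.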
0.6.2 (2023-12-05)
- Updated to use v3.1.3 of maap-py in environment-maappy.yml. Previous versions of maap-py were using the deprecated MAAP Query Service API endpoint.
0.6.1 (2023-09-28)
- #49 Remove all API URLs that contain ops as they have now been retired (e.g., api.ops.maap-project.org).
0.6.0 (2023-06-02)
- #40 All geometries in an AOI (area of interest) file are now used for granule selection and subsetting. Previously, only the first geometry was used, resulting in a much smaller subset than expected, in cases where the AOI contains multiple geometries.
- #41 Granules without a download link in their metadata are now skipped. Previously, encountering such granules would cause a job failure, due to being unable to download a file without having a download link.
- #42 Granules with metadata containing multiple boundaries within the horizontal spatial domain are now supported. In such cases, a single boundary is obtained by taking the union of the individual boundaries. If the result intersects with the AOI, then the granule is included in the subset. Previously, although rare, such granule metadata would cause a job failure.
- Upgraded Python to version 3.11 to take advantage of the addition of fine-grained error locations in tracebacks to help with debugging errors.
- The `beam` column is no longer automatically included in the output file. If you wish to include the `beam` column, you must specify it explicitly in the `columns` input.
- The default value for `limit` was reduced from 10000 to 1000. The AOI for most subsetting operations is likely to incur a request for well under 1000 granules for downloading, so a larger default value might only lead to longer CMR query times.
- #38: Temporal filtering is now supported, such that specifying a temporal range will limit the granules downloaded from the CMR, pulling only granules obtained within the specified range. See `docs/MAAP_USAGE.md` for more information.
- Added an input parameter named `output` to allow user to specify the name of the output file, rather than hard-code the name to `gedi-subset.gpkg`. See `docs/MAAP_USAGE.md` for more information.
0.5.0 (2023-04-11)
- #36: All CMR queries now use the NASA CMR, because the MAAP CMR is being retired. If you wish to query the MAAP CMR until it is taken down, you may still use an earlier version of this algorithm (ideally, 0.4.0).
0.4.0 (2022-11-14)
- #6: Allow user to specify which BEAMs to subset
0.3.0 (2022-11-10)
- #5: Nested variables must now be specified by path relative to each BEAM group. This not only avoids ambiguity for variables of the same name (but different paths), but also makes a variable's location explicit.
- #17: Granule files that cannot be successfully read are skipped, rather than causing job failure. Offending files are retained to facilitate analysis.
- #1 User must specify values for `lat` and `lon` as inputs, allowing the user to choose which lat/lon datasets are used.
- #2: User must specify `doi` as an input, now allowing subsetting of L2A as well as L4A data.
- #7: Columns from 2D variables can be selected.
- #8: Specifying a query is now optional, to allow selecting all rows for specified columns.
0.2.7 (2022-10-18)
- Promoted the GEDI Subsetting algorithm to this repository from the MAAP-Project/maap-documentation-examples repository. This `0.2.7` version replicates the `gedi-subset-0.2.7` version released from that repository.