- Algorithm Outline
- Algorithm Inputs
- Running a GEDI Subsetting DPS Job
- Getting the GeoJSON URL for a geoBoundary
- Contributing
- Citations
At a high level, the GEDI subsetting algorithm does the following:
- Queries the MAAP CMR for GEDI L4A granules intersecting a specified AOI (GeoJSON)
- Downloads the data file (h5) for each intersecting granule (up to specified limit)
- Subsets each data file
- Combines all subset files into a single output file named `gedi_subset.gpkg`, in GeoPackage format, readable with `geopandas` as a `GeoDataFrame`.
To run a GEDI subsetting DPS job, you must supply the following inputs:
- `aoi` (required): URL to a GeoJSON file representing your area of interest
- `columns`: Comma-separated list of column names to include in the output file. (Default: `agbd, agbd_se, l2_quality_flag, l4_quality_flag, sensitivity, sensitivity_a2`)
- `query`: Query expression for subsetting the rows in the output file. (Default: `l2_quality_flag == 1 and l4_quality_flag == 1 and sensitivity > 0.95 and sensitivity_a2 > 0.95`)
- `limit`: Maximum number of GEDI granule data files to download (among those that intersect the specified AOI). (Default: 10,000)
IMPORTANT |
---|
When supplying input values (either via the ADE UI or programmatically, as shown in the next section), enter a dash (`-`) for any input whose default value (where indicated) you want to use. Leaving any input blank (or unspecified) results in an error. |
If your AOI is a publicly available geoBoundary, see Getting the GeoJSON URL for a geoBoundary for details on obtaining its URL. In this case, that is the URL you must supply for the `aoi` input.
Alternatively, you can make your own GeoJSON file for your AOI and place it within your public bucket within the ADE. Based upon where you place your GeoJSON file, you can construct a URL to specify for the job's `aoi` input.
Specifically, you should place your GeoJSON file at a location of the following form within the ADE (where `path/to/aoi.geojson` can be any path and filename for your AOI):
~/my-public-bucket/path/to/aoi.geojson
^^^^^^^^^^^^^^^^^^
You would then supply the following URL as the `aoi` input value when running this algorithm as a DPS job, where `<USERNAME>` is your ADE username:
https://maap-ops-workspace.s3.amazonaws.com/shared/<USERNAME>/path/to/aoi.geojson
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Replace "~/my-public-bucket" with this URL prefix
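The path-to-URL substitution described above can be sketched in a few lines of Python. This is only an illustration: the function name `public_bucket_url` and the example username `jdoe` are hypothetical, and the sketch assumes the URL prefix shown above applies unchanged to your workspace.

```python
from urllib.parse import quote

# Directory and URL prefix taken from this README.
PUBLIC_BUCKET_DIR = "~/my-public-bucket/"
PUBLIC_URL_PREFIX = "https://maap-ops-workspace.s3.amazonaws.com/shared"


def public_bucket_url(ade_path, username):
    """Convert an ADE public-bucket path into the URL to use for `aoi`."""
    if not ade_path.startswith(PUBLIC_BUCKET_DIR):
        raise ValueError(f"path must start with {PUBLIC_BUCKET_DIR!r}")
    # Drop the bucket directory, keeping only the user-chosen path.
    relative_path = ade_path[len(PUBLIC_BUCKET_DIR):]
    # Percent-encode unusual characters; "/" is left untouched by default.
    return f"{PUBLIC_URL_PREFIX}/{username}/{quote(relative_path)}"


aoi_url = public_bucket_url("~/my-public-bucket/path/to/aoi.geojson", "jdoe")
print(aoi_url)
```

The resulting string is what you would pass as the `aoi` input.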
The GEDI Subsetting DPS Job is named `gedi-subset_ubuntu`, and may be executed from your ADE Workspace by opening the DPS/MAS Operations menu, choosing the Execute DPS Job menu option, and selecting `gedi-subset_ubuntu:<VERSION>` from the dropdown. You will be prompted for the inputs as described in the previous section.
Alternatively, for greater control of your job configuration, you may use the MAAP API from a Notebook (or a Python script), as follows:
from maap.maap import MAAP

maap = MAAP(maap_host='api.ops.maap-project.org')

aoi = "<AOI GeoJSON URL>"  # See previous section
limit = 2000  # Maximum number of granule files to download

result = maap.submitJob(
    identifier="<DESCRIPTION>",
    algo_id="gedi-subset_ubuntu",
    version="<VERSION>",
    queue="maap-dps-worker-8gb",
    username="<USERNAME>",  # Your Earthdata Login username
    aoi=aoi,
    columns="<COLUMNS>",  # See previous section
    query="<QUERY>",  # See previous section
    limit=limit,
)
job_id = result["job_id"]
job_id
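Because every input must be supplied, and a dash selects the default, it can help to fill in the dashes automatically. The following is a minimal sketch: the helper `build_job_inputs` is hypothetical (not part of the MAAP API), and the example URL is a placeholder.

```python
# Per this README, a dash tells the job to use an input's default value.
DEFAULT_PLACEHOLDER = "-"


def build_job_inputs(aoi, columns=None, query=None, limit=None):
    """Return the algorithm inputs, substituting "-" for omitted ones."""
    if not aoi:
        raise ValueError("aoi is required and has no default")

    def or_default(value):
        return DEFAULT_PLACEHOLDER if value is None else value

    return {
        "aoi": aoi,
        "columns": or_default(columns),
        "query": or_default(query),
        "limit": or_default(limit),
    }


inputs = build_job_inputs("https://example.com/aoi.geojson", limit=2000)
print(inputs)
```

The resulting dictionary could then be unpacked into the submission call, for example `maap.submitJob(identifier=..., algo_id=..., version=..., queue=..., username=..., **inputs)`.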
To check the status of your job via the ADE UI, open the DPS/MAS Operations menu, choose Get DPS Job Status, and enter the value of the `job_id` to obtain the status, just as if you had submitted the job from the menu (rather than programmatically).
Alternatively, to programmatically check the status of the submitted job, you may run the following code. If using a notebook, use a separate cell so you can run it repeatedly until you get a status of either `'Succeeded'` or `'Failed'`:
import re
# Should evaluate to 'Accepted', 'Running', 'Succeeded', or 'Failed'
re.search(r"Status>(?P<status>.+)</wps:Status>", maap.getJobStatus(job_id).text).group('status')
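Rather than re-running the cell by hand, the same check can be wrapped in a polling loop. This is a sketch under assumptions: `extract_status` and `wait_for_job` are hypothetical helper names, the polling interval and poll limit are arbitrary, and the status response is assumed to contain a `wps:Status` element as implied by the regular expression above.

```python
import re
import time

# Same pattern as the one-liner above, compiled once.
STATUS_PATTERN = re.compile(r"Status>(?P<status>.+)</wps:Status>")
FINAL_STATUSES = {"Succeeded", "Failed"}


def extract_status(xml_text):
    """Pull the job status out of the XML text of a status response."""
    match = STATUS_PATTERN.search(xml_text)
    if match is None:
        raise ValueError("no wps:Status element found in response")
    return match.group("status")


def wait_for_job(fetch_status_xml, poll_seconds=30, max_polls=120):
    """Call fetch_status_xml() repeatedly until the job succeeds or fails."""
    for _ in range(max_polls):
        status = extract_status(fetch_status_xml())
        if status in FINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling limit")
```

With the `maap` and `job_id` objects from the submission code, a call might look like `wait_for_job(lambda: maap.getJobStatus(job_id).text)`.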
Once the job status is either Succeeded or Failed, you may obtain the job result either via the UI (DPS/MAS Operations > Get DPS Job Result) or programmatically. Because the programmatic results are in XML format, which is difficult to read, the UI is the better choice in this case.
If the job status is Failed, the job results should show failure details.
If the job status is Succeeded, the job results should show 3 URLs, with the first URL of the following form:
http://.../<USERNAME>/dps_output/gedi-subset_ubuntu/<VERSION>/<DATETIME_PATH>
Based upon this URL, the `gedi_subset.gpkg` file generated by the job should be available at the following path within the ADE:
~/my-private-bucket/dps_output/gedi-subset_ubuntu/<VERSION>/<DATETIME_PATH>/gedi_subset.gpkg
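The mapping from the result URL to the ADE path can be sketched as follows. The function name `output_path_from_result_url` is hypothetical, and the example URL (host, username, version, and date path) is made up purely for illustration.

```python
PRIVATE_BUCKET_DIR = "~/my-private-bucket"
OUTPUT_FILENAME = "gedi_subset.gpkg"


def output_path_from_result_url(result_url):
    """Derive the ADE path of gedi_subset.gpkg from the first result URL."""
    marker = "/dps_output/"
    _, found, tail = result_url.partition(marker)
    if not found:
        raise ValueError(f"URL does not contain {marker!r}")
    # Keep everything from dps_output onward and append the output filename.
    return f"{PRIVATE_BUCKET_DIR}{marker}{tail.rstrip('/')}/{OUTPUT_FILENAME}"


# Hypothetical result URL, for illustration only.
example_url = "http://example.com/jdoe/dps_output/gedi-subset_ubuntu/0.0.0/2022/01/01/00/00/00/000000"
print(output_path_from_result_url(example_url))
```

Within the ADE, the file could then be loaded with `geopandas.read_file(os.path.expanduser(path))`, which returns a `GeoDataFrame`.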
If your AOI is a geoBoundary (such as a country), you may obtain the URL for its GeoJSON file from the geoBoundaries website, rather than constructing a custom GeoJSON file. To obtain the URL for the GeoJSON of an AOI, obtain the ISO3 code and level for the AOI's geoBoundary from the website.
Once you know the ISO3 code and level, construct a geoBoundaries API URL (this is not the GeoJSON URL) of the following form, replacing `<ISO3>` and `<LEVEL>` with appropriate values:
https://www.geoboundaries.org/api/current/gbOpen/<ISO3>/<LEVEL>/
For example, the ISO3 code for Gabon is GAB. Therefore, the geoBoundaries API URL for Gabon level 0 is https://www.geoboundaries.org/api/current/gbOpen/GAB/ADM0/.
You may use the geoBoundaries API URL in various ways to obtain your AOI's GeoJSON URL, such as one of the following:
- Use a browser to navigate to the API URL. If your browser directly displays the result, locate the value of `"gjDownloadURL"`. If your browser forces you to download the result, do so, and locate the value of `"gjDownloadURL"` within the downloaded file. In either case, the value associated with `"gjDownloadURL"` is your AOI's GeoJSON URL.
- Alternatively, open a terminal window and run the following command, replacing `<API_URL>` appropriately. The output should be the GeoJSON URL:

      curl -s <API_URL> | tr ',' '\n' | grep "gjDownloadURL.*gbOpen" | sed -E 's/.*"(https.+)"/\1/'

Continuing with the Gabon example, entering the geoBoundaries API URL for Gabon (shown above) in a browser should result in the following (abridged) output (either in the browser, or within a downloaded file):
{
  "boundaryID": "GAB-ADM0-25889322",
  "boundaryName": "Gabon",
  "boundaryISO": "GAB",
  ...
  "gjDownloadURL": "https://github.com/wmgeolab/geoBoundaries/raw/9f8c9e0f3aa13c5d07efaf10a829e3be024973fa/releaseData/gbOpen/GAB/ADM0/geoBoundaries-GAB-ADM0.geojson",
  ...
  "gbHumanitarian": {
    ...
  }
}
Alternatively, using `curl` from the terminal should also yield the same GeoJSON URL:
$ curl -s https://www.geoboundaries.org/api/current/gbOpen/GAB/ADM0/ | tr ',' '\n' | grep "gjDownloadURL.*gbOpen" | sed -E 's/.*"(https.+)"/\1/'
https://github.com/wmgeolab/geoBoundaries/raw/9f8c9e0f3aa13c5d07efaf10a829e3be024973fa/releaseData/gbOpen/GAB/ADM0/geoBoundaries-GAB-ADM0.geojson
You may use this GeoJSON URL for the `aoi` input when running the GEDI subsetting DPS job.
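The same lookup can be done from a notebook with the Python standard library only. This is a sketch, assuming the API response is a JSON object with a top-level `"gjDownloadURL"` key, as in the abridged Gabon output above. The function names are hypothetical, and `fetch_geojson_url` needs network access.

```python
import json
from urllib.request import urlopen

# URL form taken from this README.
API_URL_TEMPLATE = "https://www.geoboundaries.org/api/current/gbOpen/{iso3}/{level}/"


def geoboundary_api_url(iso3, level):
    """Build the geoBoundaries API URL for an ISO3 code and level."""
    return API_URL_TEMPLATE.format(iso3=iso3.upper(), level=level.upper())


def geojson_url_from_metadata(metadata_text):
    """Extract the GeoJSON download URL from the API's JSON response text."""
    return json.loads(metadata_text)["gjDownloadURL"]


def fetch_geojson_url(iso3, level, timeout=30):
    """Query the API (requires network access) and return the GeoJSON URL."""
    with urlopen(geoboundary_api_url(iso3, level), timeout=timeout) as response:
        return geojson_url_from_metadata(response.read().decode("utf-8"))


print(geoboundary_api_url("GAB", "ADM0"))
```

For the Gabon example, `fetch_geojson_url("GAB", "ADM0")` should return the value to use for the `aoi` input.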
To contribute to this work, you must obtain access to the following:
- MAAP Documentation Examples hosted on GitHub: For creating new versions (releases) of the algorithms implemented in the repository.
- MAAP Documentation Examples hosted on GitLab: A copy of the GitHub repository, in order to enable registering new versions of the algorithms from within the ADE (which currently only supports GitLab repositories).
- NASA MAAP: Where the ADE resides, and thus where algorithms can be registered (from GitLab repositories).
To prepare for contributing, do the following in an ADE workspace:
- Clone this GitHub repository.
- Change directory to the cloned repository.
- Add the GitLab repository as another remote (named `ade` here, but you may specify a different name for the remote):

      git remote add --tags -f ade https://repo.ops.maap-project.org/data-team/maap-documentation-examples.git

- Create the `gedi_subset` virtual environment (NOTE: you will need to repeat this step whenever you restart your ADE workspace):

      gedi-subset/build.sh --dev

- Activate the `gedi_subset` virtual environment:

      conda activate gedi_subset

- Install Git pre-commit hooks:

      pre-commit install --install-hooks

If you plan to do any development work outside of the ADE (such as on your local workstation), perform the steps above in that location as well. NOTE: This means that you must have `conda` installed (see conda installation) in your desired development location outside of the ADE workspace.
During development, you will create PRs against the GitHub repository, as explained below.
- Create a new branch based on an appropriate existing branch (typically based on `main`).
- Add your desired code and/or configuration changes.
- Update the value of `version` in `gedi-subset/algorithm_config.yaml` according to the versioning convention referenced at the top of the Changelog.
- Add appropriate entries to the Changelog, according to the Keep a Changelog convention. In particular:
  - Add a new, second-level section of the following form:

        ## [VERSION] - YYYY-MM-DD

    where:
    - `VERSION` is the value of `version` specified in `gedi-subset/algorithm_config.yaml`
    - `YYYY-MM-DD` is the date that you expect to create the release (see the following steps), which may or may not be the current date, depending upon when you expect your PR (next step) to be approved and merged.
  - Add appropriate third-level sections under the new version section (for additions, changes, and fixes). Again, see Keep a Changelog.
- Submit a PR to the GitHub repository.
- Only when the PR is on a branch to be merged into the `main` branch and it has been approved and merged, create a new release in GitHub as follows:
  - Go to https://github.com/MAAP-Project/maap-documentation-examples/releases/new
  - Click the Choose a tag dropdown.
  - In the input box that appears, enter the same value as the new value of `version` in `gedi-subset/algorithm_config.yaml`, and click the Create a new tag label that appears immediately below the input box.
  - In the Release title input, also enter the same value as the new value of `version` in `gedi-subset/algorithm_config.yaml`.
  - In the description text box, copy and paste from the Changelog file only the new version section you added earlier to the Changelog, including the new version heading.
  - Click the Publish release button.
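Before publishing a release, it may be worth checking that the configured version and the Changelog agree. The following is a rough sketch, not part of the repository's tooling: both function names are hypothetical, and it assumes the configuration file holds a top-level line of the form `version: X.Y.Z` and that the Changelog uses the `## [VERSION] - YYYY-MM-DD` heading described above.

```python
import re


def config_version(config_text):
    """Read the top-level `version` value from the config file's text."""
    match = re.search(r"^version:\s*['\"]?([^'\"\s#]+)", config_text, re.MULTILINE)
    if match is None:
        raise ValueError("no top-level version entry found")
    return match.group(1)


def changelog_has_release(changelog_text, version):
    """Check for a '## [VERSION] - YYYY-MM-DD' heading for the given version."""
    pattern = rf"^## \[{re.escape(version)}\] - \d{{4}}-\d{{2}}-\d{{2}}\s*$"
    return re.search(pattern, changelog_text, re.MULTILINE) is not None


# Made-up file contents, for illustration only.
sample_config = "description: example\nversion: 0.0.1\n"
sample_changelog = "# Changelog\n\n## [0.0.1] - 2022-01-01\n\n### Added\n"
version = config_version(sample_config)
print(version, changelog_has_release(sample_changelog, version))
```

In practice, the two text arguments would come from reading `gedi-subset/algorithm_config.yaml` and the Changelog file.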
Once a release is published in the GitHub repository (see above), the code from the GitHub repository must be pushed to the GitLab repository in order to be able to register the new version of the algorithm, as follows, within the ADE:
- Open a Terminal tab (if necessary) and change directory to the repository.
- Pull the latest code from GitHub (to obtain merged PR, if necessary):

      git checkout main
      git pull origin

- Push the latest code to GitLab (replace `ade` with the appropriate remote name, if you didn't use `ade` earlier):

      git push --all ade
      git push --tags ade

  NOTE: On occasion, you might get a "server certificate verification failed" error when attempting to push to GitLab. If so, simply prefix the preceding commands with `GIT_SSL_NO_VERIFY=1`.
- In the ADE's File Browser, navigate to `maap-documentation-examples/gedi-subset`.
- Right-click on `algorithm_config.yaml` and choose Register as MAS Algorithm from the context menu.
- Confirm that the value of the version field matches the GitHub release version you created above. If not, click Cancel and review earlier steps. If so, click Ok, which will trigger a build job that will take about 30 minutes.
- Check the build job status at https://repo.ops.maap-project.org/root/register-job/-/jobs. If the job fails, you will need to correct the issue (and likely create a patch release, following the release steps again). Otherwise, you should now be able to open the DPS/MAS Operations menu, choose Execute DPS Job, and find the new version of the algorithm in the dropdown list for confirmation.
Country Boundaries from:
Runfola, D. et al. (2020) geoBoundaries: A global database of political administrative boundaries. PLoS ONE 15(4): e0231866. https://doi.org/10.1371/journal.pone.0231866