This repository contains all the code needed to produce the datasets and reproduce the results of the manuscript "Global dominance of seasonality in shaping lake surface extent dynamics" (in review). Because the code for this manuscript is computationally intensive and requires a complex runtime environment, we have prepared a Docker image that should be set up on a local high-performance computer to run these analyses.
Some of the scripts read large datasets into RAM, so at least 64 GB of RAM is required (for Windows).
Note: Not having enough RAM may cause the program to crash.
All code for this manuscript must be run inside a Docker container to ensure an identical runtime environment. Folders on the host machine are mounted into the container, so the code and data need to be downloaded and saved in the correct directories.
- Find a place on your local machine to store the code and data (> 50 GB available). We use `your_path` to refer to this location; the path notation below follows the Windows style (`\` as separator). On Linux and macOS, simply use `/` instead.
- Create two sub-folders: `your_path\code` and `your_path\data`.
- Follow the instructions below to download the code and data.
- Navigate to the [GitHub repository of this manuscript], click the `< > Code` button on the page, and then click `Download ZIP` to download all code as a single compressed `.zip` file.
- Find the file named `global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main.zip` that was downloaded to your local machine.
- Decompress this `.zip` file to get a folder named `global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main` that contains all the code.
- Rename this folder to `global_lake_area`.
- Move `global_lake_area` to `your_path\code`, giving `your_path\code\global_lake_area`. The code structure will then look like `your_path\code\global_lake_area\batch_processing\...`.
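On Linux or macOS, the folder setup above can be sketched in the terminal as follows. This is a minimal sketch: `$HOME/lake_reproduction` is a placeholder for `your_path`, and the download-dependent steps are shown as comments because they require the ZIP file to be present.

```shell
# Placeholder for your_path; pick a location with > 50 GB free.
your_path="$HOME/lake_reproduction"

# Create the two required sub-folders.
mkdir -p "$your_path/code" "$your_path/data"

# After downloading the ZIP from the GitHub repository page:
# unzip global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main.zip
# mv global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main \
#    "$your_path/code/global_lake_area"
```

On Windows, the equivalent folders can be created in File Explorer or PowerShell.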
All data can be generated using the corresponding scripts. The datasets used to reproduce the quantitative results below are separately hosted in a Zenodo repository for peer review.
- Navigate to the Zenodo repository and download the file `global_lake_area.zip`.
- Decompress `global_lake_area.zip` to get a folder named `global_lake_area`.
- If the extracted folder has a different name, rename it to `global_lake_area`.
- Move `global_lake_area` to `your_path\data`, giving `your_path\data\global_lake_area`. The data structure will then look like `your_path\data\global_lake_area\area_csvs`.
Docker is used to reproduce all related content in this manuscript, ensuring an identical runtime environment and saving setup time. The corresponding image can be pulled following the instructions in the global-lake-area-runner DockerHub repository.
For installing Docker Desktop/Engine, please refer to the official documentation:
To download the Docker image, run the command below in your machine's terminal (macOS and Linux: Terminal; Windows: PowerShell or Terminal):
```shell
docker pull luoqili/global-lake-area-runner:v1.0
```
- Download VS Code. [Instructions](https://code.visualstudio.com/download) can be found on the official website.
- Install the Remote Development extension in VS Code, which is required to use a Docker container as the development environment.
- Open VS Code.
- Click `File`, then click `Open Folder`.
- Open `your_path\code\global_lake_area`, the folder that contains all the code.
- Check that a `.devcontainer` folder appears in the left panel; if it does not, revisit the steps above.
- Modify the `.devcontainer/devcontainer.json` file, replacing the mount paths with your real paths: in the `"mounts"` parameter, replace `your_path` with your real path in the following positions:
  - `{"source": "your_path\\code", "target": "/WORK/Codes", "type": "bind"}` (the decompressed GitHub repository folder, which is bound to the `/WORK/Codes` folder inside the container)
  - `{"source": "your_path\\data", "target": "/WORK/Data", "type": "bind"}` (the folder containing the data downloaded from the Zenodo repository mentioned above, which is necessary for reproducing the figures and key numbers)
- With the Remote Development extension installed, a small blue `><` mark appears in the lower-left corner of the VS Code window.
- Click that mark and select `Reopen in Container`.
- After a short period of building and opening the Docker image, the environment configuration is complete.
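As an illustration, the edited `mounts` entries in `devcontainer.json` might look like the fragment below on a Windows host whose `your_path` is `D:\lake_reproduction` (a hypothetical path chosen for this example; note that backslashes are doubled inside JSON strings):

```json
"mounts": [
    {"source": "D:\\lake_reproduction\\code", "target": "/WORK/Codes", "type": "bind"},
    {"source": "D:\\lake_reproduction\\data", "target": "/WORK/Data", "type": "bind"}
]
```

On Linux or macOS hosts, the `source` values would use forward slashes instead (e.g. `/home/user/lake_reproduction/code`).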
With the steps above finished, the quantitative figures and key numbers can be generated from the corresponding `.ipynb` files recorded below: simply open each file and run all cells.
- Open the corresponding `.ipynb` file as indicated in the "Locations of quantitative results in the codes" section.
- Click `Run All`.
- In the pop-up options for choosing a Python kernel, click `Python Environments...`, then click `Python 3.8.10 /usr/bin/python3`.
- Success.
- Cannot open the Docker container
  This is most likely due to incorrect path settings. Please make sure the paths are typed correctly (on Windows, a correct path looks like `D:\folder1\folder2`).
- Errors when running code (e.g., `package not exist`, `cannot find file path`, etc.)
  This is caused by incorrect path mounting in `devcontainer.json`. Please make sure the file structure looks like `your_path\code\global_lake_area\batch_processing\...` and `your_path\data\global_lake_area\area_csvs`, and that the `your_path\code` and `your_path\data` folders are the ones mounted in `devcontainer.json`.
- Kernel crashes and other error messages with keywords like "free", "mem", etc.
  This happens when the machine's RAM does not meet the requirements of the script. Please try another high-performance computer with at least 64 GB of RAM installed. On Windows 11, this issue may also be caused by the default memory limit of WSL2-based Docker; in that case, please refer to the guidance here and here to set up the `.wslconfig` file so that at least 64 GB of RAM is allocated to Docker.
  Note: at least 64 GB of RAM is required; otherwise the program may crash.
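For reference, a minimal `.wslconfig` (placed in `%UserProfile%` on the Windows host) that raises the WSL2 memory limit might look like the fragment below; adjust the value to match your hardware.

```ini
[wsl2]
# Maximum memory WSL2 (and thus Docker Desktop's WSL2 backend) may use.
memory=64GB
```

Restart WSL (e.g. `wsl --shutdown`) and Docker Desktop for the new limit to take effect.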
For best reproducibility, the code in this repository should be downloaded into a `global_lake_area` parent directory; all descriptions below follow this convention.
- Fig. 1: No quantitative data.
- Fig. 2: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/extreme_analysis/low_water_extreme_plotting.ipynb` (in the top cell). Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Fig. 3: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/grid_wise_analysis/grid_wise_plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Fig. 4: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/time_series_analysis/time_series_plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Fig. 5: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/extreme_analysis/low_water_extreme_plotting.ipynb` (in the bottom cell). Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Extended Data Fig. 1: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_validation/plot_compare_with_gsw.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Extended Data Fig. 2: The raw figures are generated in `global_lake_area/my_spatial_analyze/lake_wise_plotting.ipynb` and `global_lake_area/my_spatial_analyze/data_analyze/correlation_analysis/plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Extended Data Fig. 3: No quantitative data.
- Extended Data Fig. 4: No quantitative data.
- Extended Data Fig. 5: The raw figure is generated in `global_lake_area/my_plotting`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 1: The raw figure is generated in `global_lake_area/my_spatial_analyze/basin_wise_analysis`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 2: The raw figure is generated in `global_lake_area/my_spatial_analyze/lake_wise_plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 3: No quantitative data.
- Supplementary Fig. 4: The raw figure is generated in `global_lake_area/my_spatial_analyze/main_grid.py`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 5: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_validation/data_validation_nb.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Table 1: Data are obtained from `global_lake_area/training_records.csv` as described in the "File Description and Usage" section.
- Others: numbers and percentages can mostly be found in `global_lake_area/my_spatial_analyze/data_analyze/extreme_analysis/low_water_extreme_plotting.ipynb`; the rest can be found following the "File Description and Usage" section.
The other scripts relate to the algorithms of this manuscript, which require several months of runtime, high-performance computing clusters, and terabytes of input data. They are therefore excluded from the reproduction process due to time and resource limits. For further information, a brief description of each file is provided below, with detailed usage notes added in the files where necessary.
- `global_lake_area/` (folder)
  - `unetgee.py`: Contains functions for GEE authentication, U-Net sample generation, training, validation, MODIS and GSW raster export, and U-Net prediction.
  - `unet_train.py`: Calls the `unet_train` function in `unetgee.py`, acting as a command-line interface for U-Net training.
  - `UNET_TRAIN_CONFIG.py`: Configuration settings for a single U-Net training run.
  - `update_config_unet_train_run.py`: Used for batch U-Net training, updating `UNET_TRAIN_CONFIG.py` and calling `unet_train.py`.
  - `training_records.csv`: Contains metadata for the U-Net models, such as sample sizes and model metrics.
  - `update_training_record.py`: Used for updating the `training_records.csv` file.
  - `unet_samples_generate_per_basin.ipynb`: Notebook for exporting samples for U-Net training for each basin.
  - `unet_sample_size_count.ipynb`: Calculates sample sizes for U-Net training, evaluation, and validation; updates the `training_records.csv` file.
  - `unet_evaluation.py`: Similar to `unet_train.py`; calculates performance metrics for each U-Net model and updates the `training_records.csv` file.
  - `UNET_EVALUATION_CONFIG.py`: Configuration settings for a single U-Net evaluation.
  - `unet_evaluation_update_config_and_run.py`: Similar to `update_config_unet_train_run.py`; used for batch performance metric calculation.
  - `selfee.py`: Defines service-account-related methods for automated authentication to work around network issues.
  - `projection_wkt_generation.ipynb`: Constructs customized Lambert Azimuthal Equal Area (LAEA) projections for each basin in the BasinATLAS lev02 product.
  - `hydrolakes_filter_by_bas.ipynb`: Exports lake boundaries for U-Net sample generation (not for final area calculation).
  - `gsw_export.ipynb`: Exports the occurrence and recurrence of GSW.
  - `gsw_occurrence_and_recurrence_mosaic.py`: Mosaics tiled GSW occurrence and recurrence maps.
  - `export_modis_and_gsw_image.ipynb`: Used for exporting MODIS and GSW images in LAEA projections and correct resolutions.
  - `draw_unet_train_history.py`: Draws training and validation curves for each U-Net model.
  - `add_final_decision_to_records.py`: Records the manually selected optimal epoch to the `training_records.csv` file.
- `global_lake_area/.devcontainer` (folder): Contains the `devcontainer.json` file that defines the container-based runtime environment for this manuscript.
- `global_lake_area/my_unet_definition` (folder)
  - `__init__.py`: Makes this folder a module.
  - `model.py`: Contains the implementation of the U-Net models (`attentionunet` was used).
  - `evaluation_metrics.py`: Contains the performance metrics and loss functions used (Intersection over Union, IoU).
- `global_lake_area/my_unet_gdal` (folder)
  - `__init__.py`: Makes this folder a module.
  - `reproject_to_target.py`: Deprecated.
  - `combined.py`: Deprecated.
  - `zonal_statistics.py`: Deprecated.
  - `reproject_to_target_tile.py`: Contains functions for clipping, reprojecting, and mosaicking large GeoTIFF files.
  - `generate_tfrecord_from_tile.py`: Contains functions for reprojecting, resampling, and converting GeoTIFF files to TFRecord format.
  - `align_to_target_tile.py`: Contains functions for geographically aligning and combining two rasters.
  - `unet_predictions.py`: Contains functions for using trained U-Net models to process converted TFRecords.
  - `reconstruct_tile_from_prediction.py`: Contains functions for converting serialized TFRecord files back to GeoTIFF tiles.
  - `area_calculation.py`: Contains functions for calculating areas from rasters using vector data as boundaries.
  - `quick_plotting.py`: Contains functions for drawing PNGs and GIFs from GeoTIFF files.
  - `quick_plotting_runner.py`: Command-line interface for `quick_plotting.py`; takes LAEA coordinates as input.
- `global_lake_area/batch_processing` (folder)
  - `__init__.py`: Makes this folder a module.
  - `batch_tfrecord_generation.py`: Command-line interface for batch generation of MODIS-converted TFRecord files.
  - `batch_unet_prediction.py`: Command-line interface for batch prediction using U-Net.
  - `batch_prediction_reconstruction.py`: Command-line interface for batch reconstruction of water mask maps.
  - `batch_mosaic.py`: Mosaics water mask tiles into a large GeoTIFF file.
  - `batch_full.py`: Combines multiple batch processing steps into one command-line interface.
  - `asynchronous_batch.py`: Asynchronously calls `batch_full.py` to maximize usage of available computing resources.
  - `BATCH_CONFIG.py`: Configuration for `batch_full.py` and `asynchronous_batch.py`.
  - `batch_area_calculation.py`: Command-line interface for batch area calculation from mosaicked water mask maps.
  - `AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (monthly lake surface extent results).
  - `MISSING_DATA_AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (monthly cloud contamination ratio results).
  - `MASKED_MY_WATER_AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (GSW-masked water mask map results).
  - `GSWR_AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (GSW image results for validation).
  - `area_calculation_update_and_run.py`: Updates config files automatically and runs `batch_area_calculation.py`.
  - `load_config_module.py`: Used for reading config files written in `.py` format.
- `global_lake_area/my_plotting` (folder): Contains scripts for plotting the performance of the U-Net models trained in this study.
- `global_lake_area/my_spatial_analyze` (folder)
  - `__init__.py`: Makes this folder a module.
  - `area_postprocessing.py`: Contains functions for post-processing lake surface water extracted from U-Net-generated water mask maps.
  - `lake_wise_area_postprocessor.py`: Command-line interface for lake-wise post-processing of lake surface extent time series.
  - `LAKE_WISE_AREA_POSTPROCESSING_CONFIG.py`: Configuration for `lake_wise_area_postprocessor.py`.
  - `lake_wise_area_postprocess_update_and_run.py`: Updates config files automatically and runs `lake_wise_area_postprocessor.py`.
  - `lake_wise_lse_analyze.py`: Contains functions used for lake-wise plotting.
  - `lake_wise_plotting.ipynb`: Plotting for Supplementary Fig. 2.
  - `visualization.py`: Contains functions related to grid-wise plotting.
  - `main_grid.py`: Explorative grid-wise plotting (deprecated).
  - `lake_concatenator.py`: Combines the lake-wise time series of lake surface extent in each basin into one large file covering the 1.4 million lakes globally.
  - `glake_update_hydrolakes.py`: Contains functions and a command-line interface for updating HydroLAKES using GLAKES.
  - `hylak_buffering.py`: Removes duplicated lakes and creates buffer zones for the GLAKES-updated HydroLAKES.
  - `gsw_image_mosaic.py`: Command-line interface for mosaicking tiled GSW images into one large GeoTIFF file for validation.
  - `grid_concatenator.py`: Deprecated.
  - `grid_analyze.py`: Contains functions that perform a subset of the grid-level analysis.
  - `cloud_cover_ratio_calculater.py`: Command-line interface for calculating cloud cover ratios based on boundary size and monthly MODIS cloud-contamination area.
  - `basin_lse_calculation.py`: Deprecated.
  - `attach_geometry_and_generate_grid.py`: Contains functions that create a grid from global (or regional) lakes and calculate corresponding statistics.
  - `area_to_volume.py`: Deprecated.
  - `area_to_level.py`: Deprecated.
  - `AREA_TO_LEVEL_CONFIG.py`: Deprecated.
  - `area_to_level_batch_converter.py`: Deprecated.
  - `./data_analyze` (folder)
    - `./basin_wise_analysis` (folder)
      - `basin_wise_analysis.py`: Contains functions for basin-wise analysis and plotting.
      - `basin_wise_plotting.ipynb`: Plots basin-wise figures (including reservoir contribution).
      - `basinatlas_statistics_calculator.py`: Command-line interface for calculating statistics for BasinATLAS.
      - `hydrobasins_merger.py`: Merges multiple shapefiles of HydroBASINS.
      - `hydrobasins_statistics_calculator.py`: Command-line interface for calculating statistics for HydroBASINS.
    - `./climate_analysis` (folder)
      - `attach_aridity_index.py`: Adds the aridity index from LakeATLAS to the lake surface extent time series CSV.
    - `./correlation_analysis` (folder)
      - `plotting.ipynb`: Plots median relative changes in seasonality by lake size (part of Extended Data Fig. 1).
      - `correlation_plots.py`: Contains functions for plotting the relationships between multiple variables.
    - `./extreme_analysis` (folder)
      - `area_extreme_analysis.py`: Contains functions that identify seasonality-induced low-water extremes and perform other analyses.
      - `low_water_extreme_analysis.ipynb`: Adds extreme-related columns to the lake surface extent time series CSV.
      - `low_water_extreme_plotting.ipynb`: Plots seasonality-induced low-water extremes and seasonality dominance.
    - `./grid_wise_analysis` (folder): Contains plotting of changes in seasonality.
    - `./permafrost_analysis` (folder): Adds a permafrost type column to the lake surface extent time series CSV.
    - `./time_series_analysis` (folder): Plots long-term trends.
  - `./data_validation` (folder): Validates our data using GSW estimates and altimetry-based water levels.
- `global_lake_area/projection_wkt` (folder): Contains the LAEA projections used in this manuscript.