This repository contains all the code needed to produce the datasets and reproduce the results of the manuscript "Global dominance of seasonality in shaping lake surface extent dynamics" (in review). Because the code for this manuscript is computationally intensive and requires a complex runtime environment, we have prepared a Docker image that should be set up on a local high-performance computer to run these analyses.
Some of the scripts read large datasets into RAM, so at least 64 GB of RAM is required (for Windows).
Note: Not having enough RAM may cause the program to crash.
All code for this manuscript must be run inside a Docker container to ensure an identical runtime environment. Folders on the host machine are mounted into the container, so the code and data need to be downloaded and saved in the correct directories.
- Find a place on your local machine to store the code and data (> 50 GB available). We use `your_path` to refer to this location; the path notation below follows the Windows style (`\` as separator). On Linux and macOS, simply use `/` instead.
- Create two sub-folders: `your_path\code` and `your_path\data`.
- Follow the instructions below to download the code and data.
- Navigate to the [GitHub repository of this manuscript], click the `< > Code` button on the page, and then click `Download ZIP` to download all code as a single compressed `.zip` file.
- Find the file named `global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main.zip` that was downloaded to your local machine.
- Decompress this `.zip` file to get a folder named `global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main` that contains all the code.
- Rename this folder to `global_lake_area`.
- Move `global_lake_area` to `your_path\code`, giving `your_path\code\global_lake_area`. The code structure will then look like `your_path\code\global_lake_area\batch_processing\...`.
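On Linux or macOS, the folder setup above can be sketched in the terminal as follows. This is a minimal sketch: `$HOME/lake_reproduction` is a placeholder for `your_path`, and the download-dependent steps are shown as comments because they require the ZIP file to be present.

```shell
# Placeholder for your_path; pick a location with > 50 GB free.
your_path="$HOME/lake_reproduction"

# Create the two required sub-folders.
mkdir -p "$your_path/code" "$your_path/data"

# After downloading the ZIP from the GitHub repository page:
# unzip global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main.zip
# mv global-dominance-of-seasonality-in-shaping-lake-surface-extent-dynamics-main \
#    "$your_path/code/global_lake_area"
```

On Windows, the equivalent folders can be created in File Explorer or PowerShell.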
All data can be generated using the corresponding scripts. The datasets used to reproduce the quantitative results below are separately hosted in a Zenodo repository for peer review.
- Navigate to the Zenodo repository and download the file `global_lake_area.zip`.
- Decompress `global_lake_area.zip` to get a folder named `global_lake_area`.
- If the extracted folder has a different name, rename it to `global_lake_area`.
- Move `global_lake_area` to `your_path\data`, giving `your_path\data\global_lake_area`. The data structure will then look like `your_path\data\global_lake_area\area_csvs`.
Docker is used to reproduce all related content in this manuscript, ensuring an identical runtime environment and saving setup time. The corresponding image can be pulled following the instructions in the global-lake-area-runner DockerHub repository.
For installing Docker Desktop/Engine, please refer to the official documentation:
To download the Docker image, run the command below in your machine's terminal (macOS and Linux: Terminal; Windows: PowerShell or Terminal):
```shell
docker pull luoqili/global-lake-area-runner:v1.0
```
- Download VS Code. [Instructions](https://code.visualstudio.com/download) can be found on the official website.
- Install the Remote Development extension in VS Code, which is required to use a Docker container as the development environment.
- Open VS Code.
- Click `File`, then click `Open Folder`.
- Open `your_path\code\global_lake_area`, the folder that contains all the code.
- Check that a `.devcontainer` folder appears in the left panel; if it does not, revisit the steps above.
- Modify the `.devcontainer/devcontainer.json` file, replacing the mount paths with your real paths: in the `"mounts"` parameter, replace `your_path` with your real path in the following positions:
  - `{"source": "your_path\\code", "target": "/WORK/Codes", "type": "bind"}` (the decompressed GitHub repository folder, which is bound to the `/WORK/Codes` folder inside the container)
  - `{"source": "your_path\\data", "target": "/WORK/Data", "type": "bind"}` (the folder containing the data downloaded from the Zenodo repository mentioned above, which is necessary for reproducing the figures and key numbers)
- With the Remote Development extension installed, a small blue `><` mark appears in the lower-left corner of the VS Code window.
- Click that mark and select `Reopen in Container`.
- After a short period of building and opening the Docker image, the environment configuration is complete.
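As an illustration, the edited `mounts` entries in `devcontainer.json` might look like the fragment below on a Windows host whose `your_path` is `D:\lake_reproduction` (a hypothetical path chosen for this example; note that backslashes are doubled inside JSON strings):

```json
"mounts": [
    {"source": "D:\\lake_reproduction\\code", "target": "/WORK/Codes", "type": "bind"},
    {"source": "D:\\lake_reproduction\\data", "target": "/WORK/Data", "type": "bind"}
]
```

On Linux or macOS hosts, the `source` values would use forward slashes instead (e.g. `/home/user/lake_reproduction/code`).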
With the steps above finished, the quantitative figures and key numbers can be generated from the corresponding `.ipynb` files recorded below: simply open each file and run all cells.
- Open the corresponding `.ipynb` file as indicated in the "Locations of quantitative results in the codes" section.
- Click `Run All`.
- In the pop-up options for choosing a Python kernel, click `Python Environments...`, then click `Python 3.8.10 /usr/bin/python3`.
- Success.
- Cannot open the Docker container
  This is most likely due to incorrect path settings. Please make sure the paths are typed correctly (on Windows, a correct path looks like `D:\folder1\folder2`).
- Errors when running code (e.g., `package not exist`, `cannot find file path`, etc.)
  This is caused by incorrect path mounting in `devcontainer.json`. Please make sure the file structure looks like `your_path\code\global_lake_area\batch_processing\...` and `your_path\data\global_lake_area\area_csvs`, and that the `your_path\code` and `your_path\data` folders are the ones mounted in `devcontainer.json`.
- Kernel crashes and other error messages with keywords like "free", "mem", etc.
  This happens when the machine's RAM does not meet the requirements of the script. Please try another high-performance computer with at least 64 GB of RAM installed. On Windows 11, this issue may also be caused by the default memory limit of WSL2-based Docker; in that case, please refer to the guidance here and here to set up the `.wslconfig` file so that at least 64 GB of RAM is allocated to Docker.
  Note: at least 64 GB of RAM is required; otherwise the program may crash.
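For reference, a minimal `.wslconfig` (placed in `%UserProfile%` on the Windows host) that raises the WSL2 memory limit might look like the fragment below; adjust the value to match your hardware.

```ini
[wsl2]
# Maximum memory WSL2 (and thus Docker Desktop's WSL2 backend) may use.
memory=64GB
```

Restart WSL (e.g. `wsl --shutdown`) and Docker Desktop for the new limit to take effect.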
For best reproducibility, the code in this repository should be downloaded into a `global_lake_area` parent directory; all descriptions below follow this convention.
- Fig. 1: No quantitative data.
- Fig. 2: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/extreme_analysis/low_water_extreme_plotting.ipynb` (in the top cell). Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Fig. 3: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/grid_wise_analysis/grid_wise_plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Fig. 4: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/time_series_analysis/time_series_plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Fig. 5: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_analyze/extreme_analysis/low_water_extreme_plotting.ipynb` (in the bottom cell). Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Extended Data Fig. 1: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_validation/plot_compare_with_gsw.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Extended Data Fig. 2: The raw figures are generated in `global_lake_area/my_spatial_analyze/lake_wise_plotting.ipynb` and `global_lake_area/my_spatial_analyze/data_analyze/correlation_analysis/plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Extended Data Fig. 3: No quantitative data.
- Extended Data Fig. 4: No quantitative data.
- Extended Data Fig. 5: The raw figure is generated in `global_lake_area/my_plotting`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 1: The raw figure is generated in `global_lake_area/my_spatial_analyze/basin_wise_analysis`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 2: The raw figure is generated in `global_lake_area/my_spatial_analyze/lake_wise_plotting.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 3: No quantitative data.
- Supplementary Fig. 4: The raw figure is generated in `global_lake_area/my_spatial_analyze/main_grid.py`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Fig. 5: The raw figure is generated in `global_lake_area/my_spatial_analyze/data_validation/data_validation_nb.ipynb`. Fonts, layout, and sizes are further polished in Adobe Illustrator.
- Supplementary Table 1: Data are obtained from `global_lake_area/training_records.csv` as described in the "File Description and Usage" section.
- Others: numbers and percentages can mostly be found in `global_lake_area/my_spatial_analyze/data_analyze/extreme_analysis/low_water_extreme_plotting.ipynb`; the rest can be found following the "File Description and Usage" section.
The other scripts relate to the algorithms of this manuscript, which require several months of runtime, high-performance computing clusters, and terabytes of input data. They are therefore excluded from the reproduction process due to time and resource limits. For further information, a brief description of each file is provided below, with detailed usage notes added in the files where necessary.
- `global_lake_area/` (folder)
  - `unetgee.py`: Contains functions for GEE authentication, U-Net sample generation, training, validation, MODIS and GSW raster export, and U-Net prediction.
  - `unet_train.py`: Calls the `unet_train` function in `unetgee.py`, acting as a command-line interface for U-Net training.
  - `UNET_TRAIN_CONFIG.py`: Configuration settings for a single U-Net training run.
  - `update_config_unet_train_run.py`: Used for batch U-Net training, updating `UNET_TRAIN_CONFIG.py` and calling `unet_train.py`.
  - `training_records.csv`: Contains metadata for the U-Net models, such as sample sizes and model metrics.
  - `update_training_record.py`: Used for updating the `training_records.csv` file.
  - `unet_samples_generate_per_basin.ipynb`: Notebook for exporting samples for U-Net training for each basin.
  - `unet_sample_size_count.ipynb`: Calculates sample sizes for U-Net training, evaluation, and validation; updates the `training_records.csv` file.
  - `unet_evaluation.py`: Similar to `unet_train.py`; calculates performance metrics for each U-Net model and updates the `training_records.csv` file.
  - `UNET_EVALUATION_CONFIG.py`: Configuration settings for a single U-Net evaluation.
  - `unet_evaluation_update_config_and_run.py`: Similar to `update_config_unet_train_run.py`; used for batch performance metric calculation.
  - `selfee.py`: Defines service-account-related methods for automated authentication to work around network issues.
  - `projection_wkt_generation.ipynb`: Constructs customized Lambert Azimuthal Equal Area (LAEA) projections for each basin in the BasinATLAS lev02 product.
  - `hydrolakes_filter_by_bas.ipynb`: Exports lake boundaries for U-Net sample generation (not for final area calculation).
  - `gsw_export.ipynb`: Exports the occurrence and recurrence of GSW.
  - `gsw_occurrence_and_recurrence_mosaic.py`: Mosaics tiled GSW occurrence and recurrence maps.
  - `export_modis_and_gsw_image.ipynb`: Used for exporting MODIS and GSW images in LAEA projections and correct resolutions.
  - `draw_unet_train_history.py`: Draws training and validation curves for each U-Net model.
  - `add_final_decision_to_records.py`: Records the manually selected optimal epoch to the `training_records.csv` file.
- `global_lake_area/.devcontainer` (folder): Contains the `devcontainer.json` file that defines the container-based runtime environment for this manuscript.
- `global_lake_area/my_unet_definition` (folder)
  - `__init__.py`: Makes this folder a module.
  - `model.py`: Contains the implementation of the U-Net models (`attentionunet` was used).
  - `evaluation_metrics.py`: Contains the performance metrics and loss functions used (Intersection over Union, IoU).
- `global_lake_area/my_unet_gdal` (folder)
  - `__init__.py`: Makes this folder a module.
  - `reproject_to_target.py`: Deprecated.
  - `combined.py`: Deprecated.
  - `zonal_statistics.py`: Deprecated.
  - `reproject_to_target_tile.py`: Contains functions for clipping, reprojecting, and mosaicking large GeoTIFF files.
  - `generate_tfrecord_from_tile.py`: Contains functions for reprojecting, resampling, and converting GeoTIFF files to TFRecord format.
  - `align_to_target_tile.py`: Contains functions for geographically aligning and combining two rasters.
  - `unet_predictions.py`: Contains functions for using trained U-Net models to process converted TFRecords.
  - `reconstruct_tile_from_prediction.py`: Contains functions for converting serialized TFRecord files back to GeoTIFF tiles.
  - `area_calculation.py`: Contains functions for calculating areas from rasters using vector data as boundaries.
  - `quick_plotting.py`: Contains functions for drawing PNGs and GIFs from GeoTIFF files.
  - `quick_plotting_runner.py`: Command-line interface for `quick_plotting.py`; takes LAEA coordinates as input.
- `global_lake_area/batch_processing` (folder)
  - `__init__.py`: Makes this folder a module.
  - `batch_tfrecord_generation.py`: Command-line interface for batch generation of MODIS-converted TFRecord files.
  - `batch_unet_prediction.py`: Command-line interface for batch prediction using U-Net.
  - `batch_prediction_reconstruction.py`: Command-line interface for batch reconstruction of water mask maps.
  - `batch_mosaic.py`: Mosaics water mask tiles into a large GeoTIFF file.
  - `batch_full.py`: Combines multiple batch processing steps into one command-line interface.
  - `asynchronous_batch.py`: Asynchronously calls `batch_full.py` to maximize usage of available computing resources.
  - `BATCH_CONFIG.py`: Configuration for `batch_full.py` and `asynchronous_batch.py`.
  - `batch_area_calculation.py`: Command-line interface for batch area calculation from mosaicked water mask maps.
  - `AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (monthly lake surface extent results).
  - `MISSING_DATA_AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (monthly cloud contamination ratio results).
  - `MASKED_MY_WATER_AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (GSW-masked water mask map results).
  - `GSWR_AREA_CALCULATION_CONFIG.py`: Configuration for `batch_area_calculation.py` (GSW image results for validation).
  - `area_calculation_update_and_run.py`: Updates config files automatically and runs `batch_area_calculation.py`.
  - `load_config_module.py`: Used for reading config files written in `.py` format.
- `global_lake_area/my_plotting` (folder): Contains scripts for plotting the performance of the U-Net models trained in this study.
- `global_lake_area/my_spatial_analyze` (folder)
  - `__init__.py`: Makes this folder a module.
  - `area_postprocessing.py`: Contains functions for post-processing lake surface water extracted from U-Net-generated water mask maps.
  - `lake_wise_area_postprocessor.py`: Command-line interface for lake-wise post-processing of lake surface extent time series.
  - `LAKE_WISE_AREA_POSTPROCESSING_CONFIG.py`: Configuration for `lake_wise_area_postprocessor.py`.
  - `lake_wise_area_postprocess_update_and_run.py`: Updates config files automatically and runs `lake_wise_area_postprocessor.py`.
  - `lake_wise_lse_analyze.py`: Contains functions used for lake-wise plotting.
  - `lake_wise_plotting.ipynb`: Plotting for Supplementary Fig. 2.
  - `visualization.py`: Contains functions related to grid-wise plotting.
  - `main_grid.py`: Explorative grid-wise plotting (deprecated).
  - `lake_concatenator.py`: Combines the lake-wise time series of lake surface extent in each basin into one large file covering the 1.4 million lakes globally.
  - `glake_update_hydrolakes.py`: Contains functions and a command-line interface for updating HydroLAKES using GLAKES.
  - `hylak_buffering.py`: Removes duplicated lakes and creates buffer zones for the GLAKES-updated HydroLAKES.
  - `gsw_image_mosaic.py`: Command-line interface for mosaicking tiled GSW images into one large GeoTIFF file for validation.
  - `grid_concatenator.py`: Deprecated.
  - `grid_analyze.py`: Contains functions that perform a subset of the grid-level analysis.
  - `cloud_cover_ratio_calculater.py`: Command-line interface for calculating cloud cover ratios based on boundary size and monthly MODIS cloud-contamination area.
  - `basin_lse_calculation.py`: Deprecated.
  - `attach_geometry_and_generate_grid.py`: Contains functions that create a grid from global (or regional) lakes and calculate corresponding statistics.
  - `area_to_volume.py`: Deprecated.
  - `area_to_level.py`: Deprecated.
  - `AREA_TO_LEVEL_CONFIG.py`: Deprecated.
  - `area_to_level_batch_converter.py`: Deprecated.
  - `./data_analyze` (folder)
    - `./basin_wise_analysis` (folder)
      - `basin_wise_analysis.py`: Contains functions for basin-wise analysis and plotting.
      - `basin_wise_plotting.ipynb`: Plots basin-wise figures (including reservoir contribution).
      - `basinatlas_statistics_calculator.py`: Command-line interface for calculating statistics for BasinATLAS.
      - `hydrobasins_merger.py`: Merges multiple shapefiles of HydroBASINS.
      - `hydrobasins_statistics_calculator.py`: Command-line interface for calculating statistics for HydroBASINS.
    - `./climate_analysis` (folder)
      - `attach_aridity_index.py`: Adds the aridity index from LakeATLAS to the lake surface extent time series CSV.
    - `./correlation_analysis` (folder)
      - `plotting.ipynb`: Plots median relative changes in seasonality by lake size (part of Extended Data Fig. 1).
      - `correlation_plots.py`: Contains functions for plotting the relationships between multiple variables.
    - `./extreme_analysis` (folder)
      - `area_extreme_analysis.py`: Contains functions that identify seasonality-induced low-water extremes and perform other analyses.
      - `low_water_extreme_analysis.ipynb`: Adds extreme-related columns to the lake surface extent time series CSV.
      - `low_water_extreme_plotting.ipynb`: Plots seasonality-induced low-water extremes and seasonality dominance.
    - `./grid_wise_analysis` (folder): Contains plotting of changes in seasonality.
    - `./permafrost_analysis` (folder): Adds a permafrost type column to the lake surface extent time series CSV.
    - `./time_series_analysis` (folder): Plots long-term trends.
  - `./data_validation` (folder): Validates our data using GSW estimates and altimetry-based water levels.
- `global_lake_area/projection_wkt` (folder): Contains the LAEA projections used in this manuscript.