
Automatically Mapping Urban Green Space Using Sentinel-2 Imagery and Deep Learning Methods in Multiple Cities Worldwide: A Convolutional Neural Network Approach

The main workflow of this project is:

  • First, data preparation: importing satellite images and their corresponding masks (the ground truth marking which pixels are urban green space (UGS) and which are not), creating chips, balancing the data, and data augmentation.

  • Second, model building and training. This was done on 13 cities chosen from different continents: San Francisco, Seattle, Denver, Philadelphia, Greater Manchester, Dublin, Amsterdam, Ghent, Dhaka, Vancouver, Dallas, London, and Buffalo. Both a U-Net trained from scratch and U-Nets with pretrained backbones were tested, with multiple combinations of input bands.

  • Third, model validation and prediction on external cities. In this step, the best-performing models were validated on Washington D.C. and Tel Aviv, and used to predict on Kampala.

Data preparation

Satellite images were taken from Sentinel-2, downloaded from EarthExplorer, with the blue (B2), green (B3), red (B4), and near-infrared (NIR, B8) bands. From these bands, NDVI, NDWI, and NDBI were also calculated and used in training. Additionally, land-cover data was added as another layer.
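The three indices follow their standard definitions; the short sketch below shows how they can be computed from the band arrays (variable names are illustrative, and NDBI additionally assumes a SWIR band such as B11, which is not among the four bands listed above):

```python
import numpy as np

def compute_indices(green, red, nir, swir):
    """Standard normalized-difference indices from Sentinel-2 bands (float arrays)."""
    eps = 1e-8  # guard against division by zero
    ndvi = (nir - red) / (nir + red + eps)      # vegetation
    ndwi = (green - nir) / (green + nir + eps)  # water
    ndbi = (swir - nir) / (swir + nir + eps)    # built-up (assumes a SWIR band, e.g. B11)
    return ndvi, ndwi, ndbi
```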

Functions defined in this step include read_file(), normalize_by_layer(), create_chips(), and remove_images(); a detailed explanation is given in the table below. Python scripts are in 1_Data_preparation.

| Function | Description | Inputs | Parameters set in this study |
| --- | --- | --- | --- |
| read_file() | Reads the satellite image file and the corresponding mask of a given city as np.array | dir, city, image_file_name, park_file_name: directory and file names of each image | — |
| normalize_by_layer() | Normalizes the image data to a common range; since different layers have different scales, normalization is done layer by layer | image_array: np.array of the image | max set to 1 and min set to 0 |
| create_chips() | Creates chips for the satellite image and the corresponding mask | image_file, park_file: np.array of the satellite image and mask; patch_size: the number of pixels of each chip; step: the stride, i.e. the distance moved between chips | patch_size = 256 pixels; step = 32, 64, or 80 pixels depending on the city, so that each training city contributes a similar number of chips |
| remove_images() | Removes chips and corresponding masks with a high proportion of background (non-UGS pixels) to balance the data | image_dataset, park_dataset: np.array of the images and masks; threshold: chip pairs with a background proportion higher than the threshold are removed | threshold = 90% for Dhaka and 86% for the other cities, chosen as a trade-off between training-set size and degree of balance |
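A minimal sketch of how the chipping and balancing helpers can be implemented with the patchify library (the function bodies below are illustrative approximations, not the repository code; array shapes are assumptions):

```python
import numpy as np
from patchify import patchify

def create_chips_sketch(image, mask, patch_size=256, step=64):
    """Cut an (H, W, C) image and its (H, W) mask into aligned chips."""
    img_patches = patchify(image, (patch_size, patch_size, image.shape[2]), step=step)
    msk_patches = patchify(mask, (patch_size, patch_size), step=step)
    # Flatten the patch grid into (N, patch_size, patch_size, C) and (N, patch_size, patch_size)
    img_chips = img_patches.reshape(-1, patch_size, patch_size, image.shape[2])
    msk_chips = msk_patches.reshape(-1, patch_size, patch_size)
    return img_chips, msk_chips

def remove_images_sketch(image_dataset, park_dataset, threshold=0.86):
    """Drop chip pairs whose background (non-UGS) proportion exceeds the threshold."""
    background = (park_dataset == 0).mean(axis=(1, 2))  # per-chip background fraction
    keep = background <= threshold
    return image_dataset[keep], park_dataset[keep]
```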

Examples of image chip pairs are shown below.

Techniques chosen for data augmentation are (1) random rotation within an angle of 45 degrees; (2) random width and height shift within a range of 20%; (3) random horizontal and vertical flips; (4) random zooming in and out within a range of 20%. This was done with Keras' ImageDataGenerator class. Python scripts are in 2_Data_augmentation. An example of augmented chips is shown below.
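A sketch of such an augmentation setup with ImageDataGenerator (the variable names image_chips/mask_chips, the seed, and the batch size are assumptions; masks need a trailing channel axis for flow()):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings matching the techniques listed above.
aug_args = dict(
    rotation_range=45,       # (1) random rotation within 45 degrees
    width_shift_range=0.2,   # (2) random width shift within 20%
    height_shift_range=0.2,  # (2) random height shift within 20%
    horizontal_flip=True,    # (3) random horizontal flip
    vertical_flip=True,      # (3) random vertical flip
    zoom_range=0.2,          # (4) random zoom in/out within 20%
)

image_datagen = ImageDataGenerator(**aug_args)
mask_datagen = ImageDataGenerator(**aug_args)

seed = 42  # using the same seed keeps image and mask batches aligned
image_generator = image_datagen.flow(image_chips, seed=seed, batch_size=16)
mask_generator = mask_datagen.flow(mask_chips, seed=seed, batch_size=16)
train_generator = zip(image_generator, mask_generator)
```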

Model training

U-Net model from scratch

The U-Net model from scratch was built in 3_Model_UNet, with the numbers of filters chosen as (64, 128, 256, 512, 1024). Due to limited system memory, eight three-band combinations were used instead of all 8 layers at once. Model performance for the different combinations is:

| Combination of bands | OA | IoU | F-score | AUC |
| --- | --- | --- | --- | --- |
| Red-Green-Blue | 0.8329 | 0.5394 | 0.5438 | 0.6894 |
| Red-Green-NIR | 0.8531 | 0.6425 | 0.6227 | 0.7373 |
| NDVI-Red-NIR | 0.8724 | 0.6429 | 0.6921 | 0.7745 |
| NDWI-Red-NIR | 0.8977 | 0.7185 | 0.7819 | 0.8457 |
| NDBI-Red-NIR | 0.8498 | 0.6072 | 0.6583 | 0.7599 |
| NDVI-NDWI-NDBI | 0.8515 | 0.5846 | 0.6126 | 0.7252 |
| NDVI-NDWI-landcover | 0.8502 | 0.5964 | 0.6365 | 0.7431 |
| NDVI-NDBI-landcover | 0.8914 | 0.7044 | 0.7682 | 0.8368 |
| Average | 0.8624 | 0.6295 | 0.6645 | 0.7640 |
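For orientation, a compact Keras sketch of a from-scratch U-Net with the filter sizes above is given below (a generic implementation under assumed loss and optimizer settings, not necessarily identical to the code in 3_Model_UNet):

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU, as in the standard U-Net."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 3), filters=(64, 128, 256, 512, 1024)):
    inputs = layers.Input(input_shape)
    x, skips = inputs, []
    for f in filters[:-1]:                 # encoder: conv block + 2x2 max pooling
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, filters[-1])         # bottleneck
    for f, skip in zip(reversed(filters[:-1]), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skip])  # decoder: upsample + skip connection
        x = conv_block(x, f)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # binary UGS mask
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```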

U-Net model with pretrained backbones

Compared with the U-Net model from scratch, a pretrained backbone can increase model performance and make the training converge faster. ResNet-50 and VGG-16 pretrained on ImageNet were used as the encoder part of the U-Net model in 3_Model_UNet_with_backbones, with performance on the training cities shown as:

| Combination of bands | ResNet-50 (OA / IoU / F-score / AUC) | VGG-16 (OA / IoU / F-score / AUC) |
| --- | --- | --- |
| Red-Green-Blue | 0.9708 / 0.9220 / 0.9590 / 0.9610 | 0.2535 / 0.1243 / 0.2001 / 0.5013 |
| Red-Green-NIR | 0.9718 / 0.9243 / 0.9602 / 0.9596 | 0.7508 / 0.3769 / 0.4319 / 0.4996 |
| NDVI-Red-NIR | 0.9709 / 0.9223 / 0.9591 / 0.9608 | 0.7259 / 0.3857 / 0.4588 / 0.4952 |
| NDWI-Red-NIR | 0.9712 / 0.9225 / 0.9592 / 0.9616 | 0.2804 / 0.1456 / 0.2409 / 0.5063 |
| NDBI-Red-NIR | 0.9704 / 0.9208 / 0.9583 / 0.9607 | 0.7439 / 0.3766 / 0.4348 / 0.4980 |
| NDVI-NDWI-NDBI | 0.9691 / 0.9178 / 0.9566 / 0.9565 | 0.2476 / 0.1222 / 0.1960 / 0.5000 |
| NDVI-NDWI-landcover | 0.9660 / 0.9107 / 0.9526 / 0.9532 | 0.6020 / 0.3581 / 0.4834 / 0.4768 |
| NDVI-NDBI-landcover | 0.9656 / 0.9100 / 0.9522 / 0.9536 | 0.2468 / 0.1222 / 0.1961 / 0.5000 |
| Average | 0.9695 / 0.9188 / 0.9572 / 0.9584 | 0.5139 / 0.2515 / 0.3303 / 0.4972 |

The ResNet-50 backbone added considerable benefit to the model, while VGG-16 made the model perform worse. The convergence histories of the three models are plotted below. ResNet-50 helps the training converge faster, whereas VGG-16 has difficulty converging within the 50 epochs.
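A backbone U-Net of this kind can be instantiated directly with the segmentation_models library linked under Main references; a sketch (loss, metrics, and training data names are assumptions):

```python
import segmentation_models as sm

sm.set_framework("tf.keras")

# U-Net whose encoder is a ResNet-50 pretrained on ImageNet;
# swap "resnet50" for "vgg16" to reproduce the second configuration.
model = sm.Unet(
    backbone_name="resnet50",
    encoder_weights="imagenet",
    input_shape=(256, 256, 3),  # one of the three-band combinations above
    classes=1,
    activation="sigmoid",
)
model.compile(
    optimizer="adam",
    loss=sm.losses.bce_jaccard_loss,
    metrics=[sm.metrics.iou_score, "accuracy"],
)
model.fit(X_train, y_train, validation_split=0.2, batch_size=16, epochs=50)
```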

Validation & prediction on external cities

The U-Net with a pretrained ResNet-50 backbone was used for validation. The average OA, IoU, F-score, and AUC are 0.8743, 0.6185, 0.7236, and 0.7429 for Washington D.C., and 0.8790, 0.4954, 0.5660, and 0.5639 for Tel Aviv, showing a moderate to good generalization capacity of our model.
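For clarity, the four metrics can be computed per city roughly as follows (a sketch; y_prob holds the model's sigmoid outputs and y_true the binary ground-truth masks, both assumed names):

```python
from sklearn.metrics import accuracy_score, jaccard_score, f1_score, roc_auc_score

y_true_flat = y_true.ravel().astype(int)
y_prob_flat = y_prob.ravel()
y_pred_flat = (y_prob_flat > 0.5).astype(int)  # threshold the sigmoid outputs

oa = accuracy_score(y_true_flat, y_pred_flat)    # overall accuracy
iou = jaccard_score(y_true_flat, y_pred_flat)    # intersection over union
f1 = f1_score(y_true_flat, y_pred_flat)          # F-score
auc = roc_auc_score(y_true_flat, y_prob_flat)    # area under the ROC curve
print(f"OA={oa:.4f}  IoU={iou:.4f}  F-score={f1:.4f}  AUC={auc:.4f}")
```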

We also used Washington D.C. as an example to explore where our model performed well and where it could be improved, with results shown below. The model made good predictions for clumped UGSs, which are circled in red. However, for UGSs that have buildings inside, for instance the blue areas in the rectangle, our model did not give an accurate identification. Additionally, the model generated some UGSs that are not in the ground-truth dataset, mainly golf courses, national parks, and cemeteries, which are not freely accessible to the public or do not fall under the strict definition of UGS.

Python scripts for this step are documented in 4_Prediction_and_save_as_tiff. Predictions were made for each image chip, the predicted chips were merged into one numpy array, and this array was then converted into a TIFF file using the given metadata. The main functions used were unpatchify from the patchify library and the custom array2raster.
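A minimal sketch of this merge-and-export step (assuming non-overlapping prediction chips of 256 x 256 pixels and a source raster source.tif for the georeferencing; the repository's array2raster() wraps similar GDAL calls):

```python
import numpy as np
from osgeo import gdal
from patchify import unpatchify

# pred_chips: predicted masks arranged as a grid of shape (n_h, n_w, 256, 256)
n_h, n_w = pred_chips.shape[:2]
full_prediction = unpatchify(pred_chips, (n_h * 256, n_w * 256))

src = gdal.Open("source.tif")                        # original Sentinel-2 raster
rows, cols = full_prediction.shape
out = gdal.GetDriverByName("GTiff").Create("prediction.tif", cols, rows, 1, gdal.GDT_Byte)
out.SetGeoTransform(src.GetGeoTransform())           # copy georeferencing metadata
out.SetProjection(src.GetProjection())
out.GetRasterBand(1).WriteArray(full_prediction.astype(np.uint8))
out.FlushCache()
```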

Multi-city solution

If you are training the model on multiple cities, you can use Google Earth Engine to automatically download satellite images, crop them into chips, and create the corresponding mask chips. The main packages used were ee and gdal, with scripts in 5_Multi_city_solution_with_GEE.
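A sketch of the Earth Engine part of this workflow (the boundary asset, date range, and cloud threshold are placeholders for illustration):

```python
import ee

ee.Initialize()

city = ee.FeatureCollection("users/your_account/city_boundary")  # hypothetical asset

# Cloud-filtered median Sentinel-2 surface-reflectance composite over the city.
s2 = (
    ee.ImageCollection("COPERNICUS/S2_SR")
    .filterBounds(city)
    .filterDate("2021-06-01", "2021-09-01")
    .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 10))
    .median()
    .select(["B2", "B3", "B4", "B8"])
    .clip(city.geometry())
)

# Export at 10 m resolution; local chipping with gdal then follows the same logic as create_chips().
task = ee.batch.Export.image.toDrive(
    image=s2,
    description="sentinel2_city",
    scale=10,
    region=city.geometry(),
    maxPixels=1e9,
)
task.start()
```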

Examples of the whole training process are shown in the Notebook folder.

Main references

Literature:

Mapping Urban Green Spaces at the Metropolitan Level Using Very High Resolution Satellite Imagery and Deep Learning Techniques for Semantic Segmentation, https://doi.org/10.3390/rs13112031

An Automatic Extraction Architecture of Urban Green Space Based on DeepLabv3plus Semantic Segmentation Model, https://doi.org/10.1109/ICIVC47709.2019.8981007

Using convolutional networks and satellite imagery to identify patterns in urban environments at a large scale, https://doi.org/10.48550/ARXIV.1704.02965

Code:

https://segmentation-models.readthedocs.io/en/latest/api.html#unet

https://github.com/bnsreenu/python_for_microscopists

https://geemap.org/notebooks/96_image_chips/
