
Using Airborne Hyperspectral Data for Satellite Image Classification with Principal Components and k‑Means #987

Description

@bhass-neon

What is the objective of the proposed tutorial?

In this tutorial, you will learn how to apply Principal Component Analysis (PCA) in Google Earth Engine to reduce the National Ecological Observatory Network's (NEON's) airborne reflectance data from hundreds of bands to a compact set of uncorrelated components. You will build a reproducible, memory-efficient workflow using representative sampling, export and reload PCA outputs as Earth Engine assets, and visualize and interpret the key principal components. You will then run k-means clustering on the PCA-transformed data together with NEON's airborne lidar-derived Canopy Height Model (CHM) to perform an unsupervised classification, and assign a label to each cluster. Finally, you will scale up by using those labels to train a model on cloud-free Sentinel-2 or Harmonized Landsat and Sentinel-2 (HLS) imagery collected within ±14 days of the AOP overpass, producing broader-scale maps. This tutorial illustrates a scaling application: high-resolution airborne data (1 m spatial resolution, 426 spectral bands) is used to supervise the classification of coarser multispectral satellite imagery, which offers higher revisit rates and broader spatial coverage.
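The band reduction described above can be illustrated outside Earth Engine with a minimal sketch in plain Python. It is not the tutorial's Earth Engine code: the synthetic four-band samples, the helper names (`covariance_matrix`, `top_eigenpairs`, `explained_variance`), and the use of power iteration with deflation are all assumptions chosen for illustration. The tutorial itself would work on sampled reflectance pixels and on Earth Engine's own array operations.

```python
import math
import random


def covariance_matrix(samples):
    """Band-by-band sample covariance of a list of pixel spectra."""
    n = len(samples)
    n_bands = len(samples[0])
    means = [sum(px[b] for px in samples) / n for b in range(n_bands)]
    cov = [[0.0] * n_bands for _ in range(n_bands)]
    for px in samples:
        centered = [px[b] - means[b] for b in range(n_bands)]
        for i in range(n_bands):
            for j in range(n_bands):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Approximate the k largest eigenpairs of a symmetric positive
    semi-definite matrix using power iteration with deflation."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        vec = [rng.uniform(0.5, 1.0) for _ in range(size)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(size))
                   for i in range(size)]
            norm = math.sqrt(sum(v * v for v in nxt))
            if norm == 0.0:
                break  # no variance left in the deflated matrix
            vec = [v / norm for v in nxt]
        # Rayleigh quotient gives the eigenvalue estimate for this vector.
        mv = [sum(work[i][j] * vec[j] for j in range(size))
              for i in range(size)]
        value = sum(vec[i] * mv[i] for i in range(size))
        pairs.append((value, vec))
        # Deflate: remove the component just found before the next pass.
        for i in range(size):
            for j in range(size):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def explained_variance(cov, eigenvalues):
    """Fraction of total variance (the trace) captured by each eigenvalue."""
    total = sum(cov[i][i] for i in range(len(cov)))
    return [val / total for val in eigenvalues]


# Synthetic "pixels": four correlated bands driven by one latent signal
# plus a little noise (hypothetical data, not NEON reflectance).
rng = random.Random(42)
weights = (1.0, 0.8, 0.6, 0.4)
samples = []
for _ in range(200):
    t = rng.gauss(0.0, 1.0)
    samples.append([t * w + rng.gauss(0.0, 0.05) for w in weights])

cov = covariance_matrix(samples)
pairs = top_eigenpairs(cov, k=2)
fractions = explained_variance(cov, [val for val, _ in pairs])
print("explained variance per component:", fractions)
```

Because the four bands are almost perfectly correlated, nearly all of the variance lands in the first component, which is the same effect that lets hundreds of reflectance bands collapse into a handful of principal components.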

What is the scope of the proposed tutorial?

Themes: ecology, forestry, land cover classification, PCA, k-Means, hyperspectral, lidar
Geographic area: United States (site is still TBD)

Please provide an outline of the structure of the proposed tutorial?

  1. Data and Study Area
    a. Brief introduction to NEON's Airborne Observation Platform and the NEON datasets on the GEE Publisher Catalog
    b. Load hyperspectral image (projects/neon-prod-earthengine/assets/HSI_REFL/002) and visualize natural color RGB + CHM (projects/neon-prod-earthengine/assets/CHM/001) data at the site
    c. Acquisition date noted for ±14-day alignment to cloud-free satellite data
  2. Compute Principal Component Analysis (PCA) on reflectance data and add CHM
    a. Sampling strategy for PCA and discussion of memory trade-offs
    b. Helper functions (band naming; covariance/eigen decomposition; explained variance)
    c. Apply PCA, select number of PCs (e.g., top 5) and compute explained variance
    d. Export PCA image as an Earth Engine asset
    e. CHM integration: standardize the CHM for later use with the PCs
    f. Checks and troubleshooting steps (masks, no-data handling, runtime)
  3. Interpret Principal Components and Unsupervised k-Means Clustering + CHM
    a. Visualize and interpret PCs
    b. Build standardized feature stack: [PC1..PCn] + CHM (all z-scored)
    c. Human label assignment per cluster: review RGB, IR bands + CHM; assign a semantic label to each cluster
    d. Determine cluster confidence and apply de-noising; keep only high-confidence pixels
    e. Create and export the high-confidence, labeled AOP raster (cluster-to-class mapping) for scaling
  4. Scale Up to Multispectral Satellite Data (HLS or Sentinel-2)
    a. Choose data source and assess trade-offs: HLS v2 (30 m, harmonized, better coverage) vs Sentinel-2 SR (10 m, finer detail if clear)
    b. Build cloud-free satellite composite within ±14 days of the AOP collection
    c. Integrate CHM at satellite scale and add as a predictor
    d. Transfer AOP labels with purity filters; rescale to the coarser resolution of the satellite data
    e. Stratified random sample of labeled pixels (at satellite grid)
    f. Supervised classifier trained on satellite bands (+ CHM): fit a Random Forest model (or similar) on sampled points
    g. Accuracy assessment: evaluate against a held-out test set of NEON-derived labels (aggregated to satellite grid)
    h. Apply the model at scale and export: classify the region of interest; export class map and (optionally) class probability layers
  5. Quality Assurance, Troubleshooting, and Iteration
    a. Diagnose class-wise errors; explain options for adjusting k, the confidence threshold, and the purity threshold
    b. Consider widening the ±14-day window (balanced against phenology)
    c. Address domain shift considerations with multi-site/multi-date training data
    d. Reminder: This lesson includes cluster-level human labeling; more granular, polygon-based labeling and advanced supervised training workflows are out of scope but are referenced with external resources (e.g., a link to a tutorial for creating training data from NEON observational data sets).
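As a rough illustration of step 4d, transferring labels with a purity filter can be sketched in plain Python as below. This is an assumption about how such a filter might work, not the tutorial's Earth Engine implementation: the function name `transfer_labels`, the `min_purity` parameter, and the choice to count masked fine pixels against purity are all hypothetical. Each coarse cell keeps its majority class only if that class covers at least `min_purity` of the fine pixels inside the cell.

```python
from collections import Counter


def transfer_labels(fine_labels, block, min_purity=0.8, nodata=None):
    """Aggregate a fine-resolution label grid to coarse cells of
    block x block pixels, keeping only sufficiently pure cells.

    Purity is the share of the majority class among ALL fine pixels in
    the cell, so masked (nodata) pixels lower the purity (an assumption).
    """
    rows, cols = len(fine_labels), len(fine_labels[0])
    coarse = []
    for r0 in range(0, rows - block + 1, block):
        out_row = []
        for c0 in range(0, cols - block + 1, block):
            cell = [fine_labels[r][c]
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            valid = [v for v in cell if v != nodata]
            if not valid:
                out_row.append(nodata)
                continue
            label, count = Counter(valid).most_common(1)[0]
            purity = count / len(cell)
            out_row.append(label if purity >= min_purity else nodata)
        coarse.append(out_row)
    return coarse


# Toy 4 x 4 grid of cluster-derived class labels (None = masked pixel).
fine = [
    [1, 1, 2, 2],
    [1, 1, 2, 3],
    [None, 1, 3, 3],
    [1, 1, 3, 3],
]

print(transfer_labels(fine, block=2, min_purity=0.75))
print(transfer_labels(fine, block=2, min_purity=0.8))
```

With 1 m AOP labels, `block` would correspond to roughly 10 or 30 fine pixels per side for Sentinel-2 or HLS respectively. Raising `min_purity` trades the number of training cells for cleaner labels, which is one of the thresholds step 5a proposes revisiting.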

Notes:

  • See Principal Component Analysis of AOP Hyperspectral Data in GEE, or the markdown file on the NEON-Data-Skills GitHub, for an existing tutorial showcasing the first part of this lesson (steps 1-3a).

  • If this is too much content or too involved for a single lesson, we could make one lesson out of steps 1-3 and later create a separate scaling application lesson that uses the NEON Canopy Nitrogen Dataset (projects/neon-prod-earthengine/assets/CNC/002) as the training data for generating larger-scale satellite models. The CNC dataset is still in beta, so we would like to wait 3-6 months before creating a lesson that uses it.

In what format will you be submitting the tutorial?

Markdown

This request will be reviewed by the Earth Engine community maintainers, who will reply on this issue tracker with any questions or suggestions. Once approved, this issue will be assigned to you and you can begin work on the tutorial following instructions in Writing a tutorial. When creating your Pull Request, enter "Closes #issueno" in the description of your Pull Request to link the tutorial to this issue.
