A catalog of this nature would make the Earth searchable, reveal crucial changes, and democratize data access
Authors: Jason Gilman, Adeel Hassan, Nathan Zimmerman
Massive Earth Observation (EO) archives containing hundreds of petabytes of data offer huge potential for understanding the Earth’s resources and potential hazards to security while also addressing societal challenges. Even so, extracting actionable insights remains difficult. Working with this data requires specialized knowledge of machine learning and geospatial analysis as well as significant computational resources. While the size and value of these archives are constantly increasing, we are not making meaningful progress fast enough to handle the challenges of our changing world.
We propose creating a centralized vector embeddings catalog that enables users to extract insights from EO data. EO Foundation Models, having been trained on millions of geographically diverse data points, can convert raw EO data into vector embeddings, a compressed semantic representation of the data. By indexing these vector embeddings into a catalog, we enable users to query directly for complex phenomena such as agricultural areas showing drought stress, illegal mining operations, or construction projects in flood zones, without specialized technical knowledge. Additionally, users can search the catalog for relevant changes over time, such as an increase in conditions leading to potential wildfires.
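To make the idea concrete, the following is a minimal sketch of the two operations described above: similarity search over a set of embeddings and a per-tile change score between two dates. It is not the design proposed in the paper. The function names (`normalize`, `search`, `change_scores`) are hypothetical, random vectors stand in for foundation-model output, the query is an example embedding rather than a natural-language prompt, and a brute-force comparison replaces the approximate nearest-neighbor index that a catalog of this scale would presumably need.

```python
import numpy as np


def normalize(vectors):
    """Scale each vector to unit length so dot products equal cosine similarity."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def search(catalog, query, k=3):
    """Return indices and cosine similarities of the k catalog rows closest to query."""
    scores = normalize(catalog) @ normalize(query)
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]


def change_scores(before, after):
    """Cosine distance between embeddings of the same tiles at two dates."""
    return 1.0 - np.sum(normalize(before) * normalize(after), axis=-1)


# Toy data (assumption): 1000 tiles with 64-dimensional embeddings.
rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 64))

# Query by example: a slightly perturbed copy of tile 42's embedding.
query = catalog[42] + 0.05 * rng.normal(size=64)
idx, sims = search(catalog, query)

# Change detection: the same tiles at a later date, with tile 7 altered.
later = catalog.copy()
later[7] = -later[7]
changed_tile = int(np.argmax(change_scores(catalog, later)))
```

In this toy setup, tiles that look alike to the model end up close together in embedding space, so a single example is enough to retrieve similar tiles, and a large distance between a tile's embeddings at two dates flags it for closer inspection.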
We argue that this effort aligns well with NASA’s declared goals of accelerating data discovery and democratizing data access, and that NASA is uniquely positioned to undertake it. We further lay out a phased program for building this catalog and discuss important challenges and risks that will need to be addressed along the way.
See the PDF document in this repository for the full paper.
We'd like to thank the Earth Genome team, Dan Pilone (Element 84), Julia Signell (Element 84), Ryan Abbott (Element 84), and Sara Mack (Element 84) for their input on and reviews of this paper.