Final Report
- Project: AI-ready Dataset Metadata as a Service
- Organization: OSGeo (Open Source Geospatial Foundation), ZOO-Project
This GSoC project focused on implementing native support for GeoCroissant metadata to enable AI-ready geospatial datasets. The implementation provides functionalities for metadata generation, transformation, and validation, as well as integration pathways with platforms like STAC, Earth Engine, and Hugging Face. It also introduces Data-Centric AI (DCAI) workflows to assess and improve dataset quality, addressing issues such as annotation errors and bias. This enhancement improves metadata interoperability, streamlines data preparation for machine learning, and lays the foundation for future cross-platform adoption.
While the ZOO-Project already offers solid support for OGC-compliant geoprocessing, it currently has no built-in support for GeoCroissant, a metadata standard designed specifically for AI-ready geospatial datasets. ZOO has no tools to help users create or validate this kind of metadata, or to connect easily with existing platforms like STAC and Earth Engine or machine learning hubs like Hugging Face and Kaggle. It also lacks workflows that help users check the quality of their training data or fix common issues like annotation errors or bias. This project aims to fill those gaps and bring these much-needed features to the ZOO-Project.
This project introduces Data-Centric AI (DCAI) workflows with support for GeoCroissant metadata, enabling more effective creation, transformation, and validation of AI-ready geospatial datasets. It helps users improve dataset quality, streamline data preparation for machine learning, and ensure better interoperability with existing standards and platforms.
- Simplified generation and conversion of GeoCroissant-compliant metadata.
- Enhanced data quality assessment through DCAI-focused tools.
- Improved discoverability and usability of datasets for AI workflows.
- A foundation for seamless integration with broader geospatial and AI ecosystems.
For detailed implementation examples and workflows, please refer to the Log of Pull Requests section below.
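As a rough illustration of the kind of conversion listed above, here is a minimal sketch that maps a STAC Item to a Croissant-style JSON-LD record. It is not the converter from the pull requests. The function name `stac_item_to_croissant` and the sample item are hypothetical, the `@context` is simplified, and the spatial and temporal fields use plain schema.org properties (`spatialCoverage`, `temporalCoverage`) as a stand-in, so the actual GeoCroissant vocabulary may differ.

```python
import json
from typing import Any


def stac_item_to_croissant(item: dict[str, Any]) -> dict[str, Any]:
    """Map a STAC Item dict to a minimal Croissant-style JSON-LD dict.

    Hypothetical helper for illustration only; it is not part of
    mlcroissant or of the project's converters.
    """
    bbox = item["bbox"]
    if len(bbox) != 4:
        raise ValueError("this sketch only handles 2D bounding boxes")
    west, south, east, north = bbox  # STAC order: west, south, east, north
    props = item.get("properties", {})

    # One Croissant FileObject per STAC asset.
    distribution = [
        {
            "@type": "cr:FileObject",
            "@id": key,
            "name": key,
            "contentUrl": asset["href"],
            "encodingFormat": asset.get("type", "application/octet-stream"),
        }
        for key, asset in item.get("assets", {}).items()
    ]

    return {
        # Simplified context (assumption): the full Croissant context has more terms.
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "sc:Dataset",
        "dct:conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": item["id"],
        "description": props.get("description", ""),
        # schema.org GeoShape boxes are written "south west north east".
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        },
        "temporalCoverage": props.get("datetime"),
        "distribution": distribution,
    }


# Made-up STAC Item used only to exercise the sketch.
sample_item = {
    "type": "Feature",
    "id": "example-scene",
    "bbox": [72.8, 18.9, 73.0, 19.3],
    "properties": {"datetime": "2025-06-01T00:00:00Z"},
    "assets": {
        "image": {"href": "https://example.com/scene.tif", "type": "image/tiff"}
    },
}

if __name__ == "__main__":
    print(json.dumps(stac_item_to_croissant(sample_item), indent=2))
```

Keeping the mapping as a pure function over dictionaries makes it easy to unit test and to swap in the real GeoCroissant property names once they are known.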
Now that the core integrations are complete, I aim to expand support to platforms like OpenML by adding geospatial attribute add-ons, and to finalize pending integrations such as compatibility with the Hugging Face dataset viewer. This will make the ecosystem more robust and accessible and improve its adoption potential. I also plan to explore further improvements based on community feedback and extended testing.
- Real-World Experience: GSoC 2025 and my work with the ZOO-Project gave me the opportunity to contribute to meaningful, real-world projects in the open-source ecosystem. This experience helped me bridge the gap between theory and practice by applying my knowledge to create AI-ready geospatial metadata tools.
- Hands-On Coding: Throughout the project, I wrote code, developed new features, resolved bugs, and worked on improving the integration of GeoCroissant. These practical tasks sharpened my problem-solving skills and deepened my understanding of software development workflows.
- Mentorship: Collaborating with experienced mentors was a significant part of this journey. Their guidance and constructive feedback not only helped me overcome technical challenges but also shaped the way I approach complex problems.
- Collaboration: Working within a distributed team taught me the importance of collaborative development practices. I became comfortable with version control systems, code reviews, and pull request workflows, all essential skills in modern open-source projects.
- Networking: This project connected me with a vibrant global network of developers, researchers, and open-source enthusiasts in the geospatial AI field. Building these connections has opened up possibilities for future collaborations and valuable professional relationships.
- Project Management: Managing tasks, planning milestones, and delivering updates on time were crucial aspects of this experience. It taught me how to stay organized and maintain steady progress in a dynamic development environment.
- Communication Skills: Regularly documenting my work and sharing progress with mentors and the community helped me improve my written and professional communication. Clear communication made collaboration smoother and ensured everyone stayed aligned.
- Open-Source Culture: Being part of this project reinforced the importance of transparency, community involvement, and collaborative growth, values that I will carry forward in my future open-source contributions.
Link to all the Pull Requests made in GSoC Project
| Pull Request / Commit | Description | Date | Status |
|---|---|---|---|
| Introduction to Geocroissant | Introduction and basics of GeoCroissant | 08/06/2025 | OK |
| GeoCroissant to STAC | GeoCroissant to STAC conversion | 15/06/2025 | OK |
| GeoCroissant to GeoDCAT | GeoCroissant to GeoDCAT transformation | 22/06/2025 | OK |
| STAC to GeoCroissant | STAC to GeoCroissant mapping | 29/06/2025 | OK |
| #3212: GeoCroissant support for HF | GeoCroissant support in Hugging Face's dataset viewer | 07/07/2025 | Open |
| Datacube to GeoCroissant | Datacube to GeoCroissant conversion | 13/07/2025 | OK |
| OGC-TDML to GeoCroissant Support | OGC-TDML to GeoCroissant support | 20/07/2025 | OK |
| NASA-UMM to GeoCroissant | NASA-UMM to GeoCroissant mapping | 27/07/2025 | OK |
| CEDA UK to GeoCroissant Support | CEDA UK to GeoCroissant support | 30/08/2025 | OK |
| e19ded4: mlcroissant[geo] extension for geospatial support | mlcroissant[geo] extension for geospatial support | 11/08/2025 | Merge in progress |
| b42f1e3: Functionality to geo/converters.py for enhanced geospatial data conversion | New functionality in geo/converters.py for enhanced geospatial data conversion | 17/08/2025 | Merge in progress |
| a437352: Nasa UMM-G Converter | NASA UMM-G converter | 25/08/2025 | Merge in progress |
- Weekly Reports & Updates: View Weekly Reports
- Project Proposal: Google Docs Proposal
- GitHub Repository: DCAI
- Project Wiki: Project Wiki Home
- Final Report: View Final Report
I am truly grateful to be a part of the wonderful GSoC and OSGeo communities. This program has been an incredible learning journey, helping me enhance my understanding of open-source contributions and significantly improve my programming skills. I would like to express my heartfelt thanks to my mentors, Gérald Fenoy and Chetan Mahajan, for their incredible guidance and continuous support throughout the program, as well as to the entire community for their encouragement and constructive feedback during my GSoC journey.
Thank you and Regards,
Harsh Shinde