Ivan Felipe Rodriguez¹ 🌿,
Thomas Fel¹,² 🌿,
Gaurav Gaonkar¹,
Mohit Vaishnav¹,
Herbert Meyer³,
Peter Wilf⁴,
Thomas Serre¹ 🍂
¹ Center for Computational Brain Science, Brown University
² Kempner Institute, Harvard University
³ Florissant Fossil Beds National Monument, National Park Service
⁴ Department of Geosciences, Pennsylvania State University
🌿 Joint first authors | 🍂 Corresponding author
Leaf Lens is the companion platform to our study "Decoding Fossil Leaves with Artificial Intelligence: An application to the Florissant Formation". This website provides an interactive exploration of how deep neural networks learn to classify fossil angiosperm leaves—one of paleobotany's most persistent challenges.
Our deep learning framework overcomes data scarcity by augmenting sparse fossil data with synthetic examples and aligning the extant and fossil leaf domains through representation learning. We demonstrate this approach on the late Eocene Florissant flora of Colorado, achieving well over 90% top-5 accuracy for family-level classification across 142 dicot angiosperm families, compared to a top-5 chance level of just 3.5% (5 guesses out of 142 families).
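The 3.5% chance level follows from top-5 scoring over 142 families: a uniform random guesser allowed five distinct guesses is right with probability 5/142. The sketch below works through that arithmetic and shows one way top-k accuracy could be computed. The function names and toy scores are illustrative assumptions, not code from the study.

```python
def top_k_chance(num_classes: int, k: int) -> float:
    """Chance that k distinct uniform random guesses include the true class."""
    return min(k, num_classes) / num_classes


def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per sample (hypothetical model outputs).
    labels: the true class index for each sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# 142 families, five guesses per specimen.
chance = top_k_chance(142, 5)
print(f"Top-5 chance over 142 families: {chance:.1%}")

# Toy data: two samples, three classes (made up for illustration).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
toy_labels = [2, 0]
print(top_k_accuracy(toy_scores, toy_labels, k=1))
print(top_k_accuracy(toy_scores, toy_labels, k=2))
```

For comparison, top-1 chance over the same 142 families would be 1/142, or about 0.7%.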
Our primary objective is to leverage Explainable AI techniques to understand the concepts that matter most for neural networks when classifying leaves. By revealing these concepts, we aim to provide:
- Insights into the model's decision-making process, identifying the key features used for classification
- A deeper understanding of the relationships between biological taxonomy and computational representations
- Visual and interactive tools for exploring how concepts and families are structured within the learned representations
Our system addresses a fundamental challenge: the extreme scarcity of taxonomically vetted fossil specimens. While modern leaf specimens are abundant, fossilization processes—compression, mineralization, fragmentation—create a challenging domain shift between living and fossil forms.
- Number of families: 142 dicot angiosperm families
- Total dataset: Over 34,000 images (extant and fossil leaves)
- Florissant fossils: 3,200 taxonomically vetted specimens spanning 23 families
- Classification performance: Well over 90% top-5 accuracy (chance: 3.5%)
- Discovered concepts: 2,000+ unique visual concepts extracted via sparse dictionary learning
The website provides:
- Interactive visualizations of over 2,000 learned concepts and their relations in embedding space
- Family-level exploration of 142 dicot families with representative samples and explanatory maps
- Concept pages presenting feature visualizations, top activating examples, and their taxonomic relevance
- Comparisons between real fossils and high-fidelity synthetic fossils used for generative augmentation
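For readers unfamiliar with sparse dictionary learning, the following is a minimal sketch of its sparse coding step only: given a fixed dictionary of "concept" directions, it finds a sparse set of coefficients that reconstructs one activation vector. It is not the study's pipeline. The solver (ISTA, a standard proximal gradient method), the toy dictionary, and the penalty value `lam` are all assumptions chosen for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(x, dictionary, lam=0.1, step=1.0, iters=100):
    """ISTA for: minimize 0.5 * ||x - z D||^2 + lam * ||z||_1 over z.

    x: activation vector (list of d floats).
    dictionary: k atoms, each a list of d floats (rows of D).
    step: should not exceed 1 / (largest eigenvalue of D D^T) for convergence.
    """
    k, d = len(dictionary), len(x)
    z = [0.0] * k
    for _ in range(iters):
        # Reconstruction z D and its residual against x.
        recon = [sum(z[j] * dictionary[j][i] for j in range(k)) for i in range(d)]
        residual = [recon[i] - x[i] for i in range(d)]
        # Gradient of the squared-error term with respect to each coefficient.
        grad = [sum(residual[i] * dictionary[j][i] for i in range(d)) for j in range(k)]
        # Gradient step followed by soft-thresholding, which zeroes small coefficients.
        z = [soft_threshold(z[j] - step * grad[j], step * lam) for j in range(k)]
    return z


# Toy setup: three orthonormal atoms (hypothetical concepts) in a 3-d space.
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_activation = [1.0, 0.05, 0.0]
codes = sparse_code(toy_activation, toy_dictionary, lam=0.1)
print(codes)
```

In this toy case only the first coefficient stays nonzero, because the weak second component falls below the penalty and is zeroed. Full dictionary learning would also update the atoms, alternating with the coding step shown here.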
This research addresses one of paleobotany's central challenges, the accurate identification of fossil angiosperm leaves, and demonstrates how state-of-the-art AI can be applied to scientific domains with limited training data. Using concept-based interpretability methods, our system surfaces botanically meaningful cues by visually summarizing the subtle morphological features that define families across fossil and extant specimens, which in turn suggests new diagnostic characters.
Beyond the Florissant Formation, this cross-domain strategy is readily generalizable to other fossil deposits, positioning this approach for broad use in understanding the evolution and ecological dynamics of ancient terrestrial ecosystems.
This material is based upon work supported by the U.S. National Science Foundation under Award Nos. EAR-1925481 (T.S.) and EAR-1925755 (P.W.), and by the ANR-3IA Artificial and Natural Intelligence Toulouse Institute (ANR-19-PI3A-0004).
Computing support was provided by the Center for Computation and Visualization (CCV) at Brown University (via NIH Office of the Director grant S10OD025181). We also acknowledge Google's Cloud TPU hardware resources via the TensorFlow Research Cloud (TFRC) program.
If you make use of Leaf Lens in your research, please cite:
Main paper:
Rodriguez, I.F., Fel, T., Gaonkar, G., Vaishnav, M., Meyer, H., Wilf, P., & Serre, T. (2025). Decoding Fossil Leaves with Artificial Intelligence: An application to the Florissant Formation.
@article{rodriguez2025fossils,
title = {Decoding Fossil Leaves with Artificial Intelligence:
An application to the Florissant Formation},
author = {Rodriguez, Ivan Felipe and Fel, Thomas and Gaonkar, Gaurav and
Vaishnav, Mohit and Meyer, Herbert and Wilf, Peter and Serre, Thomas},
year = {2025}
}
Dataset:
Wilf, P., Wing, S.L., Meyer, H.W., Rose, J.A., Saha, R., Serre, T., Cúneo, N.R., Donovan, M.P., Erwin, D.M., Gandolfo, M.A., Gonzalez-Akre, E., Herrera, F., Hu, S., Iglesias, A., Johnson, K.R., Karim, T.S., & Zou, X. (2021). An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning. PhytoKeys, 187, 93–128. https://doi.org/10.3897/phytokeys.187.72350
@article{wilf2021leaves,
title = {An image dataset of cleared, x-rayed, and fossil leaves vetted
to plant family for human and machine learning},
author = {Wilf, Peter and Wing, Scott L. and Meyer, Herbert W. and
Rose, Jacob A. and Saha, Rohit and Serre, Thomas and
Cúneo, N. Rubén and Donovan, Michael P. and Erwin, Diane M. and
Gandolfo, Maria A. and Gonzalez-Akre, Erika and Herrera, Fabiany and
Hu, Shusheng and Iglesias, Ari and Johnson, Kirk R. and
Karim, Talia S. and Zou, Xiaoyu},
journal = {PhytoKeys},
volume = {187},
pages = {93--128},
year = {2021},
doi = {10.3897/phytokeys.187.72350}
}
