
Leaf Lens

Ivan Felipe Rodriguez¹🌿, Thomas Fel¹,²🌿, Gaurav Gaonkar¹, Mohit Vaishnav¹, Herbert Meyer³,
Peter Wilf⁴, Thomas Serre¹🍂


¹ Center for Computational Brain Science, Brown University
² Kempner Institute, Harvard University
³ Florissant Fossil Beds National Monument, National Park Service
⁴ Department of Geosciences, Pennsylvania State University

🌿 Joint first authors  |  🍂 Corresponding author


Explore concepts and family classification | Identify unknown fossils

Overview

Leaf Lens is the companion platform to our study "Decoding Fossil Leaves with Artificial Intelligence: An application to the Florissant Formation". This website provides an interactive exploration of how deep neural networks learn to classify fossil angiosperm leaves—one of paleobotany's most persistent challenges.

Our deep learning framework overcomes data scarcity by augmenting sparse fossil data with synthetic examples and aligning extant and fossil leaf domains through representational learning. We demonstrate this approach on the late Eocene Florissant flora of Colorado, achieving well over 90% top-5 accuracy for family-level classification across 142 dicot angiosperm families, compared to a chance level of just 3.5%.
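
To make the domain-alignment idea concrete, here is a minimal sketch of a joint objective that classifies labeled extant (and synthetic) leaves while pulling extant and fossil feature statistics together. It assumes a generic PyTorch encoder and classifier; the specific alignment loss, weighting (`lam`), and batch sources are illustrative assumptions, not the paper's actual implementation.

# Minimal sketch (not the paper's implementation): joint classification +
# feature-alignment objective for bridging extant and fossil leaf domains.
import torch
import torch.nn.functional as F


def alignment_loss(feat_extant: torch.Tensor, feat_fossil: torch.Tensor) -> torch.Tensor:
    """Penalize the distance between mean feature embeddings of the two domains."""
    return F.mse_loss(feat_extant.mean(dim=0), feat_fossil.mean(dim=0))


def training_step(encoder, classifier, batch_extant, batch_fossil, lam: float = 0.1):
    x_e, y_e = batch_extant          # labeled extant (and synthetic) leaves
    x_f, _ = batch_fossil            # fossil leaves, labels often unavailable

    feat_e = encoder(x_e)            # shared encoder across both domains
    feat_f = encoder(x_f)

    cls_loss = F.cross_entropy(classifier(feat_e), y_e)
    align = alignment_loss(feat_e, feat_f)
    return cls_loss + lam * align    # total objective for one optimization step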

Project goals

Our primary objective is to leverage Explainable AI techniques to understand the concepts that matter most for neural networks when classifying leaves. By revealing these concepts, we aim to provide:

  • Insights into the model's decision-making process, identifying the key features used for classification
  • A deeper understanding of the relationships between biological taxonomy and computational representations
  • Visual and interactive tools for exploring how concepts and families are structured within the learned representations (see the sketch after this list)
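
The sketch below illustrates one generic way to lay out concept vectors for that kind of exploration: projecting them to two dimensions with t-SNE. The array shape, perplexity, and use of scikit-learn are illustrative assumptions, not the website's actual visualization pipeline.

# Minimal sketch (illustrative): project concept vectors into 2D to inspect
# how concepts cluster in embedding space. `concept_vectors` is a hypothetical
# (n_concepts, n_features) array standing in for the learned representations.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
concept_vectors = rng.random((2000, 512)).astype(np.float32)  # stand-in data

xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(concept_vectors)
# xy[i] is the 2D position of concept i, ready to plot and color by family.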

Our system addresses a fundamental challenge: the extreme scarcity of taxonomically vetted fossil specimens. While modern leaf specimens are abundant, fossilization processes—compression, mineralization, fragmentation—create a challenging domain shift between living and fossil forms.

Key highlights

  • Number of families: 142 dicot angiosperm families
  • Total dataset: Over 34,000 images (extant and fossil leaves)
  • Florissant fossils: 3,200 taxonomically vetted specimens spanning 23 families
  • Classification performance: Well over 90% top-5 accuracy (chance: 3.5%)
  • Discovered concepts: 2,000+ unique visual concepts extracted via sparse dictionary learning (see the sketch after this list)
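
As a rough illustration of the last point, the snippet below sketches sparse dictionary learning over a hypothetical matrix of network activations. The atom count, sparsity settings, and scikit-learn solver are illustrative choices under assumed data, not the paper's configuration.

# Minimal sketch (assumed setup, not the paper's code): extract visual
# "concepts" by sparse dictionary learning on penultimate-layer activations.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
activations = rng.random((5000, 512)).astype(np.float32)  # stand-in (n_patches, n_features)

dico = MiniBatchDictionaryLearning(
    n_components=64,                  # number of concept atoms (illustrative)
    alpha=1.0,                        # sparsity penalty on the codes
    transform_algorithm="lasso_lars",
    random_state=0,
)
codes = dico.fit_transform(activations)   # sparse concept coefficients per patch
concepts = dico.components_                # dictionary atoms = candidate concepts

# Rank patches by their coefficient on a given concept to find its top activators.
top_patches_for_concept_0 = np.argsort(-codes[:, 0])[:10]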

Features

  • Interactive visualizations of over 2,000 learned concepts and their relations in embedding space
  • Family-level exploration of 142 dicot families with representative samples and explanatory maps
  • Concept pages presenting feature visualizations, top activating examples, and their taxonomic relevance
  • Comparisons between real fossils and high-fidelity synthetic fossils used for generative augmentation

Broader implications

This research advances one of paleobotany's central challenges—accurate identification of fossil angiosperm leaves—and demonstrates how state-of-the-art AI can be applied to scientific domains with limited training data. Using concept-based interpretability methods, our system surfaces botanically meaningful cues, visually summarizing the subtle morphological features that define families across fossil and extant specimens and suggesting new diagnostic characters.

Beyond the Florissant Formation, this cross-domain strategy is readily generalizable to other fossil deposits, positioning this approach for broad use in understanding the evolution and ecological dynamics of ancient terrestrial ecosystems.

Funding and acknowledgments

This material is based upon work supported by the U.S. National Science Foundation under Award Nos. EAR-1925481 (T.S.) and EAR-1925755 (P.W.), and by the ANR-3IA Artificial and Natural Intelligence Toulouse Institute (ANR-19-PI3A-0004).

Computing support was provided by the Center for Computation and Visualization (CCV) at Brown University (via NIH Office of the Director grant S10OD025181). We also acknowledge Google's Cloud TPU hardware resources via the TensorFlow Research Cloud (TFRC) program.

Citations

If you make use of Leaf Lens in your research, please cite:

Main paper:

Rodriguez, I.F., Fel, T., Gaonkar, G., Vaishnav, M., Meyer, H., Wilf, P., & Serre, T. (2025). Decoding Fossil Leaves with Artificial Intelligence: An application to the Florissant Formation.

@article{rodriguez2025fossils,
  title  = {Decoding Fossil Leaves with Artificial Intelligence: 
            An application to the Florissant Formation},
  author = {Rodriguez, Ivan Felipe and Fel, Thomas and Gaonkar, Gaurav and 
            Vaishnav, Mohit and Meyer, Herbert and Wilf, Peter and Serre, Thomas},
  year   = {2025}
}

Dataset:

Wilf, P., Wing, S.L., Meyer, H.W., Rose, J.A., Saha, R., Serre, T., Cúneo, N.R., Donovan, M.P., Erwin, D.M., Gandolfo, M.A., Gonzalez-Akre, E., Herrera, F., Hu, S., Iglesias, A., Johnson, K.R., Karim, T.S., & Zou, X. (2021). An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning. PhytoKeys, 187, 93–128. https://doi.org/10.3897/phytokeys.187.72350

@article{wilf2021leaves,
  title   = {An image dataset of cleared, x-rayed, and fossil leaves vetted 
             to plant family for human and machine learning},
  author  = {Wilf, Peter and Wing, Scott L. and Meyer, Herbert W. and 
             Rose, Jacob A. and Saha, Rohit and Serre, Thomas and 
             Cúneo, N. Rubén and Donovan, Michael P. and Erwin, Diane M. and 
             Gandolfo, Maria A. and Gonzalez-Akre, Erika and Herrera, Fabiany and 
             Hu, Shusheng and Iglesias, Ari and Johnson, Kirk R. and 
             Karim, Talia S. and Zou, Xiaoyu},
  journal = {PhytoKeys},
  volume  = {187},
  pages   = {93--128},
  year    = {2021},
  doi     = {10.3897/phytokeys.187.72350}
}
