Awesome ML For Scientists

Resources for research teams running machine learning experiments without enterprise budgets or ML engineering backgrounds.

This list is for scientists (marine biologists, climate researchers, ecologists, and others) who need to run ML experiments. It focuses on accessible compute, reproducible workflows, and resources built for researchers rather than for companies scaling production systems.

Curated by Bread Board Foundry. See Pandoro, GPU compute infrastructure for research teams, under GPU Access below.


GPU Access

Providers that understand academic needs.

  • Pandoro - Fixed monthly pricing with complete hardware transparency. Designed for research teams needing reproducible compute environments. Consumer GPUs with disclosed specifications.

Free Tiers & Academic Programs


Reproducibility & Hardware Transparency

Tools and practices for ensuring your ML results can be reproduced.

Why Hardware Transparency Matters

Many cloud providers don't disclose the exact hardware an instance runs on. This creates problems for scientific reproducibility: you can't cite "some GPU on AWS" in a methods section. Look for providers that tell you exactly what hardware you're using, down to the GPU model, memory, and driver version.
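
Whichever provider you use, you can capture the hardware yourself at the start of each run and save it next to your results. The following is a minimal sketch, not an official tool: the function names `query_nvidia_gpus` and `hardware_record` are made up for illustration, and it assumes NVIDIA GPUs are reported through the `nvidia-smi` command-line tool. On machines without `nvidia-smi` it simply records an empty GPU list.

```python
import json
import platform
import subprocess


def query_nvidia_gpus():
    """Return one description string per GPU, or [] if nvidia-smi is unavailable."""
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,memory.total,driver_version",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # No NVIDIA driver installed, or the tool failed: report no GPUs.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def hardware_record():
    """Collect the details a methods section usually needs (hypothetical helper)."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "gpus": query_nvidia_gpus(),
    }


if __name__ == "__main__":
    # Print as JSON so the record can be saved alongside experiment outputs.
    print(json.dumps(hardware_record(), indent=2))
```

Redirecting the output to a file (for example `python hardware_record.py > hardware.json`, where the script name is your choice) gives you a record you can quote directly when writing up methods.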

Reproducibility Guides


ML for Non-Engineers

Resources that don't assume you have a CS degree.

Courses & Tutorials

Beginner-Friendly Tools

Frameworks that minimize boilerplate and configuration.

  • FastAI - High-level library that handles common patterns. Great for image classification.
  • Keras - Simple API for neural networks. Good documentation.
  • Hugging Face Transformers - Pre-trained models with simple APIs. Good for NLP and vision.
  • scikit-learn - Classical ML with a consistent API. Start here for problems that don't need deep learning.
  • PyTorch Lightning - Reduces PyTorch boilerplate while keeping flexibility.

Books for Scientists


Domain-Specific Resources

Marine Biology & Ecology

  • CV4Ecology - Caltech workshop teaching ecologists practical computer vision for analyzing images, audio, and video. Includes cloud compute credits and hands-on training.
  • VIAME - Video and Image Analytics for Marine Environments. Object detection for underwater footage.
  • DeepMatcher - Photo-ID matching for wildlife research.
  • Wildlife Insights - AI-powered platform for camera trap analysis.
  • MegaDetector - Microsoft's pre-trained model for camera trap images. Detects animals, people, vehicles.
  • Automated Wildlife Classifier - Complete pipeline for camera trap image processing using MegaDetector V6 and PytorchWildlife for species detection and classification.
  • Camera Trap ML Survey - Comprehensive guide to ML systems, models, and tools for camera trap image analysis.
  • FathomNet - Image database for ocean exploration with ML tools.
  • Zooniverse ML - Citizen science platform with ML integration for image classification.

Climate & Environmental Science

  • ClimateBench - Benchmarks for ML in climate science.
  • WeatherBench - Benchmark dataset for data-driven weather prediction.
  • Pangeo - Community for big data geoscience. Tools and examples for climate ML.
  • TorchGeo - PyTorch library for geospatial data.
  • Radiant MLHub - Open repository for geospatial training data.

Medical & Biological Imaging

  • MONAI - PyTorch framework for healthcare imaging. Well-documented.
  • StarDist - Cell/nuclei detection in microscopy images.
  • CellPose - Generalist cell segmentation. Works out of the box.
  • napari - Multi-dimensional image viewer with ML plugin ecosystem.
  • QuPath - Open source pathology image analysis.

Experiment Tracking & Data Management

Organizing Your Work

Notebooks & Documentation

  • Jupyter - Interactive notebooks. Standard for exploration.
  • JupyterBook - Turn notebooks into publishable documentation.
  • Quarto - Scientific publishing system. Notebooks to papers.
  • Marimo - Reactive Python notebooks with better reproducibility than Jupyter.

Contributing

Contributions welcome! Please read the contribution guidelines first.

We especially welcome:

  • Resources for domain scientists in fields not yet covered
  • Free or low-cost tools accessible to grant-funded researchers
  • Guides and tutorials written for non-CS audiences
