benitomartin/substack-newsletters-search-course

Substack Articles Search Engine

*(Architecture diagram)*


A RAG application for searching articles and getting answers on relevant topics from your favorite Substack newsletters.

🙂 Contributors

Benito Martin | AI / ML Engineer
LinkedIn
AI Echoes Newsletter
Miguel Otero Pedrido | AI / ML Engineer
LinkedIn
YouTube
The Neural Maze Newsletter

🎯 Why Take This Course?

Unlike basic tutorials, this course provides a comprehensive, hands-on guide to building a complete end-to-end Retrieval-Augmented Generation (RAG) system using modern tools and best practices. You’ll see how to:

  • Automate data pipelines for ingesting and processing newsletter content
  • Integrate multiple cloud and open-source services (Supabase, Qdrant, Prefect, FastAPI)
  • Build a robust backend for keyword and LLM-powered search
  • Deploy and interact with your system using Google Cloud, a Gradio UI and REST API

👥 Who Is This Course For?

| Audience | Why Join? |
|---|---|
| ML/AI Engineers | Build scalable RAG and LLM-powered search systems |
| Software Engineers | Learn modern backend, API, and cloud deployment skills |
| Data Engineers | Automate data pipelines and vector search workflows |
| AI Enthusiasts | Get hands-on with real-world, production-grade tools |

🧑‍🎓 What You Will Learn

By the end of this course, you will have a fully functional RAG system and the skills to build production-ready applications to search over your favorite newsletters. You will:

  • Ingest articles from RSS feeds and store them in Supabase
  • Generate and index embeddings in Qdrant, with payload indexes for filtering, an optimized index configuration with quantization, and hybrid search
  • Orchestrate and schedule workflows with Prefect (local and cloud)
  • Build and expose RESTful search endpoints using FastAPI
  • Integrate multiple LLM providers (OpenRouter, OpenAI, Hugging Face)
  • Deploy your backend to Google Cloud Run for global access
  • Create an interactive Gradio UI for end-users
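The first step above — pulling articles from RSS feeds — can be sketched with nothing but the standard library. The snippet below parses a standard RSS 2.0 document into records ready for insertion into Supabase; the `parse_feed` function, field names, and sample feed are illustrative, not taken from the course repository:

```python
# Minimal sketch of the ingestion step: parse a Substack RSS feed and
# extract the fields you would store in Supabase. Uses only the standard
# library; the record keys are assumptions, while <title>, <link>, and
# <pubDate> follow the standard RSS 2.0 schema.
import xml.etree.ElementTree as ET

def parse_feed(rss_xml: str) -> list[dict]:
    """Extract one record per <item> from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "published_at": item.findtext("pubDate"),
        })
    return articles

SAMPLE = """<rss version="2.0"><channel>
  <title>AI Echoes</title>
  <item>
    <title>Building a RAG Pipeline</title>
    <link>https://example.substack.com/p/building-a-rag-pipeline</link>
    <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

records = parse_feed(SAMPLE)
print(records[0]["title"])  # Building a RAG Pipeline
```

In the course, the resulting records are written to Supabase and the whole step is wrapped in a Prefect flow so it can run on a schedule.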

*(Gradio UI screenshot)*

🎓 Prerequisites

  • Python (Intermediate)
  • Basic understanding of REST APIs
  • Familiarity with AI/LLM concepts is helpful
  • Modern laptop/PC (no GPU required; free tiers are sufficient)

💵 Does this course cost anything?

  • No, this course is completely free to access and learn from. Starring and sharing the repository is appreciated!
  • Google Cloud Run monthly free tier is sufficient for deployment
  • The Prefect Cloud monthly free tier is sufficient for orchestration once your flow is deployed, but the Prefect local server is recommended for development, as it has no usage limits
  • Supabase and Qdrant monthly free tiers are sufficient for hosting the Postgres and vector databases
  • OpenRouter's daily quota of requests to free LLM models is sufficient for LLM calls; since the project supports multiple LLM providers, you can also use OpenAI or Hugging Face as backups
  • Any other tools used in this course, like FastAPI, Docker, Gradio, or Opik, are completely free to use
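The multi-provider setup above boils down to a fallback chain: try the free OpenRouter tier first and fall back to a backup provider on failure. A minimal sketch of that pattern, using stand-in callables rather than the course's real provider clients (which would wrap each vendor's SDK or HTTP API):

```python
# Illustrative fallback chain across LLM providers. The provider
# callables below are stubs for demonstration, not real API clients.
from collections.abc import Callable

def complete_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Return (provider_name, answer) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. rate limit or network error
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

# Stub providers: OpenRouter hits its free-tier quota, OpenAI answers.
def openrouter(prompt: str) -> str:
    raise RuntimeError("daily free-tier quota exceeded")

def openai(prompt: str) -> str:
    return f"answer to: {prompt}"

name, answer = complete_with_fallback(
    "What is hybrid search?",
    [("openrouter", openrouter), ("openai", openai)],
)
print(name)  # openai
```

The same shape works for any ordering of providers, which is why adding a new backup is cheap.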

📚 Course Outline

| Lesson | Topic | Substack Article | Description |
|---|---|---|---|
| 1 | Setup, Configuration & Articles Ingestion | Lesson 1 | Supabase Postgres setup and ingesting articles |
| 2 | Vector Embeddings & Semantic Search Infrastructure | Lesson 2 | Qdrant configuration and semantic search |
| 3 | FastAPI Backend & Multi-Provider LLM Support | Lesson 3 | FastAPI backend, OpenRouter, OpenAI, Hugging Face |
| 4 | Cloud Run Deployment & Gradio UI | Lesson 4 | Google Cloud Run deployment and Gradio UI |
| 5 | Video Application Overview | Lesson 5 | Video demo showcasing the entire pipeline |
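Lesson 2's Qdrant setup combines a dense vector index, scalar quantization, and sparse vectors for hybrid search. A sketch of what such a collection definition can look like, expressed as the JSON body of Qdrant's REST create-collection call — the vector size, collection name, and sparse-vector name here are assumptions, not the course's exact settings:

```python
import json

# Hypothetical collection definition following Qdrant's REST
# create-collection schema. Size 1536 assumes an OpenAI-style dense
# embedding model; the course's actual values may differ.
collection_config = {
    "vectors": {
        # Dense embeddings for semantic search.
        "size": 1536,
        "distance": "Cosine",
    },
    # Scalar int8 quantization shrinks the dense index roughly 4x.
    "quantization_config": {
        "scalar": {"type": "int8", "always_ram": True},
    },
    # A named sparse-vector index provides the keyword half of
    # hybrid search.
    "sparse_vectors": {"text-sparse": {}},
}

# This body would be sent as: PUT /collections/<collection-name>
# Payload indexes for filtering are created afterwards via the
# separate create-field-index endpoint.
print(json.dumps(collection_config, indent=2))
```

Quantization and payload indexes are exactly the "optimized index configuration" the lesson walks through in detail.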

🚀 Getting Started

Follow the INSTRUCTIONS.md in the documentation to set up your environment, install dependencies, and configure services.

All components are explained in detail in the documentation, but if you have any questions, feel free to open an issue or reach out!

🔌 Services Providers

This project integrates several best-in-class open-source and cloud services to provide a scalable, production-ready RAG pipeline:

| Service | Description | Docs/Links |
|---|---|---|
| Supabase | PostgreSQL database for articles | Supabase |
| Qdrant | Vector DB for embeddings | Qdrant |
| Prefect | Orchestration for ingestion/embedding | Prefect |
| OpenRouter | LLM provider | OpenRouter |
| OpenAI, Hugging Face | LLM providers (backup) | OpenAI / Hugging Face |
| Docker | Containerization | Docker |
| FastAPI | API for querying/search | FastAPI |
| Google Cloud SDK | Command-line interface for Google Cloud services | Google Cloud SDK |
| Gradio | UI | Gradio |
| Opik | LLM evaluation | Opik |
| Google Cloud Run | Deployment and hosting | Cloud Run |

🪪 License

This project is licensed under the MIT License - see the LICENSE file for details.