Habla Hermano: Crash Course

Version: 2.7 | Tests: 2,529 (2,291 Python + 238 JS) | Coverage: 97% | Date: April 2026

📚 AI-powered conversational language tutor for Spanish, German, and French


Executive Summary

This crash course documents Habla Hermano, an AI-powered conversational language tutor that takes learners from complete beginner (A0) to intermediate level (B1) through real conversations, structured micro-lessons, and interactive exercises.

What We Built

block-beta
    columns 1
    block:stack["HABLA HERMANO"]
        columns 2
        A["Frontend"] B["HTMX + Jinja2 + Tailwind (5 themes)"]
        C["Backend"] D["FastAPI with dependency injection"]
        E["AI System"] F["LangGraph with 3 nodes, conditional routing"]
        G["LLM"] H["Claude Haiku 4.5 (conversational + analysis)"]
        I["Auth"] J["Supabase Auth with JWT validation"]
        K["Persistence"] L["PostgreSQL checkpointing (LangGraph) + Fernet encryption"]
        M["Config"] N["Environment-based Pydantic Settings"]
        O["Deployment"] P["Docker + Render.com"]
        Q["Lessons"] R["YAML micro-lessons with exercises (60 across 3 languages)"]
        S["UI"] T["Hamburger menu, unified chat (freeform + lesson modes)"]
        U["Voice"] V["Deepgram STT (Nova-3) + TTS (Aura-2)"]
        W["JS Testing"] X["Vitest + jsdom (238 tests, ~90% coverage)"]
        Y["Lesson Chat"] Z["Phase machine: intro→teaching→exercise→complete (unified in chat.py)"]
    end

Key Achievements

  • ✅ 3-node LangGraph pipeline with conditional routing
  • ✅ Hermano personality system (supportive "big brother" character)
  • ✅ Level-adaptive scaffolding (word banks, hints, sentence starters)
  • ✅ Grammar feedback with gentle corrections
  • ✅ Supabase Auth with JWT validation
  • ✅ PostgreSQL conversation persistence via LangGraph checkpointing
  • ✅ Three languages: Spanish, German, French
  • ✅ Four proficiency levels: A0, A1, A2, B1
  • ✅ 2,529+ tests (2,291 Python + 238 JS) with 97% coverage, strict typing
  • ✅ 5 Spanish-inspired themes: Azulejo, Terracotta, Flamenco, Sangria, Jardin
  • ✅ Mobile-responsive: safe areas, dynamic viewport, touch optimization
  • ✅ Collapsible pronunciation tips UI with level-based auto-expand
  • ✅ Micro-lessons system: 60 lessons across all languages and levels
  • ✅ Hamburger menu with Lessons, New Chat, Theme, Auth
  • ✅ Guest access for chat (no persistence beyond LangGraph checkpointing)
  • ✅ Progress tracking dashboard with Chart.js visualizations
  • ✅ User-authenticated Supabase client for RLS compliance
  • ✅ AI-enhanced lessons via LangGraph subgraphs (Phase 9)
  • ✅ Learning paths with structured progression: PathService, AdaptiveService (Phase 14)
  • ✅ Daily adaptive recommendations based on path progress, vocab accuracy, review schedules
  • ✅ Learn routes (/learn/, /learn/recommendation) with HTMX lazy-loaded partial
  • ✅ Voice conversation: Deepgram STT/TTS via WebSocket proxy with graceful degradation
  • ✅ ES Module architecture: 11 JavaScript modules with Vitest test suite (238 tests)
  • ✅ Mobile-first JS improvements: touch focus, scroll throttle, keyboard handling
  • ✅ Floating TTS stop control with mutual exclusion (one TTS at a time)
  • ✅ Conversational lesson delivery: Phase machine teaches lessons through chat UI (Phase 19)
  • ✅ Voice FSM refactor: FSM + AbortController, 5 sub-modules, race condition fixes (Phase 21)
  • ✅ Lesson experience revamp: unified chat handles freeform + lesson modes, removed separate lesson player (Phase 23)
  • ✅ Message encryption & privacy: Fernet field-level encryption, checkpoint blob encryption, RLS on checkpoint tables, PBKDF2 key derivation (Phase 24)
  • ✅ Design system revamp: Jardin theme, Plus Jakarta Sans typography, spacing tokens, SVG lesson icons, WCAG AA compliance (Phase 25)
  • ✅ Conversation threads: per-thread language/level, thread sidebar with SPA switching, auto-titling via Claude Haiku, active_thread httponly cookie, 717 new Python tests (Phase 26)
  • ✅ Privacy & security page with password reset flow (forgot + reset via Supabase Auth) (Phase 27)
  • ✅ Sentry error monitoring for backend (FastAPI) and frontend (JS)
  • ✅ Progress page redesign with analytics dashboard and language filter
  • ✅ WebSocket TTS uses linear16 encoding for Deepgram compatibility
  • ✅ Connection pool for LangGraph checkpointer to prevent concurrent query errors
  • ✅ Cookie signing with itsdangerous and auth error sanitization
  • ✅ JWT refresh fix in review routes to prevent token expiry errors

Table of Contents

  1. Architecture Overview
  2. Technology Stack
  3. Project Structure
  4. Data Flow Pipeline
  5. LangGraph Pipeline
  6. Hermano Personality System
  7. Progress Tracking System
  8. API Design
  9. Database Schema
  10. Frontend Architecture
  11. Configuration
  12. Testing Strategy
  13. Development Workflow
  14. Deployment
  15. Quick Reference

1. Architecture Overview

High-Level Flow

flowchart TB
    subgraph Client
        Browser["Browser<br/>(HTMX + ES Modules + Tailwind)"]
    end

    subgraph Server["FastAPI Backend"]
        API[API Routes]
        Templates[Jinja2 Templates]
        Auth[Supabase Auth]
    end

    subgraph Pipeline["LangGraph Pipeline"]
        Respond[respond_node]
        Scaffold[scaffold_node]
        Analyze[analyze_node]
        Claude["Claude Haiku 4.5"]
    end

    subgraph Storage["Persistence"]
        Supabase[(Supabase PostgreSQL)]
        Checkpoint[(LangGraph Checkpoints)]
    end

    Browser -->|fetch POST /chat/stream| API
    API -->|SSE token stream + HTML partials| Browser
    API --> Respond
    Respond --> Claude
    Respond -->|A0/A1| Scaffold
    Respond -->|A2/B1| Analyze
    Scaffold --> Analyze
    Analyze --> Checkpoint
    Auth --> Supabase

Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Pipeline Framework | LangGraph | StateGraph with conditional routing for level-based behavior |
| LLM | Claude Haiku 4.5 | Superior language understanding for multiple languages |
| Frontend | HTMX + Jinja2 + ES Modules | Server-driven UI; chat uses SSE streaming via modules/stream.js, other pages use HTMX |
| Auth | Supabase Auth | Managed auth with JWT, easy integration |
| Persistence | PostgreSQL + LangGraph | Conversation checkpointing with AsyncPostgresSaver |
| Config | Pydantic Settings | Type-safe, environment-based configuration |

2. Technology Stack

Backend

| Technology | Version | Purpose |
|---|---|---|
| Python | 3.12 | Runtime |
| FastAPI | ≥0.110 | Web framework |
| LangGraph | ≥0.2 | Conversation orchestration |
| langchain-anthropic | ≥0.1 | Claude integration |
| Pydantic | ≥2.0 | Data validation |
| Supabase | ≥2.0 | Auth & PostgreSQL |
| langgraph-checkpoint-postgres | ≥2.0 | Conversation persistence |

Frontend

| Technology | Purpose |
|---|---|
| HTMX | Server-driven HTML swapping |
| Jinja2 | Server-side templating |
| Tailwind CSS | Utility-first styling |
| CSS Variables | Theme system |
| Alpine.js | Lightweight reactivity |

Voice

| Technology | Purpose |
|---|---|
| Deepgram Nova-3 | Speech-to-text (STT) via WebSocket proxy |
| Deepgram Aura-2 | Text-to-speech (TTS) via WebSocket proxy (linear16) with REST fallback |

JS Testing

| Technology | Purpose |
|---|---|
| Vitest | JavaScript unit test runner |
| jsdom | Browser environment simulation |

Observability

| Technology | Purpose |
|---|---|
| Sentry | Error monitoring (backend + frontend) |

DevOps

| Technology | Purpose |
|---|---|
| Docker | Containerization |
| uv | Fast package management |
| ruff | Linting + formatting |
| mypy | Type checking |
| pytest | Testing framework |

3. Project Structure

habla-hermano/
├── src/
│   ├── config.py                         # Canonical Settings + get_settings (inner layers import from here)
│   ├── validation.py                     # Canonical domain validation constants and helpers (VALID_LANGUAGES, VALID_LEVELS, etc.)
│   │
│   ├── api/                          # FastAPI application
│   │   ├── main.py                   # App creation, lifespan, routes
│   │   ├── config.py                 # Re-export shim → delegates to src/config.py
│   │   ├── dependencies.py           # DI: templates, settings
│   │   ├── auth.py                   # JWT validation
│   │   ├── session.py                # Session management
│   │   ├── supabase_client.py        # Re-export shim → delegates to src/db/client.py
│   │   ├── validation.py             # Re-export shim → delegates to src/validation.py
│   │   ├── middleware.py             # SecurityHeadersMiddleware + CSRFMiddleware
│   │   ├── streaming.py              # SSE streaming: StreamResult, stream_chat_events()
│   │   └── routes/
│   │       ├── chat.py               # GET / (freeform + lesson mode via ?lesson=, review via ?mode=review)
│   │       ├── chat_stream.py        # POST /chat/stream (SSE streaming, optional lesson_id)
│   │       ├── auth.py               # Signup, login, logout, password reset (forgot + reset)
│   │       ├── lessons.py            # Micro-lessons (list, catalog)
│   │       ├── progress.py           # Dashboard, vocabulary, chart-data endpoints
│   │       ├── review.py             # Spaced repetition review sessions (auth-only)
│   │       ├── learn.py              # Learning paths & adaptive recommendations
│   │       ├── privacy.py            # Privacy & security info page
│   │       ├── voice.py              # WebSocket STT proxy + REST TTS endpoint (Deepgram)
│   │       └── threads.py            # Thread CRUD: list, create, rename (PATCH), delete (Phase 26)
│   │
│   ├── agent/                        # LangGraph conversation engine
│   │   ├── graph.py                  # StateGraph with routing
│   │   ├── state.py                  # ConversationState TypedDict
│   │   ├── prompts.py                # System prompts by level
│   │   ├── routing.py                # Conditional edge functions
│   │   ├── checkpointer.py           # Postgres/Memory checkpointer with encryption
│   │   ├── checkpoint_purge.py       # Purge old checkpoint data
│   │   ├── llm.py                    # LLM client factory
│   │   ├── utils.py                  # Agent utility functions
│   │   ├── lesson_state.py           # LessonState for lesson subgraph
│   │   ├── lesson_graph.py           # Lesson and exercise subgraphs
│   │   ├── lesson_chat_state.py      # LessonChatState for lesson chat graph (Phase 19)
│   │   ├── lesson_chat_graph.py      # Lesson chat graph builder (Phase 19)
│   │   ├── prompts_lesson_chat.py    # Lesson chat system prompts (Phase 19)
│   │   ├── review_graph.py           # Review subgraph
│   │   ├── review_state.py           # Review state TypedDict
│   │   └── nodes/
│   │       ├── respond.py            # Generate AI response
│   │       ├── scaffold.py           # Word banks & hints (A0-A1)
│   │       ├── analyze.py            # Grammar & vocab extraction
│   │       ├── lesson.py             # AI-enhanced lesson nodes
│   │       ├── lesson_chat.py        # Lesson chat node (Phase 19)
│   │       └── review.py             # Review exercise nodes
│   │
│   ├── lessons/                      # Micro-lessons system
│   │   ├── models.py                 # Pydantic lesson, step, exercise models
│   │   └── service.py               # YAML loading, filtering, vocabulary extraction
│   │
│   ├── db/                           # Database layer
│   │   ├── client.py                 # Canonical Supabase client factory (get_supabase, get_supabase_admin)
│   │   ├── encryption.py             # Fernet encryption: field-level + FernetCipher for checkpoints
│   │   ├── models.py                 # Pydantic models
│   │   ├── repository.py             # Data access layer
│   │   └── seed.py                   # Initial data loader
│   │
│   ├── services/                     # Business logic
│   │   ├── vocabulary.py             # Vocab tracking
│   │   ├── levels.py                 # Level detection
│   │   ├── progress.py               # ProgressService: dashboard aggregation
│   │   ├── review.py                 # ReviewService: spaced repetition (SM-2)
│   │   ├── paths.py                  # PathService: structured learning paths per language
│   │   ├── adaptive.py               # AdaptiveService: daily adaptive recommendations
│   │   ├── data_retention.py         # Data retention and cleanup policies
│   │   ├── lesson_completion.py      # Lesson completion logic (ExerciseFeedback, CompletionResult, check_exercise_answer, complete_lesson_and_persist)
│   │   ├── threads.py                # ThreadService: CRUD for conversation_threads table (Phase 26)
│   │   ├── thread_titling.py         # Auto-title generation via Claude Haiku, 30-token budget, 3–5 words (Phase 26)
│   │   └── thread_messages.py        # Message history extraction from LangGraph checkpoint state (Phase 26)
│   │
│   ├── templates/                    # Jinja2 HTML
│   │   ├── base.html                 # Layout with themes, safe areas, dynamic viewport
│   │   ├── chat.html                 # Chat interface: freeform + lesson + review modes
│   │   ├── lessons.html              # Lesson catalog page
│   │   ├── progress.html             # Progress dashboard with charts
│   │   ├── learn.html                # Learning paths overview page
│   │   ├── privacy.html              # Privacy & security info page
│   │   ├── auth/                     # Auth templates
│   │   │   ├── login.html            # Login form
│   │   │   ├── signup.html           # Signup form
│   │   │   ├── forgot_password.html  # Forgot password form
│   │   │   └── reset_password.html   # Password reset form
│   │   ├── errors/                   # Error pages
│   │   │   ├── 400.html, 404.html, 500.html
│   │   ├── macros/
│   │   │   └── lesson_icon.html      # SVG lesson icon macro
│   │   └── partials/                 # 28 partial templates
│   │       ├── app_header.html       # Shared header with hamburger, logo, selectors
│   │       ├── message_pair.html     # User + AI message
│   │       ├── message.html          # Single message partial
│   │       ├── grammar_feedback.html # Collapsible grammar tips
│   │       ├── pronunciation_tips.html # Collapsible pronunciation tips
│   │       ├── scaffold.html         # Word bank, hints
│   │       ├── feedback.html         # Generic feedback partial
│   │       ├── lesson_complete.html  # Completion celebration
│   │       ├── progress_vocab.html   # Vocabulary list partial
│   │       ├── stats_summary.html    # Stats card partial
│   │       ├── learn_recommendation.html # Adaptive recommendation partial (HTMX)
│   │       ├── learn_unit.html       # Learning unit partial
│   │       ├── vocab_sidebar.html    # Vocabulary sidebar partial
│   │       ├── warmup_prompt.html    # Review warmup prompt
│   │       ├── review_*.html         # Review partials (start, question, feedback_question, summary, card, empty, complete)
│   │       ├── thread_sidebar.html   # Sidebar drawer with thread list, close button, New Chat picker
│   │       ├── thread_content.html   # SPA partial for thread switching
│   │       └── thread_history.html   # Preloaded message history for threads
│   │
│   └── static/
│       ├── css/output.css            # Compiled Tailwind
│       └── js/
│           ├── main.js               # Entry point, imports all modules
│           ├── pcm-processor.js      # AudioWorklet for mobile STT
│           └── modules/              # 11 ES modules
│               ├── dom.js            # DOM utilities, scroll, focus
│               ├── fsm.js            # Finite state machine for voice
│               ├── htmx-handlers.js  # HTMX event handlers
│               ├── scaffold.js       # Click-to-insert word bank
│               ├── shortcuts.js      # Keyboard shortcuts
│               ├── stream.js         # SSE streaming client (fetch + ReadableStream)
│               ├── voice.js          # Voice orchestrator (imports sub-modules)
│               ├── voice-constants.js # Voice configuration constants
│               ├── voice-stt.js      # Speech-to-text via Deepgram WebSocket
│               ├── voice-tts.js      # Text-to-speech via Deepgram WebSocket (linear16) with REST fallback
│               └── voice-ui.js       # Voice UI state and controls
│
├── tests/                            # 2,529+ tests (2,291 Python + 238 JS), 97% coverage
│   ├── conftest.py                   # Fixtures + CSRF_HEADERS constant
│   ├── test_rate_limiting.py         # Rate limiting tests
│   ├── agent/
│   │   ├── test_graph.py             # LangGraph pipeline tests
│   │   ├── test_state.py             # ConversationState tests
│   │   ├── test_prompts.py           # System prompt tests
│   │   ├── test_routing.py           # Conditional routing tests
│   │   ├── test_checkpointer.py      # Checkpointer tests
│   │   ├── test_checkpoint_purge.py  # Checkpoint purge tests
│   │   ├── test_llm_zero_retention.py # LLM zero retention tests
│   │   ├── test_review_graph.py      # Review subgraph tests
│   │   ├── test_coverage.py          # Agent coverage tests
│   │   └── nodes/
│   │       ├── test_nodes.py         # Node integration tests
│   │       ├── test_analyze.py       # analyze_node tests
│   │       ├── test_scaffold.py      # scaffold_node tests
│   │       ├── test_lesson_chat.py   # Lesson chat node tests
│   │       └── test_review.py        # Review node tests
│   ├── api/
│   │   ├── test_auth.py              # JWT validation tests
│   │   ├── test_config.py            # Settings tests
│   │   ├── test_csrf.py              # CSRF middleware tests
│   │   ├── test_session.py           # Session management tests
│   │   ├── test_supabase_client.py   # Supabase client tests
│   │   ├── test_data_capture.py      # Data capture tests
│   │   ├── test_persistence.py       # Persistence tests
│   │   ├── test_chat_security.py     # Chat security tests
│   │   ├── test_privacy.py           # Privacy route tests
│   │   ├── test_sanitize.py          # Input sanitization tests
│   │   ├── test_security_headers.py  # Security headers tests
│   │   ├── test_streaming.py         # SSE streaming tests
│   │   ├── test_threads.py           # Thread API tests
│   │   └── routes/
│   │       ├── test_chat.py          # Chat endpoint tests
│   │       ├── test_auth.py          # Auth route tests
│   │       ├── test_auth_cache.py    # Auth caching tests
│   │       ├── test_auth_password_reset.py # Password reset tests
│   │       ├── test_learn.py         # Learn route tests
│   │       ├── test_lessons.py       # Lesson route tests
│   │       ├── test_progress.py      # Progress route tests
│   │       ├── test_review.py        # Review route tests
│   │       ├── test_validation.py    # Validation tests
│   │       ├── test_voice.py         # Voice STT/TTS route tests
│   │       ├── test_voice_integration.py # Voice integration tests
│   │       └── test_e2e.py           # End-to-end route tests
│   ├── db/
│   │   ├── test_models.py            # Database model tests
│   │   ├── test_repository.py        # Repository tests
│   │   ├── test_encryption.py        # Field-level encryption tests
│   │   ├── test_fernet_cipher.py     # FernetCipher tests
│   │   └── test_repository_encryption.py # Repository encryption integration tests
│   ├── lessons/
│   │   ├── test_models.py            # Lesson data model tests
│   │   └── test_service.py           # Lesson service tests
│   └── services/
│       ├── test_adaptive.py          # AdaptiveService tests
│       ├── test_coverage.py          # Service coverage tests
│       ├── test_data_retention.py    # Data retention tests
│       ├── test_progress.py          # ProgressService tests
│       ├── test_review.py            # ReviewService tests
│       ├── test_paths.py             # PathService tests
│       ├── test_levels.py            # Level detection tests
│       ├── test_vocabulary.py        # Vocabulary tracking tests
│       ├── test_threads.py           # ThreadService CRUD tests (Phase 26)
│       ├── test_thread_titling.py    # Auto-title generation tests (Phase 26)
│       └── test_thread_messages.py   # Message history extraction tests (Phase 26)
│
├── docs/
│   ├── architecture.md
│   ├── api.md
│   ├── product.md
│   └── design/phase*.md
│
├── data/
│   └── lessons/                      # YAML lesson content (60 total lessons)
│       ├── es/                       # Spanish lessons
│       │   ├── A0/                   # 5 lessons (greetings, introductions, numbers, colors, family)
│       │   ├── A1/                   # 5 lessons
│       │   ├── A2/                   # 5 lessons
│       │   └── B1/                   # 5 lessons
│       ├── de/                       # German lessons
│       │   ├── A0/                   # 5 lessons
│       │   ├── A1/                   # 5 lessons
│       │   ├── A2/                   # 5 lessons
│       │   └── B1/                   # 5 lessons
│       └── fr/                       # French lessons
│           ├── A0/                   # 5 lessons
│           ├── A1/                   # 5 lessons
│           ├── A2/                   # 5 lessons
│           └── B1/                   # 5 lessons
│
├── pyproject.toml
├── .env.example
├── Makefile
└── render.yaml

4. Data Flow Pipeline

sequenceDiagram
    participant U as User
    participant API as FastAPI
    participant G as LangGraph
    participant AI as Claude
    participant DB as PostgreSQL
    participant PS as ProgressService

    U->>API: POST /chat/stream {message, level, language}
    API->>G: Start pipeline with state (SSE streaming)

    rect rgb(240, 248, 255)
        Note over G,AI: respond_node
        G->>AI: Generate response with system prompt
        AI-->>G: AI message
    end

    alt Level A0 or A1
        rect rgb(255, 245, 238)
            Note over G,AI: scaffold_node
            G->>AI: Generate word bank & hints
            AI-->>G: ScaffoldingConfig
        end
    end

    rect rgb(240, 255, 240)
        Note over G,AI: analyze_node
        G->>AI: Extract grammar errors & vocab
        AI-->>G: GrammarFeedback + VocabWords
    end

    G->>DB: Save checkpoint
    G-->>API: Final state
    API->>PS: Record vocabulary & session activity
    PS->>DB: Upsert vocabulary, update session
    API-->>U: SSE events (tokens, then feedback HTML partials)

Progress Capture (Authenticated Users Only)

For authenticated users, ProgressService.record_chat_activity() persists data after each chat interaction using a user-authenticated Supabase client (RLS-compliant):

  • Vocabulary: New words extracted by analyze_node (upsert with times_seen counter)
  • Sessions: Active learning session tracking (language, level, message count)

Guest users receive grammar feedback and pronunciation tips in the response, but no vocabulary or session data is persisted for them; only the LangGraph conversation checkpoint is stored.
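
The upsert behaviour described for vocabulary can be sketched in plain Python. This is a minimal in-memory illustration, not the project's repository code: `VocabEntry`, `upsert_vocabulary`, and the dictionary store are hypothetical stand-ins for the `vocabulary` table, and case-insensitive matching on the word is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VocabEntry:
    """Hypothetical in-memory stand-in for one row of the vocabulary table."""

    word: str
    translation: str
    language: str
    times_seen: int = 1


def upsert_vocabulary(
    store: dict[tuple[str, str], VocabEntry],
    language: str,
    new_words: list[tuple[str, str]],
) -> None:
    """Insert unseen words; increment times_seen for words already stored."""
    for word, translation in new_words:
        key = (language, word.lower())  # assumption: words match case-insensitively
        entry = store.get(key)
        if entry is None:
            store[key] = VocabEntry(word, translation, language)
        else:
            entry.times_seen += 1


store: dict[tuple[str, str], VocabEntry] = {}
upsert_vocabulary(store, "es", [("hola", "hello"), ("gato", "cat")])
upsert_vocabulary(store, "es", [("hola", "hello")])  # second sighting of "hola"
```

After the two calls, `hola` has been seen twice and `gato` once, which is the counter the progress dashboard would later aggregate.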


5. LangGraph Pipeline

Graph Structure

flowchart TB
    START([START])
    respond["respond_node<br/><i>Generate AI response</i>"]
    check{"needs_scaffolding()<br/><i>Is level A0 or A1?</i>"}
    scaffold["scaffold_node<br/><i>Word bank, hints</i>"]
    analyze["analyze_node<br/><i>Grammar + vocab</i>"]
    END([END])

    START --> respond
    respond --> check
    check -->|Yes| scaffold
    check -->|No| analyze
    scaffold --> analyze
    analyze --> END

State Schema

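from typing import Annotated, Any, NotRequired, TypedDict

from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

# GrammarFeedback, VocabWord, and PronunciationTip are project-defined types.


class ConversationState(TypedDict):
    # Core conversation (add_messages appends new messages instead of overwriting)
    messages: Annotated[list[BaseMessage], add_messages]

    # User settings
    level: str              # A0, A1, A2, B1
    language: str           # es, de, fr

    # Analysis results
    grammar_feedback: NotRequired[list[GrammarFeedback]]
    new_vocabulary: NotRequired[list[VocabWord]]
    pronunciation_tips: NotRequired[list[PronunciationTip]]  # Pronunciation guidance

    # Scaffolding (A0-A1 only)
    scaffolding: NotRequired[dict[str, Any]]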

Node Implementations

| Node | Purpose | Output |
|---|---|---|
| respond_node | Generate AI response using level prompt | AIMessage |
| scaffold_node | Create word bank, hints, sentence starters | ScaffoldingConfig |
| analyze_node | Extract grammar errors, vocabulary, and pronunciation tips | GrammarFeedback[], VocabWord[], PronunciationTip[] |

Lesson Subgraph Nodes (Phase 9)

| Node | Purpose | Output |
|---|---|---|
| load_step_node | Load step data from YAML lessons | step_type, step_content, vocabulary |
| enhance_step_node | Hermano enhances with personalized content | enhanced_content, hermano_intro |
| validate_exercise_node | Validate answer with AI feedback | is_correct, exercise_feedback |

Lesson Chat Graph (Phase 19)

START → lesson_respond → END
  Phase machine: intro → teaching → exercise_ask → exercise_eval → complete
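
The phase sequence above can be sketched as a small transition function. This is an illustrative sketch only: `LessonPhase` and `next_phase` are hypothetical names, and looping from `exercise_eval` back to `exercise_ask` while exercises remain is an assumption, not something the phase list itself states.

```python
from enum import Enum


class LessonPhase(str, Enum):
    """Phases named in the lesson chat phase machine."""

    INTRO = "intro"
    TEACHING = "teaching"
    EXERCISE_ASK = "exercise_ask"
    EXERCISE_EVAL = "exercise_eval"
    COMPLETE = "complete"


def next_phase(phase: LessonPhase, exercises_left: int = 0) -> LessonPhase:
    """Advance the machine by one step.

    Assumption: after evaluating an answer, the machine asks the next
    exercise if any remain, otherwise it completes the lesson.
    """
    if phase is LessonPhase.INTRO:
        return LessonPhase.TEACHING
    if phase is LessonPhase.TEACHING:
        return LessonPhase.EXERCISE_ASK
    if phase is LessonPhase.EXERCISE_ASK:
        return LessonPhase.EXERCISE_EVAL
    if phase is LessonPhase.EXERCISE_EVAL:
        if exercises_left > 0:
            return LessonPhase.EXERCISE_ASK
        return LessonPhase.COMPLETE
    return LessonPhase.COMPLETE  # complete is terminal
```

Keeping the transitions in one pure function makes the machine easy to unit test independently of the LLM call that produces each reply.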

Conditional Routing

def needs_scaffolding(state: ConversationState) -> str:
    """Route based on learner level."""
    return "scaffold" if state["level"] in ["A0", "A1"] else "analyze"

graph.add_conditional_edges(
    "respond",
    needs_scaffolding,
    {"scaffold": "scaffold", "analyze": "analyze"},
)
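
To see the effect of the routing function without building a graph, the node order can be traced in plain Python. This is a simplified sketch that does not use LangGraph: `trace_pipeline` is a hypothetical helper, and the state is reduced to a bare dictionary.

```python
SCAFFOLDED_LEVELS = {"A0", "A1"}


def needs_scaffolding(state: dict) -> str:
    """Same decision as the routing function above, on a plain dict."""
    return "scaffold" if state["level"] in SCAFFOLDED_LEVELS else "analyze"


def trace_pipeline(level: str) -> list[str]:
    """Return the node names visited for a learner at the given level."""
    state = {"level": level}
    visited = ["respond"]  # every turn starts with respond
    if needs_scaffolding(state) == "scaffold":
        visited.append("scaffold")  # only beginners get a word bank and hints
    visited.append("analyze")  # every turn ends with analysis
    return visited


print(trace_pipeline("A0"))
print(trace_pipeline("B1"))
```

A0 and A1 learners pass through all three nodes, while A2 and B1 learners skip `scaffold` and go straight to `analyze`.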

6. Hermano Personality System

The "Big Brother" Character

Hermano is a consistent personality adapted to each proficiency level:

  • Supportive: Patient, encouraging, celebrates progress
  • Authentic: Makes mistakes feel normal
  • Adaptive: Language mix changes by level
  • Natural: Conversations feel like chatting with a friend

Language Adapter Pattern

LANGUAGE_ADAPTER: dict[str, dict[str, str]] = {
    "es": {
        "language_name": "Spanish",
        "hello": "Hola",
        "my_name_is": "Me llamo",
    },
    "de": {
        "language_name": "German",
        "hello": "Hallo",
        "my_name_is": "Ich heiße",
    },
    "fr": {
        "language_name": "French",
        "hello": "Bonjour",
        "my_name_is": "Je m'appelle",
    },
}

Personality by Level

| Level | Hermano's Approach | Language Mix | Topics |
|---|---|---|---|
| A0 | Heavy encouragement | 80% English, 20% target | Greetings, numbers, colors |
| A1 | Chill friend | 50/50 mix | Daily routine, family, food |
| A2 | Challenges while fun | 80% target, 20% English | Travel, shopping, experiences |
| B1 | Peer conversation | 95%+ target | News, opinions, culture |
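
One way to turn the level table into prompt text is a lookup keyed by level. The sketch below is illustrative only: `LEVEL_STYLE` and `build_style_hint` are hypothetical names, and the real system prompts in `src/agent/prompts.py` are likely to be more elaborate.

```python
LANGUAGE_NAMES = {"es": "Spanish", "de": "German", "fr": "French"}

# (approach, language mix) per level, taken from the table above.
LEVEL_STYLE = {
    "A0": ("heavy encouragement", "80% English, 20% {target}"),
    "A1": ("chill friend", "50/50 English and {target}"),
    "A2": ("challenging but fun", "80% {target}, 20% English"),
    "B1": ("peer conversation", "95%+ {target}"),
}


def build_style_hint(language: str, level: str) -> str:
    """Return a one-line style instruction for the given language and level."""
    target = LANGUAGE_NAMES[language]  # KeyError for unsupported languages
    approach, mix = LEVEL_STYLE[level]  # KeyError for unsupported levels
    return f"Tone: {approach}. Language mix: {mix.format(target=target)}."


print(build_style_hint("es", "A0"))
```

Keeping the per-level data in a table rather than in branching code means a new level or language only needs a new dictionary entry.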

7. Progress Tracking System

ProgressService Architecture

The ProgressService aggregates data from vocabulary, session, and lesson repositories into dashboard-ready statistics and chart data structures.

class ProgressService:
    """Read-heavy service for dashboard rendering. Authenticated users only."""

    def __init__(self, user_id: str, client: SupabaseClient | None = None):
        # Each repository is scoped to one user and shares the same client
        self._vocab_repo = VocabularyRepository(user_id, client=client)
        self._session_repo = LearningSessionRepository(user_id, client=client)
        self._lesson_repo = LessonProgressRepository(user_id, client=client)

    # Public API (method bodies elided)
    def get_dashboard_stats(self, language: str = "es") -> DashboardStats: ...
    def get_chart_data(self, language: str = "es", days: int = 30) -> ChartData: ...
    def record_chat_activity(self, language: str, level: str, new_vocab: list) -> None: ...

Routes pass a user-authenticated Supabase client (get_supabase_for_user(sb_access_token)) so that all database queries respect RLS.

Dashboard Data Structures

| Structure | Fields | Purpose |
|---|---|---|
| DashboardStats | total_words, total_sessions, lessons_completed, current_streak, accuracy_rate, words_learned_today, messages_today | Summary cards |
| ChartData | vocab_growth[], accuracy_trend[] | Chart.js visualization |
| VocabGrowthPoint | date, cumulative_words | Vocabulary growth line chart |
| AccuracyPoint | date, accuracy | Accuracy trend line chart |
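
As an illustration of how the `vocab_growth` series might be derived, the sketch below builds cumulative points from the dates on which words were first seen. The `VocabGrowthPoint` fields follow the table above; the `vocab_growth` function and its input shape are assumptions, not the project's actual aggregation code.

```python
import datetime as dt
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabGrowthPoint:
    date: dt.date
    cumulative_words: int


def vocab_growth(first_seen: list[dt.date]) -> list[VocabGrowthPoint]:
    """Turn per-word first-seen dates into a cumulative growth series."""
    per_day = Counter(first_seen)  # new words learned on each day
    total = 0
    points: list[VocabGrowthPoint] = []
    for day in sorted(per_day):
        total += per_day[day]
        points.append(VocabGrowthPoint(date=day, cumulative_words=total))
    return points


growth = vocab_growth(
    [dt.date(2026, 4, 1), dt.date(2026, 4, 1), dt.date(2026, 4, 3)]
)
```

Days without new words produce no point in this sketch; a chart that needs a continuous x-axis would have to fill those gaps with the previous total.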

Guest Model (Simplified)

Guests get chat only, with no persistent progress tracking:

  • Chat: Full conversational functionality via LangGraph checkpointing (session cookie)
  • Grammar feedback: Returned inline in chat responses
  • Pronunciation tips: Returned inline in chat responses
  • Scaffolding: Word banks and hints for A0-A1 levels

Guests do not get: vocabulary tracking, progress dashboard, lesson progress, or spaced repetition review. These features require authentication.

Auth Pattern for Data Operations

All data operations (progress, vocabulary, review) use a user-authenticated Supabase client so that PostgreSQL Row-Level Security (RLS) policies work via auth.uid():

from src.api.supabase_client import get_supabase_for_user

# In route handlers, read the token from the cookie:
sb_access_token: Annotated[str | None, Cookie(alias="sb-access-token")] = None

# Then create a user-scoped client:
user_client = get_supabase_for_user(sb_access_token)
service = ProgressService(user.id, client=user_client)

This replaced the earlier pattern of using get_supabase_admin() (service-role client that bypassed RLS) for guest operations. The admin client is no longer used in progress or review routes.


8. API Design

Core Endpoints

| Method | Endpoint | Purpose |
|---|---|---|
| GET | / | Render chat page (accepts ?lesson=, ?mode=review) |
| POST | /chat | Send message, get AI response (non-streaming fallback) |
| POST | /chat/stream | Send message, get SSE streaming response (accepts optional lesson_id) |
| POST | /new | Start new conversation |
| POST | /auth/signup | Register user |
| POST | /auth/login | Authenticate |
| POST | /auth/logout | Sign out |
| GET | /auth/forgot-password | Forgot password form |
| POST | /auth/forgot-password | Send password reset email via Supabase |
| GET | /auth/reset-password | Password reset form (receives token from email) |
| POST | /auth/reset-password | Set new password with recovery token |
| GET | /lessons/ | Lesson catalog |
| GET | /progress/ | Progress dashboard page |
| GET | /progress/vocabulary | Vocabulary list partial (HTMX) |
| GET | /progress/stats | Stats summary partial (HTMX) |
| GET | /progress/chart-data | JSON chart data for Chart.js |
| DELETE | /progress/vocabulary/{id} | Remove word from vocabulary |
| GET | /learn/ | Learning paths overview page |
| GET | /learn/recommendation | Adaptive recommendation partial (HTMX) |
| GET | /privacy/ | Privacy & security info page |
| GET | /threads/ | List all threads for the authenticated user |
| POST | /threads/ | Create a new thread (language + level required) |
| POST | /threads/select | Set active thread cookie |
| PATCH | /threads/{id} | Rename a thread |
| DELETE | /threads/{id} | Delete a thread and its checkpoints |
| GET | /chat/thread-content | SPA partial for thread switching (returns thread history + new welcome) |

Chat Request/Response

# Request (Form Data)
message: str          # User's message
level: str = "A1"     # CEFR level
language: str = "es"  # Language code

# Response (HTML Partial)
# Returns message_pair.html with:
# - user_message
# - ai_response
# - grammar_feedback (list)
# - new_vocabulary (list)
# - scaffolding (dict, A0-A1 only)
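
The request fields and defaults above could be validated along the following lines. This is a standard-library sketch, not the project's code: `ChatForm` is a hypothetical name, and the values of `VALID_LANGUAGES` and `VALID_LEVELS` are assumed from the languages and levels listed elsewhere in this document (the canonical constants live in `src/validation.py`).

```python
from dataclasses import dataclass

# Assumed values; the canonical constants are defined in src/validation.py.
VALID_LANGUAGES = ("es", "de", "fr")
VALID_LEVELS = ("A0", "A1", "A2", "B1")


@dataclass(frozen=True)
class ChatForm:
    """Hypothetical container for the chat form fields."""

    message: str
    level: str = "A1"
    language: str = "es"

    def __post_init__(self) -> None:
        # Reject bad input before it reaches the LangGraph pipeline
        if not self.message.strip():
            raise ValueError("message must not be empty")
        if self.level not in VALID_LEVELS:
            raise ValueError(f"unsupported level: {self.level!r}")
        if self.language not in VALID_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language!r}")


form = ChatForm("Hola, me llamo Ana")  # falls back to level A1, language es
```

Validating at the boundary keeps the graph nodes free of defensive checks, since they can rely on `level` and `language` being one of the known values.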

9. Database Schema

Supabase Tables

user_profiles

id: UUID (FK to auth.users)
display_name: TEXT
preferred_language: TEXT DEFAULT 'es'
current_level: TEXT DEFAULT 'A1'
created_at: TIMESTAMP
updated_at: TIMESTAMP

vocabulary

id: SERIAL PRIMARY KEY
user_id: UUID
word: TEXT
translation: TEXT
language: TEXT
part_of_speech: TEXT
first_seen_at: TIMESTAMP
times_seen: INT DEFAULT 1
times_correct: INT DEFAULT 0

learning_sessions

id: SERIAL PRIMARY KEY
user_id: UUID
started_at: TIMESTAMP
ended_at: TIMESTAMP
language: TEXT
level: TEXT
messages_count: INT
words_learned: INT

lesson_progress

user_id: UUID
lesson_id: TEXT
completed_at: TIMESTAMP
score: INT

conversation_threads (Phase 26)

id: UUID PRIMARY KEY
user_id: UUID (FK to auth.users)
thread_id: TEXT UNIQUE    -- format: user:{user_id}:{uuid4}, bridges metadata ↔ LangGraph checkpoints
title: TEXT               -- auto-generated via Claude Haiku (30-token budget, 3–5 words) after first exchange
language: TEXT            -- immutable after creation (es, de, fr)
level: TEXT               -- immutable after creation (A0, A1, A2, B1)
created_at: TIMESTAMP
updated_at: TIMESTAMP
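
The `thread_id` format `user:{user_id}:{uuid4}` can be generated and checked with a few lines of standard-library Python. The helper names below (`make_thread_id`, `parse_thread_id`, `owns_thread`) are hypothetical and only illustrate the format; they assume user IDs contain no colons, which holds for UUIDs.

```python
import uuid


def make_thread_id(user_id: str) -> str:
    """Build a thread ID in the documented user:{user_id}:{uuid4} format."""
    return f"user:{user_id}:{uuid.uuid4()}"


def parse_thread_id(thread_id: str) -> tuple[str, str]:
    """Split a thread ID into (user_id, thread_uuid); raise ValueError if malformed."""
    prefix, user_id, suffix = thread_id.split(":", 2)  # ValueError if parts are missing
    if prefix != "user":
        raise ValueError(f"unexpected prefix: {prefix!r}")
    uuid.UUID(suffix)  # raises ValueError if the suffix is not a UUID
    return user_id, suffix


def owns_thread(thread_id: str, user_id: str) -> bool:
    """Return True only if the thread ID is well formed and belongs to user_id."""
    try:
        owner, _ = parse_thread_id(thread_id)
    except ValueError:
        return False
    return owner == user_id


user = "3f2b8c1e-0000-4000-8000-000000000001"  # example user UUID
thread_id = make_thread_id(user)
```

Because the owner is embedded in the ID, a route can reject a mismatched `active_thread` cookie before touching the checkpoint tables.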

10. Frontend Architecture

Technologies

| Component | Technology |
|---|---|
| HTML Swapping | HTMX |
| Templating | Jinja2 |
| Styling | Tailwind CSS |
| Themes | CSS Variables |
| Reactivity | Alpine.js |

Theme System

:root {
  --color-bg-primary: #ffffff;
  --color-text-primary: #000000;
  --color-accent: #3b82f6;
}

.theme-dark {
  --color-bg-primary: #1f2937;
  --color-text-primary: #f3f4f6;
}

.theme-ocean {
  --color-bg-primary: #0f3460;
  --color-text-primary: #e0e0e0;
}

JavaScript ES Module Architecture

The frontend JavaScript is organized as ES Modules loaded via main.js:

| Module | Path | Purpose |
|---|---|---|
| main.js | src/static/js/main.js | Entry point, imports and initializes all modules |
| dom.js | src/static/js/modules/dom.js | DOM utilities, scroll throttle, touch focus |
| fsm.js | src/static/js/modules/fsm.js | Finite state machine for voice state management |
| htmx-handlers.js | src/static/js/modules/htmx-handlers.js | HTMX event handlers (afterSwap, etc.) |
| scaffold.js | src/static/js/modules/scaffold.js | Click-to-insert word bank interactions |
| shortcuts.js | src/static/js/modules/shortcuts.js | Keyboard shortcuts (Ctrl+Enter, etc.) |
| stream.js | src/static/js/modules/stream.js | SSE streaming client (fetch + ReadableStream) |
| voice.js | src/static/js/modules/voice.js | Voice orchestrator (imports sub-modules below) |
| voice-constants.js | src/static/js/modules/voice-constants.js | Voice configuration constants |
| voice-stt.js | src/static/js/modules/voice-stt.js | Speech-to-text via Deepgram WebSocket |
| voice-tts.js | src/static/js/modules/voice-tts.js | Text-to-speech via Deepgram WebSocket (linear16) with REST fallback |
| voice-ui.js | src/static/js/modules/voice-ui.js | Voice UI state and controls |
| pcm-processor.js | src/static/js/pcm-processor.js | AudioWorklet for mobile STT PCM encoding |

Chat Form Submission

The chat form uses modules/stream.js (fetch + ReadableStream) to POST to /chat/stream and parse SSE events for real-time token streaming. The form submit is intercepted by JavaScript; HTMX is not used for chat submission. Other parts of the UI (lessons, progress, review, learn) continue to use HTMX for partial updates.
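
To make the SSE framing concrete, here is a minimal Python sketch of the parsing step that a client such as `stream.js` has to perform on the response body. It follows the basic Server-Sent Events rules (`event:` and `data:` fields, a blank line ends each event). The event names `token` and `done` in the usage example are assumptions for illustration; the names `/chat/stream` actually emits are not documented here.

```python
from collections.abc import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from SSE text lines.

    Multi-line data is joined with newlines; the event type defaults
    to "message" when no "event:" field precedes the data.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, then reset.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")  # one optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream: two token events, a keep-alive, then a final event.
frames = [
    "event: token\n", "data: Ho\n", "\n",
    "event: token\n", "data: la\n", "\n",
    ": ping\n",
    "event: done\n", "data: {}\n", "\n",
]
events = list(parse_sse(frames))
reply = "".join(d for e, d in events if e == "token")
```

Accumulating the token payloads in order reconstructs the full reply, which is the same idea behind appending tokens to the chat bubble as they arrive.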

HTMX Pattern (non-chat pages)

<!-- Used for progress, review, learn — NOT for chat submission -->
<form hx-get="/progress/vocabulary"
      hx-target="#vocab-list"
      hx-swap="innerHTML">
    ...
</form>

11. Configuration

Environment Variables

# Required
ANTHROPIC_API_KEY=sk-ant-...

# Supabase
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_ANON_KEY=eyJ...
SUPABASE_DB_URL=postgresql://...  # For checkpointing
SUPABASE_SERVICE_KEY=eyJ...       # For admin ops

# Application
APP_NAME="Habla Hermano"
DEBUG=false
LLM_MODEL=claude-haiku-4-5-20251001
LLM_TEMPERATURE=0.7
HOST=127.0.0.1
PORT=8000
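
A quick pre-flight check can catch a missing key before the server starts. The sketch below uses only the standard library and treats `ANTHROPIC_API_KEY` as the sole required variable, matching the "Required" label above. The function name `check_env` and the grouping of the Supabase variables as optional are assumptions for illustration, not part of the codebase.

```python
import os
from collections.abc import Mapping

REQUIRED = ("ANTHROPIC_API_KEY",)
# Treated as optional in this sketch (assumption based on the listing above).
OPTIONAL = (
    "SUPABASE_URL",
    "SUPABASE_ANON_KEY",
    "SUPABASE_DB_URL",
    "SUPABASE_SERVICE_KEY",
)


def check_env(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Report which required and optional variables are unset or empty."""
    return {
        "missing_required": [name for name in REQUIRED if not env.get(name)],
        "unset_optional": [name for name in OPTIONAL if not env.get(name)],
    }


# Example with an explicit mapping instead of the real environment.
report = check_env({"ANTHROPIC_API_KEY": "sk-test", "SUPABASE_URL": "https://x.supabase.co"})
```

Passing an explicit mapping keeps the check easy to test; calling `check_env()` with no arguments inspects the real process environment.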

Pydantic Settings

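from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Required: startup fails if this is missing
    ANTHROPIC_API_KEY: str

    # Optional Supabase settings
    SUPABASE_URL: str | None = None
    SUPABASE_ANON_KEY: str | None = None
    SUPABASE_DB_URL: str | None = None

    # Application defaults
    APP_NAME: str = "Habla Hermano"
    DEBUG: bool = False
    LLM_MODEL: str = "claude-haiku-4-5-20251001"

    # Values are read from the environment, falling back to .env
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
    )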

12. Testing Strategy

Coverage: 97% (2,529+ tests: 2,291 Python + 238 JS)

Test Categories

| Category | Directory | Focus |
|---|---|---|
| Agent | `tests/agent/` | LangGraph nodes, state, routing, checkpointer |
| Agent Nodes | `tests/agent/nodes/` | Individual node tests (analyze, scaffold, review) |
| API | `tests/api/` | Auth, config, CSRF, session, supabase client |
| API Routes | `tests/api/routes/` | Chat, auth, learn, lessons, progress, review, threads, e2e |
| Database | `tests/db/` | Models, repository |
| Lessons | `tests/lessons/` | Lesson models, lesson service |
| Services | `tests/services/` | Adaptive, coverage, progress, review, paths, levels, vocabulary, threads, thread_titling, thread_messages |

Key Fixtures

from unittest.mock import AsyncMock, MagicMock

import pytest


@pytest.fixture
def mock_settings():
    """Mock settings for tests."""
    return Settings(ANTHROPIC_API_KEY="test-key")  # pragma: allowlist secret

@pytest.fixture
def mock_compiled_graph():
    """Mock LangGraph for API tests."""
    mock = MagicMock()
    # ainvoke is awaited by callers, so it must be an AsyncMock
    mock.ainvoke = AsyncMock(return_value={...})
    return mock

@pytest.fixture
def auth_headers():
    """JWT auth headers for protected routes."""
    # test_token is assumed to be defined elsewhere in the test suite
    return {"Authorization": f"Bearer {test_token}"}
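
Because a compiled graph's `ainvoke` is a coroutine, a mock standing in for it has to be awaitable. The following standalone sketch shows the pattern with `unittest.mock.AsyncMock`. The helper names (`make_mock_graph`, `run_turn`) and the state keys (`messages`, `ai_response`) are hypothetical; the `{"configurable": {"thread_id": ...}}` config shape is the usual LangGraph convention for checkpointed threads.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_mock_graph(result: dict) -> MagicMock:
    """Return a stand-in graph whose ainvoke can be awaited."""
    graph = MagicMock()
    graph.ainvoke = AsyncMock(return_value=result)
    return graph


async def run_turn(graph, message: str, thread_id: str) -> dict:
    """Hypothetical caller: invokes the graph the way a route handler might."""
    return await graph.ainvoke(
        {"messages": [message]},
        config={"configurable": {"thread_id": thread_id}},
    )


graph = make_mock_graph({"ai_response": "¡Hola!"})
result = asyncio.run(run_turn(graph, "Hola", "user:123:abc"))

# AsyncMock records awaits, so the call can be verified afterwards.
graph.ainvoke.assert_awaited_once()
```

Checking `graph.ainvoke.await_args` afterwards lets a test confirm that the route passed the expected `thread_id` without running any real LLM call.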

13. Development Workflow

Quick Start

# Clone and setup
git clone https://github.com/darth-dodo/habla-hermano.git
cd habla-hermano
make install

# Configure
cp .env.example .env
# Edit .env with ANTHROPIC_API_KEY

# Run
make dev
# Visit http://localhost:8000

Makefile Commands

| Command | Description |
|---|---|
| `make install` | Install dependencies with uv |
| `make dev` | Run dev server (auto-reload) |
| `make test` | Run pytest with coverage |
| `make lint` | Run Ruff linting |
| `make format` | Auto-format code |
| `make typecheck` | Run MyPy |
| `make check` | All quality gates |

14. Deployment

Render.com

# render.yaml
services:
  - type: web
    name: habla-hermano
    env: python
    buildCommand: pip install uv && uv sync --frozen --no-dev
    startCommand: uv run uvicorn src.api.main:app --host 0.0.0.0 --port $PORT
    healthCheckPath: /health
    envVars:
      - key: ANTHROPIC_API_KEY
        sync: false
      - key: SUPABASE_URL
        sync: false
      - key: SUPABASE_ANON_KEY
        sync: false

Docker

FROM python:3.12-slim
WORKDIR /app

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

COPY src ./src
EXPOSE 8000
CMD ["uv", "run", "uvicorn", "src.api.main:app", "--host", "0.0.0.0", "--port", "8000"]

15. Quick Reference

Key Files

src/config.py                # Canonical Settings + get_settings
src/validation.py            # Canonical domain validation (VALID_LANGUAGES, VALID_LEVELS)
src/api/main.py              # FastAPI app entry
src/api/config.py            # Re-export shim → src/config.py
src/api/middleware.py         # SecurityHeadersMiddleware + CSRFMiddleware
src/api/routes/chat.py       # Chat endpoints (GET /, POST /chat/stream) — handles both freeform and lesson modes
src/api/streaming.py         # SSE streaming logic
src/static/js/main.js        # JS entry point (imports all modules)
src/static/js/modules/stream.js  # SSE client (fetch + ReadableStream)
src/static/js/modules/voice.js   # Voice orchestrator (Deepgram STT/TTS sub-modules)
src/api/routes/voice.py      # WebSocket STT proxy + REST TTS endpoint
src/api/routes/progress.py   # Progress dashboard endpoints
src/db/client.py             # Canonical Supabase client factory
src/agent/graph.py           # LangGraph pipeline
src/agent/nodes/*.py         # Pipeline nodes
src/agent/prompts.py         # Level-specific prompts
src/services/progress.py     # ProgressService: dashboard aggregation
src/services/lesson_completion.py  # Lesson completion business logic
src/services/review.py       # ReviewService: spaced repetition (SM-2)
src/services/paths.py        # PathService: structured learning paths per language
src/services/adaptive.py     # AdaptiveService: daily adaptive recommendations
src/api/routes/learn.py      # Learn routes: paths overview, recommendation partial
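
For readers unfamiliar with SM-2, the spaced-repetition algorithm named for `src/services/review.py`, here is a textbook sketch of one review step. It is not the project's `ReviewService`: the `CardState` type and `sm2_review` function are hypothetical, and variants of SM-2 differ on details such as whether the ease factor changes after a failed recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0     # consecutive successful reviews
    interval_days: int = 0   # days until the next review
    ease: float = 2.5        # SM-2 easiness factor, never below 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with quality in 0..5 and return the new state."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    # Easiness update from the classic SM-2 formula, clamped at 1.3.
    penalty = (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease + 0.1 - penalty)
    if quality < 3:
        # Failed recall: restart the repetition sequence.
        return CardState(repetitions=0, interval_days=1, ease=ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * ease)
    return CardState(state.repetitions + 1, interval, ease)


# Three perfect reviews followed by a lapse.
s1 = sm2_review(CardState(), 5)
s2 = sm2_review(s1, 5)
s3 = sm2_review(s2, 5)
lapsed = sm2_review(s3, 1)
```

The intervals grow as 1 day, 6 days, then multiply by the ease factor, while a low-quality answer sends the card back to a 1-day interval.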

Commands

make dev          # Start server
make test         # Run tests
make check        # All quality gates
make format       # Auto-fix style

API Quick Test

# Health check
curl http://localhost:8000/health

# Send message (SSE stream; -N disables curl's output buffering)
curl -N -X POST http://localhost:8000/chat/stream \
  -d "message=Hola&level=A1&language=es"

Crash Course v2.7 — Habla Hermano (2,529+ tests, 97% coverage, LangGraph Pipeline + Micro-Lessons + AI-Enhanced Lessons + Progress Tracking + Mobile Responsive + Learning Paths + Voice Conversation + FSM Voice Refactor + Conversational Lessons + Unified Lesson Experience + Message Encryption + Design System + Conversation Threads + Password Reset + Privacy Page)