PathReview is a multi-service application with five major subsystems. This document describes how they fit together.
User Input (GitHub username, resume PDF, repo URLs)
           │
           ▼
┌─────────────────────┐
│ API Layer (FastAPI) │ ← Authentication, validation, rate limiting
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Ingestion Pipeline  │ ← Parse documents, chunk, embed, store
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Agent Orchestrator  │ ← Plan analysis, execute tools, manage state
│  ┌───────────────┐  │
│  │ GitHub Tool   │  │
│  │ Skill Extract │  │
│  │ README Scorer │  │
│  │ Market Analyze│  │
│  │ Tech Detector │  │
│  └───────────────┘  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ RAG System          │ ← Retrieve context, generate feedback, evaluate
│ (Hybrid Retrieval   │
│  + LLM Generation)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Safety Layer        │ ← Bias check, content filter, PII scrub
└──────────┬──────────┘
           │
           ▼
     Review Output
The API Layer is a FastAPI application serving REST endpoints. It handles authentication (JWT), request validation (Pydantic), rate limiting, and CORS. Routes delegate to the service layer in core/services/.
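The rate-limiting responsibility can be illustrated with a small, framework-free sketch. The algorithm PathReview actually uses is not stated here, so the sliding-window approach, the `SlidingWindowLimiter` name, and its parameters are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter: at most `limit` requests per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the API would typically answer HTTP 429
        hits.append(now)
        return True
```

In a FastAPI application a check like this would most likely live in a dependency or middleware and key on the authenticated user or client IP. That placement is a guess about the wiring, not a description of the project's code.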
The Ingestion Pipeline processes user-submitted documents into vector embeddings. Parsers implement BaseParser and extract structured text. Chunkers split that text into pieces for embedding. The pipeline orchestrates the stages in order: parse → chunk → embed → store.
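The parse → chunk → embed → store flow can be sketched roughly as follows. Only the `BaseParser` name comes from this document. Its `parse` method, the `PlainTextParser`, the word-window chunker, the hash-based `toy_embed`, and the `InMemoryStore` are hypothetical stand-ins for the real parsers, embedding model, and vector store.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class BaseParser(ABC):
    """Assumed interface; the real BaseParser may expose different methods."""

    @abstractmethod
    def parse(self, raw: bytes) -> str:
        ...


class PlainTextParser(BaseParser):
    # Hypothetical parser: decodes UTF-8 and collapses whitespace.
    def parse(self, raw: bytes) -> str:
        return " ".join(raw.decode("utf-8").split())


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(chunk, dim=8):
    # Placeholder embedding: hash bytes scaled to [0, 1]. Not a real model.
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]


@dataclass
class InMemoryStore:
    rows: list = field(default_factory=list)  # (chunk, vector) pairs

    def add(self, chunk, vector):
        self.rows.append((chunk, vector))


def ingest(raw, parser, store, size=50, overlap=10):
    """Run parse -> chunk -> embed -> store and return the chunk count."""
    text = parser.parse(raw)
    chunks = chunk_words(text, size, overlap)
    for chunk in chunks:
        store.add(chunk, toy_embed(chunk))
    return len(chunks)
```

Overlapping windows are one common way to avoid cutting a sentence off from its context at a chunk boundary. The chunking strategy the project actually uses is not specified here.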
The Agent Orchestrator is a plan-execute orchestrator that coordinates multiple analysis tools. Each tool implements BaseTool with name, description, and execute(). The orchestrator builds a plan based on the available profile data, executes tools with retry and timeout policies, and synthesizes the results.
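A minimal sketch of the plan-execute loop is shown below. `BaseTool`, `name`, `description`, and `execute()` come from the description above. The `requires` attribute, the two example tools, the `Orchestrator` class, and the shape of its return value are assumptions. Only the retry policy is shown, and timeouts are left out to keep the example short.

```python
from abc import ABC, abstractmethod


class BaseTool(ABC):
    name = ""
    description = ""
    requires = ()  # hypothetical: profile keys the tool needs in order to run

    @abstractmethod
    def execute(self, profile):
        ...


class ReadmeScorer(BaseTool):
    name = "readme_scorer"
    description = "Scores a README by word count (toy heuristic)."
    requires = ("readme",)

    def execute(self, profile):
        return min(len(profile["readme"].split()) / 100, 1.0)


class FlakyGitHubTool(BaseTool):
    """Illustrative tool that fails a fixed number of times before succeeding."""

    name = "github"
    description = "Fetches profile data (simulated)."
    requires = ("github_username",)

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def execute(self, profile):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("transient failure")
        return {"user": profile["github_username"]}


class Orchestrator:
    def __init__(self, tools, max_attempts=3):
        self.tools = tools
        self.max_attempts = max_attempts

    def plan(self, profile):
        # Plan step: keep only tools whose required inputs are present.
        return [t for t in self.tools if all(k in profile for k in t.requires)]

    def run(self, profile):
        results, errors = {}, {}
        for tool in self.plan(profile):
            for attempt in range(1, self.max_attempts + 1):
                try:
                    results[tool.name] = tool.execute(profile)
                    break
                except Exception as exc:
                    if attempt == self.max_attempts:
                        errors[tool.name] = str(exc)  # retries exhausted
        # Synthesis is reduced to returning both maps side by side.
        return {"results": results, "errors": errors}
```

With a profile that contains only `github_username`, the plan would skip `ReadmeScorer` entirely, which is the sense in which the plan depends on the available profile data.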
The RAG System uses hybrid retrieval (vector similarity + BM25 keyword search) to fetch relevant context from the user's ingested documents. The generator uses prompt templates to produce structured, evidence-based feedback. The evaluator scores retrieval relevance and generation faithfulness.
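One way to combine the two retrieval signals is sketched below. The description above does not say how the vector and BM25 rankings are merged, so the use of reciprocal rank fusion (RRF) is an assumption, as are all function names. The BM25 scorer is a small pure-Python version of the standard Okapi formula, and the vectors are supplied by the caller rather than produced by a real embedding model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) per document."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3):
    keyword = bm25_scores(query, docs)
    vector = [cosine(query_vec, v) for v in doc_vecs]
    by_keyword = sorted(range(len(docs)), key=lambda i: -keyword[i])
    by_vector = sorted(range(len(docs)), key=lambda i: -vector[i])
    return rrf([by_keyword, by_vector])[:top_k]  # indices into docs
```

RRF works on ranks rather than raw scores, so it avoids having to normalize BM25 and cosine values onto a common scale. A weighted sum of normalized scores would be an equally plausible design.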
The Safety Layer is middleware wrapping the generation pipeline. Its components run in sequence: prompt injection defense → content filter → bias detector → PII scrubber. All safety events are logged with structured metadata for monitoring.
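The sequential component chain could look roughly like this. Only the component order comes from the description above. The detection rules are deliberately naive placeholders (a single override phrase, a one-word blocklist, two flagged phrases, an email regex), every name is hypothetical, and for simplicity all four components run over one piece of text and collect events in a list instead of writing to a logger.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SafetyResult:
    text: str
    blocked: bool = False
    events: list = field(default_factory=list)  # structured event dicts

    def log(self, component, action, detail):
        self.events.append({"component": component, "action": action, "detail": detail})


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKLIST = re.compile(r"\b(idiot)\b", re.IGNORECASE)  # toy blocklist
BIAS_PHRASES = ("native speaker", "young and energetic")  # toy phrase list


def injection_defense(result):
    if "ignore previous instructions" in result.text.lower():
        result.blocked = True
        result.log("prompt_injection_defense", "blocked", "override phrase")
    return result


def content_filter(result):
    filtered, count = BLOCKLIST.subn("[FILTERED]", result.text)
    if count:
        result.text = filtered
        result.log("content_filter", "filtered", f"{count} term(s)")
    return result


def bias_detector(result):
    for phrase in BIAS_PHRASES:
        if phrase in result.text.lower():
            result.log("bias_detector", "flagged", phrase)
    return result


def pii_scrubber(result):
    scrubbed, count = EMAIL.subn("[REDACTED_EMAIL]", result.text)
    if count:
        result.text = scrubbed
        result.log("pii_scrubber", "redacted", f"{count} email(s)")
    return result


PIPELINE = (injection_defense, content_filter, bias_detector, pii_scrubber)


def run_safety(text, components=PIPELINE):
    result = SafetyResult(text)
    for component in components:
        result = component(result)
        if result.blocked:
            break  # later components never see blocked content
    return result
```

Running the PII scrubber last means earlier components see the unredacted text, which suits detection but implies that their event metadata should avoid echoing raw content.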
See the Architecture Decision Records in docs/adr/ for context on major decisions: