Architecture Overview

PathReview is a multi-service application with five major subsystems. This document describes how they fit together.

High-Level Data Flow

User Input (GitHub username, resume PDF, repo URLs)
    │
    ▼
┌─────────────────────┐
│  API Layer (FastAPI) │ ← Authentication, validation, rate limiting
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Ingestion Pipeline  │ ← Parse documents, chunk, embed, store
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Agent Orchestrator  │ ← Plan analysis, execute tools, manage state
│  ┌───────────────┐  │
│  │ GitHub Tool    │  │
│  │ Skill Extract  │  │
│  │ README Scorer  │  │
│  │ Market Analyze │  │
│  │ Tech Detector  │  │
│  └───────────────┘  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  RAG System          │ ← Retrieve context, generate feedback, evaluate
│  (Hybrid Retrieval   │
│   + LLM Generation)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Safety Layer        │ ← Bias check, content filter, PII scrub
└──────────┬──────────┘
           │
           ▼
      Review Output

Subsystem Details

API Layer (`api/`)

FastAPI application serving REST endpoints. Handles authentication (JWT), request validation (Pydantic), rate limiting, and CORS. Routes delegate to service layer in core/services/.

Ingestion Pipeline (`ingestion/`)

Processes user-submitted documents into vector embeddings. Parsers implement BaseParser and extract structured text. Chunkers split text for embedding. The pipeline orchestrates: parse → chunk → embed → store.

Agent System (`agent/`)

A plan-execute orchestrator that coordinates multiple analysis tools. Each tool implements BaseTool with name, description, and execute(). The orchestrator builds a plan based on available profile data, executes tools with retry and timeout policies, and synthesizes results.

RAG System (`rag/`)

Hybrid retrieval (vector similarity + BM25 keyword) fetches relevant context from the user's ingested documents. The generator uses prompt templates to produce structured, evidence-based feedback. The evaluator scores retrieval relevance and generation faithfulness.

Safety Layer (`safety/`)

Middleware wrapping the generation pipeline. Components run in sequence: prompt injection defense → content filter → bias detector → PII scrubber. All safety events are logged with structured metadata for monitoring.

Key Design Decisions

See the Architecture Decision Records in docs/adr/ for context on major decisions:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture Overview

High-Level Data Flow

Subsystem Details

API Layer (`api/`)

Ingestion Pipeline (`ingestion/`)

Agent System (`agent/`)

RAG System (`rag/`)

Safety Layer (`safety/`)

Key Design Decisions

FilesExpand file tree

ARCHITECTURE.md

Latest commit

History

ARCHITECTURE.md

File metadata and controls

Architecture Overview

High-Level Data Flow

Subsystem Details

API Layer (api/)

Ingestion Pipeline (ingestion/)

Agent System (agent/)

RAG System (rag/)

Safety Layer (safety/)

Key Design Decisions

API Layer (`api/`)

Ingestion Pipeline (`ingestion/`)

Agent System (`agent/`)

RAG System (`rag/`)

Safety Layer (`safety/`)