Skip to content

Latest commit

 

History

History
70 lines (59 loc) · 3.41 KB

File metadata and controls

70 lines (59 loc) · 3.41 KB

Architecture Overview

PathReview is a multi-service application with five major subsystems. This document describes how they fit together.

High-Level Data Flow

User Input (GitHub username, resume PDF, repo URLs)
    │
    ▼
┌─────────────────────┐
│  API Layer (FastAPI) │ ← Authentication, validation, rate limiting
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Ingestion Pipeline  │ ← Parse documents, chunk, embed, store
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Agent Orchestrator  │ ← Plan analysis, execute tools, manage state
│  ┌───────────────┐  │
│  │ GitHub Tool    │  │
│  │ Skill Extract  │  │
│  │ README Scorer  │  │
│  │ Market Analyze │  │
│  │ Tech Detector  │  │
│  └───────────────┘  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  RAG System          │ ← Retrieve context, generate feedback, evaluate
│  (Hybrid Retrieval   │
│   + LLM Generation)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Safety Layer        │ ← Bias check, content filter, PII scrub
└──────────┬──────────┘
           │
           ▼
      Review Output

Subsystem Details

API Layer (api/)

FastAPI application serving REST endpoints. Handles authentication (JWT), request validation (Pydantic), rate limiting, and CORS. Routes delegate to service layer in core/services/.

Ingestion Pipeline (ingestion/)

Processes user-submitted documents into vector embeddings. Parsers implement BaseParser and extract structured text. Chunkers split text for embedding. The pipeline orchestrates: parse → chunk → embed → store.

Agent System (agent/)

A plan-execute orchestrator that coordinates multiple analysis tools. Each tool implements BaseTool with name, description, and execute(). The orchestrator builds a plan based on available profile data, executes tools with retry and timeout policies, and synthesizes results.

RAG System (rag/)

Hybrid retrieval (vector similarity + BM25 keyword) fetches relevant context from the user's ingested documents. The generator uses prompt templates to produce structured, evidence-based feedback. The evaluator scores retrieval relevance and generation faithfulness.

Safety Layer (safety/)

Middleware wrapping the generation pipeline. Components run in sequence: prompt injection defense → content filter → bias detector → PII scrubber. All safety events are logged with structured metadata for monitoring.

Key Design Decisions

See the Architecture Decision Records in docs/adr/ for context on major decisions: