Project Guide

This guide is a practical map of the entire repository for contributors and maintainers.

It focuses on:

  • What each folder is responsible for
  • Which Python environment and package workflow are the defaults
  • Which commands are currently valid
  • What to improve next in architecture, testing, performance, and developer experience

1) System Summary

AI Imaging Agent recommends imaging software by combining retrieval-augmented generation (RAG) with a vision-language model (VLM).

High-level flow:

  1. User uploads file(s) and enters a task.
  2. Retrieval stage finds candidate tools (BGE-M3 + FAISS + reranker).
  3. Agent/VLM stage ranks candidates with image-aware reasoning.
  4. UI renders ranked recommendations and optional demo links.
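The four steps above can be sketched as a pair of functions chained by one orchestrator. This is a minimal illustration, not the code in src/ai_agent/api/pipeline.py: `Candidate`, `retrieve`, `select`, and `recommend` are hypothetical names, and word overlap stands in for the real BGE-M3 + FAISS + reranker stack and for the VLM ranking.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A tool proposed by the retrieval stage (hypothetical shape)."""
    name: str
    score: float


def retrieve(task: str, catalog: list[str], top_k: int = 3) -> list[Candidate]:
    """Stage 2 stand-in: score each tool by word overlap with the task.

    The real stage uses embeddings, a vector index, and a reranker.
    """
    task_words = set(task.lower().split())
    scored = [
        Candidate(name, float(len(task_words & set(name.lower().split()))))
        for name in catalog
    ]
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored[:top_k]


def select(candidates: list[Candidate], file_names: list[str]) -> list[Candidate]:
    """Stage 3 stand-in: the real stage ranks with image-aware reasoning.

    Here the uploaded files are ignored and retrieval order is kept.
    """
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def recommend(task: str, file_names: list[str], catalog: list[str]) -> list[str]:
    """Steps 1 to 4: take inputs, retrieve, select, return names for the UI."""
    candidates = retrieve(task, catalog)
    ranked = select(candidates, file_names)
    return [c.name for c in ranked]
```

Keeping each stage behind its own function is what lets the retrieval stand-in be swapped for the real stack without touching the orchestrator.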

Primary orchestrator: src/ai_agent/api/pipeline.py

2) Default Python Environment And Packages (Dev Container Canonical)

Assume development is done inside the dev container.

Source of truth:

Default environment:

  • OS: Debian Bookworm (dev container)
  • Python: 3.12
  • Environment manager: uv
  • Virtual environment path: .venv

Recommended commands:

uv venv
uv pip install -e .
uv pip install -e ".[dev]"

Run and test:

ai_agent chat
ai_agent sync
pytest tests/

Important note on command drift:

  • The CLI in src/ai_agent/cli.py supports two modes: chat and sync.
  • The justfile currently references ai_agent ui, which is not a valid CLI mode.
  • This guide follows the actual CLI implementation.
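Drift like this can be caught with a small automated check. The sketch below is an assumption rather than code from the repository: `SUPPORTED_MODES` mirrors the two modes named above, and `find_unsupported_modes` is a hypothetical helper that scans justfile text for `ai_agent <mode>` invocations.

```python
import re

# Assumption: mirrors the modes this guide attributes to src/ai_agent/cli.py.
SUPPORTED_MODES = {"chat", "sync"}

# Matches "ai_agent <word>"; flags such as "--help" are not captured.
_INVOCATION = re.compile(r"\bai_agent\s+([A-Za-z_][\w-]*)")


def find_unsupported_modes(justfile_text: str) -> list[str]:
    """Return modes invoked in the justfile that the CLI does not define."""
    used = set(_INVOCATION.findall(justfile_text))
    return sorted(used - SUPPORTED_MODES)
```

Run against the justfile contents in CI, a non-empty result would fail the build before the mismatch reaches contributors.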

3) Repository Top-Level Map

4) Detailed Source Folder Responsibilities

Package root: src/ai_agent/

Purpose: conversational orchestration using PydanticAI.

Key files:

Boundary:

  • Should orchestrate tools and policy, not own retrieval internals.

Purpose: pipeline orchestration between inputs, retrieval, and selection.

Key file:

Responsibilities:

  • validate files
  • extract metadata
  • build retrieval query
  • call retrieval and selection stages
  • manage index refresh/reload behavior
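The first three responsibilities can be illustrated with plain functions. Everything here is hypothetical: the allowed suffixes, the metadata fields, and the `format:<ext>` token shape are assumptions chosen for the example, not the repository's actual rules.

```python
from pathlib import Path

# Assumption: an illustrative allow-list, not the project's real one.
ALLOWED_SUFFIXES = {".tif", ".tiff", ".png", ".jpg"}


def validate_files(paths: list[str]) -> list[Path]:
    """Keep only files whose extension is on the allow-list."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in ALLOWED_SUFFIXES]


def extract_metadata(path: Path) -> dict[str, str]:
    """Derive minimal metadata from the file name alone."""
    return {"name": path.name, "format": path.suffix.lower().lstrip(".")}


def build_query(task: str, metadata: list[dict[str, str]]) -> str:
    """Append one format token per distinct file format to the task text."""
    formats = sorted({m["format"] for m in metadata})
    return " ".join([task.strip(), *(f"format:{f}" for f in formats)])
```

For example, a task of "segment cells" with uploads `a.TIF` and `notes.txt` drops the text file and produces the query `segment cells format:tif`.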

Boundary:

  • Keep UI concerns out of this module.

Purpose: deterministic retrieval stack (no LLM calls).

Key files:

Boundary:

  • Retrieval quality logic should stay here.

Purpose: selection schema and prompting primitives.

Key files:

Boundary:

  • Keep this layer focused on schema and prompt contracts, not transport/UI concerns.
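A schema-and-prompt contract of this kind might look like the sketch below. It uses standard-library dataclasses so it runs without dependencies; the project itself may define its schema differently, and `ToolChoice`, `build_selection_prompt`, and `validate_choices` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolChoice:
    """One ranked recommendation (hypothetical schema)."""
    name: str
    rank: int
    rationale: str


def build_selection_prompt(task: str, candidate_names: list[str]) -> str:
    """Render the task and numbered candidates as a plain-text prompt."""
    lines = [f"Task: {task}", "Candidates:"]
    lines += [f"{i}. {name}" for i, name in enumerate(candidate_names, start=1)]
    lines.append("Rank the candidates and give a one-sentence rationale for each.")
    return "\n".join(lines)


def validate_choices(choices: list[ToolChoice], candidate_names: list[str]) -> list[str]:
    """Return contract violations; an empty list means the output is valid."""
    errors = []
    allowed = set(candidate_names)
    for choice in choices:
        if choice.name not in allowed:
            errors.append(f"unknown tool: {choice.name}")
    ranks = [c.rank for c in choices]
    if len(ranks) != len(set(ranks)):
        errors.append("duplicate ranks")
    return errors
```

Nothing in this layer knows how the prompt is sent or how results are displayed, which is the boundary described above.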

Purpose: Gradio app and interaction handling.

Key files:

Boundary:

  • UI should call orchestrators, not reimplement retrieval/selection decisions.

Purpose: cross-cutting utility functions.

Key files:

Boundary:

  • Keep utilities reusable and independent from UI-specific logic.

Purpose: catalog synchronization and refresh helpers.

Key file:

Boundary:

  • Catalog IO and sync logic should stay isolated from ranking logic.

Purpose: shared core coordination such as pipeline registry.

Key file:

Boundary:

  • Keep core primitives minimal and dependency-light.

Purpose: query assets used by catalog sync/retrieval support.

Key file:

Boundary:

  • Keep query definitions versioned and testable.

Purpose: command entry point and mode dispatch.

Current modes:

  • chat
  • sync

Documentation should follow this command contract.
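A two-mode dispatch like this can be expressed with argparse, as sketched below. How src/ai_agent/cli.py actually parses arguments is not shown in this guide, so the parser layout and the `is_valid_mode` helper are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly the two documented modes."""
    parser = argparse.ArgumentParser(prog="ai_agent")
    parser.add_argument("mode", choices=["chat", "sync"], help="mode to run")
    return parser


def is_valid_mode(argv: list[str]) -> bool:
    """Return True if argv parses; argparse exits on an invalid choice."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        return False
    return True
```

With this layout, `ai_agent ui` is rejected at parse time with a usage error instead of failing later.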

5) Supporting Folders

5.1 tests/

Contains unit and integration tests, with test fixtures under tests/data/.

Improvement target:

  • add more focused tests for UI handler edge cases and tool failure handling.

5.2 tools/

Container and deployment support assets.

Notable file:

5.3 docs/

Documentation source for MkDocs.

Add new pages to mkdocs.yml nav to keep docs discoverable.

6) Known Inconsistencies To Track

  1. justfile uses ai_agent ui, while src/ai_agent/cli.py defines chat and sync.
  2. Installation docs often show a pip-first flow, while the dev container bootstrap is uv-first.
  3. requirements.txt is looser than pyproject.toml, which contains current pinned/runtime dependencies.

7) Codebase Improvement Guidelines

7.1 Architecture And Modularity

  1. Keep strict stage boundaries: retrieval logic in retriever, selection contracts in generator, orchestration in api.
  2. Minimize cross-layer imports from ui to low-level modules.
  3. Introduce lightweight interface contracts for tool adapters to reduce coupling in agent/tools.
  4. Centralize shared constants/env defaults to reduce duplicated configuration behavior.
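For item 3, a lightweight contract could be a `typing.Protocol` that every tool adapter satisfies structurally. This is a sketch under assumptions: `ToolAdapter`, `StaticSearchAdapter`, and `run_tool` are hypothetical names, and the record shape is invented for the example.

```python
from typing import Protocol


class ToolAdapter(Protocol):
    """Contract the agent depends on instead of concrete tool classes."""
    name: str

    def run(self, query: str) -> list[dict]:
        ...


class StaticSearchAdapter:
    """Example adapter: substring search over an in-memory list of records."""
    name = "search"

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def run(self, query: str) -> list[dict]:
        needle = query.lower()
        return [r for r in self._records if needle in r["name"].lower()]


def run_tool(adapter: ToolAdapter, query: str) -> list[dict]:
    """Caller code that only knows the protocol, not the implementation."""
    return adapter.run(query)
```

Because the protocol is structural, existing adapters need no base class; a type checker verifies conformance wherever an adapter is passed to `run_tool`.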

7.2 Testing And Quality Gates

  1. Add regression tests for format-token query construction and retry broadening behavior.
  2. Add failure-path tests for image preview generation and graceful degradation.
  3. Add contract tests for agent tool outputs (search, alternative search, repo info).
  4. Enforce formatting/lint/type checks in CI (ruff, black --check, mypy, pytest).
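A failure-path test for item 2 might take the shape below. `safe_preview` is a hypothetical wrapper written for this example; the real preview code and its error types may differ. The test functions follow the `test_*` naming that pytest collects and need no imports.

```python
def safe_preview(path, make_preview):
    """Return a preview, or None if preview generation fails for any reason."""
    try:
        return make_preview(path)
    except Exception:
        # Graceful degradation: the caller can render the result without a preview.
        return None


def test_preview_failure_degrades_gracefully():
    def broken(path):
        raise ValueError("unreadable image")

    assert safe_preview("scan.tif", broken) is None


def test_preview_success_passes_through():
    assert safe_preview("scan.tif", lambda p: f"preview:{p}") == "preview:scan.tif"
```

The point of the first test is that a corrupt upload yields a missing preview, not a crashed request.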

7.3 Performance And Retrieval Quality

  1. Add benchmark fixtures for retrieval latency and reranker throughput.
  2. Track retrieval quality with a small fixed evaluation set (top-k recall, MRR).
  3. Cache expensive metadata extraction where safe for repeated files in a session.
  4. Make index reload behavior observable with structured counters in logs.
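The two metrics named in item 2 are short to compute. The sketch below uses the standard definitions of recall at k and mean reciprocal rank; the function names and the `(ranked, relevant)` pair format are assumptions for illustration.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)
```

With a fixed evaluation set checked into the repository, these numbers can be compared across commits to spot retrieval regressions.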

7.4 Developer Experience And CI

  1. Align just tasks with real CLI contract (chat/sync).
  2. Add a docs link checker in CI to prevent markdown drift.
  3. Document one canonical local workflow (dev container first, optional local pip fallback).
  4. Add a short maintainer checklist for release prep and changelog updates.
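The link checker in item 2 could start as small as the sketch below. It is an assumption, not an existing project script: `broken_relative_links` is a hypothetical helper that checks only relative Markdown link targets and takes an `exists` callable so the filesystem lookup stays outside the function.

```python
import re
from typing import Callable

# Matches inline Markdown links of the form [text](target).
_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

_SKIPPED_PREFIXES = ("http://", "https://", "mailto:", "#")


def broken_relative_links(markdown_text: str, exists: Callable[[str], bool]) -> list[str]:
    """Return relative link targets whose file part does not exist."""
    broken = []
    for target in _LINK.findall(markdown_text):
        if target.startswith(_SKIPPED_PREFIXES):
            continue  # external links and in-page anchors are out of scope
        file_part = target.split("#", 1)[0]
        if not exists(file_part):
            broken.append(target)
    return broken
```

In CI, `exists` could be `lambda p: (docs_dir / p).exists()` with `docs_dir` a `pathlib.Path` pointing at docs/; a non-empty result fails the job.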

8) Practical Contributor Checklist

Before opening a PR:

  1. Install/update in editable mode in the active environment.
  2. Run tests relevant to changed modules.
  3. Validate docs links if docs were touched.
  4. Update CHANGELOG.md for user-visible changes.
  5. Confirm command and environment docs still match real behavior.

9) Related References