This document serves as a guide for AI coding agents working on the LightSpeed Evaluation Framework, providing context, instructions, and best practices to help agents understand the codebase and contribute effectively.
| Directory | Status | New Features? | Bug Fixes/Security/Linting? |
|---|---|---|---|
| `src/lightspeed_evaluation/` | Active Development | ✅ Yes | ✅ Yes |
| `lsc_agent_eval/` | Deprecated - will be removed | ❌ No - add to lightspeed_evaluation instead | ✅ Yes (with confirmation) |
| `src/generate_answers/` | Moving to separate repo | ❌ No - will be relocated | ✅ Yes (with confirmation) |
| `tests/`, `config/`, `docs/` | Active | ✅ Yes | ✅ Yes |
Before making changes to deprecated/transitional directories:
- Inform the user about the directory's status
- Get explicit confirmation before proceeding
- For new features: suggest adding to `src/lightspeed_evaluation/` instead
```python
# ❌ WRONG - DO NOT USE
from unittest.mock import patch, MagicMock

# ✅ CORRECT - ALWAYS USE
def test_example(mocker):
    mocker.patch('module.function')
```

When modifying functionality, you MUST update:
- `docs/` - Update relevant guide if feature behavior changes
- `README.md` - Update when features change or adding new features
- `AGENTS.md` - Update if adding new conventions or project structure changes
If the user provides new project conventions, coding standards, or constraints:
- Add them to this `AGENTS.md` file in the appropriate section
- Important guidelines go in this section above
- Best practices go in relevant sections below
Before considering any code change complete, you MUST run:
```bash
# Format code
make black-format

# Run all pre-commit checks at once (same as CI)
make pre-commit    # Runs: bandit, check-types, pyright, docstyle, ruff, pylint, black-check

# Or run each quality check individually:
make bandit        # Security scan
make check-types   # Type check
make pyright       # Type check
make docstyle      # Docstring style
make ruff          # Lint check
make pylint        # Lint check
make black-check   # Check formatting

# Run tests
make test          # Or: uv run pytest tests
```

Git hooks are automatically installed via `make install-deps-test`. They run `make pre-commit` before commits and tests before pushes.
Do NOT skip these steps. If any check fails:
- Fix the issues in code you changed
- For pre-existing issues in unchanged code: notify the user but don't fix (to avoid scope creep)
- Re-run the checks until they pass
- Only then consider the task complete
Important: Do NOT disable lint warnings (e.g., `# noqa`, `# type: ignore`, `# pylint: disable`). Always try to fix the underlying issue. If a fix becomes too complicated, inform the user and discuss alternatives.
The LightSpeed Evaluation Framework is a comprehensive evaluation system for GenAI applications, supporting multiple evaluation frameworks (Ragas, DeepEval, custom metrics) with both turn-level and conversation-level assessments.
- Core Framework: Located in `src/lightspeed_evaluation/`
  - `core/`: Configuration, LLM management, metrics, output handling
  - `pipeline/`: Evaluation orchestration and processing
  - `runner/`: Command-line interface and main entry points
- Configuration: YAML-based system and evaluation data configs in `config/` (a quick parse check is sketched after this list)
- Testing: Comprehensive test suite in `tests/` following pytest conventions
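Since these configs are plain YAML files, a quick standalone parse check can catch syntax errors early. The sketch below uses PyYAML directly rather than the framework's own loader and validation in `core/system/`, and assumes the file is a YAML mapping:

```python
# Sketch: confirm a config file is valid YAML before running an evaluation.
# Uses PyYAML directly; the framework's loader/validation lives in core/system/.
from pathlib import Path

import yaml

config_path = Path("config/system.yaml")
with config_path.open(encoding="utf-8") as handle:
    config = yaml.safe_load(handle)

print(f"Loaded {config_path} with top-level keys: {sorted(config)}")
```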
- Python 3.11+
- `uv` package manager (preferred) or `pip`

```bash
uv sync --group dev
make install-deps-test
```

Refer to README.md for the full list of environment variables. Key variables (a quick check script is sketched after this list):

- `OPENAI_API_KEY` - Required for LLM evaluation
- `API_KEY` - Optional, for API-enabled evaluations
- `KUBECONFIG` - Optional, for script-based evaluations
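To fail fast on a misconfigured shell, a small standalone script like the following can check the variables above before an evaluation run; the script itself is illustrative and not part of the framework:

```python
# Sketch: verify expected environment variables before running an evaluation.
# Variable names come from the list above; this helper is illustrative only.
import os
import sys

REQUIRED = ["OPENAI_API_KEY"]         # required for LLM evaluation
OPTIONAL = ["API_KEY", "KUBECONFIG"]  # API-enabled and script-based evaluations

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing required environment variables: {', '.join(missing)}")

for name in OPTIONAL:
    if not os.environ.get(name):
        print(f"Note: optional variable {name} is not set")
```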
```
src/lightspeed_evaluation/
├── core/
│   ├── api/      # API client for real-time data
│   ├── llm/      # LLM provider management
│   ├── metrics/  # Evaluation metrics (Ragas, DeepEval, custom)
│   ├── models/   # Pydantic data models
│   ├── output/   # Report generation and visualization
│   ├── script/   # Script execution for environment validation
│   └── system/   # Configuration and validation
├── pipeline/
│   └── evaluation/  # Main evaluation pipeline orchestration
└── runner/          # CLI interface and main entry points
```
- `EvaluationPipeline`: Main orchestrator
- `SystemConfig` / `EvaluationData`: Pydantic config models
- `MetricManager`: Metric execution
- `OutputHandler`: Report generation
Mirror the source code structure in `tests/` (see the example mapping after this list):

- Test files: `test_*.py`
- Test functions: `test_*`
- Test classes: `Test*`
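For example, a module and its tests would typically sit at mirrored paths like these (the specific file names are illustrative, not real modules):

```
src/lightspeed_evaluation/core/llm/manager.py    ->  tests/core/llm/test_manager.py
src/lightspeed_evaluation/core/output/report.py  ->  tests/core/output/test_report.py
```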
```python
def test_llm_manager(mocker):
    """Test LLM manager with mocked provider."""
    # mock_response, config, query, and response would be provided by fixtures
    # in the real test suite.
    mock_client = mocker.patch('lightspeed_evaluation.core.llm.openai.OpenAI')
    mock_client.return_value.chat.completions.create.return_value = mock_response

    manager = LLMManager(config)
    result = manager.evaluate_response(query, response)
    assert result.score > 0.5
```

- Aim for >80% coverage on new code
- Run: `uv run pytest tests --cov=src --cov-report=html`
- Type Hints: Required for all public functions and methods
- Docstrings: Google-style docstrings for all public APIs
- Error Handling: Use custom exceptions from `core.system.exceptions`
- Logging: Use structured logging with appropriate levels (a combined example is sketched below)
- Custom Metrics: Add to `src/lightspeed_evaluation/core/metrics/custom/`
- Register: Update the `MetricManager` `supported_metrics` dictionary
- Configure: Add metadata to the `config/system.yaml` `metrics_metadata` section
- Test: Add comprehensive tests with mocked LLM calls using pytest (a rough shape of a custom metric is sketched below)
- Configuration Errors: Check `core/system/validator.py`
- Metric Failures: Enable DEBUG logging (see the snippet below)
- API Issues: Verify `API_KEY` and endpoint connectivity
- Test Failures: Run `make test` and check specific error messages
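For metric failures, DEBUG logging can be enabled with the standard library before invoking the pipeline; if the framework exposes its own logging setup, prefer that:

```python
# Enable DEBUG logging for troubleshooting metric failures (standard library only).
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
```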