Skip to content

Latest commit

 

History

History
201 lines (157 loc) · 5.51 KB

File metadata and controls

201 lines (157 loc) · 5.51 KB

AI Agents Architecture

This document describes the AI agents implemented in the Receipt Analyzer application.

Current Agents

1. Receipt Analyzer Agent

File: agent/receipt_analyzer_agent.py Type: LangGraph workflow with Azure OpenAI integration Purpose: Analyzes receipt images and extracts structured expense data

Workflow Structure

graph TD
    A[analyze_image_node] --> B[categorize_expense_node]
    B --> C[END]
Loading

Node Details

analyze_image_node

  • Input: Base64 encoded receipt image
  • Processing: Calls Azure OpenAI vision model (o4-mini)
  • Output: Structured JSON with receipt details
  • Tools Used: analyze_receipt_image

categorize_expense_node

  • Input: Extracted receipt information
  • Processing: AI-powered expense categorization
  • Output: Business categories, deductibility, tags
  • Tools Used: categorize_expense

State Management

class ReceiptAnalysisState(TypedDict):
    messages: Annotated[List[BaseMessage], add_messages]
    image_data: str                    # Base64 receipt image
    analysis_result: Dict[str, Any]    # Raw analysis output
    extracted_info: Dict[str, Any]     # Final structured data

Tools

analyze_receipt_image

  • Converts base64 image to Azure OpenAI vision API call
  • Extracts: merchant, date, items, amounts, payment method
  • Returns structured JSON with confidence scores

categorize_expense

  • Analyzes merchant and items for expense classification
  • Determines business deductibility
  • Assigns budget categories and tags

Output Schema

{
  "merchant_name": "Store Name",
  "date": "2024-01-15",
  "time": "14:30",
  "total_amount": 45.67,
  "currency": "USD",
  "tax_amount": 3.50,
  "items": [
    {
      "name": "Item Name",
      "quantity": 2,
      "price": 12.99,
      "total": 25.98
    }
  ],
  "payment_method": "Card",
  "receipt_number": "12345",
  "category": "grocery",
  "categorization": {
    "primary_category": "Groceries",
    "sub_category": "Food & Beverages",
    "business_deductible": false,
    "tax_implications": "Personal expense",
    "budget_category": "Food",
    "tags": ["grocery", "food", "personal"]
  },
  "analysis_timestamp": "2024-01-15T14:30:00",
  "confidence": "high"
}

Agent Integration

FastAPI Integration

The agent is integrated into FastAPI via:

  1. CopilotKit Integration (/copilotkit endpoint)

    • Full conversational interface
    • Supports chat-based interactions
    • Integrated with frontend chat widget
  2. Direct API Endpoint (/analyze-receipt endpoint)

    • Direct receipt analysis without chat context
    • Used by main receipt analyzer component
    • Faster for simple analysis tasks

LangSmith Observability

All agent executions are traced in LangSmith:

  • Workflow-level tracing: Complete agent execution flow
  • Tool-level tracing: Individual tool calls and results
  • Image attachments: Visual receipt images in traces
  • Performance metrics: Token usage, latency, costs

Usage Patterns

Direct Analysis

result = receipt_analysis_graph.invoke({
    "image_data": base64_image,
    "messages": [],
    "analysis_result": {},
    "extracted_info": {}
})

Chat Integration

receipt_analyzer_agent = LangGraphAgent(
    name="receipt_analyzer",
    description="Analyzes receipt images...",
    graph=receipt_analysis_graph,
)

Planned Agents

2. Browser Automation Agent (Future)

Purpose: Automate expense report submission to accounting systems Capabilities:

  • Navigate web interfaces
  • Fill forms with extracted data
  • Handle multi-step workflows
  • Screenshot verification

Integration:

  • Would receive output from Receipt Analyzer Agent
  • Could be triggered automatically or on-demand
  • Would provide audit trail of submissions

3. Document Classification Agent (Future)

Purpose: Classify and route different document types Capabilities:

  • Distinguish receipts from invoices, statements, etc.
  • Route to appropriate specialized agents
  • Handle batch document processing

4. Compliance Verification Agent (Future)

Purpose: Verify expense compliance with company policies Capabilities:

  • Check spending limits
  • Validate expense categories
  • Flag policy violations
  • Generate compliance reports

Agent Development Guidelines

Creating New Agents

  1. Define State Schema: Create TypedDict for agent state
  2. Design Workflow: Plan node structure and data flow
  3. Implement Tools: Create individual @tool functions
  4. Build Graph: Assemble nodes and edges
  5. Add Tracing: Integrate LangSmith observability
  6. Test Integration: Verify with FastAPI endpoints

Best Practices

  • Atomic Tools: Keep tools focused on single responsibilities
  • Error Handling: Graceful fallbacks for tool failures
  • State Management: Clear state transitions between nodes
  • Observability: Comprehensive tracing and logging
  • Testing: Unit tests for tools, integration tests for workflows

Azure OpenAI Integration

  • Model Compatibility: Ensure o4-mini parameter compatibility
  • Rate Limiting: Implement appropriate retry logic
  • Cost Optimization: Monitor token usage and costs
  • Error Recovery: Handle API failures gracefully

LangSmith Integration

  • Trace Everything: Workflow, tools, and external API calls
  • Attach Media: Images, documents, and rich content
  • Tag Runs: Consistent tagging for filtering and analysis
  • Evaluate Quality: Regular evaluation runs for quality assessment