
Conversation

@aaronsteers (Contributor) commented Sep 21, 2025

feat: Add token usage tracking with cost estimation to multi-agent workflow

Summary

Implements comprehensive token usage tracking and cost estimation for the multi-agent connector building workflow. This adds:

  • Cost tracking system with hard-coded pricing for 25+ AI models from OpenAI and other providers (GPT-4o, o1, etc.)
  • Model name extraction fixes to resolve the "unknown-model" bug
  • Smart file organization - usage files now save alongside manifest.yaml when available
  • Real-time cost monitoring with detailed breakdowns by model and request

The system tracks input/output tokens, request counts, and provides cost estimates in USD during both interactive and manager-developer build modes.
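For reference, a minimal sketch of how a hard-coded pricing table and per-request cost estimate can fit together. The table name mirrors _MODEL_PRICING from this PR, but the function name and the dollar figures below are illustrative placeholders, not the values shipped in the change:

# (input_price_per_1M_tokens, output_price_per_1M_tokens) in USD -- placeholder values
_MODEL_PRICING: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "unknown-model": (15.00, 60.00),  # conservative fallback for unrecognized models
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for one request, falling back to conservative pricing."""
    input_price, output_price = _MODEL_PRICING.get(model, _MODEL_PRICING["unknown-model"])
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price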

Review & Testing Checklist for Human

3 critical items to verify:

  • Test cost estimation accuracy - Run a real connector build and verify the cost calculations match expected API pricing. Check that model names are correctly extracted (no more "unknown-model" in output)
  • Verify file path logic - Test scenarios with and without manifest.yaml files to ensure usage files save to the correct directory. Test edge cases like multiple manifest files or nested directories
  • Check pricing data accuracy - Validate the hard-coded pricing in _MODEL_PRICING against current OpenAI pricing documentation. Pricing data was provided by the user but should be double-checked

Additional testing:

  • Run both interactive and manager-developer build modes to ensure cost tracking doesn't break existing workflows
  • Verify the token attribute mapping handles both the OpenAI format (prompt_tokens/completion_tokens) and the expected format (input_tokens/output_tokens) correctly (see the sketch below)
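A minimal sketch of that attribute fallback, assuming usage objects expose one of the two naming conventions; the helper name is illustrative and may not match the code in this PR:

from typing import Any

def extract_token_counts(usage: Any) -> tuple[int, int]:
    """Return (input_tokens, output_tokens), accepting either naming convention."""
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", 0)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", 0)
    return int(input_tokens or 0), int(output_tokens or 0)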

Notes

  • Hard-coded pricing may need periodic updates as API pricing changes
  • Debug scripts (debug_model_extraction.py, debug_workflow_responses.py) are included in the diff but may not be needed in production
  • The model extraction logic has extensive fallbacks to handle different response structures - this complexity may hide edge cases

Link to Devin run: https://app.devin.ai/sessions/49242329340c4c67908d1ff5c10103d7
Requested by: @aaronsteers (AJ Steers)

- Add CostTracker class with generic model/SKU tracking approach
- Track input/output/total tokens and request counts from RunResult.raw_responses
- Integrate tracking into run_manager_developer_build and run_interactive_build
- Add CostEvaluator with business logic for usage assessment
- Correlate usage data with existing trace_id for attribution
- Save detailed usage summaries to JSON files
- Focus on token tracking foundation rather than hardcoded cost calculations

Co-Authored-By: AJ Steers <[email protected]>
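A rough sketch of the CostTracker shape described in the commit above, assuming each raw response exposes a model name and a usage object; the real class in cost_tracking.py may differ in structure and naming:

import json
from collections import defaultdict
from pathlib import Path
from typing import Any

class CostTracker:
    def __init__(self, trace_id: str) -> None:
        self.trace_id = trace_id
        self.usage_by_model: dict[str, dict[str, int]] = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "requests": 0}
        )

    def add_run_result(self, run_result: Any) -> None:
        """Accumulate token and request counts from RunResult.raw_responses."""
        for response in getattr(run_result, "raw_responses", []):
            model = getattr(response, "model", None) or "unknown-model"
            usage = getattr(response, "usage", None)
            if usage is None:
                continue
            bucket = self.usage_by_model[model]
            bucket["input_tokens"] += getattr(usage, "input_tokens", 0) or 0
            bucket["output_tokens"] += getattr(usage, "output_tokens", 0) or 0
            bucket["requests"] += getattr(usage, "requests", 1) or 1

    def save_to_file(self, path: Path) -> None:
        """Write a JSON usage summary correlated with the run's trace_id."""
        summary = {"trace_id": self.trace_id, "usage_by_model": dict(self.usage_by_model)}
        path.write_text(json.dumps(summary, indent=2))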

Original prompt from AJ Steers
Received message in Slack channel #ask-devin-ai:

@Devin - We're wanting to use evals for the connector-builder-mcp tool, and specifically for the new `poe build-connector` task, which wraps the tools in a multi-agent workflow. Can you advise how we can plug this into an "evals" framework like the one described here? As of now, the eval _only_ needs to look at the output of the "connector readiness check" - which is a markdown file enumerating the streams and number of records per stream. We will define the correct answers that should be in this report (a successful report text for instance), and the eval should grade the result based on that golden example.
Thread URL: https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1758217363818419?thread_ts=1758217363.818419


devin-ai-integration bot commented Sep 21, 2025

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1758422080-add-token-usage-tracking", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1758422080-add-token-usage-tracking#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poe <command> - Runs any poe command in the uv virtual environment
  • /poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.



github-actions bot commented Sep 21, 2025

PyTest Results (Fast)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 3d354ca. ± Comparison against base commit ce2626c.

♻️ This comment has been updated with latest results.

- Apply ruff formatting to cost_tracking.py and run.py
- Convert single quotes to double quotes for consistency
- Wrap long lines to meet formatting standards

Co-Authored-By: AJ Steers <[email protected]>
@devin-ai-integration bot changed the title from "Add token usage tracking to multi-agent workflow" to "feat: Add token usage tracking to multi-agent workflow" Sep 21, 2025
@github-actions bot added the enhancement (New feature or request) label Sep 21, 2025
devin-ai-integration bot and others added 15 commits September 21, 2025 02:47
- Add cost_summary_report string property to CostTracker class
- Consolidate all summary text generation into single property
- Simplify run_manager_developer_build to use new summary property
- Remove unused CostEvaluator import from run.py
- Maintain equivalent summary output with cleaner code structure

Co-Authored-By: AJ Steers <[email protected]>
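A minimal sketch of what a consolidated summary property can produce, assuming per-model buckets of input/output tokens and request counts; the function name and report text below are illustrative, not the exact output of this commit:

def build_cost_summary_report(usage_by_model: dict[str, dict[str, int]]) -> str:
    lines = ["Token usage summary:"]
    for model, bucket in sorted(usage_by_model.items()):
        lines.append(
            f"  {model}: {bucket['input_tokens']:,} input / {bucket['output_tokens']:,} output tokens"
            f" over {bucket['requests']} request(s)"
        )
    return "\n".join(lines)

print(build_cost_summary_report(
    {"gpt-4o": {"input_tokens": 12000, "output_tokens": 3000, "requests": 4}}
))
# Token usage summary:
#   gpt-4o: 12,000 input / 3,000 output tokens over 4 request(s)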
- Add comprehensive pricing table for OpenAI, Anthropic, and other models
- Implement cost calculation in _calculate_cost method using per-token pricing
- Enhance model name extraction with additional fallback strategies
- Fix bug in CostEvaluator (thresholds -> _THRESHOLDS)
- Add cost display to summary reports with 4-decimal precision
- Support for unknown models with conservative pricing estimates

Co-Authored-By: AJ Steers <[email protected]>
- Clarifies that tuples represent (input_price_per_1M_tokens, output_price_per_1M_tokens) in USD
- Includes example for better understanding
- Addresses code documentation feedback

Co-Authored-By: AJ Steers <[email protected]>
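As a hypothetical worked example of that tuple format (the dollar figures here are placeholders, not actual pricing):

input_price, output_price = (2.50, 10.00)  # (input, output) USD per 1M tokens -- placeholder values

# A request that used 12,000 input tokens and 3,000 output tokens:
cost_usd = (12_000 / 1_000_000) * input_price + (3_000 / 1_000_000) * output_price
print(f"${cost_usd:.4f}")  # $0.0600, matching the 4-decimal precision used in summary reports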
- Updated _MODEL_PRICING with official pricing from OpenAI and other providers
- Organized models by series (GPT-5, GPT-4.1, GPT-4o, O-series, etc.)
- Added support for specialized models (realtime, audio, search, computer-use)
- Set gpt-image-1 output pricing to 0.00 as it has no output tokens
- Updated fallback pricing for unknown-model to be more conservative

Co-Authored-By: AJ Steers <[email protected]>
- Added https://platform.openai.com/docs/pricing URL to _MODEL_PRICING docstring
- Noted that the pricing page may require login for access
- Provides authoritative source for pricing data verification and updates

Co-Authored-By: AJ Steers <[email protected]>
- Verified that https://platform.openai.com/docs/pricing requires authentication
- Updated docstring from 'may require login' to 'requires login' for accuracy
- Browser test showed authentication error when accessing the URL

Co-Authored-By: AJ Steers <[email protected]>
- Add fallback logic for OpenAI vs expected attribute naming (completion_tokens vs output_tokens)
- Handle missing requests attribute from OpenAI responses (default to 1)
- Improve debug logging in _extract_model_name for better troubleshooting
- Add comprehensive test scripts to verify both response structure types
- Fix token calculation in add_run_result to handle both attribute naming conventions

Co-Authored-By: AJ Steers <[email protected]>
…rt pattern

- Update usage file save logic to look for manifest.yaml in workspace directory
- Save usage files alongside manifest when found, fall back to workspace directory
- Follow same pattern as readiness report for consistent file organization
- Add logging to show where usage files are being saved

Co-Authored-By: AJ Steers <[email protected]>
Comment on lines 266 to 283
try:
    from .constants import WORKSPACE_WRITE_DIR

    usage_dir = WORKSPACE_WRITE_DIR
    manifest_files = list(WORKSPACE_WRITE_DIR.glob("**/manifest.yaml"))
    if manifest_files:
        usage_dir = manifest_files[0].parent
        update_progress_log(
            f"📁 Found manifest at {manifest_files[0]}, saving usage data in same directory"
        )
    else:
        update_progress_log(
            "📁 No manifest.yaml found, saving usage data in workspace directory"
        )

    usage_file = usage_dir / f"{trace_id}_usage_summary.json"
    cost_tracker.save_to_file(usage_file)
    update_progress_log(f"📊 Detailed usage data saved to: {usage_file}")
aaronsteers (Contributor, Author) commented:

I didn't remember we already have this constant: WORKSPACE_WRITE_DIR

Skip checking for manifest.yaml - it doesn't matter.
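A minimal sketch of the simplification suggested here, reusing the names from the snippet above (trace_id, cost_tracker, update_progress_log): save the usage file directly under WORKSPACE_WRITE_DIR and drop the manifest.yaml search.

from .constants import WORKSPACE_WRITE_DIR

usage_file = WORKSPACE_WRITE_DIR / f"{trace_id}_usage_summary.json"
cost_tracker.save_to_file(usage_file)
update_progress_log(f"📊 Detailed usage data saved to: {usage_file}")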

