
Conversation

@aaronsteers (Contributor) commented Sep 21, 2025

feat: Add token usage tracking with cost estimation to multi-agent workflow

Summary

Implements comprehensive token usage tracking and cost estimation for the multi-agent connector building workflow. This adds:

  • Cost tracking system with hard-coded pricing for 25+ AI models from OpenAI and other providers (GPT-4o, o1, etc.)
  • Model name extraction fixes to resolve the "unknown-model" bug
  • Smart file organization - usage files now save alongside manifest.yaml when available
  • Real-time cost monitoring with detailed breakdowns by model and request

The system tracks input/output tokens, request counts, and provides cost estimates in USD during both interactive and manager-developer build modes.
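For reference, a minimal sketch of how a hard-coded pricing table and per-request cost estimate can fit together. The table name mirrors _MODEL_PRICING from this PR, but the function name and the dollar figures below are illustrative placeholders, not the values shipped in the change:

# (input_price_per_1M_tokens, output_price_per_1M_tokens) in USD -- placeholder values
_MODEL_PRICING: dict[str, tuple[float, float]] = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "unknown-model": (15.00, 60.00),  # conservative fallback for unrecognized models
}

def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for one request, falling back to conservative pricing."""
    input_price, output_price = _MODEL_PRICING.get(model, _MODEL_PRICING["unknown-model"])
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price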

Review & Testing Checklist for Human

3 critical items to verify:

  • Test cost estimation accuracy - Run a real connector build and verify the cost calculations match expected API pricing. Check that model names are correctly extracted (no more "unknown-model" in output)
  • Verify file path logic - Test scenarios with and without manifest.yaml files to ensure usage files save to the correct directory. Test edge cases like multiple manifest files or nested directories
  • Check pricing data accuracy - Validate the hard-coded pricing in _MODEL_PRICING against current OpenAI pricing documentation. Pricing data was provided by the user but should be double-checked

Additional testing:

  • Run both interactive and manager-developer build modes to ensure cost tracking doesn't break existing workflows
  • Verify the token attribute mapping handles both the OpenAI format (prompt_tokens/completion_tokens) and the expected format (input_tokens/output_tokens) correctly (see the sketch below)
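A minimal sketch of that attribute fallback, assuming usage objects expose one of the two naming conventions; the helper name is illustrative and may not match the code in this PR:

from typing import Any

def extract_token_counts(usage: Any) -> tuple[int, int]:
    """Return (input_tokens, output_tokens), accepting either naming convention."""
    input_tokens = getattr(usage, "input_tokens", None)
    if input_tokens is None:
        input_tokens = getattr(usage, "prompt_tokens", 0)
    output_tokens = getattr(usage, "output_tokens", None)
    if output_tokens is None:
        output_tokens = getattr(usage, "completion_tokens", 0)
    return int(input_tokens or 0), int(output_tokens or 0)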

Notes

  • Hard-coded pricing may need periodic updates as API pricing changes
  • Debug scripts (debug_model_extraction.py, debug_workflow_responses.py) are included in the diff but may not be needed in production
  • The model extraction logic has extensive fallbacks to handle different response structures - this complexity may hide edge cases

Link to Devin run: https://app.devin.ai/sessions/49242329340c4c67908d1ff5c10103d7
Requested by: @aaronsteers (AJ Steers)

- Add CostTracker class with generic model/SKU tracking approach
- Track input/output/total tokens and request counts from RunResult.raw_responses
- Integrate tracking into run_manager_developer_build and run_interactive_build
- Add CostEvaluator with business logic for usage assessment
- Correlate usage data with existing trace_id for attribution
- Save detailed usage summaries to JSON files
- Focus on token tracking foundation rather than hardcoded cost calculations

Co-Authored-By: AJ Steers <[email protected]>
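A rough sketch of the CostTracker shape described in the commit above, assuming each raw response exposes a model name and a usage object; the real class in cost_tracking.py may differ in structure and naming:

import json
from collections import defaultdict
from pathlib import Path
from typing import Any

class CostTracker:
    def __init__(self, trace_id: str) -> None:
        self.trace_id = trace_id
        self.usage_by_model: dict[str, dict[str, int]] = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "requests": 0}
        )

    def add_run_result(self, run_result: Any) -> None:
        """Accumulate token and request counts from RunResult.raw_responses."""
        for response in getattr(run_result, "raw_responses", []):
            model = getattr(response, "model", None) or "unknown-model"
            usage = getattr(response, "usage", None)
            if usage is None:
                continue
            bucket = self.usage_by_model[model]
            bucket["input_tokens"] += getattr(usage, "input_tokens", 0) or 0
            bucket["output_tokens"] += getattr(usage, "output_tokens", 0) or 0
            bucket["requests"] += getattr(usage, "requests", 1) or 1

    def save_to_file(self, path: Path) -> None:
        """Write a JSON usage summary correlated with the run's trace_id."""
        summary = {"trace_id": self.trace_id, "usage_by_model": dict(self.usage_by_model)}
        path.write_text(json.dumps(summary, indent=2))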

Original prompt from AJ Steers
Received message in Slack channel #ask-devin-ai:

@Devin - We're wanting to use evals for the connector-builder-mcp tool, and specifically for the new `poe build-connector` task, which wraps the tools in a multi-agent workflow. Can you advise how we can plug this into an "evals" framework like the one described here? As of now, the eval _only_ needs to look at the output of the "connector readiness check" - which is a markdown file enumerating the streams and number of records per stream. We will define the correct answers that should be in this report (a successful report text for instance), and the eval should grade the result based on that golden example.
Thread URL: https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1758217363818419?thread_ts=1758217363.818419


devin-ai-integration bot commented Sep 21, 2025

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1758422080-add-token-usage-tracking", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1758422080-add-token-usage-tracking#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poe <command> - Runs any poe command in the uv virtual environment
  • /poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.



github-actions bot commented Sep 21, 2025

PyTest Results (Fast)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 3d354ca. ± Comparison against base commit ce2626c.

♻️ This comment has been updated with latest results.

- Apply ruff formatting to cost_tracking.py and run.py
- Convert single quotes to double quotes for consistency
- Wrap long lines to meet formatting standards

Co-Authored-By: AJ Steers <[email protected]>
@devin-ai-integration bot changed the title from "Add token usage tracking to multi-agent workflow" to "feat: Add token usage tracking to multi-agent workflow" Sep 21, 2025
@github-actions bot added the enhancement (New feature or request) label Sep 21, 2025
devin-ai-integration bot and others added 15 commits September 21, 2025 02:47
- Add cost_summary_report string property to CostTracker class
- Consolidate all summary text generation into single property
- Simplify run_manager_developer_build to use new summary property
- Remove unused CostEvaluator import from run.py
- Maintain equivalent summary output with cleaner code structure

Co-Authored-By: AJ Steers <[email protected]>
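A minimal sketch of what a consolidated summary property can produce, assuming per-model buckets of input/output tokens and request counts; the function name and report text below are illustrative, not the exact output of this commit:

def build_cost_summary_report(usage_by_model: dict[str, dict[str, int]]) -> str:
    lines = ["Token usage summary:"]
    for model, bucket in sorted(usage_by_model.items()):
        lines.append(
            f"  {model}: {bucket['input_tokens']:,} input / {bucket['output_tokens']:,} output tokens"
            f" over {bucket['requests']} request(s)"
        )
    return "\n".join(lines)

print(build_cost_summary_report(
    {"gpt-4o": {"input_tokens": 12000, "output_tokens": 3000, "requests": 4}}
))
# Token usage summary:
#   gpt-4o: 12,000 input / 3,000 output tokens over 4 request(s)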
- Add comprehensive pricing table for OpenAI, Anthropic, and other models
- Implement cost calculation in _calculate_cost method using per-token pricing
- Enhance model name extraction with additional fallback strategies
- Fix bug in CostEvaluator (thresholds -> _THRESHOLDS)
- Add cost display to summary reports with 4-decimal precision
- Support for unknown models with conservative pricing estimates

Co-Authored-By: AJ Steers <[email protected]>
- Clarifies that tuples represent (input_price_per_1M_tokens, output_price_per_1M_tokens) in USD
- Includes example for better understanding
- Addresses code documentation feedback

Co-Authored-By: AJ Steers <[email protected]>
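As a hypothetical worked example of that tuple format (the dollar figures here are placeholders, not actual pricing):

input_price, output_price = (2.50, 10.00)  # (input, output) USD per 1M tokens -- placeholder values

# A request that used 12,000 input tokens and 3,000 output tokens:
cost_usd = (12_000 / 1_000_000) * input_price + (3_000 / 1_000_000) * output_price
print(f"${cost_usd:.4f}")  # $0.0600, matching the 4-decimal precision used in summary reports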
- Updated _MODEL_PRICING with official pricing from OpenAI and other providers
- Organized models by series (GPT-5, GPT-4.1, GPT-4o, O-series, etc.)
- Added support for specialized models (realtime, audio, search, computer-use)
- Set gpt-image-1 output pricing to 0.00 as it has no output tokens
- Updated fallback pricing for unknown-model to be more conservative

Co-Authored-By: AJ Steers <[email protected]>
- Added https://platform.openai.com/docs/pricing URL to _MODEL_PRICING docstring
- Noted that the pricing page may require login for access
- Provides authoritative source for pricing data verification and updates

Co-Authored-By: AJ Steers <[email protected]>
- Verified that https://platform.openai.com/docs/pricing requires authentication
- Updated docstring from 'may require login' to 'requires login' for accuracy
- Browser test showed authentication error when accessing the URL

Co-Authored-By: AJ Steers <[email protected]>
- Add fallback logic for OpenAI vs expected attribute naming (completion_tokens vs output_tokens)
- Handle missing requests attribute from OpenAI responses (default to 1)
- Improve debug logging in _extract_model_name for better troubleshooting
- Add comprehensive test scripts to verify both response structure types
- Fix token calculation in add_run_result to handle both attribute naming conventions

Co-Authored-By: AJ Steers <[email protected]>
…rt pattern

- Update usage file save logic to look for manifest.yaml in workspace directory
- Save usage files alongside manifest when found, fall back to workspace directory
- Follow same pattern as readiness report for consistent file organization
- Add logging to show where usage files are being saved

Co-Authored-By: AJ Steers <[email protected]>
Comment on lines 266 to 283
try:
    from .constants import WORKSPACE_WRITE_DIR

    usage_dir = WORKSPACE_WRITE_DIR
    manifest_files = list(WORKSPACE_WRITE_DIR.glob("**/manifest.yaml"))
    if manifest_files:
        usage_dir = manifest_files[0].parent
        update_progress_log(
            f"📁 Found manifest at {manifest_files[0]}, saving usage data in same directory"
        )
    else:
        update_progress_log(
            "📁 No manifest.yaml found, saving usage data in workspace directory"
        )

    usage_file = usage_dir / f"{trace_id}_usage_summary.json"
    cost_tracker.save_to_file(usage_file)
    update_progress_log(f"📊 Detailed usage data saved to: {usage_file}")
aaronsteers (Contributor, Author) commented:

I didn't remember we already have this constant: WORKSPACE_WRITE_DIR

Skip checking for manifest.yaml - it doesn't matter.
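A minimal sketch of the simplification suggested here, reusing the names from the snippet above (trace_id, cost_tracker, update_progress_log): save the usage file directly under WORKSPACE_WRITE_DIR and drop the manifest.yaml search.

from .constants import WORKSPACE_WRITE_DIR

usage_file = WORKSPACE_WRITE_DIR / f"{trace_id}_usage_summary.json"
cost_tracker.save_to_file(usage_file)
update_progress_log(f"📊 Detailed usage data saved to: {usage_file}")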

