Conversation

@pedroslopez pedroslopez commented Oct 8, 2025

This pull request enhances the evaluation CLI and dataset management by allowing users to specify a custom dataset prefix, improving experiment lookup logic, and making the code more robust when searching for prior experiments. The main focus is on increasing flexibility for dataset naming and ensuring reliable retrieval of previous experiment results, even when datasets change due to test set updates.

Summary by CodeRabbit

  • New Features

    • Added a --dataset-prefix CLI option (default: "builder-connectors") used when creating evaluation datasets.
    • Dataset names now include the chosen prefix and indicate when connector filtering is applied.
    • Prior experiments can be discovered across datasets that share the same prefix when none exist locally.
  • Bug Fixes

    • More resilient prior-experiment lookup with per-dataset error handling to skip failed fetches.
  • Documentation

    • CLI help updated to explain the dataset-prefix option.

coderabbitai bot commented Oct 8, 2025

📝 Walkthrough

Walkthrough

Adds a dataset_prefix CLI option forwarded into the eval runtime and Phoenix dataset creation; renames connector filtering parameter to filtered_connectors; dataset naming now includes the prefix; prior-experiment lookup falls back to datasets sharing the same prefix when none exist on the current dataset.

Changes

Cohort / File(s) Summary of Changes
CLI: dataset prefix arg
connector_builder_agents/src/evals/cli.py
Adds --dataset-prefix (dataset_prefix, default "builder-connectors"); logs the prefix and forwards it to the evaluation runner (run_evals_main).
Dataset creation & filtering
connector_builder_agents/src/evals/dataset.py
Renames connectors → filtered_connectors; get_dataset_with_hash(filtered_connectors: ...) updated; get_or_create_phoenix_dataset(filtered_connectors: ..., *, dataset_prefix: str) builds the dataset name as "filtered-{dataset_prefix}-{hash}" when filtered, or "{dataset_prefix}-{hash}" otherwise; attaches is_filtered/filtered_connectors metadata and updates docstrings/logs.
Entrypoint propagation
connector_builder_agents/src/evals/phoenix_run.py
async def main(connectors: list[str] | None, ...) gains a keyword-only dataset_prefix parameter, which is forwarded to get_or_create_phoenix_dataset.
Prior-experiment cross-dataset fallback
connector_builder_agents/src/evals/summary.py
find_prior_experiment extended to, when no priors on current dataset, derive a dataset prefix, list datasets with that prefix (excluding filtered datasets), aggregate experiments across matches with per-dataset error handling, and resiliently fetch full prior experiments (try/except per prior).
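The fallback described above can be sketched as follows. This is a minimal illustration of the control flow, not the actual implementation: the injected callables list_datasets and get_experiments are hypothetical stand-ins for Phoenix client calls.

```python
import logging

logger = logging.getLogger(__name__)


def find_prior_experiments_across_datasets(
    current_dataset_name: str,
    current_dataset_id: str,
    list_datasets,      # () -> list[dict] with "id" and "name" keys
    get_experiments,    # (dataset_id) -> list of experiments; may raise
) -> list:
    """Aggregate prior experiments from sibling datasets sharing the prefix."""
    # Derive the prefix by dropping the trailing "-<hash>" segment.
    dataset_prefix = current_dataset_name.rsplit("-", 1)[0]
    priors = []
    for ds in list_datasets():
        name = ds.get("name", "")
        # Skip the current dataset and any filtered datasets.
        if ds.get("id") == current_dataset_id:
            continue
        if not name.startswith(dataset_prefix + "-") or name.startswith("filtered-"):
            continue
        try:
            priors.extend(get_experiments(ds["id"]))
        except Exception as exc:
            # Per-dataset error handling: skip failed fetches, keep searching.
            logger.warning("Skipping dataset %s: %s", ds.get("id"), exc)
    return priors
```

Under this sketch, a failure to fetch one dataset's experiments only logs a warning, so priors from the remaining datasets are still collected.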

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant CLI as evals/cli.py
  participant PR as phoenix_run.main
  participant DS as dataset.get_or_create_phoenix_dataset
  participant PX as Phoenix API

  U->>CLI: run --dataset-prefix=<prefix> [--connectors ...]
  CLI->>PR: main(connectors, dataset_prefix=<prefix>)
  PR->>DS: get_or_create_phoenix_dataset(filtered_connectors=connectors, dataset_prefix=<prefix>)
  DS->>PX: find/create dataset "<prefix>-<hash>" or "filtered-<prefix>-<hash>"
  PX-->>DS: Dataset
  DS-->>PR: Dataset
  PR-->>U: Run evaluations using dataset
sequenceDiagram
  autonumber
  participant SUM as summary.find_prior_experiment
  participant PX as Phoenix API

  SUM->>PX: Get current dataset + experiments
  alt Experiments with eval runs exist
    PX-->>SUM: Experiments
  else No experiments found
    SUM->>SUM: Derive prefix from dataset name (before last "-")
    SUM->>PX: List datasets starting with prefix
    PX-->>SUM: Matching datasets
    loop For each matching dataset
      SUM->>PX: Get experiments (try/except)
      PX-->>SUM: Experiments or error
    end
    SUM->>SUM: Aggregate experiments with eval runs
  end
  SUM-->>SUM: Return selected prior experiment or None

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • aaronsteers

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and concisely describes the core enhancement of enabling prior experiment lookup across datasets, directly reflecting the main summary.py changes without extraneous detail, making it specific and understandable at a glance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00e97b0 and 7873fa5.

📒 Files selected for processing (2)
  • connector_builder_agents/src/evals/dataset.py (3 hunks)
  • connector_builder_agents/src/evals/summary.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test Connector Build (JSONPlaceholder)
  • GitHub Check: Test Connector Build (PokemonTGG)
🔇 Additional comments (5)
connector_builder_agents/src/evals/dataset.py (2)

18-23: LGTM: Clear parameter rename.

The rename from connectors to filtered_connectors improves clarity and accurately describes the parameter's purpose.


63-78: LGTM: Well-structured dataset naming and prefix support.

The addition of dataset_prefix as a keyword-only parameter and the conditional prefixing logic for filtered datasets (filtered-{dataset_prefix}-{dataset_hash} vs {dataset_prefix}-{dataset_hash}) provides clear differentiation between dataset types and enables the cross-dataset prior discovery feature.
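The conditional naming can be captured in a small pure function. build_dataset_name is a hypothetical helper written here for illustration; the real code builds the string inline in get_or_create_phoenix_dataset.

```python
def build_dataset_name(dataset_prefix: str, dataset_hash: str, filtered: bool) -> str:
    """Mirror the conditional prefixing described above."""
    if filtered:
        return f"filtered-{dataset_prefix}-{dataset_hash}"
    return f"{dataset_prefix}-{dataset_hash}"
```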

connector_builder_agents/src/evals/summary.py (3)

74-93: LGTM: Cross-dataset discovery with appropriate filtered dataset handling.

The fallback to cross-dataset search when no prior experiments exist on the current dataset addresses the issue of dataset recreation when test sets change. The logic correctly skips filtered datasets (line 88-89) since they target specific connector subsets and cross-dataset comparison would be inappropriate. The prefix extraction using rsplit("-", 1)[0] works correctly since it splits from the right, preserving multi-part prefixes (e.g., "my-cool-prefix-abc123" → "my-cool-prefix").
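The right-split behavior is easy to verify directly:

```python
# rsplit("-", 1) splits exactly once from the right, so only the trailing
# hash segment is removed and multi-part prefixes survive intact.
assert "my-cool-prefix-abc123".rsplit("-", 1)[0] == "my-cool-prefix"
assert "builder-connectors-abc123".rsplit("-", 1)[0] == "builder-connectors"
```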


100-125: LGTM: Robust dataset filtering and experiment aggregation.

The filtering logic correctly identifies datasets sharing the same prefix while excluding:

  1. The current dataset (ds.get("id") != dataset_id)
  2. Filtered datasets (implicitly, since they start with "filtered-" not {dataset_prefix}-)

The per-dataset error handling (lines 115-120) ensures the aggregation continues even if some datasets are inaccessible, with appropriate warning logs.


139-148: LGTM: Resilient prior experiment retrieval.

Wrapping individual prior experiment fetches in try-except blocks prevents a single fetch failure from blocking the entire prior discovery process. The warning logs provide visibility while allowing graceful degradation.
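The per-item try/except pattern can be sketched as below; get_experiment is a hypothetical stand-in for the Phoenix client call, not the real API.

```python
import logging

logger = logging.getLogger(__name__)


def fetch_prior_experiments(prior_ids, get_experiment):
    """Fetch each prior experiment, skipping any whose fetch fails."""
    fetched = []
    for prior_id in prior_ids:
        try:
            fetched.append(get_experiment(prior_id))
        except Exception as exc:
            # Log and continue: one bad fetch should not block discovery.
            logger.warning("Could not fetch prior experiment %s: %s", prior_id, exc)
    return fetched
```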


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions bot commented Oct 8, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/dataset-workaround", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/dataset-workaround#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poe <command> - Runs any poe command in the uv virtual environment
  • /poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.


github-actions bot commented Oct 8, 2025

PyTest Results (Fast)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 7873fa5. ± Comparison against base commit 56e7f4a.

♻️ This comment has been updated with latest results.

@pedroslopez pedroslopez changed the title from "Pedro/dataset workaround" to "feat: find prior experiment run across datasets" Oct 8, 2025
@pedroslopez pedroslopez marked this pull request as ready for review October 8, 2025 23:41
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56e7f4a and b7fef0d.

📒 Files selected for processing (4)
  • connector_builder_agents/src/evals/cli.py (3 hunks)
  • connector_builder_agents/src/evals/dataset.py (1 hunks)
  • connector_builder_agents/src/evals/phoenix_run.py (1 hunks)
  • connector_builder_agents/src/evals/summary.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
connector_builder_agents/src/evals/phoenix_run.py (2)
connector_builder_agents/src/evals/cli.py (1)
  • main (72-116)
connector_builder_agents/src/evals/dataset.py (1)
  • get_or_create_phoenix_dataset (63-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build PokemonTGG Connector
  • GitHub Check: Build Hubspot Connector
  • GitHub Check: Build JSONPlaceholder Connector
  • GitHub Check: Test Connector Build (PokemonTGG)
  • GitHub Check: Test Connector Build (JSONPlaceholder)
  • GitHub Check: Test Connector Build (PokemonTGG)
  • GitHub Check: Test Connector Build (JSONPlaceholder)
🔇 Additional comments (5)
connector_builder_agents/src/evals/dataset.py (1)

63-65: LGTM! Clean parameterization of dataset prefix.

The function signature change properly introduces dataset_prefix as a keyword-only parameter, and the implementation consistently replaces the hardcoded prefix with the dynamic parameter. The docstring is updated appropriately.

Also applies to: 70-70, 77-77, 79-79

connector_builder_agents/src/evals/cli.py (1)

8-8: LGTM! Proper CLI argument integration.

The new --dataset-prefix argument is well-integrated with appropriate help text, sensible default, and proper propagation to the evaluation runner. The added logging provides visibility into the configuration being used.

Also applies to: 39-39, 44-45, 96-101
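As a minimal sketch of such an option using argparse (the option name and default come from the PR description; the surrounding parser setup and help text are illustrative, not the actual CLI code):

```python
import argparse

parser = argparse.ArgumentParser(description="Run connector-builder evals")
parser.add_argument(
    "--dataset-prefix",
    dest="dataset_prefix",
    default="builder-connectors",
    help="Prefix used when naming the Phoenix evaluation dataset.",
)

# Default applies when the flag is omitted; an explicit value overrides it.
args = parser.parse_args(["--dataset-prefix", "my-prefix"])
```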

connector_builder_agents/src/evals/phoenix_run.py (1)

41-41: LGTM! Parameter properly threaded through.

The dataset_prefix parameter is correctly added to the function signature and passed to the dataset creation function, maintaining the keyword-only parameter pattern.

Also applies to: 48-48
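The keyword-only threading pattern can be sketched with stubs; the bodies below are placeholders standing in for the real dataset creation in dataset.py, and the "<hash>" literal marks where the real code inserts a content hash.

```python
import asyncio


def get_or_create_phoenix_dataset(filtered_connectors=None, *, dataset_prefix):
    # Stub: the real function creates or fetches a Phoenix dataset.
    return {"name": f"{dataset_prefix}-<hash>", "filtered": bool(filtered_connectors)}


async def main(connectors=None, *, dataset_prefix="builder-connectors"):
    # dataset_prefix is accepted keyword-only and forwarded unchanged.
    return get_or_create_phoenix_dataset(
        filtered_connectors=connectors, dataset_prefix=dataset_prefix
    )


dataset = asyncio.run(main(["source-pokeapi"], dataset_prefix="custom"))
```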

connector_builder_agents/src/evals/summary.py (2)

110-116: Good error handling for robustness.

The try-except blocks around individual dataset and experiment fetches ensure the cross-dataset search continues even when some datasets or experiments are inaccessible. The warning logs provide visibility into failures without breaking the overall flow.

Also applies to: 134-143


124-126: Clear early return improves readability.

The explicit check and early return when no prior experiments are found (after cross-dataset search) makes the control flow easier to follow.

@github-actions github-actions bot added the enhancement New feature or request label Oct 9, 2025
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7fef0d and fa8fe32.

📒 Files selected for processing (3)
  • connector_builder_agents/src/evals/dataset.py (4 hunks)
  • connector_builder_agents/src/evals/phoenix_run.py (1 hunks)
  • connector_builder_agents/src/evals/summary.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
connector_builder_agents/src/evals/phoenix_run.py (2)
connector_builder_agents/src/evals/cli.py (1)
  • main (72-116)
connector_builder_agents/src/evals/dataset.py (1)
  • get_or_create_phoenix_dataset (63-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test Connector Build (JSONPlaceholder)
  • GitHub Check: Pytest (Fast)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
connector_builder_agents/src/evals/dataset.py (1)

63-65: Consider adding input validation for dataset_prefix.

The dataset_prefix parameter is well-designed as keyword-only, but there's no validation to ensure it:

  • Doesn't contain problematic characters (e.g., spaces, special chars)
  • Doesn't end with a dash (which could create dataset names like prefix--hash)
  • Isn't empty

Invalid prefixes could cause issues with the dataset matching logic in summary.py (lines 100-101) that assumes the format {prefix}-{hash}.

Apply this diff to add validation:

 def get_or_create_phoenix_dataset(
     filtered_connectors: list[str] | None = None, *, dataset_prefix: str
 ) -> Dataset:
     """Get or create a Phoenix dataset for the evals config.
 
     Args:
         filtered_connectors: Optional list of connector names to filter by.
         dataset_prefix: Prefix for the dataset name.
     """
+    # Validate dataset_prefix
+    if not dataset_prefix or not dataset_prefix.strip():
+        raise ValueError("dataset_prefix cannot be empty")
+    if dataset_prefix.endswith("-"):
+        raise ValueError("dataset_prefix cannot end with a dash")
+    if not dataset_prefix.replace("-", "").replace("_", "").isalnum():
+        raise ValueError("dataset_prefix can only contain alphanumeric characters, dashes, and underscores")
+    
     dataframe, dataset_hash = get_dataset_with_hash(filtered_connectors=filtered_connectors)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa79ae1 and 00e97b0.

📒 Files selected for processing (2)
  • connector_builder_agents/src/evals/dataset.py (3 hunks)
  • connector_builder_agents/src/evals/summary.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Test Connector Build (PokemonTGG)
  • GitHub Check: Test Connector Build (JSONPlaceholder)
🔇 Additional comments (9)
connector_builder_agents/src/evals/summary.py (5)

42-43: LGTM!

The docstring accurately describes the new fallback behavior for cross-dataset search.


93-102: Prefix matching logic appears sound.

The filter ds.get("name", "").startswith(dataset_prefix + "-") correctly matches datasets with the extracted prefix followed by a dash, and excludes the current dataset.


108-122: LGTM! Good defensive programming.

The error handling ensures the search continues even if individual datasets cannot be accessed, and the logging provides visibility into failures.


136-145: LGTM! Improved resilience.

Wrapping each get_experiment call in a try/except block ensures that failures to fetch individual experiments don't prevent finding other prior experiments.


87-91: Ignore incorrect prefix parsing concern
The Phoenix dataset name is always constructed as {dataset_prefix}-{hash} (filtering only alters the hash), so current_dataset_name.rsplit("-", 1)[0] reliably recovers the original dataset_prefix.

Likely an incorrect or invalid review comment.

connector_builder_agents/src/evals/dataset.py (4)

18-18: LGTM! More descriptive parameter name.

Renaming connectors to filtered_connectors better communicates the parameter's purpose.


22-22: LGTM! Consistent parameter usage.

All references correctly updated to use filtered_connectors throughout the function, including in conditionals, dataframe operations, logging, and error messages.

Also applies to: 35-46


72-72: LGTM! Consistent with parameter rename.

The call correctly passes filtered_connectors using keyword argument syntax.


74-74: LGTM! Enables customizable dataset naming.

The dataset name construction now uses the provided dataset_prefix instead of the hardcoded "builder-connectors", enabling the flexibility described in the PR objectives.

@pedroslopez pedroslopez merged commit e5f25e6 into main Oct 10, 2025
16 checks passed
@pedroslopez pedroslopez deleted the pedro/dataset-workaround branch October 10, 2025 16:18