feat: find prior experiment run across datasets #117
Conversation
📝 Walkthrough

Adds a `dataset_prefix` CLI option forwarded into the eval runtime and Phoenix dataset creation; renames the connector filtering parameter to `filtered_connectors`; dataset naming now includes the prefix; prior-experiment lookup falls back to datasets sharing the same prefix when none exist on the current dataset.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant CLI as evals/cli.py
    participant PR as phoenix_run.main
    participant DS as dataset.get_or_create_phoenix_dataset
    participant PX as Phoenix API
    U->>CLI: run --dataset-prefix=<prefix> [--connectors ...]
    CLI->>PR: main(connectors, dataset_prefix=<prefix>)
    PR->>DS: get_or_create_phoenix_dataset(filtered_connectors=connectors, dataset_prefix=<prefix>)
    DS->>PX: find/create dataset "<prefix>-<hash>" or "filtered-<prefix>-<hash>"
    PX-->>DS: Dataset
    DS-->>PR: Dataset
    PR-->>U: Run evaluations using dataset
```

```mermaid
sequenceDiagram
    autonumber
    participant SUM as summary.find_prior_experiment
    participant PX as Phoenix API
    SUM->>PX: Get current dataset + experiments
    alt Experiments with eval runs exist
        PX-->>SUM: Experiments
    else No experiments found
        SUM->>SUM: Derive prefix from dataset name (before last "-")
        SUM->>PX: List datasets starting with prefix
        PX-->>SUM: Matching datasets
        loop For each matching dataset
            SUM->>PX: Get experiments (try/except)
            PX-->>SUM: Experiments or error
        end
        SUM->>SUM: Aggregate experiments with eval runs
    end
    SUM-->>SUM: Return selected prior experiment or None
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (5)
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

```json
{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/dataset-workaround", "connector-builder-mcp"]
    }
  }
}
```

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

```shell
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@pedro/dataset-workaround#egg=airbyte-connector-builder-mcp' --help
```

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- connector_builder_agents/src/evals/cli.py (3 hunks)
- connector_builder_agents/src/evals/dataset.py (1 hunks)
- connector_builder_agents/src/evals/phoenix_run.py (1 hunks)
- connector_builder_agents/src/evals/summary.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
connector_builder_agents/src/evals/phoenix_run.py (2)
- connector_builder_agents/src/evals/cli.py (1)
  - main (72-116)
- connector_builder_agents/src/evals/dataset.py (1)
  - get_or_create_phoenix_dataset (63-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build PokemonTGG Connector
- GitHub Check: Build Hubspot Connector
- GitHub Check: Build JSONPlaceholder Connector
- GitHub Check: Test Connector Build (PokemonTGG)
- GitHub Check: Test Connector Build (JSONPlaceholder)
- GitHub Check: Test Connector Build (PokemonTGG)
- GitHub Check: Test Connector Build (JSONPlaceholder)
🔇 Additional comments (5)
connector_builder_agents/src/evals/dataset.py (1)
Lines 63-65: LGTM! Clean parameterization of dataset prefix.

The function signature change properly introduces `dataset_prefix` as a keyword-only parameter, and the implementation consistently replaces the hardcoded prefix with the dynamic parameter. The docstring is updated appropriately.

Also applies to: 70-70, 77-77, 79-79
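The naming scheme this comment describes might look like the following helper. The `filtered-` marker and the exact `{prefix}-{hash}` format are assumptions taken from the sequence diagram earlier in the review, not the actual implementation.

```python
# Illustrative sketch of prefix-based dataset naming. The "filtered-" marker
# and the "{prefix}-{hash}" layout are assumptions from the review text.


def build_dataset_name(dataset_prefix: str, dataset_hash: str, filtered: bool) -> str:
    """Compose a Phoenix dataset name from a prefix and a content hash."""
    name = f"{dataset_prefix}-{dataset_hash}"
    return f"filtered-{name}" if filtered else name


print(build_dataset_name("builder-connectors", "abc123", filtered=False))
# builder-connectors-abc123
```

Making `dataset_prefix` keyword-only at the call site forces callers to spell it out, which keeps the hash and prefix arguments from being silently swapped.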
connector_builder_agents/src/evals/cli.py (1)
Line 8: LGTM! Proper CLI argument integration.

The new `--dataset-prefix` argument is well-integrated with appropriate help text, a sensible default, and proper propagation to the evaluation runner. The added logging provides visibility into the configuration being used.

Also applies to: 39-39, 44-45, 96-101
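For illustration, a flag like this could be declared as below. The repo's actual CLI framework and default value are not shown in this review, so both are assumptions here (the default mirrors the previously hardcoded `builder-connectors` prefix).

```python
import argparse

# Illustrative only: the real CLI may use a different framework and default.
parser = argparse.ArgumentParser(prog="evals")
parser.add_argument(
    "--dataset-prefix",
    default="builder-connectors",  # assumed default, matching the old hardcoded prefix
    help="Prefix used when naming the Phoenix dataset.",
)
parser.add_argument("--connectors", nargs="*", default=None)

args = parser.parse_args(["--dataset-prefix", "my-prefix"])
print(args.dataset_prefix)  # my-prefix
```

Because `argparse` maps `--dataset-prefix` to the attribute `dataset_prefix`, the parsed value can be forwarded directly as the keyword argument the dataset code expects.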
connector_builder_agents/src/evals/phoenix_run.py (1)
Line 41: LGTM! Parameter properly threaded through.

The `dataset_prefix` parameter is correctly added to the function signature and passed to the dataset creation function, maintaining the keyword-only parameter pattern.

Also applies to: 48-48
connector_builder_agents/src/evals/summary.py (2)
Lines 110-116: Good error handling for robustness.

The try-except blocks around individual dataset and experiment fetches ensure the cross-dataset search continues even when some datasets or experiments are inaccessible. The warning logs provide visibility into failures without breaking the overall flow.

Also applies to: 134-143

Lines 124-126: Clear early return improves readability.

The explicit check and early return when no prior experiments are found (after the cross-dataset search) makes the control flow easier to follow.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- connector_builder_agents/src/evals/dataset.py (4 hunks)
- connector_builder_agents/src/evals/phoenix_run.py (1 hunks)
- connector_builder_agents/src/evals/summary.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
connector_builder_agents/src/evals/phoenix_run.py (2)
- connector_builder_agents/src/evals/cli.py (1)
  - main (72-116)
- connector_builder_agents/src/evals/dataset.py (1)
  - get_or_create_phoenix_dataset (63-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test Connector Build (JSONPlaceholder)
- GitHub Check: Pytest (Fast)
Actionable comments posted: 0
🧹 Nitpick comments (1)
connector_builder_agents/src/evals/dataset.py (1)
Lines 63-65: Consider adding input validation for dataset_prefix.

The `dataset_prefix` parameter is well-designed as keyword-only, but there's no validation to ensure it:
- Doesn't contain problematic characters (e.g., spaces, special chars)
- Doesn't end with a dash (which could create dataset names like `prefix--hash`)
- Isn't empty

Invalid prefixes could cause issues with the dataset matching logic in summary.py (lines 100-101) that assumes the format `{prefix}-{hash}`.

Apply this diff to add validation:

```diff
 def get_or_create_phoenix_dataset(
     filtered_connectors: list[str] | None = None, *, dataset_prefix: str
 ) -> Dataset:
     """Get or create a Phoenix dataset for the evals config.

     Args:
         filtered_connectors: Optional list of connector names to filter by.
         dataset_prefix: Prefix for the dataset name.
     """
+    # Validate dataset_prefix
+    if not dataset_prefix or not dataset_prefix.strip():
+        raise ValueError("dataset_prefix cannot be empty")
+    if dataset_prefix.endswith("-"):
+        raise ValueError("dataset_prefix cannot end with a dash")
+    if not dataset_prefix.replace("-", "").replace("_", "").isalnum():
+        raise ValueError("dataset_prefix can only contain alphanumeric characters, dashes, and underscores")
+
     dataframe, dataset_hash = get_dataset_with_hash(filtered_connectors=filtered_connectors)
```
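Extracted as a standalone function, the suggested validation behaves like this (the checks and error messages mirror the diff above; only the function wrapper is added for illustration):

```python
def validate_dataset_prefix(dataset_prefix: str) -> None:
    """Standalone copy of the validation suggested in the diff above."""
    if not dataset_prefix or not dataset_prefix.strip():
        raise ValueError("dataset_prefix cannot be empty")
    if dataset_prefix.endswith("-"):
        raise ValueError("dataset_prefix cannot end with a dash")
    if not dataset_prefix.replace("-", "").replace("_", "").isalnum():
        raise ValueError(
            "dataset_prefix can only contain alphanumeric characters, dashes, and underscores"
        )


validate_dataset_prefix("builder-connectors")  # passes
```

Note that an empty-after-stripping prefix (e.g. `"   "`) is caught by the first check, and a prefix of internal dashes/underscores only (e.g. `"_-_"`) is caught by the `isalnum` check, since the replacements leave an empty string.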
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- connector_builder_agents/src/evals/dataset.py (3 hunks)
- connector_builder_agents/src/evals/summary.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pytest (Fast)
- GitHub Check: Test Connector Build (PokemonTGG)
- GitHub Check: Test Connector Build (JSONPlaceholder)
🔇 Additional comments (9)
connector_builder_agents/src/evals/summary.py (5)
Lines 42-43: LGTM!

The docstring accurately describes the new fallback behavior for cross-dataset search.

Lines 93-102: Prefix matching logic appears sound.

The filter `ds.get("name", "").startswith(dataset_prefix + "-")` correctly matches datasets with the extracted prefix followed by a dash, and excludes the current dataset.

Lines 108-122: LGTM! Good defensive programming.

The error handling ensures the search continues even if individual datasets cannot be accessed, and the logging provides visibility into failures.

Lines 136-145: LGTM! Improved resilience.

Wrapping each `get_experiment` call in a try/except block ensures that failures to fetch individual experiments don't prevent finding other prior experiments.

Lines 87-91: Ignore incorrect prefix parsing concern.

The Phoenix dataset name is always constructed as `{dataset_prefix}-{hash}` (filtering only alters the hash), so `current_dataset_name.rsplit("-", 1)[0]` reliably recovers the original `dataset_prefix`. Likely an incorrect or invalid review comment.
connector_builder_agents/src/evals/dataset.py (4)
Line 18: LGTM! More descriptive parameter name.

Renaming `connectors` to `filtered_connectors` better communicates the parameter's purpose.

Line 22: LGTM! Consistent parameter usage.

All references correctly updated to use `filtered_connectors` throughout the function, including in conditionals, dataframe operations, logging, and error messages.

Also applies to: 35-46

Line 72: LGTM! Consistent with parameter rename.

The call correctly passes `filtered_connectors` using keyword argument syntax.

Line 74: LGTM! Enables customizable dataset naming.

The dataset name construction now uses the provided `dataset_prefix` instead of the hardcoded "builder-connectors", enabling the flexibility described in the PR objectives.
This pull request enhances the evaluation CLI and dataset management by allowing users to specify a custom dataset prefix, improving experiment lookup logic, and making the code more robust when searching for prior experiments. The main focus is on increasing flexibility for dataset naming and ensuring reliable retrieval of previous experiment results, even when datasets change due to test set updates.
Summary by CodeRabbit
- New Features
- Bug Fixes
- Documentation