fix: Add validation for manifest streams structure to prevent AttributeError #130

aaronsteers · 2025-10-10T20:44:31Z

Summary

This PR fixes a bug where run_connector_readiness_test_report would crash with AttributeError: 'str' object has no attribute 'get' when processing manifests with malformed streams fields.

Root cause: The code assumed all entries in manifest_dict["streams"] are dictionaries with a name field, but AI-generated manifests during eval runs sometimes contained a list of strings instead of stream definition objects.

The fix: Added defensive validation that checks that all stream entries are dicts before calling .get("name") on them. If invalid entries are found, the code raises a ValueError with an actionable error message explaining what is wrong.
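For illustration, here is a minimal sketch of the failure and the guard described above. The function names `stream_names_unchecked` and `stream_names_checked` and the sample manifest are hypothetical, and the error wording is only modeled on the message quoted later in this thread; the actual code lives in connector_builder_mcp/validation_testing.py and may differ.

```python
from typing import Any


def stream_names_unchecked(manifest: dict[str, Any]) -> list[Any]:
    """Pre-fix behavior (sketch): assumes every `streams` entry is a dict."""
    return [s.get("name") for s in manifest.get("streams", [])]


def stream_names_checked(manifest: dict[str, Any]) -> list[Any]:
    """Post-fix behavior (sketch): reject non-dict entries before using them."""
    available_streams = manifest.get("streams", [])
    invalid = [s for s in available_streams if not isinstance(s, dict)]
    if invalid:
        # Report how many entries are invalid and show a small sample of them.
        raise ValueError(
            "Invalid manifest structure: 'streams' must be a list of stream "
            f"definition objects (dicts), but found {len(invalid)} invalid "
            f"entry(ies). Invalid entries: {invalid[:3]}"
        )
    return [s.get("name") for s in available_streams]


# Hypothetical malformed manifest: stream names instead of stream definitions.
malformed = {"streams": ["posts", "comments"]}

try:
    stream_names_unchecked(malformed)
except AttributeError as exc:
    print(f"before the fix: AttributeError: {exc}")

try:
    stream_names_checked(malformed)
except ValueError as exc:
    print(f"after the fix: ValueError: {exc}")
```

The checked variant fails before any `.get` call is made, so the caller sees a message about the manifest rather than about a Python attribute.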

Key distinction preserved:

✅ Input arg coercion (existing): The streams parameter can still be passed as a comma-separated string and gets coerced to a list
✅ Manifest structure validation (new): The manifest's streams field is now validated to ensure it contains properly formatted stream definition objects
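The two behaviors can be sketched side by side. This is an assumption-laden illustration: `coerce_streams_arg` and `manifest_streams_are_valid` are hypothetical names, and details such as whitespace stripping are guesses rather than the repository's confirmed behavior.

```python
from typing import Any, Optional, Union


def coerce_streams_arg(streams: Optional[Union[str, list[str]]]) -> Optional[list[str]]:
    """Input-arg coercion: accept 'a,b' or ['a', 'b'] for the `streams` parameter."""
    if streams is None:
        return None
    if isinstance(streams, str):
        # Assumption: surrounding whitespace and empty segments are dropped.
        return [name.strip() for name in streams.split(",") if name.strip()]
    return list(streams)


def manifest_streams_are_valid(manifest: dict[str, Any]) -> bool:
    """Manifest-structure check: every entry under `streams` must be a dict."""
    return all(isinstance(s, dict) for s in manifest.get("streams", []))


# A string is acceptable for the *parameter* and is coerced to a list...
print(coerce_streams_arg("posts, comments"))
# ...but strings are not acceptable as *entries* in the manifest's streams field.
print(manifest_streams_are_valid({"streams": ["posts", "comments"]}))
```

Keeping the two checks in separate functions makes it harder to confuse a leniently parsed user argument with a strictly validated manifest field.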

Review & Testing Checklist for Human

Verify error message clarity: Review the error message in the code to ensure it's actionable for users/agents encountering malformed manifests
Test with actual AI-generated manifests: If possible, test with the actual manifest from the CI logs that triggered this issue (eval-source-jsonplaceholder-1760125246)
Consider validation placement: This validates right before use, but should manifest structure validation happen earlier in the pipeline (e.g., in validate_manifest function)?
Check agent workflows: Verify that existing agent evaluation workflows still work correctly with this stricter validation

Recommended Test Plan

Run the new test: pytest tests/integration/test_validation_and_testing.py::test_malformed_manifest_streams_validation -v
If available, test with the actual malformed manifest from the CI logs
Verify existing agent eval runs still work correctly

Notes

Issue discovered in: CI eval run at https://github.com/airbytehq/connector-builder-mcp/actions/runs/18416705139/job/52481958952?pr=129
Impact: This bug exists both in PR #129 (feat: improved agent coordination and removal of '3 phases' guidance) and on the main branch (not a regression)
Test coverage: Added integration test with synthetic malformed manifest
Session: https://app.devin.ai/sessions/cd5da80fd9944143bf7c351ab8b2edf9
Requested by: @aaronsteers

Summary by CodeRabbit

Bug Fixes
- Added early validation for manifest streams to ensure each stream is an object with a required name and configuration.
- Emits a clear, descriptive error when streams are malformed to prevent downstream failures and improve troubleshooting.
Tests
- Added an integration test that confirms malformed stream lists produce the expected validation error, ensuring robust handling of invalid manifests.

Important

Auto-merge enabled.

This PR is set to merge automatically when all requirements are met.

…teError

- Add defensive validation in run_connector_readiness_test_report to check that all streams in the manifest are properly formatted dicts
- Raise clear ValueError with actionable message when streams field contains invalid entries (e.g., strings instead of stream definition objects)
- Add test case to verify the validation provides clear error messages
- Fixes issue where AI-generated manifests with malformed streams field caused AttributeError: 'str' object has no attribute 'get'

This distinguishes between:
- Input arg type coercion (streams param as string) - continues to work as before
- Manifest structure validation - now validates and errors with clear message

Related CI run: https://github.com/airbytehq/connector-builder-mcp/actions/runs/18416705139/job/52481958952?pr=129

Co-Authored-By: AJ Steers <[email protected]>

devin-ai-integration · 2025-10-10T20:44:35Z

Original prompt from AJ Steers

@Devin -  can you repro the reported issue in the LLM build log? `'str' has no method 'get'`?

<https://github.com/airbytehq/connector-builder-mcp/actions/runs/18416705139/job/52481958952?pr=129>

In PR 129, and (if yes) also check on main?
Thread URL: https://airbytehq-team.slack.com/archives/D089P0UPVT4/p1760127794224299?thread_ts=1760127794.224299

devin-ai-integration · 2025-10-10T20:44:36Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

coderabbitai · 2025-10-10T20:44:40Z

📝 Walkthrough

Adds runtime validation to ensure manifest streams entries are dictionaries with required fields and raises a ValueError for malformed entries. Adds an integration test asserting the error for a manifest whose streams is a list of strings.

Changes

Cohort / File(s)	Summary of changes
Manifest streams validation `connector_builder_mcp/validation_testing.py`	Adds early structure checks for manifest `streams`; if `streams` is not a string and contains non-dict entries, raises `ValueError` describing invalid entries and required structure.
Integration tests `tests/integration/test_validation_and_testing.py`	Adds `test_malformed_manifest_streams_validation` which constructs a manifest with `streams` as a list of strings and asserts `run_connector_readiness_test_report` raises a `ValueError` with a message matching the invalid manifest structure pattern.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Test as Test Runner
  participant Validator as validation_testing.run_connector_readiness_test_report
  participant Manifest as manifest.streams

  Test->>Validator: call with manifest
  Validator->>Manifest: inspect streams field
  alt streams is non-string and contains non-dict entries
    Validator-->>Test: raise ValueError("Invalid manifest structure: streams ... must be list of stream definition objects")
  else valid streams structure
    Validator->>Validator: enumerate and validate each stream dict ('name' present, config)
    Validator-->>Test: proceed to readiness tests
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

bug

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly describes the primary change by stating that validation for the manifest streams structure has been added to prevent an AttributeError, directly reflecting the PR’s intent to introduce defensive checks for malformed stream entries. It uses the standard “fix:” prefix to denote a bug fix and avoids extraneous detail, making it clear and specific. This phrasing gives reviewers immediate insight into both what was changed and why.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch devin/1728595175-fix-streams-validation

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06d5823 and 34e5ddb.

📒 Files selected for processing (1)

connector_builder_mcp/validation_testing.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

connector_builder_mcp/validation_testing.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Test Connector Build (JSONPlaceholder)
GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Test Connector Build (PokemonTGG)
GitHub Check: Run Evals (Single Connector)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Pytest (Fast)

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2025-10-10T20:44:41Z

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1728595175-fix-streams-validation", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1728595175-fix-streams-validation#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

/autofix - Fixes most formatting and linting issues
/poe <command> - Runs any poe command in the uv virtual environment
/poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.


github-actions · 2025-10-10T20:46:27Z

PyTest Results (Fast)

0 tests ±0 0 ✅ ±0 0s ⏱️ ±0s
0 suites ±0 0 💤 ±0
0 files ±0 0 ❌ ±0

Results for commit 34e5ddb. ± Comparison against base commit af2f6c7.

♻️ This comment has been updated with latest results.

github-actions · 2025-10-10T20:46:43Z

PyTest Results (Full)

0 tests ±0 0 ✅ ±0 0s ⏱️ ±0s
0 suites ±0 0 💤 ±0
0 files ±0 0 ❌ ±0

Results for commit 34e5ddb. ± Comparison against base commit af2f6c7.

♻️ This comment has been updated with latest results.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

connector_builder_mcp/validation_testing.py (1)
569-578: LGTM! Validation correctly prevents AttributeError.

The defensive validation properly catches malformed manifests where streams contains non-dict entries before attempting to call .get("name") at line 580. The error message is clear and actionable, providing both the count and a sample of invalid entries.

Edge cases are well-handled:

Empty available_streams list is guarded by the if available_streams: check

Missing name field in valid dicts is handled by the fallback at line 580

Consider extracting this validation logic to a helper function in _util.py for better code organization and potential reuse:
from typing import Any


def validate_stream_entries(streams: list[Any]) -> None:
    """Validate that all stream entries are dictionaries.

    Raises:
        ValueError: If any stream entries are not dictionaries
    """
    invalid_streams = [s for s in streams if not isinstance(s, dict)]
    if invalid_streams:
        raise ValueError(
            f"Invalid manifest structure: 'streams' must be a list of stream definition objects (dicts), "
            f"but found {len(invalid_streams)} invalid entry(ies). "
            f"Each stream should be a dict with at least a 'name' field and stream configuration. "
            f"Invalid entries: {invalid_streams[:3]}"
        )
Then use it at line 569:
if available_streams:
    validate_stream_entries(available_streams)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af2f6c7 and 06d5823.

📒 Files selected for processing (2)

connector_builder_mcp/validation_testing.py (1 hunks)
tests/integration/test_validation_and_testing.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/integration/test_validation_and_testing.py (1)

connector_builder_mcp/validation_testing.py (1)

run_connector_readiness_test_report (500-738)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Pytest (All, Python 3.11, Ubuntu)
GitHub Check: Pytest (All, Python 3.10, Ubuntu)
GitHub Check: Run Evals (Single Connector)
GitHub Check: Pytest (Fast)

🔇 Additional comments (1)

tests/integration/test_validation_and_testing.py (1)

177-201: LGTM! Test coverage validates the fix.

The test correctly verifies that malformed manifests with streams as a list of strings (instead of stream definition objects) raise a ValueError with the expected error message pattern. The test case directly addresses the root cause described in the PR.

Test structure is appropriate:

Minimal but valid manifest structure except for the malformed streams field

Uses pytest.raises with regex pattern matching to verify the error message

Calls run_connector_readiness_test_report without the streams parameter, ensuring the validation code path is exercised
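A rough sketch of a test with this shape follows. It uses only the standard library (`unittest.assertRaisesRegex` in place of `pytest.raises`) and a hypothetical stand-in, `readiness_report_stub`, instead of importing the real `run_connector_readiness_test_report`; the manifest contents and the regex are illustrative assumptions, not the test's actual values.

```python
import unittest
from typing import Any


def readiness_report_stub(manifest: dict[str, Any]) -> str:
    """Hypothetical stand-in for run_connector_readiness_test_report."""
    streams = manifest.get("streams", [])
    if any(not isinstance(s, dict) for s in streams):
        raise ValueError(
            "Invalid manifest structure: 'streams' must be a list of "
            "stream definition objects (dicts)"
        )
    return "ok"


class MalformedStreamsTest(unittest.TestCase):
    def test_streams_as_list_of_strings_is_rejected(self) -> None:
        # Assumed manifest: only the `streams` field is malformed.
        manifest = {"streams": ["posts", "comments"]}
        # No `streams` argument is passed, so the manifest path is exercised.
        with self.assertRaisesRegex(ValueError, r"Invalid manifest structure.*streams"):
            readiness_report_stub(manifest)


if __name__ == "__main__":
    unittest.main()
```

Matching on a regex rather than the full message keeps the test stable if the wording of the sample entries in the error changes.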

connector_builder_mcp/validation_testing.py

devin-ai-integration bot assigned aaronsteers Oct 10, 2025

github-actions bot added the bug ("Something isn't working") and security labels Oct 10, 2025

coderabbitai bot reviewed Oct 10, 2025


aaronsteers commented Oct 10, 2025


connector_builder_mcp/validation_testing.py (outdated, resolved)

Apply suggestion from @aaronsteers

34e5ddb

aaronsteers enabled auto-merge (squash) October 10, 2025 21:02

aaronsteers merged commit 374cb49 into main Oct 10, 2025
19 checks passed

aaronsteers deleted the devin/1728595175-fix-streams-validation branch October 10, 2025 21:04

coderabbitai bot mentioned this pull request Oct 11, 2025

feat: add primary key and record count validation to readiness report and evals #136

Open

5 tasks
