


@aaronsteers aaronsteers commented Oct 10, 2025

fix: Add validation for manifest streams structure to prevent AttributeError

Summary

This PR fixes a bug where run_connector_readiness_test_report would crash with AttributeError: 'str' object has no attribute 'get' when processing a manifest with a malformed streams field.

Root cause: The code assumed all entries in manifest_dict["streams"] are dictionaries with a name field, but AI-generated manifests during eval runs sometimes contained a list of strings instead of stream definition objects.
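Here is a minimal reproduction of the root cause. It assumes the manifest has already been parsed into a plain dict. collect_stream_names is a hypothetical stand-in for the pre-fix lookup, not the actual code in validation_testing.py:

```python
def collect_stream_names(manifest_dict: dict) -> list:
    """Hypothetical pre-fix lookup: assumes every stream entry is a dict."""
    return [stream.get("name") for stream in manifest_dict["streams"]]


# A well-formed manifest works as expected.
print(collect_stream_names({"streams": [{"name": "users"}, {"name": "posts"}]}))  # ['users', 'posts']

# A malformed manifest (a list of strings, as seen in the eval run) crashes.
try:
    collect_stream_names({"streams": ["users", "posts"]})
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'str' object has no attribute 'get'
```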

The fix: Added defensive validation that checks that all stream entries are dicts before calling .get("name") on them. If any invalid entries are found, it raises a clear ValueError with an actionable error message explaining what is wrong.
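The sketch below shows the defensive check. It assumes that streams are read from manifest_dict["streams"]. get_stream_names is a hypothetical helper, and the merged code and its exact error text may differ:

```python
from typing import Any


def get_stream_names(manifest_dict: dict) -> list:
    """Hypothetical sketch: validate stream entries before reading their names."""
    streams = manifest_dict.get("streams", [])
    invalid: list[Any] = [s for s in streams if not isinstance(s, dict)]
    if invalid:
        # Fail fast with an actionable message instead of an AttributeError later.
        raise ValueError(
            "Invalid manifest structure: 'streams' must be a list of stream "
            f"definition objects (dicts), but found {len(invalid)} invalid "
            f"entry(ies): {invalid[:3]}"
        )
    # Safe now: every entry is a dict; a missing 'name' key yields None.
    return [s.get("name") for s in streams]


try:
    get_stream_names({"streams": ["users", "posts"]})
except ValueError as exc:
    message = str(exc)
    print(message)
```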

Key distinction preserved:

  • ✅ Input arg coercion (existing): The streams parameter can still be passed as a comma-separated string and gets coerced to a list
  • ✅ Manifest structure validation (new): The manifest's streams field is now validated to ensure it contains properly formatted stream definition objects
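The two paths can be illustrated as follows. coerce_streams_arg is a hypothetical helper that shows comma-separated string coercion for the streams parameter. Whether the real code strips whitespace or drops empty items is an assumption. Manifest structure validation is a separate step that applies to the manifest's own streams field, not to this argument:

```python
from typing import List, Optional, Union


def coerce_streams_arg(streams: Optional[Union[str, List[str]]]) -> Optional[List[str]]:
    """Hypothetical sketch of input-arg coercion: 'users, posts' -> ['users', 'posts']."""
    if isinstance(streams, str):
        # Split a comma-separated string and drop empty or whitespace-only items.
        return [name.strip() for name in streams.split(",") if name.strip()]
    # Lists (and None) pass through unchanged.
    return streams


print(coerce_streams_arg("users, posts"))  # ['users', 'posts']
print(coerce_streams_arg(["users"]))       # ['users']
print(coerce_streams_arg(None))            # None
```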

Review & Testing Checklist for Human

  • Verify error message clarity: Review the error message in the code to ensure it's actionable for users/agents encountering malformed manifests
  • Test with actual AI-generated manifests: If possible, test with the actual manifest from the CI logs that triggered this issue (eval-source-jsonplaceholder-1760125246)
  • Consider validation placement: This validates right before use; should manifest structure validation happen earlier in the pipeline instead (e.g., in the validate_manifest function)?
  • Check agent workflows: Verify that existing agent evaluation workflows still work correctly with this stricter validation

Recommended Test Plan

  1. Run the new test: pytest tests/integration/test_validation_and_testing.py::test_malformed_manifest_streams_validation -v
  2. If available, test with the actual malformed manifest from the CI logs
  3. Verify existing agent eval runs still work correctly

Notes

Summary by CodeRabbit

  • Bug Fixes

    • Added early validation for manifest streams to ensure each stream is an object with a required name and configuration.
    • Emits a clear, descriptive error when streams are malformed to prevent downstream failures and improve troubleshooting.
  • Tests

    • Added an integration test that confirms malformed stream lists produce the expected validation error, ensuring robust handling of invalid manifests.


…teError

- Add defensive validation in run_connector_readiness_test_report to check that all streams in the manifest are properly formatted dicts
- Raise clear ValueError with actionable message when streams field contains invalid entries (e.g., strings instead of stream definition objects)
- Add test case to verify the validation provides clear error messages
- Fixes issue where AI-generated manifests with malformed streams field caused AttributeError: 'str' object has no attribute 'get'

This distinguishes between:
- Input arg type coercion (streams param as string) - continues to work as before
- Manifest structure validation - now validates and errors with clear message

Related CI run: https://github.com/airbytehq/connector-builder-mcp/actions/runs/18416705139/job/52481958952?pr=129

Co-Authored-By: AJ Steers <[email protected]>

Original prompt from AJ Steers
@Devin -  can you repro the reported issue in the LLM build log? `'str' has no method 'get'`?

<https://github.com/airbytehq/connector-builder-mcp/actions/runs/18416705139/job/52481958952?pr=129>

In PR 129, and (if yes) also check on main?
Thread URL: https://airbytehq-team.slack.com/archives/D089P0UPVT4/p1760127794224299?thread_ts=1760127794.224299


🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added bug Something isn't working security labels Oct 10, 2025

coderabbitai bot commented Oct 10, 2025

📝 Walkthrough


Adds runtime validation to ensure manifest streams entries are dictionaries with required fields and raises a ValueError for malformed entries. Adds an integration test asserting the error for a manifest whose streams is a list of strings.

Changes

Cohort / File(s) | Summary of changes
--- | ---
Manifest streams validation (connector_builder_mcp/validation_testing.py) | Adds early structure checks for manifest streams; if streams is not a string and contains non-dict entries, raises ValueError describing invalid entries and required structure.
Integration tests (tests/integration/test_validation_and_testing.py) | Adds test_malformed_manifest_streams_validation, which constructs a manifest with streams as a list of strings and asserts run_connector_readiness_test_report raises a ValueError with a message matching the invalid manifest structure pattern.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Test as Test Runner
  participant Validator as validation_testing.run_connector_readiness_test_report
  participant Manifest as manifest.streams

  Test->>Validator: call with manifest
  Validator->>Manifest: inspect streams field
  alt streams is non-string and contains non-dict entries
    Validator-->>Test: raise ValueError("Invalid manifest structure: streams ... must be list of stream definition objects")
  else valid streams structure
    Validator->>Validator: enumerate and validate each stream dict ('name' present, config)
    Validator-->>Test: proceed to readiness tests
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

bug

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
--- | --- | ---
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title succinctly describes the primary change by stating that validation for the manifest streams structure has been added to prevent an AttributeError, directly reflecting the PR’s intent to introduce defensive checks for malformed stream entries. It uses the standard “fix:” prefix to denote a bug fix and avoids extraneous detail, making it clear and specific. This phrasing gives reviewers immediate insight into both what was changed and why.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06d5823 and 34e5ddb.

📒 Files selected for processing (1)
  • connector_builder_mcp/validation_testing.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • connector_builder_mcp/validation_testing.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test Connector Build (JSONPlaceholder)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Test Connector Build (PokemonTGG)
  • GitHub Check: Run Evals (Single Connector)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)

Comment @coderabbitai help to get the list of available commands and usage tips.


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP

To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:

{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1728595175-fix-streams-validation", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI

You can test this version of the MCP Server using the following CLI snippet:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1728595175-fix-streams-validation#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poe <command> - Runs any poe command in the uv virtual environment
  • /poe build-connector prompt="Star Wars API" - Run the connector builder using the Star Wars API.



github-actions bot commented Oct 10, 2025

PyTest Results (Fast)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 34e5ddb. ± Comparison against base commit af2f6c7.

♻️ This comment has been updated with latest results.


github-actions bot commented Oct 10, 2025

PyTest Results (Full)

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 34e5ddb. ± Comparison against base commit af2f6c7.

♻️ This comment has been updated with latest results.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
connector_builder_mcp/validation_testing.py (1)

569-578: LGTM! Validation correctly prevents AttributeError.

The defensive validation properly catches malformed manifests where streams contains non-dict entries before attempting to call .get("name") at line 580. The error message is clear and actionable, providing both the count and a sample of invalid entries.

Edge cases are well-handled:

  • Empty available_streams list is guarded by the if available_streams: check
  • Missing name field in valid dicts is handled by the fallback at line 580

Consider extracting this validation logic to a helper function in _util.py for better code organization and potential reuse:

from typing import Any

def validate_stream_entries(streams: list[Any]) -> None:
    """Validate that all stream entries are dictionaries.
    
    Raises:
        ValueError: If any stream entries are not dictionaries
    """
    invalid_streams = [s for s in streams if not isinstance(s, dict)]
    if invalid_streams:
        raise ValueError(
            f"Invalid manifest structure: 'streams' must be a list of stream definition objects (dicts), "
            f"but found {len(invalid_streams)} invalid entry(ies). "
            f"Each stream should be a dict with at least a 'name' field and stream configuration. "
            f"Invalid entries: {invalid_streams[:3]}"
        )

Then use it at line 569:

if available_streams:
    validate_stream_entries(available_streams)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af2f6c7 and 06d5823.

📒 Files selected for processing (2)
  • connector_builder_mcp/validation_testing.py (1 hunks)
  • tests/integration/test_validation_and_testing.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/integration/test_validation_and_testing.py (1)
connector_builder_mcp/validation_testing.py (1)
  • run_connector_readiness_test_report (500-738)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Run Evals (Single Connector)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
tests/integration/test_validation_and_testing.py (1)

177-201: LGTM! Test coverage validates the fix.

The test correctly verifies that malformed manifests with streams as a list of strings (instead of stream definition objects) raise a ValueError with the expected error message pattern. The test case directly addresses the root cause described in the PR.

Test structure is appropriate:

  • Minimal but valid manifest structure except for the malformed streams field
  • Uses pytest.raises with regex pattern matching to verify the error message
  • Calls run_connector_readiness_test_report without the streams parameter, ensuring the validation code path is exercised
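Here is a pytest sketch with the structure described above. It does not reproduce the actual test at lines 177-201. check_manifest_streams is a hypothetical stand-in for the validation step inside run_connector_readiness_test_report, whose full signature is not shown here:

```python
import pytest


def check_manifest_streams(manifest_dict: dict) -> None:
    """Hypothetical stand-in for the streams validation step."""
    invalid = [s for s in manifest_dict.get("streams", []) if not isinstance(s, dict)]
    if invalid:
        raise ValueError(
            "Invalid manifest structure: 'streams' must be a list of stream "
            f"definition objects, found {len(invalid)} invalid entry(ies)"
        )


def test_malformed_streams_raise_value_error() -> None:
    # Minimal manifest, malformed only in its streams field.
    manifest = {"streams": ["users", "posts"]}
    # match= is applied with re.search against the exception message.
    with pytest.raises(ValueError, match=r"Invalid manifest structure.*streams"):
        check_manifest_streams(manifest)


def test_well_formed_streams_pass() -> None:
    # Dict entries pass; no exception is expected.
    check_manifest_streams({"streams": [{"name": "users"}]})
```

Matching on a regex via match= keeps the assertion tied to the error's intent rather than its exact wording.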

@aaronsteers aaronsteers enabled auto-merge (squash) October 10, 2025 21:02
@aaronsteers aaronsteers merged commit 374cb49 into main Oct 10, 2025
19 checks passed
@aaronsteers aaronsteers deleted the devin/1728595175-fix-streams-validation branch October 10, 2025 21:04