Skip to content

Conversation

aaronsteers
Copy link
Contributor

@aaronsteers aaronsteers commented Aug 11, 2025

fix(declarative): Support nested $ref patterns in dynamic stream templates

Summary

This PR fixes a schema validation regression introduced in PR #560 (commit f52bc2e) that prevented the Airtable connector from validating against the declarative component schema. The issue occurred when the stream_template field in DynamicDeclarativeStream was changed from a simple $ref to an anyOf structure that only accepted direct references to DeclarativeStream or StateDelegatingStream, breaking Airtable's nested reference pattern {"$ref": "#/definitions/streams/airtable_stream"}.

Key Changes:

  • Added generic $ref object support to stream_template field in DynamicDeclarativeStream
  • Added generic $ref object support to requester field in multiple SimpleRetriever schema definitions
  • Added generic $ref object support to stream field in ParentStreamConfig
  • Allows $ref objects with additional properties (e.g., path, http_method) as used by Airtable

The fix maintains backward compatibility while enabling more flexible reference patterns needed by dynamic stream connectors like Airtable.

Review & Testing Checklist for Human

  • Test Airtable connector validation in Builder UI - Verify that Airtable can now be forked in the Connector Builder without the "Validation against json schema" error
  • Run schema validation tests across other connectors - Ensure the broader $ref pattern support doesn't break existing connectors that use direct references
  • Verify the pattern matching is restrictive enough - Check that the ^#/definitions/ pattern prevents invalid external references while allowing valid nested references
  • Test dynamic stream generation - Confirm that Airtable's dynamic stream discovery actually works end-to-end, not just schema validation

Recommended Test Plan:

  1. Load Airtable connector in Builder UI and verify "Generate Streams" works without validation errors
  2. Run a subset of existing connector tests to ensure no regressions
  3. Test creating a new connector with both direct and nested $ref patterns

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    Airtable["source-airtable/manifest.yaml"]:::context
    Schema["declarative_component_schema.yaml"]:::major-edit
    Builder["Connector Builder UI"]:::context
    
    Airtable --> |"validates against"| Schema
    Schema --> |"enables validation in"| Builder
    
    subgraph "Schema Changes"
        StreamTemplate["stream_template field<br/>(DynamicDeclarativeStream)"]:::major-edit
        RequesterField["requester field<br/>(SimpleRetriever)"]:::major-edit
        StreamField["stream field<br/>(ParentStreamConfig)"]:::major-edit
    end
    
    Schema --> StreamTemplate
    Schema --> RequesterField  
    Schema --> StreamField
    
    subgraph Legend
        L1["Major Edit"]:::major-edit
        L3["Context/No Edit"]:::context
    end

classDef major-edit fill:#90EE90
classDef context fill:#FFFFFF
Loading

Notes

  • Root Cause: PR feat(dynamic-streams): Add StateDelegatingStream to DynamicDeclarativeStream #560 made stream template validation more restrictive, breaking Airtable's nested $ref pattern #/definitions/streams/airtable_stream
  • Validation Scope: The CDK's generic error masking made diagnosis difficult - the actual validation errors were much more specific than the reported "schema failed" message
  • Testing: Validated against Airtable manifest and synthetic test cases, but broader connector testing recommended
  • Risk: Schema changes affect all connectors, though this change is additive (allows more patterns) rather than restrictive

Session Details:

Summary by CodeRabbit

  • New Features
    • Expanded the declarative schema to accept inline reference objects alongside direct $ref usage across several fields, allowing both a direct "$ref" and an inline object containing "$ref". This broadens configuration flexibility and maintains backward compatibility. No runtime behavior changes.

- Add generic  object support to stream_template field in DynamicDeclarativeStream
- Add generic  object support to requester field in SimpleRetriever schemas
- Add generic  object support to stream field in ParentStreamConfig
- Fixes validation regression introduced in PR #560 (commit f52bc2e)
- Enables Airtable connector validation while maintaining backward compatibility
- Allows  with additional properties (e.g. path, http_method)

Resolves validation issue: 'Validation against json schema defined in declarative_component_schema.yaml schema failed'

Co-Authored-By: AJ Steers <[email protected]>
@Copilot Copilot AI review requested due to automatic review settings August 11, 2025 19:41
Copy link
Contributor

Original prompt from AJ Steers
SYSTEM:
=== BEGIN THREAD HISTORY (in #dev-extensibility) ===
AJ Steers (U05AKF1BCC9): The Airtable Source fails to validate against the declared JSON schema. This prevents me from forking it in the Builder. I also don't see any error messages in the Builder that might help me diagnose or resolve the issue, only `Error handling request: Validation against json schema defined in declarative_component_schema.yaml schema failed`.

AJ Steers (U05AKF1BCC9): Maybe the failure to show detailed validation messages is specific to this occurring during dynamic stream generation?

ATTACHMENT:"https://app.devin.ai/attachments/4e2caa3e-75dc-4b32-872c-5997d57e065c/image.png"

AJ Steers (U05AKF1BCC9): Airtable is unfortunately the only example of the `DynamicDeclarativeStream` use case which doesn't also need custom components.py.

AJ Steers (U05AKF1BCC9): <@U02T7NVJ6A3> - Is it expected we might not get specific validation errors if the validation error occurs during stream discovery of dynamic streams?

Lake Mossman (U02T7NVJ6A3): That error UI just shows whatever error the CDK surfaces when handling the `full_resolve` request that the `Generate Streams` button triggers, so it seems like the CDK is not returning a more detailed error message in this case

AJ Steers (U05AKF1BCC9): I see. Are there any cases where the Builder (today) would run a local JSON schema validation? Or is it always delegated to the CDK?

AJ Steers (U05AKF1BCC9): For the MCP server, I ended up adding a JSON schema validation step because the CDK wasn't giving the LLM caller that data when it failed. 

For this airtable issue, I've tried validating with third party validation tools and I can't understand the failure. Says it is missing "type" In a place where it has a $ref pointing to an instance that does have the `type` key.

AJ Steers (U05AKF1BCC9): Have we seen issues where $refs weren't followed, or else where they created json schema validation issues? This is specifically falling in the stream templat... (674 chars truncated...)

Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added the bug Something isn't working label Aug 11, 2025
Copy link

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1723407187-fix-dynamic-stream-template-validation#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1723407187-fix-dynamic-stream-template-validation

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a schema validation regression that prevented the Airtable connector from validating against the declarative component schema. The issue was caused by overly restrictive validation that only accepted direct $ref patterns, breaking Airtable's nested reference pattern.

  • Added generic $ref object support to multiple schema fields to allow nested reference patterns
  • Maintained backward compatibility while enabling more flexible reference patterns
  • Fixed validation errors that prevented Airtable connector from working in the Connector Builder UI

type: string
pattern: "^#/definitions/"
required: ["$ref"]
additionalProperties: true
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting additionalProperties: true allows any additional properties beyond $ref, which could lead to unexpected behavior or security issues. Consider restricting this to only allow known safe properties or set to false if additional properties aren't necessary.

Suggested change
additionalProperties: true
additionalProperties: false

Copilot uses AI. Check for mistakes.

properties:
$ref:
type: string
pattern: "^#/definitions/"
Copy link

Copilot AI Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern ^#/definitions/ only validates the prefix but doesn't prevent potentially malicious or malformed references like #/definitions/../../../sensitive/path. Consider using a more restrictive pattern that validates the complete reference structure.

Suggested change
pattern: "^#/definitions/"
pattern: "^#/definitions/[A-Za-z0-9_-]+$"

Copilot uses AI. Check for mistakes.

Copy link
Contributor

coderabbitai bot commented Aug 11, 2025

📝 Walkthrough

Walkthrough

Expanded the declarative component schema to accept an inline object wrapper containing a $ref to definitions in six anyOf locations, alongside existing direct $ref usage. No runtime or control-flow changes; only schema validation broadened to allow both reference shapes.

Changes

Cohort / File(s) Summary of changes
Schema: inline $ref wrapper support
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
In six anyOf blocks, added support for an inline object with a required "$ref" (pattern ^#/definitions/), allowing additionalProperties, alongside existing direct "$ref" references. No functional logic changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

bug, declarative-component-schema, airbyte-python-cdk

Suggested reviewers

  • bnchrch
  • lmossman
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1723407187-fix-dynamic-stream-template-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (6)

1511-1517: Broadened retriever ref shape looks good; consider de-duplicating the wrapper schema.

Allowing an inline object with a local "$ref" plus extra fields is spot on for nested and embellished refs, and the pattern anchor restricts external refs. Would you consider extracting this wrapper into a shared definition (e.g., LocalManifestRefObject) to avoid repeating the same schema in six places and prevent future drift, wdyt?

Example:

+  LocalManifestRefObject:
+    description: Inline wrapper for a local manifest reference under the top-level 'definitions' of the manifest.
+    type: object
+    properties:
+      $ref:
+        type: string
+        pattern: "^#/definitions/"
+    required: ["$ref"]
+    additionalProperties: true

Then here:

-          - type: object
-            properties:
-              $ref:
-                type: string
-                pattern: "^#/definitions/"
-            required: ["$ref"]
-            additionalProperties: true
+          - "$ref": "#/definitions/LocalManifestRefObject"

2424-2430: LGTM; same inline $ref wrapper for DynamicSchemaLoader.retriever.

This mirrors the earlier change and maintains consistency. Would you also swap this repeated block for the shared LocalManifestRefObject if you adopt the de-dup refactor above, wdyt?


3231-3237: LGTM; ParentStreamConfig.stream now supports the wrapper.

This unblocks nested parent stream refs while keeping local-only restriction. If you adopt the shared definition, would you replace this block as well, wdyt?


3678-3684: LGTM; SimpleRetriever.requester now supports wrapper refs.

Nice for reuse across retriever shapes. Same de-duplication suggestion applies here if you add a LocalManifestRefObject, wdyt?


4233-4239: LGTM; HttpComponentsResolver.retriever updated consistently.

Good consistency across components. Would you also add a brief description to the wrapper definition (or to the shared one) clarifying it points to the manifest’s top-level 'definitions' (not JSON Schema defs), wdyt?


4374-4380: Fix for DynamicDeclarativeStream.stream_template looks correct; please add tests for regression coverage.

This addresses the regression for nested stream templates while preventing external refs; could you add/extend tests to cover:

  • passing cases: direct "$ref" and wrapper form,
  • wrapper with extra fields (e.g., path/http_method) accepted,
  • rejection of external refs (http://, file://),
  • optionally a negative case where the wrapper points to a non-stream definition to document expected behavior?

Happy to draft test cases if helpful, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c9049a and 8abeba1.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)

Copy link

PyTest Results (Fast)

3 696 tests  ±0   3 685 ✅ ±0   6m 28s ⏱️ -1s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 8abeba1. ± Comparison against base commit 1c9049a.

Copy link

PyTest Results (Full)

3 699 tests  ±0   3 688 ✅ ±0   11m 48s ⏱️ +14s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 8abeba1. ± Comparison against base commit 1c9049a.

@aaronsteers aaronsteers changed the title fix(declarative): Support nested $ref patterns in dynamic stream templates fix(declarative): Support nested $ref patterns in dynamic stream templates (do not merge) Aug 12, 2025
@aaronsteers aaronsteers marked this pull request as draft August 12, 2025 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant