-
Notifications
You must be signed in to change notification settings - Fork 67
fix: resolve issue where some destinations will fail due to null/missing stream namespace
keys in catalog
#748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
aaronsteers
wants to merge
25
commits into
main
Choose a base branch
from
devin/1755114459-fix-pyairbyte-null-namespace
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from 23 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
4e24489
spike: Replace null namespaces with empty strings
devin-ai-integration[bot] 054093a
fix: Shorten comment to resolve Ruff lint error (line length)
devin-ai-integration[bot] 0c2f196
Update airbyte/_message_iterators.py
aaronsteers 5c05e1d
Update airbyte/_message_iterators.py
aaronsteers 761554e
fix: Replace null namespaces with empty strings in catalog validation
devin-ai-integration[bot] 3a2cb45
fix: Also handle null source_defined_cursor and is_resumable fields
devin-ai-integration[bot] d098050
fix: Clean up null namespaces in record messages from external sources
devin-ai-integration[bot] 201187a
fix: Clean up null namespaces in record messages from external source…
devin-ai-integration[bot] 808ec52
fix: Clean up null namespaces in record messages at source execution …
devin-ai-integration[bot] 9aad1fb
fix: Comment out performance-problematic namespace checks in high-vol…
devin-ai-integration[bot] 85a3b4e
fix: Ensure catalog cleaning changes are included
devin-ai-integration[bot] 4c5a6d2
fix: Use Pydantic exclude_none=True to omit null fields from catalog …
devin-ai-integration[bot] 50b07ad
fix: Add exclude_none=True to remaining model_dump_json() calls
devin-ai-integration[bot] 89ff68a
fix: Complete exclude_none=True implementation for all model_dump_jso…
devin-ai-integration[bot] 9c4f594
fix: Clean up null namespaces in trace stream_status messages
devin-ai-integration[bot] 82e7730
Apply suggestion from @aaronsteers
aaronsteers 405b947
Apply suggestion from @aaronsteers
aaronsteers 35d9d60
Apply suggestion from @aaronsteers
aaronsteers d0a6f24
Apply suggestion from @aaronsteers
aaronsteers a8205c4
Apply suggestion from @aaronsteers
aaronsteers c8d1d41
feat: Switch benchmark default from source-e2e-test to source-faker
devin-ai-integration[bot] c3dce14
fix: Simplify source-faker config to use only count and seed parameters
devin-ai-integration[bot] 88756af
feat: Test catalog-only exclude_none theory, update get_benchmark_sou…
devin-ai-integration[bot] 57c9863
fix: Address lint and type checking issues
devin-ai-integration[bot] dbb96d0
security: Replace insecure tempfile.mktemp() with secure tempfile.mks…
devin-ai-integration[bot] File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
#!/usr/bin/env python3 | ||
"""Helper script for manually testing source connectors with small record counts.""" | ||
|
||
import json | ||
import subprocess | ||
import tempfile | ||
from pathlib import Path | ||
|
||
CONNECTOR_IMAGE = "airbyte/source-e2e-test:dev" | ||
CONFIG = { | ||
"type": "INFINITE_FEED", | ||
"max_records": 5, | ||
"seed": 0, | ||
"message_interval": 1000 | ||
} | ||
CATALOG = { | ||
"streams": [{ | ||
"stream": { | ||
"name": "data", | ||
"json_schema": {}, | ||
"supported_sync_modes": ["full_refresh"] | ||
}, | ||
"sync_mode": "full_refresh", | ||
"destination_sync_mode": "overwrite" | ||
}] | ||
} | ||
OUTPUT_FILE = "/tmp/manual_source_output.jsonl" | ||
|
||
def run_connector(): | ||
"""Run the source connector and capture its output.""" | ||
config_file = Path(tempfile.mktemp(suffix=".json")) | ||
|
||
catalog_file = Path(tempfile.mktemp(suffix=".json")) | ||
|
||
|
||
config_file.write_text(json.dumps(CONFIG, indent=2)) | ||
catalog_file.write_text(json.dumps(CATALOG, indent=2)) | ||
|
||
cmd = [ | ||
"docker", "run", "--rm", | ||
"-v", f"{config_file}:/tmp/config.json", | ||
"-v", f"{catalog_file}:/tmp/catalog.json", | ||
CONNECTOR_IMAGE, | ||
"read", | ||
"--config", "/tmp/config.json", | ||
"--catalog", "/tmp/catalog.json" | ||
] | ||
|
||
with open(OUTPUT_FILE, "w") as f: | ||
subprocess.run(cmd, stdout=f, stderr=subprocess.PIPE, text=True, timeout=60) | ||
|
||
config_file.unlink(missing_ok=True) | ||
catalog_file.unlink(missing_ok=True) | ||
|
||
lines = Path(OUTPUT_FILE).read_text().strip().split('\n') | ||
print(f"Generated {len(lines)} lines in {OUTPUT_FILE}") | ||
for i, line in enumerate(lines[:3]): | ||
print(f" {i+1}: {line}") | ||
|
||
if __name__ == "__main__": | ||
print("Manual Source Connector Test") | ||
run_connector() | ||
print("Complete!") |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Switching to exclude_none=True for catalog JSON avoids strict validator null errors — please confirm downstream tolerance and add a rationale comment.
This aligns with the PR’s intent to stop surfacing explicit nulls in ConfiguredAirbyteStream fields (e.g., namespace, source_defined_cursor, is_resumable). The change reduces spurious OpenAPI4J validation failures. The one behavioral change to watch is that optional fields previously present with null will now be omitted entirely; if any destination implementations (custom or legacy) were relying on the presence of keys even when null, they might need an adjustment. Would you mind confirming there are no destinations that require those keys to be present when the values are None, and adding a brief comment here to document the rationale, wdyt?
Suggested inline comment for future readers:
To help double-check consistency across the repo, you could run this quick search to ensure all configured catalog dumps now use exclude_none=True (and spot any stragglers):
🏁 Script executed:
Length of output: 2474
Add a short rationale comment here and confirm downstream tolerance for omitted None fields — wdyt?
Quick check: rg shows this repo already uses model_dump_json(exclude_none=True) in multiple places and I found no occurrences of a hardcoded
"namespace": null
. It looks consistent, but could we add a short inline rationale and confirm no destinations rely on keys being present with nulls?Files to update / reference:
Suggested inline diff to add at the destinations file:
Would you mind adding that comment and confirming there are no destination implementations that expect those keys to exist with null values? wdyt?
📝 Committable suggestion
🤖 Prompt for AI Agents