remove sftp indexer run override #631
Please review. This was the reason we made the change: the SFTP indexer was overriding `run()` to prefix file identifiers with `sftp://host:port/`, which created identifiers like `sftp://3.22.168.92:22/uuid`. This format contains forward slashes that break URL routing when the identifier is used as a path parameter in the ETL Job API. Since the identifier is only used for logging (file retrieval uses `original_file_path` from metadata), removing this override makes SFTP consistent with all other fsspec connectors, which use UUID-only identifiers.
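The routing problem can be illustrated with a minimal sketch. It assumes a hypothetical route of the form `/jobs/{job_id}/files/{file_id}` and a simple router that matches one path segment per placeholder; the actual ETL Job API path and router are not shown in this PR.

```python
from typing import Optional, Tuple


def build_path(job_id: str, file_id: str) -> str:
    """Interpolate the identifier as a raw path parameter (hypothetical route)."""
    return f"/jobs/{job_id}/files/{file_id}"


def match_route(path: str) -> Optional[Tuple[str, str]]:
    """Minimal segment-based router: returns (job_id, file_id) or None."""
    segments = path.strip("/").split("/")
    if len(segments) == 4 and segments[0] == "jobs" and segments[2] == "files":
        return segments[1], segments[3]
    return None


uuid_id = "3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b"
sftp_id = f"sftp://3.22.168.92:22/{uuid_id}"

print(match_route(build_path("job-1", uuid_id)))  # ('job-1', '3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b')
print(match_route(build_path("job-1", sftp_id)))  # None: the slashes split the identifier into extra segments
```

A UUID-only identifier fills exactly one path segment, while the `sftp://host:port/` form expands into several, so the route no longer matches.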
Claude finished @potter-potter's task.

Code Review for SFTP Indexer Override Removal
Review Summary
This PR removes the SFTP indexer's `run()` override.

Analysis
What was removed:

```python
def run(self, **kwargs: Any) -> Generator[FileData, None, None]:
    # Rewrites each identifier as sftp://<host>:<port>/<original identifier>
    for file in super().run(**kwargs):
        new_identifier = (
            f"sftp://"
            f"{self.connection_config.host}:"
            f"{self.connection_config.port}/"
            f"{file.identifier}"
        )
        file.identifier = new_identifier
        yield file
```

Impact Assessment:
✅ Positive impacts:
Code Quality Review
Implementation: ✅ Clean and correct
Base FsspecIndexer behavior: ✅ Reliable fallback
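The before/after behavior can be sketched with simplified stand-ins. `FileData`, `ConnectionConfig`, and `BaseIndexer` below are hypothetical minimal classes, not the real connector types, and the `original_file_path` value is invented for illustration. The sketch shows that only the identifier changes, while the metadata path used for retrieval is the same in both cases.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Generator, Iterable, List


# Hypothetical, simplified stand-ins; the real classes carry more fields and logic.
@dataclass
class FileData:
    identifier: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ConnectionConfig:
    host: str
    port: int = 22


class BaseIndexer:
    """Stand-in for the base fsspec indexer: yields files with UUID-only identifiers."""

    def __init__(self, connection_config: ConnectionConfig, files: Iterable[FileData]):
        self.connection_config = connection_config
        self._files = list(files)

    def run(self, **kwargs: Any) -> Generator[FileData, None, None]:
        yield from self._files


class OldSftpIndexer(BaseIndexer):
    """Reproduces the removed override: prefixes identifiers with sftp://host:port/."""

    def run(self, **kwargs: Any) -> Generator[FileData, None, None]:
        for file in super().run(**kwargs):
            file.identifier = (
                f"sftp://{self.connection_config.host}:"
                f"{self.connection_config.port}/{file.identifier}"
            )
            yield file


class NewSftpIndexer(BaseIndexer):
    """After this PR: no override, so the base behavior applies unchanged."""


def make_files() -> List[FileData]:
    # Fresh objects per indexer, because the old override mutates files in place.
    return [
        FileData(
            "3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b",
            {"original_file_path": "/upload/report.pdf"},
        )
    ]


config = ConnectionConfig(host="3.22.168.92", port=22)
old = next(OldSftpIndexer(config, make_files()).run())
new = next(NewSftpIndexer(config, make_files()).run())

print(old.identifier)  # sftp://3.22.168.92:22/3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b
print(new.identifier)  # 3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b
```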
Recommendations
Conclusion
This is a well-reasoned fix that prioritizes API compatibility over connector-specific identifier formatting. The removal is clean, properly versioned, and brings SFTP in line with other connectors. The trade-off of breaking existing identifier dependencies is justified by fixing URL routing issues in the ETL Job API.
Recommendation: Approve ✅
Note
Low Risk
Small, localized change to SFTP indexing output (file identifiers) plus a version bump; risk is mainly backward-compatibility for any downstream consumers relying on the previous identifier format.
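For downstream consumers that may hold identifiers in both the old and new formats, one possible mitigation is to normalize them before comparison. The helper below is a hypothetical sketch, not part of this PR; it assumes old-format identifiers always follow the `sftp://host:port/<identifier>` shape produced by the removed override.

```python
from urllib.parse import urlsplit


def normalize_identifier(identifier: str) -> str:
    """Hypothetical helper: strip a legacy sftp://host:port/ prefix if present."""
    parts = urlsplit(identifier)
    if parts.scheme == "sftp":
        # The original identifier is everything after the host:port authority.
        return parts.path.lstrip("/")
    return identifier


print(normalize_identifier("sftp://3.22.168.92:22/3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b"))
print(normalize_identifier("3f2b7c1e-8a4d-4e5f-9b6a-1c2d3e4f5a6b"))
```

Both calls yield the bare UUID, so records created before and after the version bump can be matched on the same key.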
Overview
Removes the SFTP indexer's custom `run()` override that rewrote each `FileData.identifier` into a fully qualified `sftp://host:port/...` URL, reverting to the base `FsspecIndexer` behavior.
Bumps version to `1.2.38` and records the change in `CHANGELOG.md`.
Written by Cursor Bugbot for commit f284d5f.