feat: Add bin/index_build.py script to create searchable connector component index #26
…mponent index
- Creates searchable index mapping class names to connectors that use them
- Shallow-checkouts airbytehq/airbyte repo to temp directory
- Scans all manifest.yaml files in airbyte-integrations/connectors/source-*/
- Extracts ClassName-formatted identifiers using regex patterns
- Filters out common false positives (HTTP methods, acronyms, etc.)
- Generates JSON index with 2,271+ unique class names from 478+ connectors
- Provides summary statistics and usage examples
- Enables discovery of connectors using specific features/components
Co-Authored-By: AJ Steers <[email protected]>
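The scan-and-index steps described in this commit can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the function names `shallow_clone` and `build_index` and the regex `CLASS_NAME_RE` are assumptions, and the pattern is deliberately naive, so tokens such as `GET` still match and need the separate false-positive filtering the commit mentions.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path

# Assumed pattern: any word that starts with an uppercase letter.
# It is intentionally loose, so all-caps tokens like GET are matched too.
CLASS_NAME_RE = re.compile(r"\b[A-Z][A-Za-z0-9]+\b")


def shallow_clone(url: str, dest: Path) -> None:
    """Clone only the latest commit of a repository into dest."""
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)


def build_index(repo_root: Path) -> dict[str, list[str]]:
    """Map each class-name-like token to the connectors whose manifest mentions it."""
    index: dict[str, set[str]] = defaultdict(set)
    pattern = "airbyte-integrations/connectors/source-*/manifest.yaml"
    for manifest in sorted(repo_root.glob(pattern)):
        connector = manifest.parent.name  # e.g. "source-example"
        text = manifest.read_text(encoding="utf-8")
        for name in CLASS_NAME_RE.findall(text):
            index[name].add(connector)
    # Sort keys and values so the output is stable between runs.
    return {name: sorted(connectors) for name, connectors in sorted(index.items())}
```

The returned dictionary can be serialized with `json.dumps(index, indent=2)` to get a JSON index of the kind the commit describes. Treating the manifest as plain text rather than parsing the YAML keeps the sketch short, at the cost of matching tokens inside comments and string values.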
Original prompt from AJ Steers
Copilot wasn't able to review any files in this pull request.
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.

Testing This Branch via MCP
To test the changes in this specific branch with an MCP client like Claude Desktop, use the following configuration:
{
  "mcpServers": {
    "connector-builder-mcp-dev": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1723158344-index-build-script", "connector-builder-mcp"]
    }
  }
}

Testing This Branch via CLI
You can test this version of the MCP Server using the following CLI snippet:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/connector-builder-mcp.git@devin/1723158344-index-build-script#egg=airbyte-connector-builder-mcp' --help

PR Slash Commands
Airbyte Maintainers can execute the following slash commands on your PR:
- Rename bin/index_build.py to bin/build_connector_feature_index.py
- Update output path from connector_component_index.json to generated/connector-feature-index.json
- Add poe task 'build' to invoke the script
- Create generated/ directory for output files
- Verify script works correctly with new configuration
Co-Authored-By: AJ Steers <[email protected]>
Co-Authored-By: AJ Steers <[email protected]>
- Add lowercase character requirement to exclude all-caps matches like 'A13V1IB3VIYZZH'
- Switch from JSON to CSV format with 'FeatureUsage' and 'ConnectorName' columns
- Sort output by feature name first, then by connector name
- Update output filename to connector-feature-index.csv
- Reduce unique class names from 2,271 to 2,084 with better filtering
Co-Authored-By: AJ Steers <[email protected]>
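The CSV layout described in this commit (two columns, sorted by feature and then by connector) could be produced along these lines. This is a sketch under assumptions: the function name `write_feature_csv` and the shape of the `index` argument are hypothetical; only the column names `FeatureUsage` and `ConnectorName` come from the commit message.

```python
import csv
import io
from typing import TextIO


def write_feature_csv(index: dict[str, list[str]], out: TextIO) -> int:
    """Write one row per (feature, connector) pair and return the row count."""
    # Flatten the mapping into pairs; the set removes duplicates, and tuple
    # ordering sorts by feature name first, then by connector name.
    rows = sorted(
        {(feature, connector) for feature, connectors in index.items() for connector in connectors}
    )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["FeatureUsage", "ConnectorName"])
    writer.writerows(rows)
    return len(rows)


# Usage with an in-memory buffer; a real script would pass an open file instead.
buffer = io.StringIO()
pair_count = write_feature_csv({"DefaultPaginator": ["source-b", "source-a"]}, buffer)
```

Because each feature appears once per connector that uses it, the row count is the number of feature-connector pairs rather than the number of unique class names.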
bin/build_connector_feature_index.py
Outdated
class_names = extract_class_names_from_yaml(yaml_content)

filtered_class_names = set()
false_positives = {
Devin, just dynamically exclude any keywords that are ALLCAPS or alllower.
…iltering
- Remove hardcoded false_positives set with specific keywords
- Use dynamic filtering to exclude ALLCAPS and alllower strings
- Keep mixed-case class names like 'DeclarativeSource'
- Simplifies maintenance and improves filtering robustness
Addresses GitHub comment from @aaronsteers
Co-Authored-By: AJ Steers <[email protected]>
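The dynamic rule requested in the review (drop anything that is ALLCAPS or alllower, keep mixed case) needs no keyword list. One possible way to express it is shown below; the helper names `is_mixed_case` and `filter_class_names` are illustrative and are not taken from the script.

```python
from typing import Iterable


def is_mixed_case(token: str) -> bool:
    """True only when the token contains both uppercase and lowercase letters."""
    # A token equal to its own upper() has no lowercase letters (ALLCAPS or
    # purely numeric); one equal to its own lower() has no uppercase letters.
    return token != token.upper() and token != token.lower()


def filter_class_names(tokens: Iterable[str]) -> set[str]:
    """Keep mixed-case tokens such as 'DeclarativeSource'; drop the rest."""
    return {token for token in tokens if is_mixed_case(token)}
```

Digits are unaffected by `upper()` and `lower()`, so an identifier like `A13V1IB3VIYZZH` is rejected the same way a plain acronym is.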
…ucture
- Move CSV output to connector_builder_mcp/resources/generated/
- Add find_connectors_by_feature() MCP tool for exact feature matching
- Tool accepts comma-separated features and returns connectors with ALL features
- Update build script to output to new location
- Remove old JSON output file
- Add csv import for proper CSV file handling
Co-Authored-By: AJ Steers <[email protected]>
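The lookup behaviour described for `find_connectors_by_feature()` (comma-separated input, exact matching, connectors must use ALL requested features) can be illustrated with the sketch below. The real tool's signature and the way it loads the CSV are not shown in this conversation, so the `csv_text` parameter and the in-memory parsing here are assumptions made to keep the example self-contained.

```python
import csv
import io


def find_connectors_by_feature(features: str, csv_text: str) -> list[str]:
    """Return connectors that use every feature in a comma-separated list."""
    wanted = {name.strip() for name in features.split(",") if name.strip()}
    if not wanted:
        return []
    # Collect, per connector, which of the requested features it uses.
    matched: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["FeatureUsage"] in wanted:  # exact, case-sensitive match
            matched.setdefault(row["ConnectorName"], set()).add(row["FeatureUsage"])
    # Keep only connectors for which every requested feature was seen.
    return sorted(name for name, seen in matched.items() if seen == wanted)
```

Requiring the full set to match turns the query into an intersection, which narrows the results as more features are listed.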
- Delete generated/connector-feature-index.json (replaced by CSV)
- Delete generated/connector-feature-index.csv (moved to connector_builder_mcp/resources/generated/)
Co-Authored-By: AJ Steers <[email protected]>
Co-Authored-By: AJ Steers <[email protected]>
pyproject.toml
Outdated
[tool.deptry]
ignore = ["DEP002"]

[tool.poe.tasks]
The Poe commands should be in the dedicated tasks file
- Move build task from pyproject.toml to poe_tasks.toml per @aaronsteers feedback
- Verified task still works correctly with 'poe build'
- Addresses GitHub PR comment about using dedicated tasks file
Co-Authored-By: AJ Steers <[email protected]>
feat: Switch connector feature index to CSV output with improved filtering
Summary
This PR updates the connector feature index script with significant changes to output format and filtering logic based on user feedback:
- bin/index_build.py → bin/build_connector_feature_index.py
- connector_component_index.json → generated/connector-feature-index.csv
The script processes 478 manifest.yaml files and now generates 14,010 feature-connector pairs. The earlier JSON index was keyed by 2,271 unique class names, so the two counts measure different things. The CSV is sorted by feature name first, then by connector name, which makes it easier to scan and maintain.
Review & Testing Checklist for Human
- Run poe build and verify that it completes without errors and generates the CSV file.
- Open generated/connector-feature-index.csv and spot-check that the data looks correct, with proper headers and sorting.

Recommended test plan: Run the script, examine the CSV output structure, and manually verify the extracted class names of 5-10 connectors against their actual manifest.yaml files.
Diagram
Notes