Github connector: Add an include_code option and index code #4284
@patrickfweston is attempting to deploy a commit to the Danswer Team on Vercel. A member of the Team first needs to authorize it.
PR Summary
This PR adds code indexing to the GitHub connector, letting users index repository code files alongside the existing content types through a new `include_code` configuration option.
- Added an `include_code` boolean field to the GitHub connector configuration in `web/src/lib/connectors/connectors.tsx`, with a corresponding UI checkbox
- Implemented code file indexing logic in `backend/onyx/connectors/github/connector.py`, with new methods `_convert_code_file_to_document` and `_fetch_repo_code_files`
- Added an example configuration with the `include_code` option in `backend/scripts/add_connector_creation_script.py` as a reference implementation
- Handles GitHub API rate limiting and includes error handling for code file fetching
- Potential concern around timestamp handling and file content processing that needs review
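The summary names `_fetch_repo_code_files` without showing it. The following is a minimal sketch of one way such a traversal could work, assuming a PyGithub-style repository object whose `get_contents(path)` returns either a single entry or a list of entries, each with `.type` and `.path`. The function name `fetch_repo_code_files` and its `include_code` parameter are illustrative only, not the PR's actual code.

```python
from typing import Any, Iterator


def fetch_repo_code_files(repo: Any, include_code: bool) -> Iterator[Any]:
    """Yield every file entry in `repo`, walking directories with an explicit stack.

    Assumption: `repo` behaves like PyGithub's Repository, where
    get_contents(path) returns one entry or a list of entries with .type/.path.
    """
    if not include_code:
        # Code indexing is opt-in; yield nothing when the flag is off.
        return
    pending = [""]  # "" refers to the repository root
    while pending:
        contents = repo.get_contents(pending.pop())
        if not isinstance(contents, list):
            contents = [contents]  # a file path returns a single entry
        for item in contents:
            if item.type == "dir":
                pending.append(item.path)  # visit this directory later
            elif item.type == "file":
                yield item  # symlinks and submodules are skipped
```

Each yielded entry could then be handed to a converter such as the PR's `_convert_code_file_to_document`. Using an explicit stack instead of recursion avoids recursion limits on deeply nested repositories, at the cost of one API call per directory.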
3 file(s) reviewed, 3 comment(s)
```python
file_url = f"https://github.com/{repo_name}/blob/{file_content.sha}/{file_content.path}"
return Document(
    id=file_url,  # Or use file_content.download_url if available
    sections=[TextSection(link=file_url, text=file_content.decoded_content.decode() or "")],
```
logic: Decoding content without checking the file type could fail for non-text (binary) files. Binary files need to be handled or skipped.
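A minimal sketch of one way to address this review comment, assuming the raw bytes come from something like `file_content.decoded_content`: treat a file as binary if it contains a NUL byte near the start (a common heuristic) or fails UTF-8 decoding, and skip it. The helper `decode_text_file` and its `max_bytes` size cap are hypothetical, not part of the PR.

```python
from typing import Optional


def decode_text_file(raw: bytes, max_bytes: int = 1_000_000) -> Optional[str]:
    """Return `raw` as UTF-8 text, or None if it looks binary or is too large.

    Hypothetical helper; max_bytes is an illustrative cap, not a GitHub limit.
    """
    if len(raw) > max_bytes:
        return None  # skip very large files rather than indexing them
    if b"\x00" in raw[:8000]:
        return None  # NUL bytes near the start usually mean a binary file
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid UTF-8; treat as non-text
```

The caller would skip the file when this returns `None`, instead of calling `decode()` directly and falling back to an empty string.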
I'm going to close this because it's a duplicate of #1650 and it was mentioned that the Onyx team is adding code support in a different way:
Description
Indexes the code in your GitHub repositories and adds an `include_code` option to the settings screen.

Addresses feature request: #4025
How Has This Been Tested?
Backporting (check the box to trigger backport action)
Note: You must check that the action passes; otherwise, resolve the conflicts manually and tag the patches.
Notes
`github.mdx` to indicate code is indexed: documentation#189