Github connector: Add an include_code option and index code #4284

patrickfweston (Contributor) commented Mar 14, 2025

Description

Indexes the code in your GitHub repositories and adds an include_code option to the settings screen.

Addresses feature request: #4025

How Has This Been Tested?

  1. Add a new GitHub connector
  2. Specify the details and configure the connector
  3. When setting up the options, check the "Include code?" box
  4. Let the connector run
  5. Verify that code files are being included using the Explorer tool

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Notes

vercel bot commented Mar 14, 2025

@patrickfweston is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.

@patrickfweston changed the title from "WIP: Github connector: Add an include_code option and index code" to "Github connector: Add an include_code option and index code" Mar 14, 2025
@patrickfweston marked this pull request as ready for review March 14, 2025 16:26
@patrickfweston requested a review from a team as a code owner March 14, 2025 16:26
greptile-apps bot (Contributor) left a comment

PR Summary

This PR adds code indexing capabilities to the GitHub connector, allowing users to include repository code files alongside existing content types through a new include_code configuration option.

  • Added include_code boolean field to GitHub connector configuration in web/src/lib/connectors/connectors.tsx with corresponding UI checkbox
  • Implemented code file indexing logic in backend/onyx/connectors/github/connector.py with new methods _convert_code_file_to_document and _fetch_repo_code_files
  • Added example configuration with include_code option in backend/scripts/add_connector_creation_script.py for reference implementation
  • Handles GitHub API rate limiting and includes error handling for code file fetching
  • Potential concern around timestamp handling and file content processing that needs review
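
As a rough illustration of the behaviour summarized above, the sketch below shows one way an opt-in flag could gate a walk over a repository tree. It is not the PR's code: `Entry` and `iter_code_files` are hypothetical names, and the in-memory tree stands in for whatever the GitHub API would return.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Entry:
    """Hypothetical stand-in for a repository tree entry (not a PyGithub type)."""
    path: str
    type: str  # "file" or "dir"
    children: List["Entry"] = field(default_factory=list)


def iter_code_files(root: List[Entry], include_code: bool) -> Iterator[Entry]:
    """Yield file entries depth-first, or nothing when include_code is off."""
    if not include_code:
        return
    # Explicit stack instead of recursion; reversed() keeps listing order.
    stack = list(reversed(root))
    while stack:
        entry = stack.pop()
        if entry.type == "dir":
            stack.extend(reversed(entry.children))
        else:
            yield entry


tree = [
    Entry("README.md", "file"),
    Entry("src", "dir", [Entry("src/app.py", "file"), Entry("src/util.py", "file")]),
]
paths_on = [e.path for e in iter_code_files(tree, include_code=True)]
paths_off = [e.path for e in iter_code_files(tree, include_code=False)]
```

With the flag off the generator yields nothing, so existing connectors that never set the option would keep their current behaviour under this assumption.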

3 file(s) reviewed, 3 comment(s)

file_url = f"https://github.com/{repo_name}/blob/{file_content.sha}/{file_content.path}"
return Document(
    id=file_url,  # Or use file_content.download_url if available
    sections=[TextSection(link=file_url, text=file_content.decoded_content.decode() or "")],

logic: Decoding binary content without checking the file type could fail for non-text files. Binary files need to be handled or skipped.
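
One possible way to address this, sketched under assumptions: treat a file as binary if a NUL byte appears near the start or if it is not valid UTF-8, and skip it in either case. The helper name `decode_text_or_none` and the 8192-byte sniff window are illustrative choices, not part of the PR or of any library.

```python
from typing import Optional


def decode_text_or_none(raw: bytes, sniff_bytes: int = 8192) -> Optional[str]:
    """Return UTF-8 text, or None if the bytes look binary or are not valid UTF-8."""
    # A NUL byte near the start is a common heuristic for binary content.
    if b"\x00" in raw[:sniff_bytes]:
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: treat as non-text rather than raising.
        return None


text = decode_text_or_none(b"print('hi')\n")
png = decode_text_or_none(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")
latin1 = decode_text_or_none("café".encode("latin-1"))
```

A caller could then skip any file for which the helper returns None instead of building a document from it. Files in other encodings (such as the Latin-1 example) are also skipped here, which may or may not be the desired trade-off.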

patrickfweston (Contributor, Author) commented

I'm going to close this because it's a duplicate of #1650 and it was mentioned that the Onyx team is adding code support in a different way:

Hello! We're going to be building code search into the project in the future (hopefully starting around end of year). The issue with ingesting code like with documentation or conversations is that code has dependencies across files that can't be captured well by a pure top-n approach. We'll be building code ingestion through a different pipeline.
