
Git info in the index (hash, version...) #57

Open
@Le09


I was looking at how to add the git info to the index, something like:

**Source Repository:** [https://github.com/Le09/Tutorial-Codebase-Knowledge/](https://github.com/Le09/Tutorial-Codebase-Knowledge/)

**Commit Hash:** d66d97639092051cd7eb0df82a96bec5a5b6bec4

**Branch Name:** main
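For case 1 (a local checkout), the block above could be produced roughly like this. This is a minimal sketch. It assumes the index writer can shell out to `git` and that the remote is named `origin`. `read_local_git_info` and `format_git_info` are hypothetical helper names, not functions from the project. Note that the URL returned by `git remote get-url` may be in SSH form and still need normalizing.

```python
import subprocess


def _git(repo_dir: str, *args: str) -> str:
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def read_local_git_info(repo_dir: str) -> dict:
    """Collect remote URL, commit hash and branch of a local checkout.

    Assumes a remote called "origin" exists. On a detached HEAD,
    `git rev-parse --abbrev-ref HEAD` prints the literal string "HEAD".
    """
    return {
        "url": _git(repo_dir, "remote", "get-url", "origin"),
        "commit": _git(repo_dir, "rev-parse", "HEAD"),
        "branch": _git(repo_dir, "rev-parse", "--abbrev-ref", "HEAD"),
    }


def format_git_info(url: str, commit: str, branch: str) -> str:
    """Render the git info as the Markdown block proposed for the index."""
    return "\n\n".join([
        f"**Source Repository:** [{url}]({url})",
        f"**Commit Hash:** {commit}",
        f"**Branch Name:** {branch}",
    ])
```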

That way, the output could potentially be used as a reference doc.
Having the git info would also make it possible to add links to functions, classes, etc.
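Once the commit hash is known, links to functions or classes could be built as GitHub permalinks pinned to that commit. The sketch below uses GitHub's `blob/<sha>/<path>#L<start>-L<end>` URL layout. `github_permalink` is a hypothetical helper, and the line numbers would have to come from whatever already locates the symbol in the file.

```python
from typing import Optional


def github_permalink(repo_url: str, commit: str, path: str,
                     start: int, end: Optional[int] = None) -> str:
    """Build a GitHub permalink to a line range at a fixed commit.

    Pinning the commit hash (rather than the branch name) keeps the
    link pointing at the same code after the branch moves on.
    """
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    if end is None or end == start:
        anchor = f"#L{start}"
    else:
        anchor = f"#L{start}-L{end}"
    return f"{base}/blob/{commit}/{path.lstrip('/')}{anchor}"
```

For example, `github_permalink(repo, sha, "path/to/module.py", 10, 20)` yields a link ending in `/blob/<sha>/path/to/module.py#L10-L20` (the path is illustrative).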

However, there are 3 different cases:

  1. local git repository
  2. remote repository, via ssh
  3. remote https repository
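Telling the three cases apart from the repository argument could look like the heuristic below. This is a sketch only. `classify_source` is a hypothetical name, and the rules (existing directory means local, `ssh://` or scp-like `user@host:path` means SSH, `http(s)://` means HTTPS) are assumptions, not the project's current logic.

```python
import re
from pathlib import Path

# scp-like SSH syntax, e.g. "git@github.com:owner/repo.git"
_SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:(?!//)")


def classify_source(source: str) -> str:
    """Return "local", "ssh" or "https" for a repository argument."""
    if source.startswith(("http://", "https://")):
        return "https"
    if source.startswith("ssh://") or _SCP_LIKE.match(source):
        return "ssh"
    if Path(source).is_dir():
        return "local"
    raise ValueError(f"unrecognised repository source: {source!r}")
```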

Case 1 is (mostly) easy; the only issue is that the remote URL might use a host alias.
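One way to deal with the host alias, assuming it is an OpenSSH alias, is to ask `ssh -G <alias>`. This command prints the effective client configuration without connecting, including a `hostname` line with the real host. The sketch then rewrites the remote into an https URL. All helper names are hypothetical, and aliases defined elsewhere (for example through git's own URL rewriting) are not handled.

```python
import re
import subprocess
from typing import Callable

# ssh://[user@]host[:port]/owner/repo(.git)
_SSH_URL = re.compile(r"^ssh://(?:[\w.-]+@)?(?P<host>[\w.-]+)(?::\d+)?/(?P<path>.+)$")
# scp-like syntax: [user@]host:owner/repo(.git)
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?(?P<host>[\w.-]+):(?P<path>[^/].*)$")


def parse_ssh_g_hostname(ssh_g_output: str, default: str) -> str:
    """Extract the "hostname" entry from `ssh -G` output."""
    for line in ssh_g_output.splitlines():
        key, _, value = line.partition(" ")
        if key == "hostname" and value.strip():
            return value.strip()
    return default


def resolve_ssh_alias(alias: str) -> str:
    """Ask OpenSSH which real host an alias maps to, without connecting."""
    out = subprocess.run(["ssh", "-G", alias], capture_output=True,
                         text=True, check=True).stdout
    return parse_ssh_g_hostname(out, default=alias)


def remote_to_https(remote: str,
                    resolve: Callable[[str], str] = resolve_ssh_alias) -> str:
    """Turn a git remote URL (possibly using a host alias) into an https URL."""
    if remote.startswith(("http://", "https://")):
        return remote.removesuffix(".git")
    m = _SSH_URL.match(remote) or _SCP_LIKE.match(remote)
    if m is None:
        raise ValueError(f"unsupported remote URL: {remote!r}")
    host = resolve(m.group("host"))
    path = m.group("path").removesuffix(".git")
    return f"https://{host}/{path}"
```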
Case 2 is problematic because the project is checked out in a temporary directory that is created within crawl_github_files.
Case 3 uses the API, so it may be less of a problem since there's less duplication of work.
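For case 3, the commit hash can be requested from the GitHub REST API. `GET /repos/{owner}/{repo}` returns `default_branch`, and `GET /repos/{owner}/{repo}/commits/{ref}` returns the commit with its `sha`. This is a sketch with unauthenticated requests, which are rate-limited; a token would go in an `Authorization` header. `github_commit_info` is a hypothetical helper, and the `fetch` parameter exists only so the network call can be swapped out.

```python
import json
import urllib.request
from typing import Callable, Optional

API_ROOT = "https://api.github.com"


def _fetch_json(url: str) -> dict:
    """GET a GitHub REST endpoint and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def github_commit_info(owner: str, repo: str, ref: Optional[str] = None,
                       fetch: Callable[[str], dict] = _fetch_json) -> dict:
    """Resolve a branch (the default branch if ref is None) to its commit SHA."""
    if ref is None:
        ref = fetch(f"{API_ROOT}/repos/{owner}/{repo}")["default_branch"]
    commit = fetch(f"{API_ROOT}/repos/{owner}/{repo}/commits/{ref}")
    return {
        "url": f"https://github.com/{owner}/{repo}",
        "commit": commit["sha"],
        "branch": ref,
    }
```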

Except for case 1, I think this points to a flaw in the abstraction: crawl_github_files is an isolated function, but there may be more that you want to extract from git.
Why have this complexity at all, instead of always cloning the repository into .cache?
If the point is to save space, the clone can be done with depth 1, although `git describe --tags` wouldn't work in that case.
But that's only relevant for large repositories, and the time spent cloning is dwarfed by the time spent calling the LLM, whatever the size of the repository.
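Always cloning into `.cache` could be sketched as follows. The repository is cloned once (optionally with `--depth 1`) and fast-forwarded on later runs. The same local `git` commands as in case 1 can then be run on the cached checkout. The cache layout and helper names are assumptions for illustration. The cache key includes the branch so that different branches don't share a checkout.

```python
import hashlib
import re
import subprocess
from pathlib import Path
from typing import Optional

CACHE_ROOT = Path(".cache") / "repos"  # hypothetical layout under .cache


def cache_dir_for(url: str, branch: Optional[str] = None,
                  root: Path = CACHE_ROOT) -> Path:
    """Map (url, branch) to a stable, filesystem-safe cache directory."""
    name = url.rstrip("/").removesuffix(".git").rsplit("/", 1)[-1]
    name = re.sub(r"[^\w.-]", "_", name)  # drop ":" or "@" from odd URLs
    digest = hashlib.sha1(f"{url}#{branch or ''}".encode()).hexdigest()[:12]
    return root / f"{name}-{digest}"


def clone_into_cache(url: str, branch: Optional[str] = None,
                     shallow: bool = True) -> Path:
    """Clone once into the cache, then fast-forward on later runs.

    With shallow=True only the tip commit is fetched (--depth 1), so
    `git describe --tags` will usually find no tag to describe from.
    """
    dest = cache_dir_for(url, branch)
    if (dest / ".git").is_dir():
        subprocess.run(["git", "-C", str(dest), "pull", "--ff-only"], check=True)
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["git", "clone"]
    if shallow:
        cmd += ["--depth", "1"]
    if branch:
        cmd += ["--branch", branch]
    subprocess.run(cmd + [url, str(dest)], check=True)
    return dest
```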

I've made a small commit for the local case: 01b7c28
Do you have an opinion on the matter before I turn it into a real PR?
