Git info in the index (hash, version...)

I was looking at how to add the git info to the index, something like:
```
**Source Repository:** [https://github.com/Le09/Tutorial-Codebase-Knowledge/](https://github.com/Le09/Tutorial-Codebase-Knowledge/)

**Commit Hash:** d66d97639092051cd7eb0df82a96bec5a5b6bec4

**Branch Name:** main
```
That way, it could potentially be used as a reference doc.
Having the git info could also be extended to add links to functions, classes, etc.
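
To illustrate the "links to functions, classes" idea, here is a minimal sketch that builds a commit-pinned link using GitHub's `blob/<commit>/<path>#L<line>` URL layout. The helper name `build_permalink` and the file path in the usage example are hypothetical and not part of the project.

```python
from typing import Optional


def build_permalink(repo_url: str, commit: str, path: str,
                    line: Optional[int] = None) -> str:
    """Build a link to a file (and optionally a line) pinned to a commit.

    Hypothetical helper: assumes a GitHub-style https repository URL.
    """
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    url = f"{base}/blob/{commit}/{path.lstrip('/')}"
    if line is not None:
        # GitHub anchors a single line with "#L<number>".
        url += f"#L{line}"
    return url


if __name__ == "__main__":
    # "path/to/file.py" is a placeholder path, not a real file in the repo.
    print(build_permalink(
        "https://github.com/Le09/Tutorial-Codebase-Knowledge/",
        "d66d97639092051cd7eb0df82a96bec5a5b6bec4",
        "path/to/file.py",
        line=10,
    ))
```

Because the link uses the commit hash rather than the branch name, it keeps pointing at the same code even after the branch moves.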

However, there are 3 different cases:
1) local git repository
2) remote repository, via ssh
3) remote https repository

Case 1 is (mostly) easy; the only issue is that there might be a host alias.
Case 2 is problematic because the project is checked out in a temporary directory that is created within `crawl_github_files`. 
Case 3 uses the API, so it may be less of a problem: there's less duplication of work.
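
For case 1, a rough sketch of what collecting the info could look like, assuming `git` is on the `PATH` and the remote is named `origin`. The function names, the `host_aliases` mapping (one possible way to deal with an SSH host alias), and the output layout are assumptions for illustration, not the code from the commit linked below.

```python
import re
import subprocess
from typing import Dict, Optional

# Matches scp-like remotes such as "git@github.com:owner/repo.git".
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?(?P<host>[\w.-]+):(?P<path>[^/].*)$")


def _strip_git_suffix(path: str) -> str:
    path = path.rstrip("/")
    return path[: -len(".git")] if path.endswith(".git") else path


def normalize_remote(url: str,
                     host_aliases: Optional[Dict[str, str]] = None) -> str:
    """Turn an https or scp-like SSH remote into a browsable https URL.

    host_aliases maps an SSH config alias to the real host (hypothetical).
    """
    url = url.strip()
    if "://" in url:
        # Already a URL with a scheme (e.g. https); just tidy it up.
        return _strip_git_suffix(url)
    match = _SCP_LIKE.match(url)
    if match is None:
        return url  # Unknown form: return unchanged rather than guess.
    host = match.group("host")
    host = (host_aliases or {}).get(host, host)
    return f"https://{host}/{_strip_git_suffix(match.group('path'))}"


def _git(repo_dir: str, *args: str) -> str:
    result = subprocess.run(["git", "-C", repo_dir, *args],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def local_git_info(repo_dir: str) -> Dict[str, str]:
    """Read remote URL, commit hash, and branch from a local checkout."""
    return {
        "url": normalize_remote(_git(repo_dir, "remote", "get-url", "origin")),
        "commit": _git(repo_dir, "rev-parse", "HEAD"),
        "branch": _git(repo_dir, "rev-parse", "--abbrev-ref", "HEAD"),
    }


def render_git_header(info: Dict[str, str]) -> str:
    """Render the markdown block shown at the top of this issue."""
    return (
        f"**Source Repository:** [{info['url']}]({info['url']})\n\n"
        f"**Commit Hash:** {info['commit']}\n\n"
        f"**Branch Name:** {info['branch']}\n"
    )
```

Note that `git rev-parse --abbrev-ref HEAD` prints `HEAD` on a detached checkout, so the branch line would need special handling there.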

Except for case 1, I think it's a flaw in the abstraction: `crawl_github_files` is an isolated function, but there may be more that you want to extract from git.
Why have this complexity at all, rather than always cloning the repository into `.cache`?
If it's to save on size, that can be done with a depth-1 (shallow) clone, although `git describe --tags` wouldn't work in that case.
But size only matters for large repositories, and the time spent cloning is dwarfed by the time spent calling the LLM, whatever the size may be.
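
The "always clone into `.cache`" idea could be sketched roughly as follows, assuming `git` is on the `PATH`. The helper names and the hash-based directory naming are hypothetical choices, not something the project does today.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def cache_dir_for(repo_url: str, cache_root: Path = Path(".cache")) -> Path:
    """Derive a stable per-repository directory name from the URL."""
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:16]
    return cache_root / digest


def clone_command(repo_url: str, dest: Path, shallow: bool = True) -> List[str]:
    """Build the git command; --depth 1 fetches only the latest commit."""
    cmd = ["git", "clone"]
    if shallow:
        cmd += ["--depth", "1"]
    cmd += [repo_url, str(dest)]
    return cmd


def ensure_clone(repo_url: str, cache_root: Path = Path(".cache"),
                 shallow: bool = True) -> Path:
    """Clone the repository once and reuse the checkout afterwards."""
    dest = cache_dir_for(repo_url, cache_root)
    if not (dest / ".git").exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(clone_command(repo_url, dest, shallow), check=True)
    return dest
```

With a checkout always on disk, all three cases could go through the same local code path for reading the commit hash and branch, at the cost of the clone itself.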

I've made a small commit for the local case: https://github.com/The-Pocket/Tutorial-Codebase-Knowledge/commit/01b7c289e0f604a0606522cb134943793fcc260e
Do you have an opinion on the matter, so that I can turn it into a real PR?
