Extract content from git repository #62
Draft
This captures some unfinished early work of mine on extracting information about content contained within referenced git repos by fetching the latest commit of the default branch into a scratch repo.
What I ran into and didn't finish resolving before I stopped working on this was that some projects reference Git repos outside GitHub that are down and/or return malformed responses. The `git` CLI is not good at telling us about this when we try to `fetch` or `ls-remote` on them, and just hangs indefinitely in some cases.

There are two things we could/should do next:

- Use the GitHub API instead of the `git` protocol to pull the latest commit. Even doing a shallow fetch of just the latest commit, pulling via `git` requires downloading a lot more content and dealing with a lot more failure modes than using the GitHub API. The vast majority of repos are on GitHub, so while we ultimately want to support all sorts of Git hosts, GitHub represents a worthwhile special case to optimize for by using their API instead.
- Make `git` more resilient for cases where we can't use the GitHub API. I was thinking we might either:
  - use `child_process` to invoke the local `git` client to probe repositories
  - add a check ahead of `git` that just quickly verifies that a connection can be established and that we see some fingerprint in the response that tells us a git server is responding on the other end
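
As a rough illustration of the GitHub special case, the sketch below looks up the latest commit on the default branch through the GitHub REST API instead of fetching over `git`. The helper names `parseGitHubRepo` and `fetchLatestCommitSha` are hypothetical and not part of this PR. It assumes Node 18+ for the global `fetch`, and it makes unauthenticated requests, which GitHub rate-limits.

```javascript
// Hypothetical helper: recognize GitHub remotes so they can take the API path.
// Returns { owner, repo } for GitHub URLs and null for any other host.
function parseGitHubRepo(repoUrl) {
  const match =
    /^(?:https?:\/\/|git@)github\.com[/:]([^/]+)\/([^/]+?)(?:\.git)?\/?$/.exec(repoUrl);
  return match ? { owner: match[1], repo: match[2] } : null;
}

// Hypothetical helper: resolve the default branch, then read its latest commit.
// Uses two documented REST endpoints: GET /repos/{owner}/{repo} and
// GET /repos/{owner}/{repo}/commits/{ref}.
async function fetchLatestCommitSha(owner, repo) {
  const base = `https://api.github.com/repos/${owner}/${repo}`;
  const headers = { Accept: "application/vnd.github+json" };

  const repoResponse = await fetch(base, { headers });
  if (!repoResponse.ok) {
    throw new Error(`GitHub API returned ${repoResponse.status} for ${base}`);
  }
  const { default_branch: branch } = await repoResponse.json();

  const commitResponse = await fetch(`${base}/commits/${branch}`, { headers });
  if (!commitResponse.ok) {
    throw new Error(`GitHub API returned ${commitResponse.status} for ${branch}`);
  }
  const { sha } = await commitResponse.json();
  return sha;
}
```

Remotes for which `parseGitHubRepo` returns `null` would fall through to the `git`-based path.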
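
For the idea of probing repositories with the local `git` client through `child_process`, one way to avoid the indefinite hangs might be to put a hard timeout on the child process. This is only a sketch: `runWithTimeout` and `probeRemote` are hypothetical names, the 10-second default is an arbitrary assumption, and `GIT_TERMINAL_PROMPT=0` is set so that `git` fails instead of waiting for credentials.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical helper: run a command, killing it if it exceeds `timeoutMs`.
// Always resolves, so callers can treat a hang like any other failure.
function runWithTimeout(command, args, timeoutMs, env = process.env) {
  return new Promise((resolve) => {
    execFile(
      command,
      args,
      { timeout: timeoutMs, killSignal: "SIGKILL", env },
      (error, stdout) => {
        if (!error) {
          resolve({ ok: true, stdout });
          return;
        }
        // `error.killed` is set when Node killed the child, e.g. on timeout.
        resolve({ ok: false, timedOut: error.killed === true, message: error.message });
      }
    );
  });
}

// Hypothetical wrapper: list the remote's branch heads without fetching objects.
function probeRemote(url, timeoutMs = 10000) {
  const env = { ...process.env, GIT_TERMINAL_PROMPT: "0" };
  return runWithTimeout("git", ["ls-remote", "--heads", url], timeoutMs, env);
}
```

A result with `timedOut: true` would mark the remote as unresponsive, while other failures carry the message from `git` for logging.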
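
The quick connection-and-fingerprint check could, for HTTP(S) remotes, lean on Git's smart HTTP protocol: a smart server answers `GET <repo>/info/refs?service=git-upload-pack` with the content type `application/x-git-upload-pack-advertisement`. The sketch below uses that content type as the fingerprint. The function names are hypothetical, Node 18+ is assumed for `fetch` and `AbortSignal.timeout`, and `ssh://` or `git://` remotes, dumb HTTP servers, and servers that require authentication are not covered.

```javascript
// Content type a smart-HTTP Git server uses for its ref advertisement.
const ADVERTISEMENT_TYPE = "application/x-git-upload-pack-advertisement";

// Build the ref-discovery URL for a repository, tolerating trailing slashes.
function infoRefsUrl(repoUrl) {
  return repoUrl.replace(/\/+$/, "") + "/info/refs?service=git-upload-pack";
}

// Decide whether a response looks like it came from a smart-HTTP Git server.
function looksLikeGitServer(status, contentType) {
  if (status !== 200 || !contentType) return false;
  // Ignore any parameters such as "; charset=utf-8".
  return contentType.split(";")[0].trim().toLowerCase() === ADVERTISEMENT_TYPE;
}

// Hypothetical probe: true only if a Git server answers within `timeoutMs`.
async function probeGitHttp(repoUrl, timeoutMs = 5000) {
  try {
    const response = await fetch(infoRefsUrl(repoUrl), {
      signal: AbortSignal.timeout(timeoutMs),
    });
    // Only the status and headers are needed, so skip downloading the ref list.
    await response.body?.cancel();
    return looksLikeGitServer(response.status, response.headers.get("content-type"));
  } catch {
    // Network error, DNS failure, or timeout: treat the host as not responding.
    return false;
  }
}
```

Because the probe never reads the response body, it stays cheap even for repositories with many refs, and a `false` result lets the caller skip the slower `git` invocation entirely.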