Description
Is your feature request related to a problem? Please describe.
The GithubRunner works well for extracting data from GitHub, but the same runner cannot be used to extract data from GitHub Enterprise accounts.
Describe the solution you'd like
I would like to use the GithubRunner to extract data from a GitHub Enterprise account. To enable this, I believe the SimpleGitHubConfig class should accept a new parameter for the GitHub Enterprise API base URL, as shown in the code below:
from unstructured.ingest.connector.git import GitAccessConfig
from unstructured.ingest.connector.github import SimpleGitHubConfig
from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
from unstructured.ingest.runner import GithubRunner

if __name__ == "__main__":
    runner = GithubRunner(
        processor_config=ProcessorConfig(
            verbose=True,
            output_dir="github-ingest-output",
            num_processes=2,
        ),
        read_config=ReadConfig(),
        partition_config=PartitionConfig(),
        connector_config=SimpleGitHubConfig(
            url="<MyOrg>/<MyInternalRepo>",
            branch="main",
            access_config=GitAccessConfig(),
            # Proposed new parameter (does not exist yet): the Enterprise API base URL
            base_url="https://<host_of_my_github_enterprise>/api/v3",
        ),
    )
    runner.run()
Describe alternatives you've considered
Of course, the source code has to be compatible with both the GitHub and GitHub Enterprise APIs. I have already tested this, and it would be worth removing the line 32 condition so that other GitHub hosts are allowed. As it stands, we cannot configure a GitHub Enterprise account, because it uses a different domain.
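As a rough illustration of what a relaxed host check could look like, here is a minimal sketch. It is not the library's current code: the function name `validate_repo_url`, the `DEFAULT_HOST` constant, and the `base_url` parameter are all hypothetical, and the exact rules (https only, two path fragments) are assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

DEFAULT_HOST = "github.com"  # assumed default when no Enterprise host is given


def validate_repo_url(url: str, base_url: Optional[str] = None) -> str:
    """Return "owner/repo" if the URL is acceptable, otherwise raise ValueError.

    `base_url` is the hypothetical new parameter from this request. When it is
    provided, its host replaces github.com as the only accepted host.
    """
    allowed_host = urlparse(base_url).netloc if base_url else DEFAULT_HOST
    parsed = urlparse(url)
    fragments = [part for part in parsed.path.split("/") if part]

    # A scheme is optional, but if present it must be https.
    if parsed.scheme and parsed.scheme != "https":
        raise ValueError(f"Unsupported scheme in {url!r}")
    # A host is optional, but if present it must match the allowed host.
    if parsed.netloc and parsed.netloc != allowed_host:
        raise ValueError(f"Host {parsed.netloc!r} does not match {allowed_host!r}")
    # The path must be exactly "owner/repo".
    if len(fragments) != 2:
        raise ValueError(f"Expected 'owner/repo' in {url!r}")
    return "/".join(fragments)
```

With something along these lines, a short `owner/repo` string keeps working unchanged, and a full URL on an Enterprise domain is accepted only when a matching `base_url` is passed.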
Additional context
- The user should be able to pass a domain other than "github.com".