Provide a library interface for vectara-ingest #190

As far as I can tell, the only out-of-the-box way to run the crawlers in this repo is inside a Docker container, which installs all dependencies.
In my opinion, it would be more useful if you also provided a "library" interface, so that I could import a crawler into my own Python project and call it as needed.
(I understand that this is technically already possible, but it's not as straightforward as it could be.)

With that approach, you could split up the dependencies so that users only need to install those required by the crawlers they plan to use, rather than _all_ dependencies (which seems to be the case right now).
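
To make the idea a bit more concrete, here is a rough sketch of how a library entry point could check that only the dependencies of the requested crawler are installed. Everything in it is an assumption for illustration: the crawler names, the module lists, and the `missing_modules` / `require_crawler` helpers are hypothetical and not part of vectara-ingest today.

```python
import importlib.util

# Hypothetical mapping from crawler name to the top-level modules it needs.
# The real per-crawler requirements would have to come from the project itself.
CRAWLER_REQUIREMENTS = {
    "rss": ["feedparser"],
    "website": ["bs4", "playwright"],
}


def missing_modules(crawler, requirements=CRAWLER_REQUIREMENTS):
    """Return the modules required by `crawler` that cannot be imported."""
    return [
        name
        for name in requirements[crawler]
        if importlib.util.find_spec(name) is None
    ]


def require_crawler(crawler, requirements=CRAWLER_REQUIREMENTS):
    """Raise ImportError if any dependency of `crawler` is not installed."""
    missing = missing_modules(crawler, requirements)
    if missing:
        raise ImportError(
            f"Crawler '{crawler}' needs these missing modules: "
            f"{', '.join(missing)}"
        )
```

A library interface could call something like `require_crawler("rss")` before importing the crawler itself, so that a missing optional dependency produces a clear error instead of a failure deep inside the crawler code.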

I see you are already using [`uv`](https://docs.astral.sh/uv/) to install dependencies in the [Dockerfile](https://github.com/vectara/vectara-ingest/blob/main/Dockerfile#L26-L27). Why not use it to manage the project's dependencies as well? That would make it easier to maintain the per-crawler dependency split I described above.

Just wanted to start a discussion.
