Allow option to use DataGeometry objects à la scikit-learn pipelines

Currently, if you want to repeatedly transform text samples with `hypertools.tools.format_data()` using the same parameters, the function re-fits both the vectorizer and the text model on every call. This is inefficient, and when the fits are expensive or the calls numerous, working directly with the underlying `sklearn` classes becomes the better option.

We could add an argument to return the fit models for reuse, but a really nice feature would be something like a [scikit-learn Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) object that you could create, fit, save, and reuse to perform various processing steps with a single call. This would also be a very attractive feature for hypertools, since the object could additionally implement methods like `.plot()` and `.describe()`.
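
As a rough sketch of the fit-once, reuse-many-times workflow described above, here is the pattern using scikit-learn directly. The choice of `CountVectorizer` and `LatentDirichletAllocation`, the step names, and the toy documents are assumptions for illustration only. Nothing here is existing hypertools API.

```python
import pickle

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy corpus (made up for this example).
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stocks fell as markets reacted to the news",
    "investors watch the markets and the news",
]

# Vectorizer + text model chained into one object.
# Step names and n_components are arbitrary choices for this sketch.
text_pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("text_model", LatentDirichletAllocation(n_components=3, random_state=0)),
])

# Fit both steps once...
text_pipeline.fit(train_docs)

# ...then transform new samples as often as needed without re-fitting.
new_docs = ["the cat and the dogs", "markets and stocks in the news"]
topic_weights = text_pipeline.transform(new_docs)

# The fitted pipeline can be serialized and restored for later reuse.
restored = pickle.loads(pickle.dumps(text_pipeline))
restored_weights = restored.transform(new_docs)
```

A hypertools equivalent could wrap the same idea in a `DataGeometry`-like object, so that the fitted steps travel with the object and later calls only transform.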

Allow option to use DataGeometry objects à la scikit-learn pipelines #227
