Validate `consumes` and infer `produces` for Lightweight Python components

When a user defines a Lightweight Python component (https://github.com/ml6team/fondant/issues/558), we want to derive from the provided Python code all the information we currently read from the component spec.

For the `consumes` section, we can assume it matches the schema of the `dataset` the operation is applied to, possibly altered by the `consumes` argument passed to the `apply` method.
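As a rough illustration of the idea above, the sketch below resolves the schema a component would see from the dataset schema and an optional `consumes` argument. It assumes, purely for illustration, that `consumes` is a mapping from dataset column name to component field name that both selects and renames columns; the function name `resolve_consumes` and the plain-dict schema representation are hypothetical and not part of Fondant's API.

```python
from typing import Dict, Optional


def resolve_consumes(
    dataset_schema: Dict[str, str],
    consumes: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return the schema the component is assumed to consume.

    ASSUMPTION: `consumes` maps dataset column names to component field
    names. The real semantics of the `apply` argument may differ.
    """
    if consumes is None:
        # Without an explicit mapping, the component sees the dataset as-is.
        return dict(dataset_schema)

    resolved = {}
    for dataset_column, component_field in consumes.items():
        if dataset_column not in dataset_schema:
            raise KeyError(f"Column '{dataset_column}' is not in the dataset schema")
        # Keep the dataset's type, but expose it under the component's name.
        resolved[component_field] = dataset_schema[dataset_column]
    return resolved
```

With a dataset schema of `{"image_url": "string", "width": "int32"}` and `consumes={"image_url": "url"}`, this sketch would yield `{"url": "string"}`.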

For the `produces` section, the user can either provide a schema via the `produces` argument on the `apply` method, or we can try to infer it by simulating the `transform` method. We could do this by generating dummy data based on the `consumes` schema and applying the `transform` method to it.
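A minimal sketch of the simulation approach, assuming the `transform` step takes and returns a pandas DataFrame and that schemas are plain dicts of column name to pandas dtype string. The helpers `make_dummy_frame` and `infer_produces`, and the `DUMMY_VALUES` table, are hypothetical names introduced here for illustration, not existing Fondant code.

```python
from typing import Callable, Dict

import pandas as pd

# ASSUMPTION: one placeholder value per supported dtype. A real
# implementation would need to cover many more types.
DUMMY_VALUES = {"int64": 1, "float64": 1.0, "bool": True, "object": "dummy"}


def make_dummy_frame(consumes: Dict[str, str], rows: int = 3) -> pd.DataFrame:
    """Build a small DataFrame whose columns match the `consumes` schema."""
    return pd.DataFrame(
        {
            name: pd.Series([DUMMY_VALUES[dtype]] * rows, dtype=dtype)
            for name, dtype in consumes.items()
        }
    )


def infer_produces(
    transform: Callable[[pd.DataFrame], pd.DataFrame],
    consumes: Dict[str, str],
) -> Dict[str, str]:
    """Run `transform` on dummy data and read the schema off the result."""
    result = transform(make_dummy_frame(consumes))
    return {name: str(dtype) for name, dtype in result.dtypes.items()}


# Hypothetical user-provided transform.
def add_area(dataframe: pd.DataFrame) -> pd.DataFrame:
    dataframe["area"] = dataframe["width"] * dataframe["height"]
    return dataframe[["area"]]


print(infer_produces(add_area, {"width": "int64", "height": "int64"}))
# {'area': 'int64'}
```

Note that the inferred types depend on the dummy values chosen, so a user-provided `produces` schema should probably still take precedence when present.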

This only makes sense for Transform components, since we always expect the user to provide a `produces` schema for a Read component, and a Write component doesn't produce anything.

Inferring the `produces` schema by simulation would also validate the `consumes` schema if the simulation succeeds. A failed simulation doesn't invalidate the `consumes` schema though, since there can be multiple reasons for the failure: the `consumes` schema is incorrect, there's a bug in the component, or there's a bug in the dummy data generation.
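The asymmetry described above could be captured as follows. This is only a sketch: `InferenceResult` and `try_infer_produces` are hypothetical names, and `simulate` stands in for whatever callable runs the dummy-data simulation and returns the inferred schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class InferenceResult:
    produces: Optional[Dict[str, str]]  # inferred schema, if any
    consumes_validated: bool            # True only when the simulation succeeded
    error: Optional[str] = None         # failure details, for feedback to the user


def try_infer_produces(simulate: Callable[[], Dict[str, str]]) -> InferenceResult:
    """Run the simulation and report success or an inconclusive failure."""
    try:
        produces = simulate()
    except Exception as exc:
        # A failure is inconclusive: the `consumes` schema, the component,
        # or the dummy data generation could each be at fault.
        return InferenceResult(None, False, f"{type(exc).__name__}: {exc}")
    # Success implies the component ran on data shaped like `consumes`.
    return InferenceResult(produces, True)
```

On failure, the caller could fall back to asking the user for an explicit `produces` schema rather than rejecting the `consumes` schema.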
