Description
What type of report is this?
Correction
Please describe the issue.
Description
The Narwhals tutorials could be improved by consolidating backend-agnostic patterns into a single tutorial aimed at production machine learning (ML) and artificial intelligence (AI) workflows. This tutorial would focus on practical patterns that simplify common tasks, keep behavior consistent across backends, and fit scalable development workflows.
Key Focus Areas:
- Data Validation Patterns:
  - Eager validation for immediate feedback (e.g., numeric and categorical feature validation).
  - Lazy validation for optimized workflows across larger datasets.
- Time Series Operations:
  - Group-level metrics such as temporal aggregations (mean, null counts) for time-indexed datasets.
  - Temporal validation for uniqueness, consistency, and handling mixed frequencies.
- Feature Engineering:
  - Backend-agnostic numeric and categorical transformations.
  - Patterns for missing value imputation, standardization, and case consistency.
In our package TemporalScope, which leverages Narwhals for model-agnostic explainability in AI/ML workflows, these patterns would be immensely valuable for ensuring robust data preparation and validation. Specifically:
- Use Case:
  - Validating and transforming features across Pandas, Polars, and Dask backends for explainable ML workflows.
  - Handling time series data in both single-step and multi-step forecasting pipelines.
- Development Workflow:
  - Lean Main Environment: A `hatch` environment limited to Narwhals, without heavy dependencies like Pandas or Dask.
  - Comprehensive Test Environment: A `hatch` environment including all relevant libraries (Pandas, Polars, Dask) to validate runtime behavior and backend-agnostic patterns.
By integrating these patterns into a single, enterprise-grade tutorial, Narwhals would provide developers with clear, actionable guidance for robust AI/ML workflows [CC @kanenorman].
Suggestion
Create a condensed tutorial notebook that demonstrates these patterns, building directly on the feedback shared:
- Universal Backend Support: Showcase compatibility with Pandas, Polars, and Dask.
- Core Narwhals Patterns: Focus on the use of `@nw.narwhalify`, lazy/eager evaluation strategies, and backend-agnostic transformations.
- Production-Ready Use Cases: Condense practical examples that are directly applicable to AI/ML pipelines, following @FBruzzesi's recommendations (e.g., using `pass_through=True` or `strict=False` where necessary).
If you have a suggestion on how it should be, add it below.
No response