[Doc]: Enhance Narwhals Tutorials with Backend-Agnostic Patterns #1696

Open
@philip-ndikum

What type of report is this?

Correction

Please describe the issue.

Description

The Narwhals tutorials could be improved by consolidating backend-agnostic patterns into a single tutorial aimed at enterprise-grade machine learning (ML) and artificial intelligence (AI) workflows. The tutorial would focus on practical, production-ready patterns that simplify common tasks, keep behavior consistent across backends, and fit scalable development workflows.

Key Focus Areas:

  1. Data Validation Patterns:

    • Eager validation for immediate feedback (e.g., numeric and categorical feature validation).
    • Lazy validation for optimized workflows across larger datasets.
  2. Time Series Operations:

    • Group-level metrics such as temporal aggregations (mean, null counts) for time-indexed datasets.
    • Temporal validation for uniqueness, consistency, and handling mixed frequencies.
  3. Feature Engineering:

    • Backend-agnostic numeric and categorical transformations.
    • Patterns for missing value imputation, standardization, and case consistency.

Our package, TemporalScope, uses Narwhals for model-agnostic explainability in AI/ML workflows, and these patterns would help ensure robust data preparation and validation there. Specifically:

  • Use Case:
    • Validating and transforming features across Pandas, Polars, and Dask backends for explainable ML workflows.
    • Handling time series data in both single-step and multi-step forecasting pipelines.
  • Development Workflow:
    • Lean Main Environment:
      A Hatch environment limited to Narwhals, without heavy dependencies like Pandas or Dask.
    • Comprehensive Test Environment:
      A Hatch environment including all relevant libraries (Pandas, Polars, Dask) to validate runtime behavior and backend-agnostic patterns.

By integrating these patterns into a single, enterprise-grade tutorial, Narwhals would provide developers with clear, actionable guidance for robust AI/ML workflows [CC @kanenorman].


Suggestion

Create a condensed tutorial notebook that demonstrates these patterns, building directly on the feedback shared:

  • Universal Backend Support:
    Showcase compatibility with Pandas, Polars, and Dask.
  • Core Narwhals Patterns:
    Focus on the use of @nw.narwhalify, lazy/eager evaluation strategies, and backend-agnostic transformations.
  • Production-Ready Use Cases:
    Condense practical examples that apply directly to AI/ML pipelines, following @FBruzzesi's recommendations (e.g., using pass_through=True or strict=False where necessary).

If you have a suggestion on how it should be, add it below.

No response

Labels: documentation (Improvements or additions to documentation)