Simplify code using skrub TableReport and TableVectorizer #866

@ArturoAmorQ

Description

  • Add a notebook + video showing how all the pandas code in the "Visual inspection of data" subsection can be simplified using skrub.TableReport.
  • Replace ColumnTransformer with skrub.TableVectorizer, starting from the "Using numerical and categorical variables together" notebook:
    • In the same notebook, in the "Fitting a more powerful model" section, replace OrdinalEncoder with skrub.ToCategorical.
    • Explicitly mention that TableVectorizer selects columns automatically based on their dtype.
    • Introduce the concept of "low/high cardinality" and demonstrate the effect of cardinality_threshold on the "native-country" column of the Adult Census dataset.
    • Update the "visualizing scikit-learn pipelines" video to use TableVectorizer (with scikit-learn version >= 1.8).
    • Modify the wrap-up quizzes that use the Ames Housing dataset (i.e. M1, M4 and M5) to select a subset of numerical columns with pandas.
  • Redo the dataset descriptions using TableReport.
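Before showing cardinality_threshold in action, the "low/high cardinality" idea can be illustrated with plain pandas. The sketch below only approximates how TableVectorizer routes string/categorical columns: it counts unique values per non-numeric column and compares that count to a threshold. The helper `split_by_cardinality` and the toy data are hypothetical, and the strict `<` comparison is an assumption that should be checked against the skrub documentation.

```python
import pandas as pd

# Hypothetical toy data, NOT the Adult Census dataset.
df = pd.DataFrame(
    {
        "age": [25, 38, 28, 44, 18, 34],
        "sex": ["Male", "Female", "Male", "Male", "Female", "Female"],
        "native-country": ["US", "Mexico", "US", "India", "Peru", "Cuba"],
    }
)


def split_by_cardinality(frame, threshold):
    """Hypothetical helper: split non-numeric columns by number of unique values.

    Assumption: a column counts as low cardinality when it has strictly
    fewer unique values than ``threshold``; otherwise it is high cardinality.
    """
    categorical = frame.select_dtypes(exclude="number").columns
    n_unique = frame[categorical].nunique()
    low = n_unique[n_unique < threshold].index.tolist()
    high = n_unique[n_unique >= threshold].index.tolist()
    return low, high


# Moving the threshold changes which group "native-country" falls into.
for threshold in (3, 10):
    low, high = split_by_cardinality(df, threshold)
    print(f"threshold={threshold}: low={low}, high={high}")
```

With the lower threshold, "native-country" (5 distinct values here) lands in the high-cardinality group; with the higher one, both text columns are low cardinality. This is the kind of effect the notebook could demonstrate on the real column by varying cardinality_threshold.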
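For the wrap-up quizzes, selecting numerical columns with pandas alone could look like the sketch below. It assumes `select_dtypes(include="number")` is an acceptable selection rule for the quiz; the small table is a hypothetical stand-in for Ames Housing, with illustrative column names and values.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Ames Housing table.
ames = pd.DataFrame(
    {
        "LotArea": [8450, 9600, 11250],
        "YearBuilt": [2003, 1976, 2001],
        "Neighborhood": ["CollgCr", "Veenker", "CollgCr"],
        "SalePrice": [208500, 181500, 223500],
    }
)

# Separate the target from the features.
target = ames["SalePrice"]
data = ames.drop(columns="SalePrice")

# Keep only the numerical feature columns, using plain pandas.
numerical_data = data.select_dtypes(include="number")
print(numerical_data.columns.tolist())  # ['LotArea', 'YearBuilt']
```

If a quiz needs a specific subset rather than every numerical column, an explicit list of column names (`data[["LotArea", "YearBuilt"]]`) is an equally simple alternative.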

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests