Here's a snippet from the Polars groupby benchmarks:
pl.read_csv(src_grp, schema_overrides={"id4":pl.Int32, "id5":pl.Int32, "id6":pl.Int32, "v1":pl.Int32, "v2":pl.Int32
It looks like id4, id5, id6, v1, and v2 are read as Int32 columns.
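The type width matters because an Int32 value takes 4 bytes and an Int64 value takes 8, so the same integer column needs half as much memory when it is read as Int32. Here is a minimal sketch of that arithmetic. It uses Python's standard array module as a stand-in for a columnar value buffer, which is an assumption: Polars and Spark each manage their own memory layout. The row count is illustrative, not the benchmark's.

```python
from array import array

N_ROWS = 1_000_000  # illustrative row count, not taken from the benchmark

# Stand-ins for one integer column stored at two widths:
# 'i' is a C int (4 bytes on mainstream platforms), 'q' is a C long long (8 bytes).
col_int32 = array("i", [0]) * N_ROWS
col_int64 = array("q", [0]) * N_ROWS

def buffer_bytes(col: array) -> int:
    """Size of the raw value buffer only, ignoring Python object overhead."""
    return col.itemsize * len(col)

print(buffer_bytes(col_int32))  # 4000000 on mainstream platforms
print(buffer_bytes(col_int64))  # 8000000 on mainstream platforms
```

Less data to scan per column can make Int32 reads and aggregations cheaper, which is why mixing widths across engines can skew the comparison.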
Other engines, like Spark, just infer the column types:
x = spark.read.csv(src_grp, header=True, inferSchema='true')
For a fairer comparison, I think all the engines should either infer the column data types or specify them explicitly. It's not an apples-to-apples comparison if some engines use Int32 and others use Int64.
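One way to implement the "all engines specify" option is to drive every engine's reader from a single column-type map. The sketch below makes several assumptions. SHARED_SCHEMA and to_spark_ddl are hypothetical names. The map lists only the columns visible in the Polars snippet, although a Spark schema covers the whole file, so a real version would need every column in file order. The code relies on the fact that PySpark's DataFrameReader.csv accepts a DDL-formatted string for its schema argument. In Spark SQL DDL, INT is the 32-bit IntegerType and BIGINT is the 64-bit LongType.

```python
# Hypothetical shared type map: only the columns visible in the Polars snippet.
# A real schema would need an entry for every column, in file order.
SHARED_SCHEMA = {
    "id4": "int32",
    "id5": "int32",
    "id6": "int32",
    "v1": "int32",
    "v2": "int32",
}

# Spark SQL DDL type names: INT is 32-bit IntegerType, BIGINT is 64-bit LongType.
SPARK_DDL_TYPES = {"int32": "INT", "int64": "BIGINT"}

def to_spark_ddl(schema: dict) -> str:
    """Render the shared map as a DDL string usable as spark.read.csv(schema=...)."""
    return ", ".join(f"{col} {SPARK_DDL_TYPES[t]}" for col, t in schema.items())

print(to_spark_ddl(SHARED_SCHEMA))
# id4 INT, id5 INT, id6 INT, v1 INT, v2 INT
```

On the Spark side, the resulting string would replace inference, as in spark.read.csv(src_grp, header=True, schema=to_spark_ddl(SHARED_SCHEMA)). On the Polars side, the same map would be translated to pl.Int32 / pl.Int64 values for schema_overrides. The "infer everywhere" option would instead remove the explicit types from every reader.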