Use the same column data types for all engines in benchmarks

Here's a snippet from the Polars groupby benchmarks:

```python
pl.read_csv(src_grp, schema_overrides={"id4":pl.Int32, "id5":pl.Int32, "id6":pl.Int32, "v1":pl.Int32, "v2":pl.Int32})
```

Looks like `id4`, `id5`, `id6`, `v1` and `v2` are read as `Int32` columns.

Other engines, like Spark, are just inferring the column types:

```python
x = spark.read.csv(src_grp, header=True, inferSchema='true')
```

I think we should either have every engine infer the column data types or have every engine specify them explicitly, so the comparison is fair. It's not an apples-to-apples comparison if some engines are using int32 and others are using int64.
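Here's a rough sketch of what the "specify everywhere" option could look like: one shared spec that each engine's loader derives its types from. The names `SHARED_TYPES`, `ENGINE_TYPES`, `polars_overrides` and `spark_casts` are hypothetical and not part of the benchmark code. The spec only lists the columns from the Polars snippet above, so any other columns would need to be added.

```python
# Hypothetical shared spec: column name -> logical type.
# Only the columns from the Polars snippet are listed here.
SHARED_TYPES = {
    "id4": "int32",
    "id5": "int32",
    "id6": "int32",
    "v1": "int32",
    "v2": "int32",
}

# Logical type -> (Polars dtype attribute name, Spark SQL type name for cast()).
ENGINE_TYPES = {
    "int32": ("Int32", "int"),
    "int64": ("Int64", "bigint"),
    "float64": ("Float64", "double"),
}


def polars_overrides(spec):
    """Map each column to the name of its Polars dtype (resolve with getattr(pl, name))."""
    return {col: ENGINE_TYPES[logical][0] for col, logical in spec.items()}


def spark_casts(spec):
    """Map each column to the Spark SQL type name to pass to Column.cast()."""
    return {col: ENGINE_TYPES[logical][1] for col, logical in spec.items()}


# Assumed usage inside the benchmark scripts (src_grp and spark come from those scripts):
#
#   import polars as pl
#   overrides = {c: getattr(pl, n) for c, n in polars_overrides(SHARED_TYPES).items()}
#   df = pl.read_csv(src_grp, schema_overrides=overrides)
#
#   x = spark.read.csv(src_grp, header=True, inferSchema=True)
#   for col, spark_type in spark_casts(SHARED_TYPES).items():
#       x = x.withColumn(col, x[col].cast(spark_type))
```

Casting after the read keeps the Spark side working with a partial spec. Passing a full schema to `spark.read.csv` would avoid the inference pass, but it requires listing every column in the file.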

Use the same column data types for all engines in benchmarks #101

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Use the same column data types for all engines in benchmarks #101

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions