
Unmanaged memory because of block splitting in pandas #7800

Open
@phofl

Description

Describe the issue:

pandas 2.0 started splitting blocks to improve the performance of setitem when a full column is replaced. The split blocks are views into the original consolidated array, so the replaced column's now-unused data stays alive in memory, which Dask reports as unmanaged memory.
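
For reference, the splitting can be observed with plain pandas. This is an illustrative sketch that pokes at the private block manager (._mgr / .blocks), so treat it as inspection only, not stable API (assumes pandas >= 2.0 with the default copy mode):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((1_000_000, 10)), columns=list("abcdefghij"))
print(len(df._mgr.blocks))  # 1 consolidated float64 block

df["b"] = 1  # replace a full column
print(len(df._mgr.blocks))  # >1: the block was split rather than copied

# Some of the split blocks are views (ndarray.base is set) into the
# original 10-column array, so the whole array stays allocated.
print([blk.values.base is not None for blk in df._mgr.blocks])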

Minimal Complete Verifiable Example:

import dask.array as da
import dask.dataframe as dd

# Create 10 float64 columns of ~400 MB each (50_000_000 rows * 8 bytes)
ddf = dd.from_array(da.random.random((50_000_000, 10)), columns=list("abcdefghij"))

# Replacing a full column makes pandas split the underlying block;
# the split pieces keep the original consolidated array alive.
ddf["b"] = 1
# Uncommenting the rename forces a deep copy and releases that memory:
# ddf = ddf.rename(columns={"a": "x"})
ddf.persist()

cc @crusaderky, we chatted offline about this last week. Is there anything we can do here? Should this be counted as managed memory?
A rename triggers a deep copy before we persist, which brings the unmanaged memory down.
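
One possible mitigation along the same lines as the rename trick (a sketch I haven't validated, not an established fix) would be to deep-copy each partition before persisting:

# Hypothetical workaround: a deep copy per partition drops the views into
# the original consolidated array, so only the live data is retained.
ddf = ddf.map_partitions(lambda df: df.copy(deep=True))
ddf = ddf.persist()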

Anything else we need to know?:

Environment:

  • Dask version: 2023.04
  • pandas version: 2.0
  • Python version: 3.10
  • Operating System: macOS
  • Install method (conda, pip, source): conda
