@nameexhaustion nameexhaustion (Collaborator) commented Jan 22, 2026

Streaming queries that collect to an in-memory DataFrame and consist only of column projections (and pl.len()) will now disable morsel splitting at supported sources.

For example:

  • Scan->InMemorySink: scan_parquet().collect()
  • Fast-count: scan_parquet().select(pl.len()).collect()
  • Simple projections: scan_parquet().select("<column>", "<column>", ...).collect()

Equivalently, when starting from an InMemorySource:

  • InMemorySource->InMemorySink: LazyFrame().collect()
  • Fast-count: LazyFrame().select(pl.len()).collect()
  • Simple projections: LazyFrame().select("<column>", "<column>", ...).collect()

Benchmark

Description: scan_parquet().select(pl.len()).collect()
File: 4B rows x 0 columns

Runtime Before | Runtime After | Speedup
0.0728s | 0.000207s | 351x

Before this PR, the parquet source split its output into morsels of 100k rows, sending about 42,950 morsels for this file. It now sends only a single morsel.
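
The morsel count can be checked with a small stand-alone model. This is only a sketch: `morsel_heights` is a hypothetical helper, not Polars code, and it assumes a source that emits fixed-size morsels of 100k rows with one shorter final morsel. For the benchmark file of 2^32 - 1 rows it yields roughly 43 thousand morsels, in line with the figure above.

```python
MORSEL_ROWS = 100_000  # morsel size quoted in the PR description


def morsel_heights(total_rows, split=True, morsel_rows=MORSEL_ROWS):
    """Yield the row count of each morsel a hypothetical source would emit."""
    if not split:
        # No splitting: everything goes downstream as one morsel.
        yield total_rows
        return
    remaining = total_rows
    while remaining > 0:
        n = min(morsel_rows, remaining)
        yield n
        remaining -= n


total_rows = (1 << 32) - 1  # height of the benchmark file

# Ceiling division: full morsels plus one partial morsel at the end.
n_split = (total_rows + MORSEL_ROWS - 1) // MORSEL_ROWS
n_unsplit = sum(1 for _ in morsel_heights(total_rows, split=False))

print(n_split, n_unsplit)
```

For a fast-count query each morsel carries only a row count, so the per-morsel overhead dominates the runtime, which is why sending a single morsel helps so much in this benchmark.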

Test script
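from time import perf_counter
import polars as pl

# Write a parquet file with 2^32 - 1 rows and no columns.
path = "/Users/nxs/git/polars/.env/_data_out/big.parquet"
pl.LazyFrame(height=(1 << 32) - 1).sink_parquet(path)

# Fast-count query: only the row count is requested.
q = pl.scan_parquet(path).select(pl.len())
print(q.explain(engine="streaming"))

# Time five runs and report the fastest.
# Note: collect() uses the default engine; the streaming engine must be
# selected (e.g. engine="streaming") for this PR's change to apply.
timings = []
for _ in range(5):
    t = perf_counter()
    q.collect()
    timings.append(perf_counter() - t)

print(f"{min(timings) = }")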

@github-actions github-actions bot added performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars labels Jan 22, 2026
@nameexhaustion nameexhaustion force-pushed the nxs/fast-count-no-split-morsels branch from d4c1ea4 to 5e0ec6c Compare January 22, 2026 17:14
@nameexhaustion nameexhaustion changed the title perf: Disable morsel splitting for fast-count perf: Disable morsel splitting for fast-count on streaming Jan 22, 2026
@github-actions github-actions bot added the A-streaming Related to the streaming engine label Jan 22, 2026
codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 99.15254% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.12%. Comparing base (5dd9b23) to head (889ab84).

Files with missing lines | Patch % | Lines
...ates/polars-stream/src/nodes/io_sources/ipc/mod.rs | 91.66% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #26245      +/-   ##
==========================================
+ Coverage   78.35%   81.12%   +2.76%     
==========================================
  Files        1777     1777              
  Lines      241720   241816      +96     
  Branches     3085     3085              
==========================================
+ Hits       189406   196172    +6766     
+ Misses      51517    44848    -6669     
+ Partials      797      796       -1     

@nameexhaustion nameexhaustion force-pushed the nxs/fast-count-no-split-morsels branch 2 times, most recently from 2663041 to df594b2 Compare January 22, 2026 21:40
@nameexhaustion nameexhaustion force-pushed the nxs/fast-count-no-split-morsels branch from 2242cb6 to 889ab84 Compare January 22, 2026 23:23
@pola-rs pola-rs deleted a comment from github-actions bot Jan 22, 2026
@nameexhaustion nameexhaustion changed the title perf: Disable morsel splitting for fast-count on streaming perf: Disable morsel splitting for fast-count on streaming engine Jan 23, 2026
@nameexhaustion nameexhaustion marked this pull request as ready for review January 23, 2026 04:16
Labels

A-streaming Related to the streaming engine performance Performance issues or improvements python Related to Python Polars rust Related to Rust Polars

Development

Successfully merging this pull request may close these issues.

Disable morsel splitting in I/O sources if scans are connected directly to in-mem sink

2 participants