feat(dataobj): Add query stats collection to the dataobj readers #17128
What this PR does / why we need it:
This PR adds support for collecting various statistics when reading data objects. They are the equivalent of the chunk stats we already collect and are merged & contribute to summaries etc. as normal.
I added dataobj specific stats such as pages scanned, pages downloaded and number of batches fetched, as well as various stats we already collect from chunks such as lines processed, post-filter lines etc.
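To make the shape of these stats concrete, here is a minimal sketch; the struct and field names are illustrative, not the actual Loki proto definitions, and `Merge` stands in for however the real stats are folded into query summaries.

```go
package main

import "fmt"

// DataObjStats is a hypothetical per-reader stats holder; field names
// mirror the counters described above but are not the real proto fields.
type DataObjStats struct {
	PagesScanned    int64
	PagesDownloaded int64
	BatchesFetched  int64
	LinesProcessed  int64
	PostFilterLines int64
}

// Merge combines stats from another reader, mirroring how chunk stats
// are merged before contributing to summaries.
func (s *DataObjStats) Merge(o DataObjStats) {
	s.PagesScanned += o.PagesScanned
	s.PagesDownloaded += o.PagesDownloaded
	s.BatchesFetched += o.BatchesFetched
	s.LinesProcessed += o.LinesProcessed
	s.PostFilterLines += o.PostFilterLines
}

func main() {
	a := DataObjStats{PagesScanned: 4, LinesProcessed: 100, PostFilterLines: 10}
	b := DataObjStats{PagesScanned: 2, LinesProcessed: 50, PostFilterLines: 5}
	a.Merge(b)
	fmt.Println(a.PagesScanned, a.LinesProcessed, a.PostFilterLines) // 6 150 15
}
```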
There are some differences in the chunk-like stats. For data objects we read partial data to match rows against predicates, then fill in the rest of the line only for matching rows; for chunks we read everything and then decide whether to keep it. This means fields such as "lines scanned" aren't directly equivalent, and it's unclear whether they should report the pre-predicate or the post-predicate row count from the dataobj readers. I opted for the pre-predicate count because it covers the work we are actually doing. The bytes value should give an indication of whether we are doing too much pre- or post-predicate matching.
While I added protos for structured-metadata stats, they aren't easy to calculate, so I've omitted them for now. The generic reader where the predicate matching happens does not know whether a column is metadata, and since we don't return non-matching rows to the higher-level readers, we can't accurately determine how many columns are structured metadata and how many are log lines, stream IDs, or timestamps. I'm happy to remove this field entirely, but I think we should find a way to implement it.