Implement statistics recomputation with updates #2640

RobinTF · 2026-01-12T18:36:08Z

This is a preparation PR for #2408. It implements a function that recomputes the index statistics.
The function is currently not used to correct outdated statistics (though ideally it should be), but that remains an option for the future.
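To make "recomputing statistics" concrete, here is a minimal sketch, not the PR's implementation. It assumes the current triples (original index plus updates, with duplicates already removed) are available in memory as plain ID triples. The names `Triple`, `StatisticsSketch`, `countDistinctInColumn` and `recomputeStatisticsSketch` are hypothetical. A real implementation would likely scan the index permutations instead of materializing all triples.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: subject, predicate and object IDs.
using Triple = std::array<std::uint64_t, 3>;

// Hypothetical subset of the statistics an index might keep.
struct StatisticsSketch {
  std::size_t numTriples = 0;
  std::size_t numDistinctSubjects = 0;
  std::size_t numDistinctPredicates = 0;
  std::size_t numDistinctObjects = 0;
};

// Counts the distinct values in column `col` by copying, sorting and
// deduplicating that column.
std::size_t countDistinctInColumn(const std::vector<Triple>& triples,
                                  std::size_t col) {
  std::vector<std::uint64_t> values;
  values.reserve(triples.size());
  for (const Triple& t : triples) values.push_back(t[col]);
  std::sort(values.begin(), values.end());
  return static_cast<std::size_t>(
      std::unique(values.begin(), values.end()) - values.begin());
}

// Recomputes the statistics from scratch. Assumes `triples` already has all
// updates applied and contains no duplicate triples.
StatisticsSketch recomputeStatisticsSketch(const std::vector<Triple>& triples) {
  StatisticsSketch stats;
  stats.numTriples = triples.size();
  stats.numDistinctSubjects = countDistinctInColumn(triples, 0);
  stats.numDistinctPredicates = countDistinctInColumn(triples, 1);
  stats.numDistinctObjects = countDistinctInColumn(triples, 2);
  return stats;
}
```

Recomputing from scratch is simple and always correct, but its cost grows with the whole index rather than with the number of updates, which is why it is not run automatically.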

codecov · 2026-01-12T19:33:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.54%. Comparing base (7ec91f8) to head (5384779).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2640      +/-   ##
==========================================
+ Coverage   91.52%   91.54%   +0.02%     
==========================================
  Files         479      480       +1     
  Lines       41177    41259      +82     
  Branches     5474     5483       +9     
==========================================
+ Hits        37688    37772      +84     
+ Misses       1910     1909       -1     
+ Partials     1579     1578       -1


joka921

Mostly small comments, but I have a serious question about blank nodes.

src/util/ParallelExecutor.h

src/index/IndexImpl.cpp

joka921 · 2026-01-13T07:30:42Z

src/index/IndexImpl.cpp

+            if (id.getDatatype() == Datatype::BlankNodeIndex) {
+              nextBlankNode =
+                  std::max(nextBlankNode, id.getBlankNodeIndex().get() + 1);


I think this doesn't work if the located triples contain randomly allocated blank nodes, because the maximum will then be very large. There is a precondition that the blank nodes from the main index are dense. Let's discuss this subtlety.
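The concern can be illustrated with a small sketch. It assumes blank-node indices are plain unsigned integers and uses hypothetical helpers: `nextBlankNodeFromMax` mirrors the `std::max(..., index + 1)` logic from the diff, and `isDense` checks the precondition that the distinct indices are exactly `0, ..., n-1`. A single randomly allocated index from the located triples makes `max + 1` far larger than the number of blank nodes actually in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the diff: the next free blank-node index is
// one past the largest index seen.
std::uint64_t nextBlankNodeFromMax(
    const std::vector<std::uint64_t>& blankNodeIndices) {
  std::uint64_t next = 0;
  for (std::uint64_t idx : blankNodeIndices) {
    next = std::max(next, idx + 1);
  }
  return next;
}

// Hypothetical check of the density precondition: after sorting and removing
// duplicates, the indices must be exactly 0, 1, ..., n-1.
bool isDense(std::vector<std::uint64_t> indices) {
  std::sort(indices.begin(), indices.end());
  indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
  return indices.empty() || indices.back() + 1 == indices.size();
}
```

For a dense set such as `{0, 1, 2, 3}`, `max + 1` equals the number of blank nodes. Adding one randomly allocated index such as `1000000000` breaks density and pushes the result to `1000000001`.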

joka921 · 2026-01-13T07:34:29Z

src/engine/GroupByImpl.cpp

+    // queries the statistics which are never updated. Consider calling
+    // `IndexImpl::recomputeStatistics` and storing the result somewhere in this
+    // case. It also doesn't return the correct result for internal
+    // permutations.


Don't we have functions in `CompressedRelation` that correctly compute distinct counts (currently, I think, only for col1, but maybe we can extend this)? Then we could use this expensive computation only if there are any updates at all. At least extend the comment.

Yes, and these are also used by group by (but obviously they take longer to compute than just reading the stats)
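The trade-off discussed here can be sketched as follows. It assumes, hypothetically, that the relevant column is available as a sorted vector of IDs (within one relation of a permutation, the second column is sorted). `countDistinctSorted` counts distinct values in one linear pass, and `distinctCount` falls back to it only when there are updates; both names are illustrative and not QLever's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts distinct values in an already sorted column in a single pass:
// a new value starts wherever it differs from its predecessor.
std::size_t countDistinctSorted(const std::vector<std::uint64_t>& sortedColumn) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < sortedColumn.size(); ++i) {
    if (i == 0 || sortedColumn[i] != sortedColumn[i - 1]) ++count;
  }
  return count;
}

// Hypothetical dispatcher: without updates the precomputed statistic is
// still valid, so return it; otherwise pay for the linear scan.
std::size_t distinctCount(std::size_t cachedStatistic, bool hasUpdates,
                          const std::vector<std::uint64_t>& sortedColumn) {
  if (!hasUpdates) return cachedStatistic;
  return countDistinctSorted(sortedColumn);
}
```

Reading the cached value is constant time, while the scan is linear in the column size, which matches the observation that these functions take longer than just reading the stats.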

sparql-conformance · 2026-01-14T13:53:43Z

Overview

Number of Tests	Passed ✅	Intended ✅	Failed ❌	Not tested
547	450	73	24	0

Conformance check passed ✅

No test result changes.

Details: https://qlever.dev/sparql-conformance-ui?cur=5384779322b617eefd5d2c188b9251b7ff7a97ec&prev=7ec91f836cb8c98b4e07c906574cb4bf7f6f4e02

sonarqubecloud · 2026-01-14T16:33:26Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


joka921

Only a very small additional comment :)

src/index/IndexImpl.cpp

joka921

Thank you very much.

RobinTF added 4 commits January 12, 2026 19:33

Implement basic stat recomputation

1bab488

Extract common code

ce13f16

Add unit test for helper function

41ec20c

Add unit test for statistics recomputation

102360a

RobinTF requested a review from joka921 January 12, 2026 18:36

RobinTF added 2 commits January 12, 2026 22:55

Fix test failure

fc31712

Improve coverage

492ae1f

joka921 requested changes Jan 13, 2026


ad-freiburg deleted a comment from joka921 Jan 14, 2026

RobinTF added 2 commits January 14, 2026 11:06

Adjust signature and variable names

6b2dacf

Fix bug in count distinct

5384779

joka921 reviewed Jan 14, 2026


src/index/IndexImpl.cpp

joka921 approved these changes Jan 14, 2026
