Fix caching in ShapeBuilder. #1192

riga · 2025-12-05T10:34:59Z

This PR fixes a dangerous anti-pattern in the ShapeBuilder.

getShape, getPdf, shape2Data and shape2Pdf use cache dictionaries that are initialized as keyword argument defaults. Python evaluates a default value only once, when the function is defined, so a mutable default is effectively a global singleton. As a result, the caches are unintentionally shared between all instances, even ones that never exist at the same time.
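The sharing described above can be reproduced in a few lines. This is a minimal sketch with a hypothetical `Builder` class and `get_shape` method standing in for ShapeBuilder; it is not the code from `python/ShapeTools.py`, only an illustration of the language behaviour.

```python
class Builder:
    """Hypothetical stand-in for ShapeBuilder; not the real class."""

    def __init__(self, name):
        self.name = name

    def get_shape(self, key, _cache={}):  # anti-pattern: one dict for every call
        if key not in _cache:
            _cache[key] = f"{self.name}:{key}"
        return _cache[key]


first = Builder("first")
first_result = first.get_shape("ch1/sig")
del first  # the instance is gone, but the default dict lives on the function

second = Builder("second")
second_result = second.get_shape("ch1/sig")  # served from the stale cache

print(first_result)   # first:ch1/sig
print(second_result)  # first:ch1/sig
```

The second instance gets the entry written by the first one, because the dictionary belongs to the function object (`Builder.get_shape.__defaults__`) rather than to either instance.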

This might have already caused issues in the HH tools where ShapeBuilder instances were used for datacard parsing.

Summary by CodeRabbit

Refactor
- Shape operation caches are now managed per instance instead of through shared mutable default arguments, so cached results no longer leak between ShapeBuilder instances.

coderabbitai · 2025-12-05T10:35:19Z

Walkthrough

Refactored the ShapeBuilder caching mechanism to use per-instance caches instead of mutable default parameters. Changed five methods to accept _cache=None and initialize corresponding per-instance cache dictionaries when needed.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Per-instance cache initialization `python/ShapeTools.py` | Refactored caching mechanism: added `_get_shape_cache`, `_get_pdf_cache`, `_shape2data_cache`, `_shape2pdf_cache` in `ShapeBuilder.__init__`. Changed default `_cache={}` to `_cache=None` in `getShape`, `getData`, `getPdf`, `shape2Data`, and `shape2Pdf` methods. Methods now initialize and use per-instance caches when `_cache` is None. |
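The change summarized above follows the usual `None`-sentinel pattern. The sketch below is a simplified model, not the actual diff: the class `ShapeBuilderSketch`, the `_load` helper and the cache key layout are assumptions made for illustration, while the attribute name `_get_shape_cache` and the `getShape` signature are taken from the summary.

```python
class ShapeBuilderSketch:
    """Simplified, hypothetical model of the pattern; not the real ShapeBuilder."""

    def __init__(self):
        # One dictionary per instance (attribute name taken from the summary).
        self._get_shape_cache = {}

    def getShape(self, channel, process, syst="", _cache=None):
        # Fall back to the per-instance cache unless the caller supplies one.
        if _cache is None:
            _cache = self._get_shape_cache
        key = (channel, process, syst)  # assumed key layout
        if key not in _cache:
            _cache[key] = self._load(channel, process, syst)
        return _cache[key]

    def _load(self, channel, process, syst):
        # Placeholder for the expensive lookup; returns a fresh object each time.
        return object()


a, b = ShapeBuilderSketch(), ShapeBuilderSketch()
assert a.getShape("ch1", "sig") is a.getShape("ch1", "sig")      # cached per instance
assert a.getShape("ch1", "sig") is not b.getShape("ch1", "sig")  # not shared
```

Callers that relied on the implicit default keep working unchanged, since omitting `_cache` still yields a cache, just one scoped to the instance.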

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

- Cache initialization logic: Verify that per-instance caches are properly initialized in __init__ and that the fallback to per-instance caches when _cache=None is correctly implemented across all five methods.
- Backward compatibility: Confirm that existing callers passing explicit _cache dictionaries continue to work without breaking changes.
- Thread safety: Consider whether per-instance caches introduce any thread-safety concerns if the ShapeBuilder instance is shared across threads.
- Memory implications: Review whether storing caches as instance variables could have memory implications for long-lived instances.
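The first two review points can be checked with a small script along these lines. It is a sketch under stated assumptions: `CacheCheckSketch` is a hypothetical stand-in that mimics only the `_cache=None` fallback, and the `(channel, process)` key is invented for the example.

```python
class CacheCheckSketch:
    """Hypothetical stand-in that mimics only the _cache=None fallback."""

    def __init__(self):
        self._shape2data_cache = {}

    def shape2Data(self, shape, channel, process, _cache=None):
        # `is None`, not `not _cache`: an empty dict passed in must be honoured.
        if _cache is None:
            _cache = self._shape2data_cache
        key = (channel, process)  # assumed key layout
        if key not in _cache:
            _cache[key] = ("data", shape)
        return _cache[key]


builder, other = CacheCheckSketch(), CacheCheckSketch()

# Backward compatibility: an explicit dictionary is still used and filled.
external = {}
builder.shape2Data("h1", "ch1", "sig", _cache=external)
assert ("ch1", "sig") in external
assert builder._shape2data_cache == {}

# Isolation: filling one instance's cache leaves the other untouched.
builder.shape2Data("h1", "ch1", "sig")
assert other._shape2data_cache == {}
```

One detail worth checking in review is that the fallback tests `_cache is None` rather than truthiness, since a caller may legitimately pass an empty dictionary.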

Suggested reviewers

ajgilbert

Poem

🐰 Caches tucked in instance homes,
No more mutable defaults roam,
Per-instance storage, clean and bright,
Shape and PDF cached just right!
A rabbit's optimization delight! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Fix caching in ShapeBuilder' directly addresses the main change: correcting a caching anti-pattern in ShapeBuilder methods to use per-instance caches instead of shared mutable defaults. |

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18685de and e2fc2fc.

📒 Files selected for processing (1)

python/ShapeTools.py (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

python/ShapeTools.py (1)

python/Datacard.py (1)

shape (242-250)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)

GitHub Check: LCG_108 - ROOT 6.36.02
GitHub Check: CMSSW_14_1_0_pre4 - ROOT 6.30.07
GitHub Check: CMSSW_14_0_0_pre1 - ROOT 6.26.11
GitHub Check: dev3/latest - ROOT LCG master
GitHub Check: LCG_102 - ROOT 6.26.04
GitHub Check: LCG_106 - ROOT 6.32.02
GitHub Check: CMSSW_11_3_4 - ROOT 6.22.09
GitHub Check: Compile (py3.12, root6.34.4)
GitHub Check: Compile (py3.10, root6.32.2)
GitHub Check: Compile (py3.10, root6.26.4)

🔇 Additional comments (5)

python/ShapeTools.py (5)

74-77: LGTM! Per-instance caches correctly fix the mutable default argument anti-pattern.

Initializing caches as instance attributes ensures proper isolation between ShapeBuilder instances, preventing the cross-contamination issue described in the PR.

662-664: LGTM! Correct cache initialization pattern.

Using None as the default and initializing to the instance cache avoids the mutable default argument issue while maintaining API flexibility.

855-857: LGTM! Consistent cache initialization pattern.

1216-1218: LGTM! Consistent cache initialization pattern.

1255-1257: LGTM! Consistent cache initialization pattern.

coderabbitai · 2025-12-05T10:37:59Z

python/ShapeTools.py

+    def getData(self, channel, process, syst="", _cache=None):
        return self.shape2Data(self.getShape(channel, process, syst), channel, process)


⚠️ Potential issue | 🟡 Minor

Unused _cache parameter in getData.

The _cache parameter is declared but never used in the method body. It's not passed to either getShape or shape2Data. This appears to be dead code.

Either remove the unused parameter or pass it to the delegated methods:

Option 1 - Remove unused parameter:

-    def getData(self, channel, process, syst="", _cache=None):
+    def getData(self, channel, process, syst=""):
         return self.shape2Data(self.getShape(channel, process, syst), channel, process)

Option 2 - Pass cache to shape2Data (if caching control is intended):

     def getData(self, channel, process, syst="", _cache=None):
-        return self.shape2Data(self.getShape(channel, process, syst), channel, process)
+        return self.shape2Data(self.getShape(channel, process, syst), channel, process, _cache=_cache)

📝 Committable suggestion


Suggested change

-    def getData(self, channel, process, syst="", _cache=None):
-        return self.shape2Data(self.getShape(channel, process, syst), channel, process)
+    def getData(self, channel, process, syst=""):
+        return self.shape2Data(self.getShape(channel, process, syst), channel, process)

🤖 Prompt for AI Agents

In python/ShapeTools.py around lines 852-853, the method getData declares an unused _cache parameter; either remove _cache from the signature and update any callers accordingly, or forward it to the delegated calls (pass _cache into getShape and/or shape2Data as required by their signatures) so the cache is actually used; pick the option consistent with existing caching design and update call sites and docstring/comments to match.
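To make the forwarding option concrete, here is a hedged sketch of what passing `_cache` through to `shape2Data` would look like. `GetDataSketch`, the string placeholders and the `(channel, process)` key are hypothetical; only the method names and signatures come from the snippet under review.

```python
class GetDataSketch:
    """Hypothetical, heavily simplified stand-in for ShapeBuilder."""

    def __init__(self):
        self._shape2data_cache = {}

    def getShape(self, channel, process, syst=""):
        # Placeholder for the real shape lookup.
        return f"shape({channel},{process},{syst})"

    def shape2Data(self, shape, channel, process, _cache=None):
        if _cache is None:
            _cache = self._shape2data_cache
        key = (channel, process)  # assumed key layout, for illustration only
        if key not in _cache:
            _cache[key] = f"data[{shape}]"
        return _cache[key]

    def getData(self, channel, process, syst="", _cache=None):
        # Forwarding variant: hand the caller's cache on instead of dropping it.
        shape = self.getShape(channel, process, syst)
        return self.shape2Data(shape, channel, process, _cache=_cache)


builder = GetDataSketch()
mine = {}
result = builder.getData("ch1", "sig", _cache=mine)

print(result)                     # data[shape(ch1,sig,)]
print(list(mine))                 # [('ch1', 'sig')]
print(builder._shape2data_cache)  # {}
```

Without the forwarding, `mine` would stay empty and the instance cache would be filled instead, which is the silent mismatch the comment points out. Removing the parameter avoids that mismatch just as well if no caller needs cache control.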

codecov · 2025-12-05T10:48:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 20.67%. Comparing base (18685de) to head (e2fc2fc).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1192   +/-   ##
=======================================
  Coverage   20.67%   20.67%           
=======================================
  Files         195      195           
  Lines       26173    26173           
  Branches     3927     3927           
=======================================
  Hits         5412     5412           
  Misses      20761    20761

Fix caching in ShapeBuilder.

e2fc2fc

coderabbitai bot reviewed Dec 5, 2025

anigamova merged commit 3ea584a into cms-analysis:main Dec 5, 2025
16 checks passed

riga deleted the fix/shape_tools_caching branch December 5, 2025 14:45
