chore: POC for backward compatible checks of models #2273
williambdean wants to merge 10 commits into main
Conversation
The mamba-org/setup-micromamba@v1 action requires either 'latest' or a version with build number (e.g., '2.3.3-0'). Changed to 'latest' for simplicity and automatic updates.
- Add scripts/backcompat/__init__.py for proper module structure
- Enable cache-environment and cache-downloads in setup-micromamba
- Remove manual environment creation step (handled by action)

This will significantly speed up CI by caching the conda environment across all 7 matrix jobs. The environment is only rebuilt when environment.yml changes.
Add concurrency control to automatically cancel outdated workflow runs when a new commit is pushed to the same PR. This prevents wasting CI resources on superseded commits. The concurrency group uses:
- workflow name (Backcompat Check)
- PR number (for pull requests)
- ref (for direct pushes to branches)
The scripts/__init__.py file only exists on the backwards-compat branch, not on origin/main yet. When CI checks out main to /tmp/repo-main to capture baselines, the import fails. Solution: Copy the __init__.py files from the current branch into the main worktree before running capture. These are just package markers with no logic, so this is safe. Once this PR merges to main, future PRs won't need this workaround.
The entire scripts/backcompat/ directory doesn't exist on origin/main. We need to:
1. Create the directory structure (scripts/backcompat/models/)
2. Copy ALL backcompat scripts (not just __init__.py files)

This ensures the backcompat code from the current branch can run against the model definitions from main.
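The copy step described above can be sketched in Python. This is a minimal illustration using a hypothetical `sync_backcompat_scripts` helper and generic paths; the actual CI workflow uses `cp -r` with the real repository layout:

```python
import shutil
from pathlib import Path


def sync_backcompat_scripts(branch_root: Path, worktree_root: Path) -> list[str]:
    """Copy the backcompat harness from the current branch into the
    main worktree, so `python -m scripts.backcompat.capture` can run
    against main's model definitions even though the scripts don't
    exist on origin/main yet."""
    src = branch_root / "scripts" / "backcompat"
    dst = worktree_root / "scripts" / "backcompat"
    copied = []
    for path in src.rglob("*.py"):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(str(path.relative_to(src)))
    return sorted(copied)
```

Because the copied files are just the harness plus package markers, overwriting them in the worktree does not affect the model definitions being compared.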
PyTensor 2.31+ moved the `ancestors` function from pytensor.graph.traversal to pytensor.graph.basic. This change:
1. Updates pymc_marketing/pytensor_utils.py with a try/except fallback for compatibility with both old and new PyTensor versions
2. Patches the main branch's pytensor_utils.py in the CI workflow with the same compatibility fix before running backcompat tests

This ensures backcompat tests work regardless of PyTensor version.
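The compatibility fallback follows the standard try/except import pattern. A sketch, using the module paths named above (`resolve_ancestors` is an illustrative wrapper, not the actual function in pytensor_utils.py):

```python
def resolve_ancestors():
    """Return PyTensor's `ancestors` function across versions.

    Tries the new location first (PyTensor >= 2.31), then falls back
    to the old one, so the same code runs on both sides of the move.
    """
    try:
        # PyTensor >= 2.31: `ancestors` lives here
        from pytensor.graph.basic import ancestors
    except ImportError:
        # Older PyTensor versions
        from pytensor.graph.traversal import ancestors
    return ancestors
```

Deferring the import into a function (rather than at module top level) also keeps the shim testable without PyTensor installed.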
Codecov Report: ❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #2273      +/-  ##
=========================================
- Coverage   93.05%   93.04%   -0.02%
=========================================
  Files          78       78
  Lines       12228    12231       +3
=========================================
+ Hits        11379    11380       +1
- Misses        849      851       +2
```

☔ View full report in Codecov by Sentry.
```python
build_kwargs = model_def.build_args_fn()
fit_kwargs = model_def.fit_args_fn() if model_def.fit_args_fn else {}
fit_data_kwargs = (
    model_def.fit_data_fn(build_kwargs) if model_def.fit_data_fn else build_kwargs
)
```
Inconsistent fit_data_fn fallback between capture and compare
Medium Severity
The fit_data_fn fallback differs between capture.py and compare.py. When fit_data_fn is None, capture.py falls back to build_kwargs (e.g., {"data": df}) while compare.py falls back to {}. This affects 6 of 7 active models (all CLV models). The capture step passes unexpected build_kwargs like data=df through to pm.sample(), while the compare step passes nothing — meaning baseline and candidate models are fitted under different conditions.
@copilot Please add tqdm to the conda environment so that it will be installed for these backward-compatibility checks in CI/CD.
@williambdean I've opened a new pull request, #2295, to work on those changes. Once the pull request is ready, I'll request review from you.
* Initial plan
* chore: add tqdm to conda environment for backcompat checks

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: williambdean <57733339+williambdean@users.noreply.github.com>
```python
fit_kwargs = model_def.fit_args_fn() if model_def.fit_args_fn else {}
fit_data_kwargs = (
    model_def.fit_data_fn(build_kwargs) if model_def.fit_data_fn else {}
)
```
Inconsistent fit_data_kwargs fallback between capture and compare
Medium Severity
When fit_data_fn is None, capture.py falls back to build_kwargs (e.g., {"data": df}) while compare.py falls back to {}. This affects 6 of 8 models (all CLV models without an explicit fit_data_fn). The capture step passes extra kwargs like data=df through to fit() and ultimately to pm.sample(), but the compare step does not. The two steps are fitting models with different arguments, undermining the validity of the backward-compatibility comparison. It only works silently because mock_sample accepts and ignores unknown **kwargs.
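One way to remove the divergence is to resolve the fallback in a single helper that both capture.py and compare.py import, so the two steps cannot drift apart. A sketch, with a stand-in `ModelDef` whose field names follow the quoted snippets (the real model registry may differ):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModelDef:
    """Stand-in for the backcompat model registry entry."""
    build_args_fn: Callable[[], dict]
    fit_args_fn: Optional[Callable[[], dict]] = None
    fit_data_fn: Optional[Callable[[dict], dict]] = None


def resolve_fit_inputs(model_def: ModelDef) -> tuple[dict, dict, dict]:
    """Single source of truth for capture.py and compare.py.

    When fit_data_fn is None, fall back to {} in BOTH steps, so the
    baseline and candidate models are fitted under identical arguments.
    """
    build_kwargs = model_def.build_args_fn()
    fit_kwargs = model_def.fit_args_fn() if model_def.fit_args_fn else {}
    fit_data_kwargs = (
        model_def.fit_data_fn(build_kwargs) if model_def.fit_data_fn else {}
    )
    return build_kwargs, fit_kwargs, fit_data_kwargs
```

With this shape, a unit test can assert that capture and compare call the same resolver, instead of relying on `mock_sample` silently swallowing the extra kwargs.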
```shell
git fetch origin main --depth=1
git worktree remove -f "${WORKTREE_DIR}" >/dev/null 2>&1 || true
rm -rf "${WORKTREE_DIR}"
git worktree add "${WORKTREE_DIR}" origin/main
```
Local script missing worktree script copy unlike CI
Medium Severity
The local script creates the origin/main worktree and immediately runs python -m scripts.backcompat.capture from it, but unlike the CI workflow, it never copies the current branch's backcompat scripts into the worktree. The CI workflow explicitly does cp -r scripts/backcompat/*.py into the main worktree because these scripts don't exist on main yet. The local script will fail with a ModuleNotFoundError before this PR merges, and after merging, it will use potentially stale scripts from main whenever the backcompat harness is modified on a branch — diverging from CI behavior.
What about building a hash of the generated model (more precisely, of fgraph_from_model), and having that hard-coded in CI? Then there is no need to try to run under older versions of the library. Want something dumb?
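The hash idea could look something like the following. `graph_fingerprint` is a hypothetical helper; the hard part this sketch glosses over is producing a truly deterministic text dump of the fgraph_from_model output (stable variable naming and node ordering), which is a prerequisite for the digest being reproducible across runs:

```python
import hashlib


def graph_fingerprint(graph_dump: str) -> str:
    """Hash a deterministic text dump of the model graph.

    `graph_dump` is assumed to be a canonical string rendering of
    fgraph_from_model(model). The resulting digest could be
    hard-coded in CI and compared on every PR, instead of
    re-fitting models under older versions of the library.
    """
    return hashlib.sha256(graph_dump.encode("utf-8")).hexdigest()[:16]
```

Any change to the graph structure would then surface as a fingerprint mismatch in CI, at the cost of false positives when the textual rendering changes without the model actually changing.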
Would you recommend changing this: pymc-marketing/pymc_marketing/model_builder.py, lines 175 to 223 at beb6224? Currently, it doesn't need data or build for an id.
I don't have a good picture tbh. Caching is not a trivial problem and MMM models have so many moving parts that my confidence in it is pretty low. |


Description
The model_version in the ModelBuilder is a bit brittle.
This adds checks that models saved before a PR's changes can still be loaded afterwards.
The current checks may be brittle as well. Open to suggestions.
Related Issue
Checklist
Comment `pre-commit.ci autofix` to auto-fix.

📚 Documentation preview 📚: https://pymc-marketing--2273.org.readthedocs.build/en/2273/