Implement configurable options for comps algorithm methodology #449
Conversation
| message("First 5 weights:") | ||
| print(head(tree_weights, 5)) | ||
| if (is.matrix(tree_weights)) { | ||
| if (!all(rowSums(tree_weights) %in% c(0, 1))) { |
Added the negation here because, unless I'm reading this incorrectly, I think we had this backwards?
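For reference, my reading of the intent of that check: raise when any row of a 2-D tree-weight matrix sums to something other than 0 or 1. A rough Python sketch of the same validation (the function and variable names here are illustrative, not the actual helper):

```python
import numpy as np

def validate_tree_weights(tree_weights):
    """Raise if any row of a 2-D weight matrix sums to neither 0 nor 1."""
    tree_weights = np.asarray(tree_weights)
    if tree_weights.ndim == 2:
        row_sums = tree_weights.sum(axis=1)
        # Allow for floating-point error when comparing to 0 and 1
        ok = np.isclose(row_sums, 0) | np.isclose(row_sums, 1)
        if not ok.all():
            raise ValueError("Each row of tree_weights must sum to 0 or 1")

# Valid: each row sums to 1 or 0, so no error is raised
validate_tree_weights(np.array([[0.5, 0.5], [0.0, 0.0]]))
```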
R/helpers.R (Outdated)

```r
)

# ---------------------------------------------------------
# unweighted (vector with 1/n_trees for each tree)
```
I'm going to standardize the comments to the `Name: Description` format once review is done.

And this comment ahead of the R function is outdated.
jeancochrane
left a comment
This is great, thanks, you two! Some small comments below, but nothing I'm super concerned about overall. Once the final code changes are in, I'm going to kick off a few test runs to make sure each algorithm works.
@wagnerlmichael are you down to take a stab at extending the Python tests to confirm these new changes work? A few cases I think we should test:

- Add additional parameterized test cases to `test_get_comps` to make sure that the weights indexing works as expected when the weights are a vector instead of a matrix
- Add additional parameterized test cases to `test_get_comps_raises_on_invalid_inputs` to make sure we properly raise errors in the 1-D case
Pytest parameterized tests can be a bit tricky if you're not familiar with them, so let me know if you want any help figuring out how to do this.
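In case it helps, a parameterized test for the weight-shape cases might look roughly like this. The `get_comps` stub below is a hypothetical stand-in (so the example runs in isolation), and the test names and signatures are assumptions, not the real test module:

```python
import numpy as np
import pytest

# Hypothetical stand-in for the real get_comps, included only so the
# parameterization pattern below is self-contained and runnable
def get_comps(leaf_nodes, training_leaf_nodes, weights):
    weights = np.asarray(weights)
    if weights.ndim not in (1, 2):
        raise ValueError(f"weights must be 1-D or 2-D, got {weights.ndim}-D")
    return weights.ndim

@pytest.mark.parametrize(
    "weights,expected_ndim",
    [
        # Vector weights: unweighted / prediction_variance style
        (np.full(3, 1.0 / 3), 1),
        # Matrix weights: error_reduction style
        (np.full((2, 3), 1.0 / 3), 2),
    ],
)
def test_get_comps_weight_shapes(weights, expected_ndim):
    assert get_comps(None, None, weights) == expected_ndim
```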
Also, I noticed that the GitHub workflow that runs the Python tests was disabled, so we're not running these tests as part of our CI checks. I just enabled that workflow again, so hopefully the next time you push a commit, GitHub will run your tests; if that doesn't happen, let me know and we can continue troubleshooting.
Yes, sounds good! I'll add some tests
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
@wagnerlmichael Tiny little nit I just noticed while testing -- since we're adding a new parameter to params.yaml, we should make sure we save it to the model.metadata output table in the finalize stage:
model-res-avm/pipeline/05-finalize.R
Lines 94 to 95 in 64a3b60
I added it, do we need documentation for it anywhere?
Great question -- we probably should document the fields in
| f"(n_comparisons, n_trees), got {weights.ndim}-D" | ||
| ) | ||
|
|
||
| # Avoid editing the df in-place |
Would you add more extensive documentation about the reason for adding this here? @jeancochrane
Yeah, I think so. I actually don't even understand why we need this. Do we mutate the observation dataframe later on?
In the following chunk, when `get_comps` runs:

```python
# Test with matrix weights (error_reduction style)
tree_weights_matrix = np.asarray(
    [np.random.dirichlet(np.ones(num_trees)) for _ in range(num_comparisons)]
)
start = time.time()
get_comps(leaf_nodes, training_leaf_nodes, tree_weights_matrix)
end = time.time()
print(f"get_comps (matrix weights) runtime: {end - start}s")
```

this code, which creates (in-place) a new column in the `observation_df` within the function, also edits the `leaf_nodes` data frame outside the scope of the function:
```python
# Chunk the observations so that the script can periodically report progress
observation_df["chunk"] = pd.cut(
    observation_df.index, bins=num_chunks, labels=False
)
```

such that when we finish the matrix test and move on to the vector test, the `leaf_nodes` object's column count has increased from 500 to 501, which causes our value error tests to catch a dimension mismatch:
```python
# Test with vector weights (unweighted / prediction_variance style)
tree_weights_vector = np.random.dirichlet(np.ones(num_trees))
start = time.time()
get_comps(leaf_nodes, training_leaf_nodes, tree_weights_vector)
end = time.time()
print(f"get_comps (vector weights) runtime: {end - start}s")
```

Here is a reproducible, isolated example that I think replicates the behaviour:
```python
import pandas as pd
import numpy as np

# Create toy dataframes
leaf_nodes = pd.DataFrame(np.random.randint(0, 10, size=[5, 3]))
print("Before:", leaf_nodes.shape)  # (5, 3)

def add_chunk_column(observation_df):
    observation_df["chunk"] = [0, 0, 1, 1, 1]

add_chunk_column(leaf_nodes)
print("After:", leaf_nodes.shape)  # (5, 4)
print(leaf_nodes)
```

Does this make sense? I feel like pandas in-place trickiness always gets me
Ahh right! Thanks for the clear explanation. I see now that the mutation happens literally on the next line lol, my bad for missing it 🤦🏻♀️ Since we perform the mutation immediately after this copy operation, I don't actually think we need to document the decision any more thoroughly than this.
No prob! Got it, sounds good
jeancochrane
left a comment
This is good to go. Nice work you two!
Over the last year, we tested a number of algorithm variations for comps. This PR centralizes them all and allows us to choose which methodology we run through the `params.yaml` configuration file.

I've done moderate testing locally for all 4 methods and `extract_tree_weights` runs, producing comps that pass the eyeball test.

The four options:

- `unweighted`
- `unweighted_with_error_reduction`
- `error_reduction`
- `prediction_variance`

Closes #405
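For readers skimming later: the methodology is selected via `params.yaml`. The excerpt below sketches the shape of the config, though the key name and nesting here are my guess, not copied from the repo:

```yaml
# Hypothetical excerpt -- key names are illustrative
comp:
  # One of: unweighted, unweighted_with_error_reduction,
  # error_reduction, prediction_variance
  weight_type: error_reduction
```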