Consolidate significant sales qc analysis doc by Damonamajor · Pull Request #151 · ccao-data/homeval

Damonamajor · 2025-11-26T17:16:24Z

This modifies the existing sales_qc doc to compare multiple runs. It is currently set to use 2025 run IDs, but it can be switched to the 2024 runs by uncommenting the corresponding params.

It groups the output table by run_id and creates tabsets for the charts of the characteristics. These are modified to use the same sample of 2000 pins for each run_id.

Damonamajor · 2025-11-26T17:18:14Z

reports/algorithm-comparison/algorithm-comparison.qmd

+  description:
+    - "Unweighted"
+    - "Unweighted (error-reducing trees only)"
+    - "Error reduction"


"Error reduction" and "Error reduction (semi-random)" appear to be identical.

Actually, it appears that there are differences for multi-card properties only, and they appear to be inverted.

billy <- dbGetQuery(
  conn,
  glue("
    select *
    from model.comp
    where run_id = '2025-04-25-fancy-free-billy'
  ")
)

bowen <- dbGetQuery(
  conn,
  glue("
    select *
    from model.comp
    where run_id = '2025-03-26-vibrant-bowen'
  ")
)

data <- rbind(billy, bowen)

comp_cols <- paste0("comp_pin_", 1:5)

diffs <- data %>%
  group_by(pin, card) %>%
  mutate(
    across(
      all_of(comp_cols),
      ~ n_distinct(.x, na.rm = TRUE) > 1,
      .names = "diff_{col}"
    )
  ) %>%
  filter(if_any(starts_with("diff_"), ~.x)) %>%
  ungroup()

jeancochrane · 2025-12-01T22:58:26Z

[Suggestion, required] Oops, you're right! I think I just misread the run descriptions for two of these runs. Instead of 2025-03-26-vibrant-bowen, we want to use 2025-02-11-charming-eric for the "Error reduction (semi-random)" run.
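For anyone who wants to try the run-to-run comparison from the query above without a database connection, here is a minimal base-R sketch on toy data. The PINs, cards, and comp values are invented, and only a single comp column is compared; it joins two runs on pin and card and keeps the rows where the comp differs. It assumes there are no NA comps, since `!=` would return NA for those rows.

```r
# Toy stand-ins for two runs of model.comp; all values are made up
run_a <- data.frame(
  pin = c("1", "1", "2"),
  card = c(1, 2, 1),
  comp_pin_1 = c("x", "y", "z"),
  stringsAsFactors = FALSE
)
run_b <- data.frame(
  pin = c("1", "1", "2"),
  card = c(1, 2, 1),
  comp_pin_1 = c("x", "q", "z"),
  stringsAsFactors = FALSE
)

# Join the two runs on the (pin, card) key so each row holds both comps
both <- merge(run_a, run_b, by = c("pin", "card"), suffixes = c("_a", "_b"))

# Keep only the (pin, card) pairs whose first comp differs between runs
diffs <- both[both$comp_pin_1_a != both$comp_pin_1_b, ]
print(diffs)
```

In this toy data only the second card of PIN "1" differs, which mirrors the multi-card pattern described in the thread.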

Damonamajor · 2025-11-26T17:18:38Z

reports/algorithm-comparison/algorithm-comparison.qmd

+    - "Error reduction"
+    - "Error reduction (semi-random)"
+    - "Prediction variance"
+  keep_top_n_comps: 5


Moved these to params even though it wasn't specified since it seems better than hardcoding the first line.

Damonamajor · 2025-11-26T17:19:30Z

reports/algorithm-comparison/algorithm-comparison.qmd

+  }
+)
+
+comps <- map(data, "comps") %>%


I tried filtering the outputs by triad, but that resulted in NA values, presumably from preds_pin, preds_card, etc. The filtering around line 387 is kept in.

jeancochrane · 2025-12-01T22:47:02Z

[Thought, non-blocking] I think it's expected that we would see null values if we tried to filter the comps dataframe by triad, because it's possible for comps to be in different triads than their target properties. However, all of our targets should be in the reassessment triad, so I would expect that filtering by triad should work for preds_pin and preds_card. That being said, we filter targets by triad on line 385 below, so there's no particular reason to do it here unless we're facing memory problems and want to be even more strict about our memory use.
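To illustrate the point about comps crossing triad boundaries, here is a minimal base-R sketch on made-up data. The column names (target_triad, comp_triad) and all values are hypothetical and are not taken from the report. It only shows that filtering on the comp's triad can drop some of a target's comps, which might then surface as missing values downstream, while filtering on the target's triad keeps each target's comps intact.

```r
# Toy data: each row is one comp for a target property.
# PINs, column names, and triads are invented for illustration.
comps <- data.frame(
  target_pin = c("A", "A", "B", "B"),
  comp_pin = c("X", "Y", "Z", "W"),
  target_triad = c("City", "City", "City", "City"),
  comp_triad = c("City", "North", "City", "South"),
  stringsAsFactors = FALSE
)
triad <- "City"

# Filtering on the comp's triad drops comps located in another triad,
# so each target is left with fewer comps than it started with
by_comp <- comps[comps$comp_triad == triad, ]

# Filtering on the target's triad keeps every comp for in-triad targets
by_target <- comps[comps$target_triad == triad, ]

print(table(by_comp$target_pin))
print(table(by_target$target_pin))
```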

Damonamajor · 2025-11-26T17:20:00Z

reports/algorithm-comparison/algorithm-comparison.qmd

+    {agg_log_char_sql('avg', 'char_land_sf', floor = 1)},
+    stddev(cast(char_yrblt as double)) as agg_stddev_yrblt,
+    {agg_log_char_sql('stddev', 'char_bldg_sf', floor = 1)},
+    {agg_log_char_sql('stddev', 'char_land_sf', floor = 1)}


Removes all beds and baths from the entirety of the report.

Damonamajor · 2025-11-26T17:20:56Z

reports/algorithm-comparison/algorithm-comparison.qmd

+    )
+  ) %>%
+  # Filter for only cards in the selected tri
+  filter(town_get_triad(target_township_code) == triad)


For some reason, prefixing this with ccao:: doesn't work. I don't know if that has to do with Positron, but I'm just flagging it.

jeancochrane · 2025-12-01T22:44:04Z

[Question, non-blocking] Hmm, we shouldn't need the ccao:: prefix because we explicitly load the package via a library() call on line 39, but it should still work with the prefix. It seems to work fine when I mess around with it locally. If you're interested in debugging this, can you share the error message you're seeing? I don't think it's a huge deal either way though, because leaving out the prefix here is perfectly acceptable (and is in fact the idiomatic thing to do given that we're explicitly loading the package).

Damonamajor

I ran it again and it works for me too. The only explanation I can think of is that renv::snapshot may have updated the package and I was caught between two versions, since I believe it was modified recently.

Damonamajor · 2025-11-26T17:21:37Z

reports/algorithm-comparison/algorithm-comparison.qmd

+# Take a sample of target properties with sales to plot using plotly. Sample
+# because using all properties makes the plots too large to render
+# Select the same sample for all runs
+sample_pins <- comps_by_pin_sales_agg %>%


Use the same sample for each run.

I checked the card numbers: there are 7 million card 1s and just 10,000 other card values. I don't know if we want to join them based on card, filter multi-cards out, or just leave it as is, since it seems like a tiny amount.

jeancochrane · 2025-12-01T22:48:00Z

[Thought, non-blocking] That's an interesting point, though I say we leave them in for now for the sake of convenience.
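As a rough illustration of reusing one sample across runs, here is a base-R sketch on toy data. The run names, PINs, and sample size of 10 are made up (the report samples 2000 pins), and the report itself may build the sample differently. The idea is to draw the PINs once and then subset every run with that same set, so the plots compare like with like.

```r
# Toy stand-in for per-run data; run names and PINs are invented
set.seed(42)
all_pins <- sprintf("pin-%03d", 1:50)
runs <- list(
  run_a = data.frame(pin = all_pins, value = rnorm(50), stringsAsFactors = FALSE),
  run_b = data.frame(pin = all_pins, value = rnorm(50), stringsAsFactors = FALSE)
)

# Draw the sample once, from PINs that are present in every run
common_pins <- Reduce(intersect, lapply(runs, function(df) df$pin))
sample_pins <- sample(common_pins, size = 10)

# Reuse the same PINs to subset each run
sampled <- lapply(runs, function(df) df[df$pin %in% sample_pins, ])
```

Because the subset is keyed on pin only, a multi-card PIN would keep all of its cards in this sketch, which matches the decision above to leave multi-card properties in.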

Damonamajor · 2025-11-26T17:22:19Z

reports/algorithm-comparison/algorithm-comparison.qmd

+  "Target Sale Price",
+  "Avg. Comp Sale Price",
+  "Class",
+  id_prefix = "sale_price_class"


id_prefix is used to separate the HTML tags so they don't interact with each other. I don't exactly understand it, but it makes it work.

jeancochrane · 2025-12-01T22:50:43Z

[Thought, non-blocking] I think the reason that the IDs are important is that Bootstrap uses element IDs to determine which elements a tab should control, so if you reuse IDs, the tabs won't switch properly. Not 100% confident in that answer, but either way, setting different IDs for different tabsets is a good idea!
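A small base-R sketch of the ID-namespacing idea, under the assumption that each tab link points at its pane through an `href="#<id>"` anchor, as Bootstrap 5 tab markup does. The helpers make_tab_ids and make_tab_nav and the "bldg_sf_class" prefix are hypothetical and are not the report's actual functions; they only show how a per-tabset prefix keeps IDs from colliding.

```r
# Hypothetical helper: derive one element ID per tab, namespaced by id_prefix
make_tab_ids <- function(labels, id_prefix) {
  slugs <- gsub("[^a-z0-9]+", "-", tolower(labels))
  paste0(id_prefix, "-", slugs)
}

# Hypothetical helper: build nav markup whose links target those IDs
make_tab_nav <- function(labels, id_prefix) {
  ids <- make_tab_ids(labels, id_prefix)
  links <- sprintf(
    '<li class="nav-item"><a class="nav-link" data-bs-toggle="tab" href="#%s">%s</a></li>',
    ids, labels
  )
  paste0('<ul class="nav nav-tabs">', paste(links, collapse = ""), "</ul>")
}

run_labels <- c("Unweighted", "Error reduction")

# Two tabsets with the same tab labels but different prefixes
ids_price <- make_tab_ids(run_labels, "sale_price_class")
ids_sf <- make_tab_ids(run_labels, "bldg_sf_class")
nav_price <- make_tab_nav(run_labels, "sale_price_class")
```

Without the prefix, both tabsets would generate the same IDs for "Unweighted" and "Error reduction", so a click in one tabset could end up targeting a pane in the other.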

jeancochrane

Great work here! A couple small suggestions below, but nothing serious.

jeancochrane · 2025-12-01T21:47:39Z

reports/algorithm-comparison/algorithm-comparison.qmd

+)
+
+s3_bucket <- "s3://ccao-model-results-us-east-1"
+run_ids <- params$run_id


[Nitpick, optional] Doesn't seem like we're using this variable, so we might as well scrap it for the sake of simplicity:

Suggested change

run_ids <- params$run_id

jeancochrane · 2025-12-02T02:31:34Z

reports/algorithm-comparison/algorithm-comparison.qmd

+}
+```
+
+## Topline Aggregate Stats


[Nitpick, required] In these charts, could we use a more descriptive name than "description" for the header on the column that displays the algorithm that the run used? Even just "Algorithm" would be clearer in my opinion.

jeancochrane · 2025-12-02T02:32:06Z

reports/algorithm-comparison/algorithm-comparison.qmd

@@ -0,0 +1,1132 @@
+---
+title: "Algorithm-Comparison"


[Nitpick, required] Let's make it clear up front what kind of algorithm we're comparing:

Suggested change

-title: "Algorithm-Comparison"
+title: "Comps Algorithm Comparison"

jeancochrane · 2025-12-02T02:34:44Z

reports/algorithm-comparison/algorithm-comparison.qmd

+}
+```
+
+## Topline Aggregate Stats


[Suggestion, optional] Since these charts require horizontal scrolling, it would be super handy to freeze the leftmost identifier columns so that we can keep track of which rows we're comparing. For each chart, I think those columns should be:

- Overall
  - Algorithm (AKA description)
- By Township
  - Triad
  - Township
  - Algorithm
- By Class
  - Class
  - Algorithm

Damonamajor added 8 commits November 12, 2025 15:55

initial positron push

2e90fd9

working except for html formatting

dbe32c2

Working written out explicetely

c772599

working with looping

41fbdf6

re-add accidentally deleted chunk

ecdce73

remove weighting

e907b99

remove beds

ef194f2

lintr and remove bed/baths

bad660c

Damonamajor linked an issue Nov 26, 2025 that may be closed by this pull request

Consolidate significant sales QC analysis doc #146

Closed

Damonamajor commented Nov 26, 2025

Damonamajor self-assigned this Nov 26, 2025

Damonamajor requested a review from jeancochrane November 26, 2025 17:25

Damonamajor and others added 6 commits December 1, 2025 10:57

Update algorithm-comparison.qmd

b9849a4

Use old qmd file

0c91a2a

Use new qmd file

1c1cf5e

Remove Weighting from Name

d2a4357

remove first line

89bc783

switch to target class and township

252ac8d

Damonamajor marked this pull request as ready for review December 1, 2025 18:04

jeancochrane approved these changes Dec 2, 2025

Damonamajor and others added 4 commits December 2, 2025 18:43

Jean edits

0ace6e3

include the correct script

aadd62b

renv

9e79754

Update algorithm-comparison.qmd

4455be0

remove ccao::

7a5e40a

Damonamajor merged commit 9bb395b into main Dec 5, 2025
1 check passed

Damonamajor deleted the 146-consolidate-significant-sales-qc-analysis-doc branch December 5, 2025 16:41
