Skip to content
Open
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions pipeline/03-evaluate.R
Original file line number Diff line number Diff line change
Expand Up @@ -120,16 +120,20 @@ gen_agg_stats <- function(data, truth, estimate, bldg_sqft,
sum_fns_list <- list(
min = \(x) min(x, na.rm = TRUE),
q25 = \(x) quantile(x, na.rm = TRUE, probs = 0.25),
mean = \(x) mean(x, na.rm = TRUE),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question, non-blocking] I haven't spent much time with this metrics code, so I may just be lacking context here, but can you explain why we only want to add mean/SD metrics to the sum_fns_list and sum_sqft_fns_list, and not ys_fns_list or yoy_fns_list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ys_fns_list is ml specific metrics, but i don't see why we couldn't add it to yoy_fns_list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add to yoy_fns_list if desired. I'd need to go back and look at those calculations, but I believe they are year over year changes expressed as percentages- in which case, I would want to be cautious about how we interpret those (depending on whether those numbers already represent an aggregate/how exactly that is calculated). I'll need to re-read the code and get back to you.
(For a little call-out on one of my cautions- see the approx. the 4th paragraph here

median = \(x) median(x, na.rm = TRUE),
q75 = \(x) quantile(x, na.rm = TRUE, probs = 0.75),
max = \(x) max(x, na.rm = TRUE)
max = \(x) max(x, na.rm = TRUE),
sd = \(x) sd(x, na.rm = TRUE)
)
sum_sqft_fns_list <- list(
min = \(x, y) min(x / y, na.rm = TRUE),
q25 = \(x, y) quantile(x / y, na.rm = TRUE, probs = 0.25),
median = \(x, y) median(x / y, na.rm = TRUE),
mean = \(x, y) mean(x / y, na.rm = TRUE),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we choose whether to have mean before or after median for consistency's sake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved mean in sum_funs_list to after median for consistency

q75 = \(x, y) quantile(x / y, na.rm = TRUE, probs = 0.75),
max = \(x, y) max(x / y, na.rm = TRUE)
max = \(x, y) max(x / y, na.rm = TRUE),
sd = \(x, y) sd(x / y, na.rm = TRUE)
)
yoy_fns_list <- list(
min = \(x, y) min((x - y) / y, na.rm = TRUE),
Expand Down
Loading