Skip to content

Add meta_pin_num_card to proration grouping#381

Closed
Damonamajor wants to merge 18 commits intomasterfrom
371-update-proration-modeling
Closed

Add meta_pin_num_card to proration grouping#381
Damonamajor wants to merge 18 commits intomasterfrom
371-update-proration-modeling

Conversation

@Damonamajor
Copy link
Contributor

@Damonamajor Damonamajor commented May 30, 2025

Quick fix to a long problem, to add meta_pin_num_card to grouping.

@Damonamajor Damonamajor changed the title Update proration Update proration to group by char_bldg_sf May 30, 2025
@Damonamajor Damonamajor self-assigned this May 30, 2025
@Damonamajor
Copy link
Contributor Author

@jeancochrane @wagnerlmichael
The thing that I've been noticing is that prorated pins are often multi-card. In our multi-card aggregating, I assume that we are summing sq_footage as though the entirety of that value is on one pin, where my understanding is that it should be shared between two pins. Is that a correct assumption?

@Damonamajor Damonamajor marked this pull request as ready for review June 2, 2025 22:21
@wagnerlmichael
Copy link
Member

wagnerlmichael commented Jun 3, 2025

@jeancochrane @wagnerlmichael The thing that I've been noticing is that prorated pins are often multi-card. In our multi-card aggregating, I assume that we are summing sq_footage as though the entirety of that value is on one pin, where my understanding is that it should be shared between two pins. Is that a correct assumption?

I don't think we currently modify a prorated card's square footage contribution to the frankencard total sqft number. Perhaps it is worth testing that out, it makes sense to me intuitively. Could be worth consulting with valuations

group_by(meta_tieback_key_pin, meta_card_num, char_bldg_sf) %>%
mutate(
pred_card_intermediate_fmv = ifelse(
is.na(meta_tieback_key_pin),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to then use our 2 / 3 card technique below this, which I'm not sure is how we want it to interact with prorated pins.

@wrridgeway
Copy link
Member

Is this supposed to be a draft still?

@Damonamajor
Copy link
Contributor Author

Damonamajor commented Oct 9, 2025

Is this supposed to be a draft still?

yup, I want to check the model runs of 2025-10-05-didactic-rob and 2025-10-03-compassionate-gabe

@Damonamajor
Copy link
Contributor Author

Damonamajor commented Oct 9, 2025

library(noctua)
library(DBI)
library(readr)
library(dplyr)

noctua_options(cache_size = 10)

con <- dbConnect(noctua::athena())

sql_query <- "
select meta_pin, meta_card_num, pred_card_initial_fmv
  from model.assessment_card
where run_id IN ('2025-10-05-didactic-rob', '2025-10-03-compassionate-gabe')
"

data <- dbGetQuery(conn = con, statement = sql_query)

test <- data %>%
  group_by(meta_pin, meta_card_num) %>%
  summarize(distinct_values = n_distinct(pred_card_initial_fmv))

# Show the table of distinct values
table(test$distinct_values)

Further feedback on this means that we need to figure out where diferences in final pin value come from

@Damonamajor
Copy link
Contributor Author

OK, it's ready now.

@Damonamajor Damonamajor changed the title Update proration to group by char_bldg_sf Remove the split-card averaging from pipeline Oct 9, 2025
@Damonamajor
Copy link
Contributor Author

Damonamajor commented Oct 10, 2025

The changed file now includes the steps which I used to reduce the mismatched errors for pred_pin_final_fmv_round. For this, I ran stage 2 of the pipeline with the changes here and here and without. These can be understood as the modifications which would remove the Jean's proration's step from the pipeline.

Without any modifications, there are ~50,000 pins which had a different value for pred_pin_final_fmv_round. There are two key reasons why this happens. The first is that we round values before we do proration. To remedy this, I just replaced the value of pred_pin_final_fmv_round_no_prorate with pred_pin_final_fmv_round.

Then, we cap and assess land rates for prorated PINs. Since we now have modified values, these are going to be different. Because of this, I averaged the values of pred_pin_final_fmv_round_no_prorate grouped by the tieback. This is not what should be done, but does serve to match the two different runs.

At the end, there were 52 values which still did not match.

@Damonamajor Damonamajor changed the title Remove the split-card averaging from pipeline Add meta_pin_num_card to proration grouping Oct 21, 2025
@Damonamajor
Copy link
Contributor Author

In the meantime, I'm changing the scope of this issue to reflect the one change that we are confident we need to make to the modeling pipeline: Excluding 2-3 card PINs with different numbers of cards from the card-level prediction averaging step. This is the block code that needs to change to exclude 2-3 card PINs with different numbers of cards:

So we want the same grouping just with an additional value of pin_num_cards or the equivilent?

@jeancochrane

@Damonamajor
Copy link
Contributor Author

@jeancochrane Can we close this until further notice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Exclude prorated 2-3 card PINs with different card numbers from card-level averaging in assess stage

4 participants