consider removing rows where scenario data is not available #10

@cjyetman

https://github.com/RMI-PACTA/pacta.data.preparation/blob/ba0f8b8518afb2d00bfe5d9bff1a935418eaa5dd/R/dataprep_abcd_scen_connection.R#L267-L303

When the scenario data is left-joined to the ABCD data, it is likely that some rows of the ABCD data will not match any rows in the scenario data on `by = c("scenario_geography", "year", "ald_sector", "technology")`. For those rows, the columns added from the scenario data (`scenario_source`, `scenario`, `units`, `direction`, `fair_share_perc`) will be filled with `NA`. Are these rows useful at all after this point?
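To make the effect concrete, here is a minimal sketch using toy tables. The join keys and the scenario column names are taken from the description above; everything else (the `id` and `plan_tech_prod` columns, all values, and the object names) is hypothetical and only stands in for the real data.

```r
library(dplyr)

# Join keys as listed in the issue
join_keys <- c("scenario_geography", "year", "ald_sector", "technology")

# Hypothetical stand-in for the ABCD data (column `id` and all values are made up)
abcd_toy <- tibble(
  id = c(1, 1, 2),
  scenario_geography = "Global",
  year = 2025,
  ald_sector = c("Power", "Power", "Coal"),
  technology = c("CoalCap", "GasCap", "Coal"),
  plan_tech_prod = c(10, 20, 30)
)

# Hypothetical stand-in for the scenario data: it only covers Power/CoalCap
scen_toy <- tibble(
  scenario_source = "SRC2023",
  scenario = "SCEN_A",
  scenario_geography = "Global",
  year = 2025,
  ald_sector = "Power",
  technology = "CoalCap",
  units = "MW",
  direction = "declining",
  fair_share_perc = -0.1
)

joined <- left_join(abcd_toy, scen_toy, by = join_keys)

# ABCD rows that found no scenario match carry NA in every scenario column
n_unmatched <- sum(is.na(joined$scenario))

# anti_join() lists the unmatched ABCD rows directly, without doing the join first
unmatched <- anti_join(abcd_toy, scen_toy, by = join_keys)
```

Running something like `anti_join()` on the real tables would show how many rows are affected before deciding anything.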

I think we should carefully consider whether these rows with no scenario data are meaningful for any reason. If they are not, we should filter them out, which could reduce the size of the data substantially. @jacobvjk @jdhoffa @AlexAxthelm
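If the rows turn out not to be needed, the filtering could look roughly like the sketch below. It is not the package's actual code: the tables and values are hypothetical, and it assumes the `scenario` column is never `NA` in the scenario data itself, so that an `NA` after the join can only mean "no match".

```r
library(dplyr)

join_keys <- c("scenario_geography", "year", "ald_sector", "technology")

# Hypothetical toy tables (values are made up)
abcd_toy <- tibble(
  scenario_geography = "Global",
  year = 2025,
  ald_sector = c("Power", "Power", "Coal"),
  technology = c("CoalCap", "GasCap", "Coal"),
  plan_tech_prod = c(10, 20, 30)
)
scen_toy <- tibble(
  scenario_geography = "Global",
  year = 2025,
  ald_sector = "Power",
  technology = "CoalCap",
  scenario = "SCEN_A",
  fair_share_perc = -0.1
)

# Option A: keep the left join, then drop rows with no scenario match
with_na <- left_join(abcd_toy, scen_toy, by = join_keys)
filtered <- filter(with_na, !is.na(scenario))

# Option B: an inner join never creates the unmatched rows in the first place
inner <- inner_join(abcd_toy, scen_toy, by = join_keys)

# How many rows the filter removes on this toy data
rows_dropped <- nrow(with_na) - nrow(filtered)
```

Option B avoids materializing the unmatched rows at all, which may matter if the intermediate table is large. Option A keeps the existing join untouched and makes the removal an explicit, separate step.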

It's possible that we do want at least one row of the ABCD data to stay in place even when no scenario data matches it, in which case we'll need something more sophisticated... though won't the `scenario_geography` and `equity_market` columns make multiple rows distinct even while the rest of the data is duplicated?
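One possible reading of "at least one row" is sketched below: drop unmatched rows, but keep a single placeholder row for any entity that would otherwise disappear entirely. The `id` column is a hypothetical entity identifier and all values are made up; which columns actually define "one entity" in the real data is an open question.

```r
library(dplyr)

# Hypothetical result of the left join: id 1 has one match, id 2 has none.
# id 2 appears twice only because scenario_geography differs.
joined_toy <- tibble(
  id = c(1, 1, 2, 2),
  scenario_geography = c("Global", "Global", "Global", "Europe"),
  equity_market = "GlobalMarket",
  technology = c("CoalCap", "GasCap", "Coal", "Coal"),
  scenario = c("SCEN_A", NA, NA, NA)
)

kept <- joined_toy %>%
  group_by(id) %>%
  filter(
    # keep every row that has scenario data ...
    !is.na(scenario) |
      # ... and, for entities with no match at all, keep only their first row
      (all(is.na(scenario)) & row_number() == 1)
  ) %>%
  ungroup()
```

Grouping by `id` alone collapses the rows that differ only in `scenario_geography`. Adding `scenario_geography` or `equity_market` to the `group_by()` would instead keep one placeholder per combination, which is exactly the duplication concern raised above.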

Related: #7

Labels: enhancement
