Suggestions#289
Conversation
316db9a to
eb84b11
Compare
Fuhan-Yang
left a comment
There was a problem hiding this comment.
My lesson learned here is how to code in a more OOP way. It seems to me is that to put signatures as class attributes as much as it makes sense (which enables handling by instance method or classmethod) is more OOP than having external signatures handled by staticmethod. There are questions about defining groups as a global variable, see below!
|
|
||
| class CoverageModel(abc.ABC): | ||
| groups = ["season", "geography"] | ||
|
|
There was a problem hiding this comment.
I wonder why groups is coded this way rather than having a global variable defined in iup.utils? This way hard-code groups for multiple classes (including RFModel), which makes it harder to debug than having a global variable governing groups for all classes? In addition, this way will lead to importing CoverageModel to retrieve groups in scripts/eval.py, which I feel is not that efficient?
There was a problem hiding this comment.
The more I think about this, the more I think we should just write out ["season", "geography"] everywhere.
First, if we ever do want to add different groups to this codebase, then we're going to need to figure out what to do with the LPL model and its crossterms. Any design of code that allows for different groups is wholly speculative, since we haven't figured out the science first.
Second, the word "groups" gets used in two ways. It refers to the things that the models use (which includes season_geo for the LPL model, but not for the RF) and to the things used in scoring and visualization (which does not include season_geo, even for the LPL model). So it's not even the case that, as I wrote it, we can say the "groups" for the LPL models are season, geo, season_geo and the groups for the RF model are season, geo, since for the LPL model the "groups" sometimes include season_geo and sometimes don't.
There was a problem hiding this comment.
Seems to me this is the discrepancy between what we plan to do versus what the final results to be: what we plan to do is to differ the groups for fitting versus the groups for postprocessing, in this case it makes sense to define groups for fitting and for postprocessing separately. While the final results are that only the groups used for LPLModel fitting needs cross-terms, that's why I tend to add the cross-terms in the LPLModel.preprocess() and it will only use for fitting. I guess it is ok to use either method as it achieve the same functionality? I'm still a bit bothered that, if pursuing the track of what we plan to do, that the groups are separately hard-coded for abstract class and its subclass, and importing the groups from abstract class in scripts...seems to me it's bug-prone, especially if in the future different groups will be added...But if we decide this is just for current version of only considering season and geo, I guess this is ok too? All of this is pretty subtle...
There was a problem hiding this comment.
I redid this, writing out the groups everywhere, and it's not so bad. I think it actually makes things a little clearer.
| """ | ||
|
|
||
| groups = ["season", "geography", "season_geo"] | ||
|
|
There was a problem hiding this comment.
Similarly, hard-coding groups in LPLModel independent from CoverageModel feels little finicky...
| iup.CumulativeCoverageData(data.rename({"sample_size": "N_tot"})) | ||
| @classmethod | ||
| def _preprocess(cls, data: pl.DataFrame) -> pl.DataFrame: | ||
| out = ( |
There was a problem hiding this comment.
Making _preprocess() to be a class method is more OOP than static method, which is nice! Does this lead to hard-coding groups (since groups is not processed here, it needs to be defined somewhere)?
| import iup.models | ||
| from iup.utils import date_to_season | ||
|
|
||
| GROUPS = iup.models.CoverageModel.groups |
There was a problem hiding this comment.
This is where I feel finicky about importing a global variable from an abstract class instead of utils...
alphainto the top-level config, rather than being mixed across models and plotsscripts/preprocess.py-- this means that the season starts are no longer model parametersseedinto the model paramsLPLModela single_preprocess()method that does both the column renaming and the indexing (via_index())