Conversation
zmbc
commented
Oct 24, 2025
- Make the file work if get_draws is not installed
- Don't require MEIDs when using gbd_mapping source
src/vivarium_helpers/lbwsg/lbwsg.py
Outdated
| elif source == 'get_ids': | ||
| cats = get_category_to_meid_map(lbwsg_exposure_df, validate=validate) | ||
| else: | ||
| cats = lbwsg_exposure_df.reset_index()[['parameter']].drop_duplicates().set_index('parameter') |
There was a problem hiding this comment.
So this is the gbd_mapping case (hmm, should I have made that more explicit...?)? Maybe add a comment here to explain what this is doing and why it works? Will the result of this be an empty DataFrame with the categories in the index?
There was a problem hiding this comment.
Ok, now that I spent a little more time looking over this function, I think I'm remembering what I intended here (I wish I had documented it better...).
I think my intention was that the cat_df dataframe would always contain the MEID, in the spirit of recording as much information as possible about the category so you could easily look it up later with the GBD shared functions. Since gbd_mapping does not include the MEID, the MEID needs to come from somewhere else, either the hard-coded values in the module, or an exposure dataframe from GBD (which is where the hard-coded values came from in the first place). The source parameter was then a way to map MEIDs or category names to category descriptions, namely get_ids, gbd_mapping, or a user-provided dictionary.
In particular, if I'm understanding correctly, I think the new logic is confusing things a bit (again, my fault for not documenting anything). Namely, I don't think it's exactly correct to change the else clause to elif source == 'get_ids', because the code in that block doesn't rely on get_ids at all, it just needs the exposure dataframe.
The PR description says the goal is "Don't require MEIDs when using gbd_mapping source," but I think MEIDs are already not "required" in this case, in the sense that the user doesn't need to pass them in: If the user passes 'gbd_mapping' for source and does not provide an exposure dataframe, then the MEIDs will be read from the hard-coded dictionary instead of a user-supplied source. Were you running into some kind of error with the MEIDs when using the gbd_mapping source?
If the intention is to simply omit the MEIDs when using the gbd_mapping source instead of reading the hard-coded values, then it seems like maybe it would be better to add another boolean flag, include_meids, and adjust the logic accordingly. If you don't want to include MEIDs, and you want to use the GBD mapping source, then you shouldn't need the cats Series/DataFrame at all, since you get all the information you want from gbd_mapping directly. So you could pass in None for lbwsg_exposure_df, and the code could skip reading the meid_map from either the dataframe or the hard-coded dictionary (the final construction of the returned DataFrame would have to be changed, since the point of this code in the gbd_mapping case was to merge in the MEIDs).
Does that all make sense? I haven't tried running this code recently, so I could be misremembering or misunderstanding things and missing the mark.
There was a problem hiding this comment.
@zmbc Oh, I realized: I bet you tried passing Artifact data in for the LBWSG exposure distribution. That won't work because all the functions in this module were designed to work with get_draws output, not Artifact data, so if you don't have get_draws output available, the correct way to call the function is to pass None. The ability to even pass an exposure dataframe here was a recent addition I made when updating to GBD 2021, so that I could verify the MEIDs were the same. Previously, the only parameter to this function was source, and it always just read the hard-coded MEID dictionary.
So perhaps the new functions signature is a bit confusing, and I've never liked the hard-coding aspect of it. An alternative would be:
get_category_descriptions(
source='gbd_mapping',
cat_to_meid_map=None,
validate=None,
)Then calling with the default values would use gbd_mapping to get the category descriptions, bypassing the MEIDs, which would not be included in the result. If you want the MEIDs, you could pass in either a dictionary (e.g., explicitly pass in the hard-coded dict in the module) or Series for cat_to_meid_map, or you could pass in a dataframe that contains both categories and MEIDs as columns, like an exposure dataframe from get_draws. If you switched the source to get_ids, then the cat_to_meid_map would have to be provided since in that case you need to go through MEIDs to get the description. And validate would only be applicable if cat_to_meid_map is not None.
Does that seem like a reasonable approach? Maybe it's a bit complicated, but the goal was to make things more explicit...
There was a problem hiding this comment.
@NathanielBlairStahn I've updated this PR with something along the lines of what you said, but a bit simpler. I separated out the validation of the LBWSG exposure dataframe from the getting of the metadata. I didn't make the cat_to_meid_map configurable but I made the default gbd_mapping.
There was a problem hiding this comment.
Copying over my comment from Slack:
Nathaniel Blair-Stahn
Today at 10:14 AM
Hey @Zeb, I reviewed your PR after you made the update and was going to request a change, but I never got around to it due to Alzheimer's deadlines. I forget exactly what I was thinking, but I think the basic idea was that I was going to request restoring the ability to pass in a dataframe in casesource='get_ids', to avoid relying on either the hard-coded values or ongbd_mapping, both of which could get out of sync with the latest round of GBD. I may have had some idea how to do this that was simpler than my original proposal, but I forget...
10:18
Oh, maybe my idea was not to pass in a dataframe, but to allow passing in the Series returned byget_category_to_meid_map, which could default to the hardcoded dictionary if we want. (edited)
- Makes gbd_mapping the default source of LBWSG categories - Separates loading from validation (new, separate validation function) - Adds some rudimentary docstrings