ihmeuw
diff --git a/docs/source/models/causes/neonatal/index.rst b/docs/source/models/causes/neonatal/index.rst
Lines changed: 13 additions & 4 deletions
diff --git a/docs/source/models/causes/neonatal/preterm_birth.rst b/docs/source/models/causes/neonatal/preterm_birth.rst
Lines changed: 87 additions & 25 deletions
diff --git a/docs/source/models/concept_models/vivarium_mncnh_portfolio/anemia_component/module_document.rst b/docs/source/models/concept_models/vivarium_mncnh_portfolio/anemia_component/module_document.rst
Lines changed: 22 additions & 1 deletion
@@ -208,7 +208,11 @@ unit time.
 Modeling Strategy
 +++++++++++++++++
 
-The neonatal death model requires only the probability of death (aka "mortality risk") for the early and late neonatal time periods. Rather than using GBD mortality rates and converting them into probability of deaths, we will use mortality risk as direct input data into our model. We will calculate mortality risk input data as age-specific death counts divided by live birth counts from GBD.
+The neonatal death model requires only the probability of death (aka "mortality risk") for the early and late neonatal time periods.
+These mortality risks are age-group-, sex-, and location-specific.
+For brevity, sex and location subscripts are omitted in all equations.
+
+Rather than using GBD mortality rates and converting them into probabilities of death, we will use mortality risk as direct input data into our model. We will calculate mortality risk input data as age-specific death counts divided by live birth counts from GBD.
 
 Note that this strategy requires neither a conversion from rates to probabilities nor any scaling to the duration of the age group. The mortality risk calculated as described below already represents the probability of dying within a neonatal age group and can be used directly as such in the simulation.
 
@@ -231,7 +235,8 @@ and for a given cause of death:
 
 Note that this strategy was updated in May of 2025 from a prior strategy of converting GBD mortality rates to probabilities. `The pull request that updated this strategy can be found here for reference. <https://github.com/ihmeuw/vivarium_research/pull/1654>`_ This strategy update was pursued following verification and validation issues in neonatal mortality and an exploration of potential solutions in model runs 6.1 through 6.4. Ultimately, a change from mortality rates to mortality risk was preferred given that it is the more policy-relevant measure in the context of neonates, and accurately apportioning person-time alive within the neonatal age group given the input data available to us was a challenge we judged to be unnecessary.
 
-The calculation of :math:`\text{ACMRisk}_i` (the all-cause mortality risk for a single simulant, :math:`i`) is a bit complicated, however. We begin with a population ACMRisk and use the LBWSG PAF to derive a risk-deleted ACMRisk to which we can then apply the relative risk of LBWSG matching any risk exposure level.  Mathematically this is achieved by the following formula:
+The calculation of :math:`\text{ACMRisk}_i` (the all-cause mortality risk for a single simulant, :math:`i`) is a bit complicated, however. We begin with a population ACMRisk and use the LBWSG PAF to derive a risk-deleted ACMRisk to which we can then apply the relative risk of LBWSG matching any risk exposure level. Mathematically this is achieved by the following formula.
+Starting with this equation, we omit age group subscripts for brevity; all quantities are still age-, sex-, and location-specific.
 
 .. math::
     \begin{align*}
@@ -253,7 +258,6 @@ where :math:`\text{BW}_i` and :math:`\text{GA}_i` are the birth weight and gesta
 and :math:`\text{CSMRisk}_{i}^{k}` is the cause-specific mortality risk for subcause :math:`k` for simulant :math:`i` (both detailed in the `Modeled Subcauses`_
 linked from this page).
 
-
 In addition to determining which simulants die due to any cause, we also need to determine which subcause is underlying the death.  This is done by sampling from a categorical distribution obtained by renormalizing the CSMRisks:
 
 .. math::
@@ -349,7 +353,10 @@ Data Tables
       - GBD + assumption about relative risks + intervention model effects
       - see subcause models for details
 
-**Details of the** :math:`\text{PAF}_\text{LBWSG}` **calculation:**
+.. _details_of_the_lbwsg_paf_calculation:
+
+Details of the LBWSG PAF calculation
+++++++++++++++++++++++++++++++++++++
 
 As stated in the table above, :math:`\text{PAF}_\text{LBWSG}` is the population attributable fraction of all-cause mortality for low birth weight and short gestation. It is computed so that PAF = 1 - 1 / E(:math:`\text{RR}_{\text{BW},\text{GA}}`) from the capped interpolated relative risk function (with expectation taken over the distribution of LBWSG exposure). 
 
@@ -374,6 +381,8 @@ Using the `LBWSG PAF calculation simulation <https://github.com/ihmeuw/vivarium_
 
 So,
 
+.. _details_of_the_lbwsg_paf_calculation_equation:
+
 .. math::
 
   E(\text{RR})_\text{population} = \frac{\sum_{\text{cat}} E(\text{RR})_\text{cat} \times p^\text{birth}_\text{cat} \times \frac{n_\text{cat} - n^\text{deaths}_\text{cat}}{n_\text{cat}}}{\sum_{\text{cat}} p^\text{birth}_\text{cat} \times \frac{n_\text{cat} - n^\text{deaths}_\text{cat}}{n_\text{cat}}}
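
The weighted average above, combined with the relationship PAF = 1 - 1 / E(RR) stated earlier in this section, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PAF pipeline's actual implementation: the function names, the dictionary-based inputs, and the two toy categories (``cat_a`` and ``cat_b``) are hypothetical, and the numbers are made up rather than taken from GBD.

.. code-block:: python

   def population_mean_rr(mean_rr, birth_prevalence, n, n_deaths):
       """Weighted mean RR across LBWSG categories (hypothetical helper).

       Every argument is a dict keyed by LBWSG category.
       """
       # Weight = birth prevalence times the fraction of the category's
       # simulants that did not die.
       weights = {
           cat: birth_prevalence[cat] * (n[cat] - n_deaths[cat]) / n[cat]
           for cat in mean_rr
       }
       numerator = sum(mean_rr[cat] * weights[cat] for cat in mean_rr)
       denominator = sum(weights.values())
       return numerator / denominator


   def paf_from_mean_rr(expected_rr):
       """PAF = 1 - 1 / E(RR)."""
       return 1 - 1 / expected_rr


   # Toy inputs, made up for illustration only.
   mean_rr = {"cat_a": 4.0, "cat_b": 1.0}
   birth_prevalence = {"cat_a": 0.25, "cat_b": 0.75}
   n = {"cat_a": 100, "cat_b": 100}
   n_deaths = {"cat_a": 20, "cat_b": 0}

   expected_rr = population_mean_rr(mean_rr, birth_prevalence, n, n_deaths)
   paf = paf_from_mean_rr(expected_rr)

With these toy inputs the weights are 0.2 and 0.75, so the high-risk category contributes less to the average than its birth prevalence alone would suggest, because some of its simulants have died.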
 
@@ -154,11 +154,17 @@ Note that these probabilities are not used directly in the model and are include
 Modeling Strategy
 +++++++++++++++++
 
-The Preterm Birth submodel requires only the birth-weight- and gestation-age-stratified cause specific mortality risks for preterm birth complications with and without respiratory distress syndrome during the early and late neonatal periods.
+The Preterm Birth submodel only needs to produce the birth-weight- and gestational-age-stratified cause-specific mortality risks for preterm birth complications with and without respiratory distress syndrome during the early and late neonatal periods.
+(These risks are also implicitly stratified by age group, sex, and location.)
 
 Since this is a PAF-of-one cause, the calculation must take into account the "structural zeros" representing no mortality risk for simulants with a gestational age of 37 or more weeks.
 
-The way these CSMRisks are used is the same for all subcauses, and therefore is included in the :ref:`Overall Neonatal Disorders Model <2021_cause_neonatal_disorders_mncnh>` page.  This page describes the birth-weight- and gestational-age-specific cause specific mortality risks that are used for this cause on that page, :math:`\text{CSMRisk}^{\text{preterm with RDS}}_{\text{BW},\text{GA}}` and :math:`\text{CSMRisk}^{\text{preterm without RDS}}_{\text{BW},\text{GA}}`. In both cases, the formula is:
+The way these CSMRisks are used is the same for all subcauses, and therefore is included in the :ref:`Overall Neonatal Disorders Model <2021_cause_neonatal_disorders_mncnh>` page.  This page describes how to calculate the birth-weight- and gestational-age-specific cause-specific mortality risks that are used for the preterm subcauses on that page, namely :math:`\text{CSMRisk}^{\text{preterm with RDS}}_{\text{BW},\text{GA}}` and :math:`\text{CSMRisk}^{\text{preterm without RDS}}_{\text{BW},\text{GA}}`.
+As in the equations on the overall neonatal disorders model page, all quantities here
+are age-group-, sex-, and location-specific; these subscripts are omitted for brevity.
+For both preterm subcauses, the formula is:
+
+.. _preterm_csmrisk_equation:
 
 .. math::
     \begin{align*}
@@ -171,7 +177,8 @@ The way these CSMRisks are used is the same for all subcauses, and therefore is
     \end{align*}
 
 where :math:`k` is the subcause of interest (preterm birth with or without RDS),
-:math:`\text{CSMRisk}` is the cause-specific mortality riskk for preterm birth complications,
+:math:`\text{CSMRisk}` is the cause-specific mortality risk for preterm birth complications,
+:math:`p_{\text{preterm}}` is the prevalence of preterm (gestational age < 37 weeks) at the *beginning* of the age group,
 :math:`f_k` is the fraction of preterm deaths due to subcause :math:`k` (with or without RDS), :math:`\text{RR}_{\text{BW},\text{GA}}` is the relative risk of all-cause mortality for a birth weight of :math:`\text{BW}` and gestational age of :math:`\text{GA}`, and :math:`Z` is a normalizing constant selected so that :math:`E[\text{RR}_{\text{BW,GA}} | \text{GA}<37] \cdot Z = 1`. Solving for :math:`Z` gives :math:`Z = 1 / E[\text{RR}_{\text{BW,GA}} | \text{GA}<37]`.
 
 .. note::
@@ -184,7 +191,7 @@ where :math:`k` is the subcause of interest (preterm birth with or without RDS),
 
   We will use a **population size of 195_112** for this calculation. This number was selected in order to satisfy the following criteria:
 
-  - The population size per LWBSG exposure category is required to be a perfect square to be compatible with our strategy of initializing individual exposures on a grid within each LBWSG exposure category
+  - The population size per LBWSG exposure category is required to be a perfect square to be compatible with our strategy of initializing individual exposures on a grid within each LBWSG exposure category
 
   - The total population size of the PAF calculation pipeline must be divisible by the product of the number of LBWSG exposure categories (58), the number of sexes (2), and the number of age groups (2) used in the PAF calculation
 
@@ -202,7 +209,73 @@ where :math:`k` is the subcause of interest (preterm birth with or without RDS),
   Also, it is possible that the choice of :math:`\text{RR}_{\text{BW},\text{GA}}` might not work for every subcause. Since we're moving all the preterm mortality into the preterm categories, there is less room there for mortality from other causes, so depending on the risks involved, we may need to shift mortality from some other causes into the non-preterm categories in order to avoid making things negative.
   It is even possible that there is no way to make this work consistently, meaning that any choice of weight function would lead to negative mortality risks.  We expect that this will not be an issue, but we haven't actually tried it with the real data yet.
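
As a quick arithmetic check of the population size of 195_112 quoted in the note above, the sketch below tests the two criteria listed there. It assumes the total is split evenly across every combination of LBWSG category, sex, and age group; the variable names are hypothetical and any additional selection criteria are not covered.

.. code-block:: python

   import math

   TOTAL_POPULATION = 195_112
   N_CATEGORIES = 58  # LBWSG exposure categories
   N_SEXES = 2
   N_AGE_GROUPS = 2

   # Criterion: the total must be divisible by categories * sexes * age groups.
   n_strata = N_CATEGORIES * N_SEXES * N_AGE_GROUPS
   per_stratum, remainder = divmod(TOTAL_POPULATION, n_strata)

   # Criterion: the per-category population must be a perfect square so that
   # exposures can be initialized on a square grid within each category.
   grid_side = math.isqrt(per_stratum)
   is_perfect_square = grid_side * grid_side == per_stratum

   print(per_stratum, remainder, grid_side, is_perfect_square)

Under that even-split assumption, each stratum receives 841 simulants, which corresponds to a 29 by 29 grid.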
 
-Each individual simulant :math:`i` has their own :math:`\text{CSMR}_i^k` that might be different from :math:`\text{CSMRisk}^k_{\text{BW}_i,\text{GA}_i}` (meaning the average birth-weight- and gestational-age-specific CSMRisk for simulants with the birth weight and gestational age matching simulant :math:`i`.  We recommend implementing this as a pipeline eventually because it will be modified by interventions (or access to interventions) relevant to this subcause.  (Until we implement those, we will have :math:`\text{CSMRisk}_{i}^k = \text{CSMRisk}^k_{\text{BW}_i,\text{GA}_i}`, though.)
+:math:`\text{CSMRisk}` and :math:`p_{\text{preterm}}` are calculated differently for the ENN and LNN age groups.
+For clarity of notation, in what follows we will again make explicit the age group
+subscripts that have been implicit on every quantity to this point.
+(Sex and location remain implicit.)
+We define the ENN CSMRisk as:
+
+.. math::
+
+  \text{CSMRisk}_\text{ENN} = \frac{\text{enn_death_count}}{\text{live_birth_count}},
+
+where the :math:`\text{enn_death_count}` and :math:`\text{live_birth_count}` are
+quantities pulled from GBD, as detailed in the table below.
+
+The LNN CSMRisk is:
+
+.. math::
+
+  \text{CSMRisk}_\text{LNN} = \frac{\text{lnn_death_count}}{\text{live_birth_count} - \text{enn_all_cause_death_count}},
+
+where, again, all quantities are pulled from GBD as detailed in the table below.
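
The two definitions above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical and the counts are made-up round numbers rather than GBD values. The argument names mirror the quantities in the data table.

.. code-block:: python

   def csmrisk_enn(enn_death_count, live_birth_count):
       """ENN risk: cause-specific ENN deaths per live birth."""
       return enn_death_count / live_birth_count


   def csmrisk_lnn(lnn_death_count, live_birth_count, enn_all_cause_death_count):
       """LNN risk: cause-specific LNN deaths per survivor of the ENN period.

       The denominator removes ENN deaths from *all* causes, because only
       simulants still alive at the start of LNN are at risk.
       """
       return lnn_death_count / (live_birth_count - enn_all_cause_death_count)


   # Toy counts, made up for illustration only.
   live_birth_count = 100_000
   enn_death_count = 500  # preterm deaths in ENN
   lnn_death_count = 99  # preterm deaths in LNN
   enn_all_cause_death_count = 1_000

   risk_enn = csmrisk_enn(enn_death_count, live_birth_count)
   risk_lnn = csmrisk_lnn(
       lnn_death_count, live_birth_count, enn_all_cause_death_count
   )

Neither function rescales by the duration of the age group, since each result is already a probability of death within that age group.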
+
+:math:`p_{\text{preterm}}`, as mentioned above, represents the prevalence/exposure
+of preterm (gestational age < 37 weeks) at the *beginning* of the age group.
+For ENN, the beginning of the age group is birth, so the prevalence of preterm
+at birth is a sum of the birth prevalence for all LBWSG categories with gestational
+age less than 37 weeks:
+
+.. math::
+
+  p_{\text{preterm},\text{ENN}} = \sum_{\{\text{cat}: \text{GA}<37\}} \text{lbwsg_birth_prevalence}_\text{cat},
+
+where :math:`\text{lbwsg_birth_prevalence}` can be pulled from GBD with minor transformations,
+as detailed in the table below.
+
+For LNN, the situation is more complicated, because we need to account
+for differential mortality in the ENN period.
+Therefore, the easiest way to calculate :math:`p_{\text{preterm},\text{LNN}}` is to get the end-of-ENN preterm
+prevalence from the same LBWSG PAF calculation pipeline used for :math:`Z`
+above.
+As detailed at :ref:`details_of_the_lbwsg_paf_calculation` on the neonatal all-cause
+mortality page, there are two iterative steps using microsimulation, with the late neonatal calculations
+using the result of the early neonatal calculations.
+Similarly to the LNN PAF, *after* the early neonatal calculations are complete, the prevalence of
+preterm at the end of the ENN age group should be calculated.
+This value should be used as :math:`p_{\text{preterm},\text{LNN}}` for the purposes
+of the CSMRisk equation.
+
+Determining the prevalence of preterm is a bit more complex than it sounds, because in the PAF calculation pipeline,
+the same number of simulants is assigned to each LBWSG category, rather than assigning each simulant
+to a random category with probability equal to that category's prevalence at birth.
+Due to this initialization strategy, all quantities calculated in the pipeline must use *weights*
+to account for the fact that the simulants in the categories with higher birth prevalence represent more people.
+Therefore, :math:`p_{\text{preterm},\text{LNN}}` is calculated as follows:
+
+.. math::
+
+  p_{\text{preterm},\text{LNN}} = \frac{
+    \sum_{\{\text{cat}: \text{GA}<37\}} \text{lbwsg_birth_prevalence}_\text{cat} \times \frac{n_\text{cat} - n^\text{deaths}_\text{cat}}{n_\text{cat}}
+  }{
+    \sum_{\text{cat}} \text{lbwsg_birth_prevalence}_\text{cat} \times \frac{n_\text{cat} - n^\text{deaths}_\text{cat}}{n_\text{cat}}
+  },
+
+where :math:`n_\text{cat}` is the number of simulants initialized into each LBWSG category at birth
+and :math:`n^\text{deaths}_\text{cat}` is the number of deaths in each category when ENN mortality was applied.
+Note that :math:`n_\text{cat}` will not vary by LBWSG exposure category under the current approach of assigning the same number of simulants to each LBWSG category.
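
The ENN and LNN preterm prevalence calculations above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pipeline's implementation: the function names, the dictionary inputs, the ``is_preterm`` flag, and the two toy categories are hypothetical, and the numbers are made up.

.. code-block:: python

   def preterm_prevalence_at_birth(birth_prevalence, is_preterm):
       """Sum of birth prevalence over categories with GA < 37 weeks."""
       return sum(p for cat, p in birth_prevalence.items() if is_preterm[cat])


   def preterm_prevalence_after_enn(birth_prevalence, is_preterm, n, n_deaths):
       """Preterm prevalence among survivors, weighted by birth prevalence."""
       # Weight = birth prevalence times the fraction of the category's
       # simulants that survived the ENN period.
       weights = {
           cat: birth_prevalence[cat] * (n[cat] - n_deaths[cat]) / n[cat]
           for cat in birth_prevalence
       }
       preterm_weight = sum(w for cat, w in weights.items() if is_preterm[cat])
       return preterm_weight / sum(weights.values())


   # Toy inputs, made up for illustration only.
   birth_prevalence = {"preterm_cat": 0.25, "term_cat": 0.75}
   is_preterm = {"preterm_cat": True, "term_cat": False}
   n = {"preterm_cat": 100, "term_cat": 100}
   n_deaths = {"preterm_cat": 20, "term_cat": 0}

   p_preterm_enn = preterm_prevalence_at_birth(birth_prevalence, is_preterm)
   p_preterm_lnn = preterm_prevalence_after_enn(
       birth_prevalence, is_preterm, n, n_deaths
   )

In this toy example the preterm prevalence falls from 0.25 at birth to about 0.21 at the start of LNN, because deaths in the ENN period are concentrated in the preterm category.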
+
+Each individual simulant :math:`i` has their own :math:`\text{CSMRisk}_i^k` that might be different from :math:`\text{CSMRisk}^k_{\text{BW}_i,\text{GA}_i}` (meaning the average birth-weight- and gestational-age-specific CSMRisk for simulants with the birth weight and gestational age matching simulant :math:`i`).  We recommend implementing this as a Vivarium pipeline eventually because it will be modified by interventions (or access to interventions) relevant to this subcause.  (Until we implement those, we will have :math:`\text{CSMRisk}_{i}^k = \text{CSMRisk}^k_{\text{BW}_i,\text{GA}_i}`, though.)
 
 The following table shows the data needed for these
 calculations.
@@ -212,8 +285,9 @@ Data Tables
 
 .. note::
 
-  All quantities pulled from GBD in the following table are for a
-  specific year, sex, age group, and location.
+  All GBD quantities in the following table are pulled
+  for all modeled years, sexes, age groups, and locations,
+  except where the age group is explicitly specified.
 
 .. list-table:: Data values and sources
     :header-rows: 1
@@ -224,36 +298,24 @@ Data Tables
       - Note
     * - enn_all_cause_death_count
       - Count of deaths due to all causes in the early neonatal age group
-      - GBD: source='codcorrect', metric_id=1, cause_id=294
+      - GBD: source='codcorrect', metric_id=1, cause_id=294, age_group_id=2
       - 
     * - enn_death_count
       - Count of deaths due to cause neonatal preterm birth complications in the early neonatal age group
-      - GBD: source='codcorrect', metric_id=1, cause_id=381
+      - GBD: source='codcorrect', metric_id=1, cause_id=381, age_group_id=2
       - 
     * - lnn_death_count
       - Count of deaths due to cause neonatal preterm birth complications in the late neonatal age group
-      - GBD: source='codcorrect', metric_id=1, cause_id=381
+      - GBD: source='codcorrect', metric_id=1, cause_id=381, age_group_id=3
       - 
     * - live_birth_count
       - Count of live births
       - GBD: covariate_id = 1106
       - 
-    * - csmrisk_enn
-      - neonatal preterm birth complications mortality risk in the early neonatal age group
-      - enn_death_count / live_birth_count
-      - 
-    * - csmrisk_lnn
-      - neonatal preterm birth complications mortality risk in the late neonatal age group
-      - lnn_death_count / (live_birth_count - enn_all_cause_death_count)
-      - 
-    * - :math:`\text{CSMRisk}`
-      - neonatal preterm birth complications mortality risk
-      - either csmrisk_enn or csmrisk_lnn depending on the simulant's age group
+    * - lbwsg_birth_prevalence
+      - Birth prevalence of low birthweight and short gestation risk factor
+      - GBD with post-processing: rei_id = 339, then remove the extraneous category and rescale prevalence :ref:`as described here <rescaling_lbwsg_exposure_data_pulled_from_gbd_2019>`.
       - 
-    * - :math:`p_\text{preterm}`
-      - Prevalence of gestational age <37 weeks at birth
-      - Derived from :ref:`GBD LBWSG exposure <risk_exposure_lbwsg>`
-      - Equal to the sum of exposures for all categories with gestational age at birth <37 weeks. A list of such categories can be generated in a manner similar to `this notebook <https://github.com/ihmeuw/vivarium_research_nutrition_optimization/blob/data_prep/data_prep/LBW%20categories.ipynb>`_ 
     * - :math:`f_\text{preterm w RDS}`
       - fraction of preterm deaths with RDS
       - 85%
 
@@ -201,7 +201,28 @@ Note that simulants who died during labor should not experience any YLDs due to
 4.0 Verification and Validation Criteria
 +++++++++++++++++++++++++++++++++++++++++
 
-- Baseline simulated anemia YLDs should match corresponding pregnancy-specific GBD values. TODO: define specifically what these are (do they save pregnancy-specific impairment prevalence in GBD 2023 or do we need to calculate our own targets again?)
+- Baseline simulated anemia YLDs should match corresponding pregnancy-specific GBD values. Run the following command to load the data from GBD 2023:
+
+.. code-block:: python
+
+   from db_queries import get_outputs
+
+   get_outputs(
+       location_id=[165,179,214],
+       topic='rei',
+       rei_id=[205,206,207], # We also have rei_id=192 for all anemia and rei_id=432 for moderate and severe combined
+       population_group_id=16,
+       sex_id=2,
+       year_id=2023,
+       release_id=16, # release_id=33 also works
+       compare_version_id=8306,
+       measure_id=[3,5],
+       age_group_id=[7, 8, 9, 10, 11, 12, 13, 14, 15, 24, 169]
+   )
+
+.. note::
+
+   Make sure you have the latest version of ``db_queries`` to be able to use the ``population_group_id`` argument. To get pregnancy-specific results, the population group and the age groups need to be specified, because the default is all ages.
+   As of the time of writing (July 2025), we can only use ``population_group_id=16`` with ``get_outputs()``. There were a few EPIC/COMO runs with pregnancy this GBD round, which are noted in the `tracking HUB page <https://hub.ihme.washington.edu/spaces/GBDdirectory/pages/229280352/GBD+2023+EPIC+COMO+tracking>`_.
+
 
 5.0 References
 +++++++++++++++