We will include 604 patients aged 24 to 92 years who were diagnosed with triple-negative breast cancer (TNBC) at Institute XYZ between 19XX and 20XX.
The primary goal of this study is to estimate, among patients diagnosed with a subsequent breast cancer event at any point during follow-up, the percentage whose subsequent event is estrogen receptor (ER)-positive, progesterone receptor (PR)-positive, or human epidermal growth factor receptor 2 (HER2)-positive, as well as the cumulative incidence rates of ER-, PR-, and HER2-specific subsequent breast cancer events at 1, 2, 3, 5, 6, and 10 years. We will also estimate the median follow-up time in the entire cohort; the median time from the initial TNBC diagnosis to an overall subsequent breast cancer event; the frequency of overall subsequent breast cancer events; and the incidence rates of overall subsequent breast cancer events at 1, 2, 3, 5, 6, and 10 years.
The secondary goal of this study is to evaluate the factors associated with: 1) overall subsequent breast cancer events within 3-year, 5-year, and 6-year follow-up, and the time to an overall subsequent breast cancer event; 2) ER-specific subsequent breast cancer events within 3-year, 5-year, and 6-year follow-up, and the time to an ER-specific subsequent breast cancer event; 3) PR-specific subsequent breast cancer events within 3-year, 5-year, and 6-year follow-up, and the time to a PR-specific subsequent breast cancer event. We will not investigate the factors associated with HER2-specific subsequent breast cancer events, either within X-year follow-up or as time to event, because only 2 patients had a HER2-positive subsequent breast cancer event.
The third goal of this study is to evaluate which patient characteristics manifest racial disparity. The patient characteristics of interest include:
- demographic variables, for example, age at diagnosis;
- disease history-related variables, for example, clinical T stage (i.e., T1|T2, T3|T4), clinical N stage (i.e., N0, N1|N2|N3), past ipsilateral breast cancer (i.e., No, Yes), tumor size by imaging, nodal metastases (i.e., No, Yes), histology (i.e., IDC, IDC/ILC, ILC, Metaplastic, Other), high-grade disease (i.e., No, Yes), and primary laterality (i.e., Left, Right);
- breast cancer management-related variables, for example, detected by screening mammography (i.e., No, Yes), detected by screening magnetic resonance imaging (i.e., No, Yes), lumpectomy (i.e., No, Yes), mastectomy surgery (i.e., No, Yes), contralateral prophylactic mastectomy (i.e., No, Yes), sentinel lymph node biopsy (i.e., No, Yes), axillary lymph node dissection (i.e., No, Yes), neoadjuvant chemotherapy (i.e., No, Yes), and pathologic complete response to neoadjuvant chemotherapy (i.e., No, Yes);
- event-related variables, for example, ER status (i.e., Negative, Positive), PR status (i.e., Negative, Positive), HER2 status (i.e., Negative, Positive), laterality for subsequent breast cancer events (i.e., Left, Right), overall subsequent breast cancer events (i.e., No, Yes), 3-year, 5-year, and 6-year overall subsequent breast cancer events (each coded No (0), Yes (1)), ER-specific subsequent breast cancer events (i.e., No subsequent breast cancer events (0), ER-negative subsequent breast cancer events (1), ER-positive subsequent breast cancer events (2)), 3-year, 5-year, and 6-year ER-specific subsequent breast cancer events (each coded No subsequent breast cancer events within the given follow-up (0), ER-negative subsequent breast cancer events within the given follow-up (1), ER-positive subsequent breast cancer events within the given follow-up (2)), PR-specific subsequent breast cancer events (i.e., No subsequent breast cancer events (0), PR-negative subsequent breast cancer events (1), PR-positive subsequent breast cancer events (2)), and 3-year, 5-year, and 6-year PR-specific subsequent breast cancer events (each coded No subsequent breast cancer events within the given follow-up (0), PR-negative subsequent breast cancer events within the given follow-up (1), PR-positive subsequent breast cancer events within the given follow-up (2)).
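The 0/1/2 coding of the receptor-specific event variables can be sketched as follows. This is a minimal illustration in Python (the planned analyses will be run in R); the function name `code_er_event` and its arguments are hypothetical. It assumes that an event occurring exactly at the horizon counts as within follow-up, and that a patient with no recorded event is coded 0 regardless of follow-up length. The plan does not state either convention.

```python
from typing import Optional


def code_er_event(time_to_event_years: Optional[float],
                  er_positive: Optional[bool],
                  horizon_years: Optional[float] = None) -> int:
    """Return 0 (no event), 1 (ER-negative event), or 2 (ER-positive event).

    time_to_event_years: years from TNBC diagnosis to the subsequent event,
                         or None if no subsequent event was recorded.
    horizon_years:       3, 5, or 6 for the windowed variables; None for the
                         variable covering the entire follow-up.
    """
    # No subsequent breast cancer event recorded at all.
    if time_to_event_years is None:
        return 0
    # Event after the window ends: "no event within X-year follow-up".
    if horizon_years is not None and time_to_event_years > horizon_years:
        return 0
    # Event inside the window (or no window): split by receptor status.
    return 2 if er_positive else 1


# Toy inputs, not study data.
print(code_er_event(2.5, True, horizon_years=3))   # event inside the window
print(code_er_event(4.0, True, horizon_years=3))   # event after the window
```

The same function applies to the PR-specific variables by passing PR status in place of ER status.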
For the primary goal, descriptive statistics including frequency and percent will be calculated to summarize the occurrence of ER-positive, PR-positive, or HER2-positive disease among the patients who are diagnosed with a subsequent breast cancer event during the entire follow-up time, and to summarize the occurrence of overall subsequent breast cancer events among all patients initially diagnosed with TNBC. Descriptive statistics including mean, standard deviation, median, and interquartile range will be calculated to summarize the distributions of follow-up time in the entire cohort and of the time from the initial TNBC diagnosis to the overall or hormone receptor-specific subsequent breast cancer events. We will use the cumulative incidence function, which accounts for the competing risk between the positive biomarker and the negative biomarker (i.e., ER-positive vs ER-negative, PR-positive vs PR-negative, HER2-positive vs HER2-negative), to estimate the cumulative incidence rates of the ER-, PR-, and HER2-specific subsequent breast cancer events at 1, 2, 3, 5, 6, and 10 years. We will use the Kaplan-Meier estimator to estimate the probabilities of remaining free of an overall subsequent breast cancer event at 1, 2, 3, 5, 6, and 10 years, then use the formula “1 - survival probability” to derive the incidence rates of overall subsequent breast cancer events at the same time points. The cumulative incidence rates of the ER-, PR-, and HER2-specific subsequent breast cancer events and the incidence rates of the overall subsequent breast cancer events will be presented in both tables and figures.
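The two estimators described above can be sketched together. The Python below is illustrative only (the planned analyses will be run in R) and uses made-up toy data. It computes the nonparametric cumulative incidence for each event type and the Kaplan-Meier-based "1 - survival probability" for any event. The function name and the coding 0 = censored, 1 = receptor-negative event, 2 = receptor-positive event are assumptions for the example.

```python
def cumulative_incidence(times, causes, horizons):
    """Cumulative incidence per event type at each horizon.

    times    : follow-up time per patient (years)
    causes   : 0 = censored, 1, 2, ... = type of subsequent event
    horizons : time points at which to report (e.g. 1, 2, 3, 5, 6, 10)
    Returns {horizon: {"overall": 1 - S(t), cause: CIF_cause(t), ...}}.
    """
    cause_set = sorted({c for c in causes if c != 0})
    n_at_risk = len(times)
    surv = 1.0                          # Kaplan-Meier S(t-) for any event
    cif = {c: 0.0 for c in cause_set}
    steps = []                          # (time, 1 - S(t), CIF snapshot)
    for t in sorted(set(times)):
        at_t = [c for tt, c in zip(times, causes) if tt == t]
        # CIF increment: P(event-free just before t) * cause-specific hazard.
        for c in cause_set:
            cif[c] += surv * at_t.count(c) / n_at_risk
        d_all = sum(1 for c in at_t if c != 0)
        surv *= 1.0 - d_all / n_at_risk
        n_at_risk -= len(at_t)          # events and censored leave the risk set
        steps.append((t, 1.0 - surv, dict(cif)))
    result = {}
    for h in horizons:
        row = {"overall": 0.0, **{c: 0.0 for c in cause_set}}
        for t, overall, snap in steps:
            if t <= h:                  # keep the last step at or before h
                row = {"overall": overall, **snap}
        result[h] = row
    return result


# Toy data, not study data.
toy_times = [1, 2, 2, 3, 4, 5]
toy_causes = [1, 0, 2, 1, 0, 2]
for horizon, row in cumulative_incidence(toy_times, toy_causes, [1, 3, 5]).items():
    print(horizon, row)
```

When the two receptor categories are the only event types, the type-specific cumulative incidences add up to the overall "1 - survival probability" at every time point, which is a useful consistency check on the tables.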
For the secondary goal, we will group patients by the 2 categories of the 3-year, 5-year, and 6-year overall subsequent breast cancer events, the 3 categories of the 3-year, 5-year, and 6-year ER-specific subsequent breast cancer events, and the 3 categories of the 3-year, 5-year, and 6-year PR-specific subsequent breast cancer events, respectively. The categorization of these subsequent breast cancer events is described above under the event-related variables. Fisher’s exact test will be used to compare the proportions of each categorical patient characteristic between the patients who experience an overall subsequent breast cancer event within 3, 5, or 6 years and the patients who do not, and across the categories of the 3-year, 5-year, and 6-year hormone receptor-specific subsequent breast cancer events. The Wilcoxon rank sum test (for two-group comparisons) or the Kruskal-Wallis rank sum test (for comparisons of more than two groups) will be used to compare the medians (more strictly, medians, shapes, and spreads) of each continuous patient characteristic across the same groups.
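For the simplest case, a binary characteristic against a binary event indicator, the two-sided Fisher exact p-value can be sketched from the hypergeometric distribution. The Python below is illustrative only (the planned analyses will be run in R), covers 2x2 tables only, and uses made-up counts. It assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over tables at most as likely as the observed one; the small
    # tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Toy counts, not study data: rows = characteristic No/Yes,
# columns = 3-year event No/Yes.
print(fisher_exact_2x2(3, 1, 1, 3))
```

Tables with more than two rows or columns, such as the three-category receptor-specific events, need the general r x c form of the test, which this sketch does not implement.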
Multivariable logistic regression models or multivariable multinomial logistic regression models will be explored to evaluate the independent effects of patient characteristics on the 3-year/5-year/6-year overall or hormone receptor-specific subsequent breast cancer events. The candidate characteristics are those that show a statistically significant association with the 3-year/5-year/6-year overall or hormone receptor-specific subsequent breast cancer events in Fisher’s exact test, the Wilcoxon rank sum test, or the Kruskal-Wallis rank sum test. We will also use univariable Cox proportional hazards regression models to examine the crude effects of patient characteristics on the time to the overall subsequent breast cancer events. Multivariable Cox proportional hazards regression models will be explored to evaluate the independent effects, on the time to the overall subsequent breast cancer events, of the patient characteristics that show statistical significance in the univariable Cox models. Furthermore, we will use univariable subdistribution hazard regression models to evaluate the crude effects of patient characteristics on the time to the hormone receptor-specific subsequent breast cancer events. Multivariable subdistribution hazard regression models will be explored to assess the independent effects, on the time to the hormone receptor-specific subsequent breast cancer events, of the patient characteristics that show statistical significance in the univariable subdistribution hazard models.
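For reference, the two time-to-event models named above can be written as follows. This assumes the standard Cox formulation and assumes that "subdistribution hazard regression" refers to the Fine-Gray model, which the plan does not name explicitly. The symbols $x$ (covariate vector), $\beta$, $\gamma_k$, $h_0$, $\lambda_{k0}$, and $F_k$ are notation introduced here for illustration.

$$
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
$$

$$
\lambda_k(t \mid x) = \lambda_{k0}(t)\,\exp\!\left(\gamma_k^{\top} x\right),
\qquad
\lambda_k(t \mid x) = -\frac{d}{dt}\,\log\bigl\{1 - F_k(t \mid x)\bigr\}
$$

Here $h(t \mid x)$ is the hazard of any subsequent breast cancer event, and $F_k(t \mid x)$ is the cumulative incidence function of event type $k$ (for example, ER-positive or ER-negative). Under these assumptions, $\exp(\beta_j)$ is a hazard ratio and $\exp(\gamma_{kj})$ is a subdistribution hazard ratio, which describes the covariate effect on the cumulative incidence scale.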
For the third goal, we will group patients in the following ways: 1) 5 categories of race/ethnicity (i.e., White American (WA), African American (AA), Asian, Hispanic/Latino (Hisp/Latina), Other); 2) 4 categories of race/ethnicity excluding “Other” (i.e., WA, AA, Asian, Hisp/Latina); 3) 2 categories of race/ethnicity (i.e., WA, All other races/ethnicities); 4) 2 categories of race/ethnicity (i.e., WA, AA). Descriptive statistics including mean, standard deviation, median, and interquartile range will be calculated to summarize continuous patient characteristics grouped by each of the 4 race/ethnicity variables. The frequency and percent will be calculated to summarize categorical patient characteristics grouped by each of the 4 race/ethnicity variables. Fisher’s exact test will be used to compare the proportions of a categorical patient characteristic across the groups defined by each of the 4 race/ethnicity variables. The Wilcoxon rank sum test (for two-group comparisons) or the Kruskal-Wallis rank sum test (for comparisons of more than two groups), as appropriate, will be used to compare the medians (more strictly, medians, shapes, and spreads) of a continuous patient characteristic across the groups defined by each of the 4 race/ethnicity variables.
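The four grouping variables can all be derived from the single 5-category race/ethnicity variable. The Python below is a minimal sketch of that derivation (the planned analyses will be run in R). The function name and the output keys are hypothetical, and `None` marks patients excluded from a given comparison.

```python
from typing import Dict, Optional


def race_groupings(race: str) -> Dict[str, Optional[str]]:
    """Derive the 4 race/ethnicity grouping variables for one patient.

    race: one of "WA", "AA", "Asian", "Hisp/Latina", "Other".
    """
    return {
        # 1) all 5 categories
        "race5": race,
        # 2) 4 categories; "Other" is excluded from the comparison
        "race4": None if race == "Other" else race,
        # 3) WA vs all other races/ethnicities
        "wa_vs_all_other": "WA" if race == "WA" else "All other",
        # 4) WA vs AA; every other category is excluded
        "wa_vs_aa": race if race in ("WA", "AA") else None,
    }


for label in ("WA", "AA", "Asian", "Hisp/Latina", "Other"):
    print(label, race_groupings(label))
```

Deriving the variables this way makes explicit that the sample size differs between groupings, because groupings 2 and 4 drop patients while groupings 1 and 3 keep the full cohort.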
Collinearity between covariates in the multivariable logistic, multinomial logistic, and Cox proportional hazards regression models will be evaluated before the final multivariable models are formulated. Adjusted regression coefficients and 95% confidence intervals for the patient characteristics of interest will be estimated from the multivariable logistic, multinomial logistic, Cox proportional hazards, and subdistribution hazard regression models. All adjusted regression coefficient estimates from the multivariable models will serve as preliminary (i.e., hypothesis-generating) data for future studies.
All p-values will be two-sided with statistical significance evaluated at the 0.05 alpha level. All analyses will be performed in R Version 4.2.2 (R Foundation for Statistical Computing, Vienna, Austria).