TNBC/TNBC_v3.Rmd

---
title: "Triple Negative Breast Cancer Data Analysis"
author: "Anni Liu"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
knit: knitautomator::knit_filename
output:
  word_document:
    fig_caption: no
    highlight: null
    toc: yes
    reference_docx: manuscript_style_V0.docx
params:
  date.analysis: !r format(Sys.Date(), "%Y%b%d")
  plot.fig: TRUE
  results.folder: FALSE
editor_options: 
  chunk_output_type: console
---

```{r shortcut, include=FALSE}
#################################################################
##                  RStudio keyboard shortcut                  ##
#################################################################
# Cursor at the beginning of a command line: Ctrl + A
# Cursor at the end of a command line: Ctrl + E
# Clear all the code from your console: Ctrl + L
# Create an assignment operator <-: Alt + - (Windows) or Option + - (Mac).
# Create a pipe operator %>%: Ctrl + Shift + M (Windows) or Cmd + Shift + M (Mac)
# Knit a document (knitr): Ctrl + Shift + K (Windows) or Cmd + Shift + K (Mac)
# Comment or uncomment current selection: Ctrl + Shift + C (Windows) or Cmd + Shift + C (Mac)
```


```{r attach_lib_func, include=F}
##------Attach libraries and functions------
easypackages::libraries("multcomp", "BTKR", "readxl", "tidyverse", 
                        "bannerCommenter", "parallel", "formatR",
                        "tidycmprsk", "ggsurvfit", "ragg", "magrittr",
                        "foreach", "future.apply", "fst", "data.table") |> suppressPackageStartupMessages()
"%_%" <- function(m, n) paste0(m, "_", n)
"%0%" <- function(m, n) paste0(m, n)

walk(c("uni.coxph.R", "fcphuni.stat2.R", "fcphuni.tbl2.R"), source)

##------Set the customized theme for all plots------
theme_set(theme_classic())
theme_update(
  legend.position = "bottom",
  strip.background = element_blank(),
  axis.text = element_text(size = 10, color = 'black'),
  axis.title = element_text(size = 12),
  legend.text = element_text(size = 10))
theme.list <- theme_get()
```


```{r global_options, include=F}
#################################################################
##                          Automator                          ##
#################################################################
if (params$plot.fig) {
  dir.fig <- "../report/figs" %_% params$date.analysis %0% "/"
  # Need "/", otherwise, the images are saved directly under the report folder
  
  if (!dir.exists(dir.fig)) { 
    # If the figure directory does not exist, we create a new directory under the folder report using the name figs + current date passed from the params$date.analysis in YAML
    dir.create(dir.fig) 
  }
  
  knitr::opts_chunk$set( # Setting parameters when figures are plotted
    fig.width = 4, fig.height = 4, 
    fig.path = dir.fig, dev = "png", dpi = 300,
    echo = FALSE, warning = FALSE, message = FALSE,
    cache = FALSE,
    comment = ""
  )
} else { # Setting parameters when figures are not plotted
  knitr::opts_chunk$set(
    echo = FALSE, warning = FALSE, message = FALSE,
    cache = FALSE,
    comment = ""
  )
}

if (params$results.folder) { # Suitable when the results need to be stored outside the Microsoft Word report
  dir.result <- "../report/results" %_% params$date.analysis
  
  if (!dir.exists(dir.result)) {
    # If the directory does not exist, we create a new directory under the folder report using the name results + current date passed from the params$date.analysis in YAML 
    
    dir.create(dir.result)
  }
}
```


```{r fast_check_data, include=F}
##------Load preprocessed data------
dat.work <- read_fst(path = "../data/derived/2024Nov22_dat_TNBC.RData")
```


# Data preparation

Among `r count <- sum(dat.work$Overall.Subsequent.BC == "Yes", na.rm = T); count` (`r n <- nrow(dat.work); round(count/n * 100, 2)`%, n = `r n`) patients with subsequent breast cancer events, `r sum(dat.work$Overall.Subsequent.BC == "Yes" & (is.na(dat.work$Date.Subsequent.BC.Event)), na.rm = T)` patients (`r dat.work$ID[which(dat.work$Overall.Subsequent.BC == "Yes" & (is.na(dat.work$Date.Subsequent.BC.Event)))]`) have no recorded date for the subsequent breast cancer event. Among `r count <- sum(dat.work$Overall.Subsequent.BC == "No", na.rm = T); count` (`r round(count/n * 100, 2)`%, n = `r n`) patients without subsequent breast cancer events, `r sum(dat.work$Overall.Subsequent.BC == "No" & (is.na(dat.work$Date.of.Diagnosis)), na.rm = T)` patients (`r dat.work$ID[which(dat.work$Overall.Subsequent.BC == "No" & (is.na(dat.work$Date.of.Diagnosis)))]`) have no recorded date of diagnosis of triple negative breast cancer (TNBC).

In the current analysis, we classify the `r sum(dat.work$Race.Ethnicity == "Arabic/Mideastern", na.rm = T)` patient(s) recorded as Arabic/Mideastern as White American (WA).

The overall subsequent breast cancer event-free survival, represented by the variables `SBE.event` and `SBE.time`, is calculated from the date of TNBC diagnosis to the date of the event of interest. For patients who do not experience the event of interest, it is censored at the date of death or the date of last follow-up, whichever is earlier.
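
As a rough illustration of this definition, the sketch below derives an event-free survival time on toy data. The data frame `toy` and its columns (`event`, `t_event`, `t_death`, `t_lfu`) are hypothetical stand-ins, not the study variables, and the block is shown as a plain code block rather than an evaluated chunk.

```r
# Toy data: all names and values are hypothetical; times are in years
toy <- data.frame(
  id      = c("A", "B", "C", "D"),
  event   = c(1, 0, 0, 0),        # 1 = subsequent breast cancer event observed
  t_event = c(2.5, NA, NA, NA),   # diagnosis to event
  t_death = c(NA, 4.0, NA, 9.0),  # diagnosis to death
  t_lfu   = c(6.0, 7.0, 3.2, 8.0) # diagnosis to last follow-up
)

# Censoring time: the earlier of death and last follow-up, ignoring missing values
t_censor <- pmin(toy$t_death, toy$t_lfu, na.rm = TRUE)

# Event time for patients with the event, censoring time otherwise
toy$time <- ifelse(toy$event == 1, toy$t_event, t_censor)
toy[, c("id", "event", "time")]
```

Note that `pmin(..., na.rm = TRUE)` also returns a time when only one of the two dates is available, which may be more permissive than a rule that requires a last follow-up date.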

The overall subsequent breast cancer events at 3, 5, 10, and 15 years, represented by the variables `SBE.3year`, `SBE.5year`, `SBE.10year`, and `SBE.15year`, are evaluated among patients who develop the event of interest within 3, 5, 10, or 15 years of TNBC diagnosis and patients who are still at risk at the 3rd, 5th, 10th, or 15th year after TNBC diagnosis, respectively. Patients whose follow-up ends before the given year without an event are set to missing for that variable.
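
The landmark coding described above can be sketched on toy data as follows. The helper `landmark_status()` and its arguments are hypothetical names introduced for illustration; the block is a plain code block, not an evaluated chunk.

```r
# Hypothetical helper: 1 = event before `year`; 0 = event-free and still
# followed at `year` (or event after `year`); NA = not evaluable
landmark_status <- function(event, t_event, t_lfu, year) {
  out <- rep(NA_real_, length(event))
  out[which(event == 1 & t_event < year)]  <- 1
  out[which(event == 1 & t_event >= year)] <- 0
  out[which(event == 0 & t_lfu >= year)]   <- 0
  out
}

# Toy data: times in years since diagnosis
event   <- c(1, 1, 0, 0, 1)
t_event <- c(2, 7, NA, NA, NA)
t_lfu   <- c(10, 10, 4, 12, 6)

status_5  <- landmark_status(event, t_event, t_lfu, year = 5)
status_10 <- landmark_status(event, t_event, t_lfu, year = 10)
```

In this toy example, the third patient is not evaluable at 5 years because follow-up ends at 4 years without an event, and the fifth patient is not evaluable because the event date is missing.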

We use the cumulative incidence function, which accounts for the competing risk between the positive marker and the negative marker (e.g., ER positive vs ER negative), to estimate the cumulative incidence rates of ER-specific, PR-specific, HER2-specific, and HR-specific subsequent breast cancer events at 1, 2, 3, 5, 10, and 15 years, respectively. We use the Kaplan-Meier estimator to estimate the incidence rates of the overall subsequent breast cancer events at 1, 2, 3, 5, 10, and 15 years, respectively.
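
To make the relationship between the two estimators concrete, here is a minimal base R sketch on toy data with no tied times. It computes the Kaplan-Meier estimate for any event and the nonparametric cumulative incidence of two competing causes by hand. The data and variable names are hypothetical; the report itself attaches packages such as `tidycmprsk` and `ggsurvfit`, which would normally be used instead of hand calculation.

```r
# Toy data (hypothetical): status 0 = censored, 1 = marker-positive event,
# 2 = marker-negative event; times in years, no ties for simplicity
time   <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 0, 2, 1, 0, 2, 1, 0)

ord    <- order(time)
time   <- time[ord]
status <- status[ord]
n_risk <- length(time):1  # number at risk just before each time

# Kaplan-Meier survival for "any event", and its value just before each time
km      <- cumprod(1 - (status != 0) / n_risk)
km_prev <- c(1, head(km, -1))

# Cumulative incidence of each cause: running sum of S(t-) * d_k / n
cif_pos <- cumsum(km_prev * (status == 1) / n_risk)
cif_neg <- cumsum(km_prev * (status == 2) / n_risk)

round(data.frame(time, km, cif_pos, cif_neg), 3)
```

At every time point the two cause-specific cumulative incidences add up to one minus the Kaplan-Meier estimate for any event, which is why the competing-risk estimates partition the overall incidence.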

The association between a categorical variable and a grouping variable (e.g., `Race.Ethnicity2`) is examined using Fisher's exact test. The difference in the value of a continuous variable among patients of different groups is examined using the Wilcoxon rank sum test (for two-group comparisons) or the Kruskal-Wallis rank sum test (for comparisons of more than two groups). Note that the p-value for the Wilcoxon rank sum test or the Kruskal-Wallis rank sum test is aligned with the median (IQR) summaries in a summary table. For variables with missing values, the difference in the proportion of missingness across groups is also examined using Fisher's exact test.
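
The tests named above are all available in base R's `stats` package. The sketch below applies them to a small made-up data frame; `toy`, `grade`, and `age` are hypothetical names and values, not study data, and the block is a plain code block rather than an evaluated chunk.

```r
# Toy data: values are made up for illustration only
toy <- data.frame(
  group = factor(rep(c("WA", "AA", "Asian"), each = 10)),
  grade = factor(rep(c("II", "III", "III"), times = 10)),
  age   = c(41:50, 46:55, 51:60)
)
toy$age[c(2, 15, 16)] <- NA  # introduce some missing values

# Categorical variable vs. grouping variable: Fisher's exact test
p_cat <- fisher.test(table(toy$grade, toy$group))$p.value

# Continuous variable across more than two groups: Kruskal-Wallis rank sum test
p_kw <- kruskal.test(age ~ group, data = toy)$p.value

# Continuous variable across exactly two groups: Wilcoxon rank sum test
two  <- droplevels(subset(toy, group %in% c("WA", "AA")))
p_wx <- wilcox.test(age ~ group, data = two, exact = FALSE)$p.value

# Proportion of missingness across groups: Fisher's exact test
p_miss <- fisher.test(table(is.na(toy$age), toy$group))$p.value
```

`exact = FALSE` is used here only because the toy ages contain ties; it requests the normal approximation for the Wilcoxon test.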

All p-values are two-sided with statistical significance evaluated at the 0.05 alpha level. All analyses are performed in R Version 4.3.1 (R Foundation for Statistical Computing, Vienna, Austria).


```{r clean_data_start, eval=F}
##------Load the original data------
dat0 <- read_xlsx(
  path = "../data/raw/TNBC WCM 1998-2018_Deidentified_3_3_23.xlsx",
  sheet = "All patients", 
  range = "A1:BQ605",
  na = c("Unknown", "", "NA", "Not applicable ", "Not applicable", "Not aplicable", "n/a", "Unknown grade (not reported or unavailable)", "Unknown (not reported or unavailable)")) %>%
  select(-c("ER expression (%)", "PR expression (%)", "HER2/neu (IHC)", "HER2/neu (FISH)")) |>
  data.frame()


##------Remove 7 patients------
id.rm <- 'ID'%0%c(48, 199, 205, 230, 239, 383, 378)
dat0 <- dat0[!dat0$ID%in%id.rm, ]


##------Fix the column names------
names(dat0) <- sapply(strsplit(names(dat0), split="\\."), function(x)
  paste(x[x!=""], collapse="."))
dat0 <- rename(dat0, ER.stain=X.62, PR.stain=X.64)
dat0 <- as.data.table(dat0)
dat0[,Tumor.Size.by.Path:=Tumor.Size.best.estimate.By.Path.Primary.surg.case]
dat0[,Tumor.Size.by.Imaging:=Tumor.Size.best.estimate.By.Breast.Imaging.NACT.case]
dat0[,Histology.Primary:=Histology.14]
dat0[,Histology.Subsequent:=Histology.60]
dat0[,Laterality.Subsequent:=Laterality.Subsequent.BC.Event]
dat0[,Genetic.Testing.Results:=Genetic.Testing.Result.s]
dat0[,Overall.Subsequent.BC:=Any.Subsequent.BC.event]


##------Recode variables------
dat0[,SBE.ER.cat:=case_when(
  ER.stain%in%c('<1', '0') ~ '<1%',
  ER.stain%in%c('5', '<10', '10') ~ '1-10%',
  ER.stain%in%c('>10', '30', '50') ~ '11-50%', 
  ER.stain%in%c('88', '95', '100') ~ '51-100%', 
  TRUE ~ ER.stain)]
dat0[which(ER.stain=='<10'), ID] # "ID17" "ID31" "ID83"
dat0[which(ER.stain=='>10'), ID] # "ID453"
# Open question: how should the ambiguous values '<10' and '>10' be categorized?

dat0[,SBE.PR.cat:=case_when(
  grepl("^<1|^0$", PR.stain) ~ "<1%", 
  grepl("^>1$|^1-5$|^2$|^<10$|^<5$", PR.stain) ~ "1-10%", 
  grepl("^>10$|^15$|^45$|^50$", PR.stain) ~ "11-50%",  
  grepl("^75$|^95$|^>50$", PR.stain) ~ "51-100%",
  grepl("^unk", PR.stain) ~ NA_character_, # typed NA keeps all case_when() outputs character
  TRUE ~ PR.stain)]
dat0[which(PR.stain=='>1'), ID] # "ID119" "ID2"   "ID85" 
dat0[which(PR.stain=='>10'), ID] # "ID453" "ID58" 
# Open question: how should the ambiguous values '>1' and '>10' be categorized?

dat0[,SBE.ER.cat.2:=case_when(
  SBE.ER.cat=='<1%' ~ '1',
  SBE.ER.cat=='1-10%' ~ '2',
  SBE.ER.cat=='11-50%' ~ '3', 
  SBE.ER.cat=='51-100%' ~ '4', 
  TRUE ~ SBE.ER.cat) |> as.numeric()]

dat0[,SBE.PR.cat.2:=case_when(
  SBE.PR.cat=='<1%' ~ '1',
  SBE.PR.cat=='1-10%' ~ '2',
  SBE.PR.cat=='11-50%' ~ '3', 
  SBE.PR.cat=='51-100%' ~ '4', 
  TRUE ~ SBE.PR.cat)|> as.numeric()]

dat0[,SBE.HR.cat:=case_when(
  (!is.na(SBE.ER.cat)) & !is.na(SBE.PR.cat) & (SBE.ER.cat.2 >= SBE.PR.cat.2) ~ SBE.ER.cat,
  (!is.na(SBE.ER.cat)) & !is.na(SBE.PR.cat) & (SBE.ER.cat.2 < SBE.PR.cat.2) ~ SBE.PR.cat,
  (!is.na(SBE.ER.cat)) & is.na(SBE.PR.cat) ~ SBE.ER.cat,
  is.na(SBE.ER.cat) & (!is.na(SBE.PR.cat)) ~ SBE.PR.cat,
  is.na(SBE.ER.cat) & is.na(SBE.PR.cat) ~ NA_character_
)]

dat0[,Tumor.Size.by.Path:=case_when(Tumor.Size.by.Path == "<0.1" ~ "0.05", Tumor.Size.by.Path == "<1" ~ "0.9", TRUE ~ Tumor.Size.by.Path) |> as.numeric()]
# typeof(dat0$Tumor.Size.by.Path)

dat0[, Genetic.Testing.Results:=case_when(Genetic.Testing.Results == "BRCA positive" ~ "BRCA",
                                          ###BRCA1
                                          Genetic.Testing.Results == "5385incC BRCA1 deleterious" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 +" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 del exons 21-24" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 positive" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 positive  182 del AG" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 positive  187 dell AG mutation" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 postitive" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1 postitive." ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA 1+   4160 DelAG" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA1 gene mutation" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA1 positive" ~ "BRCA1",
                                          Genetic.Testing.Results == "BRCA1 positive (187delAG)" ~ "BRCA1",
                                          Genetic.Testing.Results == "Negative for BRCA1 and BRCA2 in 2007 - retested in 2017 and found to be positive for BRCA1" ~ "BRCA1",
                                          Genetic.Testing.Results == "Positive for BRCA1" ~ "BRCA1",
                                          Genetic.Testing.Results == "Positive for BRCA1 mutation" ~ "BRCA1",
                                          ###BRCA1 VUS
                                          Genetic.Testing.Results == "Genetic variant of BRCA 1 of unknown significance" ~ "BRCA1 VUS",
                                          ###BRCA1/2
                                          Genetic.Testing.Results == "BRCA 1/2 positive" ~ "BRCA1/2",
                                          ###BRCA1/2 negative
                                          Genetic.Testing.Results == "BRCA 1-2, ATM, CDH1, CHEK2, PALB2, PTEN, TP53 were all negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 and BART negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "brca 1/2 negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 Negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 NEGATIVE" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 Negative  25 gene susceptibility mutations negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 negative  Negative 40 genes" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA 1/2 negative  PTEN variant of unknown significance" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "brca negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA negative but (+)APC mutation" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA. negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA1 mutation carrier" ~ "BRCA1/2 negative", # NOTE: verify this mapping; "mutation carrier" suggests BRCA1 positive
                                          Genetic.Testing.Results == "BRCA1 negative; BRCA2 polymorphism" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA1 PMUS" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA1/2 and CHEK2 negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "BRCA1/2 negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "MYD88 not detected" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "Negative" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "Negative for BRCA1 and BRCA2" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "Negative for BRCA1/2" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "Negative per patient" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "no BRCA detected" ~ "BRCA1/2 negative",
                                          Genetic.Testing.Results == "No mutation detected" ~ "BRCA1/2 negative",
                                          ###BRCA2
                                          Genetic.Testing.Results == "BRCA 2 mutation" ~ "BRCA2",
                                          Genetic.Testing.Results == "BRCA 2 mutation c.1854delCinsAA" ~ "BRCA2",
                                          Genetic.Testing.Results == "BRCA 2 positive" ~ "BRCA2",
                                          Genetic.Testing.Results == "BRCA2 mutation (6174 del T)" ~ "BRCA2",
                                          ###BRCA2 VUS
                                          Genetic.Testing.Results == "BRCA 2 mutation of unknown significance" ~ "BRCA2 VUS",
                                          Genetic.Testing.Results == "BRCA 2 of unknown significance" ~ "BRCA2 VUS",
                                          Genetic.Testing.Results == "BRCA 2 VUS" ~ "BRCA2 VUS",
                                          Genetic.Testing.Results == "Variant in BRCA2 known as I2285V" ~ "BRCA2 VUS",
                                          Genetic.Testing.Results == "VUS in BRCA2" ~ "BRCA2 VUS",
                                          ###CHEK VUS
                                          Genetic.Testing.Results == "Low penetrance CHEK  Variants of uncertain significance RB1, SMAD4, TSC2" ~ "CHEK VUS",
                                          ###CHEK2
                                          Genetic.Testing.Results == "positive for a mutation of low penetrance in the CHEK2 gene" ~ "CHEK2",
                                          ###CHEK2 VUS
                                          Genetic.Testing.Results == "26 genes on 10/25/18 revealed possibly mosaic, likely pathogenic variant identified in CHEK2. Variants of Uncertain Significance were identified in the genes APC and MSH6 (APC gene c.2110G>A and MSH6 gene c.965C>A)." ~ "CHEK2 VUS",
                                          ###Unknown
                                          Genetic.Testing.Results == "Ashkenazi Jewish, no breast, ovarian, or colorectal cancers, mat cousin leukemia, father prostate ca" ~ "Unknown",
                                          TRUE ~ Genetic.Testing.Results)]

##------Reorder variables------
dat0[,First.Treatment.Modality:=factor(First.Treatment.Modality, levels=c("Surgery", "CTX", "XRT", "Declined All Tx", "Deemed Untreatable"))]
dat0[,Postop.Adjuvant.XRT:=factor(Postop.Adjuvant.XRT, levels=c("Breast XRT", "Breast/Regional XRT", "PMRT", "No"))]
# sapply(c("First.Treatment.Modality", "Postop.Adjuvant.XRT"), function(x) levels(dat0[[x]]))
```

```{r clean_data_continue, eval=F}
##------Recode survival related variables------
##------Create HR------
dat0[,HR:=if_else(condition=(ER=='Positive'|PR=='Positive'), 'Positive', 'Negative', missing=NA_character_)]

##------Create T2SBE/T2Death/T2LFU/T2ER/T2PR/T2HER2/T2HR (in years)------
dat0[,T2SBE:=as.numeric(as.Date(Date.Subsequent.BC.Event)-as.Date(Date.of.Diagnosis))/365.25]
dat0[,T2Death:=as.numeric(as.Date(Date.Death)-as.Date(Date.of.Diagnosis))/365.25]
sum(!is.na(dat0$Date.Death))/length(dat0$Date.Death) # 7.54%
dat0[,T2LFU:=as.numeric(as.Date(Date.Last.Follow.up)-as.Date(Date.of.Diagnosis))/365.25]
dat0[ER=="Positive",T2ER:=as.numeric(as.Date(Date.Subsequent.BC.Event)-as.Date(Date.of.Diagnosis))/365.25]
dat0[PR=="Positive",T2PR:=as.numeric(as.Date(Date.Subsequent.BC.Event)-as.Date(Date.of.Diagnosis))/365.25]
dat0[HER2=="Positive",T2HER2:=as.numeric(as.Date(Date.Subsequent.BC.Event)-as.Date(Date.of.Diagnosis))/365.25]
dat0[HR=="Positive",T2HR:=as.numeric(as.Date(Date.Subsequent.BC.Event)-as.Date(Date.of.Diagnosis))/365.25]
dat0[HR=="Negative",T2HR_Neg:=as.numeric(as.Date(Date.Subsequent.BC.Event)-as.Date(Date.of.Diagnosis))/365.25]
# (sapply(c("T2SBE", "T2Death", "T2LFU", "T2ER", "T2PR", "T2HER2", "T2HR", "T2HR_Neg"), function(x) with(dat0, summary(get(x)))))
#               T2SBE     T2Death     T2LFU       T2ER       T2PR      T2HER2       T2HR
# Min.      0.1943874   0.2902122  0.000000   1.023956   1.859001   0.9144422   1.023956
# 1st Qu.   1.3018480   1.4688569  2.224504   2.648871   3.347707   1.5865845   2.785079
# Median    2.7488022   2.5598905  5.919233   3.644079   4.481862   2.2587269   3.760438
# Mean      4.5239847   3.9947029  6.871188   5.719370   6.950947   2.2587269   5.761514
# 3rd Qu.   5.3408624   5.9739904 10.603012   9.036277  12.196441   2.9308693   9.036277
# Max.     34.7542779  11.9069131 37.160849  13.412731  13.412731   3.6030116  13.412731
# NA's    508.0000000 558.0000000  8.000000 580.000000 586.000000 602.0000000 576.000000

##------Create SBE.3year, SBE.5year, SBE.10year, SBE.15year (binary variables, for univariate analysis) ------
# When to use which()?
# with(dat0, Overall.Subsequent.BC=="Yes"&T2SBE<3) |> is.na() |> sum()
# with(dat0, Overall.Subsequent.BC=="No"&T2LFU<3) |> is.na() |> sum()

# with(dat0, table(paste(Overall.Subsequent.BC, 
#                        ifelse(is.na(T2SBE), "noT2SBE", "T2SBE"), 
#                        ifelse(is.na(T2Death), "noT2Death", "T2Death"), 
#                        ifelse(is.na(T2LFU), "noT2LFU", "T2LFU"), 
#                        sep = ":")))

#   NA:noT2SBE:noT2Death:T2LFU     NA:noT2SBE:T2Death:T2LFU 
#                            4*                            1* 
# No:noT2SBE:noT2Death:noT2LFU   No:noT2SBE:noT2Death:T2LFU 
#                            8*                          460 
#     No:noT2SBE:T2Death:T2LFU  Yes:noT2SBE:noT2Death:T2LFU 
#                           33                            2* 
#    Yes:T2SBE:noT2Death:T2LFU      Yes:T2SBE:T2Death:T2LFU 
#                           84                           12 

dat0[, `:=`(SBE.3year = 0, SBE.5year = 0, SBE.10year = 0, SBE.15year = 0)]

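# Define a function to update the SBE.<year>year column (modifies dat0 by reference)
update_SBE_year <- function(year) {
  # Event observed before the landmark year: code as 1
  dat0[Overall.Subsequent.BC == "Yes" & T2SBE < year, paste0("SBE.", year, "year") := 1]
  # Event reported but the event date is missing: not evaluable
  dat0[Overall.Subsequent.BC == "Yes" & is.na(T2SBE), paste0("SBE.", year, "year") := NA]
  # No event, but follow-up ends before the landmark year: not at risk at that year
  dat0[Overall.Subsequent.BC == "No" & T2LFU < year, paste0("SBE.", year, "year") := NA]
  # No event and no follow-up time: not evaluable
  dat0[Overall.Subsequent.BC == "No" & is.na(T2LFU), paste0("SBE.", year, "year") := NA]
  # Event status unknown: not evaluable
  dat0[is.na(Overall.Subsequent.BC), paste0("SBE.", year, "year") := NA]
}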

years <- c(3, 5, 10, 15)
for (year in years) {
  update_SBE_year(year)
}

# Test the function
# dat0[,SBE.3year.2:=rep(0,dim(dat0)[1])]
# dat0[which(Overall.Subsequent.BC=="Yes"&T2SBE<3),SBE.3year.2:=1]
# dat0[which(Overall.Subsequent.BC=="Yes"&is.na(T2SBE)),SBE.3year.2:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&T2LFU<3),SBE.3year.2:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&is.na(T2LFU)),SBE.3year.2:=NA]
# dat0[which(is.na(Overall.Subsequent.BC)),SBE.3year.2:=NA]
# 
# any(dat0$SBE.3year != dat0$SBE.3year.2, na.rm = T)
# 
# dat0[,SBE.5year:=rep(0,dim(dat0)[1])]
# dat0[which(Overall.Subsequent.BC=="Yes"&T2SBE<5),SBE.5year:=1]
# dat0[which(Overall.Subsequent.BC=="Yes"&is.na(T2SBE)),SBE.5year:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&T2LFU<5),SBE.5year:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&is.na(T2LFU)),SBE.5year:=NA]
# dat0[which(is.na(Overall.Subsequent.BC)),SBE.5year:=NA]
# 
# dat0[,SBE.10year:=rep(0,dim(dat0)[1])]
# dat0[which(Overall.Subsequent.BC=="Yes"&T2SBE<10),SBE.10year:=1]
# dat0[which(Overall.Subsequent.BC=="Yes"&is.na(T2SBE)),SBE.10year:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&T2LFU<10),SBE.10year:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&is.na(T2LFU)),SBE.10year:=NA]
# dat0[which(is.na(Overall.Subsequent.BC)),SBE.10year:=NA]
# 
# dat0[,SBE.15year:=rep(0,dim(dat0)[1])]
# dat0[which(Overall.Subsequent.BC=="Yes"&T2SBE<15),SBE.15year:=1]
# dat0[which(Overall.Subsequent.BC=="Yes"&is.na(T2SBE)),SBE.15year:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&T2LFU<15),SBE.15year:=NA]
# dat0[which(Overall.Subsequent.BC=="No"&is.na(T2LFU)),SBE.15year:=NA]
# dat0[which(is.na(Overall.Subsequent.BC)),SBE.15year:=NA]

# sapply(c("SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year"), function(x) with(dat0, table(get(x), useNA = "ifany")))
#      SBE.3year SBE.5year SBE.10year SBE.15year
# 0          385       313        146         39
# 1           51        71         82         94
# <NA>       168       220        376        471


# with(dat0, table(paste(Overall.Subsequent.BC, HR,
#                        ifelse(is.na(T2SBE), "noT2SBE", "T2SBE"),
#                        sep = ":")))
#      NA:NA:noT2SBE        No:NA:noT2SBE       Yes:NA:noT2SBE 
#                  5                  501                    1 
#       Yes:NA:T2SBE   Yes:Negative:T2SBE Yes:Positive:noT2SBE 
#                  3                   65                    1 
# Yes:Positive:T2SBE 
#                 28 

dat0[, `:=`(SBE.HR.3year = 0, SBE.HR.5year = 0, SBE.HR.10year = 0, SBE.HR.15year = 0)]
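update_SBE_HR_year <- function(year) {
  # HR-positive event observed before the landmark year: code as 1
  dat0[Overall.Subsequent.BC == "Yes" & HR == "Positive" & T2SBE < year, paste0("SBE.HR.", year, "year") := 1]
  # HR-positive event reported but the event date is missing: not evaluable
  dat0[Overall.Subsequent.BC == "Yes" & HR == "Positive" & is.na(T2SBE), paste0("SBE.HR.", year, "year") := NA]
  # HR-negative event before the landmark year (or with a missing date): excluded
  dat0[Overall.Subsequent.BC == "Yes" & HR == "Negative" & (T2SBE < year | is.na(T2SBE)), paste0("SBE.HR.", year, "year") := NA]
  # No event, and follow-up ends before the landmark year or is missing: not at risk at that year
  dat0[Overall.Subsequent.BC == "No" & HR %in% c("Negative", "Positive") & (T2LFU < year | is.na(T2LFU)), paste0("SBE.HR.", year, "year") := NA]
  # Event status or HR status unknown: not evaluable
  dat0[is.na(Overall.Subsequent.BC) | is.na(HR), paste0("SBE.HR.", year, "year") := NA]
}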

years <- c(3, 5, 10, 15)
for (year in years) {
  update_SBE_HR_year(year)
}

sapply(c("SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year"), function(x) with(dat0, table(get(x), useNA = "ifany")))
#      SBE.HR.3year SBE.HR.5year SBE.HR.10year SBE.HR.15year
# 0              45           25            14             2
# 1               9           19            22            28
# <NA>          550          560           568           574


##------Create SBE.event/SBE.time (for Kaplan-Meier estimator and Cox PH model)------

# with(dat0, table(paste(Overall.Subsequent.BC, 
#                        ifelse(is.na(T2SBE), "noT2SBE", "T2SBE"), 
#                        ifelse(is.na(T2Death), "noT2Death", "T2Death"), 
#                        ifelse(is.na(T2LFU), "noT2LFU", "T2LFU"), 
#                        sep = ":")))

#   NA:noT2SBE:noT2Death:T2LFU     NA:noT2SBE:T2Death:T2LFU 
#                            4*                            1* 
# No:noT2SBE:noT2Death:noT2LFU   No:noT2SBE:noT2Death:T2LFU 
#                            8*                          460 
#     No:noT2SBE:T2Death:T2LFU  Yes:noT2SBE:noT2Death:T2LFU 
#                           33                            2* 
#    Yes:T2SBE:noT2Death:T2LFU      Yes:T2SBE:T2Death:T2LFU 
#                           84                           12 

# dat0[SBE.event==0, T2Death <= T2LFU]
# dat0[SBE.event==1, T2SBE <= T2LFU]

# Initialize as numeric NA so that := keeps the 0/1 coding without a logical-to-numeric conversion
dat0[, SBE.event := NA_real_]
dat0[Overall.Subsequent.BC=="Yes"&(!is.na(T2SBE)), SBE.event := 1]
dat0[Overall.Subsequent.BC=="No"&((!is.na(T2LFU)) | (!is.na(T2Death))), SBE.event := 0]
# SBE.event should be shown as a categorical variable in the summary table 1

dat0$SBE.time <- rep(NA, dim(dat0)[1])
dat0[,SBE.time:=case_when(SBE.event==0&is.na(T2Death)&(!is.na(T2LFU)) ~ T2LFU,
                          SBE.event==0&(!is.na(T2Death))&(!is.na(T2LFU))&(T2Death<T2LFU) ~ T2Death,
                          SBE.event==0&(!is.na(T2Death))&(!is.na(T2LFU))&(T2Death>=T2LFU) ~ T2LFU,
                          SBE.event==1 ~ T2SBE)]
sum(dat0$SBE.time == dat0$T2Death, na.rm = T) # 30
dat0$ID[which(dat0$SBE.time == dat0$T2Death)]
# summary(dat0$SBE.time)


##------Create SBE.ER/SBE.PR/SBE.HER2/SBE.HR (for cumulative incidence function)------
# dat0[,Death:=ifelse(!is.na(Date.Death),"Dead","Alive")]
dat0$SBE.ER = dat0$SBE.PR = dat0$SBE.HER2 = dat0$SBE.HR = rep(NA, dim(dat0)[1])

# lapply(c("ER", "PR", "HER2", "HR"), function(x) with(dat0, table(paste(Overall.Subsequent.BC, get(x), ifelse(is.na(T2SBE), "noT2SBE", "T2SBE"), ifelse(is.na(T2LFU), "noT2LFU", "T2LFU"), sep = ":"))))

# lapply(c("ER", "PR", "HER2"), function(x) with(dat0, table(paste(Overall.Subsequent.BC, get(x), ifelse(is.na(T2SBE), "noT2SBE", "T2SBE"), ifelse(is.na(T2LFU), "noT2LFU", "T2LFU"), sep = ":"))))
# [[1]]
# 
#        NA:NA:noT2SBE:T2LFU      No:NA:noT2SBE:noT2LFU 
#                          5*                          8* 
#        No:NA:noT2SBE:T2LFU       Yes:NA:noT2SBE:T2LFU 
#                        493                          1* 
#         Yes:NA:T2SBE:T2LFU   Yes:Negative:T2SBE:T2LFU 
#                          3*                         69 
# Yes:Positive:noT2SBE:T2LFU   Yes:Positive:T2SBE:T2LFU 
#                          1*                         24 
# 
# [[2]]
# 
#        NA:NA:noT2SBE:T2LFU      No:NA:noT2SBE:noT2LFU 
#                          5*                          8* 
#        No:NA:noT2SBE:T2LFU       Yes:NA:noT2SBE:T2LFU 
#                        493                          1* 
#         Yes:NA:T2SBE:T2LFU Yes:Negative:noT2SBE:T2LFU 
#                          3*                          1* 
#   Yes:Negative:T2SBE:T2LFU   Yes:Positive:T2SBE:T2LFU 
#                         75                         18 
# 
# [[3]]
# 
#        NA:NA:noT2SBE:T2LFU      No:NA:noT2SBE:noT2LFU 
#                          5*                          8* 
#        No:NA:noT2SBE:T2LFU       Yes:NA:noT2SBE:T2LFU 
#                        493                          1* 
#         Yes:NA:T2SBE:T2LFU Yes:Negative:noT2SBE:T2LFU 
#                         12*                          1* 
#   Yes:Negative:T2SBE:T2LFU   Yes:Positive:T2SBE:T2LFU 
#                         82                          2 

dat0[,SBE.ER:=case_when(SBE.event==1&ER=="Positive" ~ "SBEYesERPos",
                        SBE.event==1&ER=="Negative" ~ "SBEYesERNeg",
                        SBE.event==0&(!is.na(T2LFU)) ~ "censor") |> 
       factor(levels=c("censor","SBEYesERPos","SBEYesERNeg"))]

dat0[,SBE.PR:=case_when(SBE.event==1&PR=="Positive" ~ "SBEYesPRPos",
                        SBE.event==1&PR=="Negative" ~ "SBEYesPRNeg",
                        SBE.event==0&(!is.na(T2LFU)) ~ "censor") |> 
       factor(levels=c("censor","SBEYesPRPos","SBEYesPRNeg"))]

dat0[,SBE.HER2:=case_when(SBE.event==1&HER2=="Positive" ~ "SBEYesHER2Pos",
                          SBE.event==1&HER2=="Negative" ~ "SBEYesHER2Neg",
                          SBE.event==0&(!is.na(T2LFU)) ~ "censor") |> 
       factor(levels=c("censor","SBEYesHER2Pos","SBEYesHER2Neg"))]

dat0[,SBE.HR:=case_when(SBE.event==1&HR=="Positive" ~ "SBEYesHRPos",
                        SBE.event==1&HR=="Negative" ~ "SBEYesHRNeg",
                        SBE.event==0&(!is.na(T2LFU)) ~ "censor") |> 
       factor(levels=c("censor","SBEYesHRPos","SBEYesHRNeg"))]

# (sapply(c("SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR"), function(x) with(dat0, table(get(x), useNA = "ifany"))))
#             SBE.ER SBE.PR SBE.HER2 SBE.HR
# censor         493    493      493    493
# SBEYesERPos     24     18        2     28
# SBEYesERNeg     69     75       82     65
# <NA>            18     18       27     18


##------Create SBE.ER.time/SBE.PR.time/SBE.HER2.time/SBE.HR.time (for cumulative incidence function)------
# When to use which()?
# with(dat0, SBE.event==1&(is.na(ER)))

# Recap how to code SBE.time
# dat0$SBE.time <- rep(NA, dim(dat0)[1])
# dat0[,SBE.time:=case_when(SBE.event==0&is.na(T2Death)&(!is.na(T2LFU)) ~ T2LFU,
#                           SBE.event==0&(!is.na(T2Death))&(!is.na(T2LFU))&(T2Death<T2LFU) ~ T2Death,
#                           SBE.event==0&(!is.na(T2Death))&(!is.na(T2LFU))&(T2Death>=T2LFU) ~ T2LFU,
#                           SBE.event==1 ~ T2SBE)]

dat0[,SBE.ER.time:=SBE.time]
dat0[which(SBE.event==1&(is.na(ER))),SBE.ER.time:=NA]
# sum(is.na(dat0$SBE.ER.time))
                          
dat0[,SBE.PR.time:=SBE.time]
dat0[which(SBE.event==1&(is.na(PR))),SBE.PR.time:=NA]

dat0[,SBE.HER2.time:=SBE.time]
dat0[which(SBE.event==1&(is.na(HER2))),SBE.HER2.time:=NA]

dat0[,SBE.HR.time:=SBE.time]
dat0[which(SBE.event==1&(is.na(HR))),SBE.HR.time:=NA]

dat0$ID[which(dat0$SBE.ER.time == dat0$T2Death)]
dat0$ID[which(dat0$SBE.PR.time == dat0$T2Death)]
dat0$ID[which(dat0$SBE.HER2.time == dat0$T2Death)]
dat0$ID[which(dat0$SBE.HR.time == dat0$T2Death)]
dat0$ER[which(dat0$SBE.ER.time == dat0$T2Death)]
dat0$PR[which(dat0$SBE.PR.time == dat0$T2Death)]
dat0$HER2[which(dat0$SBE.HER2.time == dat0$T2Death)]
dat0$HR[which(dat0$SBE.HR.time == dat0$T2Death)]


sapply(c("SBE.ER", "SBE.ER.time", "SBE.PR", "SBE.PR.time", "SBE.HER2", "SBE.HER2.time", "SBE.HR", "SBE.HR.time"), function(x) with(dat0, sum(!is.na(get(x)))))
# SBE.ER   SBE.ER.time        SBE.PR   SBE.PR.time      SBE.HER2 SBE.HER2.time        SBE.HR   SBE.HR.time 
#    586           586           586           586           577           577           586           586 

# sapply(c("SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time"), function(x) with(dat0, summary(get(x))))
#         SBE.ER.time SBE.PR.time SBE.HER2.time SBE.HR.time
# Min.       0.000000    0.000000      0.000000    0.000000
# 1st Qu.    2.023956    2.023956      2.023272    2.023956
# Median     5.446954    5.446954      5.445585    5.446954
# Mean       6.413999    6.413999      6.413774    6.413999
# 3rd Qu.    9.971253    9.971253     10.028747    9.971253
# Max.      34.754278   34.754278     34.754278   34.754278
# NA's      18.000000   18.000000     27.000000   18.000000


##------Re-categorize clinical T/N stages------
dat0[,Clinical.T.Stage2:=gsub("T1a|T1b|T1c|T2","T1|T2",
                             gsub("T3|T4","T3|T4",Clinical.T.Stage)) |>
  factor(levels=c("T1|T2","T3|T4"))]
dat0[,Clinical.N.Stage2:=gsub("N1|N2|N3","N1|N2|N3",Clinical.N.Stage) |>
  factor(levels=c("N0","N1|N2|N3"))]


##------Re-categorize race/ethnicity------
##------5 categories------
dat0[,Race.Ethnicity2:=gsub(".*Asian.*","Asian",
                            gsub("^Hisp.*","Hisp/Latina",
                                 gsub("NHW|Arabic.*","WA",
                                      gsub("^American.*|^Pacific.*|^Other$","Other",Race.Ethnicity)))) |> factor(levels=c("WA","AA","Asian","Hisp/Latina","Other"))]
##------4 categories------
dat0.race1 <- dat0[Race.Ethnicity2!="Other"]
dat0.race1$Race.Ethnicity2 <- droplevels(dat0.race1$Race.Ethnicity2)
# table(dat0.race1$Race.Ethnicity2, useNA = "ifany")
##------2 categories: WA vs. all others------
dat0[,Race.Ethnicity3:=ifelse(Race.Ethnicity2=="WA","WA","All.Others") |> factor(levels=c("WA","All.Others"))]
##------2 categories: WA vs. AA (subset)------
dat0.race2 <- dat0[Race.Ethnicity2%in%c("WA","AA")]
dat0.race2$Race.Ethnicity2 <- droplevels(dat0.race2$Race.Ethnicity2)
# table(dat0.race2$Race.Ethnicity2, useNA = "ifany")


##------Check the categorical variables------
all.char <- sapply(names(dat0)[sapply(dat0, is.character)], function(x) with(dat0, table(get(x), useNA = "ifany"))) 
all.fac <- sapply(names(dat0)[sapply(dat0, is.factor)], function(x) with(dat0, table(get(x), useNA = "ifany"))) 
all.cat <- list(all.char, all.fac)
if (interactive()) View(all.cat) # View() needs an interactive session, so skip it when knitting
```
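
The clinical T-stage re-categorization above relies on nested `gsub()` calls, which replace matching substrings and leave every other label to become `NA` in `factor()`. As a minimal sketch on made-up stage labels, the two-level grouping could also be written with prefix matching. The helper name `collapse_t_stage` and the toy values are assumptions, and this variant is not equivalent to the chunk above: it would also assign labels such as a plain `T1` or a `T4b` to a group.

```r
# Hypothetical helper: collapse clinical T stages into two groups by prefix.
collapse_t_stage <- function(stage) {
  grouped <- rep(NA_character_, length(stage))
  grouped[grepl("^T[12]", stage)] <- "T1|T2"   # T1, T1a, T1b, T1c, T2, ...
  grouped[grepl("^T[34]", stage)] <- "T3|T4"   # T3, T4, T4b, ...
  factor(grouped, levels = c("T1|T2", "T3|T4"))
}

# Toy labels (made up); "TX" and NA fall outside both groups and stay NA.
toy.stage <- c("T1a", "T1c", "T2", "T3", "T4", "TX", NA)
toy.stage2 <- collapse_t_stage(toy.stage)
table(toy.stage, toy.stage2, useNA = "ifany")
```

Cross-tabulating the original against the collapsed variable, as in the last line, is a quick way to confirm that no stage label was silently dropped.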


```{r save_data, eval=F}
##------Save the cleaned data------
date.analysis <- format(Sys.Date(), "%Y%b%d")
write_fst(dat0, path = paste0("../data/derived/", date.analysis, "_dat_TNBC.RData"), compress = 50)
fwrite(dat0, file = paste0("../data/derived/", date.analysis, "_dat_TNBC.csv"))

for (i in 1:2) {
  race.i.file <- sprintf("dat0.race%s", i)
  write_fst(get(race.i.file), path = paste0("../data/derived/", date.analysis, "_dat_", race.i.file, "_TNBC.RData"), compress = 50)
}
```


```{r}
dat.work <- read_fst(path = "../data/derived/2024Nov22_dat_TNBC.RData")
```


## Overall distributions of patient variables
```{r set_variable_0, results="hide"}
vars.all <- c(
  ###  Demographic variables
  "Age.at.Diagnosis", 
  "Race.Ethnicity2", "Race.Ethnicity3", 
  
  ###  Disease history variables  
  "Clinical.T.Stage", "Clinical.T.Stage2", "Clinical.N.Stage", "Clinical.N.Stage2", 
  "Index.Tumor.Status",
  "Past.Ipsilateral.Br.CA", "Past.Contralateral.Br.CA",
  "Tumor.Size.by.Imaging",
  "Needle.biopsy.proven.nodal.metastases.at.Dx",
  "Histology.Primary", "Laterality.Primary", "Any.high.grade.Disease", "Any.LVI",

  ### Screening/diagnosis/treatment variables
  "Mammo.Screen.Detected", "Mammo.Occult", "MRI.Screen.Detected", 
  "First.Treatment.Modality", "Any.Attempt.at.Lumpectomy", "Mastectomy.Surgery",
  "CPM", "Any.SLN.Biopsy", "ALND", # CPM: contralateral prophylactic mastectomy
  "If.Primary.Surgery.pT.Stage", "If.Primary.Surgery.pN.Stage",
  "Neoadjuvant.Chemotherapy",
  "In.NACT.did.pt.complete.all.planned.preop.CTX",
  "If.NACT.CTX.components.check.all.delivered.choice.A",
  "If.NACT.CTX.components.check.all.delivered.choice.T",
  "If.NACT.CTX.components.check.all.delivered.choice.C",
  "If.NACT.preop.Herceptin", "If.NACT.preop.Perjeta",
  "If.NACT.ypT.Stage", "If.NACT.ypN.Stage", "if.NACT.pCR", # pCR
  "If.NACT.postop.Capecitabine", "If.NACT.postop.Kadcyla.TDM.1",
  "Postop.Adjuvant.CTX",
  "If.Postop.Adjuvant.CTX.did.pt.complete.all.planned.postop.CTX",
  "Postop.Adjuvant.XRT",
  "Genetic.Testing.Results",
  "Genetic.Testing.Done",
  
  ### Outcome-related variables
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER.cat", "SBE.PR.cat", "SBE.HR.cat",
  "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time" # Time variables for survival analysis
)

# Create an indicator [0, 1] for all categorical variables 
vars.cat <- rep(x = 1, times = length(vars.all))
vars.cat[vars.all %in% names(dat.work)[sapply(dat.work, is.numeric)]] <- 0
vars.cat[vars.all %in% c("SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year", "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year")] <- 1
```
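
The indicator logic in the chunk above can be checked on a small example. The following sketch uses a made-up data frame (the names `toy`, `toy.vars`, and `toy.cat` are illustrative only) and follows the same three steps: flag everything as categorical, reset numeric columns to continuous, then restore the 0/1-coded indicators.

```r
# Toy data frame; column names and values are illustrative only.
toy <- data.frame(
  age   = c(51, 64, 47),
  stage = c("T1", "T2", "T1"),
  event = c(0, 1, 0)
)
toy.vars <- c("age", "stage", "event")

# Step 1: start with every variable flagged as categorical (1).
toy.cat <- rep(1, length(toy.vars))
# Step 2: set numeric columns to continuous (0).
toy.cat[toy.vars %in% names(toy)[sapply(toy, is.numeric)]] <- 0
# Step 3: restore 0/1-coded indicators, which are stored as numeric but are categorical.
toy.cat[toy.vars %in% c("event")] <- 1

setNames(toy.cat, toy.vars)
```

Step 3 matters because a binary outcome stored as 0/1 would otherwise be summarized as a continuous variable.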


```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work,
                    vars = vars.all,
                    vars.cat = vars.cat,
                    by = NULL)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Distributions of variables in patients with HR-positivity
```{r}
dat.work.hrpos <- subset(x = dat.work, subset = HR == "Positive")
```


```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work.hrpos,
                    vars = vars.all,
                    vars.cat = vars.cat,
                    by = NULL)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```

# Distributions of patients manifesting the ER-positive/PR-positive/HER2-positive/HR-positive subsequent breast cancer events
Among `r sum(dat.work$Overall.Subsequent.BC == "Yes" & (!is.na(dat.work$Date.Subsequent.BC.Event)) & (!is.na(dat.work$Date.of.Diagnosis)), na.rm = T)` patients who were diagnosed with a subsequent breast cancer event and who have both the date of initial TNBC diagnosis and the date of the subsequent event recorded during follow-up:

* `r round(sum(dat.work$SBE.event == 1 & dat.work$ER == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)`% (n = `r sum(dat.work$SBE.event == 1 & dat.work$ER == "Positive", na.rm = T)`) of patients had an ER-positive subsequent breast cancer event;

* `r round(sum(dat.work$SBE.event == 1 & dat.work$PR == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)`% (n = `r sum(dat.work$SBE.event == 1 & dat.work$PR == "Positive", na.rm = T)`) of patients had a PR-positive subsequent breast cancer event;

* `r round(sum(dat.work$SBE.event == 1 & dat.work$HER2 == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)`% (n = `r sum(dat.work$SBE.event == 1 & dat.work$HER2 == "Positive", na.rm = T)`) of patients had a HER2-positive subsequent breast cancer event;

* `r round(sum(dat.work$SBE.event == 1 & dat.work$HR == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)`% (n = `r sum(dat.work$SBE.event == 1 & dat.work$HR == "Positive", na.rm = T)`) of patients had an HR-positive subsequent breast cancer event.


```{r eval=F, include=F}
round(sum(dat.work$SBE.event == 1 & dat.work$ER == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)

round(sum(dat.work$SBE.event == 1 & dat.work$PR == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)

round(sum(dat.work$SBE.event == 1 & dat.work$HER2 == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)

round(sum(dat.work$SBE.event == 1 & dat.work$HR == "Positive", na.rm = T)/sum(dat.work$SBE.event == 1, na.rm = T) * 100, 2)
```
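
The same count-and-percentage expression is repeated for each marker in this section. A minimal sketch of how it could be wrapped in one function is shown below; the helper name `pct_positive` and the toy vectors are assumptions made for illustration and are not part of the analysis.

```r
# Hypothetical helper: number and percentage of patients with an event
# whose subsequent tumor is positive for a given marker.
pct_positive <- function(event, marker, digits = 2) {
  n.pos <- sum(event == 1 & marker == "Positive", na.rm = TRUE)
  n.event <- sum(event == 1, na.rm = TRUE)
  c(n = n.pos, pct = round(n.pos / n.event * 100, digits))
}

# Toy vectors (made up), including missing values in both inputs.
toy.event  <- c(1, 1, 1, 0, 1, NA)
toy.marker <- c("Positive", "Negative", NA, NA, "Positive", "Positive")
toy.res <- pct_positive(toy.event, toy.marker)
toy.res
```

As in the inline code, events with a missing marker status stay in the denominator, so the percentages for positive and negative status need not sum to 100.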


# Distributions of ER, PR, HER2, HR-specific subsequent breast cancer events in 1, 2, 3, 5, 10, and 15 years 
```{r competing_risk_analysis_marker, results='hide'}
##------Estimate the cumulative incidence rates of ER, PR, HER2, HR-specific SBE------
var.time.SBE <- c("SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time")
var.event.SBE <- c("SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR")
# Show the correct initial number of patients at risk for each biomarker-specific SBE
dat.SBE.ER <- dat.work[!is.na(dat.work$SBE.ER), ]
dat.SBE.PR <- dat.work[!is.na(dat.work$SBE.PR), ]
dat.SBE.HER2 <- dat.work[!is.na(dat.work$SBE.HER2), ]
dat.SBE.HR <- dat.work[!is.na(dat.work$SBE.HR), ]
dat.SBE.HRP <- subset(dat.SBE.HR, SBE.HR %in% c("SBEYesHRPos", "censor"))

var.dat.SBE <- c("dat.SBE.ER", "dat.SBE.PR", "dat.SBE.HER2", "dat.SBE.HR")


##------Check the distributions of each survival time------
sapply(var.time.SBE, function(x) with(dat.work, summary(get(x))))

##------Estimate CIF and make plots------
out.i.SBE <- mclapply(1:4,
  function(i){ 
    est.i.SBE <- cuminc( # Use Aalen-Johansen Estimator
      formula(paste0("Surv(", var.time.SBE[i], ", ", var.event.SBE[i], ") ~ 1")), 
      data = get(var.dat.SBE[i]), conf.level = 0.95)
    
    ##------Show cumulative incidences up to 24 years------
    # out.i.SBE[[1]][["est"]][["tidy"]]|>View()
    # out.i.SBE[[2]][["est"]][["tidy"]]|>View()
    # out.i.SBE[[3]][["est"]][["tidy"]]|>View()
    est.i.SBE.enhanced <- est.i.SBE
    temp <- rbind(est.i.SBE$tidy,
                  est.i.SBE$tidy[nrow(est.i.SBE$tidy)-1, ], 
                  est.i.SBE$tidy[(nrow(est.i.SBE$tidy)/2)-1, ])
    temp[(nrow(temp)-1):nrow(temp), 1] <- 24
    est.i.SBE.enhanced$tidy <- temp
    est.i.SBE.timepoint <- est.i.SBE %>% 
      tidy(times = c(1, 2, 3, 5, 10, 15)) %>% 
      mutate(across(.cols = c(estimate, std.error, conf.low, conf.high), \(x) num(x, digits = 4)))
    plot.i.SBE <- ggcuminc(x = est.i.SBE.enhanced, 
                           outcome = names(table(dat.work[[var.event.SBE[i]]]))[-1],
                           color = c("#A73030FF"),
                           theme = list(theme.list)) +
                  add_risktable(stats_label = c('No. At Risk', 
                                                'No. Of Events')) + #
                   ggplot2::scale_x_continuous(limits = c(0, 24),
                                               breaks = c(seq(0, 24, 3)), 
                                               labels = c(seq(0, 24, 3))) +
                   ggplot2::scale_y_continuous(labels = scales::label_number(scale = 100), # Convert to %
                                               breaks = seq(0, 1, 0.2),
                                               limits = c(0, 1)) + 
                   ggplot2::labs(x = "Years Since TNBC Diagnosis", y = "SBE Probability, %")
    return(list(est = est.i.SBE, est.timepoint = est.i.SBE.timepoint, curve = plot.i.SBE)) 
    },
  mc.cores = 4)

dir.fig <- "../report/figs_2025Jan06/"
fig.name <- c("SBE_ER", "SBE_PR", "SBE_HER2", "SBE_HR")
plan(multisession)
future_lapply(1:4, function(i) {
  R.devices::suppressGraphics({
    ggsave(filename = dir.fig %0% fig.name[i] %0% ".eps", 
    plot = out.i.SBE[[i]][["curve"]],
    device = "eps", # agg_png
    width = 5, height = 4, units = "in", dpi = 300)})
})
```
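
The chunk above notes that the cumulative incidence is obtained with the Aalen-Johansen estimator. As an illustration only, the following base R sketch computes that estimate by hand on a made-up data set: at each event time, the cause-specific increment is the overall event-free survival just before that time multiplied by the proportion of patients at risk who experience the event of interest. The function name `cif_aalen_johansen`, the status coding (0 = censored, 1 = event of interest, 2 = competing event), and the toy data are assumptions; the sketch returns point estimates only and is not a substitute for the package used in the analysis.

```r
# Hypothetical helper: Aalen-Johansen cumulative incidence for one cause.
# status: 0 = censored, `cause` = event of interest, any other value = competing event.
cif_aalen_johansen <- function(time, status, cause = 1) {
  event.times <- sort(unique(time[status != 0]))
  surv.prev <- 1   # overall event-free survival just before the current time
  running <- 0     # cumulative incidence accumulated so far
  cif <- numeric(length(event.times))
  for (j in seq_along(event.times)) {
    t.j <- event.times[j]
    n.risk <- sum(time >= t.j)                       # patients still at risk
    d.cause <- sum(time == t.j & status == cause)    # events of interest at t.j
    d.any <- sum(time == t.j & status != 0)          # events of any type at t.j
    running <- running + surv.prev * d.cause / n.risk
    cif[j] <- running
    surv.prev <- surv.prev * (1 - d.any / n.risk)
  }
  data.frame(time = event.times, cif = cif)
}

# Toy data (made up): two events of interest, one competing event, two censored.
toy.time   <- c(1, 2, 3, 4, 5)
toy.status <- c(1, 2, 0, 1, 0)
toy.cif <- cif_aalen_johansen(toy.time, toy.status, cause = 1)
toy.cif
```

Unlike one minus a Kaplan-Meier estimate that treats competing events as censored, this quantity never counts patients removed by a competing event as still being at risk for the event of interest.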


## Cumulative incidence rates of ER-specific subsequent breast cancer events
```{r}
knitr::kable(out.i.SBE[[1]][["est.timepoint"]])
cat("\n")
knitr::include_graphics(dir.fig %0% "SBE_ER" %0% ".tiff")
```


## Cumulative incidence rates of PR-specific subsequent breast cancer events
```{r}
knitr::kable(out.i.SBE[[2]][["est.timepoint"]])
cat("\n")
knitr::include_graphics(dir.fig %0% "SBE_PR" %0% ".tiff")
```


## Cumulative incidence rates of HER2-specific subsequent breast cancer events
```{r}
knitr::kable(out.i.SBE[[3]][["est.timepoint"]])
cat("\n")
knitr::include_graphics(dir.fig %0% "SBE_HER2" %0% ".tiff")
```

## [Fig2] Cumulative incidence rates of HR-specific subsequent breast cancer events
```{r}
knitr::kable(out.i.SBE[[4]][["est.timepoint"]])
cat("\n")
knitr::include_graphics(dir.fig %0% "SBE_HR" %0% ".tiff")
```

## Odds ratio for patients with subsequent breast cancer events having HR-positive versus HR-negative
```{r}
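# Cross-tabulate subsequent breast cancer status (rows) against HR status (columns)
data <- table(dat.work$Overall.Subsequent.BC, dat.work$HR)
# The labels below are assigned by position and assume the default alphabetical
# level order: No/Yes for the rows and Negative/Positive for the columns
colnames(data) <- c("HR Negative","HR Positive")
rownames(data) <- c("No subsequent BC","Have subsequent BC")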
data |> knitr::kable() 
fisher.test(data, conf.level = 0.95, alternative = "two.sided") 
```
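
The chunk above labels the 2 x 2 table by position before passing it to `fisher.test()`. The following sketch, on made-up vectors, shows one way to fix the level order explicitly so that labels cannot be attached to the wrong cells. All object names and values here are illustrative assumptions.

```r
# Toy vectors; values are made up for illustration.
toy.sbe <- c(rep("Yes", 4), rep("No", 4))
toy.hr  <- c("Positive", "Positive", "Positive", "Negative",
             "Positive", "Negative", "Negative", "Negative")

# Fix the level order explicitly instead of relabelling by position.
toy.tab <- table(
  SBE = factor(toy.sbe, levels = c("No", "Yes")),
  HR  = factor(toy.hr, levels = c("Negative", "Positive"))
)
toy.tab

toy.fit <- fisher.test(toy.tab, conf.level = 0.95, alternative = "two.sided")
toy.fit$estimate  # conditional maximum likelihood estimate of the odds ratio
toy.fit$conf.int  # 95% confidence interval for the odds ratio
toy.fit$p.value
```

Note that `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio, which generally differs from the cross-product ratio of the four cell counts.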

# Distributions of follow-up time, time to overall subsequent breast cancer events, and other time
The variables `T2LFU` (follow-up time), `T2SBE` (time to the overall subsequent breast cancer event), and `T2Death` (time to death) are calculated by subtracting the date of diagnosis from the date of last follow-up, the date of the overall subsequent breast cancer event, and the date of death, respectively. The variables `T2ER`, `T2PR`, `T2HER2`, and `T2HR` (time to the ER-positive, PR-positive, HER2-positive, and HR-positive subsequent breast cancer event, respectively) are calculated by subtracting the date of diagnosis from the date of the overall subsequent breast cancer event in patients whose subsequent event is ER-positive, PR-positive, HER2-positive, or HR-positive, respectively.

As the following table shows, the median follow-up time in the entire cohort is `r median(dat.work$T2LFU, na.rm = T) |> round(2)` years, and the median time from the initial TNBC diagnosis to the overall subsequent breast cancer event is `r median(dat.work$T2SBE, na.rm = T) |> round(2)` years.
```{r}
sapply(c("T2LFU", "T2SBE", "T2Death", "T2ER", "T2PR", "T2HER2", "T2HR"), function(x) with(dat.work, c(summary(get(x)), `SD` = sd(get(x), na.rm = T)) |> round(2))) |> knitr::kable()
```
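
The time variables summarized above are differences between two dates, expressed in years. As a minimal sketch of that calculation, assuming a conversion of 365.25 days per year (the conversion factor, the helper name `years_between`, and the toy dates are assumptions and are not taken from the data-cleaning code):

```r
# Hypothetical helper: years between two dates, assuming 365.25 days per year.
years_between <- function(start, end) {
  as.numeric(difftime(end, start, units = "days")) / 365.25
}

# Toy dates (made up); the third patient has no follow-up date.
toy.dx  <- as.Date(c("2010-01-01", "2015-06-15", "2018-03-01"))
toy.lfu <- as.Date(c("2020-01-01", "2016-06-15", NA))

toy.T2LFU <- years_between(toy.dx, toy.lfu)
round(toy.T2LFU, 2)
```

A missing date on either side propagates to a missing time, which is why the summaries in this section are computed with `na.rm = T`.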


# Distributions of overall subsequent breast cancer events
Among `r nrow(dat.work)` patients, a total of `r sum(dat.work$Overall.Subsequent.BC=="Yes"&(!is.na(dat.work$T2SBE)), na.rm = T)` patients had an overall subsequent breast cancer event. These patients have both the date of initial TNBC diagnosis and the date of the subsequent breast cancer event recorded.


## [Fig1] Incidence rates of the overall subsequent breast cancer events in 1, 2, 3, 5, 10 and 15 years
```{r km_analysis_nomarker, results='hide'}
##------Estimate the incidence rates of SBE------
var.time <- c("SBE.time")
var.event <- c("SBE.event")
out.i <- mclapply(1:1,
  function(i){ 
    fit.i <- survfit2(formula(paste0("Surv(", var.time[i], ", ", var.event[i], ") ~ 1")), data = dat.work) 
    res.i <- summary(fit.i, times = c(1, 2, 3, 5, 10, 15)) 
    out.i <- data.frame(time = res.i$time, 
                        n.risk = res.i$n.risk, 
                        estimate = 1 - res.i$surv, 
                        std.error = res.i$std.err, 
                        conf.low = 1 - res.i$upper, 
                        conf.high = 1 - res.i$lower) %>% 
      mutate(across(.cols = c(estimate, std.error, conf.low, conf.high), \(x) num(x, digits = 4)))

    ##------Show cumulative incidences up to 24 years------
    # out.i[[1]][["fit"]]|>View()
    fit.i.enhanced <- fit.i
    fit.i.enhanced$time <- c(fit.i.enhanced$time, 24)
    fit.i.enhanced$n.risk <- c(fit.i.enhanced$n.risk, fit.i.enhanced$n.risk[length(fit.i.enhanced$n.risk)-1])
    fit.i.enhanced$n.event <- c(fit.i.enhanced$n.event, fit.i.enhanced$n.event[length(fit.i.enhanced$n.event)-1])
    fit.i.enhanced$n.censor <- c(fit.i.enhanced$n.censor, fit.i.enhanced$n.censor[length(fit.i.enhanced$n.censor)-1])
    fit.i.enhanced$surv <- c(fit.i.enhanced$surv, fit.i.enhanced$surv[length(fit.i.enhanced$surv)-1])
    fit.i.enhanced$std.err <- c(fit.i.enhanced$std.err, fit.i.enhanced$std.err[length(fit.i.enhanced$std.err)-1])
    fit.i.enhanced$cumhaz <- c(fit.i.enhanced$cumhaz, fit.i.enhanced$cumhaz[length(fit.i.enhanced$cumhaz)-1])
    fit.i.enhanced$std.chaz <- c(fit.i.enhanced$std.chaz, fit.i.enhanced$std.chaz[length(fit.i.enhanced$std.chaz)-1])
    fit.i.enhanced$lower <- c(fit.i.enhanced$lower, fit.i.enhanced$lower[length(fit.i.enhanced$lower)-1])    
    fit.i.enhanced$upper <- c(fit.i.enhanced$upper, fit.i.enhanced$upper[length(fit.i.enhanced$upper)-1])
    plot.i <- ggsurvfit(x = fit.i.enhanced,
                        type = "risk",
                        color = c("#A73030FF"),
                        theme = list(theme.list)) +
                add_risktable(stats_label = c('No. At Risk', 
                                              'No. Of Events')) + #
                ggplot2::scale_x_continuous(limits = c(0, 24),
                                            breaks = c(seq(0, 24, 3), 24), 
                                            labels = c(seq(0, 24, 3), 24)) +
                ggplot2::scale_y_continuous(labels = scales::label_number(scale = 100), # Convert to %
                                            breaks = seq(0, 1, 0.2),
                                            limits = c(0, 1)) + 
                ggplot2::labs(x = "Years Since TNBC Diagnosis", y = "SBE Probability, %") #
    return(list(fit = fit.i, res = res.i, out = out.i, plot = plot.i))},
  mc.cores = 4)

dir.fig <- "../report/figs_2025Jan06/" #
fig.name <- c("SBE")
plan(multisession) 
future_lapply(1:1, function(i) {
  R.devices::suppressGraphics({
    ggsave(filename = dir.fig %0% fig.name[i] %0% ".eps", 
    plot = out.i[[i]][["plot"]],
    device = "eps", # agg_png
    width = 5, height = 4, units = "in", dpi = 300)})
})
```


```{r}
knitr::kable(out.i[[1]][["out"]])
cat("\n")
knitr::include_graphics(dir.fig %0% "SBE" %0% ".tiff")
```
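
The incidence table above is obtained as the complement of the Kaplan-Meier survival estimate, so the confidence limits swap roles: the lower limit of the incidence is one minus the upper limit of the survival, and vice versa. The following sketch reproduces that step on made-up data with `survival::survfit()`; the toy data frame and object names are assumptions for illustration.

```r
library(survival)

# Toy data (made up): status 1 = event, 0 = censored.
toy <- data.frame(time = c(1, 2, 3, 4, 5), status = c(1, 0, 1, 0, 1))

fit <- survfit(Surv(time, status) ~ 1, data = toy)
res <- summary(fit, times = c(1, 3))

# Convert survival to cumulative incidence; note the swapped confidence limits.
toy.risk <- data.frame(
  time      = res$time,
  n.risk    = res$n.risk,
  estimate  = 1 - res$surv,
  std.error = res$std.err,   # unchanged, since Var(1 - S) = Var(S)
  conf.low  = 1 - res$upper,
  conf.high = 1 - res$lower
)
toy.risk
```

This complement is appropriate when no competing events are modeled; the marker-specific analyses in this report use a competing-risk estimator instead.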


# Distributions of overall survival
## Probability of the overall survival in 1, 2, 3, 5, 10 and 15 years
```{r km_os_analysis, results='hide'}
##------Summarize OS probability estimates and 95%CIs at years 1, 2, 3, 5, 10, and 15------
# 0=alive, 1=dead
dat.work$OS.death <- case_when(!is.na(dat.work$T2Death) ~ 1, 
                               is.na(dat.work$T2Death) & !is.na(dat.work$T2LFU) ~ 0,
                               is.na(dat.work$T2LFU) & is.na(dat.work$T2Death) ~ NA) |> as.numeric() 
dat.work$OS.time <- case_when(dat.work$OS.death==1 ~ dat.work$T2Death, 
                              dat.work$OS.death==0 ~ dat.work$T2LFU,
                              is.na(dat.work$OS.death) ~ NA) |> as.numeric() 

var.time <- c('OS.time')
var.event <- c('OS.death')
out.i <- mclapply(1:1,
                  function(i){ 
                    fit.i <- survfit2(formula(paste0("Surv(", var.time[i], ", ", var.event[i], ") ~ 1")), data = dat.work) 
                    res.i <- summary(fit.i, times = c(1, 2, 3, 5, 10, 15)) 
                    out.i <- data.frame(time = res.i$time, 
                                        n.risk = res.i$n.risk, 
                                        estimate = res.i$surv, # 
                                        std.error = res.i$std.err, 
                                        conf.low = res.i$lower, # 
                                        conf.high = res.i$upper) %>% #
                      mutate(across(.cols = c(estimate, std.error, conf.low, conf.high), \(x) num(x, digits = 4)))
                    
                    ##------Show survival probability up to 24 years------
                    # out.i[[1]][["fit"]]|>View()
                    fit.i.enhanced <- fit.i
                    fit.i.enhanced$time <- c(fit.i.enhanced$time, 24)
                    fit.i.enhanced$n.risk <- c(fit.i.enhanced$n.risk, fit.i.enhanced$n.risk[length(fit.i.enhanced$n.risk)-1])
                    fit.i.enhanced$n.event <- c(fit.i.enhanced$n.event, fit.i.enhanced$n.event[length(fit.i.enhanced$n.event)-1])
                    fit.i.enhanced$n.censor <- c(fit.i.enhanced$n.censor, fit.i.enhanced$n.censor[length(fit.i.enhanced$n.censor)-1])
                    fit.i.enhanced$surv <- c(fit.i.enhanced$surv, fit.i.enhanced$surv[length(fit.i.enhanced$surv)-1])
                    fit.i.enhanced$std.err <- c(fit.i.enhanced$std.err, fit.i.enhanced$std.err[length(fit.i.enhanced$std.err)-1])
                    fit.i.enhanced$cumhaz <- c(fit.i.enhanced$cumhaz, fit.i.enhanced$cumhaz[length(fit.i.enhanced$cumhaz)-1])
                    fit.i.enhanced$std.chaz <- c(fit.i.enhanced$std.chaz, fit.i.enhanced$std.chaz[length(fit.i.enhanced$std.chaz)-1])
                    fit.i.enhanced$lower <- c(fit.i.enhanced$lower, fit.i.enhanced$lower[length(fit.i.enhanced$lower)-1])    
                    fit.i.enhanced$upper <- c(fit.i.enhanced$upper, fit.i.enhanced$upper[length(fit.i.enhanced$upper)-1])
                    plot.i <- ggsurvfit(x = fit.i.enhanced,
                                        type = "survival", #
                                        color = c("#A73030FF"),
                                        theme = list(theme.list)) +
                      add_risktable() +
                      ggplot2::scale_x_continuous(limits = c(0, 24),
                                                  breaks = c(seq(0, 24, 3), 24), 
                                                  labels = c(seq(0, 24, 3), 24)) +
                      ggplot2::scale_y_continuous(labels = scales::label_number(scale = 100),
                                                  breaks = seq(0, 1, 0.2),
                                                  limits = c(0, 1)) + 
                      ggplot2::labs(x = "Years Since Triple Negative Breast Cancer Diagnosis", y = "Survival probability, %")
                    return(list(fit = fit.i, res = res.i, out = out.i, plot = plot.i))},
                  mc.cores = 4)

dir.fig <- "../report/figs_2025Jan06/"
fig.name <- c('OS_Death')
plan(multisession) 
future_lapply(1:1, function(i) {
  R.devices::suppressGraphics({
    ggsave(filename = dir.fig %0% fig.name[i] %0% ".eps", 
           plot = out.i[[i]][["plot"]],
           device = "eps", # agg_png
           width = 5, height = 4, units = "in", dpi = 300)})
})
```


```{r}
knitr::include_graphics(dir.fig %0% "OS_Death" %0% ".tiff")
knitr::kable(out.i[[1]][["out"]])
cat("\n")
```
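
The overall-survival status and time used in this section are derived from `T2Death` and `T2LFU`. The following sketch applies the same rules to a made-up data frame, using base R `ifelse()` in place of `case_when()`; the toy values are assumptions chosen to cover each case.

```r
# Toy data (made up): one row per coding case.
toy <- data.frame(
  T2Death = c(2.5, NA, NA, 4.0),
  T2LFU   = c(3.0, 6.1, NA, NA)
)

# 1 = dead, 0 = alive at last follow-up, NA = neither time recorded.
toy$OS.death <- ifelse(!is.na(toy$T2Death), 1,
                       ifelse(!is.na(toy$T2LFU), 0, NA))

# Time to death for patients who died, otherwise time to last follow-up.
toy$OS.time <- ifelse(toy$OS.death == 1, toy$T2Death, toy$T2LFU)
toy
```

Patients with neither a death time nor a follow-up time keep a missing status and time and are therefore dropped from the survival fit.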


# Univariate analysis between patient factors and overall subsequent breast cancer events
## Factors associated with 3-year overall subsequent breast cancer events (column proportion)
```{r set_variable_1, results="hide"}
vars.all <- c(
  ###  Demographic variables
  "Age.at.Diagnosis", 
  "Race.Ethnicity2", "Race.Ethnicity3", 
  
  ###  Disease history variables  
  "Clinical.T.Stage", "Clinical.T.Stage2", "Clinical.N.Stage", "Clinical.N.Stage2", 
  "Index.Tumor.Status",
  "Past.Ipsilateral.Br.CA", "Past.Contralateral.Br.CA",
  "Tumor.Size.by.Imaging",
  "Needle.biopsy.proven.nodal.metastases.at.Dx",
  "Histology.Primary", "Laterality.Primary", "Any.high.grade.Disease", "Any.LVI",

  ### Screening/diagnosis/treatment variables
  "Mammo.Screen.Detected", "Mammo.Occult", "MRI.Screen.Detected", 
  "First.Treatment.Modality", "Any.Attempt.at.Lumpectomy", "Mastectomy.Surgery",
  "CPM", "Any.SLN.Biopsy", "ALND",
  "If.Primary.Surgery.pT.Stage", "If.Primary.Surgery.pN.Stage",
  "Neoadjuvant.Chemotherapy",
  "In.NACT.did.pt.complete.all.planned.preop.CTX",
  "If.NACT.CTX.components.check.all.delivered.choice.A",
  "If.NACT.CTX.components.check.all.delivered.choice.T",
  "If.NACT.CTX.components.check.all.delivered.choice.C",
  "If.NACT.preop.Herceptin", "If.NACT.preop.Perjeta",
  "If.NACT.ypT.Stage", "If.NACT.ypN.Stage", "if.NACT.pCR", # pCR
  "If.NACT.postop.Capecitabine", "If.NACT.postop.Kadcyla.TDM.1",
  "Postop.Adjuvant.CTX",
  "If.Postop.Adjuvant.CTX.did.pt.complete.all.planned.postop.CTX",
  "Postop.Adjuvant.XRT",
  "Genetic.Testing.Done",
  
  ### Outcome-related variables
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time" # Time variables for survival analysis
)

# Create an indicator [0, 1] for all categorical variables 
vars.cat <- rep(1, length(vars.all))
vars.cat[vars.all %in% c("Age.at.Diagnosis", "Tumor.Size.by.Imaging",
                         "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
                         "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time")] <- 0
```


```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.3year))
vars.cat.rm <- which(vars.all %in% c(
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "ER", "PR", "HER2", "SBE.ER", "SBE.PR", "SBE.HER2",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time" 
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.3year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 3-year overall subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.3year",
                    prop.by.row = T)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 5-year overall subsequent breast cancer events (column proportion)
```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.5year))
vars.cat.rm <- which(vars.all %in% c(
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "ER", "PR", "HER2", "SBE.ER", "SBE.PR", "SBE.HER2",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time" 
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.5year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 5-year overall subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.5year",
                    prop.by.row = T)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 10-year overall subsequent breast cancer events (column proportion)
```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.10year))
vars.cat.rm <- which(vars.all %in% c(  
  "If.NACT.postop.Kadcyla.TDM.1",
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "ER", "PR", "HER2", "SBE.ER", "SBE.PR", "SBE.HER2",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time" 
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.10year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 10-year overall subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.10year",
                    prop.by.row = T)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 15-year overall subsequent breast cancer events (column proportion)
```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.15year))
vars.cat.rm <- which(vars.all %in% c(  
  "If.NACT.postop.Kadcyla.TDM.1",
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "ER", "PR", "HER2", "SBE.ER", "SBE.PR", "SBE.HER2",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time" 
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.15year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```


## Factors associated with 15-year overall subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.15year",
                    prop.by.row = T)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = F)
```

# Univariate analysis between patient factors and HR-specific subsequent breast cancer events
## Factors associated with 3-year HR-specific subsequent breast cancer events (column proportion)
```{r set_variable_2, results="hide"}
vars.all <- c(
  ###  Demographic variables
  "Age.at.Diagnosis", 
  "Race.Ethnicity2", "Race.Ethnicity3", 
  
  ###  Disease history variables  
  "Clinical.T.Stage", "Clinical.T.Stage2", "Clinical.N.Stage", "Clinical.N.Stage2", 
  "Index.Tumor.Status",
  "Past.Ipsilateral.Br.CA", "Past.Contralateral.Br.CA",
  "Tumor.Size.by.Imaging",
  "Needle.biopsy.proven.nodal.metastases.at.Dx",
  "Histology.Primary", "Laterality.Primary", "Any.high.grade.Disease", "Any.LVI",

  ### Screening/diagnosis/treatment variables
  "Mammo.Screen.Detected", "Mammo.Occult", "MRI.Screen.Detected", 
  "First.Treatment.Modality", "Any.Attempt.at.Lumpectomy", "Mastectomy.Surgery",
  "CPM", "Any.SLN.Biopsy", "ALND",
  "If.Primary.Surgery.pT.Stage", "If.Primary.Surgery.pN.Stage",
  "Neoadjuvant.Chemotherapy",
  "In.NACT.did.pt.complete.all.planned.preop.CTX",
  "If.NACT.CTX.components.check.all.delivered.choice.A",
  "If.NACT.CTX.components.check.all.delivered.choice.T",
  "If.NACT.CTX.components.check.all.delivered.choice.C",
  "If.NACT.preop.Herceptin", "If.NACT.preop.Perjeta",
  "If.NACT.ypT.Stage", "If.NACT.ypN.Stage", 
  "if.NACT.pCR", # pCR
  "If.NACT.postop.Capecitabine", 
  # "If.NACT.postop.Kadcyla.TDM.1", # 'x' must have at least 2 rows and columns
  "Postop.Adjuvant.CTX",
  "If.Postop.Adjuvant.CTX.did.pt.complete.all.planned.postop.CTX",
  "Postop.Adjuvant.XRT", 
  "Genetic.Testing.Done", 
  
  ### Outcome-related variables
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time" # Time variables for survival analysis
)

# Create an indicator [0, 1] for all categorical variables 
vars.cat <- rep(1, length(vars.all))
vars.cat[vars.all %in% c("Age.at.Diagnosis", "Tumor.Size.by.Imaging",
                         "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
                         "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time")] <- 0
```

```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.HR.3year))
vars.cat.rm <- which(vars.all %in% c(
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time"
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.3year")
```
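
The chunks in this section drop outcome variables from `vars.all` and `vars.cat` with a negative `which()` index. As a minimal sketch on toy vectors (the `*.demo` names are hypothetical and not part of the analysis), a logical mask selects the same elements and stays safe when no name matches, whereas a negative index built from an empty `which()` result drops every element:

```r
# Toy stand-ins for vars.all / vars.cat (hypothetical values, not study data)
vars.demo <- c("Age.at.Diagnosis", "CPM", "ALND", "T2SBE")
cat.demo  <- c(0, 1, 1, 0)  # 1 = categorical, 0 = continuous

# Logical mask: TRUE for every variable that should be kept
drop.demo <- c("T2SBE")
keep <- !vars.demo %in% drop.demo

vars.kept <- vars.demo[keep]  # names kept
cat.kept  <- cat.demo[keep]   # matching indicators, still aligned by position

# Edge case: no name matches, so which() returns integer(0)
none <- which(vars.demo %in% "Not.A.Variable")
n.negative.index <- length(vars.demo[-none])                           # everything is dropped
n.logical.mask   <- length(vars.demo[!vars.demo %in% "Not.A.Variable"])  # everything is kept
```

Because every exclusion list in this section matches at least one name in `vars.all`, the existing `-which()` calls behave as intended; the mask only matters if a list ever becomes empty or contains no matching names.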


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 3-year HR-specific subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.3year",
                    prop.by.row = TRUE)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 5-year HR-specific subsequent breast cancer events (column proportion)
```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.HR.5year))
vars.cat.rm <- which(vars.all %in% c(
  "Histology.Primary", # Excluded: the test fails with "'x' must have at least 2 rows and columns"
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time"
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.5year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 5-year HR-specific subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.5year",
                    prop.by.row = TRUE)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 10-year HR-specific subsequent breast cancer events (column proportion)
```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.HR.10year))
vars.cat.rm <- which(vars.all %in% c(
  "Histology.Primary", # Excluded: the test fails with "'x' must have at least 2 rows and columns"
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time"
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.10year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 10-year HR-specific subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.10year",
                    prop.by.row = TRUE)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 15-year HR-specific subsequent breast cancer events (column proportion)
```{r, results="hide"}
id.keep <- which(!is.na(dat.work$SBE.HR.15year))
vars.cat.rm <- which(vars.all %in% c(
  "Mammo.Occult", "Histology.Primary", # Excluded: the test fails with "'x' must have at least 2 rows and columns"
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time"
))

out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.15year")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with 15-year HR-specific subsequent breast cancer events (row proportion)
```{r, results="hide"}
out <- fsmry.dmgrph(dat = dat.work[id.keep, ],
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "SBE.HR.15year",
                    prop.by.row = TRUE)
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Factors associated with time to overall subsequent breast cancer events
```{r, results="hide"}
vars.ana <- vars.all[!vars.all %in% c(
  "Histology.Subsequent", "Laterality.Subsequent",
  "SBE.event", "SBE.3year", "SBE.5year", "SBE.10year", "SBE.15year",
  "SBE.HR.3year", "SBE.HR.5year", "SBE.HR.10year", "SBE.HR.15year",
  "ER", "PR", "HER2", "HR", "SBE.ER", "SBE.PR", "SBE.HER2", "SBE.HR",
  "T2SBE", "T2ER", "T2PR", "T2HER2", "T2HR", "T2LFU", "T2Death", 
  "SBE.time", "SBE.ER.time", "SBE.PR.time", "SBE.HER2.time", "SBE.HR.time")]

out <- uni.coxph(surv.time = "SBE.time", surv.event = "SBE.event")

table(lengths(out))

out.stat <- lapply(out, function(x) lapply(x, fcphuni.stat))
out.tbl <- lapply(out.stat, fcphuni.tbl)
out.tbl <- do.call(rbind, out.tbl)
row.names(out.tbl) <- NULL
```


```{r}
knitr::kable(out.tbl)
```


## Factors associated with time to HR-specific subsequent breast cancer events
```{r, results="hide"}
# Recap of how SBE.event and SBE.HR were derived (kept for reference, not run here)
# dat0$SBE.event <- rep(NA, dim(dat0)[1])
# dat0[Overall.Subsequent.BC=="Yes"&(!is.na(T2SBE))]$SBE.event <- 1
# dat0[Overall.Subsequent.BC=="No"&((!is.na(T2LFU)) | (!is.na(T2Death)))]$SBE.event <- 0
# dat0[,SBE.event:=gsub(TRUE,1,
#                       gsub(FALSE,0,SBE.event)) |> as.numeric()]
# 
# dat0[,SBE.HR:=case_when(SBE.event==1&HR=="Positive" ~ "SBEYesHRPos",
#                         SBE.event==1&HR=="Negative" ~ "SBEYesHRNeg",
#                         SBE.event==0&(!is.na(T2LFU)) ~ "censor") |> 
#        factor(levels=c("censor","SBEYesHRPos","SBEYesHRNeg"))]

# Cause-specific event indicator: HR-positive SBE = 1; censored and HR-negative SBE = 0
dat.work$SBE.HR.event <- gsub("censor|SBEYesHRNeg", 0, gsub("SBEYesHRPos", 1, dat.work$SBE.HR)) |> as.numeric()

# table(dat.work$SBE.HR.event, useNA = "ifany")

out <- uni.coxph(surv.time = "SBE.HR.time", surv.event = "SBE.HR.event")

table(lengths(out))

out.stat <- lapply(out, function(x) lapply(x, fcphuni.stat))
out.tbl <- lapply(out.stat, fcphuni.tbl)
out.tbl <- do.call(rbind, out.tbl)
row.names(out.tbl) <- NULL
```

```{r}
knitr::kable(out.tbl)
```
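
The HR-specific model above counts only HR-positive subsequent events as events and treats HR-negative events like censored observations. The sketch below, on a toy factor with the same three levels (`sbe.hr.demo` is a hypothetical stand-in for `dat.work$SBE.HR`), shows that the nested `gsub()` recode and a direct comparison give the same 0/1 indicator, with missing values carried through as `NA`:

```r
# Toy factor with the same levels as SBE.HR (hypothetical values, not study data)
sbe.hr.demo <- factor(c("censor", "SBEYesHRPos", "SBEYesHRNeg", NA),
                      levels = c("censor", "SBEYesHRPos", "SBEYesHRNeg"))

# Recode by string substitution, as in the analysis chunk
event.gsub <- as.numeric(
  gsub("censor|SBEYesHRNeg", 0, gsub("SBEYesHRPos", 1, sbe.hr.demo))
)

# Equivalent recode by direct comparison with the event level
event.cmp <- as.numeric(sbe.hr.demo == "SBEYesHRPos")

# Both versions agree element by element, including the NA
same.coding <- identical(event.gsub, event.cmp)
```

The comparison form avoids relying on pattern matching, which matters if a level name ever contains another level name as a substring.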

# Distribution of patient characteristics across race/ethnicity groups
## Summary of patient characteristics by 5 categories of race/ethnicity (WA, AA, Asian, Hisp/Latina, Other)
```{r, results="hide"}
vars.cat.rm <- which(vars.all %in% c("Race.Ethnicity2", "Race.Ethnicity3", "T2HER2"))
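# T2HER2 is excluded because it is observed only in the WA group, so it cannot be compared across groups:
# tapply(dat.work$T2HER2, dat.work$Race.Ethnicity2, mean, na.rm = TRUE)
#          WA          AA       Asian Hisp/Latina       Other 
#    2.258727         NaN         NaN         NaN         NaN 
# tapply(dat.work$T2HER2, dat.work$Race.Ethnicity2, summary)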

out <- fsmry.dmgrph(dat = dat.work,
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "Race.Ethnicity2")
```
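
`T2HER2` is dropped in the chunk above because its group means are `NaN` in every group except WA. As a rough sketch of how such variables could be flagged before building a summary table, the hypothetical helper `all.na.by.group()` below (not part of this project or of any package) checks whether a variable is entirely missing within at least one group; the data frame is a toy example, not study data:

```r
# Toy data: T2HER2 is observed only in the WA group (hypothetical values)
demo <- data.frame(
  group  = c("WA", "WA", "AA", "AA"),
  T2HER2 = c(2.1, 2.4, NA, NA),
  Age    = c(50, 61, 47, 58)
)

# Hypothetical helper: TRUE if x is entirely missing in at least one group of g
all.na.by.group <- function(x, g) {
  any(tapply(x, g, function(v) all(is.na(v))))
}

# Apply the check to each candidate variable
flagged <- sapply(demo[c("T2HER2", "Age")], all.na.by.group, g = demo$group)
vars.to.drop <- names(flagged)[flagged]
```

Here `vars.to.drop` contains only `"T2HER2"`, which mirrors the manual exclusion used for the race/ethnicity tables.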


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Summary of patient characteristics by 4 categories of race/ethnicity, excluding "Other" (WA, AA, Asian, Hisp/Latina)
```{r, results="hide"}
dat0.race1 <- read_fst(path = "../data/derived/2024Aug08_dat_dat0.race1_TNBC.RData")

vars.cat.rm <- which(vars.all %in% c("Race.Ethnicity2", "Race.Ethnicity3", "T2HER2"))
out <- fsmry.dmgrph(dat = dat0.race1,
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "Race.Ethnicity2")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Summary of patient characteristics by 2 categories of race/ethnicity (WA, All other races/ethnicities)
```{r, results="hide"}
vars.cat.rm <- which(vars.all %in% c("Race.Ethnicity2", "Race.Ethnicity3"))
out <- fsmry.dmgrph(dat = dat.work,
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "Race.Ethnicity3")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


## Summary of patient characteristics by 2 categories of race/ethnicity (WA, AA)
```{r, results="hide"}
dat0.race2 <- read_fst(path = "../data/derived/2024Aug08_dat_dat0.race2_TNBC.RData")

vars.cat.rm <- which(vars.all %in% c("Race.Ethnicity2", "Race.Ethnicity3", "T2HER2"))
out <- fsmry.dmgrph(dat = dat0.race2,
                    vars = vars.all[-vars.cat.rm],
                    vars.cat = vars.cat[-vars.cat.rm],
                    by = "Race.Ethnicity2")
```


```{r, results="asis"}
knitr::kable(out[[1]], row.names = FALSE)
```


# Session info
```{r}
sessionInfo()
```