MDC/MDC_v1.Rmd

---
title: "Multidisciplinary breast program in New York City"
author: "Anni Liu"
date: "October 3, 2022"
output:
  word_document:
    fig_height: 4.5
    fig_width: 4.5
  html_document:
    df_print: paged
  pdf_document: default
---

```{r setup, include=FALSE}
## Global R options are set on their own; they are not knitr chunk options
options(scipen = 999, digits = 4)
knitr::opts_chunk$set(
  cache = TRUE,
  error = FALSE,
  message = FALSE,
  warning = FALSE,
  tidy.opts = list(width.cutoff = 60),
  tidy = TRUE,
  fig.width = 12,
  fig.height = 8
)
```

# Summary
**Distributional characteristics of PR/ER/HER2/Tumor subtype:**
•	A significantly greater proportion of patients at NYPQ than at WCM had positive progesterone receptor (PR) status (overall: P < 0.001; Asian: P = 0.010; Chinese: P = 0.022). Chinese patients diagnosed at NYPQ appeared to have the highest PR-positive rate (84%, 203/241), compared with non-Chinese patients at NYPQ (81%, 509/626), Chinese patients at WCM (75%, 113/151), and non-Chinese patients at WCM (74%, 1142/1533) (P < 0.001).  
•	The HER2-positive rate appeared to be highest in Chinese patients treated at WCM (23%, 27/115), compared with Chinese patients treated at NYPQ (15%, 23/156), non-Chinese patients treated at NYPQ (13%, 64/481), and non-Chinese patients treated at WCM (17%, 202/1210) (P = 0.047). Between-site differences (Chinese: 15%, 23/156 at NYPQ vs 23%, 27/115 at WCM; non-Chinese: 13%, 64/481 at NYPQ vs 17%, 202/1210 at WCM) appeared to be greater than the differences between Chinese and non-Chinese patients within a site (NYPQ: 15% vs 13%; WCM: 23% vs 17%).
•	For Asian, Chinese, and non-Chinese Asian patients, there was no significant difference between the two sites in the proportion with positive estrogen receptor (ER) status. The distributions of ER status appeared to be similar among Asian patients at NYPQ, non-Asian patients at NYPQ, Asian patients at WCM, and non-Asian patients at WCM; and among East-Asian patients at NYPQ, non-East-Asian patients at NYPQ, East-Asian patients at WCM, and non-East-Asian patients at WCM.
•	A significantly greater proportion of all patients at NYPQ than at WCM had hormone receptor (HR)-positive, HER2-negative tumors (P = 0.039). However, the distributions of tumor subtypes appeared to be similar between the two sites among Asian patients, among Chinese patients, and among non-Chinese Asian patients. 

**Distributional characteristics of clinical T/N stages:**
•	A greater proportion of higher T stages (T3/T4) was observed in all patients at WCM than at NYPQ, although the difference was not statistically significant (P = 0.080). The distributions of T stages appeared to be similar between the two sites among Asian patients, among Chinese patients, and among non-Chinese Asian patients. Similarly non-significant results were observed among Asian patients at NYPQ, non-Asian patients at NYPQ, Asian patients at WCM, and non-Asian patients at WCM; and among East-Asian patients at NYPQ, non-East-Asian patients at NYPQ, East-Asian patients at WCM, and non-East-Asian patients at WCM.
•	A significantly greater proportion of higher N stages (N1/N2/N3) was observed in Chinese patients at WCM than at NYPQ (P = 0.004). However, there was no significant difference in the distributions of N stages between the two sites among all patients, among Asian patients, among Chinese patients, or among non-Chinese Asian patients.

**Distributional characteristics of node-negative breast cancers:**
•	A lower proportion of node-negative breast cancers was observed in Chinese patients at WCM than in non-Chinese patients at WCM, with marginal significance (P = 0.059). However, there was no significant difference in the proportion of node-negative breast cancers between Chinese and non-Chinese patients at NYPQ (P = 0.259).

**Distributional characteristics of mammogram/MRI screening:**
•	A significantly greater proportion of patients at NYPQ than at WCM was detected by mammogram screening (P < 0.001). Similar results were observed between the two sites for Asian patients and for Chinese patients (P < 0.001 for both), whereas the difference for non-Chinese Asian patients was only marginally significant (P = 0.060). The proportions of patients detected by mammogram screening differed significantly among Asian patients at NYPQ, non-Asian patients at NYPQ, Asian patients at WCM, and non-Asian patients at WCM, with greater proportions in Asian and non-Asian patients at NYPQ (P < 0.001). Similar results were observed among East-Asian patients at NYPQ, non-East-Asian patients at NYPQ, East-Asian patients at WCM, and non-East-Asian patients at WCM, with greater proportions in East-Asian and non-East-Asian patients at NYPQ (P < 0.001).
•	In contrast, a significantly greater proportion of patients at WCM than at NYPQ was detected by MRI screening (P = 0.003). The proportions of patients detected by MRI screening differed significantly among Asian patients at NYPQ, non-Asian patients at NYPQ, Asian patients at WCM, and non-Asian patients at WCM, with greater proportions in Asian and non-Asian patients at WCM than at NYPQ (P = 0.002). Similar results were observed among East-Asian patients at NYPQ, non-East-Asian patients at NYPQ, East-Asian patients at WCM, and non-East-Asian patients at WCM, with greater proportions in East-Asian and non-East-Asian patients at WCM than at NYPQ (P < 0.001). For Asian, Chinese, and non-Chinese Asian patients, there was no significant difference between the two sites in the proportion of patients detected by MRI screening. 

**Distributional characteristics of insurance type:**
•	The distributions of insurance types differed significantly between all WCM patients and all NYPQ patients, with a greater proportion of NYPQ patients having Medicaid or Medicare (P < 0.001).  Similarly, significant differences between the two sites were observed among Asian patients, Chinese patients, and non-Chinese Asian patients (P < 0.001 for all).  

**Distributional characteristics of age at diagnosis:**
•	Patients at NYPQ appeared to be diagnosed at an older age than patients at WCM (overall: P < 0.001; Asian: P < 0.001; Chinese: P < 0.001; non-Chinese Asian: P < 0.001). 


# Code appendix 
## Goal 1: Combine the datasets from the two hospital sites (Queens vs Cornell)

```{r include=FALSE, eval=FALSE}
## Load image
load(file = "2022Oct3.RData")

## Save image
save.image(file = "2022Oct3.RData")
```

```{r}
## Load libraries
easypackages::libraries("tidyverse", "readxl", "gtsummary", "flextable", "stringi")

## Load WCM 2019-2021 data
wcm_data <- read_xlsx("Asian MDC_Stats_Final_recoded_9_23_2.xlsx", 
                      sheet = "WCM 2019-2021_Stats_Final_recod", 
                      range = c("A1:Q1719"), 
                      col_names = TRUE) %>% 
  select(-c("Date of Diagnosis", "Year")) %>%
  rename("StudyID" = "Site",
         "RaceEthnicity.1" = "Race/Ethnicity (if mixed, circle more than one) 2", 
         "RaceEthnicity.2" = "Race/Ethnicity (if mixed, circle more than one) 2 2",
         "Race" = "Race (Brief) 2", 
         "Insure.type" = "Type of insurance",
         "Age.Dx" = "Age at Diagnosis",
         "MMGSD" = "Mammo Screen-Detected",
         "MMGO" = "Mammo-Occult",
         "MRISD" = "MRI Screen-Detected",
         "T.stage" = "Clinical T Stage",
         "N.stage" = "Clinical N Stage",
         "ER" = "ER Status",
         "PR" = "PR Status",
         "HER2" = "HER2 Status") %>%
  mutate("Hospital" = "Cornell")

## Load Queens 19-21 data
queens_data <- read_xlsx("Asian MDC_Stats_Final_recoded_9_23_2.xlsx", 
                         sheet = "Queens 19-21_Stats_Final_recode", 
                         range = c("A1:Q871"), 
                         col_names = TRUE) %>% 
  select(-c("Month", "Year")) %>%
  rename("StudyID" = "ID",
         "RaceEthnicity.1" = "Race/Ethnicity", 
         "RaceEthnicity.2" = "Race/Ethnicity 2 2 2",
         "Race" = "Race (Brief) 2", 
         "Insure.type" = "Type of Insurance",
         "Age.Dx" = "Age at Diagnosis",
         "MMGSD" = "Mammo Screen-Detected",
         "MMGO" = "Mammmo-Occult",
         "MRISD" = "MRI Screen-Detected",
         "T.stage" = "Clinical T Stage 2",
         "N.stage" = "Clinical N Stage",
         "ER" = "ER Status",
         "PR" = "PR Status",
         "HER2" = "HER2 Status") %>%
  mutate("Hospital" = "Queens")

## Merge the two datasets (columns must have identical names after renaming)
data_full <- rbind(wcm_data, queens_data)

## Recode variables
### RaceEthnicity.1 - include Chinese
# table(data_full$RaceEthnicity.1, useNA = "always")
data_full$RaceEthnicity.1 <- gsub("Unknown", NA, data_full$RaceEthnicity.1)

### RaceEthnicity.2 - include Asian subgroups
data_full$RaceEthnicity.2 <- with(data_full,
                                  ifelse(RaceEthnicity.2 == "Unknown", NA,
                                         ifelse(RaceEthnicity.2 %in% c("Southeast Asia", "Southeast Asian"), "Southeast Asian", RaceEthnicity.2))) %>%
  factor(levels = c("East Asian", "South Asian", "Southeast Asian", "Asian Unknown", "Hispanic", "NHB", "NHW", "Multiracial", "Other"),
         labels = c("East Asian", "South Asian", "Southeast Asian", "Asian Unknown", "Hispanic", "Non-hispanic black", "Non-hispanic white", "Multiracial", "Other"))

### Race
data_full$Race <- gsub("UNK", NA, data_full$Race) %>% 
  factor(levels = c("AS", "HIS", "NHB", "NHW", "MIX", "OTH"),
         labels = c("Asian", "Hispanic", "Non-hispanic black", "Non-hispanic white", "Multiracial", "Other"))
### Fix one observation
data_full$Race[which(data_full$StudyID %in% c("WCM731"))] <- "Other" ## RaceEthnicity.1 and RaceEthnicity.2 of this patient are Other

### BMI
data_full$BMI <- ifelse(data_full$BMI %in% c("?", "*", "Unknown"), NA, as.numeric(data_full$BMI))

### Insure.type
data_full <- data_full %>% 
  mutate(Insure.type = case_when(Insure.type %in% c("Medcaid", "Medicad", "Medicaid") ~ "Medicaid", 
                                 Insure.type %in% c("private", "Private", "PrIvate") ~ "Private",
                                 TRUE ~ Insure.type))
data_full$Insure.type <- with(data_full, 
                              factor(Insure.type, 
                              levels = c("Medicaid", "Medicare", "Private", "Self Pay", "Other", "None"),
                              ordered = TRUE))

### MMGSD
data_full$MMGSD <- gsub("Unknown", NA, data_full$MMGSD) %>% 
  factor(levels = c("Yes", "No"))

### MMGO
data_full <- data_full %>% 
  mutate(MMGO = case_when(MMGO %in% c("no", "No") ~ "No", 
                          MMGO == "Unknown" ~ as.character(NA),
                          TRUE ~ MMGO))
data_full$MMGO <- with(data_full, 
                       factor(MMGO, 
                       levels = c("Yes", "No"))) # NA is never kept as a level by default

### MRISD
data_full <- data_full %>% 
  mutate(MRISD = case_when(MRISD %in% c("no", "No") ~ "No", 
                           MRISD == "Unknown" ~ as.character(NA),
                           TRUE ~ MRISD))
data_full$MRISD <- with(data_full, 
                        factor(MRISD, 
                        levels = c("Yes", "No")))

### T.stage
data_full <- data_full %>% 
  mutate(T.stage = case_when(T.stage %in% c("NA", NA) ~ as.character(NA), 
                             T.stage %in% c("rcT4c", "T4", "T4a", "T4b", "T4c", "T4d") ~ "T4",
                             T.stage %in% c("rcTx", "Tx") ~ "TX",
                             T.stage %in% c("T1c", "T1C", "T1mi", "T1mic", "T1", "T1a", "Tia", "T1b") ~ "T1",
                             T.stage %in% c("Recurrence pathologic stage T2N3a", "T2") ~ "T2",
                             T.stage %in% c("Tis", "TIs") ~ "Tis",
                             TRUE ~ T.stage))
data_full$T.stage <- with(data_full, 
                          factor(T.stage, 
                          levels = c("TX", "T0", "Tis", "T1", "T2", "T3", "T4")))

### Create a new variable - T.stage.2
data_full$T.stage.2 <- stri_replace_all_regex(
  data_full$T.stage,
  pattern = c("TX", "T0|Tis|T1|T2", "T3|T4"),
  replacement = c("TX", "T0/Tis/T1/T2", "T3/T4"),
  vectorize = FALSE) %>%
  factor(levels = c("TX", "T0/Tis/T1/T2", "T3/T4"))


### N.stage
data_full <- data_full %>% 
  mutate(N.stage = case_when(N.stage %in% c("1a", "N1", "N1a", "N1mic") ~ "N1",
                             N.stage %in% c("N2", "N2a", "N2b") ~ "N2",
                             N.stage %in% c("N3", "N3a", "N3b", "N3c") ~ "N3",
                             N.stage %in% c("NA", NA) ~ as.character(NA),
                             N.stage %in% c("Nx") ~ "NX",
                             TRUE ~ N.stage))
data_full$N.stage <- with(data_full, 
                          factor(N.stage, 
                          levels = c("NX", "N0", "N1", "N2", "N3")))
### Create a new variable - N.stage.2
data_full$N.stage.2 <- stri_replace_all_regex(
  data_full$N.stage,
  pattern = c("NX", "N0", "N1|N2|N3"),
  replacement = c("NX", "N0", "N1/N2/N3"),
  vectorize = FALSE) %>%
  factor(levels = c("NX", "N0", "N1/N2/N3"))

### ER
data_full <- data_full %>% 
  mutate(ER = case_when(ER %in% c("NA", "Unknown") ~ as.character(NA),
                        ER %in% c("Positive", "Posituve") ~ "Positive",
                        TRUE ~ ER))
data_full$ER <- with(data_full, 
                     factor(ER, 
                     levels = c("Positive", "Negative")))

### PR
data_full <- data_full %>% 
  mutate(PR = case_when(PR %in% c("NA", "Unknown") ~ as.character(NA),
                        PR %in% c("Positive", "Posituve") ~ "Positive",
                        TRUE ~ PR))
data_full$PR <- with(data_full, 
                     factor(PR, 
                     levels = c("Positive", "Negative")))

### HER2
data_full$HER2 <- with(data_full,
                       ifelse(HER2 %in% c("Equivocal", "NA", "Not done", "Not Done", "Unknown"), NA, HER2)) %>%
  factor(levels = c("Positive", "Negative"))

### Hospital
data_full$Hospital <- factor(data_full$Hospital,
                             levels = c("Queens", "Cornell"),
                             ordered = TRUE)

### Create a tumor subtype variable
data_full <- data_full %>% 
  mutate(Subtype = case_when(
    (ER == "Positive" | PR == "Positive") & HER2 == "Negative" ~ "HR+, HER2-",
    HER2 == "Positive" ~ "HER2+",
    ER == "Negative" & PR == "Negative" & HER2 == "Negative" ~ "Triple negative",
    TRUE ~ as.character(NA))) 
data_full$Subtype <- with(data_full, 
                          factor(Subtype, 
                          levels = c("HR+, HER2-", "HER2+", "Triple negative")))
```
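
One detail of the recoding above may be worth checking: `gsub()` matches substrings, and when the replacement is `NA` every element that contains the pattern becomes `NA`. The sketch below uses made-up values rather than the study data (the vector `x` is hypothetical) to contrast this with an exact-match replacement.

```r
# Hypothetical values for illustration only (not taken from the study data)
x <- c("Chinese", "Unknown", "Asian Unknown")

# Substring match: any element containing "Unknown" becomes NA
by_substring <- gsub("Unknown", NA, x)

# Exact match: only elements equal to "Unknown" become NA
by_exact <- replace(x, x %in% "Unknown", NA)

print(by_substring)
print(by_exact)
```

If a column can hold values such as "Asian Unknown" that should be kept, the exact-match form is the safer choice; where "Unknown" only ever appears on its own, both forms give the same result.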


## Goal 2: Check if Asian is coded correctly using the values of the variable RaceEthnicity
```{r}
## Tabulate Asians and RaceEthnicity
data_full$Race.2 <- with(data_full, ifelse(Race != "Asian", "Non-Asian", "Asian"))
with(data_full, table(RaceEthnicity = RaceEthnicity.1, Race = Race.2)) %>%
  knitr::kable()
### Highlight 3 observations - Other|Asian

## Convert a tibble into a data frame
data_full_analyze <- data.frame(data_full)
saveRDS(data_full_analyze, "data_full_analyze_2022Sep30.RDS")
```

## Goal 3: Create Table 1 to compare patient characteristics between the two hospitals

```{r warning=FALSE}
## Table one of all patients
table_one <- tbl_summary(data = data_full %>% 
                           select(c("Age.Dx",
                                    "RaceEthnicity.1", "RaceEthnicity.2", "Race", "Race.2",
                                    "BMI", 
                                    "Insure.type", 
                                    "MMGSD", "MMGO", "MRISD",
                                    "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                    "ER", "PR", "HER2", "Subtype",
                                    "Hospital")), 
                         by = "Hospital",
                         label = list(RaceEthnicity.1 ~ "Race/Ethnicity 1, n (%)",
                                      RaceEthnicity.2 ~ "Race/Ethnicity 2, n (%)",
                                      Race ~ "Race, n (%)",
                                      Race.2 ~ "Race 2, n (%)",
                                      BMI ~ "Body Mass Index",
                                      Insure.type ~ "Type of Insurance, n (%)", 
                                      Age.Dx ~ "Age at Diagnosis",
                                      MMGSD ~ "Mammo Screen-Detected, n (%)",
                                      MMGO ~ "Mammo-Occult, n (%)",
                                      MRISD ~ "MRI Screen-Detected, n (%)",
                                      T.stage ~ "Clinical T Stage, n (%)",
                                      T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                      N.stage ~ "Clinical N Stage, n (%)",
                                      N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                      ER ~ "Estrogen Receptor, n (%)",
                                      PR ~ "Progesterone Receptor, n (%)",
                                      HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                      Subtype ~ "Tumor Subtype, n (%)"),
                         type = all_continuous() ~ "continuous2",
                         statistic = list(all_continuous() ~ c("{median} ({min}, {max})",
                                                               "{mean}+/-{sd}")),
                         missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% # Remove unnecessary footnotes
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 1. Patient characteristics between two hospitals (Cornell vs Queens).") 
table_one
```
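
Wherever `add_p()` selects `fisher.test()`, the call above passes `simulate.p.value = TRUE`, so those P values come from Monte Carlo resampling and can change slightly between runs unless a seed is fixed. The base-R sketch below illustrates this with the PR-positive counts quoted in the Summary; the seed value and object names are assumptions for illustration, and the test behind the reported P value may differ.

```r
# PR status by site and ethnicity, rebuilt from the counts quoted in the Summary
pr_counts <- matrix(
  c(203, 241 - 203,      # Chinese, NYPQ
    509, 626 - 509,      # non-Chinese, NYPQ
    113, 151 - 113,      # Chinese, WCM
    1142, 1533 - 1142),  # non-Chinese, WCM
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("NYPQ:Chinese", "NYPQ:Non-Chinese", "WCM:Chinese", "WCM:Non-Chinese"),
    c("PR positive", "PR negative")))

# Row percentages, to compare against the rates in the Summary
pr_pct <- round(100 * prop.table(pr_counts, margin = 1)[, "PR positive"])
print(pr_pct)

# Fix the seed (arbitrary value) so the simulated P value is repeatable
set.seed(2022)
p_first <- fisher.test(pr_counts, simulate.p.value = TRUE, B = 5000)$p.value
set.seed(2022)
p_second <- fisher.test(pr_counts, simulate.p.value = TRUE, B = 5000)$p.value

print(identical(p_first, p_second))
```

Calling `set.seed()` once before the tables are built would have the same effect for the whole report, at the cost of slightly different simulated P values than an unseeded run.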

## Goal 4: Create Table 2 to compare Asian patient characteristics between the two hospital sites (Queens vs Cornell)

```{r}
## Table two of Asian patients
table_two_asian <- tbl_summary(data = data_full %>% 
                                 filter(Race == "Asian") %>% 
                                 select(c("Age.Dx",
                                          "RaceEthnicity.1", "RaceEthnicity.2", "Race",
                                          "BMI", 
                                          "Insure.type", 
                                          "MMGSD", "MMGO", "MRISD",
                                          "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                          "ER", "PR", "HER2", "Subtype",
                                          "Hospital")), 
                               by = "Hospital",
                               label = list(
                                 RaceEthnicity.1 ~ "Race/Ethnicity 1, n (%)",
                                 RaceEthnicity.2 ~ "Race/Ethnicity 2, n (%)",
                                 Race ~ "Race, n (%)",
                                 BMI ~ "Body Mass Index",
                                 Insure.type ~ "Type of Insurance, n (%)", 
                                 Age.Dx ~ "Age at Diagnosis",
                                 MMGSD ~ "Mammo Screen-Detected, n (%)",
                                 MMGO ~ "Mammo-Occult, n (%)",
                                 MRISD ~ "MRI Screen-Detected, n (%)",
                                 T.stage ~ "Clinical T Stage, n (%)",
                                 T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                 N.stage ~ "Clinical N Stage, n (%)",
                                 N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                 ER ~ "Estrogen Receptor, n (%)",
                                 PR ~ "Progesterone Receptor, n (%)",
                                 HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                 Subtype ~ "Tumor Subtype, n (%)"),
                               type = all_continuous() ~ "continuous2",
                               statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                               missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 2. Asian patient characteristics between two hospitals (Cornell vs Queens).") 
table_two_asian
```

## Goal 5: Create Table 3 to compare Chinese patient characteristics between the two hospital sites (Queens vs Cornell)
```{r}
## Table three of Chinese patients
table_three_chinese <- tbl_summary(data = data_full %>% 
                                    filter(RaceEthnicity.1 == "Chinese") %>% 
                                    select(c("Age.Dx",
                                             "BMI", 
                                             "Insure.type", 
                                             "MMGSD", "MMGO", "MRISD",
                                             "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                             "ER", "PR", "HER2", "Subtype",
                                             "Hospital")), 
                                   by = "Hospital",
                                   label = list(
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})",
                                                               "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 3. Chinese patient characteristics between two hospitals (Cornell vs Queens).") 
table_three_chinese
```

## Goal 6: Create Table 4 to compare patient characteristics across race and hospital site (Queens Asian vs Queens Non-Asian vs Cornell Asian vs Cornell Non-Asian)

```{r}
## Create a new variable to record the combined category of Race.2 and Hospital
# data_full <- data_full %>%
#   mutate(RaceSite.1 = case_when(
#     Race.2 == "Asian" & Hospital == "Queens" ~ "Asian Queens",
#     Race.2 == "Asian" & Hospital == "Cornell" ~ "Asian Cornell",
#     Race.2 == "Non-Asian" & Hospital == "Queens" ~ "Non-Asian Queens",
#     Race.2 == "Non-Asian" & Hospital == "Cornell" ~ "Non-Asian Cornell"))

## Another fast way
data_full$RaceSite.1 <- with(data_full, paste0(Hospital, ":", Race.2))

table(data_full$RaceSite.1, useNA = "always")
data_full$RaceSite.1 <- gsub("Cornell:NA", NA, data_full$RaceSite.1) %>%
  factor(levels = c("Queens:Asian", "Queens:Non-Asian", "Cornell:Asian", "Cornell:Non-Asian"),
         ordered = TRUE)

## Table four
table_four <- tbl_summary(data = data_full %>% 
                            select(c("Age.Dx",
                                     "RaceEthnicity.2",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "RaceSite.1")), 
                                   by = "RaceSite.1",
                                   label = list(
                                     RaceEthnicity.2 ~ "Race/Ethnicity 2, n (%)",
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 4. Patient characteristics across race and hospital site (Queens Asian vs Queens Non-Asian vs Cornell Asian vs Cornell Non-Asian).") 
table_four
```
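
The `paste0()` step above turns a missing `Race.2` into the literal string "Cornell:NA", which is why that string then has to be mapped back to `NA`. As an alternative sketch on a made-up data frame (`toy` and its values are hypothetical), base R's `interaction()` keeps missing values as `NA` and builds the combined levels in one step.

```r
# Hypothetical toy data for illustration only
toy <- data.frame(
  Hospital = factor(c("Queens", "Cornell", "Cornell", "Queens"),
                    levels = c("Queens", "Cornell")),
  Race.2 = factor(c("Asian", "Non-Asian", NA, "Non-Asian"),
                  levels = c("Asian", "Non-Asian")))

# paste0() converts a missing value into the string "NA"
pasted <- with(toy, paste0(Hospital, ":", Race.2))

# interaction() propagates NA; lex.order = TRUE varies the last factor fastest
combined <- with(toy, interaction(Hospital, Race.2, sep = ":", lex.order = TRUE))

print(pasted)
print(levels(combined))
print(combined)
```

Note that `interaction()` returns an unordered factor, whereas the chunk above requests `ordered = TRUE`; wrapping the result in `factor(..., ordered = TRUE)` would be needed to match that.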


## Goal 7: Create Table 5 to compare patient characteristics across race and hospital site (Queens East-Asian vs Queens Non-East-Asian vs Cornell East-Asian vs Cornell Non-East-Asian)

```{r}
## Create a new variable to record the combined category of RaceEthnicity.2 and Hospital
# data_full <- data_full %>%
#   mutate(Race.3 = ifelse(RaceEthnicity.2 == "East Asian", "East-Asian", "Non-East-Asian")) %>%
#   mutate(RaceSite.2 = case_when(
#     Race.3 == "East-Asian" & Hospital == "Queens" ~ "East-Asian Queens",
#     Race.3 == "East-Asian" & Hospital == "Cornell" ~ "East-Asian Cornell",
#     Race.3 == "Non-East-Asian" & Hospital == "Queens" ~ "Non-East-Asian Queens",
#     Race.3 == "Non-East-Asian" & Hospital == "Cornell" ~ "Non-East-Asian Cornell"))

## Another fast way
data_full$Race.3 <- with(data_full, ifelse(RaceEthnicity.2 == "East Asian", "East-Asian", "Non-East-Asian"))
### Fix three observations
data_full$Race.3[which(data_full$StudyID %in% c("WCM735", "WCM737", "WCM738"))] <- "Non-East-Asian" ## Race.2 of these patients is Non-Asian
data_full$RaceSite.2 <- with(data_full, paste0(Hospital, ":", Race.3)) 
table(data_full$RaceSite.2, useNA = "always")
data_full$RaceSite.2 <- gsub("Cornell:NA", NA, data_full$RaceSite.2) %>%
    factor(levels = c("Queens:East-Asian", "Queens:Non-East-Asian", "Cornell:East-Asian", "Cornell:Non-East-Asian"),
         ordered = TRUE)

## Table five
table_five <- tbl_summary(data = data_full %>% 
                            select(c("Age.Dx",
                                     "RaceEthnicity.1",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "RaceSite.2")), 
                                   by = "RaceSite.2",
                                   label = list(
                                     RaceEthnicity.1 ~ "Race/Ethnicity 1, n (%)",
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 5. Patient characteristics across race and hospital site (Queens East-Asian vs Queens Non-East-Asian vs Cornell East-Asian vs Cornell Non-East-Asian).") 
table_five
```
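
The `test.args` entry in the chunk above asks `add_p()` to pass `simulate.p.value = TRUE, B = 5000` to every Fisher's exact test, so the p-value for a large contingency table is estimated by Monte Carlo simulation rather than enumerated exactly. A minimal base-R sketch of the same call is given here; the matrix `subtype_by_group`, its labels, and its counts are hypothetical and are not study data.

```r
# Hypothetical 4 x 4 table of counts (NOT study data): race/site group by tumor subtype
subtype_by_group <- matrix(
  c(120,  15, 10,   8,
    300,  40, 35,  30,
     70,  12, 14,   9,
    800, 110, 95, 100),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    group = c("Queens:East-Asian", "Queens:Non-East-Asian",
              "Cornell:East-Asian", "Cornell:Non-East-Asian"),
    subtype = c("HR+/HER2-", "HR+/HER2+", "HR-/HER2+", "TNBC")
  )
)

set.seed(2022)  # without a seed the simulated p-value changes from run to run
sim_fisher <- fisher.test(subtype_by_group, simulate.p.value = TRUE, B = 5000)
sim_fisher$p.value
```

With `B = 5000` replicates the smallest simulated p-value that can be returned is 1/(B + 1), roughly 0.0002, and results will vary slightly between runs unless a seed is set.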

## Goal 8: Compare the node-negative breast cancers between non-Chinese patients at Cornell and Chinese patients at Cornell

```{r}
# node_neg_cornell <- matrix(c(112, 125-112, 126, 154-126), ncol = 2, byrow = TRUE)
# colnames(node_neg_cornell) <- c("N0 = Yes", "N0 = No")
# rownames(node_neg_cornell) <- c("Non-Chinese patients at Cornell","Chinese patients at Cornell")
# node_neg_cornell %>%
#   knitr::kable()
# # chisq.test(node_neg_cornell)$p.value # 0.09779; write to the table

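## Create a new variable to record the combined category of RaceEthnicity.1 (Chinese vs Non-Chinese) and Hospital
## as.character() guards against ifelse() returning integer codes if RaceEthnicity.1 is stored as a factor
data_full$Race.4 <- with(data_full, ifelse(RaceEthnicity.1 != "Chinese", "Non-Chinese", as.character(RaceEthnicity.1)))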
data_full$RaceSite.3 <- with(data_full, paste0(Hospital, ":", Race.4)) 
data_full$RaceSite.3 <- gsub("Cornell:NA", NA, data_full$RaceSite.3) %>%
    factor(levels = c("Queens:Chinese", "Queens:Non-Chinese", "Cornell:Chinese", "Cornell:Non-Chinese"),
         ordered = TRUE)

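## Note: multcomp attaches MASS (via TH.data), whose select() masks dplyr::select(); hence the explicit dplyr::select() calls
easypackages::libraries("BTKR", "multcomp")
## Create two categorical variables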
data_full$CornelC <- with(data_full, 
                          case_when(RaceSite.3 == "Cornell:Chinese" ~ "Chinese patients at Cornell", 
                                    RaceSite.3 == "Cornell:Non-Chinese" ~ "Non-Chinese patients at Cornell",
                                    TRUE ~ as.character(NA))) %>%
  factor()
data_full$N.stage.3 <- with(data_full, ifelse(N.stage.2 == "N0", "N0 = Yes", "N0 = No")) %>%
  factor(levels = c("N0 = Yes", "N0 = No"))

out.CornelC <- fsmry2.by.grp(y = data_full %>% filter(!is.na(CornelC)) %>% dplyr::select("CornelC"),
                             grp = data_full %>% filter(!is.na(CornelC)) %>% dplyr::select("N.stage.3"), 
                             cmp.method = "chisq")
```

```{r, results="asis"}
knitr::kable(out.CornelC)
```
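
As a cross-check on the `fsmry2.by.grp()` output, the commented-out lines at the top of the previous chunk can be run directly in base R. The sketch below reuses the counts from that commented-out matrix. Those counts may come from an earlier data cut than the current `data_full`, so the result need not agree with the table above.

```r
# Counts copied from the commented-out matrix in the chunk above;
# they may predate the current data_full
node_neg_cornell <- matrix(
  c(112, 125 - 112,
    126, 154 - 126),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Non-Chinese patients at Cornell", "Chinese patients at Cornell"),
    c("N0 = Yes", "N0 = No")
  )
)

# Row percentages: share of node-negative tumors within each group
round(prop.table(node_neg_cornell, margin = 1), 3)

# Pearson chi-square test; for a 2 x 2 table chisq.test() applies
# Yates' continuity correction by default (correct = TRUE)
cornell_test <- chisq.test(node_neg_cornell)
cornell_test$p.value
```

The p-value recorded in the original comment for these counts is 0.09779.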

## Goal 9: Compare the node-negative breast cancers between non-Chinese patients at Queens and Chinese patients at Queens

```{r}
# node_neg_queens <- matrix(c(137, 161-137, 221, 242-221), ncol = 2, byrow = TRUE)
# colnames(node_neg_queens) <- c("N0 = Yes", "N0 = No")
# rownames(node_neg_queens) <- c("Non-Chinese patients at Queens","Chinese patients at Queens")
# node_neg_queens %>%
#   knitr::kable()
# # chisq.test(node_neg_queens)$p.value # 0.07455; write to the table

## Create one categorical variable
data_full$QueenC <- with(data_full, 
                         case_when(RaceSite.3 == "Queens:Chinese" ~ "Chinese patients at Queens", 
                                   RaceSite.3 == "Queens:Non-Chinese" ~ "Non-Chinese patients at Queens",
                                   TRUE ~ as.character(NA))) %>%
  factor()

out.QueenC <- fsmry2.by.grp(y = data_full %>% filter(!is.na(QueenC)) %>% dplyr::select("QueenC"),
              grp = data_full %>% filter(!is.na(QueenC)) %>% dplyr::select("N.stage.3"), 
              cmp.method = "chisq")

# Subsetting by queen.index does not work with fsmry2.by.grp: the subsetted factor keeps
# the unused Cornell levels, which gives empty rows and an NA p-value (output below)
# queen.index <- which(data_full$RaceSite.3 %in% c("Queens:Chinese", "Queens:Non-Chinese"))
# fsmry2.by.grp(y = data_full$RaceSite.3[queen.index],
#               grp = data_full$N.stage.3[queen.index],
#               cmp.method = "chisq")
# 
#                       n     N0...Yes     N0...No p.value
# Queens:Chinese      241 221 (28.48%) 20 (22.22%)        
# Queens:Non-Chinese  625 555 (71.52%) 70 (77.78%)        
# Cornell:Chinese       0       0 (0%)      0 (0%)        
# Cornell:Non-Chinese   0       0 (0%)      0 (0%)    <NA>
```

```{r, results="asis"}
knitr::kable(out.QueenC)
```
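
The commented-out attempt in the chunk above shows why subsetting `RaceSite.3` by index was abandoned: a subsetted factor keeps all four of its levels, so the Cornell rows still appear with zero counts. A minimal base-R sketch of the problem and of one possible workaround with `droplevels()` follows. The vectors `race_site` and `n_stage` are illustrative stand-ins rebuilt from the counts printed in that comment, not `data_full` itself, and whether `fsmry2.by.grp()` accepts the re-levelled factor is not tested here.

```r
# Stand-in vectors rebuilt from the counts printed in the comment above.
# race_site still carries all four levels, as a subsetted factor would.
race_site <- factor(
  rep(c("Queens:Chinese", "Queens:Non-Chinese"), times = c(241, 625)),
  levels = c("Queens:Chinese", "Queens:Non-Chinese",
             "Cornell:Chinese", "Cornell:Non-Chinese")
)
n_stage <- factor(
  c(rep(c("N0 = Yes", "N0 = No"), times = c(221, 20)),
    rep(c("N0 = Yes", "N0 = No"), times = c(555, 70))),
  levels = c("N0 = Yes", "N0 = No")
)

# The unused Cornell levels survive and produce all-zero rows
with_empty <- table(race_site, n_stage)
nrow(with_empty)

# droplevels() removes them, leaving a proper 2 x 2 table
queens_only <- table(droplevels(race_site), n_stage)
dim(queens_only)

queens_test <- chisq.test(queens_only)
queens_test$p.value
```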

## Goal 10: Create table six to compare patient characteristics across race and hospital site (Queens Chinese vs Queens Non-Chinese vs Cornell Chinese vs Cornell Non-Chinese)
```{r}
## RaceSite.3 (combined category of RaceEthnicity.1 and Hospital) was already created under Goal 8; the code is kept here for reference
# data_full$Race.4 <- with(data_full, ifelse(RaceEthnicity.1 != "Chinese", "Non-Chinese", RaceEthnicity.1))
# data_full$RaceSite.3 <- with(data_full, paste0(Hospital, ":", Race.4)) 
# data_full$RaceSite.3 <- gsub("Cornell:NA", NA, data_full$RaceSite.3) %>%
#     factor(levels = c("Queens:Chinese", "Queens:Non-Chinese", "Cornell:Chinese", "Cornell:Non-Chinese"),
#          ordered = TRUE)
# table(data_full$RaceSite.3, useNA = "always")

## Table six
table_six <- tbl_summary(data = data_full %>% 
                           dplyr::select(c("Age.Dx",
                                       "RaceEthnicity.2",
                                       "BMI", 
                                       "Insure.type", 
                                       "MMGSD", "MMGO", "MRISD",
                                       "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                       "ER", "PR", "HER2", "Subtype",
                                       "RaceSite.3")), 
                           by = "RaceSite.3",
                           label = list(
                                     RaceEthnicity.2 ~ "Race/Ethnicity 2, n (%)",
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 6. Patient characteristics across race and hospital site (Queens Chinese vs Queens Non-Chinese vs Cornell Chinese vs Cornell Non-Chinese).") 
table_six
```
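
The `RaceSite.3` grouping used for Table 6 relies on two base-R behaviours: `paste0()` writes a missing value as the literal string `"NA"`, and `factor()` converts any value that is not among its `levels` to `NA`. The small sketch below walks through both steps on a toy data frame (`toy`, its columns, and its values are hypothetical). It also suggests that the explicit `levels` argument alone is enough to blank out combinations such as `"Cornell:NA"` and `"Queens:NA"`.

```r
# Toy stand-in for data_full (hypothetical values)
toy <- data.frame(
  Hospital = c("Queens", "Queens", "Cornell", "Cornell", "Queens"),
  RaceEthnicity.1 = c("Chinese", "Korean", NA, "Chinese", NA),
  stringsAsFactors = FALSE
)

# Collapse every non-Chinese group into one category; NA stays NA
toy$Race.4 <- ifelse(toy$RaceEthnicity.1 != "Chinese",
                     "Non-Chinese", toy$RaceEthnicity.1)

# paste0() writes missing values as the string "NA"
pasted <- paste0(toy$Hospital, ":", toy$Race.4)
pasted

# Values outside `levels` become NA, so no gsub() clean-up is strictly needed
toy$RaceSite.3 <- factor(
  pasted,
  levels = c("Queens:Chinese", "Queens:Non-Chinese",
             "Cornell:Chinese", "Cornell:Non-Chinese")
)
table(toy$RaceSite.3, useNA = "always")
```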


## Goal 11: Create table seven to compare Non-Chinese Asian patient characteristics between two hospital sites (Queens vs Cornell)

```{r}
## Table seven
table_seven <- tbl_summary(data = data_full %>% 
                              filter(RaceEthnicity.1 != "Chinese" & Race == "Asian") %>%
                              dplyr::select(c("Age.Dx",
                                     "RaceEthnicity.1",
                                     "RaceEthnicity.2",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "Hospital")), 
                                   by = "Hospital",
                                   label = list(
                                     RaceEthnicity.1 ~ "Race/Ethnicity 1, n (%)",
                                     RaceEthnicity.2 ~ "Race/Ethnicity 2, n (%)",
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 7. Non-Chinese Asian patient characteristics between two hospitals (Cornell vs Queens).") 
table_seven
```


## Goal 12: Create table eight to compare patient characteristics between Chinese and Non-Chinese
```{r}
## Table eight
table_eight <- tbl_summary(data = data_full %>%
                              dplyr::select(c("Age.Dx",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "Race.4")), 
                                   by = "Race.4",
                                   label = list(
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 8. Patient characteristics between Chinese and Non-Chinese.") 
table_eight
```
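
For continuous variables the `statistic` argument requests two rows per variable: the median with its range, and the mean with its standard deviation. A rough base-R equivalent for a single variable can help when spot-checking a cell of the rendered table. The sketch below uses simulated ages (`toy_age`, `toy_race`, and the helper `summarise_cont()` are hypothetical), and its rounding is arbitrary and need not match what gtsummary prints.

```r
set.seed(1)
# Simulated data (NOT study data): ages for two hypothetical groups
toy_age  <- c(round(rnorm(30, mean = 58, sd = 12)),
              round(rnorm(40, mean = 61, sd = 13)))
toy_race <- rep(c("Chinese", "Non-Chinese"), times = c(30, 40))

# Two summary strings per group, mirroring the two glue patterns in `statistic`
summarise_cont <- function(x) {
  c(
    "Median (range)" = sprintf("%.0f (%.0f, %.0f)", median(x), min(x), max(x)),
    "Mean+/-sd"      = sprintf("%.1f+/-%.1f", mean(x), sd(x))
  )
}

# One column per group, one row per summary line
cont_summary <- sapply(split(toy_age, toy_race), summarise_cont)
cont_summary
```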

## Goal 13: Create table nine to compare East-Asian patient characteristics between the two sites
```{r}
table_nine <- tbl_summary(data = data_full %>%
                             filter(RaceEthnicity.2 == "East Asian") %>%
                              dplyr::select(c("Age.Dx",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "Hospital")), 
                                   by = "Hospital",
                                   label = list(
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 9. East-Asian patient characteristics between Queens and Cornell.") 
table_nine
```


## Goal 14: Create table ten to compare Chinese and Non-Chinese patient characteristics at Cornell

```{r}
table_ten <- tbl_summary(data = data_full %>%
                           filter(Hospital == "Cornell") %>%
                           dplyr::select(c("Age.Dx",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "Race.4")), 
                                   by = "Race.4",
                                   label = list(
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 10. Chinese and Non-Chinese patient characteristics at Cornell.") 
table_ten
```


## Goal 15: Create table eleven to compare Chinese and Non-Chinese patient characteristics at Queens

```{r}
table_eleven <- tbl_summary(data = data_full %>%
                           filter(Hospital == "Queens") %>%
                           dplyr::select(c("Age.Dx",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "Race.4")), 
                                   by = "Race.4",
                                   label = list(
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 11. Chinese and Non-Chinese patient characteristics at Queens.") 
table_eleven
```

## Goal 16: Create table twelve to compare Non-Chinese patient characteristics between two hospital sites (Queens vs Cornell)

```{r}
table_twelve <- tbl_summary(data = data_full %>%
                           filter(Race.4 == "Non-Chinese") %>%
                           dplyr::select(c("Age.Dx",
                                     "BMI", 
                                     "Insure.type", 
                                     "MMGSD", "MMGO", "MRISD",
                                     "T.stage", "T.stage.2", "N.stage", "N.stage.2",
                                     "ER", "PR", "HER2", "Subtype",
                                     "Hospital")), 
                                   by = "Hospital",
                                   label = list(
                                     BMI ~ "Body Mass Index",
                                     Insure.type ~ "Type of Insurance, n (%)", 
                                     Age.Dx ~ "Age at Diagnosis",
                                     MMGSD ~ "Mammo Screen-Detected, n (%)",
                                     MMGO ~ "Mammo-Occult, n (%)",
                                     MRISD ~ "MRI Screen-Detected, n (%)",
                                     T.stage ~ "Clinical T Stage, n (%)",
                                     T.stage.2 ~ "Clinical T Stage 2, n (%)",
                                     N.stage ~ "Clinical N Stage, n (%)",
                                     N.stage.2 ~ "Clinical N Stage 2, n (%)",
                                     ER ~ "Estrogen Receptor, n (%)",
                                     PR ~ "Progesterone Receptor, n (%)",
                                     HER2 ~ "Human Epidermal Growth Factor Receptor 2, n (%)",
                                     Subtype ~ "Tumor Subtype, n (%)"),
                                   type = all_continuous() ~ "continuous2",
                                   statistic = list(all_continuous() ~ c("{median} ({min}, {max})", "{mean}+/-{sd}")),
                                   missing_text = "Missing, n") %>%
   modify_table_body(
    dplyr::mutate,
    label = case_when(label == "Median (Range)" ~ "Median (range)",
                      label == "Mean+/-SD" ~ "Mean+/-sd",
                      TRUE ~ label)) %>%
  modify_footnote(update = everything() ~ NA) %>% 
  bold_labels() %>%
  add_p(pvalue_fun = function(x) style_pvalue(x, digits = 3),
        test.args = all_tests("fisher.test") ~ list(simulate.p.value = TRUE, B = 5000)) %>%
  as_flex_table() %>%
  bold(bold = TRUE, part = "header") %>%
  set_caption(caption = "Table 12. Non-Chinese patient characteristics between two hospital sites (Queens vs Cornell).") 
table_twelve
```
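
None of the tables above name a test explicitly, so `add_p()` chooses one per variable. As far as the gtsummary documentation describes its defaults, a two-group comparison uses the Wilcoxon rank-sum test for continuous variables, and for categorical variables a chi-square test without continuity correction, switching to Fisher's exact test when any expected cell count falls below 5. The base-R sketch below mimics that choice on simulated data. The data frame `toy` and its values are hypothetical, and the selection rule is an assumption to be checked against the installed gtsummary version.

```r
set.seed(16)
# Simulated data (NOT study data)
toy <- data.frame(
  Hospital = rep(c("Queens", "Cornell"), times = c(60, 90)),
  Age.Dx   = c(rnorm(60, mean = 60, sd = 12), rnorm(90, mean = 57, sd = 13)),
  HER2     = sample(c("Positive", "Negative"), 150, replace = TRUE,
                    prob = c(0.15, 0.85))
)

# Continuous variable, two groups: Wilcoxon rank-sum test
p_age <- wilcox.test(Age.Dx ~ Hospital, data = toy)$p.value

# Categorical variable: chi-square without continuity correction when all
# expected counts are at least 5, otherwise Fisher's exact test
tab <- table(toy$HER2, toy$Hospital)
expected <- suppressWarnings(chisq.test(tab, correct = FALSE)$expected)
p_her2 <- if (all(expected >= 5)) {
  chisq.test(tab, correct = FALSE)$p.value
} else {
  fisher.test(tab)$p.value
}

c(Age.Dx = p_age, HER2 = p_her2)
```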