Skip to content

Fix wrong p value ggcuminc#261

Closed
samrickman wants to merge 3 commits intopharmaverse:mainfrom
samrickman:fix-wrong-p-value-ggcuminc
Closed

Fix wrong p value ggcuminc#261
samrickman wants to merge 3 commits intopharmaverse:mainfrom
samrickman:fix-wrong-p-value-ggcuminc

Conversation

@samrickman
Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request?

  1. Fix issue in ggsurvfit() where it shows same p-value for both events.
  2. Add test to ensure that p-values created by ggsurvfit() are different (using tidycmprsk::trial data, where they should be).

I still get five "failed" tests where vdiffr::expect_doppelganger() has different results. This happened last time I pushed to this repo and all was fine. I think it's related to my system and how it draws plots (they look fine to me), so I've marked all tests as passed and I'll see what happens in the CI pipeline.

If there is an GitHub issue associated with this pull request, please provide link.

#239 (comment)


Reviewer Checklist (if item does not apply, mark as complete)

  • Ensure all package dependencies are installed by running renv::install()
  • PR branch has pulled the most recent updates from master branch. Ensure the pull request branch and your local version match and both have the latest updates from the master branch.
  • If a new function was added, function included in _pkgdown.yml
  • If a bug was fixed, a unit test was added for the bug check
  • Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • Overall code coverage remains >99.5%. Review coverage with withr::with_envvar(list(CI = TRUE), code = devtools::test_coverage()). Begin in a fresh R session without any packages loaded.
  • R CMD Check runs without errors, warnings, and notes
  • usethis::use_spell_check() runs with no spelling errors in documentation

When the branch is ready to be merged into master:

  • Update NEWS.md with the changes from this pull request under the heading "# ggsurvfit (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end update (see NEWS.md for examples).
  • Increment the version number using usethis::use_version(which = "dev")
  • Run usethis::use_spell_check() again
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge".

@ShreyaSreeram27
Copy link
Copy Markdown
Collaborator

thanks for submitting this PR and looking into the p-value issue @samrickman ! just looking through it on my end as well

Comment thread R/add_pvalue.R
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @samrickman

I tried running a few examples and tests with your proposed fix, and I believe the current fix works for the first outcome but completely fails for the 2nd and 3rd outcomes in a 3+ competing events scenario - add_pvalue() gets ignored entirely for these cases. Please let me know if I'm mistaken!

I used this example as a starting point for 3+ competing events:

# Reproducible example showing p-value bug with 3+ competing events
library(ggsurvfit)
#> Loading required package: ggplot2
library(tidycmprsk)
library(dplyr)

# Create test data with 3 competing events
set.seed(123)
dat <- tidycmprsk::trial %>%
  mutate(
    death_cr_3way = dplyr::case_when(
      death_cr == "censor" ~ "censor",
      death_cr == "death other causes" ~ "death other causes", 
      death_cr == "death from cancer" & dplyr::row_number() %% 2 == 1 ~ "cancer type A",
      death_cr == "death from cancer" & dplyr::row_number() %% 2 == 0 ~ "cancer type B",
      TRUE ~ "censor"
    ),
    death_cr_3way = factor(death_cr_3way, 
                          levels = c("censor", "cancer type A", "cancer type B", "death other causes"))
  )

# Fit competing risks model
cif <- tidycmprsk::cuminc(Surv(ttdeath, death_cr_3way) ~ trt, data = dat)

# Check what p-values should be shown
tidycmprsk::glance(cif)
#> # A tibble: 1 × 12
#>   outcome_1    statistic_1  df_1 p.value_1 outcome_2 statistic_2  df_2 p.value_2
#>   <chr>              <dbl> <dbl>     <dbl> <chr>           <dbl> <dbl>     <dbl>
#> 1 cancer type…       0.988     1     0.320 cancer t…       0.504     1     0.478

# Create plots for each event
plot1 <- cif %>% ggcuminc(outcome = "cancer type A") + add_pvalue()
plot2 <- cif %>% ggcuminc(outcome = "cancer type B") + add_pvalue() 
#> ! `add_pvalue()` works with objects created with `survfit2()` or `tidycmprsk::cuminc()`.
#> ℹ `add_pvalue()` has been ignored.
plot3 <- cif %>% ggcuminc(outcome = "death other causes") + add_pvalue()
#> ! `add_pvalue()` works with objects created with `survfit2()` or `tidycmprsk::cuminc()`.
#> ℹ `add_pvalue()` has been ignored.

print(paste("Plot 1:", plot1$labels$caption))
#> [1] "Plot 1: p=0.3"
print(paste("Plot 2:", plot2$labels$caption))
#> [1] "Plot 2: "
print(paste("Plot 3:", plot3$labels$caption))
#> [1] "Plot 3: "

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShreyaSreeram27 I'm not going to have a chance to look at this for a while but I'm sure you're correct. I thought that ggcuminc() could only handle competing risks models (i.e. two states excluding censoring: death, death from another cause). If it can handle models with multiple states then what I wrote will not work. The only thing I'm confused about is that tidycmprsk::glance(cif) also seems to be assuming that it's a two state model (i.e. there are only two outcomes and two p-values). I'm pretty sure that tidycmprsk is just a nice API for cmprsk which implements Fine & Gray models. As far as I'm aware, these can only have two states (excluding censoring). So is it supposed to work with more than this? If it is then it's back to the drawing board for this PR I'm afraid.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both! @samrickman @ShreyaSreeram27 !

@ShreyaSreeram27 can you confirm whether cmprsk supports 2+ competing events?

  • if it does, can you then confirm tidycmprsk (and the pvalue calcualtion therein) also supports 2+ competing events
  • if it doesn't, we'll need to update the messaging to users when this is passed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @ddsjoberg i have confirmed that both cmprsk and tidycmprsk support 2+ competing events. I believe the following code example helps us see this!

library(tidycmprsk)
library(dplyr)

#create test data with 3 competing events
set.seed(123)
n <- 200
test_data <- data.frame(
  time = rexp(n, rate = 0.1),
  status = sample(c("censor", "event1", "event2", "event3"), 
                  n, replace = TRUE, 
                  prob = c(0.4, 0.2, 0.2, 0.2)),
  group = sample(c("A", "B"), n, replace = TRUE)
)


test_data$status <- factor(test_data$status, 
                          levels = c("censor", "event1", "event2", "event3"))

print("Test data structure:")
print(table(test_data$status))
print(table(test_data$group, test_data$status))

# tidycmprsk with 3 events
print("\n===tidycmprsk::cuminc with 3 competing events ===")
tryCatch({
  cif_3events <- tidycmprsk::cuminc(Surv(time, status) ~ group, data = test_data)
  print("SUCCESS")
  print(cif_3events)
}, error = function(e) {
  print("FAILED")
  print(paste("Error:", e$message))
})

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShreyaSreeram27 OK back to stats class for me then! Clearly this PR shouldn't be merged. If I get the chance I will have a think about how to generalise this to >2 events.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @ShreyaSreeram27 and @samrickman !

@ShreyaSreeram27 perhaps a first step to support is better messaging, e.g. can we check if more than one outcome is being presented or the single outcome being presented is not the first? If either of those conditions, print a warning to the users?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @samrickman , happy to collaborate on this feature as well!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddsjoberg yes i think that's a good way to go about this! ill work on improving the messaging

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ddsjoberg i was thinking about the following potential scenarios from a user's perspective:

Scenario 1: Multi-outcome model, plotting 1st outcome
and somethihg like: ! add_pvalue() has limited support for competing risks models with multiple outcomes.
Your model has 3 competing events: cancer type A, cancer type B, death other causes
You are plotting outcome 'cancer type A' (position #1)
P-values may not be correctly calculated for models with multiple competing events.

✓ Using p-value for 'cancer type A' (the first outcome in your 3-event competing risks model).

Scenario 2: Multi-outcome model, plotting 2nd+ outcome

! add_pvalue() has limited support for competing risks models with multiple outcomes.
Your model has 3 competing events: cancer type A, cancer type B, death other causes
You are plotting outcome 'cancer type B' (position #2)
Currently, p-values are only reliable for the first outcome in competing risks models.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model can have any number of competing events (or outcomes), but we cannot present multiple p-values.

Something like this should work:

all_competing_events # this is a character vector of all the competing events from the model
plot_competing_events # character vector of the competing events in the figure

if (length(plot_competing_events) > 1L || plot_competing_events != all_competing_events[1]) {
  cli::cli_warn(
      c("{.fun add_pvalue} supports reporting a single competing event display for the {.emph first} competing event {.val {plot_competing_events[1]}}.")
   )
}

@ddsjoberg
Copy link
Copy Markdown
Collaborator

Thank you @samrickman for the PR and making the competing events comparisons more clear!

I think we can close this PR for now. The current functionality only allows us to present the pvalue from figures that show a single competing event (regardless of whether is the 1st, 2nd, or n-th competing event). But we do not report multiple pvalues with add_pvalue(). When it's attempted, users will see this message.

image

@ddsjoberg ddsjoberg closed this Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants