Adding confidence interval option to stat_cor #418

Open
wants to merge 2 commits into
base: master

@gumeo gumeo commented Jun 20, 2021

This is a small PR that adds an option to stat_cor to display the 95% confidence interval instead of the p-value.

Here is a minimal example:

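library(ggpubr)  # ggplot2 is attached along with ggpubr

# Simulate a noisy linear relationship
set.seed(123)
n <- 1000
x_vec <- runif(n)
y_vec <- 3 + 1.3 * x_vec + rnorm(n)
df <- data.frame(x = x_vec, y = y_vec)

# conf.int is the option added by this PR
ggplot(df, aes(x, y)) +
  geom_point() +
  stat_cor(conf.int = TRUE, p.accuracy = 0.00001)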

And the resulting plot:
image

I think this is a useful addition to stat_cor. There are possibly some things that can be improved, and I have only tested this on a simple example.
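
For reference, here is a minimal sketch of where such an interval could come from. It assumes the label reports the Pearson interval returned by base R's cor.test(); the PR's actual implementation is not shown in this thread, and the label format below is only illustrative.

```r
# Same simulated data as in the example above
set.seed(123)
n <- 1000
x_vec <- runif(n)
y_vec <- 3 + 1.3 * x_vec + rnorm(n)

# cor.test() returns the estimate and, for Pearson, a confidence interval
ct <- cor.test(x_vec, y_vec, method = "pearson", conf.level = 0.95)
ci <- ct$conf.int

# Illustrative label text (an assumption, not the PR's format)
label <- sprintf("R = %.2f, 95%% CI [%.2f, %.2f]", ct$estimate, ci[1], ci[2])
print(label)
```

The interval is attached to the test result, so no extra computation is needed beyond the call that already produces R and the p-value.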

@gumeo
Author

gumeo commented Jun 22, 2021

@kassambara are you merging or reviewing PRs?

@aphalo

aphalo commented Jul 21, 2021

@gumeo Quite a lot of the code in 'ggpubr' has been copied from other packages, sometimes without acknowledgement, including from my own 'ggpmisc'. At the moment there are 118 open issues for 'ggpubr' and very little activity from the owner of the repo.

@gumeo
Author

gumeo commented Jul 21, 2021

> @gumeo Quite a lot of the code in 'ggpubr' has been copied from other packages, sometimes without acknowledgement, including from my own 'ggpmisc'. At the moment there are 118 open issues for 'ggpubr' and very little activity from the owner of the repo.

@aphalo I can see that is the case. It is a shame that code is copied without acknowledgement. I think ggpubr is a good effort in many ways; maybe it is time to create a new, actively maintained branch.

@aphalo

aphalo commented Jul 21, 2021

@gumeo The most recent commit in this repo was nine months ago, but the owner was active on GitHub about one month ago in another repo.

Lack of acknowledgement is annoying, but I haven't traced the commit, so it could have come from a pull request by someone other than the owner of this repo. The problem I see with the copied code in 'ggpubr' is its maintenance. At least at first sight, it seems difficult to maintain code copied from many different packages and written by different people, and to keep it in sync with the original sources. It is also a wasteful duplication of effort. It is more efficient to import from the original packages and, if necessary, write wrapper functions to change the user interface, rather than copying almost unchanged code and re-exporting it under a new name. I noticed recently that this has happened with stat_poly_eq() from my package 'ggpmisc'. It was copied some time ago, renamed stat_regline_equation(), and added to 'ggpubr' almost unchanged. This repo now has issues and pull requests for problems and enhancement requests that I implemented in 'ggpmisc' some time ago.
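
As a rough illustration of the import-and-wrap approach described above, the sketch below defines a thin wrapper around ggpmisc::stat_poly_eq() instead of copying its code. The wrapper name stat_equation_label() and its default arguments are hypothetical, chosen only for this example; it assumes 'ggplot2' and 'ggpmisc' are installed.

```r
library(ggplot2)

# Hypothetical wrapper: changes only the user-facing defaults and
# delegates all the work to the original implementation in 'ggpmisc'.
stat_equation_label <- function(mapping = aes(label = after_stat(eq.label)),
                                formula = y ~ x,
                                ...) {
  ggpmisc::stat_poly_eq(mapping = mapping, formula = formula, ...)
}

# Usage: x and y are inherited from the plot-level aesthetics
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  stat_equation_label()
```

Because the wrapper only forwards its arguments, fixes and enhancements made upstream become available without any change to the wrapping package.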

In 'ggpmisc' I have taken a different approach from the one used in 'ggpubr': to stick to the Grammar of Graphics (to be consistent with 'ggplot2') and to avoid repackaging other extensions to 'ggplot2'. I have only copied, with acknowledgement, a few lines of code from private utility functions of 'ggplot2' and adapted some code to create new functionality. I maintain 13 R packages on CRAN, including 'ggpmisc', and it takes consistency in coding, many good test cases, and good documentation to keep the task manageable in my spare time.

Of course, by now, 'ggpubr' has many reverse dependencies, so it needs to be kept alive. If a new branch is created, I think the way to go would be to try to replace as much as possible of the copied source code with imports and to get the maintainers of the imported-from packages involved when enhancements are needed. As you say, 'ggpubr' is a useful effort, but I think it is more of a collection of useful bits from other packages with some glue added than an original piece of software.

A change of maintainer for a CRAN package needs to be approved by the original maintainer, unless the package has already been declared orphaned and removed from CRAN because of unresolved failures to pass checks. At the moment 'ggpubr' cleanly passes all CRAN checks in spite of all the open issues.

@gumeo
Author

gumeo commented Jul 21, 2021

@aphalo I wholeheartedly agree. R has a great packaging system, and it should be used as such. ggpubr has made it clear to me that there is considerable interest in an alternative interface for creating plots, i.e. a wider one (lots of different types of plots), so people can get to the desired result faster than when starting from a base ggplot.

I think the interface of ggpubr is the sweet spot for many users who are not visualization experts. Helping people get work done faster is reason enough to keep this project alive imo (or to create something similar...).

I just looked at the reverse deps for reference at this time:

image

I'm not sure how many of these projects are still active, but this is a further argument for the need for a tool like this. It has been a long time since I last published an R package; maybe I need to get back into this.

@aphalo

aphalo commented Jul 21, 2021

I suspect quite a few of them are active. 'ggpubr' is indeed very popular, with nearly 200 000 downloads per month. See https://www.r-pkg.org/pkg/ggpubr

It would be interesting to find out which functions are most frequently imported from 'ggpubr' by these other packages.

Recent versions of 'ggplot2' define generics for ggplot(), autoplot() and autolayer(), and these give quite a lot of room for writing extensions with a consistent user interface. For example I use autoplot() and ggplot() specializations in 'ggspectra'.

@karl-an

karl-an commented Jul 20, 2022

@aphalo Is there a way to add confidence intervals like in the above example in 'ggpmisc'?

@aphalo

aphalo commented Jul 20, 2022

@karl-an Not in the version of 'ggpmisc' now on CRAN, but I can easily add this feature to stat_correlation(), at least for method = "pearson". The edits will soon be on GitHub. I am already testing the new code. Thanks for the useful suggestion!

@aphalo

aphalo commented Jul 20, 2022

@karl-an I just pushed the updated code to GitHub. I also added a couple of examples to the help page of stat_correlation(). Before release I may tweak the formatting of the new conf.int.label that is returned. Numeric values are now also returned as conf.int.low and conf.int.high. Please let me know if it works as you expect, and send any suggestions for improvements.

The most recent 'ggpmisc' version (under development) can be installed from GitHub using package 'remotes'.

remotes::install_github("aphalo/ggpmisc")

Package 'remotes' can be installed from CRAN.

@karl-an

karl-an commented Jul 22, 2022

works like a charm, thanks a lot! calling it "95% CI" seems to be a little more standard, but it's already very usable in its current form.

@aphalo

aphalo commented Jul 22, 2022

The numeric values are returned as conf.int.high and conf.int.low, so it is possible to assemble a different label within the aes() call. Currently the 0.95 comes from conf.level, which is also returned, so it should also work if users pass conf.level = 0.99, etc. (not yet tested...)

@aphalo

aphalo commented Aug 3, 2022

@karl-an I made some additional progress with the implementation of CIs for correlation in 'ggpmisc'. CIs can now also be computed by bootstrapping (using a function imported from package 'confintr'), so they are available for all three methods: pearson, kendall and spearman. The default formatting of the labels now follows APA and APS recommendations, which agrees with your wishes except for the use of square brackets. A new option lets you replace the square brackets with any others of your choice. I pushed the last of these commits to GitHub last night.
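
To illustrate the general idea of a bootstrapped CI for a rank correlation, here is a minimal percentile-bootstrap sketch in base R. It is not the 'confintr' function used by 'ggpmisc', whose bootstrap variant and defaults may differ; the helper name boot_cor_ci() and its arguments are hypothetical.

```r
set.seed(123)
n <- 200
x <- runif(n)
y <- 3 + 1.3 * x + rnorm(n)

# Hypothetical helper: percentile bootstrap CI for a correlation coefficient
boot_cor_ci <- function(x, y, method = "spearman", R = 999, conf.level = 0.95) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  # Resample (x, y) pairs with replacement and recompute the correlation
  boot_stats <- replicate(R, {
    idx <- sample.int(n, n, replace = TRUE)
    cor(x[idx], y[idx], method = method)
  })
  alpha <- 1 - conf.level
  # The CI limits are the empirical quantiles of the bootstrap distribution
  unname(quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2)))
}

ci <- boot_cor_ci(x, y, method = "spearman")
print(ci)
```

Resampling pairs rather than x and y separately preserves the dependence between the two variables, which is what the correlation measures.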

@aphalo

aphalo commented Aug 24, 2022

'ggpmisc' 0.5.0 is now on CRAN, including support for CIs.

@karl-an

karl-an commented Aug 24, 2022 via email
