
Reorganize Tests against R #906


Open, wants to merge 12 commits into base: master


@shapiromh shapiromh commented May 18, 2025

This is a PR related to #828.

Summary of Code Changes:

  • R and the R packages available on conda-forge ("core") are now installed as part of the dev environment. Other R packages ("extended") are still installed via the "r_test_requirements.R" method and pixi task. The doc environment was also updated to install R via conda, so users (I think) do not need their own out-of-project R installation.
  • All tests that call R packages (even if the package is only called on a path that cannot be taken) are labeled as part of "against_r_core" or "against_r_extended".
  • Added a small utility to check for installed R packages. Tests that rely on extended R packages will be skipped with an informative message if the related package is not installed. This is similar to how the visualization tests are handled if dependencies are not available.
  • Updated the contribution docs to reflect the changes above.
  • [!] Updated the GitHub Actions, but I'm not sure if that was done correctly.
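
A minimal sketch of what such a package check could look like. This is not the utility added in this PR: the function names `missing_r_packages` and `skip_reason` are hypothetical, and the only real library call assumed is rpy2's `rpy2.robjects.packages.isinstalled`.

```python
from typing import Callable, Iterable, List


def _rpy2_isinstalled(name: str) -> bool:
    """Ask rpy2 whether an R package is installed.

    Returns False when rpy2 (or R itself) cannot be loaded, so that the
    caller skips the test instead of erroring at import time.
    """
    try:
        from rpy2.robjects.packages import isinstalled
    except Exception:
        return False
    return bool(isinstalled(name))


def missing_r_packages(
    required: Iterable[str],
    is_installed: Callable[[str], bool] = _rpy2_isinstalled,
) -> List[str]:
    """Return the required R packages that are not installed, in input order."""
    return [pkg for pkg in required if not is_installed(pkg)]


def skip_reason(missing: Iterable[str]) -> str:
    """Build the message shown when a test is skipped."""
    return (
        "R package(s) not installed: "
        + ", ".join(missing)
        + ". Install them to run the extended R tests."
    )


# Possible use at the top of a test module (assumes pytest is available and
# that the "against_r_extended" marker is registered):
#
#   import pytest
#   _missing = missing_r_packages(["ivDiag"])
#   pytestmark = [
#       pytest.mark.against_r_extended,
#       pytest.mark.skipif(bool(_missing), reason=skip_reason(_missing)),
#   ]
```

Passing the checker in as `is_installed` keeps the helper testable on machines without R, since a stand-in function can be supplied instead of rpy2.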

Some Notes:

  1. I am not sure if the doc environment also relies on "extended R packages". If so the updates may break that code. When installing the extra R packages, I am fairly sure they will only be available in the dev and not doc environment.
  2. I think utils/set_rpy2_path.py could be deprecated. The updates actually remove all references to this code, but then there is no built-in way to extend the R library paths that are checked beyond the project install.
  3. I had a very tough time installing "ivDiag" as it has a dependency on "lfe" that has a configuration not updated for modern Macs. The configuration can be adjusted, but this requires a lot of system-specific configuration.
  4. I found it was almost feasible to have a completely self-contained R environment for the user without dependencies outside of the project. The issue is that some of the extended R packages rely on gcc or cmake. CMake can be installed via conda-forge. GCC can as well, but requires pulling different versions for different systems (this seems doable in pixi, but I can only test the Mac version). Finally, there is still the issue of installing ivDiag.

@s3alfisc s3alfisc self-requested a review May 18, 2025 19:05
@s3alfisc
Member

pre-commit.ci autofix

@s3alfisc
Member

Hi @shapiromh, thanks so much for this! Looks good at first sight but I have to / will spend some more time on this tomorrow morning.

Re your comments:

I am not sure if the doc environment also relies on "extended R packages". If so the updates may break that code. When installing the extra R packages, I am fairly sure they will only be available in the dev and not doc environment.

Yes, some of the docs do - the "fixest vs pyfixest" vignette depends on the R core dependencies. So we'd have to add all of these to the docs deps.

I think utils/set_rpy2_path.py could be deprecated. The updates actually remove all references to this code, but then there is no built-in way to extend the R library paths that are checked beyond the project install.

Yeah, I think it was a util I set up because R deps needed to be installed in the global R env, and this was meant to ensure that Python could talk to the global R env? Though I'm not 100% sure anymore. Let's see if we can make things run without it (I would think so), and then I'd be happy to delete the function from the code base.

I had a very tough time installing "ivDiag" as it has a dependency on "lfe" that has a configuration not updated for modern Macs. The configuration can be adjusted, but this requires a lot of system-specific configuration.

Hm, I think for the ivDiag tests, could we simply run them once, store results in a csv file, and then drop it as a dependency? I think this would be the general strategy for all other non-conda packages as well - we could provide a script that calls R and produces a csv with "R results", which we store and test against. This way, all results would be reproducible (though not perfectly) and users / testers would not have to install ivDiag, fwildclusterboot, wildrwolf, ritest etc? This should then also solve the issue you mention in your last point?

I found it was almost feasible to have a completely self-contained R environment for the user without dependencies outside of the project. The issue is that some of the extended R packages rely on gcc or cmake. CMake can be installed via conda-forge. GCC can as well, but requires pulling different versions for different systems (this seems doable in pixi, but I can only test the Mac version). Finally, there is still the issue of installing ivDiag.

@shapiromh
Author

Yes, some of the docs do - the "fixest vs pyfixest" vignette depends on the R core dependencies. So we'd have to add all of these to the docs deps.

Sorry, I was asking whether non-core R dependencies (those not in conda) are used in the docs. I did update the docs environment to install R in the toml, but I probably messed up the GitHub Actions if any changes around that were needed.

Hm, I think for the ivDiag tests, could we simply run them once, store results in a csv file, and then drop it as a dependency? I think this would be the general strategy for all other non-conda packages as well - we could provide a script that calls R and produces a csv with "R results", which we store and test against. This way, all results would be reproducible (though not perfectly) and users / testers would not have to install ivDiag, fwildclusterboot, wildrwolf, ritest etc? This should then also solve the issue you mention in your last point?

ivDiag is the only one that caused me problems, but generally this strategy makes sense to me. I saw this was the approach already taken with some of the tests (and why some of the test code never went down paths that called the R packages). I would think the only argument against it is if any of these non-conda packages are under active development with known bugs, but then you or other core maintainers should probably hold the single source of truth on what other contributors should be matching.
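
As a rough sketch of the stored-results strategy discussed above, the Python side could look something like the following. Everything here is an assumption for illustration: the CSV layout (`test_id`, `statistic`, `value`), the function names, and the tolerances are hypothetical, and the numbers in the example are placeholders rather than real R output.

```python
import csv
import io
import math
from typing import Dict, List, Mapping, TextIO, Tuple

Reference = Dict[Tuple[str, str], float]


def load_reference(handle: TextIO) -> Reference:
    """Read stored R results keyed by (test_id, statistic)."""
    reader = csv.DictReader(handle)
    return {
        (row["test_id"], row["statistic"]): float(row["value"])
        for row in reader
    }


def check_against_reference(
    reference: Reference,
    test_id: str,
    computed: Mapping[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-8,
) -> List[str]:
    """Return a list of mismatch descriptions; an empty list means agreement."""
    problems = []
    for statistic, value in computed.items():
        key = (test_id, statistic)
        if key not in reference:
            problems.append(f"{test_id}/{statistic}: no stored R value")
        elif not math.isclose(value, reference[key], rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(
                f"{test_id}/{statistic}: got {value}, stored R value {reference[key]}"
            )
    return problems


# Placeholder CSV content standing in for a file produced once by an R script.
stored = io.StringIO(
    "test_id,statistic,value\n"
    "case_a,coef,1.0\n"
    "case_a,se,0.25\n"
)
reference = load_reference(stored)
assert check_against_reference(reference, "case_a", {"coef": 1.0, "se": 0.25}) == []
```

A test would then compare the Python estimates against the stored file and need neither R nor the non-conda package at test time; only whoever regenerates the CSV needs the full R setup.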
