Reorganize Tests against R #906

Merged on May 20, 2025 (18 commits)

Contributor

@shapiromh shapiromh commented May 18, 2025

This is a PR related to #828.

Summary of Code Changes:

  • R and conda-forge available R packages ("core") are now installed as part of the dev environment. Other R packages ("extended") are still installed via the "r_test_requirements.R" method and pixi task. The doc environment was also updated to install R via conda so users (I think) do not need their own out-of-project R installation.
  • All tests that call R packages (even if the package is only called on a path that cannot be taken) are labeled as part of "against_r_core" or "against_r_extended".
  • Added a small utility to check for installed R packages. Tests that rely on extended R packages will be skipped with an informative message if the related package is not installed. This is similar to how the visualization tests are handled if dependencies are not available.
  • Updated the contribution docs to reflect the changes above.
  • [!] Updated the github actions, but I'm not sure if that was done correctly.
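
A minimal sketch of what such a package check could look like, assuming rpy2 is the bridge to R. The helper name `r_package_installed` and the package tuple are hypothetical and not taken from the PR; `isinstalled` is rpy2's own function in `rpy2.robjects.packages`.

```python
def r_package_installed(name: str) -> bool:
    """Return True only if rpy2 can start R and R reports `name` as installed."""
    try:
        # Imported lazily so the check also works on machines without rpy2 or R.
        from rpy2.robjects.packages import isinstalled

        return bool(isinstalled(name))
    except Exception:
        # Covers a missing rpy2, a missing R installation, or an R startup failure.
        return False


# Hypothetical subset of the "extended" packages named in this thread.
EXTENDED_PACKAGES = ("did2s", "ivDiag")

missing = [pkg for pkg in EXTENDED_PACKAGES if not r_package_installed(pkg)]
for pkg in missing:
    print(f"R package {pkg} not installed; dependent tests would be skipped.")
```

The boolean can then feed `pytest.mark.skipif`, for example `@pytest.mark.skipif(not r_package_installed("did2s"), reason="R package did2s not installed.")`, which mirrors the reason string used in the PR's tests.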

Some Notes:

  1. I am not sure if the doc environment also relies on "extended R packages". If so the updates may break that code. When installing the extra R packages, I am fairly sure they will only be available in the dev and not doc environment.
  2. I think utils/set_rpy2_path.py could be deprecated. The updates actually remove all references to this code, but then there is no built-in way to extend the check for R libraries beyond the project install.
  3. I had a very tough time installing "ivDiag" as it has a dependency on "lfe" that has a configuration not updated for modern Macs. The configuration can be adjusted, but this requires a lot of system-specific configuration.
  4. I found it was almost feasible to have a completely self-contained R environment for the user without dependencies outside of the project. The issue is some of the extended R packages that rely on gcc or cmake. Cmake can also be installed via conda-forge. Gcc can as well, but requires pulling different versions for different systems (this seems doable in pixi, but I can only test the Mac version). Finally, there is still the issue of installing ivDiag.

@s3alfisc s3alfisc self-requested a review May 18, 2025 19:05
@s3alfisc
Member

pre-commit.ci autofix

@s3alfisc
Member

Hi @shapiromh, thanks so much for this! Looks good at first sight but I have to / will spend some more time on this tomorrow morning.

Re your comments:

I am not sure if the doc environment also relies on "extended R packages". If so the updates may break that code. When installing the extra R packages, I am fairly sure they will only be available in the dev and not doc environment.

Yes, some of the docs do - the "fixest vs pyfixest" vignette depends on the R core dependencies. So we'd have to add all of these to the docs deps.

I think utils/set_rpy2_path.py could be deprecated. The updates actually remove all references to this code, but then there is no built-in way to extend the check for R libraries beyond the project install.

Yeah, I think it was a util I set up because R deps needed to be installed in the global R env, and it was meant to ensure that Python could talk to the global R env? Though I'm not 100% sure anymore. Let's see if we can make things run without it (I would think so), and then I'd be happy to delete the function from the code base.

I had a very tough time installing "ivDiag" as it has a dependency on "lfe" that has a configuration not updated for modern Macs. The configuration can be adjusted, but this requires a lot of system-specific configuration.

Hm, I think for the ivDiag tests, could we simply run them once, store results in a csv file, and then drop it as a dependency? I think this would be the general strategy for all other non-conda packages as well - we could provide a script that calls R and produces a csv with "R results", which we store and test against. This way, all results would be reproducible (though not perfectly) and users / testers would not have to install ivDiag, fwildclusterboot, wildrwolf, ritest etc? This should then also solve the issue you mention in your last point?

I found it was almost feasible to have a completely self-contained R environment for the user without dependencies outside of the project. The issue is some of the extended R packages that rely on gcc or cmake. Cmake can also be installed via conda-forge. Gcc can as well, but requires pulling different versions for different systems (this seems doable in pixi, but I can only test the Mac version). Finally, there is still the issue of installing ivDiag.

@shapiromh
Contributor Author

Yes, some of the docs do - the "fixest vs pyfixest" vignette depends on the R core dependencies. So we'd have to add all of these to the docs deps.

Sorry, I was asking if non-core R dependencies (those not in conda) are used in the docs. I did update the docs environment to install R in the toml, but I probably messed up the GitHub Actions if any changes around that were needed.

Hm, I think for the ivDiag tests, could we simply run them once, store results in a csv file, and then drop it as a dependency? I think this would be the general strategy for all other non-conda packages as well - we could provide a script that calls R and produces a csv with "R results", which we store and test against. This way, all results would be reproducible (though not perfectly) and users / testers would not have to install ivDiag, fwildclusterboot, wildrwolf, ritest etc? This should then also solve the issue you mention in your last point?

ivDiag is the only one that caused me problems, but generally this strategy makes sense to me. I saw this was the approach already taken with some of the tests (and why some of the test code never went down paths that called the R packages). I would think the only argument against is if any of these non-conda packages are under active development with known bugs, but then you or other core maintainers should probably hold the single source of truth on what other contributors should be matching.
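
The stored-results strategy discussed above could be sketched roughly as follows. This is only an illustration under assumptions: the CSV layout, the column names, the helper names and the numbers are placeholders, not real output from ivDiag or any other R package.

```python
import csv
import io
import math

# Stand-in for a CSV that an R script would write once and that would be
# committed to the repo. The values are made-up placeholders.
STORED_R_RESULTS = """term,estimate,std_error
x1,0.5,0.1
x2,-1.25,0.2
"""


def load_reference(handle):
    """Read stored R results into {term: (estimate, std_error)}."""
    reader = csv.DictReader(handle)
    return {
        row["term"]: (float(row["estimate"]), float(row["std_error"]))
        for row in reader
    }


def matches_reference(python_results, reference, rel_tol=1e-6):
    """Compare Python estimates to the stored R values within a relative tolerance."""
    if python_results.keys() != reference.keys():
        return False
    return all(
        math.isclose(py, ref, rel_tol=rel_tol)
        for term in reference
        for py, ref in zip(python_results[term], reference[term])
    )


reference = load_reference(io.StringIO(STORED_R_RESULTS))
# In a real test these would come from the Python estimator under test.
python_results = {"x1": (0.5, 0.1), "x2": (-1.25, 0.2)}
print(matches_reference(python_results, reference))
```

With this layout the test suite only needs the CSV at run time; R and the non-conda packages are needed only when the reference file is regenerated.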

Member

@s3alfisc s3alfisc left a comment

Looks good to me, except for two small comments - one data set is not available at runtime in the CI & the car package can be installed from conda directly. Though, maybe it is cleaner to ask users to download it manually as it is not part of the "core" R test dependencies?

@@ -17,6 +16,7 @@ def data(local=False):


# function retrieved from Harvard Dataverse
Member

Not sure if it is clear from the description in the pyproject toml, but the extended tests are generally turned off as they run for way too long (some of them more than 10 minutes).

Contributor Author

Oh, this was pulled because of a warning from pytest that marks on fixtures do nothing. If I didn't move the mark to the functions that may call "data", I should have.
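
For illustration, a small sketch of the pattern described here: pytest ignores marks applied to fixtures, so the mark goes on the test function that requests the fixture. The fixture contents and the test name are made up; `against_r_extended` is the marker introduced in this PR, and it is assumed to be registered in the project's pytest configuration.

```python
import pytest


@pytest.fixture
def data():
    # Placeholder data; the real fixture builds a DataFrame.
    return {"y": [1.0, 2.0, 3.0]}


# The mark sits on the test, not on the fixture, so that
# `pytest -m against_r_extended` selects or deselects it as intended.
@pytest.mark.against_r_extended
def test_uses_data(data):
    assert len(data["y"]) == 3
```

Every test that requests the fixture needs its own mark; a module-level `pytestmark = pytest.mark.against_r_extended` is an alternative when all tests in a file belong to the same group.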

 install.packages(
-c('fixest', 'broom','clubSandwich', 'did2s', 'wildrwolf', 'reticulate', 'ivDiag', 'stats', 'base', 'car'),
+c('did2s', 'reticulate', 'ivDiag', 'car'),
Member

We should also be able to install r-car via conda-forge.

Contributor Author

Oh, this is my miss. We may as well include this as a "core r" package too then, no?

@@ -33,6 +34,8 @@ def data():
return df_het


@pytest.mark.skipif(import_check is False, reason="R package did2s not installed.")
Member

This is an excellent solution, big fan =)

@@ -355,6 +361,7 @@ def test_fully_interacted(unit, cluster):
)


@pytest.mark.against_r_core
Member

This might fail in the CI as I don't think the data is available on GitHub? It's from the did R repo (https://github.com/bcallaway11/did/tree/master/data). When I wrote the test, I did not want to think about how to load dta files into pandas, and I did not simply want to copy over the file as I wasn't sure about the license. did uses GPL-2, which I think (?) does not allow reuse under any other license than GPL-2.

Contributor Author

Could we add a check on the data and a skipif decorator here as well?
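
A rough sketch of such a data check, using only the standard library. The path and the helper name `data_available` are hypothetical; the location actually hard-coded in the PR is not shown in this thread.

```python
from pathlib import Path

# Hypothetical location of the local copy of the data set.
MPDATA_PATH = Path("tests") / "data" / "mpdata.dta"


def data_available(path: Path = MPDATA_PATH) -> bool:
    """Return True if the data file exists locally and is a regular file."""
    return path.is_file()


if not data_available():
    print(f"{MPDATA_PATH} not found; tests that need it would be skipped.")
```

The result can be passed to the same decorator used for the R package checks, for example `@pytest.mark.skipif(not data_available(), reason="mpdata not available locally.")`, so the CI reports a skip instead of a failure.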

@s3alfisc
Member

The CI currently fails because we have to rebuild the pixi lock file after updating the pyproject toml. Will do this, one moment =)

@s3alfisc
Member

s3alfisc commented May 19, 2025

Now let's see if the CI runs - I recall that I struggled in the past: I've been installing R and packages from CRAN via r2u, which provides Ubuntu binaries of R packages (so no need for manual installation!). But I recall that the conda env and the R installation provided by r2u had problems talking to each other; that's why I eventually settled on the solution to "install all R deps via r2u and none via conda".

I think that all "r core" tests will run, as all required deps can be found in the conda environment, but would assume that all r-extended tests will not find the dependencies installed into the global env (via r2u) and therefore be skipped.

The solution here would be to ask r2u to install into the conda env, and I recall that I tried and failed 😅


codecov bot commented May 19, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
pyfixest/utils/check_r_install.py | 87.50% | 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (019edf5) and HEAD (86e5867).

HEAD has 1 upload less than BASE.
Flag | BASE (019edf5) | HEAD (86e5867)
tests-extended | 1 | 0

Flag | Coverage Δ
core-tests | 78.47% <87.50%> (-1.88%) ⬇️
tests-extended | ?
tests-vs-r | 15.41% <87.50%> (-31.69%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
pyfixest/utils/check_r_install.py | 87.50% <87.50%> (ø)

... and 16 files with indirect coverage changes

@s3alfisc
Member

@all-contributors please add @shapiromh for code

Contributor

@s3alfisc

I've put up a pull request to add @shapiromh! 🎉

@s3alfisc
Member

Sorry, I was asking if non-core R dependencies (those not in conda) are used in the docs. I did update the docs environment to install R in the toml, but I probably messed up the GitHub Actions if any changes around that were needed.

Sorry, I got it wrong then! But I think no changes are needed, only fixest and broom need to be available (and we could drop this requirement as well).

ivDiag is the only one that caused me problems, but generally this strategy makes sense to me. I saw this was the approach already taken with some of the tests (and why some of the test code never went down paths that called the R packages). I would think the only argument against is if any of these non-conda packages are under active development with known bugs, but then you or other core maintainers should probably hold the single source of truth on what other contributors should be matching.

Yes, I agree - imo there are two arguments for running code via rpy2 - the first is to make sure that any bugs fixed via dependencies are eventually caught; I'd also want to know in case fixest syntax or defaults changed. The other (weaker) reason is that adjusting code that is directly run via rpy2 might be easier, so adding / adjusting tests might be less work?

For the tests where we currently follow this strategy, I think the main reason was time - checking ritest and the bootstrap based multiple testing methods simply took too much time.

Contributor Author

@shapiromh shapiromh left a comment

I think there are some changes here I could make and resubmit:

  1. (Maybe) Add a check on whether the data is available and skip the test if not
  2. Add car as a core package
  3. Add back "extended" test marks I may have inadvertently removed.

I'm not sure what the conclusion was on the failing CI because of R...

@s3alfisc
Member

All makes sense! If you address all three we can merge. On the CI: I think it's just a CI failure, these sometimes happen. I think it will go away by itself.

@shapiromh
Contributor Author

On the 3:

  • I added a check on the availability of the local mpdata. There's probably a more elegant way than hard coding the path, but this doesn't cause any errors...
  • Added r-car as a core package and changed the tests to core.
  • In CCV all the tests are labeled extended. I had already added the mark to one of the tests after removing it from the fixture.

@s3alfisc
Member

pre-commit.ci autofix

@s3alfisc
Member

Looks very good and I can merge now. Thanks for your first PR to pyfixest @shapiromh!

@s3alfisc s3alfisc merged commit abb423a into py-econometrics:master May 20, 2025
1 check passed
damandhaliwal pushed a commit to damandhaliwal/pyfixest that referenced this pull request Jun 17, 2025
* Update project toml with conda-forge available R packages

* Added new pytest marks for r_against_core and r_against_extended

* Updated packages to install for extended R environment

* Added test markers in pytest init and related pixi dev tasks

* Moved the extended mark of a fixture to the relevant function so pytest stops complaining

* Adjusted extended R test scripts to skip over modules not properly installed

* Updated R requirements to correct install issues.

* Added skip summary on tasks that may cover R tests

* Updated the documentation around changes to R tests.

* Added R as dependency to docs as well to avoid need for global install

* UNTESTED: Updated git workflow actions to reflect new R install?

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pixi lock

* Made changes to make car a core R package

* Added check on mpdata availability

* Fix: forgot to label car tests as core instead of extended

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Shapiro <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Alexander Fischer <[email protected]>