Reorganize Tests against R #906

shapiromh · 2025-05-18T06:49:19Z

This is a PR related to #828.

Summary of Code Changes:

R and conda-forge available R packages ("core") are now installed as part of the dev environment. Other R packages ("extended") are still installed via the "r_test_requirements.R" method and pixi task. The doc environment was also updated to install R via conda so users (I think) do not need their own out-of-project R installation.
All tests that call R packages (even if the package is only called on a path that cannot be taken) are labeled as part of "against_r_core" or "against_r_extended".
Added a small utility to check for installed R packages. Tests that rely on extended R packages will be skipped with an informative message if the related package is not installed. This is similar to how the visualization tests are handled if dependencies are not available.
Updated the contribution docs to reflect the changes above.
[!] Updated the github actions, but I'm not sure if that was done correctly.
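
As an illustration of the marker split described above, here is a minimal sketch of registering the two markers from a `conftest.py` hook. The placement and the description strings are assumptions made for this example; the PR itself registers its markers in the project's pytest configuration, and only the marker names `against_r_core` and `against_r_extended` come from this thread.

```python
# conftest.py (hypothetical placement, for illustration only)

# Marker names come from the PR; the description strings are assumptions.
R_MARKERS = {
    "against_r_core": "tests that call R packages installable from conda-forge",
    "against_r_extended": "tests that call R packages installed via r_test_requirements.R",
}


def pytest_configure(config):
    """Register the R-related markers so pytest does not warn about unknown marks."""
    for name, description in R_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With markers registered this way, `pytest -m against_r_core` would select only the core R tests, and `pytest -m "not against_r_extended"` would leave the extended ones out.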

Some Notes:

I am not sure if the doc environment also relies on "extended R packages". If so the updates may break that code. When installing the extra R packages, I am fairly sure they will only be available in the dev and not doc environment.
I think utils/set_rpy2_path.py could be deprecated. The updates actually remove all references to this code, but then there is no built-in way to extend the R module libraries that are checked beyond the project install.
I had a very tough time installing "ivDiag" as it has a dependency on "lfe" that has a configuration not updated for modern Macs. The configuration can be adjusted, but this requires a lot of system-specific configuration.
I found it was almost feasible to have a completely self-contained R environment for the user without dependencies outside of the project. The issue is some of the extended R packages that rely on gcc or cmake. Cmake can also be installed via conda-forge. Gcc can as well, but requires pulling different versions for different systems (this seems doable in pixi, but I can only test the Mac version). Finally, there is still the issue of installing ivDiag.

s3alfisc · 2025-05-18T19:05:43Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

s3alfisc · 2025-05-18T19:19:44Z

Hi @shapiromh, thanks so much for this! Looks good at first sight but I have to / will spend some more time on this tomorrow morning.

Re your comments:

I am not sure if the doc environment also relies on "extended R packages". If so the updates may break that code. When installing the extra R packages, I am fairly sure they will only be available in the dev and not doc environment.

Yes, some of the docs do - the "fixest vs pyfixest" vignette depends on the R core dependencies. So we'd have to add all of these to the docs deps.

I think utils/set_rpy2_path.py could be deprecated. The updates actually remove all references to this code, but then there is no built in way to extend the check R module libraries beyond the project install.

Yeah, I think it was a util I set up because R deps needed to be installed in the global R env, and it was meant to ensure that Python could talk to the global R env? Though I'm not 100% sure anymore. Let's see if we can make things run without it (I would think so), and then I'd be happy to delete the function from the code base.

I had a very tough time installing "ivDiag" as it has a dependency on "lfe" that has a configuration not updated for modern Macs. The configuration can be adjusted, but this requires a lot of system-specific configuration.

Hm, I think for the ivDiag tests, could we simply run them once, store results in a csv file, and then drop it as a dependency? I think this would be the general strategy for all other non-conda packages as well - we could provide a script that calls R and produces a csv with "R results", which we store and test against. This way, all results would be reproducible (though not perfectly) and users / testers would not have to install ivDiag, fwildclusterboot, wildrwolf, ritest etc? This should then also solve the issue you mention in your last point?
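
A rough sketch of what testing against stored R results could look like, assuming a CSV with `term` and `estimate` columns. The column names, helper names and numbers are all made up for illustration; nothing here is taken from the pyfixest test suite or from actual R output.

```python
import csv
import io
import math


def load_reference(csv_text):
    """Parse stored R results into a {term: estimate} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["term"]: float(row["estimate"]) for row in reader}


def matches_reference(estimates, reference, rel_tol=1e-6):
    """Return True if every reference term is reproduced within tolerance."""
    return all(
        term in estimates and math.isclose(estimates[term], value, rel_tol=rel_tol)
        for term, value in reference.items()
    )


# Illustrative numbers only, not output from any R package.
STORED = "term,estimate\nx1,0.5\nx2,-1.25\n"
reference = load_reference(STORED)
python_estimates = {"x1": 0.5000000001, "x2": -1.25}
print(matches_reference(python_estimates, reference))
```

In practice the CSV would be written once by a script that calls R and then committed, so the comparison itself needs neither R nor rpy2 at test time.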

I found it was almost feasible to have a completely self-contained R environment for the user without dependencies outside of the project. The issue is some of the extended R packages that rely on gcc or cmake. Cmake can also be installed via conda-forge. Gcc can as well, but requires pulling different versions for different systems (this seems doable in pixi, but I can only test the Mac version). Finally, there is still the issue of installing ivDiag.

shapiromh · 2025-05-19T00:10:24Z

Yes, some of the docs do - the "fixest vs pyfixest" vignette depends on the R core dependencies. So we'd have to add all of these to the docs deps.

Sorry, I was asking if non-core R dependencies (those not in conda) are used in the docs. I did update the docs environment to install R in the toml, but I probably messed up the GitHub Actions if any changes around that were needed.

Hm, I think for the ivDiag tests, could we simply run them once, store results in a csv file, and then drop it as a dependency? I think this would be the general strategy for all other non-conda packages as well - we could provide a script that calls R and produces a csv with "R results", which we store and test against. This way, all results would be reproducible (though not perfectly) and users / testers would not have to install ivDiag, fwildclusterboot, wildrwolf, ritest etc? This should then also solve the issue you mention in your last point?

ivDiag is the only one that caused me problems, but generally this strategy makes sense to me. I saw this was the approach already taken with some of the tests (and why some of the test code never went down paths that called the R packages). I would think the only argument against it is if any of these non-conda packages are under active development with known bugs, but then you or other core maintainers should probably hold the single source of truth on what other contributors should be matching.

s3alfisc

Looks good to me, except for two small comments - one data set is not available at runtime in the CI & the car package can be installed from conda directly. Though, maybe it is cleaner to ask users to download it manually as it is not part of the "core" R test dependencies?

s3alfisc · 2025-05-18T19:08:52Z

tests/test_ccv.py

@@ -17,6 +16,7 @@ def data(local=False):


 # function retrieved from Harvard Dataverse


Not sure if it is clear from the description in the pyproject toml, but the extended tests are generally turned off as they run for way too long (some of them more than 10 minutes).

Oh, this was pulled because of a warning from pytest that marks on fixtures do nothing. If I didn't move the mark to the functions that may call "data", I should have.

s3alfisc · 2025-05-19T19:41:53Z

r_test_requirements.R

 install.packages(
-    c('fixest', 'broom','clubSandwich', 'did2s', 'wildrwolf', 'reticulate', 'ivDiag', 'stats', 'base', 'car'),
+    c('did2s', 'reticulate', 'ivDiag', 'car'),


We should also be able to install r-car via conda-forge.

Oh, this is my miss. We may as well include this as a "core r" package too then, no?

s3alfisc · 2025-05-19T19:43:13Z

tests/test_did.py

@@ -33,6 +34,8 @@ def data():
    return df_het


+@pytest.mark.skipif(import_check is False, reason="R package did2s not installed.")


This is an excellent solution, big fan =)
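
For readers wondering where a flag like `import_check` might come from, here is a minimal sketch of an availability check. The function name and the `is_installed` parameter are hypothetical and are not the utility added in this PR; the default path assumes rpy2's `isinstalled` helper from `rpy2.robjects.packages`.

```python
def r_package_available(package, is_installed=None):
    """Return True if the named R package can be found, False otherwise.

    `is_installed` is an injectable callable (hypothetical parameter) so the
    function can be exercised without R; by default it falls back to rpy2.
    """
    if is_installed is None:
        try:
            # Assumed rpy2 helper that reports whether an R package is installed.
            from rpy2.robjects.packages import isinstalled as is_installed
        except Exception:
            # rpy2 or R itself is missing, so no R package can be used.
            return False
    try:
        return bool(is_installed(package))
    except Exception:
        # Treat any failure while querying R as "not available".
        return False


# Evaluated once at import time, then reused by the skip decorators.
import_check = r_package_available("did2s")
```

Computing the flag once at module import keeps the decorator cheap, and returning `False` on any failure means a missing R installation leads to a skip rather than a collection error.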

s3alfisc · 2025-05-19T19:49:57Z

tests/test_did.py

@@ -355,6 +361,7 @@ def test_fully_interacted(unit, cluster):
    )


+@pytest.mark.against_r_core


This might fail in the CI as I don't think the data is available on GitHub? It's from the did R repo: https://github.com/bcallaway11/did/tree/master/data. When I wrote the test, I did not want to think about how to load dta files into pandas, and I did not simply want to copy over the file as I wasn't sure about the license. did uses GPL-2, which I think (?) does not allow reuse under any license other than GPL-2.

Could add a check on the data and a skipif decorator here as well?

s3alfisc · 2025-05-19T19:54:39Z

The CI currently fails because we have to rebuild the pixi lock file after updating the pyproject toml. Will do this, one moment =)

s3alfisc · 2025-05-19T20:03:19Z

Now let's see if the CI runs - I recall that I struggled in the past: I've been installing R and packages from CRAN via r2u, which provides Ubuntu binaries of R packages (so no need for manual installation!). But I recall that the conda env and the R installation provided by r2u had problems talking to each other; that's why I eventually settled for the solution to "install all R deps via r2u and none via conda".

I think that all "r core" tests will run, as all required deps can be found in the conda environment, but would assume that all r-extended tests will not find the dependencies installed into the global env (via r2u) and therefore be skipped.

The solution here would be to ask r2u to install into the conda env, and I recall that I tried and failed 😅

codecov · 2025-05-19T20:07:16Z

Codecov Report

Attention: Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pyfixest/utils/check_r_install.py	87.50%	2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (019edf5) and HEAD (86e5867).

HEAD has 1 upload less than BASE

Flag	BASE (019edf5)	HEAD (86e5867)
tests-extended	1	0

Flag	Coverage Δ
core-tests	`78.47% <87.50%> (-1.88%)`	⬇️
tests-extended	`?`
tests-vs-r	`15.41% <87.50%> (-31.69%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
pyfixest/utils/check_r_install.py	`87.50% <87.50%> (ø)`

... and 16 files with indirect coverage changes

s3alfisc · 2025-05-19T20:08:48Z

@all-contributors please add @shapiromh for code

allcontributors · 2025-05-19T20:08:58Z

@s3alfisc

I've put up a pull request to add @shapiromh! 🎉

s3alfisc · 2025-05-19T20:13:47Z

Sorry, I was asking if non-core R dependencies (those not in conda) are used in the docs. I did update the docs environment to install R in the toml, but I probably messed up the GitHub Actions if any changes around that were needed.

Sorry, I got it wrong then! But I think no changes are needed, only fixest and broom need to be available (and we could drop this requirement as well).

ivDiag is the only one that caused me problems, but generally this strategy makes sense to me. I saw this was the approach already taken with some of the tests (and why some of the test code never went down paths that called the R packages). I would think the only argument against it is if any of these non-conda packages are under active development with known bugs, but then you or other core maintainers should probably hold the single source of truth on what other contributors should be matching.

Yes, I agree - imo there are two arguments for running code via rpy2 - the first is to make sure that any bugs fixed via dependencies are eventually caught; I'd also want to know in case fixest syntax or defaults changed. The other (weaker) reason is that adjusting code that is directly run via rpy2 might be easier, so adding / adjusting tests might be less work?

For the tests where we currently follow this strategy, I think the main reason was time - checking ritest and the bootstrap based multiple testing methods simply took too much time.

shapiromh

I think there are some changes here I could make and resubmit:

(Maybe) Add a check on whether the data is available and skip the test if not
Add car as a core package
Add back "extended" test marks I may have inadvertently removed.

I'm not sure what the conclusion was on the failing CI because of R...

shapiromh · 2025-05-20T03:28:13Z

tests/test_ccv.py

@@ -17,6 +16,7 @@ def data(local=False):


 # function retrieved from Harvard Dataverse


Oh, this was pulled because of a warning from pytest that marks on fixtures do nothing. If I didn't move the mark to the functions that may call "data", I should have.

shapiromh · 2025-05-20T03:29:01Z

r_test_requirements.R

 install.packages(
-    c('fixest', 'broom','clubSandwich', 'did2s', 'wildrwolf', 'reticulate', 'ivDiag', 'stats', 'base', 'car'),
+    c('did2s', 'reticulate', 'ivDiag', 'car'),


Oh, this is my miss. We may as well include this as a "core r" package too then, no?

shapiromh · 2025-05-20T03:30:44Z

tests/test_did.py

@@ -355,6 +361,7 @@ def test_fully_interacted(unit, cluster):
    )


+@pytest.mark.against_r_core


Could add a check on the data and a skipif decorator here as well?

s3alfisc · 2025-05-20T09:18:29Z

All makes sense! If you address all three we can merge. On the CI: I think it's just a CI failure; these sometimes happen. I think it will go away by itself.

shapiromh · 2025-05-20T14:44:16Z

On the 3:

I added a check on the availability of the local mpdata. There's probably a more elegant way than hard coding the path, but this doesn't cause any errors...
Added r-car as a core package and changed the tests to core.
In CCV all the tests are labeled extended. I had already added the mark to one of the tests after removing it from the fixture.
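
On the first point, one possible alternative to a hard-coded path is to resolve the data file relative to a base directory. The sketch below is only an illustration: the helper name and the file name `mpdata.csv` are hypothetical, since the thread does not state the actual path or file format.

```python
import tempfile
from pathlib import Path


def data_file_available(base_dir, relative_path):
    """Return True if `relative_path` exists as a regular file below `base_dir`."""
    return (Path(base_dir) / relative_path).is_file()


# Demonstration with a temporary directory standing in for the data folder.
with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the real data location is not given in the thread.
    (Path(tmp) / "mpdata.csv").write_text("placeholder\n")
    print(data_file_available(tmp, "mpdata.csv"))
    print(data_file_available(tmp, "missing.csv"))
```

Inside a test module, `Path(__file__).parent` could serve as `base_dir`, and the negated result could be passed as the condition to `pytest.mark.skipif`.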

s3alfisc · 2025-05-20T18:23:52Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

s3alfisc · 2025-05-20T18:25:04Z

Looks very good and I can merge now. Thanks for your first PR to pyfixest @shapiromh!

* Update project toml with conda-forge available R packages
* Added new pytest marks for r_against_core and r_against_extended
* Updated packages to install for extended R environment
* Added test markers in pytest init and related pixi dev tasks
* Moved the extended mark of a fixture to the relevant function so pytest stops complaining
* Adjusted extended R test scripts to skip over modules not properly installed
* Updated R requirements to correct install issues.
* Added skip summary on tasks that may cover R tests
* Updated the documentation around changes to R tests.
* Added R as dependency to docs as well to avoid need for global install
* UNTESTED: Updated git workflow actions to reflect new R install?
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* pixi lock
* Made changes to make car a core R package
* Added check on mpdata availability
* Fix: forgot to label car tests as core instead of extended
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Co-authored-by: Matthew Shapiro <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Alexander Fischer <[email protected]>

Matthew Shapiro added 11 commits May 17, 2025 11:09

Update project toml with conda-forge available R packages

e3f37b5

Added new pytest marks for r_against_core and r_against_extended

19e6411

Updated packages to install for extended R environment

992d289

Added test markers in pytest init and related pixi dev tasks

3fa3f8a

Moved the extended mark of a fixture to the relevant function so pytest stops complaining

8ed74df

Adjusted extended R test scripts to skip over modules not properly installed

b2d206a

Updated R requirements to correct install issues.

20ae811

Added skip summary on tasks that may cover R tests

2c1de9a

Updated the documentation around changes to R tests.

fa50b3d

Added R as dependency to docs as well to avoid need for global install

e0e2d06

UNTESTED: Updated git workflow actions to reflect new R install?

234412a

s3alfisc self-requested a review May 18, 2025 19:05

[pre-commit.ci] auto fixes from pre-commit.com hooks

836a696

for more information, see https://pre-commit.ci

s3alfisc reviewed May 19, 2025

pixi lock

8baa1e6

allcontributors bot mentioned this pull request May 19, 2025

docs: add shapiromh as a contributor for code #907

Merged

shapiromh commented May 20, 2025

shapiromh and others added 3 commits May 20, 2025 22:17

Merge branch 'py-econometrics:master' into master

250cb55

Made changes to make car a core R package

8477d40

Added check on mpdata availability

96e3363

Fix: forgot to label car tests as core instead of extended

86e5867

[pre-commit.ci] auto fixes from pre-commit.com hooks

6e234e5

for more information, see https://pre-commit.ci

s3alfisc merged commit abb423a into py-econometrics:master May 20, 2025
1 check passed
