Conversation

@luisheb
Contributor

@luisheb luisheb commented May 12, 2025

Describe the proposed changes

Functionality to plot Mixed data objects represented as a pandas DataFrame.

Checklist before requesting a review

  • I have performed a self-review of my code
  • The code conforms to the style used in this package
  • The code is fully documented and typed (type-checked with Mypy)
  • I have added thorough tests for the new/changed functionality

@luisheb
Contributor Author

luisheb commented Jun 16, 2025

I'm not entirely satisfied with how the graphs look in the final example, but the core plotting logic is complete and ready for review.

@luisheb luisheb marked this pull request as ready for review June 16, 2025 18:37
@codecov

codecov bot commented Jun 23, 2025

Codecov Report

Attention: Patch coverage is 16.23932% with 98 lines in your changes missing coverage. Please review.

Project coverage is 86.58%. Comparing base (e44d8e5) to head (a877ad4).
Report is 1 commit behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| skfda/exploratory/visualization/representation.py | 16.23% | 98 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #687      +/-   ##
===========================================
- Coverage    87.04%   86.58%   -0.47%     
===========================================
  Files          199      199              
  Lines        14639    14732      +93     
===========================================
+ Hits         12742    12755      +13     
- Misses        1897     1977      +80     


@luisheb
Contributor Author

luisheb commented Jun 23, 2025

Any reference on how to write plot tests? Should I even do it? @vnmabus

@vnmabus
Member

vnmabus commented Jun 23, 2025

Any reference on how to write plot tests? Should I even do it? @vnmabus

We do not have any infrastructure for that at the moment, unfortunately. So they are not mandatory.

@eliegoudout
Contributor

Any reference on how to write plot tests? Should I even do it? @vnmabus

We do not have any infrastructure for that at the moment, unfortunately. So they are not mandatory.

My company will soon release a work project of mine, in which I developed an easy plot-testing fixture. Would you like me to ping you here once it is effectively open-sourced?
I implemented a match_plots fixture which captures all calls to plt.show() and compares them with previously saved expectations. Updating the expectations is done with pytest --update-plots.

@vnmabus
Member

vnmabus commented Jul 3, 2025

My company will soon release a work project of mine, in which I developed an easy plot-testing fixture. Would you like me to ping you here once it is effectively open-sourced? I implemented a match_plots fixture which captures all calls to plt.show() and compares them with previously saved expectations. Updating the expectations is done with pytest --update-plots.

I planned to have a look at the tools that Matplotlib itself uses for testing. For example, image_comparison looks similar to what you propose, except that you have to manually decorate each test and manually move the reference images to a particular location. Your approach sounds more comfortable to use.

However, I am a bit worried about the impact of storing a lot of images in the git repository. I think it would be better if the images could somehow be stored in a separate location (maybe in the same place as the latest version of the documentation?). Then we would not need to store a lot of slightly different versions of images in the repository, caused by changes in the plotting code (images are much more expensive than text in terms of space).

In any case, please feel free to ping us, as it would be good to know all the alternatives we have before committing to one.

@eliegoudout
Contributor

eliegoudout commented Jul 3, 2025

My fixture heavily relies on matplotlib.testing indeed. I personally chose to store plots in /test/expected/ (so, next to test/project/), and indeed it adds up if you have too many plots. I will definitely update you then!
@ego-thales

@ego-thales
Contributor

ego-thales commented Jul 25, 2025

@vnmabus @luisheb
As promised, I come back to you now that my work project is opensourced 🎉

Here's a direct link to match_plots
https://github.com/ThalesGroup/scio/blob/837e5c328e2faa4c154bda1146a017cbfe654a11/test/conftest.py#L261C1-L277C8

The use is detailed in the docstring. It has two CLI options (disable or update) and one runtime ignore() method to disable locally.

What it does

It

  • monkeypatches plt.show to instead save figures under a normalized name (depending on the test and the number of plots already produced in that test), e.g. blablabla.fig0.observed.png;
  • collects the expected plots saved at test/expected/<test_name>/<normalized names>.png;
  • compares each blablabla.fig0.observed.png with blablabla.fig0.png;
  • upon mismatch (within a very small tolerance I found necessary despite every reproducibility hack), fails the test and also creates blablabla.fig0.observed.diff.png (implemented by matplotlib directly). With --ignore-plots this step is skipped, and with --update-plots the old blablabla.fig0.png is replaced and the test is skipped with an update log.

Implementation

It is a bit "dirty" in some ways, as I did not want to have an infinitely long conftest.py (already long enough…), but it should still be quite robust.

Notes

  • If you update multiple plots in the same test, the multiple pytest.skip calls will eventually fail the test (see Double pytest.skip leads to failure, pytest-dev/pytest#13537), but this does not hurt the logic, only the report.
  • If you KeyboardInterrupt during the test, you might end up with an extra blablabla.fig0.observed.png file.
  • You can also check out the match_plots_torch fixture, which I use to automatically call ignore() unless the dtype is torch.float and the device is torch.device("cpu").
  • You can also check out the analogous match_outerr fixture for stdout/stderr captures, and match_arrays, which works for numpy and torch arrays and can be extended to any class of arrays.
  • You can browse my test/expected folder to get an idea of what it concretely does. But in terms of file size, since test/ is not shipped with my releases, I do not care too much. Also, it is still quite reasonable.

Finally, I would like to say thanks for your work, since your repository was a good inspiration for my own developments. You might find a lot of similarities in the presentation of scio in general; it's not by chance!

All the best!
