Skip to content

Conversation

@ymzhang0
Copy link
Collaborator

@ymzhang0 ymzhang0 commented Nov 6, 2025

Add parsing functions for: electron dos, phonon dos, epw interpolated bands, max_eigenvalues, isotropic/fsr anisotropic gap functions in tools/parser.py

Fixed a small bug in the original parse_epw_a2f function that it should skip the first line since it's not commented by #.

I've tested those functions, and they work normally for my test files. If you want, I can also upload those files to tests folder, since they are also useful for the future tests of these functions

Add parsing functions for: electron dos, phonon dos, epw interpolated bands, max_eigenvalues, isotropic/fsr anisotropic gap functions in tools/parser.py
@ymzhang0 ymzhang0 requested a review from mbercx November 7, 2025 12:41
@mbercx
Copy link
Collaborator

mbercx commented Nov 26, 2025

Thanks @ymzhang0! I'm wondering if we can add some minimal data_regression tests for these functions? As always, we should try to create the smallest test fixtures possible. ^^

Other than that, I would merge these outright, since you're using them now they will probably do what is expected. Once the tests are also set up, we can always come back and refactor things.

@ymzhang0
Copy link
Collaborator Author

Hi Marnik,
Was planning to firstly add the parsing functions then a separate test in another PR. But no problem for me to also do write the tests here. I've created another folder /tests/tools/ where I put the tests for these functions? I've also put the lengthy output files in the /tests/files folder.

add tests for the parsing functions in tools/parsers.py
@mbercx
Copy link
Collaborator

mbercx commented Nov 26, 2025

Was planning to firstly add the parsing functions then a separate test in another PR.

Hehe, I know I go on a lot about splitting up PRs/commits into smaller ones, but for tests (and also documentation), they should go in along with the changes made to the code, often even without mentioning them in the commit message. I sometimes have PRs that only make changes to tests, but that's often because I add a new fixture, refactor things, or add tests that were missed in the past.

Thanks a lot for adding the test here! I would still try to reduce the number of line changes, e.g. the test_parse_epw_a2f.yml data regression file is 10k lines. A few ways of doing so:

  1. Reduce the number of test files. I.e. do we need to add 14 files for the full_iso_eliashberg test? Sometimes you need all files, but if you only need one to test the functionality of the parser, just add that one.
  2. Reduce the size of the files. This can be done by either (1) running with very low precision or (2) trimming down the files manually. For e.g. pw.x I run with a ridiculous k-grid (2x1x1). In the end it still tests the parsing just fine, and all the output files are much smaller. I also typically remove things from the stdout that we are not parsing (or doesn't otherwise affect the parser). I'm wondering if we need all 500 energies in the .a2f file, for example. I think I had to keep it previously, because our parser wasn't very robust, but maybe we can improve that?
  3. Reduce the data you pass to data_regression. E.g. only take the first ten elements from a list. (or take evenly spaced elements. I wrote a little function that I added to a custom serializer here). The regression doesn't have to test every datapoint to be effective.

It's a bit of extra work to pay attention to this, but over time adding large files will make your repository very bloated. I'm sure you've noticed some scientific repositories where it takes a while to clone them because they are huge. That's what we want to avoid. :)

Let me know if you need a hand! Once the test files are smaller, the review is also easier. E.g. on GitHub I need to review file per file now because of the number of changes. When I try to review in VSCode directly (which I typically do), I can't open certain files without pulling the changes. No big deal, but all in all it's good to try and keep test files minimal. 🙏

Reduce the size of data folder, trim regression data
@ymzhang0
Copy link
Collaborator Author

Thanks for the review.

I deleted most of the isotropic/anisotropic gap function files keeping only 3 of them. I'll probably write a fitting function for Tc so I think it's reasonable to keep 3 temperature points.

For the data regression, I only keep the top 10 items in each list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants