Add capability to parse nested suites in capgen #691
Conversation
…pp_suite.py, scripts/parse_tools/xml_tools.py, and test/nested_suite_test/test_host.F90
… feature/nested_suites
mkavulich
left a comment
I have a few questions
scripts/parse_tools/xml_tools.py
Outdated
| ... </suites> | ||
| ... ''' | ||
| >>> root = ET.fromstring(xml) | ||
| >>> expand_nested_suites(root, logger) |
I'm curious if you've tested the behavior of infinite recursion, is that something we should have a test for to make sure it fails gracefully?
No, and I recall that we discussed this two weeks ago. If I remember correctly, we assumed that Python would throw an error about the recursion (but we didn't try). We could make a doctest for this, though.
I wonder, as part of this algorithm, if it would make sense to keep a list of nested suites that have been processed and that contain another nested suite. Couldn't you avoid infinite recursion by stopping if you're currently trying to process a nested suite that already occurs in that list?
Implemented in 0d144ae
I'm not happy with the current solution; it gives no hints as to where the problem might lie.
How about we use the current solution (since it is the safest and easiest one), but collect the names of the suites and just spit out that list when we throw the exception?
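To illustrate the suggestion above, here is a minimal sketch of cycle detection that reports the chain of suite names in the exception. It is not the code from 0d144ae: the function `expand_nested`, the exception `NestedSuiteError`, and the in-memory `suites` dictionary are hypothetical names used only for this example, and the sketch tracks the suites on the current expansion path rather than all suites processed so far, so that a suite can legitimately be reused in two places.

```python
import xml.etree.ElementTree as ET


class NestedSuiteError(Exception):
    """Hypothetical error type for circular nested suite references."""


def expand_nested(suite_name, suites, chain=()):
    """Return the flat list of scheme names for suite_name.

    suites maps a suite name to its parsed <suite> element (an assumption
    made for this sketch). chain holds the names of the suites currently
    being expanded, outermost first.
    """
    if suite_name in chain:
        # Report the full path so the user can see where the loop closes
        path = " -> ".join(chain + (suite_name,))
        raise NestedSuiteError(f"Circular nested suite reference: {path}")
    schemes = []
    for item in suites[suite_name]:
        if item.tag == "nested_suite":
            schemes.extend(
                expand_nested(item.get("name"), suites, chain + (suite_name,))
            )
        elif item.tag == "scheme":
            schemes.append(item.text.strip())
    return schemes


# Two suites that include each other
suites = {
    "a": ET.fromstring(
        '<suite name="a"><scheme>s1</scheme><nested_suite name="b"/></suite>'
    ),
    "b": ET.fromstring('<suite name="b"><nested_suite name="a"/></suite>'),
}
try:
    expand_nested("a", suites)
except NestedSuiteError as err:
    print(err)  # Circular nested suite reference: a -> b -> a
```

Because the chain is passed down as an immutable tuple, each branch of the recursion sees only its own ancestors, and the error message names every suite involved in the loop.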
grantfirl
left a comment
This is slick. I tried to review in order to understand how to use the new functionality and the implementation in the python scripts. I didn't really review the added test.
peverwhee
left a comment
A couple of requests. I haven't done a full test review yet but will get to that in my second review!
… feature/nested_suites
… absolute or relative, always write expanded SDF
…ccpp-framework into feature/nested_suites
I agree - one suite per SDF sounds simplest.
…ion of nested suites
@peverwhee @gold2718 I reverted back to only one suite per SDF. The code is so much cleaner! The only things that got a little messy are the doctests, because I need to read nested suites from files (I could have tried Python mock etc., but this seemed easy enough). I also implemented what I believe is a robust way of catching recursions.
@climbfuji So if I'm understanding right, the difference between this latest implementation and the original is that we allow nested suites, but only if they are defined in another file?
Correct. That eliminates the need for |
mkavulich
left a comment
Looks good to me (again)
peverwhee
left a comment
thanks @climbfuji !
SIMA will eventually need an offline way to generate the expanded SDF because of the way the build system works, but I'll open an issue and make a PR for that in the future!
gold2718
left a comment
I have not done a complete code review but I wrote some tests instead. I will submit my changes and new test files as a PR to this branch.
I think all the tests should pass, we should discuss any with which you disagree.
Is there a purpose to this branch?
No, I must have accidentally pushed to the wrong remote. I deleted it.
@peverwhee, if my new tests get accepted, there are examples of generating the expanded suites. Something like:
Can the CAM-SIMA build system just wrap this in a function if it needs an expanded suite file? All the routines are in xml_tools.py.
Interesting, after merging @gold2718's PR into my branch (all tests passed for that PR in my fork), I am now seeing this (https://github.com/NCAR/ccpp-framework/actions/runs/19524388803/job/55894181408):
@climbfuji You could modify the |
db06414 to e0d375b
Thanks. I was able to reproduce this locally, too. But I ended up backing out the PR from @gold2718 entirely, because I am running out of time before the Thanksgiving break and I really want to get this in. Because I force-pushed the older commit, @gold2718 can simply direct his branch/PR to NCAR develop after my PR goes in and there won't be any merge conflicts etc. Besides, the changes from @gold2718 didn't really have anything to do with my changes, so it's better to keep them separate anyway. I will address the other questions/issues directly related to my PR before requesting a final review from @gold2718.
…when maximum number of iterations is exceeded
8603d2c to cfc6090
and with additional bug fixes and updates. Added test/unit_tests to GitHub actions.
cfc6090 to 1b80a4b
climbfuji
left a comment
@gold2718 I pulled in your changes from climbfuji#6, then backed out the xml_tools.py updates except formatting and typos (thanks for fixing those). Then I fixed the failing unit tests from test_sdf.py. Now, all unit tests pass (not just test_sdf.py).
Your changes + partially reverting xml_tools.py + fixing unit tests are all combined in one commit to make it easier to review for you:
1b80a4b#diff-a6e210acb1613559c602b201c64b9d210eeee59c1deeba26db33c60a7e88c78e
You had one more comment about an unnecessary file argument being carried around or tested for? If you tell me what to look for, then I can address this and HOPEFULLY we can finally merge this PR and move on to fixing the xmllint inconsistencies in a separate PR.
Thanks!
| git \ | ||
| libxml2-utils | ||
| pip install --user pytest | ||
| python -m pip install --upgrade pip |
I made this consistent with the install commands in python.yaml
| test/advection_test/advection_test_reports.py \ | ||
| test/ddthost_test/ddthost_test_reports.py \ | ||
| test/var_compatibility_test/var_compatibility_test_reports.py | ||
| pytest \ |
Aligning the indentation to match the rest of the file
| run: | | ||
| sudo apt-get update | ||
| sudo apt-get install -y \ | ||
| libxml2-utils |
We now require xmllint to run the pytests - we should have done so much earlier imo.
| res = validate_xml_file(sdf, 'suite', schema_version, run_env.logger) | ||
| if not res: | ||
| raise CCPPError(f"Invalid suite definition file, '{sdf}'") | ||
| _ = validate_xml_file(sdf, 'suite', schema_version, run_env.logger) |
See comment on xmllint further below (test_sdf.py)
| # Exercise | ||
| _, xml_root = read_xml_file(source, logger) | ||
| schema_version = find_schema_version(xml_root) | ||
| # Some versions of xmllint return an exit code 0 even if the |
I am not sure if I captured this correctly from my memory. I also don't know what the best way is to deal with that. I asked chatgpt for good (exit code /= 0 if xml validation fails) and bad (exit code == 0 if validation fails) versions of xmllint, but the answer isn't simple (see below).
It seems that the best way to deal with those xmllint differences is to not use call_command as it is but check for (a) the return code /= 0 and (b) text in the output stderr/stdout that suggest that the validation failed. This is definitely beyond the scope of this PR and should be addressed separately. The PR that addresses these xmllint issues must ensure that we have both good and bad versions of xmllint installed in GitHub actions so that we can test the solution. For now, since GitHub actions and my laptop both have xmllint versions that behave correctly (rc /= 0), the code in this PR works correctly.
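As a rough sketch of the approach described above (check the return code and also scan the output for failure text), something like the following could work. The function names and the list of marker phrases are assumptions made for this example; they are not taken from `call_command` or `validate_xml_file`, and the marker list would need to be verified against the xmllint builds actually in use.

```python
import subprocess

# Phrases assumed to indicate a failed validation (illustrative, not exhaustive)
FAILURE_MARKERS = (
    "fails to validate",
    "validity error",
    "parser error",
    "namespace error",
)


def output_reports_failure(output):
    """Return True if the xmllint output text suggests a failure."""
    text = output.lower()
    return any(marker in text for marker in FAILURE_MARKERS)


def xml_is_valid(returncode, output):
    """Treat the file as valid only if both signals agree."""
    return returncode == 0 and not output_reports_failure(output)


def validate_with_xmllint(xml_file, schema_file):
    """Run xmllint against a schema and combine exit code and output."""
    proc = subprocess.run(
        ["xmllint", "--noout", "--schema", schema_file, xml_file],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # xmllint reports problems on stderr
        text=True,
        check=False,  # we inspect the return code ourselves
    )
    return xml_is_valid(proc.returncode, proc.stdout)
```

Keeping the decision logic in `xml_is_valid` separate from the subprocess call means it can be unit tested with canned output from both well-behaved and misbehaving xmllint versions.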
chatgpt
Good question. I dug through public bug reports, mailing-lists and forum posts; unfortunately there is not a clean, authoritative “matrix” of every version of xmllint / libxml2 ⇢ exit-code behavior. But I can summarise the known confirmed cases — plus what is unknown. Use this as a working “table of suspicion” (not a guarantee).
✅ What we know (sources / reports)
| libxml2 / xmllint version (or build) | Observed “bad” behavior (validation or namespace error → exit code 0) | Source / context |
|---|---|---|
| “libxml version 20912” (on Ubuntu 20.04) | Namespace error (undefined prefix) — but exit code 0 | Mailing-list report: user on Ubuntu 20.04 said xmllint --noout text.xml; echo $? produced 0 despite a “namespace error : Namespace prefix v … not defined” message. (GNOME Mail) |
| “libxml version 20510” (on some older distro) | Schema validation failure (invalid content per XSD) — but exit code 0 | A report on Unix.com: user ran xmllint --schema … myinput.xml and saw “fails to validate” message, but $? was 0. Their xmllint reported “using libxml version 20510” in that session. (Unix Linux Community) |
| (Various versions — unspecified) | Inconsistent exit codes: sometimes non-zero, sometimes zero — depending on error type or build | The upstream bug tracker has an issue titled “xmllint doesn’t always terminate with a non-zero exit status in case of bad usage”. (GitLab GNOME) |
| Unspecified builds (e.g. older Debian / SGML-/DTD-based validation) | HTML / DTD validation sometimes fails but exit code remains 0 | A Debian bug report “XHTML 1.0 validation is broken” includes commentary that “the return code of xmllint is 0 … don’t look at it; it is not reliable.” (Google Groups) |
Summary of what’s reliably documented: there are definitely some builds of libxml2 / xmllint — including version 20510 and 20912 — that have been observed to treat validation or namespace errors as “no error” as far as the exit status is concerned.
⚠ What we don’t know (lack of evidence / no systematic listing)
- There is no publicly maintained, comprehensive registry that maps every libxml2 release (or distro build) to “good” vs “bad” exit-code behavior.
- Many bug reports or forum threads omit the exact version (or build metadata) of xmllint, or only mention a generic “xmllint”, so you can’t reliably attribute behavior to a version number.
- There may well be additional “bad” versions beyond 20510 / 20912; but absent a public report with version info, they remain suspected, not confirmed.
- Fixes or regressions may occur in particular platform/distribution builds (e.g. Debian, Ubuntu, Gentoo, MSYS2), so even a version number doesn’t guarantee consistent behavior if compilation flags, patches, or packaging differ.
Finally, note that some “errors” that a user considers as “validation failure” (e.g. namespace prefix undefined, missing DTD, missing schema import) may be treated internally differently — perhaps as “warnings” rather than validation errors — which may also affect exit codes.
🎯 What this means in practice
- You cannot rely solely on xmllint’s exit code to guarantee that validation (schema, namespace, DTD) succeeded, at least not on all systems.
- If you're writing scripts or CI-checks that must catch invalid XML, you should combine exit-code checking and parse stdout / stderr (look for “error”, “failed to validate”, etc.).
- If you control your toolchain, it may be wise to upgrade to a recent libxml2 build, test that exit code works as expected, and possibly patch or wrap xmllint.
- Before trusting xmllint in a new environment, you should run a minimal test: intentionally invalid XML (e.g. undefined namespace, schema violation), check exit code. That empirical test is more reliable than trusting version numbers.
If you like — I can try to survey (via open-source repositories and distributions) what the “latest” libxml2 versions (2024–2025) do — i.e. test whether exit-code behavior has been fixed upstream. That might help you decide if updating helps. Do you want me to start such a survey?
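The "minimal test" idea from the answer above could be sketched as a small probe. The function name `xmllint_reports_errors` is hypothetical, and this version only checks the simplest case (a file that is not well-formed), not schema or namespace errors, so a passing probe is a necessary rather than a sufficient check.

```python
import os
import shutil
import subprocess
import tempfile


def xmllint_reports_errors():
    """Probe the local xmllint with a deliberately broken file.

    Returns True if xmllint exits non-zero, False if it exits with 0,
    and None if xmllint is not installed.
    """
    exe = shutil.which("xmllint")
    if exe is None:
        return None
    with tempfile.TemporaryDirectory() as tmpdir:
        bad_file = os.path.join(tmpdir, "bad.xml")
        with open(bad_file, "w", encoding="utf-8") as handle:
            # Mismatched tags: this document is not well-formed
            handle.write("<suite><group></suite>")
        proc = subprocess.run(
            [exe, "--noout", bad_file],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=False,
        )
    return proc.returncode != 0
```

A CI job could call this once at startup and fail early, or fall back to output parsing, when the result is `False`.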
gold2718
left a comment
This is not a complete review but I think I identified code that is no longer required.
…te in scripts/parse_tools/xml_tools.py
gold2718
left a comment
Looks good to me now, thanks!!
Thanks for your thorough review and the contributions to testing the SDFs etc. Definitely improved the new code by a lot. We have enough approvals to merge this - will do now.
Description
This PR adds the capability to parse nested suites in capgen, as discussed in #275.
Notes/Features
- … <suite> root element can exist in each file.
- … group= attribute; the content of that group in the referenced suite is merged into the existing group while preserving the order. This is the first example (aka the merge=true example) described in "Add suite keyword to suite definition file?" #275 (comment).
- … group= attribute; if they do, only this group is imported, otherwise all groups are imported in the correct location in the main suite. This is the second example (aka the merge=false example) described in "Add suite keyword to suite definition file?" #275 (comment).
- … PrettyElementTree class was replaced by a much shorter function write_xml_file that uses features available in the already-used XML python library. And since it contained dom twice in the name, I simply couldn't resist! The output of the new implementation is almost identical to the previous PrettyElementTree output, see screenshot at the bottom of the PR description.
- … nested_suite_test and the new unit test suite test_sdf.py, thus the actual changes to the code are quite small and hopefully easy to review.
User interface changes?: Yes, but these are optional. In order to use the new functionality, users have to update their suite definition file to the new XML schema version 2.0 and use the new syntax for nested_suite elements. There are no user interface changes for the previous XML schema 1.0, which remains valid, and there are no user interface changes for invoking capgen or in the auto-generated code.
Issues
Fixes #275
Testing
Test removed: none
Test added:
- added test/nested_suite_test - see README.md in that directory for more information
- added doctests for the new functions in xml_tools.py
- added unit tests in test/unit_test/test_sdf.py (credit @gold2718)
Unit tests: all pass
System tests: all pass
Manual testing: nested_suite_test, test_sdf.py, new docstring tests
Additional information
Difference in datatable.xml output of the new write_xml_file function (left) and the old PrettyElementTree class (right):
In a nutshell, the differences are no white spaces before the closing characters of XML elements, and the additional <? xml ...?> line at the top. Both of these seem reasonable to me.
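For readers without the screenshot, the following sketch shows one way a short pretty-printing writer can be built from the Python standard library. It assumes `xml.dom.minidom` is used; the actual `write_xml_file` in xml_tools.py may differ, and the names `format_xml` and `write_xml_file_sketch` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def format_xml(root, indent="  "):
    """Return an indented XML string for an ElementTree element."""
    raw = ET.tostring(root, encoding="unicode")
    pretty = minidom.parseString(raw).toprettyxml(indent=indent)
    # Drop whitespace-only lines, which appear if the input was already indented
    lines = [line for line in pretty.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"


def write_xml_file_sketch(root, path, indent="  "):
    """Write the formatted XML, including the XML declaration, to path."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_xml(root, indent=indent))


# Small example tree
suite = ET.Element("suite", name="demo")
group = ET.SubElement(suite, "group", name="physics")
ET.SubElement(group, "scheme").text = "scheme_one"
print(format_xml(suite))
```

With this approach the output starts with an XML declaration line, which matches the second difference noted above.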