Skip to content

docs: Switch from 'breathe' to 'docleaf' for API documentation generation #59570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 3, 2023

Conversation

michaeljones
Copy link
Collaborator

@michaeljones michaeljones commented Jun 21, 2023

This PR switches the Zephyr documentation from using Breathe to using Docleaf for generating API documentation from Doxygen XML output.

Docleaf is a re-implementation of Breathe with the aim to improve on Breathe's performance and memory consumption issues. For reference, I am the original creator of Breathe. Docleaf is largely implemented in Rust with a Python layer to implement the Sphinx extension interface. Builds for Windows, MacOS and Linux and for Python 3.7 to 3.11 are available on PyPI. Docleaf is published under the Parity Public License.

I have been working on Docleaf using the Zephyr docs as a test bed and I think the output is at a sufficient standard to replace Breathe. A clean build of the docs on my Ubuntu 22.04 laptop is 2.1x faster than with Breathe and Docleaf performs content checks on the XML files to minimise incremental rebuild times which means that some of the XML copying logic in the doxyrunner code can be removed if you want. I confess I have not done detailed performance analysis of Docleaf to discover further opportunities for improvement. I have not done memory comparisons of the two solutions but assume that the Rust structures holding the XML information are more efficient than the problematic minidom module relied upon by Breathe.

Further notes:

  • I have tested on Windows 11 and Ubuntu 22.04. I don't have a MacOS device available.
  • When testing on Window 11, either this line (https://github.com/sphinx-doc/sphinx/blob/d3c91f951255c6729a53e38c895ddc0af036b5b9/sphinx/domains/c.py#L3285) or a line like it, passes an Exception straight to logger.warning which results in the warnings_filter.py code attempting to process an exception object with a regex which then failed so I have adapted the warnings_filter.py code to work around that though the might be better approaches. I'm not sure why it wasn't triggered before.
  • There are still some warnings printed during the build though less than with Breathe. It might still be sensible to attempt to address them before merging.
  • I've used an exact match for the Docleaf version in requirements-doc.txt as the project is still young and updates should be taken with some care for the moment. Though no significant breaking changes are planned.
  • I've removed :members: from all the directive sites as that isn't currently respected by Docleaf which will output all content from the named group which I assume is the desired behaviour. If that is an issue, I can look into adjusting it.
  • I've updated some of the general that referred to Breathe to now refer to Docleaf as it still seems relevant for the moment.
  • I have done my best to fix issues in the output but the Zephyr docs are extensive so I may have missed parts.

I hope this all seems reasonable.

Checklist

  • Test on MacOS

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @michaeljones, and thank you very much for your first pull request to the Zephyr project!

A project maintainer just triggered our CI pipeline to run it against your PR and ensure it's compliant and doesn't cause any issues. You might want to take this opportunity to review the project's Contributor Expectations and make any updates to your pull request if necessary. 😊

This switches the Sphinx conf.py file over to using the
'docleaf.doxygen' module and the associated docleaf configuration
entries, replacing the breathe module and config.

Signed-off-by: Michael Jones <[email protected]>
@michaeljones michaeljones force-pushed the docleaf branch 2 times, most recently from b610d99 to 4bfd6ff Compare June 21, 2023 20:44
@michaeljones
Copy link
Collaborator Author

The "Documentation Build (HTML)" test is failing as there is a new warning that isn't covered by the current known-warnings.txt patterns. I don't know the preferred way forward here. I can explore getting Docleaf to handle that better (and I assume there might be others if this is failing fast) or we can expand the patterns. I realise the former approach is preferable.

@nashif
Copy link
Member

nashif commented Jun 21, 2023

I realise the former approach is preferable.

yes, if it can be fixed at the root, that is better than excluding via file, but to move forward you might want to use the exclusion path I guess.

@michaeljones
Copy link
Collaborator Author

I will work through them and post progress here.

The current error contains enum bt_conn_oob_info::@54 bt_conn_oob_info.type and which is an nested anonymous enum field which Doxygen doesn't do the best job of representing in its output so it is hard to piece together a clear representation but I should at least try to match the Breathe output (https://docs.zephyrproject.org/latest/connectivity/bluetooth/api/connection_mgmt.html#c.bt_conn_oob_info) or at least not trigger an error.

@carlescufi carlescufi requested review from gmarull and kartben June 22, 2023 08:45
@kartben
Copy link
Collaborator

kartben commented Jun 22, 2023

This is very cool, thanks for the PR!

  • I have tested on Windows 11 and Ubuntu 22.04. I don't have a MacOS device available.

FWIW it works like a charm on macOS (M2) :)

Ya not sure either but I've had a similar error recently when tinkering with the sphinx_immaterial theme.

@michaeljones
Copy link
Collaborator Author

I've marked the PR as 'ready for review' as the tests that ran on the draft seemed to be passing.

Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: try to preserve bisectability with the changes

docleaf~=0.5.0
docleaf==0.6.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you squash original commit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squashed.

doc/conf.py Outdated
Comment on lines 222 to 224
# Filters out any 'function' or 'variable' members that have 'all caps' names as
# they are likely unprocessed macro calls
docleaf_doxygen_skip = ["members:all_caps"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixup original commit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1,6 +1,6 @@
# DOC: used to generate docs

docleaf==0.6.0
docleaf==0.7.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -85,7 +85,7 @@ jobs:
DOC_TARGET="html"
fi

DOC_TAG=${DOC_TAG} SPHINXOPTS="-q -W -t publish" make -C doc ${DOC_TARGET}
DOC_TAG=${DOC_TAG} SPHINXOPTS="-q -W --keep-going -t publish" make -C doc ${DOC_TARGET}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop? I wonder if we should actually default to this, I was not aware of this option!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the commit that removed this and amended the commit message to indicate your approval.

Comment on lines +19 to +21
.*Duplicate C declaration.*\n.*'\.\. c:member:: enum *bt_mesh_dfd_upload_phase bt_mesh_dfd_srv.phase'.*
.*Duplicate C declaration.*\n.*'\.\. c:member:: struct *bt_mesh_blob_xfer bt_mesh_dfu_cli.blob'.*
.*Duplicate C declaration.*\n.*'\.\. c:member:: struct *net_if *\* net_if_mcast_monitor.iface'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why didn't these fail in breathe?

Copy link
Collaborator Author

@michaeljones michaeljones Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It would also be sensible to ask how many of the other patterns in this file we still need.

Docleaf isn't a line-for-line translation of Breathe into Rust. It is a new attempt to solve the same problem in a similar fashion. If it was easier to follow the logic the Breathe code base then a from-scratch re-write might not have been so appealing. All this is to say, I don't know why these didn't fail for Breathe and I'm not exactly sure why they are required for Docleaf.

Would you like clear answers to this before progressing? I can see why that would be appealing and feel like the correct approach. It'll take a bit of digging around though.

Edit: I'll dig into them. It'll be good to understand them more. Maybe a system we were can feed the names in as some config and docleaf can flag where it sees them in the rendering process or something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial impression is that the root of the issue at times seems to be that the Sphinx C-domain does not have good handling for when structs, functions and other entities have the same name. This is allowed in C and present a few times in the Zephyr code base. This leads to duplicate targets being generated and triggering warnings from the Sphinx C Domain code.

It is mentioned here: sphinx-doc/sphinx#8241 (comment) in an issue which goes on to reference this pull request: sphinx-doc/sphinx#8313 as the solution but that seems to have been open for a long time.

As the references are parsed and stored by the C Domain which then generates the warnings for clashes, I don't see there being a clear way to work around them other than the current regexes. The question then becomes the difference between Breathe and Docleaf and that comes down to how much of the Doxygen XML the two systems attempt to render. In various edge cases, the XML becomes harder to make clear sense of and so the two systems potentially draw the lines at different points in different places. Neither are perfect.

I will work through the various cases (as highlighted by deleting all the lines in known-warnings.txt file) and document why each appears to be happening and attempt to improve Docleaf where possible. Any due to name clashes are potentially going to have to stay but I can see at least one case where field name is clashing with a field in a nested structure and that might be possible to avoid.

Copy link
Collaborator Author

@michaeljones michaeljones Jul 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new commit which adds categories to the known-warnings.txt file attempting to indicate the source of the various issues. Some are due to entities sharing the same name and the Sphinx issues indicated before. Some are due to fields with the same name appearing in multiple anonymous structs. Some are due to Doxygen not indicating extern declarations in the XML output and so Docleaf and Breathe are unable to filter them out in any fashion. I have raised a ticket for this: doxygen/doxygen#10163 (Edit - and created a PR: doxygen/doxygen#10164)

It is possible to use the Doxygen @cond syntax to have it ignore parts of the file and so we could use that to explicit stop it processing the extern declarations but that would introduce more changes than the current known-warnings.txt patterns.

Finally the new lines are due to Docleaf attempting to render fields from anonymous structs which then clash with fields in the parent struct. I believe this didn't happen in Breathe as it doesn't attempt to render the members in some of these situations. For example, the breathe output for bt_mesh_dfd_srv is:

image

Whilst the docleaf output is:

image

With the actual code being:

struct bt_mesh_dfd_srv {
const struct bt_mesh_dfd_srv_cb *cb;
struct bt_mesh_model *mod;
struct bt_mesh_dfu_cli dfu;
struct bt_mesh_dfu_target targets[CONFIG_BT_MESH_DFD_SRV_TARGETS_MAX];
struct bt_mesh_blob_target_pull pull_ctxs[CONFIG_BT_MESH_DFD_SRV_TARGETS_MAX];
const struct bt_mesh_blob_io *io;
uint16_t target_cnt;
uint16_t slot_idx;
bool apply;
enum bt_mesh_dfd_phase phase;
struct bt_mesh_blob_cli_inputs inputs;
struct {
enum bt_mesh_dfd_upload_phase phase;
const struct bt_mesh_dfu_slot *slot;
const struct flash_area *area;
struct bt_mesh_blob_srv blob;
} upload;
};

Which shows that the anonymous struct members incorrectly appear in the body of the main struct. Unfortunately this is because the Doxygen xml output (structbt__mesh__dfd__srv.zip) doesn't distinguish between the two levels. It flattens everything. I'm not sure what is better or worse. Leaving out the details or having them poorly represented. I haven't yet looked to see if there is a doxygen issue for this but I fear it might involve changes to their XML structures to better support this and that seems like it is possibly unlikely.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would welcome input as to what we might want to address or account for at this stage or if the current state is acceptable in some fashion for moving forwards.

Instead of breathe.

I'm not sure of the best version comparison to use. There is no
intention to break the API for docleaf but it is also still young
in some ways and there might be adjustments to the configuration values
that would need to be reflected in the version number as it is still
pre-1.0 that would only be a minor version change.

Signed-off-by: Michael Jones <[email protected]>
Where valid.

Signed-off-by: Michael Jones <[email protected]>
We make sure that we have the 'root' and 'xml' defined for the project
which is necessary at the moment for docleaf though the 'root' is only
really used for the 'linkcode' integration.

Signed-off-by: Michael Jones <[email protected]>
As that is not supported by docleaf. We get everything by default.

Signed-off-by: Michael Jones <[email protected]>
There are times when the log message is an exception object and that
causes the regex to fail with a TypeError. Here we work around that
by converting the input to a string.

Signed-off-by: Michael Jones <[email protected]>
So that we see all the potential errors in CI and not just the first
and can assess how much work there is to do.

This was introduced for debuggin but gmarull has suggested keeping it
in.

Signed-off-by: Michael Jones <[email protected]>
We should probably clear out the old warnings and see what still
applies for docleaf now that we have that instead of breathe.

However for this moment, these are new duplication declarations that
we're seeing errors for and so we'd like to silence them.

Signed-off-by: Michael Jones <[email protected]>
This allows us to identify any patterns that we can remove or that
we didn't realise are no longer in use. This might happen if issues
within doxygen or docleaf are resolved.

This allows us to remove the pattern:

  .*Duplicate C declaration.*\n.*'\.\. c:.*:: uint16_t id'.*

which does not match anything in the current set up.

We also split the filter patterns in known-warnings out into different
sections depending on their cause.

Also extend the pattern parser to ignore empty lines so that we can
have some formatting in the known-warnings file.

Signed-off-by: Michael Jones <[email protected]>
Copy link
Member

@gmarull gmarull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@fabiobaltieri fabiobaltieri merged commit 34e5003 into zephyrproject-rtos:main Jul 3, 2023
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @michaeljones!
Congratulations on getting your very first Zephyr pull request merged 🎉🥳. This is a fantastic achievement, and we're thrilled to have you as part of our community!

To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge.

Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants