-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
C domain, fix performance regression since 3.0.0 #12162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(I will have comments on that tomorrow, so I'll ask other maintainers not to merge this) |
Just waana say thanks for the great work! but I'll leave the review in the capable hands of @picnixz 😄 |
It's great to see a PR addressing this. If you give me a week I should be able to review it as well. I'm a bit concerned with the simplified lookup logic as there are quite some corner cases. |
I'll wait for Jakob's review because I am not that well-versed in the C/C++ domain. |
Signed-off-by: Donald Hunter <[email protected]>
@jakobandersen As I understand it, the code came from the C++ domain where lookup complexity is higher due to namespaces and classes. I have tried my best to verify the lookup logic required for C. I verified the before and after generated output is the same for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpicks.
Signed-off-by: Donald Hunter <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately my gut feeling was right, there is some handling of duplicate symbols which changes in this PR.
Test case:
.. c:struct:: A
.. c:var:: int i
- :c:struct:`A`
- :c:var:`A.i`
- :c:var:`A.j`
.. c:struct:: A
.. c:var:: int j
- :c:struct:`A`
- :c:var:`A.i`
- :c:var:`A.j`
- :c:struct:`A`
- :c:var:`A.i`
- :c:var:`A.j`
When a duplicate symbol is detected, then the duplicate should not overwrite the original in the dictionaries. I think it is safe to throw the duplicate symbol away, as long as the old behaviour of setting the current scope to be the original symbol is kept.
By the way, I fixed so the symbol debugging now works again, so a rebase onto master might be nice. |
Sphinx 3.0.0 onwards have a significant performance regression from 2.4.4 for the C domain. The cProfile output shows that this is due to _find_named_symbols using a linear search. Here's the profile information for the Linux kernel 'make htmldocs': ncalls tottime percall cumtime percall filename:lineno(function) 770364892 165.102 0.000 165.102 0.000 sphinx/domains/c.py:153(__eq__) 104124 163.968 0.002 544.788 0.005 sphinx/domains/c.py:1731(_find_named_symbols) 543888397 123.767 0.000 176.685 0.000 sphinx/domains/c.py:1679(children_recurse_anon) 4292 74.081 0.017 74.081 0.017 {method 'poll' of 'select.poll' objects} 631233096 69.389 0.000 246.017 0.000 sphinx/domains/c.py:1746(candidates) 121406721/3359598 65.689 0.000 76.762 0.000 docutils/nodes.py:202(_fast_findall) 3477076 64.387 0.000 65.758 0.000 sphinx/util/nodes.py:633(_copy_except__document) 544032973 52.950 0.000 52.950 0.000 sphinx/domains/c.py:156(is_anon) 79012597/3430 36.395 0.000 36.395 0.011 sphinx/domains/c.py:1656(clear_doc) 286882978 31.271 0.000 31.279 0.000 {built-in method builtins.isinstance} Function was called by... ncalls tottime cumtime sphinx/domains/c.py:153(__eq__) <- 631153346 134.803 134.803 sphinx/domains/c.py:1731(_find_named_symbols) 154878 0.041 0.041 sphinx/domains/c.py:2085(find_identifier) 139056533 30.259 30.259 sphinx/domains/c.py:2116(direct_lookup) 135 0.000 0.000 sphinx/util/cfamily.py:89(__eq__) Replace Symbol._children with dicts, one keyed by ident and one keyed by docname. This lets us remove _find_named_symbols and replace these usage patterns with fast child lookup. Also use the _anonChildren list to speed up searching through anonymous children. Note that dict is guaranteed to maintain insertion order since Python 3.7 so we retain iteration. Whenever iteration is required, we use _childrenByName.values(). Before – with Sphinx 7.2.6: % time make htmldocs ... real 9m0.533s user 15m38.397s sys 1m0.907s After - with Sphinx 7.3.0+: % time make htmldocs ... real 4m27.063s user 10m54.985s sys 0m57.702s Signed-off-by: Donald Hunter <[email protected]>
Signed-off-by: Donald Hunter <[email protected]>
Signed-off-by: Donald Hunter <[email protected]>
Signed-off-by: Donald Hunter <[email protected]>
I think this is caused by
|
While investigating another issue, I have found something that the C domain does not account for. I was looking for the root cause of these false duplicate warnings we get from Sphinx 3.0.0 thru 7.6.2:
The root cause is e.g.
Question: Should I tackle that in this PR or in a separate PR? The reason I ask is because it will affect the implementation used in this PR. |
There is an old issue for this (#7819), and I have a branch that should fix it (https://github.com/jakobandersen/sphinx/tree/c_ids_tags). As soon as I get time I will rebase it and get it into shape. It has been a while, but if I remember correctly it was very difficult to not break intersphinx support without #8929. I'm currently updating it, so there can be progress. Edit: looks like I even made a PR for it: #8313. |
I tried the same quick fix, and indeed it broke some invariants elsewhere. I think your solution for now it's a good way to fix it. |
Signed-off-by: Donald Hunter <[email protected]>
d56d058
to
9327df6
Compare
Hi @jakobandersen, @picnixz is there anything left that I need to resolve? I can't spot any open issues in the thread above. It would be great to get this merged so please let me know if there is more needed from me. |
I personally am not well-versed enough in the C/C++ domain so I'll leave it to Jakob (would have been another question if it were the Python domain I'm most familiar with 😄) |
I hope to get to it in the coming weekend. |
Looks good to me. Apparently I can't merge it without either squashing or rebasing, is that intentional? |
I think our policy is to only to squash commits + write description if needed instead of rebasing + possible multiple commits (although it's possible to rebase but I don't think we've ever merged PRs by rebasing them). But we don't want to merge the user's commits since generally there are not.. like.. explicit enough (like mine !) |
Must be a "new" policy (in the last couple of years :-)). I see the point, but on the surface it doesn't seem like a good idea in general to destroy the history of changes made. |
Signed-off-by: Donald Hunter <[email protected]>
Signed-off-by: Donald Hunter <[email protected]>
For me it's the policy since I only arrived last year haha! But I can understand that you would like not to destroy the commits. But I think it should be fine to rebase? (and keep the commits?) @AA-Turner Any reason why we don't allow just using a commit message? (or can we use rebase?) |
@donaldh please could you add an entry to CHANGES for 7.4? |
# Conflicts: # sphinx/domains/c/_symbol.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, save the CHANGES entry.
A
Thank you @donaldh! A |
@donaldh -- would you consider making a similar PR for the C++ domain? I have had a stab at replicating your work in https://github.com/AA-Turner/sphinx/tree/cpp/lookup (AA-Turner@109ec9c), but my knowledge of C++ is limited. Importantly, the C++ domain uses A |
@AA-Turner I will try to find some time this week to take a look. |
Feature or Bugfix
Purpose
Sphinx 3.0.0 onwards have a significant performance regression from 2.4.4 for the C domain. The cProfile output shows that this is due to _find_named_symbols using a linear search. Here's the profile information for the Linux kernel 'make htmldocs':
Replace
Symbol._children
with dicts, one keyed byident
and one keyed bydocname
. This lets us remove_find_named_symbols
and replace these usage patterns with fast child lookup. Also use the_anonChildren
list to speed up searching through anonymous children.Note that dict is guaranteed to maintain insertion order since Python 3.7 so we retain iteration. Whenever iteration is required, we use _childrenByName.values().
Before – with Sphinx 7.2.6:
After - with Sphinx 7.3.0+: