Skip to content

rustdoc-json-types: Intern Types to deduplicate and flatten #142327

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fluiderson
Copy link
Contributor

@fluiderson fluiderson commented Jun 11, 2025

The current format of rustdoc-types has a problem that potentially prevents it from being parsed by the majority of JSON parsers, including serde_json. Generally, they're implemented via recursion meaning that they'll hit their depth limit or invoke a stack overflow if a JSON blob has too deeply nested data. This creates issues like obi1kenobi/cargo-semver-checks#108. I don't think that the problem has anything to do with parsers' limitations. Instead, I consider it as a format design flaw. rustdoc-types implements Type as a recursive structure by using Boxes. I've removed them and added the types field to Crate that can be indexed akin to Crate::index. This eliminated the recursive property of Type making rustdoc-types parseable with much more deeply nested crates like diesel.

I've also added a test that demonstrates that rustdoc-types doesn't grow in depth anymore. Thanks to @weiznich for this snippet.

The only known to me problem with these changes is that the current testing approach for rustdoc-types ostensibly makes everything to stop you from using indexes from the same JSON blob it's querying. I've fixed all broken tests by just hardcoding type identifiers, though something tells me this isn't really deterministic, but I have no idea about other options, so I'll really appreciate any help here.

As for other improvements, due to hashing of types at the render stage, there are no more duplicated types in a resulted JSON blob, i.e. resulted JSON blobs are much smaller now. I've done a test with diesel by generating 2 JSON blobs with cargo rustdoc -Zunstable-options --lib --output-format=json -F extras,128-column-tables -p diesel -- --document-private-items. rustdoc 1.87 generated 231 MB, the PR's rustdoc generated 76 MB (3x smaller). This should drastically improve the speed of tools that use rustdoc-types.

@rustbot
Copy link
Collaborator

rustbot commented Jun 11, 2025

r? @aDotInTheVoid

rustbot has assigned @aDotInTheVoid.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-run-make Area: port run-make Makefiles to rmake.rs A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Jun 11, 2025
@rustbot
Copy link
Collaborator

rustbot commented Jun 11, 2025

rustdoc-json-types is a public (although nightly-only) API. If possible, consider changing src/librustdoc/json/conversions.rs; otherwise, make sure you bump the FORMAT_VERSION constant.

cc @CraftSpider, @aDotInTheVoid, @Enselic, @obi1kenobi

This PR modifies run-make tests.

cc @jieyouxu

These commits modify tests/rustdoc-json.
rustdoc-json is a public (but unstable) interface.

Please ensure that if you've changed the output:

  • It's intentional.
  • The FORMAT_VERSION in src/librustdoc-json-types is bumped if necessary.

cc @aDotInTheVoid, @obi1kenobi

@rust-log-analyzer

This comment has been minimized.

@nnethercote
Copy link
Contributor

Apparently rustdoc_json is the hot new optimization target: I just filed #142335. That PR doesn't touch rustdoc-json-types, however; its changes are all on the rustdoc side. There will be some conflicts between that PR and this PR (especially in conversions.rs) but I'm pretty sure they are superficial and both PRs will be able to merge.

@aDotInTheVoid aDotInTheVoid changed the title rustdoc-types deBoxification rustdoc-json-types: Intern Types to deduplicate and flatten Jun 11, 2025
@aDotInTheVoid aDotInTheVoid changed the title rustdoc-json-types: Intern Types to deduplicate and flatten rustdoc-json-types: Intern Types to deduplicate and flatten Jun 11, 2025
@aDotInTheVoid
Copy link
Member

The overall approach of interning Types seems like a good idea. The size win from de-duplicating types is nice. That said, I don't know how painful this would make things for consumers.


I've fixed all broken tests by just hardcoding type identifiers, though something tells me this isn't really deterministic, but I have no idea about other options, so I'll really appreciate any help here.

Yeah, this PR's current approach is fragile to any change to the order types are inserted to the types set, and seems like it'd be really painful to write and adjust tests.

If we could do something like $.types[$.index[?(@.docs="foo")].type] is jsonpath (or $.types[MY_VAR]), then the tests would be much more manageable. I think it makes sense to first work on improving the test infra so we can express the tests we want. Do you want to work on that?


Also, some advice: This would be alot easier to review if it was split into logically separate commits (i.e. all the tests still pass after each commit). In paticular, if the Id->ItemId rename was in its own commit that did just that, it'd be alot easier to review the rest of the changes in separate commits.

@fluiderson
Copy link
Contributor Author

fluiderson commented Jun 11, 2025

According to the JSONPath spec, there's no way to achieve dynamic indexing. We can try to implement some kind of extension like it's done in jsonpath_rust or add a new directive, but why was JSONPath chosen in the first place? It seems like a very inflexible query language that needs a ton of non-standard addons to be usable, at least in the rustdoc-types testsuit. Why not something like jq?

@aDotInTheVoid
Copy link
Member

aDotInTheVoid commented Jun 11, 2025

Based on the comments from when it was first added (#81063), it was because we used XPath for the HTML tests, and JSONPath was a similar thing to that but for json. I'd not be opposed to moving rustdoc-json tests to jq, especially given that it seems to natively support having variables.

@bors
Copy link
Collaborator

bors commented Jun 11, 2025

☔ The latest upstream changes (presumably #142358) made this pull request unmergeable. Please resolve the merge conflicts.

@CraftSpider
Copy link
Contributor

I can confirm that the choice of JSONPath was largely unprincipled - jq was mentioned, I believe, but I only know of jq as a command-line tool, and didn't want to deal with adding more external dependencies. Something that improves the tests further would definitely be nice.

@fluiderson
Copy link
Contributor Author

fluiderson commented Jun 12, 2025

Ok, I'll try to tackle the infra. @aDotInTheVoid, could you please provide up-to-date info about what needs to be done besides fixing the JSONPath constraints? I've found #94140, but it seems to be outdated in some parts. I think you can try to update it and assign me to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-run-make Area: port run-make Makefiles to rmake.rs A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants