
Provide better incrementality when items are changed #19837

Open
wants to merge 5 commits into master
Conversation

@ChayimFriedman2 (Contributor) commented May 21, 2025

Prior to this PR, adding a struct, enum, or impl would invalidate all trait solver queries in almost every circumstance. Trait solving, perhaps unexpectedly, is the most expensive operation in rust-analyzer. This meant that adding a struct/enum/impl caused features like semantic highlighting, inlay hints, or diagnostics to be fully recomputed, and therefore made the editing experience feel sluggish. How sluggish depended on the size and complexity of the project in question. With this PR, the sluggishness should be reduced.


This pull request is divided into three parts, nicely separated into commits (reviewing commit-by-commit is recommended):

  1. Stabilize ast ids. Currently, adding or removing an item invalidates the ast ids of everything below it in the same file. This has severe consequences, as quite a few things store an AstId. The first commit fixes that: by hashing the important bits of the item and putting that hash in the AstId, we ensure that an item's ast id is invalidated only if, e.g., its name changes (see the sketch after this list).
  2. Use ast ids instead of item tree ids in IDs: today, our IDs (FunctionId etc.) are defined using item tree ids. Before this PR, item tree ids were slightly better for incrementality than ast ids, because they group items by kind, so a change to an item of a different kind won't invalidate a given item. But after the first commit, ast ids are pretty much stable while item tree ids remain unstable. That means that, e.g., adding a struct invalidates all following structs, which has even more severe consequences than invalidating ast ids would (for example, it invalidates all impls that refer to the invalidated structs, and that in turn invalidates all trait solving queries). To fix that, the second commit switches the id types to use ast ids. The most important consequence, and the reason this is an invasive change, is that you no longer have a way to obtain the item tree data from an id (only its ast id, which can bring you to the syntax tree). This effectively makes the item tree exclusive to the def map. Since a lot of things used to retrieve their data partially or fully from the item tree, this is quite a massive change. All of these (e.g. the signature queries) are changed to retrieve their data directly from the ast.
  3. After the second commit, the item tree contains a non-trivial amount of data that is unused, because it is not needed for the def map and nothing else uses the item tree anymore. The third commit removes it. This includes fields, enum variants, and more.
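
For illustration, here is a minimal sketch of the kind of packing the first commit relies on. The bit widths and the helper name pack_hash_index_and_kind come from the diff reviewed below; the field order inside the u32 is an assumption made for this sketch, not necessarily the PR's actual layout.

```rust
// Sketch only: HASH_BITS/INDEX_BITS/KIND_BITS together fill the backing u32.
const HASH_BITS: u32 = 16;
const INDEX_BITS: u32 = 11;
const KIND_BITS: u32 = 5;

// Hypothetical layout: [hash | index | kind], most significant bits first.
const fn pack_hash_index_and_kind(hash: u32, index: u32, kind: u32) -> u32 {
    (hash << (INDEX_BITS + KIND_BITS)) | (index << KIND_BITS) | kind
}

// Recover the kind bits from a packed id.
const fn kind_of(packed: u32) -> u32 {
    packed & ((1 << KIND_BITS) - 1)
}
```

The point of the hash component is that, as long as the hashed bits (e.g. the item's name) don't change, the packed id stays the same even when surrounding items are added or removed.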

Unfortunately, this introduces a very non-trivial regression: memory usage regresses by roughly 400 megabytes (!!), from 2392mb to 2810mb. Speed also regresses by about eight seconds on my machine, 91.172s -> 99.477s.

Let's start with the speed regression: unfortunately, I believe it is inherent to the change, and we can't do much to improve or eliminate it. Fortunately, the regression isn't that large, and the gain in incrementality should justify it, given that IDEs derive their power from incrementality and laziness, not raw speed.

The memory regression is a different story. I don't think we can tolerate a 400mb regression (although, to be honest, if the choice were either that or this PR, I'm not at all sure I would prefer the memory). Fortunately, I believe a large part of it is solvable. I didn't check (I can if needed), but my belief is that there are two factors to the regression: (a) attributes are now duplicated in the item tree and the attrs() query (they don't share an Arc anymore), and (b) ErasedFileAstId grew fourfold, and it is used in spans, which are already our most memory-heavy creatures. There is nothing I can do about the attribute duplication, but fortunately, I believe the majority is spans, and there I do have hope. I have an idea (maybe I really should come back to that...) of how to shrink spans considerably, and furthermore, of how to avoid carrying things like the ast id in every span.

Even after all memory improvements, some regression will stay, because it's inherent to this approach. I believe we need to decide what is more important to us: incrementality or memory usage? I will point towards incrementality, given that the lack of it can make rust-analyzer barely usable on large projects, while memory is cheap.

Closes #19829.

Closes #19821.

@rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label on May 21, 2025
@ChayimFriedman2 (Contributor, Author) commented May 21, 2025

I forgot we can LRU the ast id map (which I planned to); this reduces the memory regression to 360mb (2750mb).

Instead of simple numbering, we hash important bits, like the name of the item.

This will allow for much better incrementality, e.g. when you add an item. Currently, this invalidates the IDs of all following items, which invalidates pretty much everything.

Unfortunately, this also increases the size of AST IDs fourfold.
Item tree IDs are very unstable (adding an item of a kind invalidates all following items of the same kind). Instead use ast ids, which, since the previous commit, are pretty stable.
I'm joking, but now that the def map is the only thing that uses the item tree, we can remove a lot of things from it that aren't needed for the def map.
We can do that and it's pretty heavy.
@ChayimFriedman2 (Contributor, Author):

I was able to get the memory regression down to 45mb by using a u16 hash. I think this is very acceptable and this PR is ready.

@ChayimFriedman2 (Contributor, Author):

The previous run was stuck for 45 minutes; I cancelled it and reran, and it completed fast... I smell a race condition. Not good. But I have no idea how to debug it.

@Veykril (Member) left a comment

(review for the first commit only)

pub const ROOT_ERASED_FILE_AST_ID: ErasedFileAstId =
ErasedFileAstId(pack_hash_index_and_kind(0, 0, ErasedFileAstIdKind::Root as u32));

/// FileId used as the span for syntax node fixups. Any Span containing this file id is to be
Member:

Suggested change
/// FileId used as the span for syntax node fixups. Any Span containing this file id is to be
/// ErasedFileAstId used as the span for syntax node fixups. Any Span containing this file id is to be

Comment on lines +73 to +75
if srv.version >= HASHED_AST_ID {
// We don't enable spans on versions prior to `HASHED_AST_ID`, because their ast id layout
// is different.
Member:

Given we are bumping the protocol in an incompatible-ish way again anyway, we might want to consider informing the server about the span that is considered fake. That way we have leeway in changing its representation later on if need be? (I always regretted not making that part of the protocol but instead just having it hardcoded ...)

That is FIXUP_ERASED_FILE_AST_ID_MARKER

/// Associated with [`BlockExprFileAstId`].
BlockExpr,
/// Keep this last.
Fixup,
Member:

Given this is kept last, that makes my previous comment even more important, as adding a new variant will change the discriminant of this variant.

const HASH_BITS: u32 = 16;
const INDEX_BITS: u32 = 11;
const KIND_BITS: u32 = 5;
const _: () = assert!(ErasedFileAstIdKind::Fixup as u32 <= ((1 << KIND_BITS) - 1));
Member:

Let's add another assert that checks that the 3 constants above add up to our backing type's size in bits (u32).
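
A sketch of what that extra assert could look like (an assumption, not code from the PR):

```rust
// Hypothetical: the three bit widths together must exactly fill the backing u32.
const _: () = assert!(HASH_BITS + INDEX_BITS + KIND_BITS == u32::BITS);
```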


#[inline]
const fn u16_hash(hash: u64) -> u16 {
// We do basically the same as `FxHasher`.
Member:

Then why aren't we using rustc-hash here and just truncating the result?

Member:

It sometimes makes sense to shuffle the bits a little instead of truncating, especially when the higher ones have more entropy (but rustc-hash rotates at the end exactly for that).

But I'm curious about the K, why 0x72ed and not 0xecc5? (ref).
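
For reference, a minimal sketch of the "use rustc-hash and truncate" idea being discussed (an assumption about the shape; only the function name u16_hash mirrors the snippet above):

```rust
use std::hash::{Hash, Hasher};

// Sketch: hash with rustc-hash's FxHasher and keep the low 16 bits. FxHasher
// rotates at the end, so the low bits already mix in entropy from the high bits.
fn u16_hash<T: Hash>(value: &T) -> u16 {
    let mut hasher = rustc_hash::FxHasher::default();
    value.hash(&mut hasher);
    hasher.finish() as u16
}
```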

/// Reverse: map ptr to id.
map: hashbrown::HashTable<Idx<SyntaxNodePtr>>,
/// An arena of the ptrs and their associated ID.
arena: Arena<(SyntaxNodePtr, ErasedFileAstId)>,
Member:

I don't think we are using the arena properties here anymore, so we can just use a vec instead here? (Or even better, Box<[]>?)


// This contains the stack of the `BlockExpr`s we are under. We do this
// so we only allocate `BlockExpr`s if they contain items.
// The general idea is: when we enter a block we push `(block, false)` here.
Member:

Nit, but can we use a binary enum here instead of the bool, just for clarity? ContainsItems::Yes | No
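
Something along these lines (a sketch of the suggestion; only the name ContainsItems comes from the comment above):

```rust
// Two-variant enum replacing the bare bool pushed with each block, purely for
// readability at the use sites.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ContainsItems {
    Yes,
    No,
}
```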

}

/// The [`AstId`] of the root node
pub fn root(&self) -> SyntaxNodePtr {
self.arena[Idx::from_raw(RawIdx::from_u32(0))]
// FIXME: Seems redundant to store the root.
Member:

Reason for storing it was simpler code overall (no special casing needed for the root)

// After all, the block will then contain the *outer* item, so we allocate
// an ID for it anyway.
let mut blocks = Vec::new();
let mut curr_layer = vec![(node.clone(), None)];
Member:

can we undo the inlining of bdfs? Imo this hurts readability of the code quite a bit

Comment on lines +589 to +592
if let Some((last_block_node, already_allocated)) =
blocks.last_mut()
{
if !*already_allocated {
Member:

Suggested change
if let Some((last_block_node, already_allocated)) =
blocks.last_mut()
{
if !*already_allocated {
if let Some((last_block_node, already_allocated @ false)) =
blocks.last_mut()
{

This might look worse depending on rustfmt, so feel free to ignore (also, binding modes might not work in our favor here, so this suggestion could be wrong; not entirely sure right now).

@Veykril (Member) commented May 22, 2025

I am curious about the slowdown here; I am a bit surprised by that (though I have yet to look at the other commits).

Regarding memory usage (for one, the PR description needs updating :^): What difference does the LRU addition now make? Could you check (if it's not too much work) how things change if we used a single u64 encoding instead of a u32 (doubling the size)?

Imo we shouldn't LRU this query; it is a very frequently used one, and so the LRU bookkeeping probably does add some overhead. I also think it will hurt incrementality: we might discard the result, then the parse tree changes in a way that wouldn't affect the ast id map, but now we need to recompute it and can't backdate it as there is no previous result.

Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Development

Successfully merging this pull request may close these issues.

Consider hashing nodes in AstIdMap for more stable IDs
4 participants