Provide better incrementality when items are changed #19837
Conversation
Force-pushed from 9444cba to 2ac8a9c
I forgot we can LRU the ast id map (which I had planned to); this reduces the memory regression to 360mb (2750mb total).
Instead of simple numbering, we hash important bits, like the name of the item. This will allow for much better incrementality, e.g. when you add an item. Currently, this invalidates the IDs of all following items, which invalidates pretty much everything. Unfortunately, this also increases the size of AST IDs fourfold.
Item tree IDs are very unstable (adding an item of a kind invalidates all following items of the same kind). Instead use ast ids, which, since the previous commit, are pretty stable.
I'm joking, but now that the def map is the only thing that uses the item tree, we can remove a lot of things from it that aren't needed for the def map.
We can do that and it's pretty heavy.
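For illustration, a minimal sketch of the hashing idea described in the first commit message above ("we hash important bits, like the name of the item"). The function name, the hash, and the bit layout here are assumptions chosen for the example, not the PR's actual code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical packing that mirrors the layout discussed further down:
// 16 bits of name hash | 11 bits of disambiguating index | 5 bits of item kind.
fn erased_ast_id(name: &str, index_among_same_hash: u32, kind: u32) -> u32 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    let name_hash = (hasher.finish() >> 48) as u32; // keep 16 bits of the hash
    (name_hash << 16) | ((index_among_same_hash & 0x7ff) << 5) | (kind & 0x1f)
}

fn main() {
    // The id depends on the item's name, not on how many items precede it,
    // so inserting unrelated items above no longer shifts it.
    println!("{:#010x}", erased_ast_id("Foo", 0, 1));
    println!("{:#010x}", erased_ast_id("Bar", 0, 1));
}
```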
Force-pushed from 14441b7 to 9138ab4
I was able to get the memory regression down to 45mb by using a u16 hash. I think this is very acceptable and this PR is ready.
The previous run was stuck for 45 minutes; I cancelled it and reran it, and it completed fast... I smell a race condition. Not good. But I have no idea how to debug it.
(review for the first commit only)
pub const ROOT_ERASED_FILE_AST_ID: ErasedFileAstId =
    ErasedFileAstId(pack_hash_index_and_kind(0, 0, ErasedFileAstIdKind::Root as u32));
/// FileId used as the span for syntax node fixups. Any Span containing this file id is to be
Suggested change:
- /// FileId used as the span for syntax node fixups. Any Span containing this file id is to be
+ /// ErasedFileAstId used as the span for syntax node fixups. Any Span containing this file id is to be
if srv.version >= HASHED_AST_ID {
    // We don't enable spans on versions prior to `HASHED_AST_ID`, because their ast id layout
    // is different.
Given we are bumping the protocol in an incompatible-ish way again anyways, we might wanna consider informing the server about the span that is considered fake. That way we have leeway in changing its representation later on if need be? (I always regretted not making that part of the protocol but instead just having it hardcoded...)
That is `FIXUP_ERASED_FILE_AST_ID_MARKER`.
/// Associated with [`BlockExprFileAstId`].
BlockExpr,
/// Keep this last.
Fixup,
Given this is kept last, that makes my previous comment even more important, as adding a new variant will change the discriminant for this variant
const HASH_BITS: u32 = 16;
const INDEX_BITS: u32 = 11;
const KIND_BITS: u32 = 5;
const _: () = assert!(ErasedFileAstIdKind::Fixup as u32 <= ((1 << KIND_BITS) - 1));
Let's add another assert that checks the 3 constants above add up to our backing type's size in bits (u32).
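A minimal sketch of the suggested assertion, assuming the constants keep their current names and `u32` stays the backing type:

```rust
// Compile-time check that the three packed fields exactly fill the backing u32.
const _: () = assert!(HASH_BITS + INDEX_BITS + KIND_BITS == u32::BITS);
```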
#[inline]
const fn u16_hash(hash: u64) -> u16 {
    // We do basically the same as `FxHasher`.
Then why aren't we using rustc-hash here and just truncating the result?
It sometimes makes sense to shuffle the bits a little instead of truncating, especially when the higher ones have more entropy (but rustc-hash rotates at the end exactly for that).
But I'm curious about the `K`: why `0x72ed` and not `0xecc5`? (ref)
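For illustration only, a sketch of the two options weighed in this thread, assuming the rustc-hash crate is available; the multiplier and shift are placeholders, not the constants the PR actually uses:

```rust
use std::hash::{Hash, Hasher};

// Option 1: run the full FxHasher and truncate, relying on its finalization
// (the rotate mentioned above) to move entropy into the low bits.
fn fx_then_truncate<T: Hash>(value: &T) -> u16 {
    let mut hasher = rustc_hash::FxHasher::default();
    value.hash(&mut hasher);
    hasher.finish() as u16
}

// Option 2: multiplicatively mix an existing u64 hash and keep the high 16
// bits, which receive contributions from every input bit.
const fn mix_to_u16(hash: u64) -> u16 {
    const K: u64 = 0x9E37_79B9_7F4A_7C15; // illustrative odd constant
    (hash.wrapping_mul(K) >> 48) as u16
}
```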
/// Reverse: map ptr to id.
map: hashbrown::HashTable<Idx<SyntaxNodePtr>>,
/// An arena of the ptrs and their associated ID.
arena: Arena<(SyntaxNodePtr, ErasedFileAstId)>,
I don't think we are using the arena properties here anymore, so we can just use a vec instead here? (Or even better, `Box<[]>`?)
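A sketch of what that shape could look like, reusing the field types from the snippet above (an untested sketch of the suggestion, not the PR's code):

```rust
// Entries are collected into a Vec during construction and then frozen, so a
// plain index replaces the arena's Idx and no arena-specific behavior is needed.
struct AstIdMap {
    /// Reverse: map ptr to its index in `entries`.
    map: hashbrown::HashTable<u32>,
    /// The ptrs and their associated IDs, frozen after construction.
    entries: Box<[(SyntaxNodePtr, ErasedFileAstId)]>,
}
```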
// This contains the stack of the `BlockExpr`s we are under. We do this
// so we only allocate `BlockExpr`s if they contain items.
// The general idea is: when we enter a block we push `(block, false)` here.
Nit, but can we use a binary enum here instead of the bool, just for clarity? `ContainsItems::Yes | No`
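A minimal version of the suggested enum (the name follows the comment above; this is a sketch, not the PR's code):

```rust
/// Whether a block on the stack already contains items (and thus has an id allocated).
#[derive(Clone, Copy, PartialEq, Eq)]
enum ContainsItems {
    Yes,
    No,
}
```

The stack would then hold `(block, ContainsItems::No)` on entry and flip to `ContainsItems::Yes` once an item is found, which reads more clearly than a bare `bool`.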
}

/// The [`AstId`] of the root node
pub fn root(&self) -> SyntaxNodePtr {
    self.arena[Idx::from_raw(RawIdx::from_u32(0))]
    // FIXME: Seems redundant to store the root.
Reason for storing it was simpler code overall (no special casing needed for the root)
// After all, the block will then contain the *outer* item, so we allocate
// an ID for it anyway.
let mut blocks = Vec::new();
let mut curr_layer = vec![(node.clone(), None)];
Can we undo the inlining of `bdfs`? Imo this hurts readability of the code quite a bit.
if let Some((last_block_node, already_allocated)) =
    blocks.last_mut()
{
    if !*already_allocated {
Suggested change:
- if let Some((last_block_node, already_allocated)) =
-     blocks.last_mut()
- {
-     if !*already_allocated {
+ if let Some((last_block_node, already_allocated @ false)) =
+     blocks.last_mut()
+ {
This might look worse depending on rustfmt so feel free to ignore (also binding modes might not work in favor here so this suggestion could be wrong, not entirely sure right now)
I am curious about the slowdown here; I am a bit surprised by that (though I have yet to look at the other commits). Regarding memory usage (for one, the PR description needs updating :^): what difference does the LRU addition make now? Could you check (if it's not too much work) how things change if we used a single u64 encoding instead of a u32 (doubling the size)? Imo we shouldn't LRU this query: it is a very frequently used one, so the LRU bookkeeping probably adds some overhead. I also think it will hurt incrementality, as we might discard the result, the parse tree then changes in a way that wouldn't affect the ast id map, but now we need to recompute it and can't backdate it as there is no previous result.
Prior to this PR, adding a struct, enum, or impl would invalidate all trait solver queries in almost every circumstance. Trait solving, perhaps unexpectedly, is the most expensive operation in rust-analyzer. This meant that adding a struct/enum/impl caused features like semantic highlighting, inlay hints, or diagnostics to be fully recomputed, and therefore made the editing experience feel sluggish. How sluggish depended on the size and complexity of the project in question. With this PR, the sluggishness should be reduced.
This pull request is divided into three parts, nicely separated into commits (reviewing commit-by-commit is recommended):

1. Currently, `AstId`s are assigned by simple numbering, so adding an item invalidates the `AstId`s of all following items, which invalidates pretty much everything. The first commit fixes that: by hashing the important bits of the item and putting them in the `AstId`, we ensure that an item's ast id is only invalidated if, e.g., the name of the item changes.
2. Ids of items (`FunctionId` etc.) are defined using item tree ids. Before this PR, item tree ids were slightly better for incrementality than ast ids, because they group by item kind, and therefore changes to an item of a different kind won't invalidate this item. But after the first commit, ast ids are pretty much stable while item tree ids remain unstable. That means that, e.g., adding a struct invalidates all following structs, which has even more severe consequences than with ast ids (for example, it invalidates all impls that refer to the invalidated structs, and that in turn invalidates all trait solving queries). To fix that, the second commit switches the id types to use ast ids. The most important consequence, and the reason this is an invasive change, is that you no longer have a way to obtain the item tree data from an id (only its ast id, which can bring you to the syntax tree). This effectively makes the item tree exclusive to the def map. Since a lot of things used to retrieve their data partially or fully from the item tree, this is quite a massive change. All of these (e.g. the signature queries) are changed to retrieve their data directly from the ast.
3. The third commit slims down the item tree: now that the def map is its only consumer, a lot of data that isn't needed for the def map can be removed from it.

Unfortunately, this introduces a very non-trivial regression: memory usage regresses by 400 megabytes (!!), from 2392mb to 2810mb. Speed also regresses by eight seconds on my machine, 91.172s -> 99.477s.
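To make the second part concrete, a rough illustration of the id switch; the type shapes below are simplified stand-ins, not the PR's actual definitions:

```rust
// Before: an item id points into the item tree and is numbered per item kind,
// so adding a struct shifts the ids of every following struct.
struct OldFunctionId {
    index_among_functions_in_item_tree: u32,
}

// After: an item id carries the hashed, position-independent ast id instead;
// unrelated edits leave it untouched, and the syntax node (hence the item's
// data) can still be reached through the ast id.
struct NewFunctionId {
    file: u32,   // stand-in for the owning file or block
    ast_id: u32, // stand-in for the hashed ErasedFileAstId
}
```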
Let's start with the speed regression: unfortunately, I believe it is inherent to the change, and we can't do much to improve or eliminate it. Fortunately, the regression isn't that large, and the gain in incrementality should justify it, given that IDEs derive their power from incrementality and laziness, not raw speed.
The memory regressions are a different story. I don't think we can tolerate a 400mb regression (although, to be honest, if the choice were either that or this PR, I'm not at all sure I would prefer the memory). Fortunately, I believe a large part of it is solvable. I didn't check (I can if needed), but my belief is that there are a few factors behind the regression: (a) attributes are now duplicated in the item tree and the `attrs()` query (they don't share an `Arc` anymore), and (b) `ErasedFileAstId` grew fourfold, and it is used in spans, which are already our most memory-heavy creatures. There is nothing I can do about the attribute duplication, but fortunately, I believe the majority is spans, and there I do have hope. I have an idea (maybe I really should come back to that...) for how to shrink spans considerably, and furthermore, for making stuff like the ast id not be carried by every span.

Even after all memory improvements, some regression will stay because it's inherent to this approach. I believe we need to decide what is more important to us: incrementality or memory usage? I will point towards incrementality, given that the lack of it can make rust-analyzer barely usable on large projects, while memory is cheap.
Closes #19829.
Closes #19821.