-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Reduce duplicated FileUtilities.GetFileTimeStamp calls #78321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
In draft mode until: 1) Receive input from Jason on whether this approach is reasonable 2) Get results from speedometer run on test insertion indicating this indeed helps If those both happen, will update with more information
...lStudio/Core/Def/ProjectSystem/MetadataReferences/VisualStudioPortableExecutableReference.cs
Show resolved
Hide resolved
...ctSystem/MetadataReferences/VisualStudioPortableExecutableReference.FileTimeStampProvider.cs
Outdated
Show resolved
Hide resolved
...ctSystem/MetadataReferences/VisualStudioPortableExecutableReference.FileTimeStampProvider.cs
Show resolved
Hide resolved
...ctSystem/MetadataReferences/VisualStudioPortableExecutableReference.FileTimeStampProvider.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry a bit there's a potential here for things to get out of sync if we were to miss a file change notification after having a cached result, or that were to get delayed a bit. If that were to happen, what behavior would we expect? I'm also not sure if leaking the contexts would be a problem -- if we close the solution do we still have file locks on the directories?
I'm also quite surprised generally we're seeing that much expense just reading the time stamps a bunch of times. Is the runtime being inefficient here?
...lStudio/Core/Def/ProjectSystem/MetadataReferences/VisualStudioPortableExecutableReference.cs
Show resolved
Hide resolved
...lStudio/Core/Def/ProjectSystem/MetadataReferences/VisualStudioPortableExecutableReference.cs
Show resolved
Hide resolved
This I'm not 100% sure about. I was leaning on this comment for my assumption that getting a slightly outdated timestamp would recover once the workspace responded to the event. I can definitely try and dig a bit more if this worries you.
There weren't any locks taken on directories, we were just advising to directory change events. That said, it's better to go ahead unadvise the contexts on solution close, so I went ahead and added that.
I don't know if inefficient is how I would describe it. As mentioned, there were > 64000 File.GetLastWriteTimeUtc calls being made, so I wouldn't say we were really making their job easy. |
2) Cleanup a using
I set a breakpoint in GetTimeStamp after the cache miss and manually modified the last write time of a referenced assembly. Without doing anything in the editor, the breakpoint was hit, making me feel fairly good about the invalidation mechanism here. FYI, I'm going to run a test insertion again to get numbers as there have been a couple changes I'd like to see measured Test insertion: https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/633179 Numbers came back good again. Updated the PR description with the new results. |
@jasonmalinowski -- This is one of the more impactful perf changes I have outstanding, would love to get your thoughts on this. |
private readonly Dictionary<string, DateTime> _cachedTimestamps = []; | ||
private readonly Dictionary<string, IFileChangeContext> _fileChangeContexts = []; | ||
|
||
private FileTimeStampProvider(Workspace workspace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I was concerned that unless this was the real VisualStudioWorkspace that we might be listening to a different workspace's events. I see that's the one yuo're actually importing, but maybe it's best to keep that strongly typed here too to ensure we don't refactor it and grab something else?
|
||
private static FileTimeStampProvider GetInstance(Workspace workspace) | ||
{ | ||
return s_instance ??= new FileTimeStampProvider(workspace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically a race in that two could create this at once. It's probably fine, but would mean we end up with two subscriptions to the workspace which is kinda strange.
{ | ||
DateTime timestamp; | ||
|
||
lock (_cachedTimestamps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could imagine this might be high-traffic enough to be worth doing a ReaderWriterLockSlim here. Did we see any contenetion on this path during load though?
_ = _fileChangeContexts.GetOrAdd(directory, static (directory, arg) => | ||
{ | ||
var (watcher, self) = arg; | ||
var context = watcher.CreateContext([new WatchedDirectory(directory, extensionFilters: [])]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a useful optimization, it might be good to watch *.dll and then explicitly watch the file we're caching for. That way we're not getting file notification traffic for things we are unlikely to be watching.
_ = _fileChangeContexts.GetOrAdd(directory, static (directory, arg) => | ||
{ | ||
var (watcher, self) = arg; | ||
var context = watcher.CreateContext([new WatchedDirectory(directory, extensionFilters: [])]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This directory watch here is going to be recursive, so if we have DLLs at different directories we might already have overlap there.
|
||
lock (_cachedTimestamps) | ||
{ | ||
// Ensure we are listening for changes to this directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have watchers being created for each DLL we're consuming in the FileWatchedReferenceFactory; can we move this cache so those can just clear this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since that's already reference counted too, you could remove things from the cache (and avoid caching them) unless somebody already has a watch on it.
var directory = Path.GetDirectoryName(fullPath); | ||
timestamp = FileUtilities.GetFileTimeStamp(fullPath); | ||
|
||
lock (_cachedTimestamps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race here where if there is a file write between the read of the time stamp and taking the lock that things could get out of order:
- Thread A reads the older timestamp. Now mentally pause this thread.
- File is touched.
- Thread B reads the newer timestamp. Takes lock, updates the map.
- Mentally unpause Thread A. Writes the old timestamp.
You'll have to think if this matters or not.
...lStudio/Core/Def/ProjectSystem/MetadataReferences/VisualStudioPortableExecutableReference.cs
Show resolved
Hide resolved
@ToddGrun My concern about missed events is since we closed #56023 without fixing it, if we had a situation where the user is doing a build and lots of writes are happening to DLLs on disk -- do we have a situation where we drop a timestamp update and the timestamp is now cached stale until they unload/reload their solution? I'm also wondering if rather than caching the timestamps it just makes sense to make sense to just have VisualStudioMetadataReferenceManager._metadataCache be cleared directly when the file changes and just dispense of the file timestamps all together. I think the original model here was so that we absolutely could immediately respond to a DLL being changed on disk. If it's the case that we are now expecting a file watch to be successfully fired before we'll do it, then we can just be a bit more direct with the intent here. |
Inspection of CPU samples during solution load in the c# editing speedometer tests shows over 3 seconds of CPU time spent in File.GetLastWriteTime. Manual debugging showed of opening Roslyn.sln showed 64,195 calls to File.GetLastWriteTime, but only 1,511 unique paths requested. The vast majoriy of the calls originate from SerializerService.CreateChecksum's call to VisualStudioPortableExecutableReference.GetMetadataImpl's access to the lazy _timestamp.
This PR introduces a class to cache the last write time of files, invalidating itself via IFileChangeWatcher notifications (or solution close). The speedometer numbers from the test insertion came back with good results, with the 3 CPU seconds reduced to just about 150 milliseconds.
Test insertion (commit 7): https://dev.azure.com/devdiv/DevDiv/_git/VS/pullrequest/633179
*** Before ***

*** After ***
