-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[Format] Do not crash on non-null terminated strings #131299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2382,14 +2382,20 @@ size_t SourceManager::getDataStructureSizes() const { | |
|
||
SourceManagerForFile::SourceManagerForFile(StringRef FileName, | ||
StringRef Content) { | ||
// We copy to `std::string` for Context instead of StringRef because the | ||
// SourceManager::getBufferData() works only with null-terminated buffers. | ||
// And we still want to keep the API convenient. | ||
ContentBuffer = Content.str(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a potentially huge copy (consider use cases like unity builds), which makes me pretty uncomfortable. Have you measured the impact of this change on large source files? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking through the uses of
I would not worry about tests for obvious reasons. Wrt to clang-format, I am also fairly confident that any large file will take a lot more time and memory to process and this copy will not be noticeable. I will get some large mock file and run some benchmarks for sure and get back to you. I am not sure in which context unity builds would be a useful use-case to support for either clang-format or Clangd, do you have any use-cases in mind that I am missing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried benchmarking clang-format on it and realized the clang-format binary is also using a different function here and this change makes no difference to it at all. So we're left only with clangd and various other source tools that use clang-format as the final step of source code transformations, as well as downstream uses of the If you could give an example use-case that you are worried about, I can spend some effort benchmarking it. But I personally feel changing this particular API to something that does an extra copy is a very low risk change and fits with the purpose of the API (i.e. convenience costs some performance, so being a little slow is preferable to crashing) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My IDE (QtCreator) does uses libformat regularly while typing. I would also prefer not to perform copies all over the place. How about storing a StringRef and assign There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But we cannot check if I think the easiest way to avoid copies would be to switch to
But what are the actual costs of this particular copy for the overall performance of the IDE? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Noted.
I mainly think about my battery, when I code while being mobile. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But this extra copy is likely a really-really small portion of the reformatting cost. Performance and battery go together here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any particular direction folks want to take this? Having an API that breaks on non-null-terminated I don't expect this to give any noticeable performance regressions, and I don't have a clear sense of whether the folks in the comment threads agree or disagree. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't like it, but it's better than potentially crashing software. But I'm not comfortable enough to approve something outside libFormat or clang-format. |
||
|
||
// This is referenced by `FileMgr` and will be released by `FileMgr` when it | ||
// is deleted. | ||
IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> InMemoryFileSystem( | ||
new llvm::vfs::InMemoryFileSystem); | ||
|
||
InMemoryFileSystem->addFile( | ||
FileName, 0, | ||
llvm::MemoryBuffer::getMemBuffer(Content, FileName, | ||
/*RequiresNullTerminator=*/false)); | ||
llvm::MemoryBuffer::getMemBuffer(ContentBuffer, FileName, | ||
/*RequiresNullTerminator=*/true)); | ||
// This is passed to `SM` as reference, so the pointer has to be referenced | ||
// in `Environment` so that `FileMgr` can out-live this function scope. | ||
FileMgr = | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not go after the comment.
Or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileManager and SourceManager will be referencing the contents of this buffer, so I think it's best to destroy them before this
string
.Hence, the order is important, but maybe I should clarify the comment and mention the
Buffer
explicitly?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be useful.