[Bug][CPU Backend]: Improve L2 cache size detection and usage on aarch64#30553
[Bug][CPU Backend]: Improve L2 cache size detection and usage on aarch64#30553Radu2k wants to merge 1 commit intovllm-project:mainfrom
Conversation
- Add /sys/devices/system/cpu/cpu0/cache/index2/size parsing for L2 cache on aarch64 - Fallback to 1MB with warning if the sysfs file cannot be opened - Use 70% of detected L2 cache for attention scheduling to leave enough room for sys processes Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Code Review
This pull request enhances L2 cache size detection on aarch64 by reading from sysfs, which is a valuable improvement. The fallback mechanism is also a good addition for robustness. However, the implementation for parsing the cache size from the sysfs file contains critical flaws that could lead to application crashes or incorrect cache size calculations. I have provided a detailed comment with a more robust implementation to address these issues.
| // Parse size based on suffix | ||
| char suffix = size_str.back(); | ||
| size_str.pop_back(); | ||
| long l2_cache_size = std::stol(size_str); | ||
| switch (suffix) { | ||
| case 'K': | ||
| case 'k': | ||
| l2_cache_size *= 1024; | ||
| break; | ||
| case 'M': | ||
| case 'm': | ||
| l2_cache_size *= 1024 * 1024; | ||
| break; | ||
| default: | ||
| break; | ||
| } |
There was a problem hiding this comment.
The current parsing logic for the L2 cache size is fragile and can lead to incorrect behavior or crashes for several reasons:
- Undefined Behavior: If
size_stris empty (e.g., empty sysfs file),size_str.back()results in undefined behavior, which can crash the application. - Unhandled Exceptions:
std::stolcan throwstd::invalid_argumentorstd::out_of_rangeif the string cannot be parsed or is out of range. These exceptions are not handled, which will terminate the program. - Incorrect Parsing: The logic assumes a suffix is always present. If the file contains just a number (e.g., "2048"), it will be mis-parsed as "204" with a suffix of '8', leading to a wildly incorrect cache size. According to kernel documentation, a value without a suffix should be interpreted as KiB.
A more robust implementation is suggested below to address these critical issues.
// Parse size based on suffix
long l2_cache_size;
try {
size_t pos = 0;
l2_cache_size = std::stol(size_str, &pos);
while (pos < size_str.length() && std::isspace(size_str[pos])) {
pos++;
}
if (pos < size_str.length()) {
char suffix = std::toupper(size_str[pos]);
if (suffix == 'K') {
l2_cache_size *= 1024;
} else if (suffix == 'M') {
l2_cache_size *= 1024 * 1024;
}
} else {
// No suffix, kernel documentation says the value is in KiB.
l2_cache_size *= 1024;
}
} catch (const std::exception& e) {
std::printf(
"Failed to parse L2 cache size from string '%s', setting l2_cache_size=1MB. "
"This might have an impact on performance.", size_str.c_str());
return static_cast<long>(1024 * 1024);
}| } | ||
| // Fallback if sysctlbyname fails | ||
| return 128LL * 1024 >> 1; // use 50% of 128KB | ||
| #elif defined(__aarch64__) |
There was a problem hiding this comment.
Nice, I like the logic!
Although, for reusability, can we use something like pytorch/cpuinfo to get the data? Unless it's not possible?
There was a problem hiding this comment.
+1 to using pytorch/cpuinfo, great suggestion @aditew01
| std::ifstream l2_cache_file( | ||
| "/sys/devices/system/cpu/cpu0/cache/index2/size"); | ||
| if (!l2_cache_file.is_open()) { | ||
| std::printf( |
| char suffix = size_str.back(); | ||
| size_str.pop_back(); | ||
| long l2_cache_size = std::stol(size_str); | ||
| switch (suffix) { |
There was a problem hiding this comment.
Could we explain what these mean?
There was a problem hiding this comment.
Keeping in mind that the usual format is <value><unit(K/M)> we get the unit in suffix, pop it form the back, and then convert the string(value) to a long type.
|
Great catch and fix! Thank you |
|
This pull request has merge conflicts that must be resolved before it can be |
|
Closing this as it will be treated in cpuinfo repo. |
|
@Radu2k thanks for this. Are you planning to raise a similar PR to https://github.com/pytorch/cpuinfo? If not I can give it a shot later this week. |
|
Hi @Rohanjames1997, I have this on my todo list currently but quite busy atm. Would be great to see a fix for this as soon as possible so please feel free to go ahead if you have availability. You can cc me on the PR and I am glad to help review it and test it across platforms. Thanks as well. |
|
Thanks @Radu2k I've raised one here pytorch/cpuinfo#372 |
|
Great! Will review. |
Purpose
Test Plan
l2_cache_sizegets the right value.Test Result
l2_cache_sizeis set to 204810240.7 bytes on c8g instance type which is the correct value give that it has 2MB of L2 cache per core and we want to use 70%.Solves: #30487
cc: @fadara01 @aditew01