Skip to content

[Bug][CPU Backend]: Improve L2 cache size detection and usage on aarch64#30553

Closed
Radu2k wants to merge 1 commit intovllm-project:mainfrom
Radu2k:fix_l2_arm_attn
Closed

[Bug][CPU Backend]: Improve L2 cache size detection and usage on aarch64#30553
Radu2k wants to merge 1 commit intovllm-project:mainfrom
Radu2k:fix_l2_arm_attn

Conversation

@Radu2k
Copy link
Contributor

@Radu2k Radu2k commented Dec 12, 2025

Purpose

  • Add /sys/devices/system/cpu/cpu0/cache/index2/size parsing for L2 cache on aarch64
  • Fallback to 1MB with warning if the sysfs file cannot be opened
  • Use 70% of detected L2 cache for attention scheduling to leave enough room for sys processes

Test Plan

l2_cache_size gets the right value.

Test Result

l2_cache_size is set to 204810240.7 bytes on c8g instance type which is the correct value give that it has 2MB of L2 cache per core and we want to use 70%.

Solves: #30487

cc: @fadara01 @aditew01

- Add /sys/devices/system/cpu/cpu0/cache/index2/size parsing for L2 cache on aarch64
- Fallback to 1MB with warning if the sysfs file cannot be opened
- Use 70% of detected L2 cache for attention scheduling to leave enough room for sys processes

Signed-off-by: Radu Salavat <radu.salavat@arm.com>
@Radu2k Radu2k requested a review from bigPYJ1151 as a code owner December 12, 2025 13:40
@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances L2 cache size detection on aarch64 by reading from sysfs, which is a valuable improvement. The fallback mechanism is also a good addition for robustness. However, the implementation for parsing the cache size from the sysfs file contains critical flaws that could lead to application crashes or incorrect cache size calculations. I have provided a detailed comment with a more robust implementation to address these issues.

Comment on lines +778 to +793
// Parse size based on suffix
char suffix = size_str.back();
size_str.pop_back();
long l2_cache_size = std::stol(size_str);
switch (suffix) {
case 'K':
case 'k':
l2_cache_size *= 1024;
break;
case 'M':
case 'm':
l2_cache_size *= 1024 * 1024;
break;
default:
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The current parsing logic for the L2 cache size is fragile and can lead to incorrect behavior or crashes for several reasons:

  1. Undefined Behavior: If size_str is empty (e.g., empty sysfs file), size_str.back() results in undefined behavior, which can crash the application.
  2. Unhandled Exceptions: std::stol can throw std::invalid_argument or std::out_of_range if the string cannot be parsed or is out of range. These exceptions are not handled, which will terminate the program.
  3. Incorrect Parsing: The logic assumes a suffix is always present. If the file contains just a number (e.g., "2048"), it will be mis-parsed as "204" with a suffix of '8', leading to a wildly incorrect cache size. According to kernel documentation, a value without a suffix should be interpreted as KiB.

A more robust implementation is suggested below to address these critical issues.

      // Parse size based on suffix
      long l2_cache_size;
      try {
        size_t pos = 0;
        l2_cache_size = std::stol(size_str, &pos);
        while (pos < size_str.length() && std::isspace(size_str[pos])) {
          pos++;
        }
        if (pos < size_str.length()) {
          char suffix = std::toupper(size_str[pos]);
          if (suffix == 'K') {
            l2_cache_size *= 1024;
          } else if (suffix == 'M') {
            l2_cache_size *= 1024 * 1024;
          }
        } else {
          // No suffix, kernel documentation says the value is in KiB.
          l2_cache_size *= 1024;
        }
      } catch (const std::exception& e) {
        std::printf(
            "Failed to parse L2 cache size from string '%s', setting l2_cache_size=1MB. "
            "This might have an impact on performance.", size_str.c_str());
        return static_cast<long>(1024 * 1024);
      }

}
// Fallback if sysctlbyname fails
return 128LL * 1024 >> 1; // use 50% of 128KB
#elif defined(__aarch64__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like the logic!
Although, for reusability, can we use something like pytorch/cpuinfo to get the data? Unless it's not possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to using pytorch/cpuinfo, great suggestion @aditew01

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, good idea.

std::ifstream l2_cache_file(
"/sys/devices/system/cpu/cpu0/cache/index2/size");
if (!l2_cache_file.is_open()) {
std::printf(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use TORCH_WARN

char suffix = size_str.back();
size_str.pop_back();
long l2_cache_size = std::stol(size_str);
switch (suffix) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we explain what these mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping in mind that the usual format is <value><unit(K/M)> we get the unit in suffix, pop it form the back, and then convert the string(value) to a long type.

@fadara01
Copy link
Contributor

Great catch and fix!
I'll test performance on my side as well

Thank you

@mergify mergify bot added the cpu Related to CPU backends label Dec 17, 2025
@mergify
Copy link

mergify bot commented Dec 18, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Radu2k.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@Radu2k
Copy link
Contributor Author

Radu2k commented Jan 29, 2026

Closing this as it will be treated in cpuinfo repo.

@Radu2k Radu2k closed this Jan 29, 2026
@Rohanjames1997
Copy link

@Radu2k thanks for this.

Are you planning to raise a similar PR to https://github.com/pytorch/cpuinfo? If not I can give it a shot later this week.

@Radu2k
Copy link
Contributor Author

Radu2k commented Feb 5, 2026

Hi @Rohanjames1997, I have this on my todo list currently but quite busy atm. Would be great to see a fix for this as soon as possible so please feel free to go ahead if you have availability. You can cc me on the PR and I am glad to help review it and test it across platforms. Thanks as well.

@Rohanjames1997
Copy link

Thanks @Radu2k

I've raised one here pytorch/cpuinfo#372

@Radu2k
Copy link
Contributor Author

Radu2k commented Feb 5, 2026

Great! Will review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cpu Related to CPU backends needs-rebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants