[Bug][CPU Backend]: Improve L2 cache size detection and usage on aarch64 by Radu2k · Pull Request #30553 · vllm-project/vllm

Radu2k · 2025-12-12T13:40:39Z

Purpose

Add /sys/devices/system/cpu/cpu0/cache/index2/size parsing for L2 cache on aarch64
Fall back to 1MB with a warning if the sysfs file cannot be opened
Use 70% of the detected L2 cache for attention scheduling, leaving enough room for system processes
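The detection and fallback steps listed above could be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the names `kFallbackL2Bytes`, `l2_fallback`, and `read_l2_bytes` are hypothetical, and treating a missing unit as KiB is an assumption.

```cpp
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

// Hypothetical names for illustration; they are not taken from the PR.
constexpr long kFallbackL2Bytes = 1024L * 1024L;  // the 1MB fallback from the description

// Prints a warning and returns the fallback size.
inline long l2_fallback(const std::string& path) {
  std::fprintf(stderr, "Could not read an L2 size from %s, assuming 1MB\n",
               path.c_str());
  return kFallbackL2Bytes;
}

// Returns the L2 size in bytes read from `path`, or the fallback when the
// file cannot be opened, is empty, or does not start with a positive number.
inline long read_l2_bytes(const std::string& path) {
  std::ifstream file(path);
  std::string text;
  if (!file.is_open() || !std::getline(file, text)) return l2_fallback(path);

  char* end = nullptr;
  const long value = std::strtol(text.c_str(), &end, 10);
  if (end == text.c_str() || value <= 0) return l2_fallback(path);

  // Assumption: 'M' means MiB, anything else (including no unit) means KiB.
  const long unit = (*end == 'M' || *end == 'm') ? 1024L * 1024L : 1024L;
  if (value > LONG_MAX / unit) return l2_fallback(path);  // avoid overflow
  return value * unit;
}
```

A caller would pass the sysfs path from the description, for example `read_l2_bytes("/sys/devices/system/cpu/cpu0/cache/index2/size")`.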

Test Plan

l2_cache_size gets the right value.

Test Result

l2_cache_size is set to 2048 * 1024 * 0.7 bytes on the c8g instance type, which is the correct value given that it has 2MB of L2 cache per core and we want to use 70%.
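As a worked check of that figure: 2MB is 2048 * 1024 = 2097152 bytes, and 70% of that is 1468006.4 bytes, which truncates to 1468006 when stored as an integer. A small sketch of the calculation, using a hypothetical helper name that is not taken from the PR:

```cpp
// Hypothetical helper: the share of L2 (in bytes) handed to attention
// scheduling. Integer arithmetic truncates the fractional byte.
constexpr long attention_l2_budget(long l2_bytes) {
  return l2_bytes * 7 / 10;  // 70% of the detected size
}

// 2MB of L2 per core, as reported for the c8g instance.
static_assert(attention_l2_budget(2048L * 1024L) == 1468006,
              "70% of 2MB should be 1468006 bytes after truncation");
```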

Solves: #30487

cc: @fadara01 @aditew01

- Add /sys/devices/system/cpu/cpu0/cache/index2/size parsing for L2 cache on aarch64
- Fallback to 1MB with warning if the sysfs file cannot be opened
- Use 70% of detected L2 cache for attention scheduling to leave enough room for sys processes

Signed-off-by: Radu Salavat <radu.salavat@arm.com>

chatgpt-codex-connector · 2025-12-12T13:40:48Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request enhances L2 cache size detection on aarch64 by reading from sysfs, which is a valuable improvement. The fallback mechanism is also a good addition for robustness. However, the implementation for parsing the cache size from the sysfs file contains critical flaws that could lead to application crashes or incorrect cache size calculations. I have provided a detailed comment with a more robust implementation to address these issues.

gemini-code-assist · 2025-12-12T13:43:29Z

csrc/cpu/cpu_attn_impl.hpp

+      // Parse size based on suffix
+      char suffix = size_str.back();
+      size_str.pop_back();
+      long l2_cache_size = std::stol(size_str);
+      switch (suffix) {
+        case 'K':
+        case 'k':
+          l2_cache_size *= 1024;
+          break;
+        case 'M':
+        case 'm':
+          l2_cache_size *= 1024 * 1024;
+          break;
+        default:
+          break;
+      }


The current parsing logic for the L2 cache size is fragile and can lead to incorrect behavior or crashes for several reasons:

Undefined Behavior: If size_str is empty (e.g., empty sysfs file), size_str.back() results in undefined behavior, which can crash the application.

Unhandled Exceptions: std::stol can throw std::invalid_argument or std::out_of_range if the string cannot be parsed or is out of range. These exceptions are not handled, which will terminate the program.

Incorrect Parsing: The logic assumes a suffix is always present. If the file contains just a number (e.g., "2048"), it will be mis-parsed as "204" with a suffix of '8', leading to a wildly incorrect cache size. According to kernel documentation, a value without a suffix should be interpreted as KiB.

A more robust implementation is suggested below to address these critical issues.

// Parse size based on suffix
long l2_cache_size;
try {
  size_t pos = 0;
  l2_cache_size = std::stol(size_str, &pos);
  while (pos < size_str.length() && std::isspace(size_str[pos])) {
    pos++;
  }
  if (pos < size_str.length()) {
    char suffix = std::toupper(size_str[pos]);
    if (suffix == 'K') {
      l2_cache_size *= 1024;
    } else if (suffix == 'M') {
      l2_cache_size *= 1024 * 1024;
    }
  } else {
    // No suffix, kernel documentation says the value is in KiB.
    l2_cache_size *= 1024;
  }
} catch (const std::exception& e) {
  std::printf(
      "Failed to parse L2 cache size from string '%s', setting l2_cache_size=1MB. "
      "This might have an impact on performance.",
      size_str.c_str());
  return static_cast<long>(1024 * 1024);
}
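For comparison, the same three failure modes (empty input, unparsable input, missing suffix) can also be handled without exceptions. The following is a sketch only, assuming C++17 and `std::from_chars`; the function name `parse_cache_size_bytes` is hypothetical, and the "no suffix means KiB" rule is taken from the review comment rather than verified here.

```cpp
#include <cctype>
#include <charconv>
#include <limits>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper, not part of the PR: converts sysfs cache-size text to
// bytes without throwing. Returns std::nullopt on empty or malformed input.
inline std::optional<long> parse_cache_size_bytes(std::string_view text) {
  if (text.empty()) return std::nullopt;

  long value = 0;
  const char* first = text.data();
  const char* last = text.data() + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  // Rejects non-numeric text, out-of-range values, and non-positive sizes.
  if (ec != std::errc() || value <= 0) return std::nullopt;

  // Skip whitespace (for example a trailing newline) before the unit.
  const char* p = ptr;
  while (p != last && std::isspace(static_cast<unsigned char>(*p))) ++p;

  long unit = 1024;  // assumption from the review: no suffix means KiB
  if (p != last) {
    const int suffix = std::toupper(static_cast<unsigned char>(*p));
    if (suffix == 'M') {
      unit = 1024L * 1024L;
    } else if (suffix != 'K') {
      return std::nullopt;  // unknown unit
    }
  }
  if (value > std::numeric_limits<long>::max() / unit) return std::nullopt;
  return value * unit;
}
```

Unlike the suggestion above, this sketch rejects an unknown suffix instead of leaving the value unscaled; the caller would then apply the 1MB fallback whenever the result is empty.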

aditew01 · 2025-12-12T14:07:50Z

csrc/cpu/cpu_attn_impl.hpp

      }
      // Fallback if sysctlbyname fails
      return 128LL * 1024 >> 1;  // use 50% of 128KB
+#elif defined(__aarch64__)


Nice, I like the logic!
Although, for reusability, can we use something like pytorch/cpuinfo to get the data? Unless it's not possible?

+1 to using pytorch/cpuinfo, great suggestion @aditew01

I agree, good idea.

fadara01 · 2025-12-12T14:43:01Z

csrc/cpu/cpu_attn_impl.hpp

+      std::ifstream l2_cache_file(
+          "/sys/devices/system/cpu/cpu0/cache/index2/size");
+      if (!l2_cache_file.is_open()) {
+        std::printf(


Let's use TORCH_WARN

fadara01 · 2025-12-12T14:43:21Z

csrc/cpu/cpu_attn_impl.hpp

+      char suffix = size_str.back();
+      size_str.pop_back();
+      long l2_cache_size = std::stol(size_str);
+      switch (suffix) {


Could we explain what these mean?

Keeping in mind that the usual format is <value><unit(K/M)>, we get the unit in suffix, pop it from the back, and then convert the remaining string (the value) to a long.
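To make those steps concrete, here is a small stand-alone sketch that mirrors the suffix-popping logic from the diff. The function name `pop_suffix_to_bytes` is hypothetical, and the sketch keeps the original assumption that the input is non-empty and ends in a unit character.

```cpp
#include <string>

// Mirrors the suffix-popping steps from the diff, for illustration only.
// Precondition: `size_str` is non-empty and everything before its last
// character is a decimal number (std::stol throws otherwise).
inline long pop_suffix_to_bytes(std::string size_str) {
  const char suffix = size_str.back();  // "2048K" -> 'K'
  size_str.pop_back();                  // "2048K" -> "2048"
  long bytes = std::stol(size_str);     // "2048"  -> 2048
  switch (suffix) {
    case 'K':
    case 'k':
      bytes *= 1024;
      break;
    case 'M':
    case 'm':
      bytes *= 1024 * 1024;
      break;
    default:
      break;  // unknown suffix: value is left unscaled
  }
  return bytes;
}
```

With the usual format, `pop_suffix_to_bytes("2048K")` yields 2097152. A suffix-less `"2048"` yields 204, because the trailing `8` is consumed as the unit, which is why the format assumption matters.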

fadara01 · 2025-12-12T14:44:06Z

Great catch and fix!
I'll test performance on my side as well

Thank you

mergify · 2025-12-18T12:05:51Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Radu2k.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Radu2k · 2026-01-29T13:46:49Z

Closing this as it will be treated in cpuinfo repo.

Rohanjames1997 · 2026-02-04T20:23:24Z

@Radu2k thanks for this.

Are you planning to raise a similar PR to https://github.com/pytorch/cpuinfo? If not I can give it a shot later this week.

Radu2k · 2026-02-05T14:26:49Z

Hi @Rohanjames1997, I have this on my todo list currently but quite busy atm. Would be great to see a fix for this as soon as possible so please feel free to go ahead if you have availability. You can cc me on the PR and I am glad to help review it and test it across platforms. Thanks as well.

Rohanjames1997 · 2026-02-05T14:29:01Z

Thanks @Radu2k

I've raised one here pytorch/cpuinfo#372

Radu2k · 2026-02-05T14:30:03Z

Great! Will review.

Radu2k requested a review from bigPYJ1151 as a code owner December 12, 2025 13:40

gemini-code-assist bot reviewed Dec 12, 2025


aditew01 suggested changes Dec 12, 2025


fadara01 reviewed Dec 12, 2025


mergify bot assigned fadara01 Dec 17, 2025

mergify bot added the cpu Related to CPU backends label Dec 17, 2025

mergify bot added the needs-rebase label Dec 18, 2025

mergify bot added the bug Something isn't working label Jan 14, 2026

Radu2k mentioned this pull request Jan 23, 2026

cpuinfo reports incorrect L2 cache size on ARM (Neoverse V1/V2) pytorch/cpuinfo#369

Open

Radu2k closed this Jan 29, 2026

4 participants
