opt(caching): reduce ProgramCache lock contention#955

Open
Haihan-Jiang wants to merge 1 commit into
bytedance:mainfrom
Haihan-Jiang:codex/sonic-948-programcache-singleflight

@Haihan-Jiang

What type of PR is this?

optimize

Check the PR title.

  • This PR title matches the format: (optional scope):
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

Reduce ProgramCache lock contention

(Optional) More detailed description for this PR(en: English/zh: Chinese).

en:

This reduces ProgramCache.Compute() lock contention on cache misses.

Before this change, Compute() held the global cache mutex while running the potentially expensive compute() function. That serialized all cache-miss compilations, even when goroutines were compiling unrelated *rt.GoType values.
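To make the serialization concrete, here is a minimal sketch of the pattern described above. It is not sonic's code: `lockedCache`, `maxOverlap`, the string keys, and the `int` values are hypothetical stand-ins for the real cache, `*rt.GoType` keys, and compiled programs, and the lock-free hit path is left out. Because `compute` runs while the mutex is held, misses on unrelated keys never overlap.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lockedCache is a hypothetical, simplified model of the old behavior:
// the mutex stays held for the whole duration of compute.
type lockedCache struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *lockedCache) Compute(key string, compute func(string) (int, error)) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.m[key]; ok {
		return v, nil
	}
	v, err := compute(key) // runs under the lock: every other miss waits here
	if err != nil {
		return 0, err // failures are not cached
	}
	c.m[key] = v
	return v, nil
}

// maxOverlap starts n misses on n distinct keys and reports the highest
// number of compute calls that were running at the same time.
func maxOverlap(n int) int {
	c := &lockedCache{m: make(map[string]int)}
	var statMu sync.Mutex
	active, peak := 0, 0
	slow := func(key string) (int, error) {
		statMu.Lock()
		active++
		if active > peak {
			peak = active
		}
		statMu.Unlock()
		time.Sleep(5 * time.Millisecond) // stand-in for an expensive compile
		statMu.Lock()
		active--
		statMu.Unlock()
		return len(key), nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, _ = c.Compute(fmt.Sprintf("type-%d", i), slow)
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrent computes:", maxOverlap(4))
}
```

In this sketch the peak is always 1, even though all four keys are distinct.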

This PR adds a small per-type in-flight call table:

  • cache hits still read the RCU program map without locking;
  • same-type concurrent misses share one compile and wait for the same result;
  • different types can compile concurrently;
  • failed compiles are not cached, preserving existing behavior;
  • Reset() prevents an older in-flight compile from repopulating the fresh cache;
  • a panicking compile releases the pending entry and re-panics, so future calls are not stuck behind a poisoned pending state.
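The list above can be sketched as a small single-flight cache. This is an illustration under stated assumptions, not the PR's implementation: `flightCache`, `call`, `gen`, and `compilesForSameKey` are hypothetical names, string keys stand in for `*rt.GoType`, hits take the mutex here rather than reading an RCU map, and handing waiters an error when the leader panics is an assumed policy.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// call tracks one in-flight compile; waiters block on done, then read val/err.
type call struct {
	done chan struct{}
	val  int
	err  error
}

// flightCache is a hypothetical sketch, not sonic's ProgramCache.
type flightCache struct {
	mu       sync.Mutex
	programs map[string]int   // finished results
	pending  map[string]*call // in-flight compiles, one per key
	gen      uint64           // bumped by Reset
}

func newFlightCache() *flightCache {
	return &flightCache{programs: map[string]int{}, pending: map[string]*call{}}
}

func (c *flightCache) Compute(key string, compute func(string) (int, error)) (int, error) {
	c.mu.Lock()
	if v, ok := c.programs[key]; ok { // cache hit
		c.mu.Unlock()
		return v, nil
	}
	if p, ok := c.pending[key]; ok { // same-key miss: wait for the leader
		c.mu.Unlock()
		<-p.done
		return p.val, p.err
	}
	p := &call{done: make(chan struct{})}
	gen := c.gen
	c.pending[key] = p
	c.mu.Unlock()

	finished := false
	defer func() {
		if !finished { // compute panicked; waiters get an error (assumed policy)
			p.err = errors.New("compute panicked")
		}
		c.mu.Lock()
		if c.pending[key] == p {
			delete(c.pending, key)
		}
		if p.err == nil && c.gen == gen { // skip failures and pre-Reset results
			c.programs[key] = p.val
		}
		c.mu.Unlock()
		close(p.done) // no recover here, so a panic keeps propagating
	}()
	p.val, p.err = compute(key) // runs without holding c.mu
	finished = true
	return p.val, p.err
}

// Reset drops all state; compiles started before it cannot store results.
func (c *flightCache) Reset() {
	c.mu.Lock()
	c.programs = map[string]int{}
	c.pending = map[string]*call{}
	c.gen++
	c.mu.Unlock()
}

// compilesForSameKey counts compute calls when n goroutines request one key.
func compilesForSameKey(n int) int {
	c := newFlightCache()
	var count int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = c.Compute("T", func(k string) (int, error) {
				atomic.AddInt32(&count, 1)
				return len(k), nil
			})
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt32(&count))
}

func main() {
	fmt.Println("compiles for 8 same-key callers:", compilesForSameKey(8))
}
```

In this sketch every caller either finds the finished result, joins the pending call, or becomes the single leader, so the count is 1 regardless of scheduling. The generation check is one simple way to keep a compile that started before `Reset()` from writing into the fresh map; the PR may use a different mechanism.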

zh(optional):

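This PR reduces lock contention in ProgramCache.Compute() on cache misses.

Previously, Compute() held the global cache mutex for the whole duration of the potentially slow compute() call, so compilations of different *rt.GoType values were serialized as well.

This PR adds a per-type in-flight call table:

  • cache hits still read the RCU program map without locking;
  • concurrent misses on the same type share a single compile;
  • different types can compile concurrently;
  • failed compiles are not cached, preserving the existing behavior;
  • after Reset(), an older in-flight compile does not write into the new cache;
  • if compute() panics, the pending entry is cleaned up and the panic continues, so later calls for the same type are not blocked forever.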

(Optional) Which issue(s) this PR fixes:

Fixes #948

Test Results

go test ./internal/caching
go test -race ./internal/caching
go test ./internal/caching -run 'TestPcache' -count=100
go test -race ./internal/caching -run 'TestPcache' -count=20

All passed locally.

I also tried go test ./... on Darwin/ARM64 with Go 1.26.3. The full run still has existing platform/JIT build failures in packages such as internal/decoder/jitdec, internal/encoder/x86, internal/jit, internal/native/avx2, and internal/native/sse; the targeted internal/caching package passed.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter

codecov-commenter commented Jun 22, 2026

Codecov Report

❌ Patch coverage is 95.74468% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.42%. Comparing base (59be92f) to head (0d32d2c).
⚠️ Report is 83 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/caching/pcache.go | 95.74% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #955      +/-   ##
==========================================
- Coverage   51.86%   51.42%   -0.45%     
==========================================
  Files         127      170      +43     
  Lines       10893    14186    +3293     
==========================================
+ Hits         5650     7295    +1645     
- Misses       4920     6500    +1580     
- Partials      323      391      +68     
| Flag | Coverage Δ |
|---|---|
| arm | 43.14% <93.61%> (?) |
| macos-latest | 44.29% <92.50%> (?) |
| ubuntu-24.04-arm | 43.12% <93.61%> (?) |
| ubuntu-latest | 49.88% <95.00%> (?) |
| x86 | 49.88% <95.00%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

Development

Successfully merging this pull request may close these issues.

Reduce ProgramCache lock contention and wake only waiters for the same type

3 participants