perf(luavm): cache compiled Lua FunctionProto in RunScript to avoid r…#7602

Open
qsxDree wants to merge 4 commits into karmada-io:master from qsxDree:perf-7597

@qsxDree

@qsxDree qsxDree commented Jun 4, 2026

What type of PR is this?
/kind feature

What this PR does / why we need it:
Cache compiled Lua FunctionProto in the VM to avoid redundant parse+compile on every RunScript call.

RunScript is called on every status event through ReflectStatus, InterpretHealth, AggregateStatus, etc. In large fleets this causes continuous unnecessary allocation and GC pressure. Since scripts are immutable, recompiling every time is pure waste.

Fixes #
This fix caches compiled protos using sync.Map keyed by script content and reuses via NewFunctionFromProto + PCall. Behavior remains identical.
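
A minimal sketch of the caching idea described above, using only the Go standard library. The `proto` type and the `compile` method are hypothetical stand-ins for `*lua.FunctionProto` and the real parse+compile step; they are not the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// proto is a hypothetical stand-in for *lua.FunctionProto.
type proto struct {
	source string
}

// scriptCache caches compiled protos keyed by the full script text.
type scriptCache struct {
	protos   sync.Map     // map[string]*proto
	compiles atomic.Int64 // counts real compilations, for illustration only
}

// compile is a placeholder (assumption) for the real parse+compile step.
func (c *scriptCache) compile(script string) (*proto, error) {
	if script == "" {
		return nil, errors.New("empty script")
	}
	c.compiles.Add(1)
	return &proto{source: script}, nil
}

// loadOrCompile returns the cached proto for script, compiling it on first use.
func (c *scriptCache) loadOrCompile(script string) (*proto, error) {
	if v, ok := c.protos.Load(script); ok {
		return v.(*proto), nil
	}
	p, err := c.compile(script)
	if err != nil {
		return nil, err // compile failures are not cached
	}
	// LoadOrStore keeps the first stored proto if another goroutine raced us.
	actual, _ := c.protos.LoadOrStore(script, p)
	return actual.(*proto), nil
}

func main() {
	c := &scriptCache{}
	const script = `function GetReplicas(obj) return obj.spec.replicas end`
	first, _ := c.loadOrCompile(script)
	second, _ := c.loadOrCompile(script)
	// prints "same proto: true compilations: 1"
	fmt.Println("same proto:", first == second, "compilations:", c.compiles.Load())
}
```

Keying by script content means an edited script simply becomes a new entry, so nothing stale is ever served, but old entries are never removed either.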

Which issue(s) this PR fixes:

Fixes #7597
Part of #7596

Does this PR introduce a user-facing change?:

`karmada-controller-manager`/`karmada-agent`: Optimized Lua script execution in the resource interpreter by caching compiled FunctionProto, improving performance for customized resource interpretation.

Copilot AI review requested due to automatic review settings June 4, 2026 18:36
@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 4, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the Lua VM execution path by implementing a caching layer for compiled scripts. By storing and reusing FunctionProto objects, the system avoids the overhead of repeated parsing and compilation, which significantly improves performance and reduces memory pressure in high-load scenarios where scripts are executed frequently.

Highlights

  • Performance Optimization: Introduced a caching mechanism for compiled Lua FunctionProto objects using sync.Map to eliminate redundant parsing and compilation during RunScript calls.
  • Resource Efficiency: Reduced unnecessary memory allocations and GC pressure in environments with high-frequency status updates by reusing immutable script protos.
  • Testing: Added comprehensive unit tests to verify that compiled script prototypes are correctly cached and reused across multiple execution calls.

@karmada-bot

Contributor

Welcome @kurlingtown! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot karmada-bot requested a review from chaunceyjiang June 4, 2026 18:37
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 4, 2026
@karmada-bot karmada-bot requested a review from ikaven1024 June 4, 2026 18:37

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces compilation caching for Lua scripts within the VM by caching compiled FunctionProto structures in a sync.Map, which avoids redundant compilation during script execution. Unit tests were also added to verify this caching behavior. The reviewer feedback recommends adding a TODO comment to implement an eviction policy for the cache to prevent potential memory leaks, and adding a documentation comment to the unexported compileScript helper function to improve code maintainability.

// UseOpenLibs flag to enable open libraries. Libraries are disabled by default while running, but enabled during testing to allow the use of print statements.
UseOpenLibs bool
Pool *fixedpool.FixedPool
funcCache sync.Map // map[string]*lua.FunctionProto

medium

The funcCache uses a sync.Map to cache compiled Lua FunctionProto structures. Since scripts can be updated or dynamically registered, this cache can grow indefinitely over time, leading to a potential memory leak. Please add a TODO comment indicating that a garbage collection or eviction policy should be implemented.

	// TODO(user): Implement a garbage collection or eviction policy for funcCache to prevent memory leaks if scripts are updated or dynamically generated.
	funcCache   sync.Map // map[string]*lua.FunctionProto
References
  1. When implementing garbage collection (GC) for caches, ensure the GC method is invoked periodically (e.g., in a background goroutine) to prevent memory leaks. If the invocation is deferred, use a TODO comment to prevent confusion for other developers.

return vm
}

func (vm *VM) compileScript(script string) (*lua.FunctionProto, error) {

medium

The compileScript function is an unexported helper containing key compilation and caching logic. According to the general rules, unexported functions with key logic should have doc comments to improve maintainability.

// compileScript parses and compiles the Lua script into a FunctionProto, caching the result to avoid redundant compilation.
func (vm *VM) compileScript(script string) (*lua.FunctionProto, error) {
References
  1. Add doc comments to unexported functions that contain key logic to improve maintainability.

…edundant parse+compile

Signed-off-by: kurlingtown <kurling.town@gmail.com>

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@codecov-commenter

codecov-commenter commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 81.39535% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.06%. Comparing base (80ff6e5) to head (dde8fae).
⚠️ Report is 50 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rceinterpreter/customized/declarative/luavm/lua.go | 89.74% | 2 Missing and 2 partials ⚠️ |
| ...interpreter/customized/declarative/configurable.go | 0.00% | 2 Missing ⚠️ |
| ...sourceinterpreter/default/thirdparty/thirdparty.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7602      +/-   ##
==========================================
- Coverage   42.16%   42.06%   -0.10%     
==========================================
  Files         879      879              
  Lines       54731    54855     +124     
==========================================
+ Hits        23076    23077       +1     
- Misses      29911    30029     +118     
- Partials     1744     1749       +5     
Flag Coverage Δ
unittests 42.06% <81.39%> (-0.10%) ⬇️

Signed-off-by: kurlingtown <kurling.town@gmail.com>
@qsxDree

qsxDree commented Jun 8, 2026

Author

/cc @RainbowMango
I would like to request a review. Please let me know anywhere I went wrong. Thank you.

@karmada-bot karmada-bot requested a review from RainbowMango June 8, 2026 15:52
@RainbowMango

Member

Putting this into my queue. Thanks.
/assign

@RainbowMango

Member

@qsxDree Yes, I like the approach. But I think the memory leak concerns should be addressed, as you mentioned in the comments:

// TODO(user): Implement a garbage collection or eviction policy for funcCache to prevent memory leaks if scripts are updated or dynamically generated.

@zhzhuang-zju zhzhuang-zju left a comment

Contributor

Hi @qsxDree, this change brings performance improvements. Please add a corresponding release note in the PR description, thanks.

}

// compileScript parses and compiles the Lua script into a FunctionProto, caching the result to avoid redundant compilation.
func (vm *VM) compileScript(script string) (*lua.FunctionProto, error) {

Contributor

How about:

Suggested change
- func (vm *VM) compileScript(script string) (*lua.FunctionProto, error) {
+ func (vm *VM) loadOrCompileScript(script string) (*lua.FunctionProto, error) {

Author

yea that makes more sense. updated!

@qsxDree

qsxDree commented Jun 17, 2026

Author

@qsxDree Yes, I like the approach. But I think the memory leak concerns should be addressed, as you mentioned in the comments:

// TODO(user): Implement a garbage collection or eviction policy for funcCache to prevent memory leaks if scripts are updated or dynamically generated.

to address the memory leak concern, i could implement the functionality of clearing the funcCache on config reload as we already watch ResourceInterpreterCustomization changes in configManager. this will delete the stale compiled protos when scripts are updated.
should i implement this?
cc: @RainbowMango
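
A rough sketch of the "clear on reload" idea, assuming the cache is a `sync.Map` field on the VM. The `vm` type, `resetCache`, and `cacheSize` are hypothetical names for illustration; how configManager would invoke the reset is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// vm is a hypothetical stand-in for the Lua VM that owns the cache.
type vm struct {
	funcCache sync.Map // script text -> compiled proto (placeholder values here)
}

// resetCache deletes every cached entry. In this sketch it is assumed to be
// called by whatever handles a customization reload.
func (v *vm) resetCache() {
	v.funcCache.Range(func(key, _ any) bool {
		v.funcCache.Delete(key) // deleting while ranging is permitted for sync.Map
		return true
	})
}

// cacheSize counts the entries currently held; used only for the demo output.
func (v *vm) cacheSize() int {
	n := 0
	v.funcCache.Range(func(_, _ any) bool {
		n++
		return true
	})
	return n
}

func main() {
	v := &vm{}
	v.funcCache.Store("old script A", struct{}{})
	v.funcCache.Store("old script B", struct{}{})
	fmt.Println("before reload:", v.cacheSize())
	v.resetCache()
	fmt.Println("after reload:", v.cacheSize())
}
```

Clearing everything on each reload is simple but also discards protos for scripts that did not change, so they are recompiled on next use.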

@qsxDree

qsxDree commented Jun 17, 2026

Author

Hi @qsxDree, this change brings performance improvements. Please add a corresponding release note in the PR description, thanks.

done.
cc: @zhzhuang-zju

@karmada-bot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rainbowmango. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RainbowMango

Member

to address the memory leak concern, i could implement the functionality of clearing the funcCache on config reload as we already watch ResourceInterpreterCustomization changes in configManager. this will delete the stale compiled protos when scripts are updated.
should i implement this?

Do you mean let configManager refresh the cache in LuaVM?

I'm not sure, but another idea is to use the group/version/kind and operation type as the cache key, so that the cached entry is recompiled once the script changes.

@qsxDree

qsxDree commented Jun 18, 2026

Author

to address the memory leak concern, i could implement the functionality of clearing the funcCache on config reload as we already watch ResourceInterpreterCustomization changes in configManager. this will delete the stale compiled protos when scripts are updated.
should i implement this?

Do you mean let configManager refresh the cache in LuaVM?

I'm not sure, but another idea is to use the group/version/kind and operation type as the cache key, so that the cached entry is recompiled once the script changes.

sorry for my inexperience, but i don't get how using group/version/kind and operation type as the cache key would let us detect a change in the Lua script. like none of those factors will change on modification of the script right?

@RainbowMango

Member

like none of those factors will change on modification of the script right?

Yes, you are right. But note that each script belongs to a Group/Version/Kind AND operation. The cache workflow might look like:

  • Before executing a script, try to find the FunctionProto in the cache by GVK and operation.
  • On a cache hit, we can also tell whether the script has changed, because the previous script is stored in the cache as well.

@qsxDree

qsxDree commented Jun 22, 2026

Author

like none of those factors will change on modification of the script right?

Yes, you are right. But note that each script belongs to a Group/Version/Kind AND operation. The cache workflow might look like:

* Before executing a script, try to find the FunctionProto in the cache by GVK and operation.

* On a cache hit, we can also tell whether the script has changed, because the previous script is stored in the cache as well.

okay, you mean have the gvk + operation type as key and the {string script, compiled proto} as the value, right?
then on cache hit we can check if the scripts match, if they do we proceed with the compiled proto, and if not we compile the new script and refresh the cache with it.

PS:
noticed that for InterpretDependency a single GVK + Operation Type can have multiple Lua scripts. so another key component, an index (look_index), can be used to give each dependency script its own cache entry; it would be 0 for the rest.
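
The workflow discussed above could look roughly like the following sketch. All names here (`cacheKey`, `entry`, `keyedCache`, the `Index` field) are hypothetical, and `proto` again stands in for `*lua.FunctionProto`; this is an illustration under those assumptions, not the PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheKey identifies one script slot: GVK + operation + position in the list.
type cacheKey struct {
	Group, Version, Kind string
	Operation            string
	Index                int // 0 when the operation has a single script
}

// proto is a hypothetical stand-in for *lua.FunctionProto.
type proto struct{ source string }

// entry stores the script next to its compiled proto so changes can be detected.
type entry struct {
	script string
	proto  *proto
}

type keyedCache struct {
	mu       sync.Mutex
	entries  map[cacheKey]entry
	compiles int // counts compilations, for illustration only
}

func newKeyedCache() *keyedCache {
	return &keyedCache{entries: make(map[cacheKey]entry)}
}

// loadOrCompile reuses the cached proto when the stored script matches;
// otherwise it "compiles" the new script and overwrites the slot.
func (c *keyedCache) loadOrCompile(key cacheKey, script string) *proto {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && e.script == script {
		return e.proto
	}
	c.compiles++
	p := &proto{source: script} // placeholder for the real parse+compile
	c.entries[key] = entry{script: script, proto: p}
	return p
}

func main() {
	c := newKeyedCache()
	key := cacheKey{Group: "apps", Version: "v1", Kind: "Deployment", Operation: "InterpretHealth"}
	c.loadOrCompile(key, "script v1")
	c.loadOrCompile(key, "script v1") // hit: same script, no recompilation
	c.loadOrCompile(key, "script v2") // script changed: slot is overwritten
	fmt.Println("compilations:", c.compiles, "entries:", len(c.entries))
}
```

Because a changed script overwrites its slot, the number of entries is bounded by the number of key combinations that have ever been used. Slots whose customization is deleted entirely would still linger in this sketch.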

qsxDree added 2 commits June 22, 2026 22:03
Signed-off-by: qsxDree <kurling.town@gmail.com>
Signed-off-by: qsxDree <kurling.town@gmail.com>
@RainbowMango

Member

okay, you mean have the gvk + operation type as key and the {string script, compiled proto} as the value, right?

Yes, that's the idea.

noticed that for InterpretDependency a single GVK + Operation Type can have multiple Lua scripts. so another key component, an index (look_index), can be used to give each dependency script its own cache entry; it would be 0 for the rest.

Looks great, but one more thing needs to be confirmed: is the script index constant?

@qsxDree

qsxDree commented Jun 23, 2026

Author

Looks great, but one more thing needs to be confirmed: is the script index constant?

the index is the position in the list for the current config snapshot, so it is constant for an unchanged config but can change when the config is updated.
this has a minor effect on performance, as some recompilation has to be performed when the current config is updated.

Labels

kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[perf] : cache compiled Lua FunctionProto in RunScript to avoid redundant parse+compile on every call

6 participants