Minimize memory leaks when server is unreachable by mgr-inz-rafal · Pull Request #350 · Ylianst/MeshAgent

mgr-inz-rafal · 2026-05-22T19:42:28Z

This PR is related to #110 and is a best effort to clear some memory leaks.

Recently in our project we've also noticed high memory usage in cases where the server is unreachable. I did a small review of the reconnection loop and implemented a couple of fixes and optimizations.

For testing, I bypassed the exponential backoff mechanism and forced a reconnection every 10 ms. Even after a couple of hours the memory footprint stays relatively low, and much lower than when I ran the code from master to establish a baseline. There could be some more memory leaks lurking there, though.
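For anyone wanting to repeat this kind of stress test, here is a minimal sketch of a backoff calculation with a test override. It is not taken from the MeshAgent source: `reconnect_delay_ms` and all of its parameters are hypothetical names, and the doubling-with-cap policy is an assumption about what "exponential backoff" means here.

```c
#include <stdint.h>

/* Hypothetical helper, not part of MeshAgent.
 * Returns the delay before reconnect attempt number `attempt` (0-based).
 * A non-zero `forced_ms` bypasses the backoff entirely, which is how a
 * stress test could force a reconnection every 10 ms. */
static uint32_t reconnect_delay_ms(uint32_t attempt, uint32_t base_ms,
                                   uint32_t cap_ms, uint32_t forced_ms)
{
    if (forced_ms != 0) {
        return forced_ms;              /* test mode: fixed interval */
    }

    uint32_t delay = base_ms;
    while (attempt-- > 0 && delay < cap_ms) {
        /* Double the delay, saturating at the cap so it cannot overflow. */
        delay = (delay > cap_ms / 2) ? cap_ms : delay * 2;
    }
    return delay < cap_ms ? delay : cap_ms;
}
```

With a 1000 ms base and a 60000 ms cap this yields 1000, 2000, 4000, ... up to the cap, while passing `forced_ms = 10` pins every attempt to 10 ms regardless of the attempt count.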

Quick summary of changes (details are in separate commits):

Close some dangling handles
Add clean-up code in some early-exit paths
Remove one redundant call to ILibMemory_Free
Add one more explicit invocation of duk_gc
Probably the biggest win: cache the authenticode check result. I assumed that the binary itself will never be modified while running, so the check result can be cached. My guess is that some code in the duk_* functions, now extracted to MeshServer_CheckAuthenticode, was leaking memory.
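The caching idea from the last item can be sketched roughly as follows. This is an illustration only, not the code in this PR: `check_cache`, `cached_check`, and `counting_check` are hypothetical names, and the sketch relies on the same assumption stated above, namely that the binary does not change while the process is running.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for an expensive check (e.g. a signature
 * verification). Not the MeshAgent API. */
typedef bool (*check_fn)(const char *path, void *user);

typedef struct {
    bool valid;    /* true once a result has been stored */
    bool result;   /* cached outcome of the check */
} check_cache;

/* Runs `fn` at most once per cache; later calls return the stored result.
 * Only safe if the checked file cannot change during the process lifetime. */
static bool cached_check(check_cache *cache, check_fn fn,
                         const char *path, void *user)
{
    if (!cache->valid) {
        cache->result = fn(path, user);
        cache->valid = true;
    }
    return cache->result;
}

/* Demo checker: counts its invocations through `user` and always passes. */
static bool counting_check(const char *path, void *user)
{
    (void)path;
    ++*(int *)user;
    return true;
}
```

Calling `cached_check` from every reconnect cycle with the same zero-initialised `check_cache` means the expensive path, and any leak inside it, is hit only once per process.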

Hope this helps.

Tesla2k · 2026-05-30T05:45:13Z

Tested on Linux x86-64 (Flatcar headless VM, no KVM, ~zero workload — just heartbeat to server). Compared the stock 2026-05-22 build against a build of this PR (HEAD 48b9aa6, make linux ARCHID=6 KVM=0).

Stock binary on this host showed the classic pattern:

Run	Duration	RSS peak	Swap peak	Outcome
1	24 h	202 M	3.7 G	SIGKILL (OOM)
2	93 h	205 M	3.8 G	core-dump

(MemoryMax=300M was set in the unit — it caps RSS but the kernel pushes the rest into swap until the process either OOMs or crashes.)

Replaced just the binary, no other changes:

Build	Duration	RSS	RSS peak	Swap	Restarts
PR #350	41 h	213 M	216 M	0 B	0

RSS peak hasn't moved in the last ~38 h — clear plateau. Task count plateaus too (started at 2, jumped once to 36 around the 7 h mark, since then 36–37). CPU is ~41 min/day, same as the stock binary. Same workload, same MeshCentral server, same systemd config.
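For reference, one way to sample RSS and swap for a long-running process on Linux is to parse the `VmRSS` and `VmSwap` lines of `/proc/<pid>/status`. The sketch below is an assumption about how such a monitor could be written, not a description of the monitor used for these numbers; `status_field_kb` is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: given the text of /proc/<pid>/status, returns the
 * numeric value (in kB) of a line such as "VmRSS:\t  1234 kB",
 * or -1 if no line starts with `key` followed by ':'. */
static long status_field_kb(const char *text, const char *key)
{
    size_t key_len = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            /* strtol skips the leading whitespace and stops at " kB". */
            return strtol(line + key_len + 1, NULL, 10);
        }
        line = strchr(line, '\n');     /* advance to the next line */
        if (line != NULL) {
            line++;
        }
    }
    return -1;
}
```

Reading the file into a buffer with `fopen`/`fread` once a minute and logging `status_field_kb(buf, "VmRSS")` and `status_field_kb(buf, "VmSwap")` would be enough to see whether the values plateau or keep growing.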

The fix at agentcore.c:3554 (drop redundant ILibMemory_Free after ILibLifeTime_Remove) and the extra duk_gc() in MeshServer_Connect are likely doing the heavy lifting on Linux: the Windows authenticode cache changes don't apply here, and we still see a behaviour change this clean. Will report back at the 7-day mark; happy to share the monitor log if useful.
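To illustrate why a second free after a remove call is redundant, here is a small sketch of the ownership pattern the commit message describes, where the remove operation itself releases the payload through a destroy callback. The types and functions (`timer_entry`, `timer_add`, `timer_remove`, `counting_free`) are hypothetical and only mimic that behaviour; this is not the ILibLifeTime API.

```c
#include <stdlib.h>

typedef void (*destroy_fn)(void *data);

/* Hypothetical single-slot "timer" that owns its payload. */
typedef struct {
    void *data;          /* payload owned by the entry while registered */
    destroy_fn destroy;  /* called on removal to release the payload */
} timer_entry;

static int g_free_count = 0;   /* number of payloads released so far */

/* Destroy callback: frees the payload and records that it did so. */
static void counting_free(void *data)
{
    free(data);
    g_free_count++;
}

/* Registers a payload; ownership passes to the entry. */
static void timer_add(timer_entry *e, void *data, destroy_fn destroy)
{
    e->data = data;
    e->destroy = destroy;
}

/* Removes the entry and releases its payload via the destroy callback.
 * Returns 1 if something was removed, 0 if the slot was already empty.
 * The caller must NOT free the payload again afterwards. */
static int timer_remove(timer_entry *e)
{
    if (e->data == NULL) {
        return 0;
    }
    if (e->destroy != NULL) {
        e->destroy(e->data);
    }
    e->data = NULL;
    e->destroy = NULL;
    return 1;
}
```

Under this contract, calling `free` on the payload after `timer_remove` would be a double free, so the only correct fix is to delete the extra call rather than guard it.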

mgr-inz-rafal added 6 commits May 22, 2026 21:01

close HCERTSTORE and HCRYPTMSG handles in win-authenticode-opus

b7f3dd4

Add missing call to ILibDestructParserResults

7070124

Remove redundant ILibMemory_Free, already cleared in `ILibLifeTime_Remove` via `DestroyPtr`

a4929dc

Add another explicit call to Duktape Garbage Collector

073fd7c

Cache Win32 authenticode check across reconnect cycles

d30ca31

Clean-up the authenticode cache code

48b9aa6
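The handle-closing and early-exit clean-up commits listed above follow a common C pattern: route every exit through a single clean-up label so that no path can skip releasing a resource. The sketch below shows the pattern with portable stdio and heap resources standing in for the Win32 handles; `first_byte` is a hypothetical example function, not code from this PR.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: returns the first byte of a file (0-255),
 * or -1 on any failure. Every exit path goes through `cleanup`,
 * so the file handle and the buffer are always released. */
static int first_byte(const char *path)
{
    int result = -1;
    FILE *fp = NULL;
    unsigned char *buf = NULL;

    fp = fopen(path, "rb");
    if (fp == NULL) {
        goto cleanup;                  /* early exit: nothing to release yet */
    }

    buf = malloc(1);
    if (buf == NULL) {
        goto cleanup;                  /* early exit: fp must still be closed */
    }

    if (fread(buf, 1, 1, fp) != 1) {
        goto cleanup;                  /* early exit: both must be released */
    }
    result = buf[0];

cleanup:
    free(buf);                         /* free(NULL) is a no-op */
    if (fp != NULL) {
        fclose(fp);
    }
    return result;
}
```

Initialising every resource to NULL up front is what makes the shared clean-up block safe to run from any point in the function.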

mgr-inz-rafal wants to merge 6 commits into Ylianst:master from mgr-inz-rafal:stab_at_memleaks
