Scaler won't scale certain services due to incorrect resource estimates

**Describe the bug**
I'm running AL4 under Proxmox in my homelab and am currently blocked on upgrading to 4.7. I have overallocation set to 0 for CPU and memory, since those are limited by the VM config (no auto-scaling). I set the minimum instances for most services to 0, since I do not want them spun up while the system is idle. This setup has worked for over a year, from 4.5 through the latest 4.6.

Under 4.7.x, the scaler refuses to scale up various services, seemingly at random. The submission then stalls and the system goes idle with service queues still pending. If I manually scale one of these services by setting its minimum instances to 1, it scales up and processes its queue, and the scaler then scales others. Eventually the same thing happens with other services, which can also be manually scaled until processing succeeds. Future submissions then stall on whichever services were not manually set to 1.

With debug logging enabled, I can see that the scaler has a wildly inaccurate estimate of used/available resources (likely a [known issue](https://github.com/CybercentreCanada/assemblyline-core/blob/22abf89052f106fb2b3ad17a749af3ae98ab10fb/assemblyline_core/scaler/controllers/docker_ctl.py#L334) that has now become a problem). When it stalls, it does so because it believes there are insufficient resources for the service.


**Example**

In one case, it claimed there was only 534 MB of RAM available, when htop showed 13 GB and the hypervisor showed 8 GB. It also claimed there were only about 4.7 cores available, when the running containers were idle and CPU utilization was 1% according to both the hypervisor and htop:

`"logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Can't make more because not enough resources {'Safelist': (1.0, 512)}"}
scaler  | {"@timestamp": "2026-02-20 01:28:03,731", "event": { "module": "assemblyline", "dataset": "assemblyline.scaler" }, "host": { "ip": "x.x.x.x", "hostname": "c68c0cefeebe" }, "log": { "level": "DEBUG", "logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Total Memory available 534.77734375/24026.77734375"}
scaler  | {"@timestamp": "2026-02-20 01:28:03,787", "event": { "module": "assemblyline", "dataset": "assemblyline.scaler" }, "host": { "ip": "x.x.x.x", "hostname": "c68c0cefeebe" }, "log": { "level": "DEBUG", "logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Total CPU available 4.699999999999999/24"}`

Even if those had been the actual resources available, Safelist would have had the 1 core and 512 MB of RAM it needed, yet the scaler still refused. I manually scaled Safelist up to a minimum of 1 instance, and the scaler then launched other services without complaint, **thinking it had negative resources** for them, until it reached AVClass and TagCheck, at which point it stalled again. After letting the system sit idle, this was the state in which it was stalled:

`"logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Can't make more because not enough resources {'AVClass': (0.25, 512), 'TagCheck': (0.5, 2048)}"}
scaler  | {"@timestamp": "2026-02-20 01:49:16,370", "event": { "module": "assemblyline", "dataset": "assemblyline.scaler" }, "host": { "ip": "x.x.x.x", "hostname": "c68c0cefeebe" }, "log": { "level": "DEBUG", "logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Total Memory available -15337.22265625/24026.77734375"}
scaler  | {"@timestamp": "2026-02-20 01:49:16,437", "event": { "module": "assemblyline", "dataset": "assemblyline.scaler" }, "host": { "ip": "x.x.x.x", "hostname": "c68c0cefeebe" }, "log": { "level": "DEBUG", "logger": "assemblyline.scaler" }, "process": { "pid": "1" }, "message": "Total CPU available -3.8000000000000007/24"}`

In reality, CPU utilization was 1% and 14 of 24 GB of RAM was in use, so available resources were certainly not negative. The kernel did no OOM reaping, and there was plenty available to run these services.
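To make the mismatch concrete, here is a minimal sketch that pulls the numbers out of the debug messages quoted above and runs a naive fit check against them. This is not the scaler's actual logic: `parse_available` and `fits` are hypothetical helpers, and the only assumption is that a service fits when its requested CPU and memory are both at or below what the scaler reports as available.

```python
import re

# Matches the debug messages quoted above, for example:
# "Total Memory available 534.77734375/24026.77734375"
AVAILABLE_RE = re.compile(
    r"Total (Memory|CPU) available (-?\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)"
)


def parse_available(message):
    """Return (kind, available, total) from a scaler debug message, or None."""
    match = AVAILABLE_RE.search(message)
    if match is None:
        return None
    kind, available, total = match.groups()
    return kind, float(available), float(total)


def fits(request, cpu_available, mem_available):
    """Naive fit check (an assumption for illustration, not the scaler's code)."""
    cpu_needed, mem_needed = request
    return cpu_needed <= cpu_available and mem_needed <= mem_available


# First stall: values copied from the log lines above.
_, mem1, _ = parse_available("Total Memory available 534.77734375/24026.77734375")
_, cpu1, _ = parse_available("Total CPU available 4.699999999999999/24")
safelist_fits = fits((1.0, 512), cpu1, mem1)  # Safelist request from the log

# Second stall: the reported availability is negative.
_, mem2, _ = parse_available("Total Memory available -15337.22265625/24026.77734375")
_, cpu2, _ = parse_available("Total CPU available -3.8000000000000007/24")
avclass_fits = fits((0.25, 512), cpu2, mem2)  # AVClass request from the log

print(safelist_fits, avclass_fits)  # True False
```

Under this naive check, Safelist fits within the numbers the scaler itself reported during the first stall, so that refusal is not explained by the logged totals alone. The second stall is at least consistent with the (incorrect) negative estimate.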

**To Reproduce**
Steps to reproduce the behavior:
1. Create a monolithic Docker deployment of 4.7.x
2. Set CPU and memory overallocation to 0
3. Enable various services with minimum instances set to 0
4. Submit a sample
5. Watch the scaler fail to scale while processing stalls

**Expected behavior**
- Scaler has a sane estimate of resource utilization
- Scaler continues to scale services up/down given available resources
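As a rough reference point for what a sane memory estimate might look like on a single Docker host, the sketch below reads `MemAvailable` from `/proc/meminfo`-style text and converts it to MiB for comparison with the scaler's debug output. The helper names are hypothetical, the sample text is illustrative rather than taken from my system, and I am assuming the scaler's memory figures are in roughly the same unit (the logged total of about 24026 suggests so for a 24 GB VM).

```python
def mem_available_mib(meminfo_text):
    """Return MemAvailable in MiB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   13631488 kB" (the kernel's kB is KiB)
            kib = int(line.split()[1])
            return kib / 1024
    return None


def read_host_mem_available_mib(path="/proc/meminfo"):
    """Read the value from a Linux host; intended to be run on the Docker host."""
    with open(path) as handle:
        return mem_available_mib(handle.read())


# Illustrative sample only, not output captured from the affected system.
sample = (
    "MemTotal:       24603424 kB\n"
    "MemFree:         1048576 kB\n"
    "MemAvailable:   13631488 kB\n"
)
host_available = mem_available_mib(sample)
print(host_available)  # 13312.0
```

Comparing a value like this against the "Total Memory available" debug line at the moment of a stall would show how far the scaler's estimate has drifted from what the host reports.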

**Screenshots**
* Docker stats output during the last mentioned stall where scaler saw absurd negative resources available:
<img width="1055" height="540" alt="Image" src="https://github.com/user-attachments/assets/4e009002-340c-44a7-8baf-4b21eb741ca9" />

**Environment (please complete the following information if pertinent):**
 - Assemblyline Version: 4.7.x (4.6 works)
 - Deployment: Docker on a single Debian 12 virtual machine with 24 cores and 24 GB RAM


