Processes not auto scaled when long running jobs are being processed #1539

Open
@markieo1

Horizon Version

5.30.3

Laravel Version

11.42.1

PHP Version

8.4.3

Redis Driver

PhpRedis

Redis Version

7.2.4

Database Driver & Version

MySQL 8.0.30

Description

Hi,

We recently switched from simple balancing to auto.
We noticed that processes weren't being moved between queues, even while another queue was filling up.

This seems to happen when a queue contains some long-running jobs. Short jobs in that same queue caused the AutoScaler to assign more processes to it. Once the short jobs had been processed, the long jobs were still running.

In the meantime, another queue filled up. We'd expect the AutoScaler to move processes to that queue, but this doesn't happen. Only after the long-running jobs had finished did we see processes being moved to the other queue.

Steps To Reproduce

In these steps we have set maxProcesses to 5. The same thing can also happen with other values.

Create a new queue configuration in horizon.php:

'tests' => [
    'connection' => 'horizon',
    'queue' => ['test_long', 'test_short'],
    'autoScalingStrategy' => 'size',
    'balance' => 'auto',
    // The number of seconds to wait in between auto-scaling attempts.
    'balanceCooldown' => 3,
    // The maximum number of processes that can be scaled up or down in a single auto-scaling attempt.
    'balanceMaxShift' => 1,
    'memory' => 256,
    'tries' => 1,
    'timeout' => 3600,
    'minProcesses' => 1,
    'maxProcesses' => 5,
    'maxJobs' => 1000
]

Create a new TestJob that does nothing more than sleep:

<?php

namespace App\Jobs;

use Carbon\Carbon;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Support\Facades\Log;

class TestJob implements ShouldQueue
{
    use Queueable;

    public function __construct(private int $sleepingTime)
    {
    }

    public function handle()
    {
        $currentTime = Carbon::now();

        // Check if the seconds have passed
        while ($currentTime->diffInSeconds(Carbon::now()) < $this->sleepingTime) {
            // Do nothing
        }

        Log::info('Job has been processed');
    }
}

Call the following code:

public function reproduce(){
    // Create 2 jobs that are very long. This causes the auto scaling to create 2 processes
    for ($long = 0; $long < 2; $long++) {
        dispatch(new \App\Jobs\TestJob(300))->onQueue('test_long');
    }

    // Create 100 jobs that are very short. This causes the auto scaler to scale up to the maximum number of processes
    for ($short = 0; $short < 100; $short++) {
        dispatch(new \App\Jobs\TestJob(1))->onQueue('test_long');
    }

    // Wait for a bit to let the auto scaler finish
    sleep(10);
    // Create 1000 jobs that are very short. Once the short jobs in the long queue have been processed,
    // this should move some processes from the long queue to the short queue.
    for ($short = 0; $short < 1000; $short++) {
        dispatch(new \App\Jobs\TestJob(1))->onQueue('test_short');
    }
}

When you watch Horizon, you'll see that the process count for test_long stays high even after the "short" jobs have been processed. The other odd thing is that the process count shown doesn't match the maxProcesses of 5.

When the "long" jobs finish, the processes begin moving to the test_short queue. The "missing" process also comes back to life.

We'd expect the processes to be moved much earlier, since they sit idle until the long-running jobs finish processing.
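
To make that expectation concrete, here is a minimal sketch of the arithmetic we have in mind. It is not Horizon's AutoScaler code: the helper name expectedProcesses and the rounding rule are assumptions made up for this illustration. It gives every queue its minProcesses and splits the remaining processes in proportion to pending job counts, ignoring jobs that are already reserved and running.

```php
<?php

/**
 * Hypothetical helper (not part of Horizon): give each queue $minProcesses,
 * then split the spare processes in proportion to pending job counts.
 *
 * @param array<string, int> $pending pending (not yet reserved) jobs per queue
 * @return array<string, int> target process count per queue
 */
function expectedProcesses(array $pending, int $minProcesses, int $maxProcesses): array
{
    $total = array_sum($pending);

    // Processes left to hand out once every queue has its minimum.
    $spare = max(0, $maxProcesses - $minProcesses * count($pending));

    $targets = [];
    foreach ($pending as $queue => $size) {
        // A queue with nothing pending gets no share of the spare processes.
        $share = $total > 0 ? $size / $total : 0;
        $targets[$queue] = $minProcesses + (int) floor($share * $spare);
    }

    return $targets;
}

// State once the short jobs on test_long are done: nothing is pending there
// (the two long jobs are already running) and 1000 jobs wait on test_short.
$targets = expectedProcesses(['test_long' => 0, 'test_short' => 1000], 1, 5);

print_r($targets); // ['test_long' => 1, 'test_short' => 4]
```

Under these assumptions, most of the 5 processes would be on test_short as soon as the short jobs in test_long are done, rather than only after the long jobs finish.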
