
AV1 FrameHistory.doFind() infinite loop causing 100% CPU usage #2346

@scroom

Description

A critical performance issue has been identified in Jitsi Videobridge (JVB): the AV1 frame projection code enters an infinite loop in org.jitsi.videobridge.cc.av1.FrameHistory.doFind(), keeping one CPU core at 100% even when no conferences are active.

The affected thread, "Global CPU poolpool-9-thread-10", accumulated approximately 43 hours of CPU time over its ~8.5-day lifetime, and the thread dump shows it RUNNABLE inside the AV1 frame history search, consistent with a thread stuck in an infinite loop there.


Current behavior

  1. Sustained high CPU usage: One thread consistently uses 100% of a CPU core
  2. No active conferences: Problem occurs even with zero active conferences ({"conferences": {}})
  3. Long-running problematic thread: Thread accumulated ~43 hours of CPU time over approximately 8.5 days
  4. Normal service status: All Jitsi services (JVB, Jicofo, Prosody) report as "active" and healthy
  5. REST API responsive: JVB REST API at localhost:8080 remains functional but shows no conferences

Thread dump of the affected thread (RUNNABLE inside FrameHistory.doFind()):

"Global CPU poolpool-9-thread-10" #76 prio=5 os_prio=0 cpu=155153434.77ms elapsed=736097.72s tid=0x00007397ec0af800 nid=0x29646 runnable [0x00007396e3efc000]
   java.lang.Thread.State: RUNNABLE
	at org.jitsi.videobridge.cc.av1.FrameHistory.doFind(Av1DDFrameMap.kt:240)
	at org.jitsi.videobridge.cc.av1.FrameHistory.findBefore(Av1DDFrameMap.kt:217)
	at org.jitsi.videobridge.cc.av1.Av1DDFrameMap.prevFrame(Av1DDFrameMap.kt:128)
	- locked <0x000000040a331e00> (a org.jitsi.videobridge.cc.av1.Av1DDFrameMap)
	at org.jitsi.videobridge.cc.av1.Av1DDAdaptiveSourceProjectionContext.prevFrame(Av1DDAdaptiveSourceProjectionContext.kt:262)
	- locked <0x000000040a331eb0> (a org.jitsi.videobridge.cc.av1.Av1DDAdaptiveSourceProjectionContext)
	at org.jitsi.videobridge.cc.av1.Av1DDAdaptiveSourceProjectionContext.createInEncodingProjection(Av1DDAdaptiveSourceProjectionContext.kt:574)
	at org.jitsi.videobridge.cc.av1.Av1DDAdaptiveSourceProjectionContext.createInEncodingProjection(Av1DDAdaptiveSourceProjectionContext.kt:528)
	at org.jitsi.videobridge.cc.av1.Av1DDAdaptiveSourceProjectionContext.createProjection(Av1DDAdaptiveSourceProjectionContext.kt:300)
	at org.jitsi.videobridge.cc.av1.Av1DDAdaptiveSourceProjectionContext.accept(Av1DDAdaptiveSourceProjectionContext.kt:114)

CPU Analysis:

  • Total CPU time: 155153434.77ms (approximately 43 hours)
  • Thread uptime: 736097.72s (approximately 8.5 days)
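  • Effective CPU usage: about 21% of one core averaged over the thread's lifetime (43.1 h of CPU in 204.5 h). This is consistent with the thread having spun at close to 100% of one core for up to ~43 hours (about 1.8 days) before the dump, rather than for the entire 8.5 days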
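The lifetime average can be checked directly from the thread header. The sketch below (class and method names are illustrative, not part of JVB) parses the cpu= and elapsed= fields that the JDK 11 jstack in this report prints, and divides CPU seconds by elapsed seconds.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreadCpuShare {
    // "cpu=<ms>ms" and "elapsed=<s>s" fields of a HotSpot jstack thread header.
    private static final Pattern CPU_MS = Pattern.compile("cpu=([0-9.]+)ms");
    private static final Pattern ELAPSED_S = Pattern.compile("elapsed=([0-9.]+)s");

    // Extracts the numeric value captured by the given pattern, or fails loudly.
    static double field(Pattern pattern, String header) {
        Matcher m = pattern.matcher(header);
        if (!m.find()) {
            throw new IllegalArgumentException("missing " + pattern.pattern() + " in: " + header);
        }
        return Double.parseDouble(m.group(1));
    }

    // Average fraction of one core used by the thread over its whole lifetime.
    static double averageCoreShare(String header) {
        double cpuSeconds = field(CPU_MS, header) / 1000.0;
        return cpuSeconds / field(ELAPSED_S, header);
    }

    public static void main(String[] args) {
        String header = "\"Global CPU poolpool-9-thread-10\" #76 prio=5 os_prio=0 "
                + "cpu=155153434.77ms elapsed=736097.72s tid=0x00007397ec0af800 nid=0x29646 runnable";
        System.out.printf(Locale.ROOT, "CPU time %.1f h, lifetime %.1f h, average %.1f%% of one core%n",
                field(CPU_MS, header) / 3_600_000.0,
                field(ELAPSED_S, header) / 3_600.0,
                averageCoreShare(header) * 100);
        // prints: CPU time 43.1 h, lifetime 204.5 h, average 21.1% of one core
    }
}
```

A single dump only gives the lifetime average. Comparing the cpu= values of the same thread in two dumps taken a known interval apart gives the current rate: roughly 10000 ms of additional CPU over 10 s of wall time means the thread is still spinning at a full core.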

Expected Behavior

  1. CPU usage should remain low (<5%) when no conferences are active
  2. AV1 frame processing should not cause infinite loops
  3. Frame history searches should have reasonable bounds and timeouts
  4. JVB should efficiently handle AV1 frame sequences without getting stuck

Possible Solution

Root Cause: The issue appears to be in FrameHistory.doFind(): when the frame search cannot locate a required frame reference, the loop keeps iterating instead of terminating with a "not found" result.

Suggested fixes:

  1. Add bounds checking to FrameHistory.doFind() (Av1DDFrameMap.kt:240)
  2. Add a timeout for frame search operations
  3. Add a circuit breaker for problematic frame sequences
  4. Add more detailed logging to help debug AV1 frame processing
  5. Fall back gracefully when frame references cannot be resolved
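Suggestions 1 and 5 can be sketched together. The class below is hypothetical: it assumes a fixed-capacity ring buffer indexed by a monotonically increasing frame index and does not reproduce the real FrameHistory in Av1DDFrameMap.kt. The backward search visits at most one slot per retained frame and returns null when nothing matches, so an unresolvable reference ends the search instead of spinning.

```java
import java.util.Arrays;
import java.util.function.Predicate;

/** Hypothetical bounded frame history; an illustration, not JVB's FrameHistory. */
public class BoundedFrameHistory<T> {
    private final Object[] frames;   // slot (index mod capacity) holds the frame
    private final long[] indexes;    // frame index stored in each slot, -1 if empty
    private long latestIndex = -1;   // highest frame index inserted so far

    public BoundedFrameHistory(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        frames = new Object[capacity];
        indexes = new long[capacity];
        Arrays.fill(indexes, -1L);
    }

    private int slot(long index) {
        return (int) Math.floorMod(index, (long) frames.length);
    }

    public void insert(long index, T frame) {
        frames[slot(index)] = frame;
        indexes[slot(index)] = index;
        latestIndex = Math.max(latestIndex, index);
    }

    /**
     * Returns the newest retained frame with an index below {@code index} that
     * matches {@code pred}, or null if there is none. The loop is bounded twice:
     * by the oldest index still in the buffer and by a hard cap of capacity
     * steps, so it always terminates.
     */
    @SuppressWarnings("unchecked")
    public T findBefore(long index, Predicate<? super T> pred) {
        long oldestRetained = Math.max(0, latestIndex - frames.length + 1);
        long i = Math.min(index - 1, latestIndex);
        for (int steps = 0; i >= oldestRetained && steps < frames.length; i--, steps++) {
            int s = slot(i);
            if (indexes[s] != i) {
                continue; // slot empty or already reused by a newer frame
            }
            T frame = (T) frames[s];
            if (pred.test(frame)) {
                return frame;
            }
        }
        return null; // graceful fallback: reference could not be resolved
    }
}
```

A caller that receives null has to treat the reference as unresolved rather than retrying the search; which fallback is appropriate inside JVB's projection logic would need to be checked against the actual code.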

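Immediate workaround: Disable the AV1 codec. Because codec negotiation is handled by Jicofo, the Jicofo setting jicofo.codec.video.av1.enabled = false (in /etc/jitsi/jicofo/jicofo.conf) may be needed if the JVB block below has no effect: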

videobridge {
    codec {
        av1 {
            enabled = false
        }
    }
}

Steps to reproduce

Note: Exact reproduction steps are unknown; the issue appears to be triggered by particular AV1 frame sequences. It has been observed under the following conditions:

  1. Install Jitsi Meet with default configuration (AV1 enabled by default since stable-9909)
  2. Allow normal operation over several days
  3. Process AV1 video streams (likely during or after conferences)
  4. Monitor CPU usage: one thread eventually gets stuck in an infinite loop

Detection method:

# Check for high CPU with no active conferences
ps aux | grep jvb | grep -v grep
curl -s http://localhost:8080/stats | jq '.conferences | length'

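# If CPU >50% and conferences = 0, create a thread dump. jstack must run as the
# same user as the JVB process (normally "jvb"), and "pgrep -f jvb" can match more
# than one process, so select the Java process explicitly.
sudo -u jvb jstack "$(pgrep -u jvb -f java | head -n 1)" | grep -A 20 "org.jitsi.videobridge.cc.av1"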
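The hot thread can also be located without grepping for class names. With HotSpot on Linux, the nid field in a jstack entry is the native thread ID in hexadecimal, while top -H -p <pid> lists the same IDs in decimal. A small helper (the class name is illustrative) converts between the two.

```java
import java.util.Locale;

public class NidConverter {
    // Converts a Linux thread ID (decimal, as listed by "top -H") to a jstack nid string.
    static String toNid(long linuxTid) {
        return "0x" + Long.toHexString(linuxTid);
    }

    // Converts a jstack nid such as "0x29646" back to the decimal Linux thread ID.
    static long fromNid(String nid) {
        String hex = nid.toLowerCase(Locale.ROOT).startsWith("0x") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        // nid taken from the thread dump in this report
        System.out.println(fromNid("0x29646")); // prints 169542
        System.out.println(toNid(169542));      // prints 0x29646
    }
}
```

For the dump in this report, a thread listed as 169542 by top -H corresponds to the entry with nid=0x29646.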

Environment details

  • JVB Version: 2.3 (stable packages, using apt-get installation)
  • OS: Ubuntu 24.04 LTS (GNU/Linux 4.15.0-91-generic x86_64)
  • Java: OpenJDK 64-Bit Server VM (11.0.27+6-post-Ubuntu-0ubuntu124.04)
  • Installation method: Standard apt-get installation following official Jitsi documentation
  • Server specs: Dedicated server, 32 GB RAM, 12 CPU cores
  • AV1 status: Enabled by default (recent stable release)
  • Other services: Prosody, Jicofo running normally
  • Network: Production environment with real users

JVB Configuration: Default configuration with AV1 enabled
Service status: All services healthy according to systemctl
REST API response:

{
  "shutdown_state": "RUNNING",
  "drain": false,
  "conferences": {},
  "health": {
    "success": true,
    "message": null
  }
}

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests