Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Nov 17, 2025

Two race conditions cause undefined behavior: (1) cv::Mat references unmapped stage surface memory accessed by other threads, (2) plain bool isDisabled accessed without synchronization across multiple threads.

Changes

Use-after-unmap in stage surface handling

File: src/obs-utils/obs-utils.cpp

The video render thread creates a cv::Mat wrapping mapped stage surface memory, then immediately unmaps it. The video tick thread then accesses this invalid pointer via clone().

// Before: cv::Mat references freed memory
{
    std::lock_guard<std::mutex> lock(tf->inputBGRALock);
    tf->inputBGRA = cv::Mat(height, width, CV_8UC4, video_data, linesize);
}
gs_stagesurface_unmap(tf->stagesurface);  // video_data now invalid

// After: independent copy
{
    std::lock_guard<std::mutex> lock(tf->inputBGRALock);
    cv::Mat temp(height, width, CV_8UC4, video_data, linesize);
    tf->inputBGRA = temp.clone();  // owns its data
}
gs_stagesurface_unmap(tf->stagesurface);

Non-atomic shared state

File: src/FilterData.h

The isDisabled flag is written by the settings/activate/deactivate threads and read by video tick/render threads without synchronization.

// Before: data race
bool isDisabled;

// After: atomic operations
std::atomic<bool> isDisabled{false};

Documentation

Added docs/RACE-CONDITIONS.md with thread model analysis, synchronization patterns, and a minor TOCTOU issue with model pointer access (mitigated by atomic isDisabled).

Original prompt

Detect potential race condition and suggest fixes


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Detect potential race condition and suggest fixes Fix race conditions in stage surface data handling and shared state access Nov 17, 2025
Copilot AI requested a review from umireon November 17, 2025 07:22
@umireon umireon marked this pull request as ready for review November 17, 2025 07:24
@umireon umireon requested a review from royshil November 17, 2025 07:25
@umireon
Copy link
Collaborator

umireon commented Nov 17, 2025

Race Condition Analysis and Fixes

This document describes race conditions that were identified and fixed in the OBS Background Removal plugin.

Critical Race Conditions Fixed

1. Use-After-Unmap in Stage Surface Data Access

Location: src/obs-utils/obs-utils.cpp:63-65

Description:
The function getRGBAFromStageSurface() was creating a cv::Mat object that wrapped a pointer to mapped stage surface memory, then immediately unmapping the stage surface. This caused the cv::Mat to reference invalid memory.

Original Code:

{
    std::lock_guard<std::mutex> lock(tf->inputBGRALock);
    tf->inputBGRA = cv::Mat(height, width, CV_8UC4, video_data, linesize);
}
gs_stagesurface_unmap(tf->stagesurface);

Issue:

  • Line 63: tf->inputBGRA is assigned a cv::Mat that wraps the video_data pointer
  • Line 65: Stage surface is unmapped, invalidating the video_data pointer
  • Other threads (video tick thread) could access tf->inputBGRA via .clone() with an invalid pointer
  • This could lead to crashes, memory corruption, or undefined behavior

Fix:
Clone the data before unmapping the stage surface:

{
    std::lock_guard<std::mutex> lock(tf->inputBGRALock);
    // Create a temporary Mat that wraps the video_data pointer
    cv::Mat temp(height, width, CV_8UC4, video_data, linesize);
    // Clone the data to ensure tf->inputBGRA has its own copy
    // This prevents use-after-unmap race condition
    tf->inputBGRA = temp.clone();
}
gs_stagesurface_unmap(tf->stagesurface);

Impact:

  • Ensures tf->inputBGRA contains valid data after stage surface is unmapped
  • Eliminates potential crashes and undefined behavior
  • Slight performance impact due to memory copy, but necessary for correctness

2. Non-Atomic Access to isDisabled Flag

Location: src/FilterData.h:27, accessed in multiple source files

Description:
The isDisabled flag was a plain bool accessed from multiple threads without synchronization.

Threads Accessing isDisabled:

  • Settings Update Thread: Writes in background_filter_update(), background_filter_activate(), background_filter_deactivate()
  • Video Tick Thread: Reads in background_filter_video_tick()
  • Video Render Thread: Reads in background_filter_video_render()
  • Destroy Thread: Writes in background_filter_destroy()

Original Code:

bool isDisabled;

Issue:

  • Plain bool reads/writes are not atomic in C++
  • Multiple threads reading and writing without synchronization creates a data race
  • Could lead to torn reads, lost updates, or undefined behavior per C++ memory model

Fix:
Changed to std::atomic<bool> with explicit initialization:

std::atomic<bool> isDisabled{false};

Impact:

  • All reads and writes are now atomic operations
  • Provides proper memory ordering guarantees
  • Eliminates undefined behavior from data races
  • No performance impact on modern architectures (bool reads/writes are naturally atomic)

Minor Race Conditions (Documented but Not Fixed)

3. Model Pointer Check Without Mutex (TOCTOU)

Location: src/background-filter.cpp:463

Description:
The code checks if (!tf->model) without holding the modelMutex, but the model can be reset in background_filter_update() while holding the mutex.

Code:

// In background_filter_video_tick()
if (!tf->model) {  // Line 463 - no lock held
    obs_log(LOG_ERROR, "Model is not initialized");
    return;
}
// ... later ...
{
    std::unique_lock<std::mutex> lock(tf->modelMutex);  // Line 512
    processImageForBackground(tf, imageBGRA, backgroundMask);
}

Analysis:
This is technically a Time-of-Check Time-of-Use (TOCTOU) race condition. However:

  • The isDisabled flag (now atomic) provides an additional layer of protection
  • In background_filter_update(), isDisabled is set to true before model changes
  • The video_tick checks isDisabled first, so model should not be modified during processing
  • The race window is small and unlikely to cause issues in practice

Recommendation:
This could be fixed by:

  1. Moving the null check inside the mutex lock, or
  2. Using std::shared_ptr for the model with atomic operations

However, given the protection from isDisabled and the complexity of changing the model ownership pattern, this is left as a documented issue rather than fixing it in this PR.

Thread Safety Analysis

Thread Model

The plugin operates with multiple threads:

  1. Main/Settings Thread: Calls background_filter_update(), background_filter_create(), background_filter_destroy()
  2. Video Tick Thread: Calls background_filter_video_tick() - processes frames and runs inference
  3. Video Render Thread: Calls background_filter_video_render() - renders the output
  4. Graphics Thread: Manages GPU resources via obs_enter_graphics()/obs_leave_graphics()

Mutex Usage

  • inputBGRALock: Protects inputBGRA cv::Mat

    • Written by render thread in getRGBAFromStageSurface()
    • Read by video tick thread in background_filter_video_tick()
  • outputLock: Protects output data (e.g., backgroundMask, outputBGRA)

    • Written by video tick thread
    • Read by render thread
  • modelMutex: Protects model and inference session

    • Written during model initialization in background_filter_update()
    • Read during inference in background_filter_video_tick()

Thread-Safe Members

  • isDisabled: Now std::atomic<bool>, can be safely read/written from any thread
  • lastImageBGRA, lastBackgroundMask: Only accessed from video tick thread (no synchronization needed)

Testing Recommendations

To verify these fixes:

  1. Stress Test: Run the plugin with frequent model changes and resolution changes to stress the synchronization
  2. Thread Sanitizer: Build with ThreadSanitizer (-fsanitize=thread) and run to detect any remaining data races
  3. Memory Sanitizer: Build with AddressSanitizer (-fsanitize=address) to detect use-after-free or invalid memory access
  4. Valgrind: Run under Valgrind with --tool=helgrind to detect threading issues

References

  • C++17 Standard: Memory Model and Atomics
  • OpenCV Documentation: Mat memory management
  • OBS Studio Plugin API: Thread safety considerations

@umireon umireon removed their request for review November 17, 2025 07:26
@umireon
Copy link
Collaborator

umireon commented Nov 17, 2025

@royshil Copilot analyzed the race conditions. Please review.

@umireon
Copy link
Collaborator

umireon commented Nov 17, 2025

I think this change is valid one.

Copy link
Owner

@royshil royshil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks legit

@umireon umireon merged commit b3d01d4 into main Nov 17, 2025
9 checks passed
@umireon umireon mentioned this pull request Nov 21, 2025
sobalap pushed a commit to sobalap/obs-backgroundremoval that referenced this pull request Jan 7, 2026
…ccess (royshil#716)

* Initial plan

* Fix race condition: Clone stage surface data before unmap

Co-authored-by: umireon <[email protected]>

* Fix race condition: Make isDisabled atomic and add documentation

Co-authored-by: umireon <[email protected]>

* Delete RACE-CONDITIONS.md

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: umireon <[email protected]>
Co-authored-by: Kaito Udagawa <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants