
ThreadedScanning instances share the same inputs and output Queues due to mutable defaults in dataclass #106

Closed
@githole

Description


I instantiated the Scanner class multiple times and ran scans sequentially, but the scan results were incorrect. After investigating, I found that Scanner internally creates ThreadedScanning instances, and that these instances share the same inputs and output queues across all Scanner instances.

The ThreadedScanning class declares inputs and output as dataclass fields whose default value is queue.Queue(). That default expression is evaluated only once, when the class body executes, so every ThreadedScanning instance that relies on the default, including every one created inside Scanner, receives the same two Queue objects.
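A minimal sketch of the mechanism that runs without scanoss installed; WithList and SharedDefault are hypothetical stand-in classes, not part of the library. @dataclass only rejects defaults it recognizes as mutable (list, dict and set, or, since Python 3.11, any unhashable value). A Queue is neither, so it is accepted silently, and the single Queue created when the class body runs is handed to every instance.

```python
import queue
from dataclasses import dataclass

# Hypothetical stand-in: a list default is rejected when the class is created.
try:
    @dataclass
    class WithList:
        items: list = []
except ValueError as exc:
    list_error = exc
else:
    list_error = None


# Hypothetical stand-in mirroring the pattern in ThreadedScanning:
# each queue.Queue() call below runs once, while the class body executes.
@dataclass
class SharedDefault:
    inputs: queue.Queue = queue.Queue()
    output: queue.Queue = queue.Queue()


a = SharedDefault()
b = SharedDefault()
a.output.put("Test data")

print(list_error is not None)  # True: list defaults raise ValueError
print(a.output is b.output)    # True: one Queue object shared by two instances
print(b.output.qsize())        # 1: b sees the item put through a
```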

As a result, when I created a new Scanner instance and ran a scan, its queues still held data from previous Scanner instances, which produced incorrect scan results.

If this behavior is intentional, could you explain the reasoning behind it? If it is unintended, I suggest changing the implementation so that each ThreadedScanning instance gets its own independent queues.

Steps to Reproduce

```python
from scanoss.threadedscanning import ThreadedScanning
from scanoss.scanossapi import ScanossApi

api = ScanossApi()
scanner1 = ThreadedScanning(api)
scanner2 = ThreadedScanning(api)

scanner1.output.put('Test data')
print(scanner2.output.qsize())  # Expected: 0, Actual: 1
```

Suggested Fix

To ensure that each instance of ThreadedScanning has its own independent Queue, the class should use field(default_factory=queue.Queue) instead of directly assigning queue.Queue().

Current Implementation

```python
import queue
from dataclasses import dataclass

@dataclass
class ThreadedScanning(ScanossBase):
    inputs: queue.Queue = queue.Queue()  # Evaluated once; shared across instances
    output: queue.Queue = queue.Queue()  # Evaluated once; shared across instances
```

Recommended Fix

With field(default_factory=queue.Queue), the __init__ that @dataclass generates calls queue.Queue() for every new instance, so each instance gets its own fresh Queue and no data is retained across Scanner instances.

```python
import queue
from dataclasses import dataclass, field

@dataclass
class ThreadedScanning(ScanossBase):
    inputs: queue.Queue = field(default_factory=queue.Queue)  # New Queue per instance
    output: queue.Queue = field(default_factory=queue.Queue)  # New Queue per instance
```

This change ensures that each ThreadedScanning instance, whether created inside Scanner or elsewhere, gets its own inputs and output queues, so scan data no longer leaks between instances.
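One caveat worth checking: the default_factory fix relies on the __init__ that @dataclass generates. If a class defines its own __init__, @dataclass keeps that method instead, so default_factory is never called; and because @dataclass removes the class attribute for default_factory fields, the attribute is simply missing unless the hand-written __init__ assigns it. The issue does not show ThreadedScanning's constructor, so whether this applies is an assumption. The sketch below uses hypothetical stand-in classes (FactoryDefault, ForgottenQueues, ExplicitQueues) to check all three situations.

```python
import queue
from dataclasses import dataclass, field


# Generated __init__: default_factory builds fresh queues for each instance.
@dataclass
class FactoryDefault:
    inputs: queue.Queue = field(default_factory=queue.Queue)
    output: queue.Queue = field(default_factory=queue.Queue)


# Hypothetical class with a hand-written __init__ that forgets the queues:
# @dataclass keeps this __init__, so the factory never runs.
@dataclass
class ForgottenQueues:
    output: queue.Queue = field(default_factory=queue.Queue)

    def __init__(self, api: object) -> None:
        self.api = api


# Hypothetical class with a hand-written __init__ that creates the queues itself.
@dataclass
class ExplicitQueues:
    inputs: queue.Queue = field(default_factory=queue.Queue)
    output: queue.Queue = field(default_factory=queue.Queue)

    def __init__(self, api: object) -> None:
        self.api = api
        self.inputs = queue.Queue()  # a new Queue for this instance
        self.output = queue.Queue()


f1, f2 = FactoryDefault(), FactoryDefault()
f1.output.put("Test data")
print(f2.output.qsize())  # 0

print(hasattr(ForgottenQueues(api=None), "output"))  # False

e1, e2 = ExplicitQueues(api=None), ExplicitQueues(api=None)
e1.output.put("Test data")
print(e2.output.qsize())  # 0
```

In the hand-written __init__ case, creating the queues inside __init__ is what makes each instance independent; the field(default_factory=...) declarations then mainly serve as type declarations.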
