Description
I instantiated the `Scanner` class multiple times and ran scans sequentially, but the later scans returned incorrect results. After investigating, I found that `Scanner` internally creates `ThreadedScanning` instances, and these instances unexpectedly share the same `inputs` and `output` queues across multiple `Scanner` instances.
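The `ThreadedScanning` class declares `inputs` and `output` as dataclass fields whose default value is `queue.Queue()`. A default value is evaluated only once, when the class body is executed, so every `ThreadedScanning` instance, including each one created inside `Scanner`, ends up referencing the same two `Queue` objects. `@dataclass` does not flag this: it only rejects defaults it can detect as mutable (`list`, `dict`, and `set`, or, since Python 3.11, any unhashable value), and a `queue.Queue` instance is hashable.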
As a result, when I created a new `Scanner` instance and ran a scan, it still held data left over from previous `Scanner` instances, which produced incorrect scan results.
If this behavior is intentional, could you explain the reasoning behind it? If it is unintended, I suggest changing the implementation so that each `ThreadedScanning` instance gets its own independent queues.
- Relevant code in `threadedscanning.py`
Steps to Reproduce
```python
from scanoss.threadedscanning import ThreadedScanning
from scanoss.scanossapi import ScanossApi

api = ScanossApi()
scanner1 = ThreadedScanning(api)
scanner2 = ThreadedScanning(api)

scanner1.output.put('Test data')
print(scanner2.output.qsize())  # Expected: 0, Actual: 1
```
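The same behavior can be reproduced without `scanoss` installed. This sketch uses a hypothetical stand-in class, `SharedQueues`, that declares its fields the same way as the current `ThreadedScanning` implementation. `ListDefault` is also hypothetical and only shows that `@dataclass` rejects a `list` default while silently accepting a `Queue` one.

```python
import queue
from dataclasses import dataclass


@dataclass
class SharedQueues:
    # Hypothetical stand-in for ThreadedScanning: each queue.Queue() below is
    # evaluated once, when the class body runs, and reused as the default.
    inputs: queue.Queue = queue.Queue()
    output: queue.Queue = queue.Queue()


first = SharedQueues()
second = SharedQueues()
first.output.put("Test data")

print(first.output is second.output)  # True: both point at one Queue
print(second.output.qsize())          # 1, although nothing was put into `second`

# @dataclass only refuses defaults it can detect as mutable (list, dict, set,
# or any unhashable value on Python 3.11+). A Queue is hashable, so it passes.
try:
    @dataclass
    class ListDefault:  # hypothetical, for comparison only
        items: list = []
except ValueError as exc:
    print("list default rejected:", exc)
```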
Suggested Fix
To ensure that each instance of `ThreadedScanning` has its own independent `Queue`, the class should use `field(default_factory=queue.Queue)` instead of assigning `queue.Queue()` directly.
Current Implementation
```python
@dataclass
class ThreadedScanning(ScanossBase):
    inputs: queue.Queue = queue.Queue()  # Shared across instances
    output: queue.Queue = queue.Queue()  # Shared across instances
```
Recommended Fix
By using `field(default_factory=queue.Queue)`, each instance gets its own fresh `Queue`, which prevents data from being retained across `Scanner` instances.
```python
import queue
from dataclasses import dataclass, field

@dataclass
class ThreadedScanning(ScanossBase):
    inputs: queue.Queue = field(default_factory=queue.Queue)
    output: queue.Queue = field(default_factory=queue.Queue)
```
This change ensures that each `ThreadedScanning` instance, whether created inside `Scanner` or elsewhere, gets its own `Queue`, preventing scan data from leaking across instances.
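One caveat may be worth checking before applying the fix: `default_factory` is only applied by the `__init__` that `@dataclass` generates, and if a class defines its own `__init__`, `@dataclass` keeps it and generates none. The reproduction calls `ThreadedScanning(api)` with an argument that is not one of the fields shown, which suggests (I have not confirmed this) that `ThreadedScanning` has a hand-written `__init__`. In that case the queues would need to be created inside that `__init__`. The classes below (`GeneratedInit`, `CustomInit`, `CustomInitNoQueues`) are hypothetical and only illustrate the three cases.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class GeneratedInit:
    # The suggested fix: the generated __init__ calls queue.Queue()
    # separately for every new instance.
    inputs: queue.Queue = field(default_factory=queue.Queue)
    output: queue.Queue = field(default_factory=queue.Queue)


@dataclass
class CustomInit:
    # Hypothetical class with a hand-written __init__ taking a positional
    # argument. @dataclass keeps this __init__, so default_factory never runs.
    inputs: queue.Queue = field(default_factory=queue.Queue)
    output: queue.Queue = field(default_factory=queue.Queue)

    def __init__(self, api: object) -> None:
        self.api = api
        # Create the queues explicitly so every instance gets its own pair.
        self.inputs = queue.Queue()
        self.output = queue.Queue()


@dataclass
class CustomInitNoQueues:
    # Same shape, but the custom __init__ does not create the queue. The
    # factory is skipped and @dataclass removed the class attribute, so the
    # instance has no 'output' attribute at all.
    output: queue.Queue = field(default_factory=queue.Queue)

    def __init__(self, api: object) -> None:
        self.api = api


a, b = GeneratedInit(), GeneratedInit()
a.output.put("Test data")
print(b.output.qsize())  # 0

c, d = CustomInit(None), CustomInit(None)
c.output.put("Test data")
print(d.output.qsize())  # 0

print(hasattr(CustomInitNoQueues(None), "output"))  # False
```

With a hand-written `__init__`, assigning `self.inputs = queue.Queue()` and `self.output = queue.Queue()` inside it is what actually gives each instance its own queues.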