Bug report
Bug description:
server.py

```python
from multiprocessing.managers import BaseManager
from queue import Queue

queue = Queue()

class QueueManager(BaseManager): pass

QueueManager.register('get_queue', callable=lambda: queue)
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()
```
consumer.py

```python
import time
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()
while True:
    t = time.time()
    x = queue.get()
```
producerA.py

```python
from multiprocessing.managers import BaseManager
import time
import numpy as np

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()
out_img = np.zeros((128, 128, 3), dtype=np.uint8)
for i in range(100):
    t = time.time()
    queue.put(
        {
            'type': 'not working',
            'data': out_img
        }
    )
    print('put took', (time.time() - t) * 1000)
```
producerB.py

```python
from multiprocessing.managers import BaseManager
import time
import numpy as np

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()
out_img = np.zeros((256, 256, 3), dtype=np.uint8)
for i in range(100):
    t = time.time()
    queue.put(
        {
            'type': 'not working',
            'data': out_img
        }
    )
    print('put took', (time.time() - t) * 1000)
```
Steps to reproduce the issue:
- run server.py
- run consumer.py
- run producerA.py, check the put time, and compare it with producerB.py's put time.

The put time in producerA.py is higher than in producerB.py, even though the data being sent through the shared queue object is larger in producerB.py (256x256) than in producerA.py (128x128).
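One way to narrow down where the time goes (an added diagnostic sketch, not part of the original report) is to time serialization locally using plain `bytes` stand-ins of the same sizes as the two image payloads. This separates pure pickling cost from the manager's network round trip:

```python
import pickle
import time

def time_pickle(payload, repeats=200):
    """Return average pickle.dumps time in ms, plus the pickled size in bytes."""
    start = time.perf_counter()
    for _ in range(repeats):
        blob = pickle.dumps(payload)
    return (time.perf_counter() - start) * 1000 / repeats, len(blob)

# bytes stand-ins matching the two image payloads (3 bytes per pixel)
small = {'type': 'test', 'data': bytes(128 * 128 * 3)}   # ~48 KiB
large = {'type': 'test', 'data': bytes(256 * 256 * 3)}   # ~192 KiB

ms_small, n_small = time_pickle(small)
ms_large, n_large = time_pickle(large)
print(f'128x128 stand-in: {n_small} bytes, {ms_small:.4f} ms per dumps')
print(f'256x256 stand-in: {n_large} bytes, {ms_large:.4f} ms per dumps')
```

If the larger stand-in consistently pickles slower, then serialization time tracks payload size as expected and the anomaly lies elsewhere (e.g. in how the ndarray itself serializes, or in the socket round trip).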
I don't have much context on whether this is related to numpy or to CPython's multiprocessing, but this definitely shouldn't be the case. Ideally producerB.py should take more time, because its payload is larger than producerA.py's.
This is my first bug report, I'm hoping to learn something...
Thanks everyone...
CPython versions tested on:
3.10
Operating systems tested on:
Linux
Activity
ZeroIntensity commented on Nov 7, 2024
I don't think this is an issue on our side, but rather with how the numpy data gets serialized. An `ndarray` can't be serialized by multiprocessing natively, so it has to guess how to serialize it. I'm not 100% certain how it does that, but it probably just uses `__str__` or something similar, which would explain the odd results.

You probably want to call `tolist` on `out_img` in your reproducer.

If that fixes it, let me know and I'll close this. If not, could you come up with a reproducer that does not involve `numpy`?
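Such a numpy-free reproducer could look like the following self-contained sketch (an editorial addition, not from the thread). It runs the manager server in a background thread on a loopback address and times puts of a plain `bytes` payload; the payload size mirrors the small image, and port 0 lets the OS pick a free port:

```python
import threading
import time
from multiprocessing.managers import BaseManager
from queue import Queue

shared_queue = Queue()

class ServerManager(BaseManager):
    pass

class ClientManager(BaseManager):
    pass

# Server side: serve the queue on loopback; port 0 asks the OS for a free
# port, which we read back from the server object afterwards.
ServerManager.register('get_queue', callable=lambda: shared_queue)
server = ServerManager(address=('127.0.0.1', 0),
                       authkey=b'abracadabra').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: same pattern as producerA.py, but the ndarray is replaced
# by plain bytes (128 * 128 * 3 bytes, like the small image payload).
ClientManager.register('get_queue')
client = ClientManager(address=server.address, authkey=b'abracadabra')
client.connect()
queue = client.get_queue()

payload = {'type': 'no numpy', 'data': bytes(128 * 128 * 3)}
start = time.time()
for _ in range(10):
    queue.put(payload)
avg_ms = (time.time() - start) * 1000 / 10
print(f'avg put took {avg_ms:.3f} ms')
```

Running this with both payload sizes would show whether the timing anomaly survives without numpy in the picture.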
ayushbisht2001 commented on Nov 12, 2024
@ZeroIntensity thanks for the explanation. But for my use case, if I use `tolist()`, it takes more time for big ndarrays. I'm not sure if numpy is doing some optimization for big arrays. Also, if it were just `str`, the consumer shouldn't get the data back, yet with this approach I am actually getting the data. Could this be copy-on-write to efficiently transfer big numpy arrays?

ayushbisht2001 commented on Nov 12, 2024
I'll post this in the numpy community channel; so far I haven't faced any issue with Python generic types.
ZeroIntensity commented on Nov 12, 2024
Yeah, then that's a good sign that this isn't an issue on our end.
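As a closing illustration of the `tolist()` slowdown discussed above (an added sketch, assuming numpy is installed in the environment): an ndarray round-trips through pickle intact via numpy's own pickle support, and converting to nested Python lists first is typically much slower for large arrays:

```python
import pickle
import time

import numpy as np  # assumption: numpy is available

def avg_ms(fn, repeats=20):
    """Average wall-clock time of fn() in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) * 1000 / repeats

arr = np.zeros((256, 256, 3), dtype=np.uint8)

direct_ms = avg_ms(lambda: pickle.dumps(arr))
tolist_ms = avg_ms(lambda: pickle.dumps(arr.tolist()))
print(f'pickle ndarray directly: {direct_ms:.3f} ms')
print(f'pickle arr.tolist():     {tolist_ms:.3f} ms')

# The array survives a pickle round trip unchanged, which is consistent
# with the consumer receiving real data rather than a __str__ dump.
restored = pickle.loads(pickle.dumps(arr))
print('round-trip ok:', np.array_equal(restored, arr))
```

This matches the reporter's observation that the consumer does receive the actual array data, and that `tolist()` makes large payloads slower rather than faster.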