
Weird multiprocessing: shared objects take more time sharing smaller data than larger data between multiple processes. #126471

Open

Description

@ayushbisht2001

Bug report

Bug description:

server.py

from multiprocessing.managers import BaseManager
from queue import Queue

queue = Queue()

class QueueManager(BaseManager): pass

# Expose the shared queue to remote clients connecting on port 5000.
QueueManager.register('get_queue', callable=lambda: queue)
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
s = m.get_server()
s.serve_forever()

consumer.py

import time

from multiprocessing.managers import BaseManager

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()

while True:
    t = time.time()
    x = queue.get()
    print('get took', (time.time() - t) * 1000)

producerA.py

from multiprocessing.managers import BaseManager
import time
import numpy as np

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()

out_img = np.zeros((128, 128, 3), dtype=np.uint8)

for i in range(100):
    t = time.time()
    queue.put({
        'type': 'not working',
        'data': out_img,
    })
    print('put took', (time.time() - t) * 1000)

producerB.py

from multiprocessing.managers import BaseManager
import time
import numpy as np

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()

out_img = np.zeros((256, 256, 3), dtype=np.uint8)

for i in range(100):
    t = time.time()
    queue.put({
        'type': 'not working',
        'data': out_img,
    })
    print('put took', (time.time() - t) * 1000)

Steps to reproduce the issue

  • run server.py
  • run consumer.py
  • run producerA.py, note the put times, and compare them with producerB.py's put times

The put time in producerA.py is higher than in producerB.py, even though the data sent through the shared queue is larger in producerB.py (256x256) than in producerA.py (128x128).
I don't have much context on whether this is related to numpy or to CPython's multiprocessing, but this definitely shouldn't be the case.
Ideally producerB.py should take more time, because its data is larger than producerA.py's.
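To check whether serialization alone explains the gap, one can time pickle.dumps on the same payloads, since BaseManager proxies pickle their arguments before sending them over the socket. A minimal check along those lines:

import pickle
import time

import numpy as np

# Time serialization of the two payloads, independent of the queue.
for side in (128, 256):
    arr = np.zeros((side, side, 3), dtype=np.uint8)
    payload = {'type': 'not working', 'data': arr}
    t = time.time()
    blob = pickle.dumps(payload)
    print(f'{side}x{side}: {len(blob)} bytes, '
          f'pickle.dumps took {(time.time() - t) * 1000:.3f} ms')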

This is my first bug report, I'm hoping to learn something...

Thanks everyone...

CPython versions tested on:

3.10

Operating systems tested on:

Linux

Activity

ZeroIntensity (Member) commented on Nov 7, 2024

I don't think this is an issue on our side, but probably with how the numpy data gets serialized. ndarray can't be serialized by multiprocessing directly, so it has to guess how to serialize it. I'm not 100% certain how it does that, but it probably just uses __str__ or something similar, which would explain the odd results.

You probably want to call tolist on out_img in your reproducer:

    queue.put({
        'type': 'not working',
        'data': out_img.tolist(),
    })

If that fixes it, let me know and I'll close this. If not, could you come up with a reproducer that does not involve numpy?
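A numpy-free producer could look something like this sketch (untested, with payload sizes chosen to match the byte counts of the two arrays):

from multiprocessing.managers import BaseManager
import time

class QueueManager(BaseManager): pass

QueueManager.register('get_queue')
m = QueueManager(address=('', 5000), authkey=b'abracadabra')
m.connect()
queue = m.get_queue()

# Same byte counts as the 128x128x3 and 256x256x3 uint8 arrays.
for size in (128 * 128 * 3, 256 * 256 * 3):
    data = bytes(size)
    for i in range(100):
        t = time.time()
        queue.put({'type': 'bytes', 'data': data})
        print(size, 'put took', (time.time() - t) * 1000)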

added the pending label (The issue will be closed if no feedback is provided) on Nov 7, 2024
ayushbisht2001 (Author) commented on Nov 12, 2024

@ZeroIntensity thanks for the explanation. But in my use case, if I use tolist(), it takes more time for big ndarrays. I'm not sure if numpy is doing some optimization for big arrays. Also, if it were just __str__, the consumer shouldn't be getting the data back, yet with the current approach I actually do get it. Could this be copy-on-write to efficiently transfer big numpy arrays?
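For what it's worth, comparing the pickled size of the raw array with its tolist() form hints at why tolist() gets slower as arrays grow (a quick sketch, assuming pickle is what the manager uses under the hood):

import pickle
import numpy as np

arr = np.zeros((256, 256, 3), dtype=np.uint8)
# ndarray pickles compactly via its __reduce__ hook; tolist()
# pickles every element as a separate Python int.
print('ndarray pickle :', len(pickle.dumps(arr)), 'bytes')
print('tolist() pickle:', len(pickle.dumps(arr.tolist())), 'bytes')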

ayushbisht2001 (Author) commented on Nov 12, 2024

I'll post this in the numpy community channel; so far I haven't faced any issue with Python generic types.

ZeroIntensity (Member) commented on Nov 12, 2024

    so far I haven't faced any issue with Python generic types.

Yeah, then that's a good sign that this isn't an issue on our end.


Metadata

Assignees: no one assigned
Labels: extension-modules (C modules in the Modules dir), pending (The issue will be closed if no feedback is provided), topic-multiprocessing, type-bug (An unexpected behavior, bug, or error)
