dataloader2 is forced to copy data #12

@coreyjadams

When reading batches of data with dataloader2, 7% of the execution time is spent copying data. Trying to avoid the copy raises the following error:

Traceback (most recent call last):
  File "bin/resnet3d.py", line 79, in <module>
    main()
  File "bin/resnet3d.py", line 53, in main
    trainer.initialize(io_only=True)
  File "/home/cadams/DLP3/NEXT_SparseEventID/src/utils/trainercore.py", line 155, in initialize
    self._initialize_io()
  File "/home/cadams/DLP3/NEXT_SparseEventID/src/utils/trainercore.py", line 82, in _initialize_io
    self._larcv_interface.prepare_manager('primary', io_config, FLAGS.MINIBATCH_SIZE, data_keys)
  File "/home/cadams/DLP3/dlp/lib/python2.7/site-packages/larcv-3.0a1-py2.7-linux-x86_64.egg/larcv/larcv_interface.py", line 90, in prepare_manager
    self.next(mode)
  File "/home/cadams/DLP3/dlp/lib/python2.7/site-packages/larcv-3.0a1-py2.7-linux-x86_64.egg/larcv/larcv_interface.py", line 116, in next
    self._dataloaders[mode].next(store_event_ids=True, store_entries=True)
  File "/home/cadams/DLP3/dlp/lib/python2.7/site-packages/larcv-3.0a1-py2.7-linux-x86_64.egg/larcv/dataloader2.py", line 257, in next
    storage.set_data(next_storage_id, batch_data)
  File "/home/cadams/DLP3/dlp/lib/python2.7/site-packages/larcv-3.0a1-py2.7-linux-x86_64.egg/larcv/dataloader2.py", line 66, in set_data
    self._npy_data = larcv.as_ndarray(larcv_batchdata.data())
NotImplementedError: Wrong number or type of arguments for overloaded function 'as_ndarray'.
  Possible C/C++ prototypes are:
    larcv3::as_ndarray(std::vector< short,std::allocator< short > > const &)
    larcv3::as_ndarray(std::vector< unsigned short,std::allocator< unsigned short > > const &)
    larcv3::as_ndarray(std::vector< long long,std::allocator< long long > > const &)
    larcv3::as_ndarray(std::vector< unsigned long long,std::allocator< unsigned long long > > const &)
    larcv3::as_ndarray(larcv3::Image2D const &)

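None of the as_ndarray overloads listed in the error accepts the object returned by larcv_batchdata.data(), so the copy cannot simply be dropped. Fixing this will take some work, but it would give a moderate boost to IO performance and data pipelining.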
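
For anyone picking this up, here is a minimal sketch of the copy versus no-copy distinction at stake. It uses only the Python standard library, not larcv: `batch` is a hypothetical stand-in for a buffer owned by the IO layer, and none of these names come from dataloader2.

```python
from array import array

# Hypothetical stand-in for a batch buffer owned by the IO layer.
batch = array("f", [0.0, 1.0, 2.0, 3.0])

# Copying path: building a new array duplicates every element.
copied = array("f", batch)

# Zero-copy path: a memoryview exposes the same memory without duplicating it.
view = memoryview(batch)

# Mutate the source to see which object shares memory with it.
batch[0] = 42.0

copy_is_independent = copied[0] == 0.0   # the copy still holds the old value
view_tracks_source = view[0] == 42.0     # the view sees the new value
```

A NumPy array built over an existing buffer (for example with `numpy.frombuffer`) behaves like `view` here, while `numpy.array` behaves like `copied`. Whether as_ndarray can expose the batch data this way depends on the larcv bindings, which this sketch does not cover.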
