Numpy DataItem #4347
base: develop
Conversation
…e unused validation module
…, enhancing flexibility in dataset processing.
…reamline data processing.
Thanks for the effort. It looks good overall; I have a few minor comments.
…type safety and clarity in collate function usage across OTXDataset and related classes.
…aming and error messaging across Numpy and PyTorch validation classes.
…ity in data processing across various tasks.
… use property method, improving clarity and consistency in dataset handling.
Thanks! I am fine with the changes.
… compatibility with Numpy-based data processing.
…ith NumpyDataBatch, improving compatibility with Numpy-based data processing.
Thanks for the update, Eugene. That said, I feel the NumpyDataItem is now starting to overcomplicate things. We're duplicating logic by preprocessing and collating everything separately into NumPy arrays. On top of that, for IR validation we still need the annotations as PyTorch tensors, since we're using torchmetrics for evaluation.
Would it make sense to support only NumPy images with optional transforms, and keep the annotations in torch format? From the perspective of the OV Engine and model implementations, it seems more consistent that way.
What do you think?
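For illustration, a minimal sketch of what such a hybrid entity could look like. The `HybridDataItem` and `collate_hybrid` names below are hypothetical, not the actual OTX classes; the idea is just NumPy images for the OV pipeline with annotations kept as torch tensors:

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np
import torch


@dataclass
class HybridDataItem:
    """Hypothetical entity: NumPy image for OV inference, torch annotations for metrics."""

    image: np.ndarray                   # HWC image fed to the OpenVINO model as-is
    label: torch.Tensor                 # annotation kept as a tensor for torchmetrics
    bboxes: torch.Tensor | None = None  # optional, e.g. for detection tasks


def collate_hybrid(items: list[HybridDataItem]) -> dict:
    """Stack images into one NumPy batch; leave annotations as lists of tensors."""
    return {
        "images": np.stack([item.image for item in items]),
        "labels": [item.label for item in items],
        "bboxes": [item.bboxes for item in items],
    }
```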
@kprokofi I think we should avoid depending on torchmetrics for IR inference/validation. Using torchmetrics requires converting both predictions and annotations into PyTorch tensors, which doesn't make sense for IR tasks: the outputs are typically generic (e.g., lists, NumPy arrays) and shouldn't be tied to a specific deep learning framework. I also feel that mixing NumPy arrays and torch tensors within a single data entity isn't clean and could introduce bugs or unnecessary complexity.

A more framework-agnostic alternative could be HuggingFace's evaluate library (https://huggingface.co/docs/evaluate/index), which supports segmentation, object detection, and other tasks without depending on any particular DL framework.

As for the duplication concern: if we remove …

What do you think?
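As a rough sketch of that alternative, assuming the standard `evaluate.load` / `compute` API and using `mean_iou` for semantic segmentation as an example (the shapes and label count below are made up for illustration):

```python
# pip install evaluate
import evaluate
import numpy as np

# Load a framework-agnostic metric; predictions/references stay plain NumPy arrays.
mean_iou = evaluate.load("mean_iou")

preds = [np.random.randint(0, 3, size=(32, 32))]  # predicted HxW label maps
refs = [np.random.randint(0, 3, size=(32, 32))]   # ground-truth HxW label maps

result = mean_iou.compute(
    predictions=preds,
    references=refs,
    num_labels=3,
    ignore_index=255,
)
print(result["mean_iou"])
```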
Summary
How to test
Checklist
License
Feel free to contact the maintainers if that's a concern.