899 extend, test and benchmark data transfer between Python and DAPHNE #949

Merged
pdamme merged 26 commits into daphne-project:main from mariakrzywnicka:main
May 21, 2025

@irhox irhox commented Feb 25, 2025

Description:

This PR contains additions, bug fixes, new tests, and benchmarks for the data transfer features of the DaphneLib library.

This PR addresses the concerns in issue #899 and was done by @mariakrzywnicka and @irhox.

First update: We extended the tests for data transfer between Python and DAPHNE and found two bugs in the data transfer with numpy and pandas. One of them was fixed immediately; the other was reported.

@irhox irhox changed the title #899 add more comprehensive tests for the data transfer feature between daphne and different python libraries 899 add more comprehensive tests for the data transfer feature between daphne and different python libraries Feb 25, 2025
@irhox irhox marked this pull request as draft March 8, 2025 22:37

irhox commented Mar 10, 2025

Since creating this PR, there have been a couple of updates:

  • Support for data transfer between Python lists and DAPHNE has been added.
  • Support for string data transfer via files has been added for numpy, pandas, and Python lists.
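
As a rough illustration of what transferring a Python list involves, the sketch below normalizes a flat or nested list into a 2-D numpy array before any transfer would take place. The helper name `list_to_matrix` and the choice to treat a flat list as a single column are assumptions made for this example; they are not taken from the DaphneLib code in this PR.

```python
import numpy as np


def list_to_matrix(values):
    """Convert a flat or nested Python list into a 2-D numpy array.

    Assumption for this sketch: a flat list is treated as one column.
    """
    arr = np.asarray(values)
    if arr.ndim == 1:
        # Flat list: reshape into an (n, 1) column.
        arr = arr.reshape(-1, 1)
    if arr.ndim != 2:
        raise ValueError("expected a flat list or a list of equal-length rows")
    return arr


numbers = list_to_matrix([1, 2, 3])               # numeric column
strings = list_to_matrix([["a", "b"], ["c", "d"]])  # string-valued matrix
print(numbers.shape, strings.shape, strings.dtype.kind)
```

String-valued lists end up with a numpy unicode dtype (kind `U`), which is the case the thread says is transferred via files.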

irhox commented Mar 15, 2025

Update:

irhox commented Mar 17, 2025

Update:

irhox commented Mar 31, 2025

Final updates:

  • Graphs comparing the performance of matrix multiplication and matrix addition between different Python libraries and daphnelib were created.
  • Graphs comparing data transfer and data exchange between different Python libraries and data structures and daphnelib were created.
  • The documentation was adjusted to include the new from_python() method.
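
A comparison like the one behind these graphs can be approximated with a small timing harness. The sketch below times matrix multiplication and addition in numpy only; the helper `best_time`, the matrix size, and the repeat count are illustrative choices, and the DaphneLib side is deliberately left out because its exact calls are not shown in this thread.

```python
import time

import numpy as np


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (in seconds) over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


rng = np.random.default_rng(seed=0)
a = rng.random((200, 200))
b = rng.random((200, 200))

timings = {
    "numpy matmul": best_time(lambda: a @ b),
    "numpy add": best_time(lambda: a + b),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the minimum over several repeats reduces noise from other processes; a second library can be added to the same harness by passing another callable to `best_time`.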

@irhox irhox changed the title 899 add more comprehensive tests for the data transfer feature between daphne and different python libraries 899 extend, test and benchmark data transfer between Python and DAPHNE Mar 31, 2025
@irhox irhox marked this pull request as ready for review March 31, 2025 22:01
@pdamme pdamme self-requested a review April 1, 2025 15:17
@pdamme pdamme added the labels DaphneLib (related to DaphneLib, DAPHNE's Python API) and LDE winter 2024/25 (student project in the course Large-scale Data Engineering at TU Berlin, winter 2024/25) Apr 1, 2025
@pdamme pdamme force-pushed the main branch 4 times, most recently from a24cae3 to a6614de May 16, 2025 20:48
- Removed experimental results (not needed in the code base).
- Fixed minor issues in the documentation and the example script.
- Cleaned up the DaphneLib source code:
  - Merged from_numpy()/from_numpy_numeric() and from_pandas()/from_pandas_numeric(); these variants partly duplicated code and unnecessarily complicated the API.
  - Fixed the string data transfer via files (in code_line(), it stored the matrix as a JSON file, which cannot work correctly as JSON data cannot be read as CSV on the other side; the fix uses appropriate formatting for np.savetxt() depending on the dtype).
  - Removed some unused or irrelevant code.
  - Reintroduced various code lines that were removed without clear reason, e.g., in script.py.
  - Undid pure whitespace changes.
  - Various other little things.
- Restructured the new DaphneLib test cases.
  - Removed the unrelated "function_*" test cases.
  - Removed script files that were not used in any test case, e.g., "data_transfer_numpy.py" and "data_transfer_pandas.py".
  - More systematic naming scheme for the newly added test script files.
  - Reduced the number of newly introduced test script files by testing special float values (nan, inf, large, small, etc.) all in the same script instead of separate scripts.
  - Simplified the test scripts.
  - Undid the renaming of some existing test script files.
- Removed the specialization of the receiveFromNumpy-kernel for the string value type, since string data transfer via shared memory is not supported in this PR anyway.
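
The dtype-dependent formatting for np.savetxt() and the special-float test described in the commit message can be sketched with numpy alone. The helpers `choose_fmt` and `to_csv_text` are hypothetical names for this example, and the format strings are one plausible choice, not necessarily the ones used in the commit.

```python
import io

import numpy as np


def choose_fmt(arr):
    """Pick an np.savetxt format string based on the array's dtype."""
    if np.issubdtype(arr.dtype, np.integer):
        return "%d"
    if np.issubdtype(arr.dtype, np.floating):
        return "%.18e"  # np.savetxt's default format
    return "%s"  # strings and any other dtype


def to_csv_text(arr):
    """Serialize a 2-D array as comma-separated text."""
    buf = io.StringIO()
    np.savetxt(buf, arr, fmt=choose_fmt(arr), delimiter=",")
    return buf.getvalue()


# Special float values checked together in one round trip.
info = np.finfo(np.float64)
special = np.array([[np.nan, np.inf, -np.inf],
                    [info.max, info.tiny, 1e-300]])
restored = np.loadtxt(io.StringIO(to_csv_text(special)), delimiter=",", ndmin=2)
assert np.array_equal(special, restored, equal_nan=True)
```

Writing CSV text rather than JSON keeps the file readable by a CSV reader on the receiving side, which is the problem the fix above addresses.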

@pdamme pdamme left a comment

Thanks @irhox and @mariakrzywnicka for improving the data transfer between Python and DAPHNE by (a) adding support for more Python data structures (e.g., Python lists, string-valued numpy arrays, 1-dimensional numpy arrays), (b) fixing some bugs related to the data transfer, and (c) adding numerous additional test cases. This contribution is highly welcome.

I tidied up the code in this PR thoroughly (see the commit I added for details) and now it's ready to be merged.

@pdamme pdamme merged commit 14d1c24 into daphne-project:main May 21, 2025
3 checks passed