Refactor jni writer data sink #12458

Closed

mythrocks
Contributor

@mythrocks mythrocks commented Dec 31, 2022

Description

Fixes #12456.

This is purely a JNI change. The implementation of jni_writer_data_sink has been streamlined a little:

  1. The logic shared by device_write() and host_write() has been moved into a common function.
  2. rotate_buffer() has been renamed to handle_buffer_and_reallocate().
  3. jni_writer_data_sink has been moved to the cudf::jni::io namespace.
  4. jni_writer_data_sink is now named writer_data_sink.
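
The shape of items 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: it has no cudf, CUDA, or JNI dependencies; writer_data_sink_sketch, buffer_consumer, the private write() helper, and flush() are hypothetical names; and std::memcpy stands in for the device-to-host copy a real sink would presumably issue in device_write().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Java-side HostBufferConsumer callback:
// receives a filled buffer and the number of valid bytes in it.
using buffer_consumer = std::function<void(std::vector<char> const&, std::size_t)>;

class writer_data_sink_sketch {
 public:
  writer_data_sink_sketch(std::size_t capacity, buffer_consumer consumer)
    : capacity_(capacity), buffer_(capacity), consumer_(std::move(consumer))
  {
    assert(capacity_ > 0);
  }

  // Both entry points forward to write(); only the copy primitive differs.
  void host_write(void const* data, std::size_t size)
  {
    write(data, size,
          [](char* dst, char const* src, std::size_t n) { std::memcpy(dst, src, n); });
  }

  void device_write(void const* data, std::size_t size)
  {
    // Assumption: a real sink would copy from device memory here.
    // memcpy keeps the sketch runnable without a GPU.
    write(data, size,
          [](char* dst, char const* src, std::size_t n) { std::memcpy(dst, src, n); });
  }

  // Hands any partially filled buffer to the consumer.
  void flush()
  {
    if (used_ > 0) { handle_buffer_and_reallocate(); }
  }

  std::size_t bytes_written() const { return total_; }

 private:
  // The common function: chunk the input into the staging buffer and hand
  // the buffer off whenever it fills up.
  template <typename Copy>
  void write(void const* data, std::size_t size, Copy copy)
  {
    auto const* src = static_cast<char const*>(data);
    while (size > 0) {
      if (used_ == capacity_) { handle_buffer_and_reallocate(); }
      std::size_t const n = std::min(size, capacity_ - used_);
      copy(buffer_.data() + used_, src, n);
      used_ += n;
      total_ += n;
      src += n;
      size -= n;
    }
  }

  // Passes the current buffer to the consumer, then starts a fresh one.
  void handle_buffer_and_reallocate()
  {
    consumer_(buffer_, used_);
    buffer_ = std::vector<char>(capacity_);
    used_   = 0;
  }

  std::size_t capacity_;
  std::vector<char> buffer_;
  buffer_consumer consumer_;
  std::size_t used_{0};
  std::size_t total_{0};
};
```

With the chunking loop in one place, a change to the buffering policy touches a single function instead of two near-identical copies.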

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Note: This is purely a JNI change. This is a follow-up to #12425, which has yet to be merged. The changes in that PR are included in this one. Once #12425 is merged, the changes here will turn out to be minuscule.

This change adds JNI bindings to write tables out as CSV, to either the filesystem or memory.

The Java Table class now has additional methods:

1. Table.writeCSVToFile(): Writes the current table out to the specified file on the filesystem.
2. Table.writeCSVToBuffer(): Writes the current table out to a HostBufferConsumer.
These calls are analogous to cudf::io::write_csv().

Current limitations:

1. The cudf::io::csv_writer_options interface binds the CSV options tightly to the Table being written. This makes it a little clumsy to write multiple Tables to the same HostBufferConsumer, because each could be written with different, contradictory options.
2. cudf::io::write_csv(file_name) overwrites the specified file, if it exists. There currently isn't a way to keep a file open, and write multiple tables to it; each write call overwrites the previous file.
1. Added setter to change Table instance in csv_writer_options.
2. Plumbing for new chunked writer.
3. Tests.
1. Formatting.
2. Better names for JNI CSV functions.
Moved the logic shared by host_write() and device_write() into a common place.

Signed-off-by: MithunR <[email protected]>
@mythrocks mythrocks requested review from a team as code owners December 31, 2022 00:28
@mythrocks mythrocks self-assigned this Dec 31, 2022
@mythrocks mythrocks requested review from bdice and divyegala December 31, 2022 00:28
@github-actions github-actions bot added Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Dec 31, 2022
@mythrocks mythrocks marked this pull request as draft December 31, 2022 00:31
@mythrocks mythrocks added tech debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed libcudf Affects libcudf (C++/CUDA) code. labels Dec 31, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Dec 31, 2022
@mythrocks mythrocks mentioned this pull request Dec 31, 2022
3 tasks
@codecov

codecov bot commented Dec 31, 2022

Codecov Report

Base: 86.58% // Head: 85.71% // Decreases project coverage by -0.86% ⚠️

Coverage data is based on head (bbfebdf) compared to base (b6dccb3).
Patch has no changes to coverable lines.

❗ Current head bbfebdf differs from pull request most recent head e1b6703. Consider uploading reports for the commit e1b6703 to get more accurate results

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12458      +/-   ##
================================================
- Coverage         86.58%   85.71%   -0.87%     
================================================
  Files               155      155              
  Lines             24368    24798     +430     
================================================
+ Hits              21098    21255     +157     
- Misses             3270     3543     +273     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) ⬇️
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) ⬇️
python/cudf/cudf/core/buffer/spillable_buffer.py 90.04% <0.00%> (-2.81%) ⬇️
python/cudf/cudf/utils/dtypes.py 77.77% <0.00%> (-1.69%) ⬇️
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) ⬇️
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) ⬇️
python/dask_cudf/dask_cudf/io/parquet.py 91.81% <0.00%> (-0.59%) ⬇️
python/cudf/cudf/core/multiindex.py 91.66% <0.00%> (-0.51%) ⬇️
python/cudf/cudf/core/algorithms.py 90.00% <0.00%> (-0.48%) ⬇️
... and 36 more

@mythrocks
Contributor Author

Closing this. I'm not sure we need to chase this down. Mostly nitpicks.

@mythrocks mythrocks closed this May 9, 2023
Labels
improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Development

Successfully merging this pull request may close these issues.

Refactor JNI jni_writer_data_sink