JNI bindings to write CSV #12425
Merged
27 commits, all by mythrocks:
21c3deb JNI bindings to write CSV
7fa0204 Support for chunked CSV writes in JNI:
faa64c4 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
2fa91e7 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
e446ae3 Added tests header inclusion.
54a5a87 Formatting.
c8f74de Support to specify TRUE/FALSE strings.
ebbfcb8 Added tests for combinations of True/False reps, header inclusion, etc.
cce5574 Removed JNI's non-chunked CSV writes to memory.
7089163 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
15693f1 Added newline at the end of the file, per CUDF guideline.
15e84c5 Removed unnecessary whitespace at top of file.
0da15a4 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
e9107c9 Re-added whitespace at end of file.
52f62e2 Fixed header order. Removed trailing newlines.
af7eed3 Postpone setting _first_write till after write.
5728549 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
0d82984 Trailing newlines.
fa24027 Review changes:
f5e30c5 More formatting .
d3642a4 Updated documentation for _inter_column_delimiter.
c83e0d9 Updated copyright date.
6dee89a Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
e4fa895 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
bfd2cd3 Review fixes:
b57e8d9 Merge remote-tracking branch 'origin/branch-23.02' into hive-text-writer
8d9b374 Merge branch 'branch-23.02' into hive-text-writer
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2020-2022, NVIDIA CORPORATION.
+ * Copyright (c) 2020-2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -1332,7 +1332,7 @@ class csv_writer_options {
   size_type _rows_per_chunk = std::numeric_limits<size_type>::max();
   // character to use for separating lines (default "\n")
   std::string _line_terminator = "\n";
-  // character to use for separating lines (default "\n")
+  // character to use for separating column values (default ",")
   char _inter_column_delimiter = ',';
   // string to use for values != 0 in INT8 types (default 'true')
   std::string _true_value = std::string{"true"};
(Review threads on the `_inter_column_delimiter` comment: ttnghia marked this conversation as resolved; mythrocks marked this conversation as resolved.)
@@ -1422,9 +1422,9 @@ class csv_writer_options {
   [[nodiscard]] std::string get_line_terminator() const { return _line_terminator; }
 
   /**
-   * @brief Returns character used for separating lines.
+   * @brief Returns character used for separating column values.
    *
-   * @return Character used for separating lines
+   * @return Character used for separating column values.
    */
   [[nodiscard]] char get_inter_column_delimiter() const { return _inter_column_delimiter; }
 
@@ -1479,9 +1479,9 @@ class csv_writer_options {
   void set_line_terminator(std::string term) { _line_terminator = term; }
 
   /**
-   * @brief Sets character used for separating lines.
+   * @brief Sets character used for separating column values.
    *
-   * @param delim Character to indicate delimiting
+   * @param delim Character to delimit column values
    */
   void set_inter_column_delimiter(char delim) { _inter_column_delimiter = delim; }
 
@@ -1498,6 +1498,13 @@ class csv_writer_options {
    * @param val String to represent values == 0 in INT8 types
    */
   void set_false_value(std::string val) { _false_value = val; }
+
+  /**
+   * @brief (Re)sets the table being written.
+   *
+   * @param table Table to be written
+   */
+  void set_table(table_view const& table) { _table = table; }
 };
 
 /**
(Review comment on `set_table`: Looks good. As mentioned offline, we might want to look into separating the sink, input table and writer options to facilitate easier reuse of options. Not in scope for this PR.)
@@ -1586,9 +1593,9 @@ class csv_writer_options_builder {
   }
 
   /**
-   * @brief Sets character used for separating lines.
+   * @brief Sets character used for separating column values.
    *
-   * @param delim Character to indicate delimiting
+   * @param delim Character to delimit column values
    * @return this for chaining
    */
   csv_writer_options_builder& inter_column_delimiter(char delim)
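The `set_table` setter lets a single options object be reused while the table changes between writes, which is what a chunked writer needs. The commit list also mentions postponing the `_first_write` update until after the write. The following is a minimal Java sketch of that pattern under stated assumptions: `ChunkedCsvSketch`, `writeChunk` and `contents` are hypothetical names, rows are plain string lists, output goes to an in-memory `StringBuilder`, and no quoting or escaping is done. It is not the cudf JNI implementation.

```java
import java.util.List;

// Hypothetical sketch of a chunked CSV writer that reuses one set of options
// across chunks. Not the cudf API; quoting/escaping of values is omitted.
public class ChunkedCsvSketch {
  private final List<String> columnNames;
  private final boolean includeHeader;
  private final char fieldDelimiter;
  private final String rowDelimiter;
  private final StringBuilder sink = new StringBuilder();
  private boolean firstWrite = true;

  public ChunkedCsvSketch(List<String> columnNames, boolean includeHeader,
                          char fieldDelimiter, String rowDelimiter) {
    this.columnNames = List.copyOf(columnNames);
    this.includeHeader = includeHeader;
    this.fieldDelimiter = fieldDelimiter;
    this.rowDelimiter = rowDelimiter;
  }

  // Writes one chunk of rows; the header is emitted only before the first chunk.
  public void writeChunk(List<List<String>> rows) {
    StringBuilder chunk = new StringBuilder();
    if (firstWrite && includeHeader) {
      appendRow(chunk, columnNames);
    }
    for (List<String> row : rows) {
      appendRow(chunk, row);
    }
    sink.append(chunk);
    // The flag flips only after the chunk reached the sink: if building the
    // chunk throws, the header is still pending for the next attempt.
    firstWrite = false;
  }

  private void appendRow(StringBuilder out, List<String> values) {
    out.append(String.join(String.valueOf(fieldDelimiter), values)).append(rowDelimiter);
  }

  public String contents() {
    return sink.toString();
  }

  public static void main(String[] args) {
    ChunkedCsvSketch writer = new ChunkedCsvSketch(List.of("a", "b"), true, ',', "\n");
    writer.writeChunk(List.of(List.of("1", "x")));
    writer.writeChunk(List.of(List.of("2", "y"), List.of("3", "z")));
    System.out.print(writer.contents());
  }
}
```

Keeping the first-write flag in the writer rather than in the per-chunk data is what allows the same settings to serve every chunk while the header is written exactly once.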
java/src/main/java/ai/rapids/cudf/CSVWriterOptions.java (new file: 134 additions, 0 deletions)
@@ -0,0 +1,134 @@
+/*
+ *
+ * Copyright (c) 2023, NVIDIA CORPORATION.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ */
+
+package ai.rapids.cudf;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+public class CSVWriterOptions {
+
+  private String[] columnNames;
+  private Boolean includeHeader = false;
+  private String rowDelimiter = "\n";
+  private byte fieldDelimiter = ',';
+  private String nullValue = "";
+  private String falseValue = "false";
+  private String trueValue = "true";
+
+  private CSVWriterOptions(Builder builder) {
+    this.columnNames = builder.columnNames.toArray(new String[builder.columnNames.size()]);
+    this.nullValue = builder.nullValue;
+    this.includeHeader = builder.includeHeader;
+    this.fieldDelimiter = builder.fieldDelimiter;
+    this.rowDelimiter = builder.rowDelimiter;
+    this.falseValue = builder.falseValue;
+    this.trueValue = builder.trueValue;
+  }
+
+  public String[] getColumnNames() {
+    return columnNames;
+  }
+
+  public Boolean getIncludeHeader() {
+    return includeHeader;
+  }
+
+  public String getRowDelimiter() {
+    return rowDelimiter;
+  }
+
+  public byte getFieldDelimiter() {
+    return fieldDelimiter;
+  }
+
+  public String getNullValue() {
+    return nullValue;
+  }
+
+  public String getTrueValue() {
+    return trueValue;
+  }
+
+  public String getFalseValue() {
+    return falseValue;
+  }
+
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  public static class Builder {
+
+    private List<String> columnNames = Collections.emptyList();
+    private Boolean includeHeader = false;
+    private String rowDelimiter = "\n";
+    private byte fieldDelimiter = ',';
+    private String nullValue = "";
+    private String falseValue = "false";
+    private String trueValue = "true";
+
+    public CSVWriterOptions build() {
+      return new CSVWriterOptions(this);
+    }
+
+    public Builder withColumnNames(List<String> columnNames) {
+      this.columnNames = columnNames;
+      return this;
+    }
+
+    public Builder withColumnNames(String... columnNames) {
+      List<String> columnNamesList = new ArrayList<>();
+      for (String columnName : columnNames) {
+        columnNamesList.add(columnName);
+      }
+      return withColumnNames(columnNamesList);
+    }
+
+    public Builder withIncludeHeader(Boolean includeHeader) {
+      this.includeHeader = includeHeader;
+      return this;
+    }
+
+    public Builder withRowDelimiter(String rowDelimiter) {
+      this.rowDelimiter = rowDelimiter;
+      return this;
+    }
+
+    public Builder withFieldDelimiter(byte fieldDelimiter) {
+      this.fieldDelimiter = fieldDelimiter;
+      return this;
+    }
+
+    public Builder withNullValue(String nullValue) {
+      this.nullValue = nullValue;
+      return this;
+    }
+
+    public Builder withTrueValue(String trueValue) {
+      this.trueValue = trueValue;
+      return this;
+    }
+
+    public Builder withFalseValue(String falseValue) {
+      this.falseValue = falseValue;
+      return this;
+    }
+  }
+}
(Review thread on `public class CSVWriterOptions {`: mythrocks marked this conversation as resolved.)
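`CSVWriterOptions` only carries settings across JNI; the CSV text itself is produced by libcudf on the native side. To show what each field controls, here is a self-contained Java sketch that renders one row using the same defaults as the builder (`fieldDelimiter` `','`, `rowDelimiter` `"\n"`, `nullValue` `""`, `trueValue` `"true"`, `falseValue` `"false"`). `CsvRowSketch` and `formatRow` are hypothetical, Java `Boolean` values stand in for the values the true/false strings apply to, and quoting is ignored, so this is an illustration of the settings rather than the native writer's exact output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of what the CSVWriterOptions fields control.
// Mirrors the builder's defaults but is not part of cudf; no quoting/escaping.
public class CsvRowSketch {
  static String formatRow(List<Object> values, byte fieldDelimiter, String rowDelimiter,
                          String nullValue, String trueValue, String falseValue) {
    // fieldDelimiter is a byte (as in CSVWriterOptions); widen without sign extension.
    String delim = String.valueOf((char) (fieldDelimiter & 0xFF));
    List<String> rendered = new ArrayList<>();
    for (Object v : values) {
      if (v == null) {
        rendered.add(nullValue);                         // null representation
      } else if (v instanceof Boolean) {
        rendered.add((Boolean) v ? trueValue : falseValue);  // true/false strings
      } else {
        rendered.add(v.toString());
      }
    }
    return String.join(delim, rendered) + rowDelimiter;
  }

  public static void main(String[] args) {
    List<Object> row = Arrays.asList(1, null, true, false);
    // Builder defaults: ',', "\n", "", "true", "false".
    System.out.print(formatRow(row, (byte) ',', "\n", "", "true", "false"));
    // Alternative settings; a char literal needs an explicit (byte) cast.
    System.out.print(formatRow(row, (byte) '|', "\n", "\\N", "TRUE", "FALSE"));
  }
}
```

Because `fieldDelimiter` is a `byte` in the Java options, passing a `char` literal such as `'|'` to `withFieldDelimiter` requires an explicit `(byte)` cast; the `& 0xFF` in the sketch avoids sign extension when the byte is widened back to a `char`.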