Implement Cookie serialization format #3463
The exception handling in cudf JNI has changed in preparation for capturing native stack traces when exceptions are thrown. That is a breaking change, and this PR fixes the resulting breakage. No new feature or implementation is added.

Depends on:
* rapidsai/cudf#18983

This is part of [[Epic] Capture native stacktrace when throwing exception using cpptrace NVIDIA#3398](NVIDIA#3398).

Signed-off-by: Nghia Truong <nghiat@nvidia.com>
# Conflicts: # thirdparty/cudf
# Conflicts: # thirdparty/cudf-pins/versions.json
Hi @ttnghia, quick question: why do we need a new format here?
We just want to dump data from memory to disk so that it can be read back later. The data format only needs to be something we can write and read quickly in C++ with minimal data conversion. I know there are external libraries for this, but we don't need much advanced functionality, so a simple internal data format is sufficient.
NOTE: release/25.12 has been created from main. Please retarget your PR to release/25.12 if it should be included in the release.
NOTE: release/26.02 has been created from main. Please retarget your PR to release/26.02 if it should be included in the release.
NOTE: release/26.04 has been created from main. Please retarget your PR to release/26.04 if it should be included in the release.
This implements the `Cookie` serialization format, a fast data serialization format designed for efficient serialization/deserialization and efficient disk IO. The input is an array of host buffers given as byte arrays. The serializer compresses these byte arrays (using a CPU thread pool) and assembles the compressed data, along with other metadata, into one output byte array for efficient disk IO. Deserialization performs the same steps in reverse.

Contributes to NVIDIA/spark-rapids#12509.
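To make the flow above concrete, here is a minimal Java sketch of the same idea: compress each input buffer on a thread pool, then pack a small header plus the compressed payloads into a single byte array. This is not the PR's implementation, which is written in C++. The class name `CookieSketch`, the magic number, the header layout (buffer count followed by per-buffer raw and compressed lengths), and the use of DEFLATE from `java.util.zip` as the codec are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; the real Cookie layout is not described in this PR text.
final class CookieSketch {
  private static final int MAGIC = 0x434F4F4B; // ASCII "COOK", an assumed marker

  // Compress one buffer with DEFLATE (a stand-in codec for this sketch).
  private static byte[] compress(byte[] input) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(input);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    while (!deflater.finished()) {
      int n = deflater.deflate(chunk);
      out.write(chunk, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  // Inflate compLen bytes starting at offset into a buffer of exactly rawLen bytes.
  private static byte[] decompress(byte[] data, int offset, int compLen, int rawLen) {
    Inflater inflater = new Inflater();
    inflater.setInput(data, offset, compLen);
    byte[] out = new byte[rawLen];
    try {
      int pos = 0;
      while (pos < rawLen) {
        int n = inflater.inflate(out, pos, rawLen - pos);
        if (n == 0) break; // no progress: input is exhausted or truncated
        pos += n;
      }
      if (pos != rawLen) throw new IllegalStateException("truncated buffer");
      return out;
    } catch (DataFormatException e) {
      throw new IllegalStateException("corrupt buffer", e);
    } finally {
      inflater.end();
    }
  }

  // Layout: magic, count, then (rawLen, compLen) per buffer, then all payloads.
  static byte[] serialize(byte[][] buffers, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<byte[]>> futures = new ArrayList<>();
      for (byte[] buf : buffers) {
        Callable<byte[]> task = () -> compress(buf);
        futures.add(pool.submit(task)); // buffers are compressed in parallel
      }
      byte[][] compressed = new byte[buffers.length][];
      int total = 8 + 8 * buffers.length; // header size in bytes
      for (int i = 0; i < buffers.length; i++) {
        compressed[i] = futures.get(i).get();
        total += compressed[i].length;
      }
      ByteBuffer out = ByteBuffer.allocate(total);
      out.putInt(MAGIC).putInt(buffers.length);
      for (int i = 0; i < buffers.length; i++) {
        out.putInt(buffers[i].length).putInt(compressed[i].length);
      }
      for (byte[] c : compressed) out.put(c);
      return out.array();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("compression failed", e);
    } finally {
      pool.shutdown();
    }
  }

  // Reverse of serialize: read the header, then inflate each payload in order.
  static byte[][] deserialize(byte[] blob) {
    ByteBuffer in = ByteBuffer.wrap(blob);
    if (in.getInt() != MAGIC) throw new IllegalArgumentException("bad magic");
    int count = in.getInt();
    int[] rawLen = new int[count];
    int[] compLen = new int[count];
    for (int i = 0; i < count; i++) {
      rawLen[i] = in.getInt();
      compLen[i] = in.getInt();
    }
    int offset = in.position();
    byte[][] result = new byte[count][];
    for (int i = 0; i < count; i++) {
      result[i] = decompress(blob, offset, compLen[i], rawLen[i]);
      offset += compLen[i];
    }
    return result;
  }
}
```

Putting all lengths in a header ahead of the payloads means a reader can locate every compressed block from one small read, and the whole result can be written to disk in a single call. A production format would likely also carry a version field and checksums, which this sketch omits.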