--float-precision Not Being Considered in qsv sqlp #2644
-
Describe the bug To Reproduce
Expected behavior Screenshots/Backtrace/Sample Data Desktop (please complete the following information):
Additional Notes |
-
Looking at the Polars Parquet writer (https://docs.pola.rs/api/rust/dev/polars/prelude/struct.ParquetWriter.html), an option for setting float precision is not currently available. I will explore setting the precision on the DataFrame and see if it's preserved downstream when saving to Parquet.
-
Thanks for the clarification. In the meantime, could you confirm if there are any workarounds or alternative approaches to explicitly control the precision for Parquet output? Like, would defining a schema with specific precision for floating-point columns help?
-
Your idea to define a schema beforehand may very well work, @brian-mendicino. Give it a try! Just be sure you're doing it on the latest release, v3.3.0.
-
$ cat table.csv
constant,value
e,2.71828182845904523536
pi,3.14159265358979323844
phi,1.61803398874989484820
$ qsv table table.csv
constant value
e 2.71828182845904523536
pi 3.14159265358979323844
phi 1.61803398874989484820
$ qsv sqlp table.csv 'select constant, ROUND(value, 10) from table' --format parquet --output table.parquet --quiet && qsv sqlp SKIP_INPUT "select * from read_parquet('table.parquet')" --quiet | qsv table
constant value
e 2.7182818285
pi 3.1415926536
phi 1.6180339887
$ qsv --version
qsv 3.3.0-mimalloc-apply;fetch;foreach;geocode;Luau 0.663;prompt;python-3.12.9 (main, Feb 4 2025, 00:00:00) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)];to;polars-0.46.0:py-1.26.0;self_update-4-4;6.12 GiB-17.02 GiB-3.13 GiB-7.65 GiB (x86_64-unknown-linux-gnu compiled with Rust 1.85) compiled
-
I agree. Regarding the schema, I attempted to use a basic schema like the following:
However, this did not resolve the issue, because the schema does not allow specifying precision or scale for the Float64 type. (I'm on version 3.1.1; Debian's glibc is still at 2.36.) Another option may be to coerce decimal fields into strings.
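To make the Float64 limitation concrete, here is a small sketch in plain Python (not qsv or Polars; the helper names are made up for illustration). It parses the sample values from the table above as 64-bit floats and, for comparison, keeps them as decimal text. It only illustrates why a Float64 column cannot carry all 20 decimal places, and why coercing to strings would preserve them.

```python
from decimal import Decimal

# Sample values copied from table.csv earlier in this thread.
SAMPLES = {
    "e": "2.71828182845904523536",
    "pi": "3.14159265358979323844",
    "phi": "1.61803398874989484820",
}


def roundtrip_float(text: str) -> str:
    """Parse as a binary 64-bit float and return its shortest repr."""
    return repr(float(text))


def roundtrip_text(text: str) -> str:
    """Keep the value as decimal text; every digit survives."""
    return str(Decimal(text))


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(name, roundtrip_float(text), roundtrip_text(text))
```

The float round trip keeps roughly 16 to 17 significant digits, while the text round trip returns the input unchanged.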
-
@jqnatividad I was able to test with the following setup and observed different results than expected.
Compiled from source
Command
Input
Actual Output
-
What would prevent you from creating a suitable test file and finding out?
@brian-mendicino
I enabled Polars support for the decimal data type, so you can now override the generated schema file to explicitly set precision and scale.
https://github.com/dathere/qsv/pull/2646/files
For example, instead of Float64, set it to a Decimal with precision 16 and scale 10:
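The schema snippet itself is not reproduced here. As a rough illustration of what precision 16 and scale 10 mean, the sketch below uses Python's standard `decimal` module rather than qsv's schema syntax. The `to_fixed` helper is hypothetical, and the half-even rounding mode is an assumption; Polars may round differently.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def to_fixed(text: str, precision: int = 16, scale: int = 10) -> Decimal:
    """Round to `scale` decimal places and enforce `precision` total digits.

    Hypothetical helper for illustration only; the rounding mode is an
    assumption and is not taken from qsv or Polars.
    """
    quantum = Decimal(1).scaleb(-scale)  # 1E-10 when scale is 10
    value = Decimal(text).quantize(quantum, rounding=ROUND_HALF_EVEN)
    if len(value.as_tuple().digits) > precision:
        raise ValueError(f"{text} needs more than {precision} digits")
    return value


if __name__ == "__main__":
    for raw in ("2.71828182845904523536",
                "3.14159265358979323844",
                "1.61803398874989484820"):
        print(to_fixed(raw))
```

With scale 10, the three sample constants come out with ten decimal places, matching the `ROUND(value, 10)` output shown earlier in the thread. A value with more than six integer digits would exceed precision 16 at this scale and is rejected by the sketch.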