# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## 9.0.0 - 2026-03-19
- [breaking] Parquet Decimal type is now inserted as Relational Type DECIMAL rather than VARCHAR.
- [breaking] Parquet Time type is now inserted as Relational Type TIME rather than VARCHAR.
- [breaking] Parquet Timestamp type is now inserted as Relational Type TIMESTAMP rather than VARCHAR.

These changes enable inserting with ODBC drivers which are less lenient with implicit conversions.
## 8.1.7 - 2026-03-18
- Remove superfluous backtick from example in README
- Fix a panic which occurred on Windows when querying text columns.
## 8.1.6 - 2026-03-03
- Use the SIMD implementation of `encoding-rs` for the UTF-8 to UTF-16 path
- Target x86-64-v3 (AVX2, BMI, FMA; Haswell 2013 and later) for all x86-64 builds
## 8.1.5 - 2025-11-26
- Update to odbc-api 20.1
- (deps) bump parquet from 57.0.0 to 57.1.0
- (deps) bump odbc-api from 19.1.0 to 20.0.0
- (deps) bump actions/checkout from 5 to 6
- (deps) bump bytesize from 2.2.0 to 2.3.0
- (deps) bump clap from 4.5.52 to 4.5.53
- (deps) bump clap_complete from 4.5.60 to 4.5.61
- (deps) bump clap from 4.5.51 to 4.5.52
- (deps) bump bytes from 1.10.1 to 1.11.0
- (deps) bump bytesize from 2.1.0 to 2.2.0
## 8.1.4 - 2025-11-02
- Replace `Command::cargo_bin` with `cargo_bin_cmd!`
- Replace `lazy_static` with `Once`
- (deps) bump assert_cmd from 2.0.17 to 2.1.1
- (deps) bump clap from 4.5.50 to 4.5.51
- (deps) bump clap_complete from 4.5.59 to 4.5.60
- Update to parquet 57
## 8.1.3 - 2025-09-23
- change test expectation to reflect formatting changes in parquet-read cli tool
- (deps) bump clap from 4.5.47 to 4.5.48
- (deps) bump odbc-api from 19.0.0 to 19.0.1
- (deps) bump anyhow from 1.0.99 to 1.0.100
- (deps) bump clap_complete from 4.5.57 to 4.5.58
- (deps) bump bytesize from 2.0.1 to 2.1.0
- update to newest version of chrono
- (deps) bump tempfile from 3.21.0 to 3.22.0
- (deps) bump odbc-api from 17.0.0 to 19.0.0
- (deps) bump log from 0.4.27 to 0.4.28
- (deps) bump clap from 4.5.46 to 4.5.47
- (deps) bump clap from 4.5.45 to 4.5.46
- (deps) bump parquet from 56.0.0 to 56.1.0
- (deps) bump tempfile from 3.20.0 to 3.21.0
## 8.1.2 - 2025-08-07
- Update to parquet 56
- Remove dev container
- (deps) bump clap from 4.5.42 to 4.5.43
- (deps) bump odbc-api from 14.2.0 to 14.2.1
- (deps) bump clap from 4.5.41 to 4.5.42
- (deps) bump odbc-api from 14.1.0 to 14.2.0
## 8.1.1 - 2025-07-20
- Statements are only prepared once for execute or insert subcommand.
- Update some comments in code
- (deps) bump clap from 4.5.40 to 4.5.41
- (deps) bump clap_complete from 4.5.54 to 4.5.55
- (deps) bump io-arg from 0.2.1 to 0.2.2
- (deps) bump odbc-api from 14.0.0 to 14.0.2
## 8.1.0 - 2025-06-26
- The `exec` subcommand has been added. It is a generalization of `insert`, as it allows passing the SQL statement on the command line. Named input placeholders are used to associate the statement parameters with columns in the input parquet file.
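The association between statement parameters and parquet columns can be sketched as below. The `?name` placeholder syntax and the helper function are assumptions made for this illustration, not the tool's actual grammar:

```python
import re

def placeholder_columns(sql: str) -> list[str]:
    """Return the parquet column referenced by each named placeholder.

    The '?name' marker is a hypothetical stand-in for whatever syntax
    odbc2parquet actually uses; the point is only that each placeholder
    position is bound to the input column of the same name.
    """
    return re.findall(r"\?(\w+)", sql)

sql = "INSERT INTO sales (day, amount) VALUES (?day, ?amount)"
print(placeholder_columns(sql))  # → ['day', 'amount']
```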
- (deps) bump odbc-api from 13.0.1 to 13.1.0
## 8.0.1 - 2025-06-24
- Maintain parts of the file stem separated by dots (`.`) in the presence of suffixes. There had been an issue writing into an output path which contained a dot (`.`) in the file stem, e.g. `my_db.my_table.parquet`. In case file splitting was enabled, the last part of the file stem had been replaced by the suffix with the file number, e.g. `my_db_01.parquet`. After this fix it is now `my_db.my_table_01.parquet`.
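The fixed naming can be sketched with `pathlib`; `nth_output_path` is a hypothetical helper for illustration, not the tool's code:

```python
from pathlib import Path

def nth_output_path(output: str, n: int) -> Path:
    """Insert a zero-padded file number before the final extension.

    Path.stem strips only the last suffix, so dots inside the stem
    (e.g. 'my_db.my_table') are preserved, matching the fixed behaviour.
    """
    p = Path(output)
    return p.with_name(f"{p.stem}_{n:02d}{p.suffix}")

print(nth_output_path("my_db.my_table.parquet", 1))  # → my_db.my_table_01.parquet
```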
- formatting
- Add unfinished feature flag to exec subcommand in order to facilitate bugfix release
- Introduce array placeholder index
- Test switched order of exec input arguments
- Use TmpParquetFile in all insert tests.
- More test use TmpParquetFile
- Introduce TmpParquetFile test helper
- deactivate module execute if unfinished flag is not set
- Exec command
- (exec) unmasking of arguments
- Test for exec passes trivially
- Failing integration test for exec subcommand
- introduce unfinished feature flag and start work on Exec command behind it
- (deps) bump clap from 4.5.39 to 4.5.40
- (deps) bump clap_complete from 4.5.52 to 4.5.54
- (deps) bump clap_complete from 4.5.51 to 4.5.52
- (deps) bump clap_complete from 4.5.50 to 4.5.51
- (deps) bump clap from 4.5.38 to 4.5.39
- (deps) bump odbc-api from 13.0.0 to 13.0.1
- (deps) bump odbc-api from 12.2.0 to 13.0.0
- update dependencies
- (deps) bump clap_complete from 4.5.48 to 4.5.50
- Use column attributes instead of description
- (deps) bump clap_complete from 4.5.47 to 4.5.48
- (deps) bump chrono from 0.4.40 to 0.4.41
- (deps) bump odbc-api from 12.0.1 to 12.0.2
- Add hint to install PostgreSQL ODBC drivers
- Update to PostgreSQL 17
- (deps) bump clap from 4.5.36 to 4.5.37
## 8.0.0 - 2025-04-17
- [breaking] Updated to the newest parquet crate. As a consequence, valid compression level ranges using `--column-compression-level-default` have changed.
- (deps) bump assert_cmd from 2.0.16 to 2.0.17
- (deps) bump clap from 4.5.35 to 4.5.36
- (deps) bump anyhow from 1.0.97 to 1.0.98
## 7.0.4 - 2025-04-08
- update odbc-api
- (deps) bump clap from 4.5.34 to 4.5.35
- (deps) bump parquet from 54.3.0 to 54.3.1
- (deps) bump clap from 4.5.32 to 4.5.34
- (deps) bump log from 0.4.26 to 0.4.27
- Document how to add shell completions to powershell
- (deps) bump parquet from 54.2.0 to 54.3.0
- (deps) bump tempfile from 3.19.0 to 3.19.1
- (deps) bump clap_complete from 4.5.46 to 4.5.47
- (deps) bump tempfile from 3.18.0 to 3.19.0
- (deps) bump clap from 4.5.31 to 4.5.32
- (deps) bump tempfile from 3.17.1 to 3.18.0
- (deps) bump bytes from 1.10.0 to 1.10.1
- (deps) bump anyhow from 1.0.96 to 1.0.97
- (deps) bump bytesize from 2.0.0 to 2.0.1
- (deps) bump chrono from 0.4.39 to 0.4.40
- (deps) bump clap_complete from 4.5.45 to 4.5.46
- (deps) bump clap from 4.5.30 to 4.5.31
- (deps) bump bytesize from 1.3.2 to 2.0.0
- (deps) bump log from 0.4.25 to 0.4.26
- (deps) bump anyhow from 1.0.95 to 1.0.96
## 7.0.3 - 2025-02-19
- Decimals now have the correct value, even if the data source omits trailing zeroes in their string representation
- Replace asterisk (*) with dash (-) for lists in changelog
- (deps) bump clap from 4.5.29 to 4.5.30
- (deps) bump clap_complete from 4.5.44 to 4.5.45
- (deps) bump tempfile from 3.16.0 to 3.17.1
- (deps) bump parquet from 54.1.0 to 54.2.0
## 7.0.2 - 2025-02-16
- fix linux tests
- Update to odbc-api 11
- (deps) bump clap from 4.5.28 to 4.5.29
- (deps) bump bytesize from 1.3.0 to 1.3.2
- (deps) bump clap from 4.5.27 to 4.5.28
- (deps) bump bytes from 1.9.0 to 1.10.0
- (deps) bump parquet from 54.0.0 to 54.1.0
- (deps) bump clap_complete from 4.5.43 to 4.5.44
- (deps) bump tempfile from 3.15.0 to 3.16.0
- (deps) bump clap_complete from 4.5.42 to 4.5.43
## 7.0.1 - 2025-01-23
- If `--output` is omitted for the `completions` subcommand, emit shell completions to stdout
- Restore help text for the `--avoid-decimal` option in the query subcommand
## 7.0.0 - 2025-01-06

In the past, users struggled to find the `--column-length-limit` option. Therefore `odbc2parquet` now sets it to `4096` by default. To prevent silent data loss due to truncation as a consequence of this change, reporting truncation errors is now always active. In addition, the error messages for truncation errors have been improved: they mention the affected column and hint that increasing `--column-length-limit` might be a good idea.
- [breaking] `column-length-limit` now defaults to `4096`
- Report truncations for sequential fetches
- Mention column name in truncation error.
- Error message for truncation now hints at column-length-limit option.
- [breaking] The `--concurrent-fetching` flag has been removed, since concurrent fetching is now the default behavior. The `--sequential-fetching` flag has been introduced to opt into the old behaviour.
- Utilize upstream dependency `odbc-api 10.1`, which autodetects the Homebrew library path. This allows for easier builds on macOS ARM platforms.
- Fix: A panic when inserting from a parquet file where the last row group has fewer rows than the other row groups.
- Introduced flag `--concurrent-fetching`. Setting it uses separate system threads for writing to parquet and fetching from the database. This can be a significant speedup, but also increases memory consumption.
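The overlap between fetching and writing amounts to a bounded producer/consumer pair; below is a minimal Python sketch of the idea, not odbc2parquet's actual threading code:

```python
import queue
import threading

def fetch(batches, q):
    # Producer: stands in for fetching row batches from the database.
    for batch in batches:
        q.put(batch)
    q.put(None)  # sentinel: no more batches

def write(q, out):
    # Consumer: stands in for writing row groups to parquet.
    while (batch := q.get()) is not None:
        out.append(batch)

q = queue.Queue(maxsize=2)  # bounded queue caps the extra memory use
out = []
writer = threading.Thread(target=write, args=(q, out))
writer.start()
fetch([[1, 2], [3, 4]], q)  # runs concurrently with the writer thread
writer.join()
print(out)  # → [[1, 2], [3, 4]]
```

The bounded queue is why memory consumption rises only by a fixed number of in-flight batches rather than without limit.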
Failed release
- DEBUG log messages now show column names as text rather than as UTF-8 bytes
- The not-enough-memory message now mentions the option to limit length using `--column-length-limit`
- Utilize ODBC API version 3.5 instead of 3.8 to increase compatibility with older drivers.
- Binary release for Ubuntu ARM architectures. Thanks @sindilevich
- Fix release 6.0.1
- Binary release names for artifacts now end in architecture rather native bit size
- File extensions are now retained when splitting files. E.g. if `--output` is `my_results.parquet` and split into two files, they will be named `my_results_01.parquet` and `my_results_02.parquet`. Previously the ending `.par` had always been attached.
- Fix: 5.1.0 introduced a regression which caused output file enumeration to happen even if file splitting was not activated, if `--no-empty-file` had been set.
- Fix: 5.1.0 introduced a regression which caused output file enumeration to start with a suffix of `2` instead of `1` if, in addition to file splitting, the `--no-empty-file` flag had also been set.
- Additional log message at info level emitting the total number of rows written, the file size, and the path for each file.
- Removed flag `--driver_returns_memory_garbage_for_indicators`. It turns out the issue with IBM DB2 drivers which triggered this can better be solved using a version of their ODBC driver which ends in `o` and is compiled with a 64-bit size for `SQLLEN`.
- Release for macOS M1 (thanks to the free tier of fly.io)
- Updated dependencies, including a bug fix in decimal parsing. Negative decimals smaller than 1 would have been misjudged as positive.
- Updated dependencies, including an update to `parquet-rs 50`
- Decimal parsing is now more robust against different radix characters and missing trailing zeroes.
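A sketch of what radix- and zero-tolerant parsing into an unscaled integer looks like; this is illustrative only, not odbc2parquet's implementation:

```python
def parse_decimal(text: str, scale: int) -> int:
    """Parse a decimal string into an unscaled integer.

    Tolerates ',' as radix character and pads missing trailing
    zeroes; the sign is taken from the string as a whole, so
    negative values smaller than 1 keep their sign.
    """
    negative = text.startswith("-")
    digits = text.lstrip("+-").replace(",", ".")
    whole, _, frac = digits.partition(".")
    frac = frac.ljust(scale, "0")[:scale]  # pad missing trailing zeroes
    unscaled = int((whole or "0") + frac)
    return -unscaled if negative else unscaled

print(parse_decimal("-0.5", 2))  # → -50 (not 50)
print(parse_decimal("1,2", 3))   # → 1200
```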
- Introduced flag `--driver-returns-memory-garbage-for-indicators`. This is a reaction to witnessing IBM DB2 Linux drivers filling the indicator arrays with memory garbage. Activating this flag enables a workaround which uses terminating zeroes to determine string length. With this active, `odbc2parquet` can no longer distinguish between empty strings and NULL and maps everything to NULL. Currently the workaround is only active for UTF-8 encoded payloads.
- Default compression is now `zstd` with level `3`.
- Fix: If ODBC drivers report `-4` (`NO_TOTAL`) as display size, the size can now be controlled with `--column-length-limit`. The issue occurred for JSON columns with MySQL.
- Fix: Invalid UTF-16 encoding emitted from the data source will now cause an error instead of a panic.
- Additional log message emitting the number of total rows fetched so far.
- Fix: `--no-empty-file` now works correctly with options causing files to be split, like `--file-size-threshold` or `--row-groups-per-file`.
- Zero sized columns are now treated as an error. Before, `odbc2parquet` issued a warning and ignored them.
- Fix: `--file-size-threshold` had an issue with not resetting the current file size after starting a new file. This caused only the first file to have the desired size. All subsequent files would contain only one row group each.
- Fix: `--column-length-limit` not only caused large variadic columns to have a sensible upper bound, but also caused columns with a known smaller bound to allocate just as much memory, effectively wasting a lot of memory in some scenarios. In this version the limit is only applied if the column length actually exceeds the limit specified.
- Fix: Some typos in the `--help` text have been fixed.
- Fix: When fetching relational type `TINYINT`, the driver is queried for the signedness of the column. The result is now reflected in the logical type written into parquet. In the past, `TINYINT` had always been assumed to be signed, even if the ODBC driver described the column as unsigned.
- Fix: The `--help` text for the query subcommand wrongly listed `bit-packed` as supported.
- Explicitly check lower and upper bounds when writing timestamps with nanosecond precision into a parquet file. Timestamps have to be between `1677-09-21 00:12:44` and `2262-04-11 23:47:16.854775807`. Emit an error and abort if the bound check fails.
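These bounds follow from storing the timestamp as a signed 64-bit count of nanoseconds since the Unix epoch; a quick check (at `timedelta`'s microsecond resolution):

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# i64 range, expressed in microseconds (timedelta cannot hold nanoseconds).
upper = EPOCH + timedelta(microseconds=(2**63 - 1) // 1000)
lower = EPOCH + timedelta(microseconds=-(2**63) // 1000)

print(upper)  # → 2262-04-11 23:47:16.854775
print(lower)  # → 1677-09-21 00:12:43.145224
```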
- Write timestamps with precision greater than seven as nanoseconds
- Write Parquet Version 2.0
- Establish semantic versioning
- Update dependencies
- Update dependencies
- Improves error message in case creating an output file fails.
- Accidental release from branch.
- Adds option `--column-compression-level-default` to specify the compression level explicitly.
- Introduced new option for the `query` subcommand, `--column-length-limit`, to limit the memory allocated for an individual variadic column of the result set.
- Updated dependencies
- Time(p: 7..) is mapped to Timestamp Nanoseconds for Microsoft SQL Server
- Time(p: 4..=6) is mapped to Timestamp Microseconds for Microsoft SQL Server
- Time(p: 0..=3) is mapped to Timestamp Milliseconds for Microsoft SQL Server
- Introduced new flag for the `query` subcommand, `--no-empty-file`, which prevents creation of an output file in case the query comes back with `0` rows.
- Updated dependencies
- Time(p) is mapped to Timestamp Nanoseconds for Microsoft SQL Server
- Fix: Fixed an issue where setting `--column-compression-default` to `snappy` resulted in the column compression default actually being set to `zstd`.
- Introduce flag `--avoid-decimal` to produce output without logical type `DECIMAL`. This allows artifacts without decimal support to process the output of `odbc2parquet`.
- The level of verbosity had been one too high: `--quiet` now suppresses warning messages as intended. `-v` maps to Info, `-vv` to Debug.
- `DATETIMEOFFSET` on Microsoft SQL Server is now mapped to `TIMESTAMP` with instant semantics, i.e. it is mapped to UTC and understood to reference a specific point in time rather than a wall clock time.
- Allow specifying the ODBC connection string via the environment variable `ODBC_CONNECTION_STRING` instead of the `--connection_string` option.
- Pad suffixes (`_01`) with leading zeroes to make file names more friendly for lexical sorting when splitting fetch output. The number is padded to two digits by default.
- Updated dependencies
- Use narrow text on non-Windows platforms by default. Connection strings, queries, and error messages are assumed to be UTF-8 and not transcoded to and from UTF-16.
- Physical type of `DECIMAL` is now `INT64` instead of `FIXED_LEN_BYTE_ARRAY` if precision does not exceed 18.
- Physical type of `DECIMAL` is now `INT32` instead of `FIXED_LEN_BYTE_ARRAY` if precision does not exceed 9.
- Dropped support for `DECIMAL` and `NUMERIC` with precision higher than `38`. Please open an issue if required. Microsoft SQL Server supports these types up to this precision, so currently there is no easy way to test for `DECIMAL`s which cannot be represented as `i128`.
- Fetching decimal columns with scale `0` and `--driver-does-not-support-64bit-integers` now specifies the logical type as `DECIMAL`. The physical type remains a 64-bit integer.
- Updated dependencies
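The precision cutoffs of 9 and 18 follow from the ranges of the physical integer types: the largest unscaled value of a 9-digit decimal still fits an `INT32`, and an 18-digit one an `INT64`:

```python
def max_unscaled(precision: int) -> int:
    # Largest unscaled integer a decimal of the given precision can hold.
    return 10**precision - 1

assert max_unscaled(9) < 2**31    # fits INT32
assert max_unscaled(18) < 2**63   # fits INT64
assert max_unscaled(10) >= 2**31  # one more digit overflows INT32
assert max_unscaled(19) >= 2**63  # ... and INT64
print("precision thresholds check out")
```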
- Updated dependencies
- Pass `-` as the query string to read the statement text from standard input instead.
- Updated dependencies
- Release binary artifact for `x86_64-ubuntu`.
- Introduced flag `--no-color`, which allows suppressing colors in the log output.
- `query` now allows specifying `-` as a positional output argument in order to stream to standard out instead of writing to a file.
- `--batches-per-file` is now named `--row-groups-per-file`.
- New `query` option `--file-size-threshold`.
- Fixed a bug causing `--batch-size-memory` to be interpreted as many times the specified limit.
- Updated dependencies, including `parquet 15.0.0`.
- The `query` option `--batch-size-mib` is now `--batch-size-memory` and allows specifying inputs with SI units, e.g. `2GiB`.
- Updated dependencies. Improvements in upstream `odbc-api` may lead to faster insertion when using many batches.
- Updated dependencies, including `parquet 14.0.0`.
- Updated dependencies.
- Undo: Recover from failed memory allocations of binary and text buffers, because of unclear performance implications.
- Recover from failed memory allocations of binary and text buffers and terminate the tool gracefully.
- Update dependencies. This includes an upstream improvement in `odbc-api 0.36.1` which emits a better error if the `unixODBC` version does not support `ODBC 3.80`.
- Updated dependencies
- Updated dependencies
Peace for the citizens of Ukraine who fight for their freedom and stand up to oppression. Peace for the Russian soldier, who does not know why he is shooting at his brothers and sisters, may he be reunited with his family soon.
Peace to 🇺🇦, 🇷🇺 and the world. May sanity prevail.
- Updating dependencies.
- Added a message for Oracle users telling them about `--driver-does-not-support-64bit-integers` if `SQLFetch` fails with `HY004`.
- Update dependencies, including `parquet 9.0.2`.
- Introduce flag `--driver-does-not-support-64bit-integers` in order to compensate for missing 64-bit integer support in the Oracle driver.
- Updated dependencies
- Including update to `parquet 8.0.0`
- Updated dependencies
- Including update to `parquet 7.0.0`
- New `Completions` subcommand to generate shell completions.
- Fix: An issue with not reserving enough memory for the largest possible string if the octet length reported by the driver is too small. Now the calculation is based on column size.
- Update dependencies.
- Includes upstream fix: Passwords containing a `+` character are now escaped if passed via the `--password` command line option.
- Update dependencies.
- Update dependencies.
- Use less memory for Text columns.
- Update dependencies
- Fix: Version number
- Fix: An issue with the mapping of the ODBC data type FLOAT has been resolved. Before, it had always been mapped to 32-bit floating point. Now the precision of the column is also taken into account, mapping it to a 64-bit floating point in case the precision exceeds 24.
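The cutoff at 24 matches the 24-bit significand of an IEEE 754 single: every integer up to 2^24 survives a round-trip through a 32-bit float, while 2^24 + 1 does not:

```python
import struct

def roundtrips_f32(x: int) -> bool:
    # Pack the value as a 32-bit float and unpack it again;
    # the round-trip is exact iff x fits the 24-bit significand.
    return struct.unpack("f", struct.pack("f", x))[0] == x

assert roundtrips_f32(2**24)          # 16777216 is preserved
assert not roundtrips_f32(2**24 + 1)  # 16777217 is rounded away
print("precision > 24 needs a 64-bit float")
```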
- Optimization: Required columns which do not require conversion to parquet types during fetch are now no longer copied into an intermediate buffer. This results in slightly less memory usage and faster processing for required (i.e. NOT NULL) columns with the types:
  - Double
  - Real
  - Float
  - Tiny Integer
  - Small Integer
  - Integer
  - Big Int
  - Decimals with scale 0 and precision <= 18
- Fix: An issue with the ODBC buffer allocated for `NUMERIC` and `DECIMAL` types being two bytes too short, which could lead to wrong values being written into parquet without emitting an error message.
- `--batch-size-row` and `--batch-size-mib` can now both be specified together.
- If no `--batch-size-*` limit is specified, a row limit of 65535 is now also applied by default, next to the size limit.
- Fixed an issue where large batch sizes could cause failures writing Boolean columns.
- Updated dependencies
- Updated dependencies
- Updated dependencies
- Allow specifying fallback encodings for output parquet files when using the `query` subcommand.
- Updated dependencies.
- Better error message in case unixODBC version is outdated.
- Default log level is now warning. A `--quiet` flag has been introduced to suppress warnings, if desired.
- Introduce the `--prefer-varbinary` flag for the `query` subcommand. It allows mapping `BINARY` SQL columns to `BYTE_ARRAY` instead of `FIXED_LEN_BYTE_ARRAY`. This flag has been introduced in an effort to increase the compatibility of the output with Spark.
- Fix: Columns for which the driver reported size zero were not ignored when UTF-16 encoding had been enabled, which is the default setting on Windows. These columns are now completely missing from the output file, instead of being present with all values NULL or empty strings.
- Introduced support for connecting via GUI on Windows platforms via the `--prompt` flag.
- Introduced `--column-compression-default` in order to allow users to specify column compression.
- Default column compression is now `gzip`.
- Requires at least Rust 1.51.0 to build.
- Command line parameters `user` and `password` will no longer be ignored when passed together with a connection string. Instead their values will be appended as `UID` and `PWD` attributes at the end.
- The `--batch-size` command line flag has been renamed to `--batch-size-row`.
- Introduced the `--batch-size-mib` command line flag to limit batch size based on memory usage.
- Default batch size is adapted so buffer allocation requires 2 GiB on 64-bit platforms and 1 GiB on 32-bit platforms.
- Fix: There is now an error message produced if the resulting parquet file would not contain any columns.
- Add new subcommand `insert`.
- Fix: Right truncation of values in fixed sized `NCHAR` columns had occurred if a character in the value used more than one byte in UTF-8 encoding (or more than two bytes in UTF-16).
- Fix: On Windows platforms the tool now uses UTF-16 encoding by default to exchange character data with the data source. The behaviour has been changed since on most Windows platforms the system locale is not configured to use UTF-8. The behaviour can be configured manually on any platform using the newly introduced `--encoding` option.
- Fix: Interior nuls within `VARCHAR` values caused the tool to panic. Now these values are written into parquet as is.
- Fix: Replace non-UTF-8 characters with the UTF-8 replacement character (`�`). ODBC encodes strings according to the current locale, so this issue could cause non-UTF-8 characters to be written into parquet text columns on Windows systems. If a non-UTF-8 character is encountered, a warning is generated hinting at the user to change to a UTF-8 locale.
- `VARBINARY` and `BINARY` SQL columns are now mapped onto the `BYTE_ARRAY` and `FIXED_LEN_BYTE_ARRAY` parquet physical types.
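Python's decoder shows the same substitution described above: bytes that are not valid UTF-8 come out as the replacement character `U+FFFD`:

```python
# 0xE4 is 'ä' in Latin-1, but on its own it is not valid UTF-8.
raw = b"Gr\xe4n"
text = raw.decode("utf-8", errors="replace")
print(text)  # → Gr�n
assert "\ufffd" in text
```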
- Update dependencies
- Builds with stable Rust
- Update to `parquet 3.0.0`.
- Maps ODBC `Timestamp`s with precision <= 3 to parquet `TIMESTAMP_MILLISECONDS`.
- Updated dependencies
- Introduces option `--batches-per-file` in order to define an upper limit for batches in a single output file and split output across multiple files.
- Fix: Microsoft SQL Server user defined types with unbounded lengths have been mapped to Text columns with length zero. This caused at least one warning per row. These columns are now ignored at the beginning, causing exactly one warning. They also do no longer appear in the output schema.
- Fix: Allocate extra space in text column for multi byte UTF-8 characters.
- SQL Numeric and Decimal are now always mapped to the parquet Decimal type, independent of precision or scale. The 32-bit or 64-bit "physical" integer representation is chosen for SQL types with scale zero and a precision smaller than 10 or 19 respectively; otherwise the "physical" type is a fixed size byte array.
- Fix: Tool could panic if too many warnings were generated at once.
- Introduces subcommand `list-data-sources`.
- Introduces subcommands. `query` is now required to query the database and store contents into parquet.
- Introduces the `drivers` subcommand.
- Adds support for parameterized queries.
- Fix: A major issue caused columns containing NULL values to either cause a panic or even worse, produce a parquet file with wrong data in the affected column without showing any error at all.
- Binary release of 32 Bit Window executable on GitHub
- Binary release for OS-X
- Connection string is no longer a positional argument.
- Allow connecting to an ODBC datasource using dsn.
- Binary release of 64 Bit Window executable on GitHub
- Maps ODBC `Bit` to parquet `Boolean`.
- Maps ODBC `Tinyint` to parquet `INT 8`.
- Maps ODBC `Real` to parquet `Float`.
- Maps ODBC `Numeric` the same as it would `DECIMAL`.
- Default row group size is now `100000`.
- Adds support for decimal types with precision 0..18 and scale = 0 (i.e. everything that has a straightforward `i32` or `i64` representation).
- Fix: Fixed an issue where some column types were not bound to the cursor, which led to some columns only containing `NULL` or `0`.
- Retrieve column names more reliably with a greater range of drivers.
- Log batch number and numbers of rows at info level.
- Log bound and detected ODBC type.
- Auto generate names for unnamed columns.
Initial release