
Commit ca0f086

Merge pull request #353 from FerencBodon-Kx/KXI-59279
KXI-59279 extending compression performance
2 parents 02b749c + 8e116bd commit ca0f086

File tree

2 files changed: +22 -3 lines changed
docs/kb/file-compression.md

Lines changed: 21 additions & 3 deletions
@@ -1,8 +1,8 @@
 ---
 title: File compression | Database | kdb+ and q documentation
 description: How to work with compressed files in kdb+
-author: Stephen Taylor
-date: August 2022
+author: [Stephen Taylor, Ferenc Bodon]
+date: February 2025
 ---
 # File compression

@@ -181,8 +181,19 @@ If you experience [`wsfull`](../basics/errors.md#wsfull) even with sufficient sw
 
 ## Performance
 
-A single thread with full use of a core can decompress approx 300MB/s, depending on data/algorithm and level.
+There are three key aspects of compression algorithms:
+
+1. **Compression ratio**: This indicates how much the data file size is reduced. A high compression ratio means smaller files, hence lower storage and I/O costs: smaller column files let you store more data on storage of a given size, and storage space costs money (especially in the cloud). Smaller files may also reduce query execution time on slow storage, because less data needs to be read.
+1. **Compression speed**: This measures the time required to compress a file. Compression is typically CPU-intensive, so high compression speed minimizes CPU usage and the associated costs. The time to save a column file also bounds the rate of data ingestion: the faster files can be saved, the more data a kdb+ system can ingest. In the [kdb+ tick](../architecture/tickq.md) architecture, the RDB is unavailable for queries during the intraday write, so write speed also affects system availability.
+1. **Decompression speed**: This reflects the time taken to restore the original file from its compressed version. High decompression speed means faster queries.
 
+There is no single compression algorithm that outperforms all others in every aspect. Select compression (or avoid it altogether) based on your priorities:
+
+- Is achieving the fastest possible query execution more important to you, or do you prefer to minimize storage costs?
+- Does your kdb+ system handle a high volume of incoming data, requiring a fast and reliable intraday write process?
+- Are you looking for a general solution that provides balanced performance across all aspects without excelling or underperforming in any particular one?
+
+A single thread with full use of a core can decompress approx 300MB/s, depending on data/algorithm and level.
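These trade-offs can be measured directly in q: the compressed form of `set` takes a logical block size (as a power of 2), an algorithm number, and a level, and `-21!` reports the compression statistics of a file. A minimal sketch, with illustrative file names and sample data:

```q
x:1000000?100                / sample data: one million low-entropy longs
(`:xgz;17;2;5) set x         / gzip (algorithm 2), level 5, 128kB logical blocks
(`:xlz;17;4;5) set x         / lz4hc (algorithm 4), level 5
-21!`:xgz                    / compressed/uncompressed lengths, algorithm, level
```

Comparing `compressedLength` across the two files shows the ratio each algorithm achieves on this particular data.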
 
 ### Benchmarking
 

@@ -241,6 +252,8 @@ The following libraries are required by kdb+:
 ---|---|---
 libz.so.1 | libz.dylib<br>(pre-installed) | zlibwapi.dll<br>(32-bit and 64-bit versions available from [WinImage](http://www.winimage.com/zLibDll/index.html "winimage.com"))
 
+Gzip has a very good compression ratio and average compression/decompression speed. Avoid high compression levels (such as 8 and 9) if write speed is important to you. Gzip at level 5 is a good general solution.
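To apply this recommendation process-wide rather than per file, the `.z.zd` default compression parameters can be set once; subsequent writes to disk then use gzip level 5 automatically. A sketch with an illustrative path:

```q
.z.zd:17 2 5                 / defaults: 2^17 block size, gzip (2), level 5
`:prices set 1000000?100f    / written gzip-compressed via the .z.zd defaults
```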
+
 ### Snappy
 
 Compression algorithm `3` uses Snappy. Source and algorithm details can be found [here](http://google.github.io/snappy/).
@@ -250,6 +263,8 @@ The following libraries are required by kdb+:
 ---|---|---
 libsnappy.so.1 | libsnappy.dylib<br>(available via package managers such as [Homebrew](https://brew.sh/) or [MacPorts](https://www.macports.org/)) | snappy.dll
 
+Snappy has excellent compression and decompression speed, so it is a good choice if you optimize for query speed and ingestion times. However, it falls behind the other algorithms in compression ratio.
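Snappy is selected with algorithm number `3` in the compressed form of `set`; the level argument is not significant for Snappy, so `0` is conventionally passed. A sketch with an illustrative file name:

```q
(`:snap;17;3;0) set 1000000?100   / snappy: fast writes and reads
-21!`:snap                        / compare compressedLength with other algorithms
```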
+
 ### LZ4
 
 Compression algorithm `4` uses LZ4. Source and algorithm details can be found [here](https://github.com/lz4/lz4).
@@ -266,6 +281,8 @@ liblz4.so.1 | liblz4.dylib<br>(available through package managers such as [Homeb
 kdb+ requires at least `lz4-r129`.
 `lz4-1.8.3` works.
 We recommend using the latest `lz4` [release](https://github.com/lz4/lz4/releases) available.
+
+LZ4 offers great decompression speed and a good compression ratio, but does not perform well in compression speed. Compression level 5 is a good choice if you aim for fast queries and low storage costs. Avoid high compression levels (above 11).
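Under this recommendation, an LZ4-compressed column write looks as follows (a sketch; the file name and data are illustrative):

```q
x:1000000?100
(`:xlz5;17;4;5) set x             / lz4hc level 5: fast queries, low storage cost
(-21!`:xlz5)`compressedLength     / bytes on disk after compression
```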
 
 ### Zstd
 
@@ -276,6 +293,7 @@ The following libraries are required by kdb+:
 ---|---|---
 libzstd.so.1 | libzstd.1.dylib<br>(available via package managers such as [Homebrew](https://brew.sh/) or [MacPorts](https://www.macports.org/)) | libzstd.dll
 
+Zstd achieves an outstanding compression ratio on low-entropy columns. Use a low compression level (such as 1) if you optimize for compression (write) speed, and increase the level to achieve a better compression ratio. Avoid high levels (above 14).
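A sketch of the two ends of this trade-off, with illustrative file names and deliberately low-entropy data:

```q
x:1000000#til 10             / low-entropy column
(`:zfast;17;5;1) set x       / zstd (algorithm 5), level 1: optimized for write speed
(`:ztight;17;5;10) set x     / higher level: better ratio, slower writes
```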
 
 ## Running kdb+ under Gdb
 
docs/ref/get.md

Lines changed: 1 addition & 0 deletions
@@ -214,6 +214,7 @@ q)(`:ztbl/;dic) set t / splay table compressed
 `:ztbl/
 ```
 
+!!! warning "Compression may speed up or slow down the execution of `set`. The [performance impact](../kb/file-compression.md#performance) depends mainly on the data characteristics and the storage speed."
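The warning can be verified empirically by timing `set` with and without compression on the target storage; results vary with data characteristics and disk speed. A sketch with illustrative paths:

```q
t:([]a:1000000?100;b:1000000?1f)
/ uncompressed splayed write (\t reports milliseconds)
\t `:plain/ set t
/ gzip-compressed write: may be faster or slower depending on storage
\t (`:zipped/;17;2;5) set t
```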
 
 ----
 :fontawesome-solid-database:
