docs/kb/file-compression.md (+21 −3)

@@ -1,8 +1,8 @@
 ---
 title: File compression | Database | kdb+ and q documentation
 description: How to work with compressed files in kdb+
-author: Stephen Taylor
-date: August 2022
+author: [Stephen Taylor, Ferenc Bodon]
+date: February 2025
 ---

 # File compression
@@ -181,8 +181,19 @@ If you experience [`wsfull`](../basics/errors.md#wsfull) even with sufficient sw
 ## Performance

-A single thread with full use of a core can decompress approx 300MB/s, depending on data/algorithm and level.
+There are three key aspects of compression algorithms:
+
+1. **Compression ratio**: how much the data file size is reduced. A high compression ratio means smaller files, so lower storage and I/O costs (especially in the cloud) and more data on a storage device of a given size. On slow storage, smaller files may also reduce query execution time, because less data is read.
+1. **Compression speed**: the time required to compress a file. Compression is typically CPU-intensive, so high compression speed minimizes CPU usage and associated costs. The time to save a column file sets the upper bound on data ingestion: the faster a file can be saved, the more data a kdb+ system can ingest. In the [kdb+ tick](../architecture/tickq.md) system, the RDB is unavailable for queries during the write, so write speed also affects system availability.
+1. **Decompression speed**: the time taken to restore the original file from the compressed version. High decompression speed means faster queries.
+
+There is no single best compression algorithm that outperforms all others in every aspect. Select a compression algorithm (or no compression at all) based on your priorities:
+
+- Is achieving the fastest possible query execution more important to you, or do you prefer to minimize storage costs?
+- Does your kdb+ system handle a high volume of incoming data, requiring a fast and reliable intraday write process?
+- Are you looking for a general solution that provides balanced performance across all aspects without excelling or underperforming in any particular one?
+
 A single thread with full use of a core can decompress approx 300MB/s, depending on data/algorithm and level.
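These trade-offs are selected through kdb+'s compression descriptor `(logical block size; algorithm; level)`, applied per write with `set` or globally via `.z.zd`. A minimal sketch, assuming the algorithm numbers described in the sections below (2 = gzip, 3 = Snappy, 4 = LZ4, 5 = Zstd) and hypothetical file names:

```q
x:1000000?100            / sample data to write
(`:gz;17;2;5) set x      / gzip at level 5, logical block size 2 xexp 17
(`:sn;17;3;0) set x      / snappy; no tunable level
.z.zd:17 2 5             / compress all subsequent unqualified writes with gzip level 5
-21!`:gz                 / per-file statistics: algorithm, compressed and uncompressed lengths
```

Reading a compressed file with `get` decompresses transparently, so queries need no changes.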
 ### Benchmarking
@@ -241,6 +252,8 @@ The following libraries are required by kdb+:
 ---|---|---
 libz.so.1 | libz.dylib<br>(pre-installed) | zlibwapi.dll<br>(32-bit and 64-bit versions available from [WinImage](http://www.winimage.com/zLibDll/index.html "winimage.com"))
+
+Gzip has a very good compression ratio and average compression/decompression speed. Avoid high compression levels (such as 8 and 9) if write speed is important to you. Gzip at level 5 is a good general-purpose choice.
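The recommendation above can be tried directly; a sketch with a hypothetical file name, selecting gzip (algorithm `2`) at level 5:

```q
(`:zgz;17;2;5) set 1000000?100   / write gzip-compressed at level 5
get`:zgz                         / reads decompress transparently
-21!`:zgz                        / compare compressed with uncompressed length
```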
 ### Snappy

 Compression algorithm `3` uses Snappy. Source and algorithm details can be found [here](http://google.github.io/snappy/).
@@ -250,6 +263,8 @@ The following libraries are required by kdb+:
 ---|---|---
 libsnappy.so.1 | libsnappy.dylib<br>(available via package managers such as [Homebrew](https://brew.sh/) or [MacPorts](https://www.macports.org/)) | snappy.dll
+
+Snappy has excellent compression and decompression speed, so it is a good choice when optimizing for query speed and ingestion time. However, it falls behind the other algorithms in compression ratio.
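For illustration (file name hypothetical), Snappy is selected with algorithm `3`; it exposes no tunable compression level, so `0` is passed:

```q
(`:zsn;17;3;0) set 1000000?100   / snappy: algorithm 3, level 0
```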
267
+
253
268
### LZ4
254
269
255
270
Compression algorithm `4` uses LZ4. Source and algorithm details can be found [here](https://github.com/lz4/lz4).
@@ -266,6 +281,8 @@ liblz4.so.1 | liblz4.dylib<br>(available through package managers such as [Homeb
 kdb+ requires at least `lz4-r129`.
 `lz4-1.8.3` works.
 We recommend using the latest `lz4` [release](https://github.com/lz4/lz4/releases).
+
+LZ4 offers great decompression speed and a good compression ratio, but does not perform well in compression speed. Level 5 is a good choice if you aim for fast queries and low storage costs. Avoid high compression levels (above 11).
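As a sketch (file name hypothetical), LZ4 is selected with algorithm `4`, here at the recommended level 5:

```q
(`:zlz;17;4;5) set 1000000?100   / lz4: algorithm 4, level 5
```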
 ### Zstd
@@ -276,6 +293,7 @@ The following libraries are required by kdb+:
 ---|---|---
 libzstd.so.1 | libzstd.1.dylib<br>(available via package managers such as [Homebrew](https://brew.sh/) or [MacPorts](https://www.macports.org/)) | libzstd.dll
+
+Zstd achieves an outstanding compression ratio on low-entropy columns. Use a low compression level (such as 1) to optimize for compression (write) speed, and increase the level to achieve a better compression ratio. Avoid high levels (above 14).
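As a sketch (file name hypothetical), Zstd is selected with algorithm `5`; level 1 favors write speed:

```q
(`:zzs;17;5;1) set 1000000?100   / zstd: algorithm 5, level 1
```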
docs/ref/get.md (+1 −0)

@@ -214,6 +214,7 @@ q)(`:ztbl/;dic) set t / splay table compressed
 `:ztbl/
 ```

+!!! warning "Compression may speed up or slow down the execution of `set`. The [performance impact](../kb/file-compression.md#performance) depends mainly on the data characteristics and the storage speed."
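A quick way to observe the warning above on your own data is to time compressed and uncompressed writes with `\t` (file names hypothetical; results depend heavily on the data characteristics and storage speed):

```q
x:10000000?100
\t `:plain set x             / uncompressed write, milliseconds
\t (`:zipped;17;2;5) set x   / gzip level 5 write, milliseconds
```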