Commit fa504ed

doc: document TableConfig
1 parent 844168f commit fa504ed

5 files changed: 259 additions & 107 deletions

README.md

Lines changed: 91 additions & 33 deletions
@@ -104,28 +104,28 @@ The documentation provides two measures of complexity:
 The complexities are described in terms of the following variables and
 constants:

-- The variable *n* refers to the number of *physical* table entries. A
+- The variable $`n`$ refers to the number of *physical* table entries. A
   *physical* table entry is any key–operation pair, e.g., `Insert k v`
   or `Delete k`, whereas a *logical* table entry is determined by all
-  physical entries with the same key. If the variable *n* is used to
+  physical entries with the same key. If the variable $`n`$ is used to
   describe the complexity of an operation that involves multiple tables,
   it refers to the sum of all table entries.

-- The variable *o* refers to the number of open tables and cursors in
+- The variable $`o`$ refers to the number of open tables and cursors in
   the session.

-- The variable *s* refers to the number of snapshots in the session.
+- The variable $`s`$ refers to the number of snapshots in the session.

-- The variable *b* usually refers to the size of a batch of
+- The variable $`b`$ usually refers to the size of a batch of
   inputs/outputs. Its precise meaning is explained for each occurrence.

-- The constant *B* refers to the size of the write buffer, which is a
-  configuration parameter.
+- The constant $`B`$ refers to the size of the write buffer, which is
+  determined by the `TableConfig` parameter `confWriteBufferAlloc`.

-- The constant *T* refers to the size ratio of the table, which is a
-  configuration parameter.
+- The constant $`T`$ refers to the size ratio of the table, which is
+  determined by the `TableConfig` parameter `confSizeRatio`.

-- The constant *P* refers to the average number of key–value pairs
+- The constant $`P`$ refers to the average number of key–value pairs
   that fit in a page of memory.

 #### Disk I/O cost of operations <span id="performance_time" class="anchor"></span>
@@ -134,7 +134,9 @@ The following table summarises the cost of the operations on LSM-trees
 measured in the number of disk I/O operations. If the cost depends on
 the merge policy or merge schedule, then the table contains one entry
 for each relevant combination. Otherwise, the merge policy and/or merge
-schedule is listed as N/A.
+schedule is listed as N/A. The merge policy and merge schedule are
+determined by the `TableConfig` parameters `confMergePolicy` and
+`confMergeSchedule`.

 <table>
 <thead>
@@ -273,39 +275,39 @@ schedule is listed as N/A.
 </tbody>
 </table>

-(\*The variable *b* refers to the number of entries retrieved by the
+(\*The variable $`b`$ refers to the number of entries retrieved by the
 range lookup.)

 TODO: Document the average-case behaviour of lookups.

 #### In-memory size of tables <span id="performance_size" class="anchor"></span>

 The in-memory size of an LSM-tree is described in terms of the variable
-*n*, which refers to the number of *physical* database entries. A
+$`n`$, which refers to the number of *physical* database entries. A
 *physical* database entry is any key–operation pair, e.g., `Insert k v`
 or `Delete k`, whereas a *logical* database entry is determined by all
 physical entries with the same key.

-The worst-case in-memory size of an LSM-tree is *O*(*n*).
+The worst-case in-memory size of an LSM-tree is $`O(n)`$.

-- The worst-case in-memory size of the write buffer is *O*(*B*).
+- The worst-case in-memory size of the write buffer is $`O(B)`$.

   The maximum size of the write buffer depends on the write buffer allocation
-  strategy, which is determined by the `confWriteBufferAlloc` field of
-  `TableConfig`. Regardless of write buffer allocation strategy, the
-  size of the write buffer may never exceed 4GiB.
+  strategy, which is determined by the `TableConfig` parameter
+  `confWriteBufferAlloc`. Regardless of write buffer allocation
+  strategy, the size of the write buffer may never exceed 4GiB.

   `AllocNumEntries maxEntries`
   The maximum size of the write buffer is the maximum number of entries
   multiplied by the average size of a key–operation pair.

-- The worst-case in-memory size of the Bloom filters is *O*(*n*).
+- The worst-case in-memory size of the Bloom filters is $`O(n)`$.

   The total in-memory size of all Bloom filters is the number of bits
   per physical entry multiplied by the number of physical entries. The
   required number of bits per physical entry is determined by the Bloom
-  filter allocation strategy, which is determined by the
-  `confBloomFilterAlloc` field of `TableConfig`.
+  filter allocation strategy, which is determined by the `TableConfig`
+  parameter `confBloomFilterAlloc`.

   `AllocFixed bitsPerPhysicalEntry`
   The number of bits per physical entry is specified as
@@ -318,20 +320,20 @@ The worst-case in-memory size of an LSM-tree is *O*(*n*).
   The false-positive rate scales exponentially with the number of bits
   per entry:

-  | False-positive rate | Bits per entry |
-  |---------------------|----------------|
-  | 1 in 10 |  ≈ 4.77 |
-  | 1 in 100 |  ≈ 9.85 |
-  | 1 in 1, 000 |  ≈ 15.79 |
-  | 1 in 10, 000 |  ≈ 22.58 |
-  | 1 in 100, 000 |  ≈ 30.22 |
+  | False-positive rate | Bits per entry |
+  |---------------------------|--------------------|
+  | $`1\text{ in }10`$ | $`\approx 4.77 `$ |
+  | $`1\text{ in }100`$ | $`\approx 9.85 `$ |
+  | $`1\text{ in }1{,}000`$ | $`\approx 15.79 `$ |
+  | $`1\text{ in }10{,}000`$ | $`\approx 22.58 `$ |
+  | $`1\text{ in }100{,}000`$ | $`\approx 30.22 `$ |

-- The worst-case in-memory size of the indexes is *O*(*n*).
+- The worst-case in-memory size of the indexes is $`O(n)`$.

   The total in-memory size of all indexes depends on the index type,
-  which is determined by the `confFencePointerIndex` field of
-  `TableConfig`. The in-memory size of the various indexes is described
-  in reference to the size of the database in [*memory
+  which is determined by the `TableConfig` parameter
+  `confFencePointerIndex`. The in-memory size of the various indexes is
+  described in reference to the size of the database in [*memory
   pages*](https://en.wikipedia.org/wiki/Page_%28computer_memory%29 "https://en.wikipedia.org/wiki/Page_%28computer_memory%29").

   `OrdinaryIndex`
@@ -346,10 +348,66 @@ The worst-case in-memory size of an LSM-tree is *O*(*n*).
   a negligible amount of memory for tie breakers. The total in-memory
   size of all indexes is approximately 66 bits per memory page.

-The total size of an LSM-tree must not exceed 2<sup>41</sup> physical
+The total size of an LSM-tree must not exceed $`2^{41}`$ physical
 entries. Violation of this condition *is* checked and will throw a
 `TableTooLargeError`.

+#### Fine-tuning <span id="fine_tuning" class="anchor"></span>
+
+An LSM-tree stores its data in a partially-sorted structure. The
+key–operation pairs are stored in *runs*, which are sorted sequences of
+key–operation pairs. The runs are organised in *levels*. The 0th level
+is the in-memory write buffer and all following levels are sequences of
+on-disk runs. Each level has a maximum size. The maximum size of the
+write buffer is determined by the configuration parameter
+`confWriteBufferAlloc`. The maximum size of every other level $`l`$ is
+$`l \times T \times B`$. The constant $`B`$ refers to the write buffer
+size and the constant $`T`$ refers to the size ratio. (See
+[Performance](#performance "#performance").)
+
+``` math
+
+\begin{array}{l:l:l:l}
+\text{Level}
+&
+\text{Tiering}
+&
+\text{Levelling}
+&
+\text{Lazy Levelling}
+\\
+0
+&
+\fbox{\(\texttt{4}\,\_\)}
+&
+\fbox{\(\texttt{4}\,\_\)}
+&
+\fbox{\(\texttt{4}\,\_\)}
+\\
+1
+&
+\fbox{\(\texttt{1}\,\texttt{3}\)}
+\quad
+\fbox{\(\texttt{2}\,\texttt{7}\)}
+&
+\fbox{\(\texttt{1}\,\texttt{2}\,\texttt{3}\,\texttt{7}\)}
+&
+\fbox{\(\texttt{1}\,\texttt{3}\)}
+\quad
+\fbox{\(\texttt{2}\,\texttt{7}\)}
+\\
+2
+&
+\fbox{\(\texttt{4}\,\texttt{5}\,\texttt{7}\,\texttt{8}\)}
+\quad
+\fbox{\(\texttt{0}\,\texttt{2}\,\texttt{3}\,\texttt{9}\)}
+&
+\fbox{\(\texttt{0}\,\texttt{2}\,\texttt{3}\,\texttt{4}\,\texttt{5}\,\texttt{6}\,\texttt{8}\,\texttt{9}\)}
+&
+\fbox{\(\texttt{0}\,\texttt{2}\,\texttt{3}\,\texttt{4}\,\texttt{5}\,\texttt{6}\,\texttt{8}\,\texttt{9}\)}
+\end{array}
+```
+
 ### Implementation

 The implementation of LSM-trees in this package draws inspiration from:
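
Note on the Fine-tuning paragraph added in this file: the level-size rule it states can be tried out with a small sketch. The helper names `levelCapacity` and `capacities` are hypothetical and not part of the package, and the formula is taken at face value from the added text. The example values (a size ratio of 4, matching the exported `Four`, and a write buffer of 20,000 entries) are assumptions chosen only for illustration.

```haskell
-- Sketch only: these helpers are hypothetical and not part of lsm-tree.

-- | Maximum size (in entries) of on-disk level @l@, for @l >= 1@,
-- using the rule l * T * B exactly as stated in the Fine-tuning text.
levelCapacity
  :: Integer -- ^ size ratio T
  -> Integer -- ^ write buffer size B, in entries
  -> Integer -- ^ level number l
  -> Integer
levelCapacity t b l = l * t * b

-- | Maximum sizes of levels 1 through k.
capacities :: Integer -> Integer -> Integer -> [Integer]
capacities t b k = [levelCapacity t b l | l <- [1 .. k]]
```

Under these assumed values, `capacities 4 20000 3` evaluates to `[80000,160000,240000]`.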

bench/macro/lsm-tree-bench-wp8.hs

Lines changed: 1 addition & 1 deletion
@@ -227,7 +227,7 @@ cmdP = O.subparser $ mconcat

 setupOptsP :: O.Parser SetupOpts
 setupOptsP = pure SetupOpts
-    <*> O.option O.auto (O.long "bloom-filter-alloc" <> O.value LSM.defaultBloomFilterAlloc <> O.showDefault <> O.help "Bloom filter allocation method [AllocFixed n | AllocRequestFPR d]")
+    <*> O.option O.auto (O.long "bloom-filter-alloc" <> O.value (LSM.confBloomFilterAlloc LSM.defaultTableConfig) <> O.showDefault <> O.help "Bloom filter allocation method [AllocFixed n | AllocRequestFPR d]")

 runOptsP :: O.Parser RunOpts
 runOptsP = pure RunOpts
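
Note on the change above: the standalone `defaultBloomFilterAlloc` is replaced by projecting the field out of `defaultTableConfig`. The sketch below illustrates that pattern, plus a single-field override via record-update syntax. To stay compilable without the package, it defines simplified stand-in types; the payload types, the one-field `TableConfig`, and the default value shown are all assumptions, and `defaultAlloc` and `lowFprConfig` are hypothetical names.

```haskell
-- Stand-in types: simplified, hypothetical mirrors of the lsm-tree types,
-- defined locally so the sketch compiles on its own.
data BloomFilterAlloc
  = AllocFixed Double       -- bits per physical entry (payload type assumed)
  | AllocRequestFPR Double  -- requested false-positive rate (payload type assumed)
  deriving (Show, Eq)

-- The real TableConfig has more fields; only one is modelled here.
newtype TableConfig = TableConfig
  { confBloomFilterAlloc :: BloomFilterAlloc
  } deriving (Show, Eq)

-- Illustrative default only; the real default is defined by the package.
defaultTableConfig :: TableConfig
defaultTableConfig = TableConfig { confBloomFilterAlloc = AllocFixed 10 }

-- Instead of a standalone default value, project the field out of the
-- default configuration, as the benchmark now does.
defaultAlloc :: BloomFilterAlloc
defaultAlloc = confBloomFilterAlloc defaultTableConfig

-- Override a single parameter and keep every other default.
lowFprConfig :: TableConfig
lowFprConfig = defaultTableConfig { confBloomFilterAlloc = AllocRequestFPR 1.0e-3 }
```

Reading defaults through the record keeps a single source of truth: a caller that only needs one parameter cannot drift from the configuration that tables are created with by default.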

lsm-tree.cabal

Lines changed: 65 additions & 5 deletions
@@ -71,15 +71,18 @@ description:
 * The variable \(s\) refers to the number of snapshots in the session.
 * The variable \(b\) usually refers to the size of a batch of inputs\/outputs.
 Its precise meaning is explained for each occurrence.
-* The constant \(B\) refers to the size of the write buffer, which is a configuration parameter.
-* The constant \(T\) refers to the size ratio of the table, which is a configuration parameter.
+* The constant \(B\) refers to the size of the write buffer,
+which is determined by the @TableConfig@ parameter @confWriteBufferAlloc@.
+* The constant \(T\) refers to the size ratio of the table,
+which is determined by the @TableConfig@ parameter @confSizeRatio@.
 * The constant \(P\) refers to the average number of key–value pairs that fit in a page of memory.

 === Disk I\/O cost of operations #performance_time#

 The following table summarises the cost of the operations on LSM-trees measured in the number of disk I\/O operations.
 If the cost depends on the merge policy or merge schedule, then the table contains one entry for each relevant combination.
 Otherwise, the merge policy and\/or merge schedule is listed as N\/A.
+The merge policy and merge schedule are determined by the @TableConfig@ parameters @confMergePolicy@ and @confMergeSchedule@.

 +----------+------------------------+-----------------+-----------------+------------------------------------------------+
 | Resource | Operation | Merge policy | Merge schedule | Cost in disk I\/O operations |
@@ -132,7 +135,8 @@ description:

 * The worst-case in-memory size of the write buffer is \(O(B)\).

-The maximum size of the write buffer depends on the write buffer allocation strategy, which is determined by the @confWriteBufferAlloc@ field of @TableConfig@.
+The maximum size of the write buffer depends on the write buffer allocation strategy,
+which is determined by the @TableConfig@ parameter @confWriteBufferAlloc@.
 Regardless of write buffer allocation strategy, the size of the write buffer may never exceed 4GiB.

 [@AllocNumEntries maxEntries@]:
@@ -141,7 +145,8 @@ description:
 * The worst-case in-memory size of the Bloom filters is \(O(n)\).

 The total in-memory size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries.
-The required number of bits per physical entry is determined by the Bloom filter allocation strategy, which is determined by the @confBloomFilterAlloc@ field of @TableConfig@.
+The required number of bits per physical entry is determined by the Bloom filter allocation strategy,
+which is determined by the @TableConfig@ parameter @confBloomFilterAlloc@.

 [@AllocFixed bitsPerPhysicalEntry@]:
 The number of bits per physical entry is specified as @bitsPerPhysicalEntry@.
@@ -166,7 +171,8 @@ description:

 * The worst-case in-memory size of the indexes is \(O(n)\).

-The total in-memory size of all indexes depends on the index type, which is determined by the @confFencePointerIndex@ field of @TableConfig@.
+The total in-memory size of all indexes depends on the index type,
+which is determined by the @TableConfig@ parameter @confFencePointerIndex@.
 The in-memory size of the various indexes is described in reference to the size of the database in [/memory pages/](https://en.wikipedia.org/wiki/Page_%28computer_memory%29).

 [@OrdinaryIndex@]:
@@ -179,6 +185,60 @@ description:
 The total size of an LSM-tree must not exceed \(2^{41}\) physical entries.
 Violation of this condition /is/ checked and will throw a 'TableTooLargeError'.

+=== Fine-tuning #fine_tuning#
+
+An LSM-tree stores its data in a partially-sorted structure.
+The key–operation pairs are stored in /runs/, which are sorted sequences of key–operation pairs.
+The runs are organised in /levels/.
+The 0th level is the in-memory write buffer and all following levels are sequences of on-disk runs.
+Each level has a maximum size.
+The maximum size of the write buffer is determined by the configuration parameter @confWriteBufferAlloc@.
+The maximum size of every other level \(l\) is \(l \times T \times B\).
+The constant \(B\) refers to the write buffer size and the constant \(T\) refers to the size ratio.
+(See [Performance](#performance).)
+
+\[
+\begin{array}{l:l:l:l}
+\text{Level}
+&
+\text{Tiering}
+&
+\text{Levelling}
+&
+\text{Lazy Levelling}
+\\
+0
+&
+\fbox{\(\texttt{4}\,\_\)}
+&
+\fbox{\(\texttt{4}\,\_\)}
+&
+\fbox{\(\texttt{4}\,\_\)}
+\\
+1
+&
+\fbox{\(\texttt{1}\,\texttt{3}\)}
+\quad
+\fbox{\(\texttt{2}\,\texttt{7}\)}
+&
+\fbox{\(\texttt{1}\,\texttt{2}\,\texttt{3}\,\texttt{7}\)}
+&
+\fbox{\(\texttt{1}\,\texttt{3}\)}
+\quad
+\fbox{\(\texttt{2}\,\texttt{7}\)}
+\\
+2
+&
+\fbox{\(\texttt{4}\,\texttt{5}\,\texttt{7}\,\texttt{8}\)}
+\quad
+\fbox{\(\texttt{0}\,\texttt{2}\,\texttt{3}\,\texttt{9}\)}
+&
+\fbox{\(\texttt{0}\,\texttt{2}\,\texttt{3}\,\texttt{4}\,\texttt{5}\,\texttt{6}\,\texttt{8}\,\texttt{9}\)}
+&
+\fbox{\(\texttt{0}\,\texttt{2}\,\texttt{3}\,\texttt{4}\,\texttt{5}\,\texttt{6}\,\texttt{8}\,\texttt{9}\)}
+\end{array}
+\]
+
 == Implementation

 The implementation of LSM-trees in this package draws inspiration from:
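
Note on the Bloom filter sizing rule in the description above (bits per physical entry multiplied by the number of physical entries): the rule can be turned into a rough memory estimate. The helper names `bloomFilterBits` and `bloomFilterMiB` are hypothetical and not part of the package; the sketch covers only the fixed bits-per-entry case and ignores any per-filter overhead.

```haskell
-- Sketch only: these helpers are hypothetical and not part of lsm-tree.

-- | Estimated total size of all Bloom filters, in bits: the number of
-- bits per physical entry multiplied by the number of physical entries.
bloomFilterBits
  :: Double  -- ^ bits per physical entry
  -> Integer -- ^ number of physical entries n
  -> Double
bloomFilterBits bitsPerEntry n = bitsPerEntry * fromIntegral n

-- | The same estimate in mebibytes (8 bits per byte, 2^20 bytes per MiB).
bloomFilterMiB :: Double -> Integer -> Double
bloomFilterMiB bitsPerEntry n = bloomFilterBits bitsPerEntry n / (8 * 1024 * 1024)
```

For example, assuming 10 bits per entry and 100 million physical entries, `bloomFilterMiB 10 100000000` comes to roughly 119 MiB.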

src/Database/LSMTree.hs

Lines changed: 2 additions & 4 deletions
@@ -113,13 +113,12 @@ module Database.LSMTree (
     ),
   defaultTableConfig,
   MergePolicy (LazyLevelling),
+  MergeSchedule (..),
   SizeRatio (Four),
   WriteBufferAlloc (AllocNumEntries),
   BloomFilterAlloc (AllocFixed, AllocRequestFPR),
-  defaultBloomFilterAlloc,
   FencePointerIndexType (OrdinaryIndex, CompactIndex),
   DiskCachePolicy (..),
-  MergeSchedule (..),

   -- ** Table Configuration Overrides #table_configuration_overrides#
   OverrideDiskCachePolicy (..),
@@ -205,8 +204,7 @@ import Database.LSMTree.Internal.Config
                      DiskCachePolicy (..), FencePointerIndexType (..),
                      MergePolicy (..), MergeSchedule (..), SizeRatio (..),
                      TableConfig (..), WriteBufferAlloc (..),
-                     defaultBloomFilterAlloc, defaultTableConfig,
-                     serialiseKeyMinimalSize)
+                     defaultTableConfig, serialiseKeyMinimalSize)
 import           Database.LSMTree.Internal.Config.Override
                      (OverrideDiskCachePolicy (..))
 import qualified Database.LSMTree.Internal.Entry as Entry
