|
1 | 1 | # Raft FastLog |
2 | 2 |
|
3 | | -Dkron Pro includes support for the [Raft FastLog](https://github.com/tidwall/raft-fastlog) storage engine, which provides significant performance improvements over the default BoltDB storage. |
| 3 | +Dkron Pro supports the [Raft FastLog](https://github.com/tidwall/raft-fastlog) storage engine for Raft state. FastLog keeps the Raft log in memory for fast access while persisting it to disk, which can reduce Raft write latency and improve throughput compared to the default BoltDB-backed Raft log. |
4 | 4 |
|
5 | 5 | ## Overview |
6 | 6 |
|
7 | | -Raft FastLog is a high-performance storage engine designed specifically for Raft consensus logs. It offers: |
| 7 | +FastLog is optimized for Raft's append-heavy write pattern. It is a good fit when the default Raft storage becomes a bottleneck for cluster coordination, scheduling activity, or leader-side state changes. |
8 | 8 |
|
9 | | -- **Higher throughput** - Significantly faster write operations |
10 | | -- **Lower latency** - Reduced response times for log operations |
11 | | -- **Better concurrency** - Improved performance under concurrent load |
12 | | -- **Memory efficiency** - Optimized memory usage patterns |
| 9 | +Typical reasons to evaluate FastLog include: |
13 | 10 |
|
14 | | -## Performance Benefits |
| 11 | +- **Higher Raft throughput** for write-heavy workloads |
| 12 | +- **Lower commit latency** for Raft log operations |
| 13 | +- **Faster in-memory access** to recent Raft log entries |
| 14 | +- **Durability tuning** through the `--raft-duration` setting |
15 | 15 |
|
16 | | -FastLog typically provides: |
17 | | -- 10-100x faster write performance compared to BoltDB |
18 | | -- Reduced memory allocation and garbage collection pressure |
19 | | -- Better scaling characteristics under high load |
20 | | -- Lower CPU utilization for log operations |
| 16 | +FastLog is most useful when you are running busy production clusters, scheduling a high volume of jobs, or trying to reduce the overhead of Raft log persistence. |
21 | 17 |
|
22 | 18 | ## Configuration |
23 | 19 |
|
24 | | -To enable Raft FastLog, use the `--fast` command line option when starting Dkron Pro: |
| 20 | +Enable FastLog with the `--fast` flag when starting Dkron Pro. |
| 21 | + |
| 22 | +### CLI example |
| 23 | + |
| 24 | +```bash |
| 25 | +dkron agent --server --bootstrap-expect=3 --fast --raft-duration=0 |
| 26 | +``` |
| 27 | + |
| 28 | +### Environment variables |
25 | 29 |
|
26 | 30 | ```bash |
27 | | -dkron agent --fast |
| 31 | +DKRON_FAST=true |
| 32 | +DKRON_RAFT_DURATION=0 |
28 | 33 | ``` |
29 | 34 |
|
30 | | -## When to Use FastLog |
| 35 | +### Config file |
| 36 | + |
| 37 | +```yaml |
| 38 | +fast: true |
| 39 | +raft-duration: 0 |
| 40 | +``` |
| 41 | +
|
| 42 | +## Durability and performance tuning |
| 43 | +
|
| 44 | +The `--raft-duration` flag controls how aggressively FastLog flushes Raft log data to disk. Lower values favor performance. Higher values favor durability. |
| 45 | + |
| 46 | +| Value | Mode | Behavior | Trade-off | |
| 47 | +|-------|------|----------|-----------| |
| 48 | +| `-1` | Low | No explicit fsync by FastLog | Highest performance, highest risk of losing recent writes after a crash or power loss | |
| 49 | +| `0` | Medium | Fsync approximately once per second | Default setting and the best starting point for most deployments |
| 50 | +| `1` | High | Fsync on every change | Strongest durability, lowest write throughput | |
| 51 | + |
| 52 | +For most clusters, start with `--raft-duration=0`. Move to `1` if you need the strongest durability guarantees. Test `-1` only if you can tolerate losing the most recent Raft writes after a crash or power loss.
| 53 | + |
| 54 | +## When to use FastLog |
| 55 | + |
| 56 | +FastLog is a good option when: |
| 57 | + |
| 58 | +- You are running a busy production cluster with frequent scheduling activity. |
| 59 | +- Raft log persistence is contributing noticeable latency. |
| 60 | +- You want more control over the durability/performance trade-off for Raft writes. |
| 61 | +- You are scaling the server side of the cluster and want to reduce Raft storage overhead. |
| 62 | + |
| 63 | +If your cluster is small and Raft storage is not a bottleneck, BoltDB may remain the simpler default choice. |
| 64 | + |
| 65 | +## Operational considerations |
| 66 | + |
| 67 | +### Storage format compatibility |
| 68 | + |
| 69 | +FastLog uses a different on-disk format than BoltDB. The Raft storage files are not interchangeable. |
| 70 | + |
| 71 | +:::warning |
| 72 | +Do not point a FastLog-enabled node at an existing BoltDB Raft data directory and expect it to reuse the old log files. Treat the switch as a storage migration. |
| 73 | +::: |
| 74 | + |
| 75 | +### Migration from BoltDB |
31 | 76 |
|
32 | | -FastLog is recommended for: |
33 | | -- High-frequency job scheduling scenarios |
34 | | -- Large clusters with many nodes |
35 | | -- Environments requiring low-latency job execution |
36 | | -- Production deployments with performance requirements |
| 77 | +Switching an existing cluster from BoltDB to FastLog should be planned as a maintenance operation. |
37 | 78 |
|
38 | | -## Considerations |
| 79 | +1. Stop all Dkron server nodes in the cluster. |
| 80 | +2. Back up the existing `data-dir` on every node. |
| 81 | +3. Remove or archive the existing Raft log data. |
| 82 | +4. Restart the cluster with `--fast` enabled on every server node. |
| 83 | +5. Verify cluster health, leadership, and job scheduling after the cluster comes back. |
39 | 84 |
|
40 | | -- FastLog requires Dkron Pro license |
41 | | -- Log files are not compatible between FastLog and BoltDB engines |
42 | | -- Ensure adequate disk space for log storage |
43 | | -- Monitor system resources during initial deployment |
| 85 | +Because the storage formats differ, you should not mix old BoltDB Raft state with a FastLog deployment. |
44 | 86 |
|
45 | | -## Migration |
| 87 | +### Rollout guidance |
46 | 88 |
|
47 | | -When switching from BoltDB to FastLog: |
| 89 | +- **Test in staging first** using a workload that resembles production. |
| 90 | +- **Monitor memory usage** because FastLog keeps the Raft log in memory for fast access. |
| 91 | +- **Watch disk latency and Raft health** after rollout, especially if you change `--raft-duration`. |
| 92 | +- **Keep backups** of your cluster data before changing Raft storage settings. |
48 | 93 |
|
49 | | -1. Stop all Dkron nodes in the cluster |
50 | | -2. Backup existing data |
51 | | -3. Start nodes with `--fast` flag |
52 | | -4. The cluster will rebuild its state from scratch |
| 94 | +## Summary |
53 | 95 |
|
54 | | -**Note**: Migration requires cluster restart and state rebuild. Plan accordingly for production environments. |
| 96 | +FastLog gives Dkron Pro operators a faster Raft storage backend with configurable durability. Enable it with `--fast`, tune it with `--raft-duration`, and plan migrations carefully because the underlying storage format is different from BoltDB. |