Below are **8 focused, high-quality references** (official docs, deep dives, and case studies) that back the key points I used earlier: VictoriaMetrics’ column-oriented parts (timestamps/values), global inverted index (TSID/MetricID), efficient merges/deduplication, and comparative case studies showing resource savings versus Prometheus. I list each reference with a one-sentence summary of which claim it supports so you can jump straight to the most relevant material.
1. **VictoriaMetrics — official docs (home)** — central documentation for VM (architecture, components, deployment). Supports general architectural claims and links to component docs. ([docs.victoriametrics.com][1])
5. **VictoriaMetrics articles & performance tips (docs/blogs)** — collection of articles including string interning, performance techniques, and part layout (timestamps/values per part). Supports columnar time/value separation and memory optimizations. ([docs.victoriametrics.com][5])
# Comparative / independent analyses and case studies
6. **Prometheus — Storage (official docs)** — Prometheus TSDB block model, 2-hour blocks, per-block index; supports the discussion of Prometheus' per-block index and compaction behavior. ([prometheus.io][6])
8. **Independent write-ups / explainers (blogs, Medium, community posts)** — several third-party analyses summarizing VM internals, columnar parts, and comparative experiences (e.g., Medium posts, dbdb.io, Groundcover explainer). These provide independent corroboration and useful diagrams/interpretations. ([Medium][8])
_Use for:_ alternative explanations, diagrams, and community benchmarks.
---
# Quick mapping: which references support which major claims
- **VM stores timestamps & values in separate per-part files (column-like layout):** vmstorage blog + articles. ([VictoriaMetrics][3])
- **Prometheus uses 2-hour blocks with per-block indexes (leads to duplicated index data):** Prometheus storage docs. ([prometheus.io][6])
- **Real-world resource improvements after migrating from Prometheus to VM:** PingCAP TiDB migration case study. ([TiDB][7])
---
# If you want next steps
I can do any of the following (pick one and I’ll proceed):
- Compile a **downloadable pack** (PDF or Markdown) with excerpts and relevant snippets from the sources above (with full links and short quotes).
- Extract and compare **specific on-disk file formats** (exact file names, byte layout) from VM docs vs Prometheus TSDB format.
- Produce a **brief annotated benchmark** plan you can run (prometheus + remote_write → VM) with metrics to capture (disk, RAM, CPU, query latency) and expected outcomes.
Which of those would you like me to prepare now?
[1]: https://docs.victoriametrics.com/?utm_source=chatgpt.com "Welcome to VictoriaMetrics Docs"
[7]: https://www.pingcap.com/blog/tidb-observability-migrating-prometheus-victoriametrics/?utm_source=chatgpt.com "Moving from Prometheus to VictoriaMetrics"
[8]: https://alexmarket.medium.com/victoriametrics-a-look-inside-its-innards-b00eaa2a1e32?utm_source=chatgpt.com "VictoriaMetrics: a look inside its innards. | by Alex - Medium"
---
# Architecture Deep Dive: VictoriaMetrics vs. Prometheus
## Executive Summary
While both systems share the same goal (ingesting and querying metrics), their internal architectures optimize for completely different patterns.
* **Prometheus** is optimized for **fast recent reads**. It sacrifices memory and write-flexibility to keep the "Head" of data instantly accessible in RAM.
* **VictoriaMetrics** is optimized for **high-volume writes and storage efficiency**. It uses a Log-Structured Merge (LSM) tree approach (similar to ClickHouse or LevelDB) to decouple ingestion from storage, making it resilient to high churn and massive scale.
## 1. The Indexing Engine: Inverted Index vs. LSM Tree
The most critical difference lies in how they map a Label (e.g., `pod="A"`) to the data on disk.
### Prometheus: The Classic Inverted Index
Prometheus uses a search-engine style index designed for **immutable blocks**.
* **Structure:**
    * **Posting Lists:** A sorted list of Series IDs for every label value.
    * **Offset Table:** A lookup table pointing to where the Posting List begins on disk.
* **The Workflow:**
    * **Write:** New series live in the **Head Block** (RAM). When this block flushes (every 2h), Prometheus must "stop the world" to rewrite the entire index sequentially.
    * **High Churn Penalty:** If you add 100k ephemeral pods, the Head Block explodes in size because it must track every mapping in memory. Flushing requires rewriting the entire Posting List structure, leading to massive I/O and CPU spikes (sketched in the example below).
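To make the read-modify-write cost concrete, here is a minimal Go sketch of a posting-list-style index. The types and label strings are illustrative, not Prometheus' actual code; only the access pattern matters: keeping each list sorted means inserts shift existing entries, and a flush must serialize whole lists at once.

```go
package main

import (
	"fmt"
	"sort"
)

// PostingList maps a label pair (e.g. `pod="A"`) to a sorted list of
// series IDs. Illustrative only, not Prometheus' real data structures.
type PostingList map[string][]uint64

// addSeries inserts a series ID while keeping the list sorted. This is
// a read-modify-write: find the position, then shift every later
// element, so the cost grows with the size of the existing list.
func (p PostingList) addSeries(label string, id uint64) {
	ids := p[label]
	i := sort.Search(len(ids), func(j int) bool { return ids[j] >= id })
	ids = append(ids, 0)     // grow by one slot
	copy(ids[i+1:], ids[i:]) // shift the tail right
	ids[i] = id
	p[label] = ids
}

func main() {
	idx := PostingList{}
	// Simulate churn: 100k ephemeral pods all share one namespace label,
	// so that label's posting list must track every new series.
	for id := uint64(0); id < 100_000; id++ {
		idx.addSeries(`namespace="prod"`, id)
	}
	// A head-block flush must serialize each sorted list in full.
	fmt.Printf("flush rewrites %d postings for a single label\n",
		len(idx[`namespace="prod"`]))
}
```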
### VictoriaMetrics: The Key-Value LSM Tree
VictoriaMetrics does **not** use a separate index file format. It treats index entries as simple Key-Value pairs in an LSM tree (the `MergeSet`).
* **Storage:** These keys are appended to a log structure and sorted in the background.
* **The Workflow:**
    * **Write:** Adding a new series is just an **append-only** operation. VM writes a new key to the end of the LSM tree.
    * **High Churn Advantage:** There is no "read-modify-write" penalty. Creating 100k new pods just means writing 100k small keys. The heavy lifting (sorting/merging) happens lazily in the background (sketched after the diagram below).
```mermaid
graph TD
    subgraph "Prometheus: Inverted Index"
        A[New Series: pod='A'] -->|Update Posting List| B[Head Block Index in RAM]
        B -->|Flush every 2h| C[Sorted Posting Lists]
        C -->|Must rewrite entire list on update| D[High Churn = Pain]
    end

    subgraph "VictoriaMetrics: LSM Stream"
        X[New Series: pod='A'] -->|Append Key| Y[LSM Tree Log]
        Y -->|Key: pod=A+MetricID_1| Z[Background Merge]
        Z -->|No Read penalty on Write| W[High Churn = Easy]
    end
```
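Below is the same registration modeled as LSM-style appends, in a minimal Go sketch. The key encoding (`label=value`, a separator byte, then the metric ID) is a simplification for illustration, not VM's actual MergeSet format; what matters is that the write path never reads or rewrites existing entries.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry builds an opaque key for a MergeSet-style index.
// Hypothetical encoding for illustration, not VM's on-disk format.
func indexEntry(label, value string, metricID uint64) string {
	return fmt.Sprintf("%s=%s\x00%016x", label, value, metricID)
}

func main() {
	var part []string // an unsorted in-memory part

	// Registering a new series is a pure append: nothing existing is
	// read or rewritten, so churn costs O(1) work per new series.
	for id := uint64(1); id <= 3; id++ {
		part = append(part, indexEntry("pod", fmt.Sprintf("pod-%d", id), id))
	}

	// Sorting happens lazily in a background merge, off the write path.
	merged := append([]string(nil), part...)
	sort.Strings(merged)
	for _, key := range merged {
		fmt.Printf("%q\n", key)
	}
}
```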
## 2. The I/O Path: WAL vs. Compressed Parts
This explains why VictoriaMetrics has "smoother" disk usage despite flushing more frequently.
### Prometheus: The WAL (Write-Ahead Log)
* **Strategy:** Immediate durability via a WAL file.
* **Write Pattern:** Every sample is appended to the WAL file on disk.
* **Compression:** None/low (for speed).
* **Payload:** Large (raw bytes).
* **Fsync:** Infrequent (usually on segment rotation or checkpoint); relies on the OS page cache.
* **Consequence:** High "write amplification" during compaction. The disk is constantly busy writing raw data, and then gets hammered every 2 hours when the Head Block flushes.
### VictoriaMetrics: The Buffered Flush
* **Strategy:** Periodic durability via compressed micro-parts.
* **Write Pattern:** Data is buffered in RAM (`inmemoryPart`) and flushed every ~1-5 seconds.
* **Compression:** High (ZSTD-like + Gorilla).
* **Payload:** Tiny (data is compressed *before* writing).
* **Fsync:** **Frequent** (every flush).
* **Consequence:** Even though VM calls `fsync` every few seconds, the **payload is so small** (50 KB vs. 2 MB) that modern SSDs handle it effortlessly. This avoids the "stop the world" I/O spikes seen in Prometheus.
**Critical Trade-off:** VictoriaMetrics sacrifices the last ~5 seconds of data (held in the RAM buffer) in the event of a hard crash (`kill -9`) to achieve this massive I/O gain. Prometheus recovers everything via WAL replay.
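Here is a rough Go sketch of the two write paths described above. It uses stdlib `gzip` as a stand-in for VM's real codecs (Gorilla + ZSTD-like), and the batch size is an assumption; the point is that compressing a buffered batch before it hits disk shrinks the fsync payload dramatically compared to appending raw samples.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
)

// sample is a timestamp/value pair: 16 raw bytes on the WAL path.
type sample struct {
	TS  int64
	Val float64
}

// walAppend models the Prometheus path: each sample is appended raw
// (uncompressed), so the bytes hitting disk equal the raw size.
func walAppend(w *bytes.Buffer, s sample) {
	binary.Write(w, binary.LittleEndian, s)
}

// flushCompressed models the VM path: buffer a batch in RAM, compress
// it, then write (and fsync) one small part. gzip stands in for VM's
// actual codecs, so the exact ratio here is only illustrative.
func flushCompressed(batch []sample) int {
	var raw bytes.Buffer
	for _, s := range batch {
		walAppend(&raw, s)
	}
	var part bytes.Buffer
	zw := gzip.NewWriter(&part)
	zw.Write(raw.Bytes())
	zw.Close()
	return part.Len()
}

func main() {
	// One flush interval of a slowly changing gauge at 10k samples/s.
	// Real metrics are highly regular, which is why they compress well.
	batch := make([]sample, 10_000)
	for i := range batch {
		batch[i] = sample{TS: int64(1700000000000 + i), Val: 20.0 + float64(i%10)*0.1}
	}
	fmt.Printf("raw WAL payload:         %d bytes\n", len(batch)*16)
	fmt.Printf("compressed part payload: %d bytes\n", flushCompressed(batch))
}
```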
## 3. Resource Comparison

| Aspect | Prometheus | VictoriaMetrics |
| --- | --- | --- |
| **RAM Usage** | **High.** Scales with Active Series + Ingestion Rate (Head Block bloat). | **Low.** Scales with cache size. Writes don't require massive RAM buffers. |
| **CPU Usage** | **High.** Uses 1 goroutine per target. The garbage collector struggles with millions of small objects in the Head. | **Optimized.** Uses a fixed-size worker pool. Optimized code reduces memory allocations, lightening the load on the GC. |
| **Disk Space** | **Standard.** ~1.5 bytes per sample. | **Ultra-Low.** ~0.4 bytes per sample. Precision reduction + better compression algorithms. |
| **Operation** | **Spiky.** Periodic heavy loads (compaction/GC). | **Smooth.** Continuous small background merges. |
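Applying the per-sample figures from the table, a quick back-of-envelope calculation (the workload of 1M samples/s retained for 30 days is an assumed example, not a benchmark):

```go
package main

import "fmt"

func main() {
	// Hypothetical workload: 1M samples/s retained for 30 days,
	// using the ~1.5 vs ~0.4 bytes-per-sample figures from the table.
	const samplesPerSec = 1_000_000
	const seconds = 30 * 24 * 3600
	total := float64(samplesPerSec) * seconds
	fmt.Printf("Prometheus:      %.1f TB\n", total*1.5/1e12) // ~3.9 TB
	fmt.Printf("VictoriaMetrics: %.1f TB\n", total*0.4/1e12) // ~1.0 TB
}
```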
## Summary Recommendation
* **Stick with Prometheus if:** You have a small-to-medium static environment, you need 100% adherence to standard Prometheus behavior, and you cannot tolerate even 1 second of data loss on a crash.
* **Switch to VictoriaMetrics if:**
    1. **High Churn:** You run Kubernetes with frequent deployments or auto-scaling.
    2. **Long Retention:** You need to store months/years of data cheaply.
    3. **Performance Issues:** Your Prometheus is OOMing or using too much CPU.
### Final "Under the Hood" Visualization
This graph (referenced from our earlier discussion) summarizes the reality: Prometheus shows a "sawtooth" pattern of resource usage (building up to a flush), whereas VictoriaMetrics shows a "flat" line, making capacity planning for production clusters far easier.