
Commit 632bac1

victoriametrics vs. prometheus
1 parent 86d5263 commit 632bac1

File tree: 1 file changed

victoriametrics/vs-prometheus.md

Lines changed: 100 additions & 19 deletions
Below are **8 focused, high-quality references** (official docs, deep dives, and case studies) that back the key points I used earlier: VictoriaMetrics’ column-oriented parts (timestamps/values), global inverted index (TSID/MetricID), efficient merges/deduplication, and comparative case studies showing resource savings versus Prometheus. I list each reference with a one-sentence summary of which claim it supports so you can jump straight to the most relevant material.

# Primary (official) documentation & engineering blogs

1. **VictoriaMetrics — official docs (home)** — central documentation for VM (architecture, components, deployment). Supports general architectural claims and links to component docs. ([docs.victoriametrics.com][1])

5. **VictoriaMetrics articles & performance tips (docs/blogs)** — collection of articles including string interning, performance techniques and part layout (timestamps/values per part). Supports columnar/time/value separation and memory optimizations. ([docs.victoriametrics.com][5])
   _Use for:_ concrete optimizations (string interning, selective loading).

# Comparative / independent analyses and case studies

6. **Prometheus — Storage (official docs)** — Prometheus TSDB block model, 2-hour blocks, per-block index; supports the discussion of Prometheus’ per-block index and compaction behavior. ([prometheus.io][6])

8. **Independent write-ups / explainers (blogs, medium, community posts)** — several third-party analyses summarizing VM internals, columnar parts, and comparative experiences (e.g., Medium posts, dbdb.io, Groundcover explainer). These provide independent corroboration and useful diagrams/interpretations. ([Medium][8])
   _Use for:_ alternative explanations, diagrams, and community benchmarks.

# Quick mapping: which references support which major claims

- **VM stores timestamps & values in separate per-part files (column-like layout):** vmstorage blog + articles. ([VictoriaMetrics][3])
- **Prometheus uses 2-hour blocks with per-block indexes (leads to duplicated index data):** Prometheus storage docs. ([prometheus.io][6])
- **Real-world resource improvements after migrating from Prometheus to VM:** PingCAP TiDB migration case study. ([TiDB][7])

# If you want next steps

I can do any of the following (pick one and I’ll proceed):

- Compile a **downloadable pack** (PDF or Markdown) with excerpts and relevant snippets from the sources above (with full links and short quotes).
- Extract and compare **specific on-disk file formats** (exact file names, byte layout) from VM docs vs Prometheus TSDB format.
- Produce a **brief annotated benchmark** plan you can run (prometheus + remote_write → VM) with metrics to capture (disk, RAM, CPU, query latency) and expected outcomes.

Which of those would you like me to prepare now?

[1]: https://docs.victoriametrics.com/?utm_source=chatgpt.com "Welcome to VictoriaMetrics Docs"
[2]: https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/?utm_source=chatgpt.com "VictoriaMetrics: Single-node version"
[3]: https://victoriametrics.com/blog/vmstorage-retention-merging-deduplication/?utm_source=chatgpt.com "How vmstorage Processes Data: Retention, Merging, ..."
[6]: https://prometheus.io/docs/prometheus/latest/storage/?utm_source=chatgpt.com "Storage"
[7]: https://www.pingcap.com/blog/tidb-observability-migrating-prometheus-victoriametrics/?utm_source=chatgpt.com "Moving from Prometheus to VictoriaMetrics"
[8]: https://alexmarket.medium.com/victoriametrics-a-look-inside-its-innards-b00eaa2a1e32?utm_source=chatgpt.com "VictoriaMetrics: a look inside its innards. | by Alex - Medium"

---

# Architecture Deep Dive: VictoriaMetrics vs. Prometheus

## Executive Summary

While both systems share the same goal (ingesting and querying metrics), their internal architectures optimize for completely different patterns.

* **Prometheus** is optimized for **fast recent reads**. It sacrifices memory and write flexibility to keep the most recent data (the "Head") instantly accessible in RAM.
* **VictoriaMetrics** is optimized for **high-volume writes and storage efficiency**. It uses a Log-Structured Merge (LSM) tree approach (similar to ClickHouse or LevelDB) to decouple ingestion from storage, making it resilient to high churn and massive scale.

## 1. The Indexing Engine: Inverted Index vs. LSM Tree

The most critical difference lies in how they map a label (e.g., `pod="A"`) to the data on disk.

### Prometheus: The Classic Inverted Index

Prometheus uses a search-engine-style index designed for **immutable blocks**.

* **Structure:**
  * **Posting Lists:** A sorted list of Series IDs for every label/value pair.
  * **Offset Table:** A lookup table pointing to where each Posting List begins on disk.
* **The Workflow:**
  * **Write:** New series live in the **Head Block** (RAM). When the head is compacted into a persistent block (roughly every 2h), Prometheus must write out a complete, sorted index for that block in one pass.
  * **High Churn Penalty:** If you add 100k ephemeral pods, the Head Block explodes in size because it must track every mapping in memory, and every flush rewrites the entire Posting List structure, leading to massive I/O and CPU spikes. A toy model of this layout follows.
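To make that flush cost concrete, here is a toy, self-contained Go sketch of the posting-list idea. The types and names are invented for illustration (this is not the actual Prometheus `tsdb/index` code): each label/value pair owns a sorted list of series IDs, a selector intersects those lists, and writing a block means serializing every list again in full.

```go
// Toy model of a posting-list index: label/value pair -> sorted series IDs.
// Not the real Prometheus on-disk format; it only illustrates why high churn
// makes head-block flushes expensive (every list must be rewritten in full).
package main

import (
	"fmt"
	"sort"
)

// postings maps a `label=value` pair to a sorted slice of series IDs.
type postings map[string][]uint64

func (p postings) add(labelPair string, seriesID uint64) {
	ids := append(p[labelPair], seriesID)
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] }) // keep the list sorted
	p[labelPair] = ids
}

// intersect returns series IDs present in both sorted lists, which is the core
// of evaluating a selector such as {job="api", pod="A"}.
func intersect(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

func main() {
	p := postings{}
	p.add(`job=api`, 1)
	p.add(`job=api`, 5)
	p.add(`pod=A`, 5)
	p.add(`pod=A`, 9)
	fmt.Println(intersect(p[`job=api`], p[`pod=A`])) // -> [5]
	// Writing a block means serializing every posting list again, sorted and
	// complete, which is where churn-heavy head blocks hurt.
}
```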

### VictoriaMetrics: The Key-Value LSM Tree

VictoriaMetrics does **not** use a separate index file format. It treats index entries as simple key-value pairs in an LSM tree (the `mergeset`).

* **Structure:**
  * **Keys:** `Prefix + LabelName + LabelValue + MetricID`.
  * **Storage:** These keys are appended to a log structure and sorted in the background.
* **The Workflow:**
  * **Write:** Adding a new series is just an **append-only** operation. VM writes a new key to the end of the LSM tree.
  * **High Churn Advantage:** There is no "read-modify-write" penalty. Creating 100k new pods just means writing 100k small keys. The heavy lifting (sorting/merging) happens lazily in the background.

```mermaid
flowchart TD
    subgraph "Prometheus: Rigid Structure"
        A["Label: pod='A'"] -->|Lookup Offset| B["Posting List"]
        B -->|Contains| C["SeriesID_1, SeriesID_5, ..."]
        C -->|Must rewrite entire list on update| D["High Churn = Pain"]
    end

    subgraph "VictoriaMetrics: LSM Stream"
        X["New Series: pod='A'"] -->|Append Key| Y["LSM Tree Log"]
        Y -->|Key: pod=A + MetricID_1| Z["Background Merge"]
        Z -->|No Read penalty on Write| W["High Churn = Easy"]
    end
```
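For contrast, here is an equally minimal Go sketch of the key-value index idea above. The prefix byte, key layout, and helper names are assumptions made for illustration, not the real `mergeset`-based implementation: registering a series appends flat keys, and a lookup is a prefix scan over the (eventually) sorted key space.

```go
// Toy model of an LSM-style index: each entry is one flat key of the form
// prefix + labelName + labelValue + metricID. Appending never touches
// existing entries; background merges keep the key space sorted.
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
	"strings"
)

// prefixTagToMetricIDs is a made-up namespace byte for "tag -> metricID" entries.
const prefixTagToMetricIDs = 2

// indexKey builds `prefix + labelName + labelValue + metricID` as a single key.
func indexKey(name, value string, metricID uint64) string {
	var id [8]byte
	binary.BigEndian.PutUint64(id[:], metricID) // big-endian so keys sort by ID
	return fmt.Sprintf("%c%s=%s\x00%s", prefixTagToMetricIDs, name, value, id[:])
}

func main() {
	// "Registering" three new series is three appends; nothing is rewritten.
	entries := []string{
		indexKey("pod", "A", 101),
		indexKey("pod", "B", 102),
		indexKey("pod", "A", 103),
	}
	sort.Strings(entries) // done lazily by background merges in the real system

	// Lookup pod="A": scan every key that shares the prefix.
	prefix := fmt.Sprintf("%c%s=%s\x00", prefixTagToMetricIDs, "pod", "A")
	for _, k := range entries {
		if strings.HasPrefix(k, prefix) {
			metricID := binary.BigEndian.Uint64([]byte(k[len(prefix):]))
			fmt.Println("pod=A ->", metricID) // 101, then 103
		}
	}
}
```

Putting the MetricID at the end of the key (big-endian) keeps all entries for the same label pair adjacent once sorted, which is what makes the prefix scan cheap.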

## 2. The I/O Path: WAL vs. Compressed Parts

This explains why VictoriaMetrics has "smoother" disk usage despite flushing more frequently.

### Prometheus: The WAL (Write-Ahead Log)

* **Strategy:** Immediate durability via a WAL file.
* **Write Pattern:** Every sample is appended to the current WAL segment on disk.
* **Compression:** Low (at most lightweight record compression, chosen for speed).
* **Payload:** Large (near-raw bytes).
* **Fsync:** Infrequent (usually on segment rotation or checkpoint); in between, durability relies on the OS page cache.
* **Consequence:** High "Write Amplification": the disk is constantly busy writing raw data, and then gets hammered every 2 hours when the Head Block flushes and compaction kicks in. The sketch below mimics this write path.
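A minimal Go sketch of that WAL-style write path, under stated assumptions: the fixed-size record layout is invented (the real Prometheus WAL encoding differs), but it shows the I/O shape, near-raw appends with no per-write fsync.

```go
// Sketch of a WAL-style append path: every sample becomes a near-raw record
// appended to a segment file, and fsync is deferred to rotation/checkpoints.
// The record layout here is made up for illustration.
package main

import (
	"encoding/binary"
	"math"
	"os"
)

type sample struct {
	seriesID uint64
	ts       int64 // unix milliseconds
	value    float64
}

// appendToWAL writes one fixed-size, uncompressed record. No fsync per write:
// durability in between relies on the OS page cache.
func appendToWAL(f *os.File, s sample) error {
	var rec [24]byte
	binary.LittleEndian.PutUint64(rec[0:8], s.seriesID)
	binary.LittleEndian.PutUint64(rec[8:16], uint64(s.ts))
	binary.LittleEndian.PutUint64(rec[16:24], math.Float64bits(s.value))
	_, err := f.Write(rec[:])
	return err
}

func main() {
	f, err := os.OpenFile("wal-segment-000001", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	for i := 0; i < 10000; i++ {
		if err := appendToWAL(f, sample{seriesID: 42, ts: int64(i) * 15000, value: 42.0}); err != nil {
			panic(err)
		}
	}
	f.Sync() // in the real system this happens only on rotation or checkpoint
}
```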

### VictoriaMetrics: The Buffered Flush

* **Strategy:** Periodic durability via compressed micro-parts.
* **Write Pattern:** Data is buffered in RAM (`inmemoryPart`) and flushed every ~1-5 seconds.
* **Compression:** High (Gorilla-style time-series encoding plus ZSTD).
* **Payload:** Tiny (data is compressed *before* it is written).
* **Fsync:** **Frequent** (on every flush).
* **Consequence:** Even though VM calls `fsync` every few seconds, the **payload is so small** (e.g., ~50 KB vs ~2 MB) that modern SSDs handle it effortlessly. This avoids the "Stop the World" I/O spikes seen in Prometheus (sketched below, after the trade-off note).

**Critical Trade-off:** VictoriaMetrics sacrifices the last ~5 seconds of data (held in the RAM buffer) in the event of a hard crash (`kill -9`, power loss) to achieve this massive I/O gain. Prometheus recovers everything via WAL replay.
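And a matching sketch of the buffered-flush pattern. This is not VictoriaMetrics' actual part format: standard-library gzip stands in for the Gorilla-plus-ZSTD encoding and the file naming is made up, but the I/O shape is the point, small compressed payloads with one fsync per flush.

```go
// Sketch of a buffered-flush write path: samples accumulate in RAM, then each
// flush compresses them, writes a small immutable "part" file and fsyncs it.
// Encoding and file layout are placeholders, not the real vmstorage format.
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"math"
	"os"
	"time"
)

type sample struct {
	ts    int64
	value float64
}

// inMemoryBuffer stands in for the in-memory part that accumulates recent samples.
type inMemoryBuffer struct {
	buf bytes.Buffer
}

func (b *inMemoryBuffer) add(s sample) {
	var rec [16]byte
	binary.LittleEndian.PutUint64(rec[0:8], uint64(s.ts))
	binary.LittleEndian.PutUint64(rec[8:16], math.Float64bits(s.value))
	b.buf.Write(rec[:])
}

// flush compresses the buffered samples, writes them out as a small part file
// and fsyncs it, then resets the buffer. A hard crash loses only whatever is
// still sitting in the buffer.
func (b *inMemoryBuffer) flush(name string) error {
	if b.buf.Len() == 0 {
		return nil
	}
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	if _, err := zw.Write(b.buf.Bytes()); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}

	f, err := os.Create(name)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.Write(compressed.Bytes()); err != nil {
		return err
	}
	if err := f.Sync(); err != nil { // fsync on every flush: frequent, but tiny
		return err
	}
	fmt.Printf("%s: %d raw bytes -> %d bytes on disk\n", name, b.buf.Len(), compressed.Len())
	b.buf.Reset()
	return nil
}

func main() {
	var buf inMemoryBuffer
	// The real system flushes from a background goroutine every ~1-5 seconds;
	// three fixed batches are enough to show the payload sizes here.
	for part := 1; part <= 3; part++ {
		for i := 0; i < 5000; i++ {
			buf.add(sample{ts: time.Now().UnixMilli(), value: float64(i % 100)})
		}
		if err := buf.flush(fmt.Sprintf("part-%04d.bin", part)); err != nil {
			panic(err)
		}
	}
}
```

Compared with the WAL sketch above, the bytes that reach disk per flush are a fraction of the raw sample size, which is why frequent fsyncs stay cheap.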

## 3. Resource Efficiency: Why VM is "Lighter"

| Feature | Prometheus (Single Node) | VictoriaMetrics (Single Node) |
| :--- | :--- | :--- |
| **RAM Usage** | **High.** Scales with Active Series + Ingestion Rate (Head Block bloat). | **Low.** Scales with Cache Size. Writes don't require massive RAM buffers. |
| **CPU Usage** | **High.** Uses 1 Goroutine per target. Garbage Collector struggles with millions of small objects in Head. | **Optimized.** Uses a fixed-size worker pool. Optimized code reduces memory allocations, lightening the load on the GC. |
| **Disk Space** | **Standard.** ~1.5 bytes per sample. | **Ultra-Low.** ~0.4 bytes per sample. Precision reduction + better compression algorithms. |
| **Operation** | **Spiky.** Periodic heavy loads (Compaction/GC). | **Smooth.** Continuous small background merges. |
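As a back-of-the-envelope check on the disk-space row (the per-sample figures above are rough averages, and the fleet size here is a made-up example): 1 million active series scraped every 15 s produce 5,760 samples per series per day, i.e. about 5.8 billion samples per day. Over 30 days that is roughly 173 billion samples: about 260 GB at 1.5 bytes per sample versus about 70 GB at 0.4 bytes per sample.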

## Summary Recommendation

* **Stick with Prometheus if:** You have a small-to-medium, fairly static environment, you want to stay on the reference implementation of the Prometheus ecosystem, and you cannot tolerate even a few seconds of data loss on a crash.
* **Switch to VictoriaMetrics if:**
  1. **High Churn:** You run Kubernetes with frequent deployments or auto-scaling.
  2. **Long Retention:** You need to store months or years of data cheaply.
  3. **Performance Issues:** Your Prometheus is OOMing or using too much CPU.

### Final "Under the Hood" Visualization

As discussed earlier, the resource-usage pattern over time sums it up: Prometheus shows a "sawtooth" (building up to a flush and compaction), whereas VictoriaMetrics stays close to a flat line, which makes capacity planning for production clusters far easier.