Commit d495058

prometheus
1 parent 7b77ce7 commit d495058

2 files changed: +26 -1 lines changed

prometheus/qa.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
# (Dumb) Question and Answer

## 1. The 2-Hour problem

Prometheus stores incoming samples in a mutable, in-memory structure called the Head Block. With the default settings, data stays there for about 2 hours (in practice the head can grow to roughly 3 hours before its oldest 2 hours are cut) before it is written to disk as a persistent, immutable "Block".

- Without a WAL: If Prometheus crashed, you would lose everything that existed only in RAM, which is roughly the last 2 hours of data.
- With a WAL: Every sample is also appended to the Write-Ahead Log (WAL) on disk as it is ingested. After a crash, Prometheus replays the WAL on startup to rebuild that in-memory data.
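
The log-then-replay idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: `TinyHead`, its JSON-lines log format, and the `demo` helper are hypothetical names invented for illustration. The real Prometheus WAL uses segmented binary records and does not fsync after every sample.

```python
import json
import os
import tempfile


class TinyHead:
    """Toy in-memory head: series name -> list of (timestamp, value)."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.series = {}
        self._replay()  # rebuild memory from the log, if one exists
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        # Read every logged record back, in order, into memory.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self.series.setdefault(rec["series"], []).append((rec["t"], rec["v"]))

    def append(self, series, t, v):
        # Write to the log first, then update memory, so anything held
        # in memory can always be recovered from disk.
        self._wal.write(json.dumps({"series": series, "t": t, "v": v}) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())  # simplification: sync on every sample
        self.series.setdefault(series, []).append((t, v))

    def close(self):
        self._wal.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    head = TinyHead(path)
    head.append('cpu{host="x"}', 1000, 0.5)
    head.append('cpu{host="x"}', 1015, 0.7)
    head.close()  # pretend the process died: the in-memory dict is discarded

    recovered = TinyHead(path)  # a fresh "process" replays the log
    result = recovered.series
    recovered.close()
    return result
```

Calling `demo()` returns the same two samples that were appended before the simulated crash, which is the whole point of the WAL: memory is disposable as long as the log survives.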

## 2. Why doesn't Prometheus just flush to disk often (like VictoriaMetrics)?

The answer lies in **Indexing and Compression**.

- Prometheus Approach: It keeps an **Inverted Index** (label pairs mapped to the series that carry them) in RAM for the Head Block, which makes label-based lookups fast. Persisting that state means building a complete block: a sorted on-disk index plus the chunk files. That is a comparatively expensive operation. Doing it every second, or even every minute, would likely produce a flood of tiny blocks and far more disk I/O and compaction work, so Prometheus batches the work into 2-hour blocks.
- VictoriaMetrics Approach: It uses an architecture similar to an **LSM Tree (Log-Structured Merge-tree)**. It writes data in small, simple "parts" that are cheap to flush to disk and merges them later in the background. This makes frequent flushes practical, which is how it gets by without a WAL, but it requires a different approach to indexing (which VM handles with separate index structures such as mergeset).
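
To make "inverted index" concrete, here is a minimal sketch, assuming a toy model in which each series is just a dict of labels. `PostingsIndex`, `add_series`, and `select` are hypothetical names for illustration, not Prometheus APIs. The real index also handles regex matchers, symbol tables, and an on-disk layout.

```python
from collections import defaultdict


class PostingsIndex:
    """Toy inverted index: (label name, label value) -> list of series IDs."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.series = []  # series ID -> label dict

    def add_series(self, labels):
        series_id = len(self.series)
        self.series.append(dict(labels))
        for pair in labels.items():
            # IDs are handed out in increasing order, so each list stays sorted.
            self.postings[pair].append(series_id)
        return series_id

    def select(self, matchers):
        """Return the IDs of series that match every name=value pair."""
        lists = [self.postings.get(pair, []) for pair in matchers.items()]
        if not lists:
            return list(range(len(self.series)))
        result = set(lists[0])
        for ids in lists[1:]:
            result &= set(ids)  # intersect the postings lists
        return sorted(result)


idx = PostingsIndex()
idx.add_series({"__name__": "http_requests_total", "job": "api", "instance": "a"})
idx.add_series({"__name__": "http_requests_total", "job": "api", "instance": "b"})
idx.add_series({"__name__": "http_requests_total", "job": "db", "instance": "c"})

api_series = idx.select({"__name__": "http_requests_total", "job": "api"})
```

A query never scans every series; it looks up one postings list per matcher and intersects them. The cost is on the write side: every new series touches several lists, and writing the whole structure out in sorted form is the part that is expensive to do often.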

## 3. Why doesn't Prometheus use the WAL for everything?

The WAL is a simple sequential log (like a text file of "Event A, Event B, Event C").

- Terrible for Reading: The WAL has no index. To find "CPU usage for Server X at 2:00 PM," you would have to scan the log from start to finish, which is far too slow for queries.
- Terrible for Space: WAL records are uncompressed or only lightly compressed (Snappy, when WAL compression is enabled).
- The Solution: Roughly every 2 hours, Prometheus takes that data, groups the samples by series into chunks (so a query reads only the series it needs), stores them with XOR (Gorilla-style) compression, and saves the result together with its index as a Block. This makes queries fast and disk usage low.
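
The intuition behind XOR compression can be shown with a short sketch. It is not the actual chunk encoding (which also writes control bits and leading-zero counts, and compresses timestamps separately); it only shows why XOR-ing consecutive values leaves very few bits to store. The helper names `float_bits`, `xor_stream`, and `meaningful_bits` are made up for this example.

```python
import struct


def float_bits(x):
    """Return the 64-bit IEEE 754 pattern of a float as an int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stream(values):
    """XOR each value's bit pattern with the previous value's pattern."""
    out = []
    prev = 0
    for v in values:
        bits = float_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def meaningful_bits(x):
    """Count the bits between the first and last set bit (0 if x is 0)."""
    if x == 0:
        return 0
    trailing_zeros = (x & -x).bit_length() - 1
    return x.bit_length() - trailing_zeros


samples = [12.0, 12.0, 24.0, 24.0]
widths = [meaningful_bits(x) for x in xor_stream(samples)]
```

For these samples `widths` is `[12, 0, 1, 0]`: a repeated value XORs to zero, and a small change flips only a narrow window of bits. Metrics that change slowly therefore need only a few bits per sample instead of 64.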

prometheus/tsdb.md

Lines changed: 3 additions & 1 deletion
@@ -2,5 +2,7 @@

 Source:

-- <https://youtu.be/35TAaAESLcc>
+- <https://youtu.be/LOZQFT8Dcq0>
 - <https://ganeshvernekar.com/blog/prometheus-tsdb-the-head-block/>
+- <https://web.archive.org/web/20220205173824/https://fabxc.org/tsdb/>
+- <https://www.usenix.org/conference/srecon22apac/presentation/vernekar>
