Commit 44901d3

add Thanos note
1 parent 9c82439 commit 44901d3

2 files changed

+94
-0
lines changed
File renamed without changes.

thanos/guideline.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Thanos Guideline

## 1. Sizing

Source: <https://krisztianfekete.org/sizing-thanos-receive-and-prometheus-storage/>

## 2. Operation note

Source: <https://blog.devops.dev/streamlining-long-term-storage-query-performance-for-metrics-with-thanos-b44419c70cc4>

> Do one component at a time

### 2.1. Thanos query

- Query is responsible for handling queries against all Store API endpoints.
- By default, it relies on the **Prometheus PromQL engine**, which, as soon as a query starts, plans a series of operations in a tree-like structure to assemble the data to be returned.
- More recently, the [**Thanos Query PromQL Engine**](https://github.com/thanos-community/promql-engine) was introduced. It retrieves the data in a distributed way and thus speeds up query execution.
- In my case, resource usage dropped significantly after switching to the Thanos PromQL Engine. If resource usage is a problem for you, it is worth a try.

```shell
--query.promql-engine=thanos
```

### 2.2. Thanos compact

- Compact compacts and downsamples metrics, but instead of working from memory, it works on the blocks in your object storage of choice.
- This is necessary for querying longer periods of time, since downsampling reduces the amount of data retrieved. Compact is also the component where you define the retention of metrics in storage.

```shell
--retention.resolution-raw=90d
--retention.resolution-5m=180d
--retention.resolution-1h=1y
```

- If Compact runs out of blocks to work on, it exits. To keep it running and waiting for more blocks to handle, set the `--wait` flag; add `--wait-interval` if you prefer to specify the interval between runs.
- Always keep only **one instance running** to handle compaction and downsampling for a given Prometheus/object storage bucket.
- Always ensure it has enough disk space to avoid [halting](https://thanos.io/tip/components/compact.md/#halting) due to lack of space.
- Check the amount of resources:
  - The number of CPUs should match the `--compact.concurrency` flag.
  - Memory usage is related to the number of blocks.

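
The Compact notes above can be pulled together into a single argument list. The following is a minimal sketch, not a recommended production setup: the helper names `cpu_count` and `compact_args`, the data directory, and the bucket config path are all hypothetical placeholders, and the concurrency default simply mirrors the CPU count reported by the host.

```shell
#!/bin/sh
# Sketch: print a `thanos compact` argument list whose concurrency
# matches the number of CPUs. Paths below are placeholders (assumptions).

# cpu_count prints the number of online CPUs, falling back to 1.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

# compact_args [CONCURRENCY] prints one flag per line.
# Without an argument, concurrency defaults to the CPU count.
compact_args() {
  concurrency="${1:-$(cpu_count)}"
  printf '%s\n' \
    "--data-dir=/var/thanos/compact" \
    "--objstore.config-file=/etc/thanos/bucket.yml" \
    "--wait" \
    "--wait-interval=5m" \
    "--compact.concurrency=${concurrency}" \
    "--retention.resolution-raw=90d" \
    "--retention.resolution-5m=180d" \
    "--retention.resolution-1h=1y"
}
```

It could be used as `thanos compact $(compact_args 2)`. In a container with a CPU limit, passing the limit explicitly is probably safer than relying on the detected host CPU count.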
### 2.3. Thanos store

- Store is responsible for retrieving data from object storage based on the query's time range.
- To distribute query load better, you can shard Store by time with the `--min-time` and `--max-time` flags, so that each time window has a dedicated Store. Relative values are durations measured from the current time, so windows in the past use negative durations.

```shell
# Store 1: from 30 days ago until now
--min-time=-30d
# Store 2: from 90 days ago to 30 days ago
--min-time=-90d
--max-time=-30d
# Store 3: from 1 year ago to 90 days ago
--min-time=-1y
--max-time=-90d
```
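
When there are more than a few windows, the flag pairs can be generated from a list of boundaries so that adjacent windows always share an edge. This is a small sketch under the assumption that every window lies in the past, so each edge is written as a negative duration relative to the current time; `store_windows` is a hypothetical helper name, not part of Thanos.

```shell
#!/bin/sh
# store_windows EDGE... prints one line of flags per Store instance.
# Edges are ages ordered from newest to oldest, e.g. 30d 90d 1y.
store_windows() {
  prev=""   # newer edge of the current window; empty means "until now"
  for edge in "$@"; do
    if [ -z "$prev" ]; then
      printf '%s\n' "--min-time=-${edge}"
    else
      printf '%s\n' "--min-time=-${edge} --max-time=-${prev}"
    fi
    prev="$edge"
  done
}
```

For example, `store_windows 30d 90d 1y` prints three lines, one per Store, and the second line is `--min-time=-90d --max-time=-30d`.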

- You can also shard in a similar way to Prometheus, using relabeling with `hashmod`, or just `keep` or `drop` relabel actions.

```shell
--selector.relabel-config=
  - action: hashmod
    source_labels: ["__block_id"]
    target_label: shard
    modulus: 2
  - action: keep
    source_labels: ["shard"]
    regex: 0
```
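
With hashmod sharding, every Store instance gets the same `hashmod` rule and differs only in which shard number it keeps. A sketch of generating that per-shard config follows; `shard_relabel_config` is a hypothetical helper, and it assumes shards are numbered from 0 to modulus minus 1.

```shell
#!/bin/sh
# shard_relabel_config INDEX MODULUS prints the relabel YAML for one shard.
# INDEX must satisfy 0 <= INDEX < MODULUS.
shard_relabel_config() {
  index="$1"
  modulus="$2"
  if [ "$index" -lt 0 ] || [ "$index" -ge "$modulus" ]; then
    echo "shard index must be in [0, modulus)" >&2
    return 1
  fi
  cat <<EOF
- action: hashmod
  source_labels: ["__block_id"]
  target_label: shard
  modulus: ${modulus}
- action: keep
  source_labels: ["shard"]
  regex: ${index}
EOF
}
```

The second of two Stores could then be started with `--selector.relabel-config="$(shard_relabel_config 1 2)"`, while the first uses index 0.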

- By default, Store uses an in-memory cache; you can configure Memcached or Redis to handle caching instead.

### 2.4. Thanos query frontend

- Query Frontend breaks your queries into multiple shorter queries. It also has caching built in, or can use Memcached or Redis as the cache backend.

```yaml
type: MEMCACHED
config:
  addresses: [memcache-host:port]
  timeout: 3s
  max_idle_connections: 1024
  max_async_concurrency: 20
  max_item_size: 30MB
  max_async_buffer_size: 10000
  max_get_multi_concurrency: 200
  max_get_multi_batch_size: 0
  dns_provider_update_interval: 10s
  expiration: 24h
```
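
To make the query splitting mentioned above concrete, here is a simplified model of how a long range query could be cut into shorter ones. It is only a sketch: `split_range` is a hypothetical helper, timestamps are plain Unix seconds, and it assumes splits fall on multiples of the interval. The real Query Frontend controls the interval with `--query-range.split-interval` and handles details that this model ignores.

```shell
#!/bin/sh
# split_range START END INTERVAL prints "sub_start sub_end" pairs (seconds).
# Sub-ranges are cut at multiples of INTERVAL (an assumption of this sketch).
split_range() {
  start="$1"
  end="$2"
  interval="$3"
  if [ "$interval" -le 0 ]; then
    echo "interval must be positive" >&2
    return 1
  fi
  cur="$start"
  while [ "$cur" -lt "$end" ]; do
    # Next boundary that is a multiple of the interval.
    next=$(( (cur / interval + 1) * interval ))
    if [ "$next" -gt "$end" ]; then
      next="$end"
    fi
    printf '%s %s\n' "$cur" "$next"
    cur="$next"
  done
}
```

For instance, `split_range 3600 90000 86400` yields two sub-ranges, `3600 86400` and `86400 90000`, each of which can be executed and cached independently.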

### 2.5. Thanos receiver

- Hashring!
