This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Commit f0794cf

set default numchunks to 7 + fix description of ringbuffer
this leaves 60min of data for all series. + make the description of the ringbuffer and chunk cache more nuanced.
1 parent a0123f4 commit f0794cf

8 files changed, +76 -49 lines changed

docker/docker-cluster/metrictank.ini

+11-7
@@ -9,12 +9,14 @@ accounting-period = 5min
 ## data ##

 # see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for more details
+
 # duration of raw chunks. e.g. 10min, 30min, 1h, 90min...
+# must be valid value as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
 chunkspan = 10min
 # number of raw chunks to keep in in-memory ring buffer
-# note that the chunk-cache (settings further down) is a more effective method to cache data and alleviate workload for cassandra.
-# but this allows secondary nodes to keep serving data in case the primary is not able to save data upto chunkspan*numchunks")
-numchunks = 5
+# See https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for details and trade-offs, especially when compared to chunk-cache
+# (settings further down) which may be a more effective method to cache data and alleviate workload for cassandra.
+numchunks = 7
 # minimum wait before raw metrics are removed from storage
 ttl = 35d

@@ -33,11 +35,13 @@ warm-up-period = 1h
 # settings for rollups (aggregation for archives)
 # comma-separated list of archive specifications.
 # archive specification is of the form: aggSpan:chunkSpan:numChunks:TTL[:ready as bool. default true]
-# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false
-# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
-# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
+# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false you get:
+# - 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
+# - 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
 # When running a cluster of metrictank instances, all instances should have the same agg-settings.
-# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# Note:
+# * chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# * numchunks -like the global setting- has nuanced use compared to chunk cache. see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
 agg-settings =

 ## metric data storage in cassandra ##

docker/docker-dev-custom-cfg-kafka/metrictank.ini

+10-6
@@ -9,11 +9,13 @@ accounting-period = 5min
 ## data ##

 # see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for more details
+
 # duration of raw chunks. e.g. 10min, 30min, 1h, 90min...
+# must be valid value as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
 chunkspan = 2min
 # number of raw chunks to keep in in-memory ring buffer
-# note that the chunk-cache (settings further down) is a more effective method to cache data and alleviate workload for cassandra.
-# but this allows secondary nodes to keep serving data in case the primary is not able to save data upto chunkspan*numchunks")
+# See https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for details and trade-offs, especially when compared to chunk-cache
+# (settings further down) which may be a more effective method to cache data and alleviate workload for cassandra.
 numchunks = 2
 # minimum wait before raw metrics are removed from storage
 ttl = 35d
@@ -33,11 +35,13 @@ warm-up-period = 1h
 # settings for rollups (aggregation for archives)
 # comma-separated list of archive specifications.
 # archive specification is of the form: aggSpan:chunkSpan:numChunks:TTL[:ready as bool. default true]
-# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false
-# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
-# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
+# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false you get:
+# - 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
+# - 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
 # When running a cluster of metrictank instances, all instances should have the same agg-settings.
-# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# Note:
+# * chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# * numchunks -like the global setting- has nuanced use compared to chunk cache. see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
 agg-settings =

 ## metric data storage in cassandra ##

docs/config.md

+10-7
@@ -36,11 +36,12 @@ accounting-period = 5min
 ```
 # see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for more details
 # duration of raw chunks. e.g. 10min, 30min, 1h, 90min...
+# must be valid value as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
 chunkspan = 10min
 # number of raw chunks to keep in in-memory ring buffer
-# note that the chunk-cache (settings further down) is a more effective method to cache data and alleviate workload for cassandra.
-# but this allows secondary nodes to keep serving data in case the primary is not able to save data upto chunkspan*numchunks")
-numchunks = 5
+# See https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for details and trade-offs, especially when compared to chunk-cache
+# (settings further down) which may be a more effective method to cache data and alleviate workload for cassandra.
+numchunks = 7
 # minimum wait before raw metrics are removed from storage
 ttl = 35d
 # max age for a chunk before to be considered stale and to be persisted to Cassandra
@@ -56,11 +57,13 @@ warm-up-period = 1h
 # settings for rollups (aggregation for archives)
 # comma-separated list of archive specifications.
 # archive specification is of the form: aggSpan:chunkSpan:numChunks:TTL[:ready as bool. default true]
-# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false
-# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
-# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
+# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false you get:
+# - 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
+# - 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
 # When running a cluster of metrictank instances, all instances should have the same agg-settings.
-# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# Note:
+# * chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# * numchunks -like the global setting- has nuanced use compared to chunk cache. see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
 agg-settings =
 ```

docs/memory-server.md

+11-7
@@ -9,11 +9,9 @@ It has two mechanisms to support this: the ring buffers, and the chunk-cache. T

 The ring buffer is simply a list of chunks - one for each series - that holds the latest data for each series that has been ingested (or generated, for rollup series).
 You can configure how many chunks to retain (`numchunks`).
-* The main function of the ring buffers is to keep secondaries able to satisfy queries from RAM, even if the primary is not able to save its chunks instantly, or if the primary
-crashed and needs to be restarted. Effectively, the more data in your ring buffer, the longer outages of a primary you can sustain. (up to `(numchunks-1) * chunkspan` in duration)
-* For keeping a "hot cache" of frequently accessed data, this is not necessarily an effective solution, since the same `numchunks` is applied to all raw series
-(and aggregation settings are applied to all series in the same fashion, so a given rollup frequency will have the same `numchunks` for all series)
-So unless you're confident your metrics are all subject to queries of the same timeranges, and that they are predictable, you should look at the chunk cache below.
+The ring buffer can be useful to assure data that may be needed is in memory, in these cases:
+* you know a majority of your queries hits the most recent data of a given time window (e.g. last 2 hours, last day), you know this is unlikely to change and true for the vast majority of your metrics.
+* keep secondaries able to satisfy queries from RAM for the most recent data of cold (infrequently queried) series, even if the primary is not able to save its chunks instantly, if it crashed and needs to be restarted or if you're having a cassandra outage so that chunks can't be loaded or saved. Note that this does not apply for hot data: data queried frequently enough (at least as frequent as their chunkspan) will be added to the chunk cache automatically (see below) and not require cassandra lookups.

 Note:
 * the last (current) chunk is always a "work in progress", so depending on what time it is, it may be anywhere between empty and full.
@@ -22,10 +20,16 @@ Note:

 Both of these make it tricky to articulate how much data is in the ringbuffer for a given series. But `(numchunks-1) * chunkspan` is the conservative approximation which is valid in the typical case (a warmed up metrictank that's ingesting fresh data).

+For keeping a "hot cache" of frequently accessed data in a more flexible way, this is not an effective solution, since the same `numchunks` is applied to all raw series
+(and aggregation settings are applied to all series in the same fashion, so a given rollup frequency will have the same `numchunks` for all series)
+So unless you're confident your metrics are all subject to queries of the same timeranges, and that they are predictable, you should look at the chunk cache below.
+
 ### Chunk Cache

 The goal of the chunk cache is to offload as much read workload from cassandra as possible.
 Any data chunks fetched from Cassandra are added to the chunk cache.
+But also, more interestingly, chunks expired out of the ring buffers will automatically be added to the chunk cache if the chunk before it is also in the cache.
+In other words, for series we know to be "hot" (queried frequently enough so that their data is kept in the chunk cache) we will try to avoid a roundtrip to Cassandra before adding the chunks to the cache. This can be especially useful when it takes long for the primary to save data to cassandra, or when there is a cassandra outage.
 The chunk cache has a configurable [maximum size](https://github.com/raintank/metrictank/blob/master/docs/config.md#chunk-cache),
 within that size it tries to always keep the most often queried data by using an LRU mechanism that evicts the Least Recently Used chunks.

@@ -92,8 +96,8 @@ We plan to keep working on performance and memory management and hope to make th

 In principle, you need just 1 chunk for each series.
 However:
-* when the data stream moves into a new chunk, secondary nodes would drop the previous chunk and query Cassandra. But the primary needs some time to save the chunk to Cassandra. Based on your deployment this could take anywhere between milliseconds or many minutes. As you don't want to slam Cassandra with requests at each chunk clear, you should probably use a numchunks of 2, or a numchunks that lets you retain data in memory for however long it takes to flush data to cassandra.
-* The ringbuffers are a great tool to let you deal with crashes or outages of your primary node. If your primary went down, or for whatever reason cannot save data to Cassandra, then you won't even feel it if the ringbuffers can "clear the gap" between in memory data and older data in cassandra. So we advise to think about how fast your organisation could resolve a potential primary outage, and then set your parameters such that `(numchunks-1) * chunkspan` is more than that.
+* when the data stream moves into a new chunk, secondary nodes would drop the previous chunk and query Cassandra. But the primary needs some time to save the chunk to Cassandra. Based on your deployment this could take anywhere between milliseconds or many minutes. Possibly even an hour or more. As you don't want to slam Cassandra with requests at each chunk clear, you should probably use a numchunks of 2, or a numchunks that lets you retain data in memory for however long it takes to flush data to cassandra. (though the chunk cache alleviates this concern for hot data, see above).
+* The ringbuffers can be useful to let you deal with crashes or outages of your primary node. If your primary went down, or for whatever reason cannot save data to Cassandra, then you won't even feel it if the ringbuffers can "clear the gap" between in memory data and older data in cassandra. So we advise to think about how fast your organisation could resolve a potential primary outage, and then set your parameters such that `(numchunks-1) * chunkspan` is more than that. (again, with a sufficiently large cache, this is only a concern for cold data)

 #### Rollups remove the need to keep large number of higher resolution chunks
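
The rule this diff adds to the Chunk Cache section (a chunk expired from the ring buffer is cached only if the chunk before it is already cached) can be sketched as below. This is a toy model under stated assumptions, not metrictank's implementation: the types `chunkKey` and `chunkCache` and the methods `addFromStore` and `addIfHot` are hypothetical, chunks are identified only by series name and start timestamp, and the LRU eviction and size limit described in the docs are left out.

```go
package main

import "fmt"

// chunkKey identifies a chunk by series name and chunk start time (unix seconds).
// Hypothetical type for illustration only.
type chunkKey struct {
	series string
	t0     uint32
}

// chunkCache is a toy presence set; it has no LRU eviction and no size limit.
type chunkCache struct {
	span   uint32 // chunkspan in seconds
	chunks map[chunkKey]bool
}

func newChunkCache(span uint32) *chunkCache {
	return &chunkCache{span: span, chunks: make(map[chunkKey]bool)}
}

// addFromStore models "any data chunks fetched from Cassandra are added".
func (c *chunkCache) addFromStore(series string, t0 uint32) {
	c.chunks[chunkKey{series, t0}] = true
}

// addIfHot models a chunk expiring out of the ring buffer: it is cached only
// if the immediately preceding chunk of the same series is already cached.
// It reports whether the chunk was added.
func (c *chunkCache) addIfHot(series string, t0 uint32) bool {
	if t0 < c.span {
		return false // no preceding chunk can exist
	}
	if !c.chunks[chunkKey{series, t0 - c.span}] {
		return false // series is cold: previous chunk is not cached
	}
	c.chunks[chunkKey{series, t0}] = true
	return true
}

func main() {
	c := newChunkCache(600) // 10min chunkspan

	// A query pulled the 1200..1800 chunk of hot.series from the store.
	c.addFromStore("hot.series", 1200)

	fmt.Println(c.addIfHot("hot.series", 1800))  // true: previous chunk is cached
	fmt.Println(c.addIfHot("cold.series", 1800)) // false: nothing cached for it
}
```

In this model a hot series keeps extending its cached range without a store roundtrip, while a cold series still depends on the ring buffer for its most recent data.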

metrictank-sample.ini

+11-7
@@ -12,12 +12,14 @@ accounting-period = 5min
 ## data ##

 # see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for more details
+
 # duration of raw chunks. e.g. 10min, 30min, 1h, 90min...
+# must be valid value as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
 chunkspan = 10min
 # number of raw chunks to keep in in-memory ring buffer
-# note that the chunk-cache (settings further down) is a more effective method to cache data and alleviate workload for cassandra.
-# but this allows secondary nodes to keep serving data in case the primary is not able to save data upto chunkspan*numchunks")
-numchunks = 5
+# See https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for details and trade-offs, especially when compared to chunk-cache
+# (settings further down) which may be a more effective method to cache data and alleviate workload for cassandra.
+numchunks = 7
 # minimum wait before raw metrics are removed from storage
 ttl = 35d

@@ -36,11 +38,13 @@ warm-up-period = 1h
 # settings for rollups (aggregation for archives)
 # comma-separated list of archive specifications.
 # archive specification is of the form: aggSpan:chunkSpan:numChunks:TTL[:ready as bool. default true]
-# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false
-# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
-# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
+# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false you get:
+# - 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
+# - 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
 # When running a cluster of metrictank instances, all instances should have the same agg-settings.
-# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# Note:
+# * chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# * numchunks -like the global setting- has nuanced use compared to chunk cache. see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
 agg-settings =

 ## metric data storage in cassandra ##

metrictank.go

+1-1
@@ -58,7 +58,7 @@ var (

 // Data:
 chunkSpanStr = flag.String("chunkspan", "10min", "duration of raw chunks")
-numChunksInt = flag.Int("numchunks", 5, "number of raw chunks to keep in in-memory ring buffer. note that the chunk-cache is a more effective method to cache data and alleviate workload for cassandra. but this allows secondary nodes to keep serving data in case the primary is not able to save data upto chunkspan*numchunks")
+numChunksInt = flag.Int("numchunks", 7, "number of raw chunks to keep in in-memory ring buffer. See https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for details and trade-offs, especially when compared to chunk-cache")
 ttlStr = flag.String("ttl", "35d", "minimum wait before metrics are removed from storage")

 chunkMaxStaleStr = flag.String("chunk-max-stale", "1h", "max age for a chunk before to be considered stale and to be persisted to Cassandra.")

scripts/config/metrictank-docker.ini

+11-7
@@ -9,12 +9,14 @@ accounting-period = 5min
 ## data ##

 # see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for more details
+
 # duration of raw chunks. e.g. 10min, 30min, 1h, 90min...
+# must be valid value as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
 chunkspan = 10min
 # number of raw chunks to keep in in-memory ring buffer
-# note that the chunk-cache (settings further down) is a more effective method to cache data and alleviate workload for cassandra.
-# but this allows secondary nodes to keep serving data in case the primary is not able to save data upto chunkspan*numchunks")
-numchunks = 5
+# See https://github.com/raintank/metrictank/blob/master/docs/memory-server.md for details and trade-offs, especially when compared to chunk-cache
+# (settings further down) which may be a more effective method to cache data and alleviate workload for cassandra.
+numchunks = 7
 # minimum wait before raw metrics are removed from storage
 ttl = 35d

@@ -33,11 +35,13 @@ warm-up-period = 1h
 # settings for rollups (aggregation for archives)
 # comma-separated list of archive specifications.
 # archive specification is of the form: aggSpan:chunkSpan:numChunks:TTL[:ready as bool. default true]
-# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false
-# 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
-# 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
+# with these aggregation rules: 5min:1h:2:3mon,1h:6h:2:1y:false you get:
+# - 5 min of data, store in a chunk that lasts 1hour, keep 2 chunks in in-memory ring buffer, keep for 3months in cassandra
+# - 1hr worth of data, in chunks of 6 hours, 2 chunks in in-memory ring buffer, keep for 1 year, but this series is not ready yet for querying.
 # When running a cluster of metrictank instances, all instances should have the same agg-settings.
-# chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# Note:
+# * chunk spans must be valid values as described here https://github.com/raintank/metrictank/blob/master/docs/memory-server.md#valid-chunk-spans
+# * numchunks -like the global setting- has nuanced use compared to chunk cache. see https://github.com/raintank/metrictank/blob/master/docs/memory-server.md
 agg-settings =

 ## metric data storage in cassandra ##
