Commit f9529d0

Add matching fairness simulator (#8158)
## What changed?
Add a new CLI tool to run the matching fairness simulator, and some tests for basic fairness behavior.

## Why?
So users can tell how it will behave for their workloads.
1 parent 71658b9 commit f9529d0

6 files changed

Lines changed: 961 additions & 0 deletions

Makefile

Lines changed: 4 additions & 0 deletions
@@ -356,6 +356,10 @@ tdbg: $(ALL_SRC)
 	@printf $(COLOR) "Build tdbg with CGO_ENABLED=$(CGO_ENABLED) for $(GOOS)/$(GOARCH)..."
 	CGO_ENABLED=$(CGO_ENABLED) go build $(BUILD_TAG_FLAG) -o tdbg ./cmd/tools/tdbg
 
+fairsim: $(ALL_SRC)
+	@printf $(COLOR) "Build fairsim with CGO_ENABLED=$(CGO_ENABLED) for $(GOOS)/$(GOARCH)..."
+	CGO_ENABLED=$(CGO_ENABLED) go build $(BUILD_TAG_FLAG) -o fairsim ./cmd/tools/fairsim
+
 temporal-cassandra-tool: $(ALL_SRC)
 	@printf $(COLOR) "Build temporal-cassandra-tool with CGO_ENABLED=$(CGO_ENABLED) for $(GOOS)/$(GOARCH)..."
 	CGO_ENABLED=$(CGO_ENABLED) go build $(BUILD_TAG_FLAG) -o temporal-cassandra-tool ./cmd/tools/cassandra

cmd/tools/fairsim/main.go

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
package main

import (
	"fmt"
	"os"

	"go.temporal.io/server/tools/fairsim"
)

func main() {
	if err := fairsim.RunTool(os.Args[1:]); err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		os.Exit(1)
	}
}

fairsim

3.01 MB
Binary file not shown.

tools/fairsim/README.md

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@

# fairsim

This tool simulates the behavior of Temporal's fair task queues, for use in
evaluating different parameters.

## Build
Run `make fairsim`.

## Usage

There are two main modes:

- **Generation**: Generates tasks by random distribution and then dispatches
  them.
- **Script**: Processes a script with queue pushes and pops. This can be used to
  test with actual data or a custom distribution, and with continuous task
  creation/dispatch.

### Generation

By default, fairsim generates tasks with fairness keys following a Zipf
distribution.

Examples:

```bash
# Generate a million tasks with 20 keys
fairsim -- -tasks=1000000 -keys=20

# Generate a million tasks with a much more lopsided distribution
fairsim -- -tasks=1000000 -keys=50 -zipf-s=3 -zipf-v=1.1

# Try alternate counter parameters
for w in 1 10 100 1000 10000; do
  fairsim -counter-params <(echo '{"CMS":{"W":'$w'}}') -- -tasks=1000000 -keys=1000 | grep p90s: | tail -1
done

# Disable fairness to compare to FIFO
fairsim -fair=0 -- -tasks=500

# Use only one partition (default is 4)
fairsim -partitions=1 -- -tasks=500
```
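
To get a feel for how lopsided a Zipf distribution of fairness keys is, the following is a minimal sketch in Go. It assumes the `-zipf-s` and `-zipf-v` flags correspond to the `s` and `v` parameters of `math/rand.NewZipf` (this mapping is an assumption, not confirmed by this README), and the function name `zipfKeyCounts` is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// zipfKeyCounts draws `tasks` fairness keys from a Zipf distribution over
// `keys` distinct keys and returns how many tasks each key received.
// s and v are the parameters of math/rand's Zipf generator (s > 1, v >= 1).
// Hypothetical helper for illustration; not part of fairsim.
func zipfKeyCounts(seed int64, tasks, keys int, s, v float64) []int {
	r := rand.New(rand.NewSource(seed))
	z := rand.NewZipf(r, s, v, uint64(keys-1)) // yields values in [0, keys-1]
	counts := make([]int, keys)
	for i := 0; i < tasks; i++ {
		counts[z.Uint64()]++
	}
	return counts
}

func main() {
	counts := zipfKeyCounts(1, 100000, 20, 2.0, 2.0)
	for k, c := range counts {
		fmt.Printf("key%d: %d tasks\n", k, c)
	}
}
```

Larger `s` concentrates more of the tasks on the first few keys, which is the "heavy key versus light key" situation the statistics below are meant to evaluate.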

### Script

Examples:

```bash
# Priority order
{
  echo "task -pri=4 -payload four"
  echo "task -pri=2 -payload two"
  echo "task -pri=3 -payload three"
  echo "task -pri=5 -payload five"
  echo "task -pri=1 -payload one"
} | fairsim -script=/dev/stdin -partitions=1
# should see one, two, three, four, five

# Fairness
{
  echo "task -fkey a"
  echo "task -fkey a"
  echo "task -fkey a"
  echo "task -fkey a"
  echo "task -fkey b"
} | fairsim -script=/dev/stdin -partitions=1
# should see a, b, a, a, a

# Alternating
{
  echo "task -fkey a"
  echo "task -fkey a"
  echo "task -fkey a"
  echo "task -fkey a"
  echo "task -fkey b"
  echo poll # gets a
  echo poll # gets b
  echo "task -fkey c"
  echo "task -fkey c"
} | fairsim -script=/dev/stdin -partitions=1
# should see a, b, c, a, c, a, a

# Weight
{
  for i in {1..20}; do echo "task -fkey a"; done
  for i in {1..20}; do echo "task -fkey b -fweight 5"; done
} | fairsim -script=/dev/stdin -partitions=1
# should see five b's for each a until b's are done
```

## Interpreting output

By default, only the **Percentile of percentiles** section is printed. Add `-verbose` for full output.

### Task section

First, fairsim will print one line for each task dispatched:

```
task idx: 33 dsp: 15 lat: -18 pri: 3 fkey: "key1" fweight: 1 part: 2 payload:"external-key-32"
```

- `idx`: Creation index. Tasks are assigned an incrementing index as they are created.
- `dsp`: Dispatch index. The order in which this task was dispatched.
- `lat`: "Latency": dispatch index minus creation index. For a FIFO queue this
  would always be zero. Negative means the task was moved earlier compared to
  FIFO, positive means it was penalized.
- `pri`: Task priority.
- `fkey`: Fairness key.
- `fweight`: Fairness weight.
- `part`: Partition the task was assigned to.
- `payload`: User-defined payload (could be used for correlation).
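
The `lat` field can be recomputed from `idx` and `dsp`. The sketch below uses the values from the sample line above; the helper names `latency` and `describe` are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// latency returns the dispatch index minus the creation index.
func latency(idx, dsp int) int {
	return dsp - idx
}

// describe explains a latency value relative to FIFO order.
func describe(lat int) string {
	switch {
	case lat < 0:
		return "moved earlier than FIFO"
	case lat > 0:
		return "penalized relative to FIFO"
	default:
		return "same position as FIFO"
	}
}

func main() {
	// Values taken from the sample output line: idx 33, dsp 15.
	lat := latency(33, 15)
	fmt.Println(lat, describe(lat)) // prints: -18 moved earlier than FIFO
}
```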

### Statistics

Next are statistics on the "latency" values. In general, lower numbers are
better, since that means more tasks were moved ahead of where they would be in a
FIFO queue.

**Raw Latency Statistics**

Basic stats on the "latency" values. The mean is always zero, since the dispatch
order is a permutation of the creation order.
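
The zero-mean property follows because every index appears exactly once as a creation index and once as a dispatch index, so the two sums cancel. A small sketch, assuming zero-based indexes and that every created task is dispatched (the `latencies` and `sum` helpers are hypothetical):

```go
package main

import "fmt"

// latencies takes the creation indexes of tasks listed in dispatch order and
// returns dispatch index minus creation index for each dispatched task.
func latencies(dispatchOrder []int) []int {
	lats := make([]int, len(dispatchOrder))
	for dsp, idx := range dispatchOrder {
		lats[dsp] = dsp - idx
	}
	return lats
}

// sum adds up all values in xs.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	// Five tasks created as 0..4 with keys a, a, a, a, b,
	// dispatched in the order a, b, a, a, a.
	order := []int{0, 4, 1, 2, 3}
	lats := latencies(order)
	fmt.Println(lats, sum(lats)) // prints: [0 -3 1 1 1] 0
}
```

The single `b` task gains three positions, and that gain is paid for by three `a` tasks each slipping by one.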

**Normalized Latency Statistics**

When looking at per-key latencies, we expect a "heavy" fairness key with more
tasks to have worse latency than a "light" key, since the light key's tasks get
pushed in front of the heavy one. This is desirable, though, and doesn't really
reflect how much the heavy one was "penalized". We can normalize the latency by
dividing by the number of tasks for that key, and further normalize by the number
of keys.
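
One possible reading of this normalization is sketched below. It assumes that both steps are divisions (by the key's task count, then by the number of keys); the exact formula fairsim uses is not spelled out here, so treat `normalize` as a hypothetical illustration rather than the tool's implementation.

```go
package main

import "fmt"

// normalize divides a raw latency by the number of tasks for its key and
// then by the total number of keys. Assumption: both steps are divisions.
func normalize(rawLat float64, tasksForKey, numKeys int) float64 {
	return rawLat / float64(tasksForKey) / float64(numKeys)
}

func main() {
	const numKeys = 2
	// A heavy key with 900 tasks and a raw latency of 450,
	// and a light key with 10 tasks and a raw latency of -5.
	heavy := normalize(450, 900, numKeys)
	light := normalize(-5, 10, numKeys)
	fmt.Println(heavy, light) // prints: 0.25 -0.25
}
```

Under this reading, the heavy key's raw latency is 90 times larger in magnitude than the light key's, but the normalized values have the same magnitude, which makes keys of very different sizes comparable.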

Both the raw and normalized values can be useful to look at.

**Per-task Statistics**

The raw and normalized latency stats are printed for each key, along with the
count.

**Percentile of percentiles**

Raw and normalized percentile stats are printed to give a summary of latency
across different keys.

Rows are percentiles of latency for each key, and columns are percentiles across
those percentiles (counting each key once). E.g. the "p90s" row describes the
90th percentile latency for each key. The "@50" column of that is the median of
those 90th percentile latencies.
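
The two-level computation can be sketched as follows. This example assumes the nearest-rank percentile definition, which may differ from the method fairsim uses, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (0..100) of xs using the
// nearest-rank method. It returns 0 for an empty slice.
func percentile(xs []float64, p int) float64 {
	if len(xs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), xs...) // copy so the input is not reordered
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // integer ceiling of p/100 * n
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// percentileOfPercentiles computes the rowP-th percentile latency of each
// key, then the colP-th percentile across those per-key values, counting
// each key once.
func percentileOfPercentiles(byKey map[string][]float64, rowP, colP int) float64 {
	perKey := make([]float64, 0, len(byKey))
	for _, lats := range byKey {
		perKey = append(perKey, percentile(lats, rowP))
	}
	return percentile(perKey, colP)
}

func main() {
	byKey := map[string][]float64{
		"a": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
		"b": {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4},
		"c": {0, 0, 0, 0, 0, 0, 0, 0, 0, 20},
	}
	// "p90s" row, "@50" column: median of the per-key 90th percentiles.
	fmt.Println(percentileOfPercentiles(byKey, 90, 50)) // prints: 3
}
```

Here the per-key 90th percentiles are 9, 3, and 0, so the "@50" value is 3. Because each key counts once, a key with a million tasks has the same influence on a column as a key with ten.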
