Description
Requirements
- The benchmarks should be written entirely in Rust.
- The benchmarks should be portable and not rely on the presence of platform-defined dictionary files.
- The benchmarks should be runnable with specific parameters:
  - Number of input lines
  - Fraction of duplicates
  - Distribution of input line length
  - Character set (binary/text)
- The benchmarks should still be able to run against all the preexisting commands (sort|uniq).
Design
A CLI application should be written that produces a set of random tokens according to the parameters specified on the CLI:
genbench --charset ascii/binary --delim CHAR --number NUM --duplicates PERCENTAGE --short LEN --long LEN
The short/long parameters each indicate the 90th percentile of string lengths, using a Gaussian distribution.
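A minimal sketch of the token generator behind such a CLI, with argument parsing left out. Everything here is hypothetical: the `Params` struct, the `generate` function and the hand-rolled xorshift generator (used only so the sketch needs no external crates) are illustrative names, not existing code. It also assumes one possible reading of the short/long parameters, namely that about 90% of the token lengths fall between `short` and `long`.

```rust
use std::io::Write;

/// Tiny xorshift64* generator so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform float in (0, 1]; never zero, so `ln` below stays finite.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_f64(), self.next_f64());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical mirror of the proposed command-line flags.
struct Params {
    binary: bool,        // --charset binary (otherwise printable ASCII)
    delim: u8,           // --delim
    number: usize,       // --number
    duplicates_pct: f64, // --duplicates
    short: usize,        // --short
    long: usize,         // --long
}

fn generate(p: &Params, seed: u64) -> Vec<Vec<u8>> {
    let mut rng = Rng(seed | 1); // xorshift state must be non-zero
    // Assumption: short and long are the 5th and 95th percentiles,
    // which lie 1.645 standard deviations either side of the mean.
    let mean = (p.short + p.long) as f64 / 2.0;
    let sigma = (p.long as f64 - p.short as f64) / (2.0 * 1.645);
    let mut tokens: Vec<Vec<u8>> = Vec::with_capacity(p.number);
    for _ in 0..p.number {
        // With the requested probability, repeat an earlier token.
        if !tokens.is_empty() && rng.next_f64() * 100.0 < p.duplicates_pct {
            let i = (rng.next_u64() % tokens.len() as u64) as usize;
            let copy = tokens[i].clone();
            tokens.push(copy);
            continue;
        }
        let len = (mean + sigma * rng.next_gaussian()).round().max(1.0) as usize;
        let mut token = Vec::with_capacity(len);
        while token.len() < len {
            let b = if p.binary {
                rng.next_u64() as u8
            } else {
                b'!' + (rng.next_u64() % 94) as u8 // '!' through '~'
            };
            if b != p.delim {
                token.push(b); // the delimiter never appears inside a token
            }
        }
        tokens.push(token);
    }
    tokens
}

fn main() {
    let p = Params {
        binary: false,
        delim: b'\n',
        number: 5,
        duplicates_pct: 40.0,
        short: 4,
        long: 12,
    };
    let mut out = Vec::new();
    for token in generate(&p, 42) {
        out.extend_from_slice(&token);
        out.push(p.delim);
    }
    std::io::stdout().write_all(&out).unwrap();
}
```

The duplicate fraction is only approximate in this sketch, since freshly generated short tokens can collide by chance. A fixed seed makes the output reproducible, which matters when the same data is fed to several implementations.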
For the actual benchmark we should write a benchmark executor that runs each of the implementations with a variety of parameters handed to genbench.
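One way the executor could be structured is sketched below: build every combination of implementation and genbench parameters as a shell pipeline, then time each one. The function names `pipelines` and `time_pipeline` are hypothetical, the parameter values are placeholders, and the sketch assumes that a POSIX `sh` is available and that the genbench flags not varied here can be omitted.

```rust
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Build every combination of implementation and genbench parameters.
fn pipelines(impls: &[&str], numbers: &[u64], duplicates: &[u8]) -> Vec<String> {
    let mut out = Vec::new();
    for imp in impls {
        for n in numbers {
            for d in duplicates {
                out.push(format!(
                    "genbench --charset ascii --number {n} --duplicates {d} | {imp}"
                ));
            }
        }
    }
    out
}

/// Run one pipeline through `sh -c`, discard its output and return the wall time.
fn time_pipeline(pipeline: &str) -> std::io::Result<Duration> {
    let start = Instant::now();
    let status = Command::new("sh")
        .args(["-c", pipeline])
        .stdout(Stdio::null())
        .status()?;
    let elapsed = start.elapsed();
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("`{pipeline}` failed: {status}"),
        ));
    }
    Ok(elapsed)
}

fn main() {
    // Only execute the pipelines when asked to; otherwise just list them.
    let run = std::env::args().any(|a| a == "--run");
    for p in pipelines(&["huniq", "sort | uniq"], &[1_000, 1_000_000], &[0, 50, 90]) {
        if run {
            match time_pipeline(&p) {
                Ok(t) => println!("{t:?}\t{p}"),
                Err(e) => eprintln!("{e}"),
            }
        } else {
            println!("{p}");
        }
    }
}
```

Timing the whole pipeline also measures genbench itself. Writing the generated data to a file first and feeding that file to each implementation would isolate the command under test.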
Tests
We can reuse the same strategy for testing: generate test data with genbench, then compare the output of the full huniq against that of a deliberately naive, unoptimized huniq implementation. We should specifically make sure that buffer growth is tested by supplying some very long strings (>20 kB).
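The naive reference implementation could look roughly like the sketch below. `naive_uniq` is a hypothetical name, and the sketch assumes that huniq keeps the first occurrence of each token in input order. If that assumption does not hold, both outputs would need to be sorted before comparing. The example input includes a 32 KiB token to exercise buffer growth.

```rust
use std::collections::HashSet;

/// Deliberately naive reference: split on the delimiter and keep the first
/// occurrence of every token, preserving input order.
fn naive_uniq(input: &[u8], delim: u8) -> Vec<u8> {
    if input.is_empty() {
        return Vec::new();
    }
    // A single trailing delimiter terminates the last token; it does not
    // start an extra empty one.
    let body = input.strip_suffix(&[delim]).unwrap_or(input);
    let mut seen: HashSet<&[u8]> = HashSet::new();
    let mut out = Vec::new();
    for token in body.split(|&b| b == delim) {
        // `insert` returns true only the first time a token is seen.
        if seen.insert(token) {
            out.extend_from_slice(token);
            out.push(delim);
        }
    }
    out
}

fn main() {
    // One token well above 20 kB, repeated, to exercise buffer growth.
    let long = vec![b'x'; 32 * 1024];
    let tokens: [&[u8]; 5] = [b"a", &long, b"b", &long, b"a"];
    let mut input = Vec::new();
    for t in tokens {
        input.extend_from_slice(t);
        input.push(b'\n');
    }
    let out = naive_uniq(&input, b'\n');
    println!("{} bytes in, {} bytes out", input.len(), out.len());
}
```

In a test, the bytes returned by `naive_uniq` would be compared with what the real huniq writes for the same genbench-generated input.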