@apeleghq/benchmark

A statistically rigorous benchmarking library with paired t-tests, baseline correction, and confidence intervals.

Features

No runtime dependencies (except picocolors, which some built-in reporters use).
Runtime and framework agnostic (ESM and CJS dual-package).
Automatically injects a no-op baseline function to subtract loop and call overhead from every measurement on a per-trial basis.
Runs all functions within the same trial to enable paired statistical tests — shared noise sources (GC pauses, thermal throttling, etc.) cancel out in per-trial differences $d_i = a_i - b_i$.
Computes descriptive statistics per function: mean, median, standard deviation, SEM, min, max, and percentiles ($p_5$, $p_{25}$, $p_{75}$, $p_{95}$).
Reports 95% confidence intervals for all means.
Performs all $\binom{k}{2}$ pairwise paired t-tests with two-tailed p-values and significance flags.
Randomises execution order within each trial to reduce systematic bias.
Per-benchmark and suite-level setup/teardown hooks with shared typed context.
Configurable warmup iterations, iterations per trial, and number of trials.
Includes three built-in reporters: simple, advanced, and xUnit.
Fully JSON-serialisable output for custom reporting or CI integration.

Installation

npm install @apeleghq/benchmark

or

yarn add @apeleghq/benchmark

Usage

Basic suite

import { runSuite } from '@apeleghq/benchmark';
import simpleReport from '@apeleghq/benchmark/reporters/simple';

type Ctx = {
  array: unknown[];
};

const result = await runSuite<Ctx>({
  name: 'Array copy comparison',
  setup() {
    this.array = [1, 2, 3];
  },
  functions: [
    {
      name: 'Array.from',
      fn() {
        Array.from(this.array);
      },
    },
    {
      name: 'Spread operator',
      fn() {
        [...this.array];
      },
    },
  ],
});

simpleReport(result);

Advanced reporter

import advancedReport from '@apeleghq/benchmark/reporters/advanced';

advancedReport(result);

xUnit reporter (for CI integration)

import xunitReport from '@apeleghq/benchmark/reporters/xunit';

console.log(xunitReport(result));

Configuration options

All fields in ISuiteConfig beyond name and functions are optional:

const result = await runSuite({
  name: 'My suite',
  warmupIterations: 10,    // default: 10
  iterationsPerTrial: 1000, // default: 1000
  trials: 30,               // default: 30
  setup() { /* suite-level context setup */ },
  teardown() { /* suite-level context teardown */ },
  functions: [
    {
      name: 'my-fn',
      fn() { /* ... */ },
      setup() { /* function-level setup, runs before warmup+measurement */ },
      teardown() { /* function-level teardown, runs after measurement */ },
    },
  ],
});

Using the statistics module directly

The stats subpackage exposes the underlying statistical primitives:

import { mean, median, stdDev, tDistPValue } from '@apeleghq/benchmark/stats';

How it works

Statistical methodology

Baseline correction

A no-op function (internally named @@null) is automatically added to every suite. Its per-trial measurement captures pure loop and call overhead. This baseline value is subtracted from every other function's measurement on a per-trial basis before any statistics are computed:

$$\hat{x}_{i,\text{fn}} = x_{i,\text{fn}} - x_{i,\text{null}}$$

All reported aggregate statistics (mean, median, standard deviation, confidence intervals) are computed from these baseline-corrected samples.
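The per-trial subtraction can be illustrated with a minimal sketch. The `TrialTimings` shape and the `correctTrial` helper are hypothetical names used only for illustration; they are not the library's internal types, and the timings are made-up values.

```typescript
// Hypothetical shape: one trial maps a function name to its raw timing (ms).
type TrialTimings = Record<string, number>;

// Baseline name as described above.
const BASELINE = '@@null';

// Subtract the same-trial baseline from every other function's timing.
function correctTrial(trial: TrialTimings): TrialTimings {
  const baseline = trial[BASELINE];
  if (baseline === undefined) {
    throw new Error(`Trial has no ${BASELINE} measurement`);
  }
  const corrected: TrialTimings = {};
  for (const [name, value] of Object.entries(trial)) {
    // The baseline itself is dropped from the corrected samples.
    if (name !== BASELINE) corrected[name] = value - baseline;
  }
  return corrected;
}

// Made-up raw timings for two trials.
const rawTrials: TrialTimings[] = [
  { '@@null': 0.5, 'Array.from': 2.5, 'Spread operator': 2.0 },
  { '@@null': 0.25, 'Array.from': 2.75, 'Spread operator': 1.75 },
];

const correctedTrials = rawTrials.map(correctTrial);
console.log(correctedTrials);
```

Because each trial uses its own baseline value, overhead that drifts between trials is removed along with the constant part.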

Paired t-test

Because all functions are measured within the same trial, comparisons use a paired (dependent) t-test rather than an independent-samples test. Per-trial differences are:

$$d_i = a_i - b_i$$

The t-statistic is:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

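where $\bar{d}$ is the mean of the differences, $s_d$ is their sample standard deviation, and $n$ is the number of trials. When the measurements within a trial are positively correlated, this is more powerful than an unpaired test because shared noise sources cancel out in the differences.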
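As a worked illustration of the formula, the following sketch computes the paired t-statistic from two equally long sample arrays. `pairedTStatistic` is a hypothetical helper written for this example, not a function exported by the library, and the sample values are made up.

```typescript
// Paired t-statistic: t = mean(d) / (s_d / sqrt(n)), with d_i = a_i - b_i.
function pairedTStatistic(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length < 2) {
    throw new Error('Need two samples of equal length with at least 2 trials');
  }
  const n = a.length;
  const d = a.map((x, i) => x - b[i]);
  const meanD = d.reduce((sum, x) => sum + x, 0) / n;
  // Sample variance uses the n - 1 denominator (Bessel's correction).
  const varD = d.reduce((sum, x) => sum + (x - meanD) ** 2, 0) / (n - 1);
  // Note: if every difference is identical, varD is 0 and the result is not finite.
  return meanD / (Math.sqrt(varD) / Math.sqrt(n));
}

// Made-up per-trial timings (ms) for two functions over three trials.
const a = [11, 12, 13];
const b = [10, 10, 10];

// d = [1, 2, 3], mean 2, s_d = 1, so t = 2 * sqrt(3).
const t = pairedTStatistic(a, b);
console.log(t.toFixed(4));
```

The two-tailed p-value is then obtained from the Student's t-distribution with $n-1$ degrees of freedom.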

Confidence intervals

The 95% confidence interval for the true mean difference is:

$$\bar{d} \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}}$$

where $t_{\alpha/2,\, n-1}$ is the critical value from the Student's t-distribution with $n-1$ degrees of freedom.
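The interval can be sketched as follows. `meanDifferenceCI` is a hypothetical helper, and the critical value is passed in rather than computed: with the default 30 trials ($n-1 = 29$) it is approximately 2.045, and for the three made-up differences below ($n-1 = 2$) it is approximately 4.303.

```typescript
// Confidence interval for the mean of paired differences:
// mean(d) ± tCritical * s_d / sqrt(n).
function meanDifferenceCI(d: number[], tCritical: number): [number, number] {
  const n = d.length;
  if (n < 2) throw new Error('Need at least 2 differences');
  const meanD = d.reduce((sum, x) => sum + x, 0) / n;
  const varD = d.reduce((sum, x) => sum + (x - meanD) ** 2, 0) / (n - 1);
  const margin = (tCritical * Math.sqrt(varD)) / Math.sqrt(n);
  return [meanD - margin, meanD + margin];
}

// Made-up differences (ms); tCritical is t_{0.025, 2} for a 95% interval.
const [lower, upper] = meanDifferenceCI([1, 2, 3], 4.303);
console.log(lower.toFixed(3), upper.toFixed(3));
```

Here the interval spans zero, so these three trials alone would not show a significant difference at the 5% level.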

Output format

runSuite returns a fully JSON-serialisable ISuiteReport object:

| Field | Description |
| --- | --- |
| `name` | Suite name |
| `config` | Effective configuration (iterations, trials, warmup) |
| `trials` | Every trial with raw timings and execution order |
| `functions` | Per-function aggregate statistics (corrected and raw samples) |
| `comparisons` | All pairwise paired t-test results |
| `baselineName` | Name of the injected null baseline (`@@null`) |

Each entry in comparisons includes:

| Field | Description |
| --- | --- |
| `meanDifference` | $\bar{d}$, the mean of paired differences (ms) |
| `relativeDifference` | $\bar{d} / \bar{b}$, the relative difference as a ratio |
| `tStatistic` | Computed t-statistic |
| `pValue` | Two-tailed p-value |
| `significant` | `true` if `pValue < 0.05` |
| `confidenceInterval` | 95% CI for the true mean difference (ms) |
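Since the output is plain data, a CI script can post-process it. The sketch below filters comparisons that are both statistically significant and larger than a chosen relative threshold. `ComparisonLike` is a hypothetical local type containing only some of the fields documented above (the real type may differ or carry more fields), and `notableComparisons` and its `minRelative` parameter are assumptions made for this example.

```typescript
// Hypothetical local type: a subset of the documented comparison fields.
interface ComparisonLike {
  meanDifference: number; // ms
  relativeDifference: number; // ratio
  pValue: number;
  significant: boolean;
}

// Keep comparisons that are significant AND exceed a relative-size threshold,
// so that tiny but statistically detectable differences do not fail a build.
function notableComparisons(
  comparisons: ComparisonLike[],
  minRelative = 0.05,
): ComparisonLike[] {
  return comparisons.filter(
    (c) => c.significant && Math.abs(c.relativeDifference) >= minRelative,
  );
}

// Made-up comparison entries for illustration.
const sample: ComparisonLike[] = [
  { meanDifference: 0.4, relativeDifference: 0.2, pValue: 0.001, significant: true },
  { meanDifference: 0.01, relativeDifference: 0.005, pValue: 0.03, significant: true },
  { meanDifference: 0.3, relativeDifference: 0.15, pValue: 0.2, significant: false },
];

const notable = notableComparisons(sample);
console.log(notable.length);
```

In a real pipeline the input would be `result.comparisons` from `runSuite`, assuming its entries are compatible with the fields used here.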

Exports

| Export path | Description |
| --- | --- |
| `@apeleghq/benchmark` | `runSuite`, the main entry point |
| `@apeleghq/benchmark/stats` | Statistical primitives (mean, stdDev, t-test, etc.) |
| `@apeleghq/benchmark/reporters/simple` | Simple console reporter |
| `@apeleghq/benchmark/reporters/advanced` | Advanced console reporter with full statistics |
| `@apeleghq/benchmark/reporters/xunit` | xUnit XML reporter for CI systems |

Contributing

Contributions welcome. Please open issues or pull requests on the repository. Consider adding unit tests for edge cases and additional reporters if extending the library.

License

This project is licensed under the Apache 2.0 License with the LLVM exception. You are free to use this package in compliance with the terms of the license. For more information, see the LICENSE file.
