Single-threaded, cache-conscious C++20 limit order book. Designed as a learning project that takes HFT-style microoptimizations seriously without pretending to be production trading infrastructure.
Measured on a typical x86-64 desktop with -O3 -march=native, ~10k resting orders, no CPU pinning:
Latency (nanoseconds)
| Operation | avg | p99 | p99.9 |
|---|---|---|---|
| Passive limit add | ~67 ns | ~101 ns | ~2.0 us |
| Cancel | ~69 ns | ~140 ns | ~350 ns |
| Modify (quantity) | ~68 ns | ~143 ns | ~310 ns |
| Modify (price + quantity) | ~74 ns | ~150 ns | ~370 ns |
| Quote refresh (4 ops batch) | ~67 ns | ~73 ns | ~95 ns |
| Market fill (deep book) | ~27 ns | ~31 ns | ~63 ns |
CPU cycles (measured via __rdtscp, same runs as above)
| Operation | avg | p99 | p99.9 |
|---|---|---|---|
| Passive limit add | ~401 | ~548 | ~6 800 |
| Cancel | ~406 | ~642 | ~1 306 |
| Modify (quantity) | ~405 | ~652 | ~1 242 |
| Modify (price + quantity) | ~422 | ~674 | ~1 284 |
| Quote refresh (4 ops batch) | ~296 | ~316 | ~388 |
| Market fill (deep book) | ~187 | ~194 | ~284 |
Cycle counts are more stable across CPU frequency / turbo state than wall-clock nanoseconds -- useful when comparing changes to the engine without a controlled environment. Numbers come from bench/main.cpp, written to benchmark_reports/. They jitter +/-5-10 ns / +/-20 cycles run-to-run on a noisy desktop. Pin a core (taskset -c 2) and disable turbo if you want stable numbers.
The full benchmark (lifecycle / quote refresh / market fill / stress matrix across 1k / 10k / 100k order books) runs in a few seconds.
+-----------------------------+
| Client |
| (strategy / market data) |
+--------------+--------------+
Command | ^ OrderPointer + events()
v |
+----------------------------------------------------------------------+
| OrderBook::Process(Command&) noexcept -- std::visit dispatch -- |
+----------------------------------------------------------------------+
| |
| +---------+ +---------+ +---------+ +---------+ |
| | Add | | Modify | | Cancel | |CancelStop| |
| | Limit / | |(handle) | |(handle) | | (id) | |
| |Market / | +----+----+ +----+----+ +----+----+ |
| | Stop | | | | |
| +----+----+ | | | |
| | | | | |
| +------+-------+--------------+ | |
| v v |
| +--------------+ +------------------+ |
| | fill() |-- push FillEvent ->| events_ | |
| | (matching) | | (per-call buffer)| |
| +--+--------+--+ +------------------+ |
| v v |
| +----------+ +----------+ |
| | bids_[] | | asks_[] | <- std::array<Level, 200k>, O(1) by tick|
| | Level | | Level | each Level is an intrusive list of |
| | (head/ | | (head/ | Order* (FIFO time priority) |
| | tail) | | tail) | |
| +----+-----+ +----+-----+ |
| v v |
| +----------+ +----------+ |
| | bidBits_ | | askBits_ | <- 1 bit / level, ~25 KB each |
| | (bitmap) | | (bitmap) | updateBest* uses ctzll/clzll → |
| +----------+ +----------+ O(1) next-best lookup |
| |
| +----------------------------+ +----------------------------+ |
| | buyStops_ / sellStops_ | | stopHandler_ | |
| | std::vector<PendingStop> | | unordered_map<id, locator>| |
| | sorted; pop_back triggers | | for cancel-by-id | |
| +----------------------------+ +----------------------------+ |
| |
| +------------------------------------------------------------+ |
| | orderPool_ (slab + index freelist, all Order* live here) | |
| +------------------------------------------------------------+ |
| |
+----------------------------------------------------------------------+
| Component | Type / size | Why this shape |
|---|---|---|
bids_ / asks_ |
std::array<Level, 200'000> per side |
O(1) tick → level. No hashing. ~6 MB total -- out of L2 but each access is one cacheline. |
Level |
head + tail Order* pair |
Intrusive doubly-linked list. Add/remove are O(1). |
bidBits_ / askBits_ |
LevelBitmap<200'000> (~25 KB each) |
One bit per level. Find-next-best via __builtin_ctzll / clzll on a single 64-bit word. |
orderPool_ |
Slab + std::vector<int32_t> freelist |
All Order objects live contiguously. Zero per-order heap allocation. |
buyStops_ / sellStops_ |
std::vector<PendingStop>, sorted |
Next-to-trigger at back(). pop_back is O(1). |
stopHandler_ |
unordered_map<OrderId, StopLocator> |
Stop cancel-by-id (only off-hot-path hash lookup left). |
events_ |
std::vector<FillEvent> |
Cleared each Process(). Consumer reads after the call. |
Hot path: Process → std::visit → command-specific function → reads/writes one Level cell + maintains the bitmap bit + pushes any FillEvents. The handle-based API means Cancel and Modify go straight from Order* to the level -- no map lookup, just one pointer dereference to read side and price.
Direct array of Levels, indexed by tick. A std::array<Level, MAX_LEVEL> (2 x 200 000 cells) for bids and asks. O(1) lookup by price, no hashing. Each Level is a small intrusive doubly-linked list of Order*.
Bitmap for best-price scan. One bit per level on each side. updateBestBid/updateBestAsk find the next non-empty level via __builtin_clzll/ctzll on a single 64-bit word -- constant-time regardless of how sparse the book gets. Replaces what would otherwise be a linear scan across thousands of empty cells after a sweep.
Handle-based API, no per-order hashing on the hot path. OrderBook::Process() returns the Order* for limit adds. Subsequent CancelCommand / ModifyCommand carry that pointer directly. The engine reads side and price straight off the Order -- no unordered_map<OrderId, ...> lookup. Cuts cancel/modify latency roughly in half.
Slab-allocated order pool. All Order objects live in a single malloc'd arena, with a freelist of slot indices. No per-order heap allocation. Free indices are reused, so steady-state memory is bounded by poolSize.
__rdtsc() for timestamps. std::chrono::steady_clock::now() was ~20-30 ns per call (vDSO syscall). __rdtsc() is a single instruction, ~5 ns. Stamps are stored as opaque cycle counts.
Sorted vector for stop orders. Stops are kept in a std::vector<PendingStop> per side, sorted so the next-to-trigger sits at the back. Triggering is O(1) pop_back. Replaced an earlier std::multimap (rb-tree, node-allocated, terrible cache behavior).
No exceptions on the hot path. Every member of OrderBook is noexcept; OrderPool::getOrder calls std::abort() on exhaustion (programmer bug, not recoverable). The compiler skips emitting unwinding tables for these functions.
Fill events. Every match emits a FillEvent (maker handle, price, qty, taker side) into a buffer that's exposed via OrderBook::events(). Cleared at the start of every Process() call.
#include <orderbook/orderbook.hpp>
OrderBook book(/*poolSize=*/100'000);
// Add a passive bid at price 9999, qty 10.
AddCommand add{LimitOrder{9999, 0}, /*orderId=*/1, /*qty=*/10, Side::Buy};
Command cmd = add;
OrderPointer h = book.Process(cmd); // h is the handle to the resting order
// Send an aggressive sell that hits the bid.
AddCommand mkt{MarketOrder{0}, /*orderId=*/2, /*qty=*/4, Side::Sell};
Command sellCmd = mkt;
book.Process(sellCmd);
for (const FillEvent& f : book.events()) {
// f.price, f.qty, f.takerSide, f.makerHandle (nullptr if maker fully consumed)
}
// Cancel the rest of the bid.
CancelCommand cxl{h};
Command cxlCmd = cxl;
book.Process(cxlCmd);make # builds bench, runs perf counters
make run # builds + runs bench (no perf required)
make lib # just builds build/liborderbook.a
make cleancmake -B build-cmake -DCMAKE_BUILD_TYPE=Release
cmake --build build-cmake
./build-cmake/orderbook_benchTo skip the bench when consuming as a library:
cmake -B build-cmake -DORDERBOOK_BUILD_BENCH=OFF# After installing (cmake --install build-cmake --prefix /your/prefix):
find_package(orderbook REQUIRED)
target_link_libraries(your_target PRIVATE orderbook::orderbook)Or vendor as a submodule and use add_subdirectory():
add_subdirectory(third_party/orderbook)
target_link_libraries(your_target PRIVATE orderbook::orderbook)g++ -std=c++20 -O3 -march=native \
-I/path/to/orderbook/include \
your_code.cpp \
/path/to/orderbook/build/liborderbook.a \
-o your_appinclude/orderbook/ # public headers (consume via <orderbook/orderbook.hpp>)
src/ # library implementations
bench/ # benchmark harness (separate from the library)
cmake/ # CMake package config template
benchmark_reports/ # benchmark output (gitignored)
- Limit -- rests on the book if not fully matched on entry.
- Market -- sweeps until filled or the book is empty.
- Stop-limit -- registered until
bestAskreaches buy-stop price (orbestBidreaches sell-stop price), then injected as a limit atlimitPrice.
Modify re-crosses: a price change that pushes through the spread will match against the book like a fresh aggressive limit.
This is a project, not a production engine. Things you would need before doing anything real with it:
- No tests. Correctness has been hand-verified through the benchmark harness; there are no unit tests.
- No threading. Single-threaded only. No publisher thread, no lock-free queues, no NUMA pinning.
- No order types beyond Limit / Market / Stop-limit. Missing: IOC, FOK, GTC, post-only, hidden, iceberg, peg.
- No self-trade prevention, participant IDs, or priority categories.
- No persistence / replay. Fill events are exposed in-memory but not logged.
- No fee / pricing logic, no auctions, no circuit breakers.
- Pool uses
malloc, notmmapwith huge pages. Fine for the project, but you'd want hugepages for a large book.
MIT.