Skip to content

Francklin9999/orderbook

Repository files navigation

orderbook

Single-threaded, cache-conscious C++20 limit order book. Designed as a learning project that takes HFT-style microoptimizations seriously without pretending to be production trading infrastructure.

Headline numbers

Measured on a typical x86-64 desktop with -O3 -march=native, ~10k resting orders, no CPU pinning:

Latency (nanoseconds)

Operation avg p99 p99.9
Passive limit add ~67 ns ~101 ns ~2.0 us
Cancel ~69 ns ~140 ns ~350 ns
Modify (quantity) ~68 ns ~143 ns ~310 ns
Modify (price + quantity) ~74 ns ~150 ns ~370 ns
Quote refresh (4 ops batch) ~67 ns ~73 ns ~95 ns
Market fill (deep book) ~27 ns ~31 ns ~63 ns

CPU cycles (measured via __rdtscp, same runs as above)

Operation avg p99 p99.9
Passive limit add ~401 ~548 ~6 800
Cancel ~406 ~642 ~1 306
Modify (quantity) ~405 ~652 ~1 242
Modify (price + quantity) ~422 ~674 ~1 284
Quote refresh (4 ops batch) ~296 ~316 ~388
Market fill (deep book) ~187 ~194 ~284

Cycle counts are more stable across CPU frequency / turbo state than wall-clock nanoseconds -- useful when comparing changes to the engine without a controlled environment. Numbers come from bench/main.cpp, written to benchmark_reports/. They jitter +/-5-10 ns / +/-20 cycles run-to-run on a noisy desktop. Pin a core (taskset -c 2) and disable turbo if you want stable numbers.

The full benchmark (lifecycle / quote refresh / market fill / stress matrix across 1k / 10k / 100k order books) runs in a few seconds.

Architecture

                        +-----------------------------+
                        |           Client            |
                        | (strategy / market data)    |
                        +--------------+--------------+
                              Command  |  ^ OrderPointer + events()
                                       v  |
   +----------------------------------------------------------------------+
   |  OrderBook::Process(Command&) noexcept  -- std::visit dispatch --    |
   +----------------------------------------------------------------------+
   |                                                                      |
   |   +---------+    +---------+    +---------+    +---------+           |
   |   |   Add   |    | Modify  |    | Cancel  |    |CancelStop|          |
   |   | Limit / |    |(handle) |    |(handle) |    |  (id)   |           |
   |   |Market / |    +----+----+    +----+----+    +----+----+           |
   |   |  Stop   |         |              |              |                |
   |   +----+----+         |              |              |                |
   |        |              |              |              |                |
   |        +------+-------+--------------+              |                |
   |               v                                     v                |
   |       +--------------+                    +------------------+       |
   |       |   fill()     |-- push FillEvent ->|    events_       |       |
   |       | (matching)   |                    | (per-call buffer)|       |
   |       +--+--------+--+                    +------------------+       |
   |          v        v                                                  |
   |   +----------+ +----------+                                          |
   |   | bids_[]  | | asks_[]  |   <- std::array<Level, 200k>, O(1) by tick|
   |   |  Level   | |  Level   |     each Level is an intrusive list of   |
   |   | (head/   | | (head/   |     Order* (FIFO time priority)          |
   |   |  tail)   | |  tail)   |                                          |
   |   +----+-----+ +----+-----+                                          |
   |        v            v                                                |
   |   +----------+ +----------+                                          |
   |   | bidBits_ | | askBits_ |   <- 1 bit / level, ~25 KB each           |
   |   | (bitmap) | | (bitmap) |     updateBest* uses ctzll/clzll →       |
   |   +----------+ +----------+     O(1) next-best lookup                |
   |                                                                      |
   |   +----------------------------+   +----------------------------+    |
   |   |  buyStops_ / sellStops_    |   |  stopHandler_              |    |
   |   |  std::vector<PendingStop>  |   |  unordered_map<id, locator>|    |
   |   |  sorted; pop_back triggers |   |  for cancel-by-id          |    |
   |   +----------------------------+   +----------------------------+    |
   |                                                                      |
   |   +------------------------------------------------------------+     |
   |   |  orderPool_  (slab + index freelist, all Order* live here) |     |
   |   +------------------------------------------------------------+     |
   |                                                                      |
   +----------------------------------------------------------------------+
Component Type / size Why this shape
bids_ / asks_ std::array<Level, 200'000> per side O(1) tick → level. No hashing. ~6 MB total -- out of L2 but each access is one cacheline.
Level head + tail Order* pair Intrusive doubly-linked list. Add/remove are O(1).
bidBits_ / askBits_ LevelBitmap<200'000> (~25 KB each) One bit per level. Find-next-best via __builtin_ctzll / clzll on a single 64-bit word.
orderPool_ Slab + std::vector<int32_t> freelist All Order objects live contiguously. Zero per-order heap allocation.
buyStops_ / sellStops_ std::vector<PendingStop>, sorted Next-to-trigger at back(). pop_back is O(1).
stopHandler_ unordered_map<OrderId, StopLocator> Stop cancel-by-id (only off-hot-path hash lookup left).
events_ std::vector<FillEvent> Cleared each Process(). Consumer reads after the call.

Hot path: Processstd::visit → command-specific function → reads/writes one Level cell + maintains the bitmap bit + pushes any FillEvents. The handle-based API means Cancel and Modify go straight from Order* to the level -- no map lookup, just one pointer dereference to read side and price.

Design choices

Direct array of Levels, indexed by tick. A std::array<Level, MAX_LEVEL> (2 x 200 000 cells) for bids and asks. O(1) lookup by price, no hashing. Each Level is a small intrusive doubly-linked list of Order*.

Bitmap for best-price scan. One bit per level on each side. updateBestBid/updateBestAsk find the next non-empty level via __builtin_clzll/ctzll on a single 64-bit word -- constant-time regardless of how sparse the book gets. Replaces what would otherwise be a linear scan across thousands of empty cells after a sweep.

Handle-based API, no per-order hashing on the hot path. OrderBook::Process() returns the Order* for limit adds. Subsequent CancelCommand / ModifyCommand carry that pointer directly. The engine reads side and price straight off the Order -- no unordered_map<OrderId, ...> lookup. Cuts cancel/modify latency roughly in half.

Slab-allocated order pool. All Order objects live in a single malloc'd arena, with a freelist of slot indices. No per-order heap allocation. Free indices are reused, so steady-state memory is bounded by poolSize.

__rdtsc() for timestamps. std::chrono::steady_clock::now() was ~20-30 ns per call (vDSO syscall). __rdtsc() is a single instruction, ~5 ns. Stamps are stored as opaque cycle counts.

Sorted vector for stop orders. Stops are kept in a std::vector<PendingStop> per side, sorted so the next-to-trigger sits at the back. Triggering is O(1) pop_back. Replaced an earlier std::multimap (rb-tree, node-allocated, terrible cache behavior).

No exceptions on the hot path. Every member of OrderBook is noexcept; OrderPool::getOrder calls std::abort() on exhaustion (programmer bug, not recoverable). The compiler skips emitting unwinding tables for these functions.

Fill events. Every match emits a FillEvent (maker handle, price, qty, taker side) into a buffer that's exposed via OrderBook::events(). Cleared at the start of every Process() call.

Quick start

#include <orderbook/orderbook.hpp>

OrderBook book(/*poolSize=*/100'000);

// Add a passive bid at price 9999, qty 10.
AddCommand add{LimitOrder{9999, 0}, /*orderId=*/1, /*qty=*/10, Side::Buy};
Command cmd = add;
OrderPointer h = book.Process(cmd);   // h is the handle to the resting order

// Send an aggressive sell that hits the bid.
AddCommand mkt{MarketOrder{0}, /*orderId=*/2, /*qty=*/4, Side::Sell};
Command sellCmd = mkt;
book.Process(sellCmd);

for (const FillEvent& f : book.events()) {
    // f.price, f.qty, f.takerSide, f.makerHandle (nullptr if maker fully consumed)
}

// Cancel the rest of the bid.
CancelCommand cxl{h};
Command cxlCmd = cxl;
book.Process(cxlCmd);

Build

Make

make           # builds bench, runs perf counters
make run       # builds + runs bench (no perf required)
make lib       # just builds build/liborderbook.a
make clean

CMake

cmake -B build-cmake -DCMAKE_BUILD_TYPE=Release
cmake --build build-cmake
./build-cmake/orderbook_bench

To skip the bench when consuming as a library:

cmake -B build-cmake -DORDERBOOK_BUILD_BENCH=OFF

Consuming this library

From CMake (recommended)

# After installing (cmake --install build-cmake --prefix /your/prefix):
find_package(orderbook REQUIRED)
target_link_libraries(your_target PRIVATE orderbook::orderbook)

Or vendor as a submodule and use add_subdirectory():

add_subdirectory(third_party/orderbook)
target_link_libraries(your_target PRIVATE orderbook::orderbook)

Manually

g++ -std=c++20 -O3 -march=native \
    -I/path/to/orderbook/include \
    your_code.cpp \
    /path/to/orderbook/build/liborderbook.a \
    -o your_app

Repository layout

include/orderbook/   # public headers (consume via <orderbook/orderbook.hpp>)
src/                 # library implementations
bench/               # benchmark harness (separate from the library)
cmake/               # CMake package config template
benchmark_reports/   # benchmark output (gitignored)

Supported order types

  • Limit -- rests on the book if not fully matched on entry.
  • Market -- sweeps until filled or the book is empty.
  • Stop-limit -- registered until bestAsk reaches buy-stop price (or bestBid reaches sell-stop price), then injected as a limit at limitPrice.

Modify re-crosses: a price change that pushes through the spread will match against the book like a fresh aggressive limit.

Limitations / not implemented

This is a project, not a production engine. Things you would need before doing anything real with it:

  • No tests. Correctness has been hand-verified through the benchmark harness; there are no unit tests.
  • No threading. Single-threaded only. No publisher thread, no lock-free queues, no NUMA pinning.
  • No order types beyond Limit / Market / Stop-limit. Missing: IOC, FOK, GTC, post-only, hidden, iceberg, peg.
  • No self-trade prevention, participant IDs, or priority categories.
  • No persistence / replay. Fill events are exposed in-memory but not logged.
  • No fee / pricing logic, no auctions, no circuit breakers.
  • Pool uses malloc, not mmap with huge pages. Fine for the project, but you'd want hugepages for a large book.

License

MIT.

About

A cache-conscious single-threaded C++20 limit order book built for low-latency benchmarking and matching-engine experimentation.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors