A lightweight, cross-platform C++20 profiling library with Chrome DevTools and Optick backend support.
- Zero overhead when disabled - all macros compile to nothing
- Chrome DevTools integration - output JSON viewable in
chrome://tracingor Perfetto - Optick support — optional backend for the Optick profiler
- Thread-safe — lock-free per-thread binary event buffers with minimal contention (see Threading Model)
- RAII-based — automatic scope timing via
PROFILER_SCOPEandPROFILER_FUNCTION - Cross-platform — Windows, Linux, macOS, Android (NDK), iOS
- Easy CMake integration —
add_subdirectory,FetchContent, orfind_package
add_subdirectory:
add_subdirectory(path/to/Profiler)
target_link_libraries(YourTarget PRIVATE Profiler::Profiler)FetchContent:
include(FetchContent)
FetchContent_Declare(
Profiler
GIT_REPOSITORY https://github.com/your-username/Profiler.git
GIT_TAG main
)
FetchContent_MakeAvailable(Profiler)
target_link_libraries(YourTarget PRIVATE Profiler::Profiler)#include <Profiler/ProfilerMacros.h>
int main()
{
// Profiler is auto-initialized — no setup needed
PROFILER_BEGIN_SESSION("MySession", "output/profile");
{
PROFILER_SCOPE("MyScope");
// ... code to profile ...
}
// Writes output/profile.json
PROFILER_END_SESSION();
}
void MyFunction()
{
PROFILER_FUNCTION(); // profiles the entire function
// ... function body ...
}Open the generated .json file in:
- Chrome/Edge: navigate to
chrome://tracing(oredge://tracing) and load the file - Perfetto: upload at ui.perfetto.dev
| Option | Default | Description |
|---|---|---|
PROFILER_ENABLED |
ON |
Enable profiling instrumentation |
PROFILER_USE_OPTICK |
OFF |
Use Optick backend (auto-fetched via FetchContent) |
PROFILER_BUILD_EXAMPLE |
OFF |
Build the example application |
PROFILER_BUILD_TESTS |
OFF |
Build the test suite (Google Test) |
Set PROFILER_ENABLED to OFF for zero runtime cost — all macros expand to nothing:
set(PROFILER_ENABLED OFF CACHE BOOL "" FORCE)For game engines or frame-based applications, the profiler can auto-end a session after a fixed number of frames:
// Profile 5 frames, then auto-stop and write to disk
PROFILER_BEGIN_SESSION("Gameplay", "profiling/gameplay", 5);
// In your game loop:
while (running)
{
PROFILER_FRAME("MainThread"); // names the thread
PROFILER_SCOPE("Frame");
// ... frame logic ...
PROFILER_TICK(); // advances frame counter, auto-ends after 5
}| Macro | Description |
|---|---|
PROFILER |
Access the active profiler instance |
PROFILER_FUNCTION() |
Profile the current function |
PROFILER_SCOPE(name) |
Profile a named scope |
PROFILER_BEGIN_SESSION(name, ...) |
Start a profiling session (path, maxFrames, callback are optional) |
PROFILER_END_SESSION() |
End the current session and flush output |
PROFILER_TICK() |
Advance frame counter (for frame-based profiling) |
PROFILER_THREAD(name) |
Name the calling thread in the trace output |
PROFILER_FRAME(name) |
Name the calling thread (Optick: also marks frame boundary) |
profiler::ServiceLocator — global profiler management (namespace)
| Function | Description |
|---|---|
GetProfiler() |
Access the profiler (concrete type, no virtual dispatch) |
The backend is selected at compile time (PROFILER_USE_OPTICK flag). GetProfiler() returns the concrete
type (GoogleProfiler& or OptickProfiler&), enabling the compiler to devirtualize all calls. To disable
profiling at runtime, simply call EndSession() or don't call BeginSession() — WriteProfile is a no-op
when no session is active.
PROFILER_SCOPE, PROFILER_FUNCTION, and WriteProfile are safe to call concurrently from any number
of threads — each thread writes to its own lock-free buffer with no contention.
Session lifecycle functions (BeginSession, EndSession, Tick) are not thread-safe. They must be
called from a single thread (typically the main thread), and no other thread may be calling WriteProfile
at the same time. In practice this means:
PROFILER_BEGIN_SESSION("Game", "profiling/game"); // main thread, before spawning workers
// ... workers run, call PROFILER_SCOPE / PROFILER_FUNCTION freely ...
// join all worker threads first, then:
PROFILER_END_SESSION(); // main thread, after joining workersCalling BeginSession or EndSession while worker threads are actively writing profile events is
undefined behavior.
# Configure with tests and example
cmake -B build -DPROFILER_BUILD_TESTS=ON -DPROFILER_BUILD_EXAMPLE=ON
# Build
cmake --build build --config Release
# Run tests
cd build && ctest -C Release --output-on-failure
# Run example
./example/Release/ProfilerExampleinclude/Profiler/ Public headers
Profiler.h Base profiler class & ProfileResult
InstrumentorTimer.h RAII scope timer
ServiceLocator.h Global instance management & PROFILER macro
ProfilerMacros.h Instrumentation macros (main include)
GoogleProfiler.h Chrome Trace Event Format backend
OptickProfiler.h Optick backend (optional)
src/ Implementation
GoogleProfiler.cpp Google backend implementation
OptickProfiler.cpp Optick backend implementation (optional)
ServiceLocator.cpp Backend initialization
example/ Example application (6 demos)
tests/ Google Test suite (46 tests + 5 benchmarks)
cmake/ CMake package config
We benchmark the profiler itself to make sure the instrumentation overhead stays minimal. The goal is to quantify exactly how much time your application spends inside the profiler so you can make informed decisions about where to instrument.
Each benchmark runs the measured operation 10 times. The single highest and single lowest samples are discarded to remove outliers caused by OS scheduling, CPU frequency scaling, or filesystem cache cold-starts. The remaining 8 samples are averaged to produce the reported number.
All benchmarks are compiled in Release mode (/O2 on MSVC) and run on the same machine in a single
process. The benchmark source lives in tests/bench_GoogleProfiler.cpp and runs as part of the Google
Test suite (filter with --gtest_filter="ProfilerBench.*").
| Benchmark | What it does | Why it matters |
|---|---|---|
| WriteProfile | Calls GoogleProfiler::WriteProfile() 100k times on a single thread. Each call stores a binary TraceEvent struct into a pre-reserved per-thread buffer. |
This is the hot path. Every PROFILER_SCOPE and PROFILER_FUNCTION ultimately calls WriteProfile once when the scope ends. This number tells you the per-event cost of recording a trace event. |
| WriteProfile (4 threads) | Same as above but split across 4 threads (25k calls each), measuring wall-clock time. | Validates that the thread-local buffer design scales under contention. Each thread writes to its own ThreadBuffer, so the only lock acquisition happens once per thread per session (on first write). |
| InstrumentorTimer RAII | Constructs and destructs an InstrumentorTimer 50k times. Each cycle records a start time (constructor) and emits a single complete event (destructor). |
This is what PROFILER_SCOPE("name") and PROFILER_FUNCTION() actually expand to. Measures the real end-to-end cost including two chrono::high_resolution_clock::now() calls, one WriteProfile dispatch through the ServiceLocator, and RAII overhead. |
| FlushToString (10k events) | Buffers 10k trace events, then calls FlushToString() 100 times (read-only, does not clear the buffer). |
Measures the cost of converting binary event buffers to JSON. Relevant for streaming use-cases (e.g. Emscripten, live viewers) where you call FlushToString() periodically without ending the session. |
| Session lifecycle | Runs 1000 full cycles of: construct GoogleProfiler -> BeginSession (with file path) -> WriteProfile (1 event) -> EndSession (writes .json to disk). |
Measures the worst-case path that includes filesystem I/O (directory creation, file open, write, close). This is not the hot path — it only happens once at the start and end of a profiling session — but it bounds the cost of session management. |
Measured on Windows 11, MSVC 19.x, Release build (/O2), AMD Ryzen 7 5800H.
| Benchmark | Calls | Avg time | Per-call |
|---|---|---|---|
| WriteProfile (single thread) | 100,000 | 2.9 ms | 29 ns/call |
| WriteProfile (4 threads) | 100,000 | 3.0 ms | 30 ns/call |
| InstrumentorTimer RAII | 50,000 | 5.8 ms | 116 ns/cycle |
| FlushToString (10k events) | 100 | 280.4 ms | 2.80 ms/call |
| Session lifecycle (with file I/O) | 1,000 | 1,640.8 ms | 1,641 us/cycle |
- InstrumentorTimer at ~116 ns/cycle is the real-world cost of
PROFILER_SCOPE(). On a 60fps frame (~16.6 ms), profiling 100 scopes costs ~0.012 ms, or about 0.07% of your frame budget. - Multi-threaded scaling shows near-linear improvement (30 ns/call with 4 threads vs 29 ns single-threaded) because each thread writes to its own lock-free buffer.
- Session lifecycle is dominated by filesystem I/O (~1.6 ms per cycle). This only happens when
you call
BeginSession/EndSession, not during normal profiling. - FlushToString converts binary event buffers to JSON on demand. The cost scales with the number of buffered events and only runs when you explicitly flush or end a session.
cmake -B build -DPROFILER_BUILD_TESTS=ON
cmake --build build --config Release
./build/tests/Release/ProfilerTests --gtest_filter="ProfilerBench.*"- C++20 compiler (MSVC 2019 16.10+, GCC 10+, Clang 10+, Apple Clang 13+)
- CMake 3.16+
- (Optional) Optick for the Optick backend