Skip to content

Conversation

@mcourteaux
Copy link
Contributor

@mcourteaux mcourteaux commented Jun 7, 2024

  • Enable with GENIE --with-profiler-tracy.
  • Vulkan backend done (more instrumentation possible): designing the macros was a bit tough to integrate, due to non-scoped way of working.
  • OpenGL backend done (more instrumentation possible)
  • One extra GL function added to the imports header.
  • General purpose profiler macros redirect to Tracy stuff (if Tracy is enabled).
  • Still works with --with-profiler and also works without any profiling.
  • Glue code in 3rdparty/tracy
  • Official Tracy code in 3rdparty/tracy/public, totally stripped down to what we need: the client.
  • Automatic update script to update Tracy.
  • Works out of the box with the examples (see screenshots below).

Vulkan screenshot:
image

OpenGL screenshot:
image

Tracy associates GPU zones to CPU zones:

image

In the above screenshot, I'm hovering on the rendererSubmit GPU zone with the cursor, and it highlights the time-zone from where this came in the CPU timeline.

@mcourteaux mcourteaux requested a review from bkaradzic as a code owner June 7, 2024 15:26
@bkaradzic
Copy link
Owner

How do you build & run client?

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 7, 2024

That's your task, separately. When using Tracy, you typically clone it and compile the UI once, and install it. Compiling the UI is quite a long process, and makes a big build folder. I use Tracy in several projects, and I have one compiled Tracy UI installed on my system. An explanation in the bgfx documentation for that is probably welcome.

But, to be complete: this is documented in the Tracy Manual, in section 2.3 "Building the server". Note that the Tracy UI (what you called the client) is in the Tracy project referred to as "the server", which is confusing, because it doesn't run a server from a networking perspective.

@bkaradzic
Copy link
Owner

What's preferred integration is that bgfx takes all responsiblity of collecting data, and then have way to pass all that data to Tracy or other profiler, without any modifications to bgfx.

Most of stuff that this function implements is already available in bx+bgfx and no need to use Tracy's implementation of it.

tracy_force_inline void __bgfx_tracy_vulkan_begin_literal( tracy::VkCtx* m_ctx, const tracy::SourceLocationData* srcloc, VkCommandBuffer cmdbuf)
{
	using namespace tracy;
	const auto queryId = m_ctx->NextQueryId();
	CONTEXT_VK_FUNCTION_WRAPPER( vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, m_ctx->GetQueryPool(), queryId ) );

	auto item = Profiler::QueueSerial();
	MemWrite( &item->hdr.type, QueueType::GpuZoneBeginSerial );
	MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
	MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
	MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
	MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
	MemWrite( &item->gpuZoneBegin.context, m_ctx->GetId() );
	Profiler::QueueSerialFinish();
}

Instead integrating Tracy or other profilers directly into bgfx.

@bkaradzic bkaradzic closed this Jun 7, 2024
@bkaradzic
Copy link
Owner

I already had similar style integration with Remotery (see: f0971ed), and I found that it's mistake to do deep integrations like that. It becomes too dependent on specific profiler, and it's inflexible. That's why I think it's better to collect all data by bgfx, and then pass it to whatever profiler user chooses to use.

@mcourteaux
Copy link
Contributor Author

What's preferred integration is that bgfx takes all responsiblity of collecting data, and then have way to pass all that data to Tracy or other profiler, without any modifications to bgfx.

I'm a bit puzzled by your concern. The goal was to not change bgfx, which I didn't for a very few very specific places. Of course if you want to optionally enable something and let the user choose, you will need a few macros. The place where some more specific glue-code is needed, is nicely tucked away far outside of the main bgfx codebase. Do you want to delete Tracy? Just delete the folder and the GENIE option and the 4 macros in the code that conditionally enable it. That's how lightly coupled I tried to make it.

That's why I think it's better to collect all data by bgfx, and then pass it to whatever profiler user chooses to use.

I would like to bring to your attention that it's absolutely non-trivial to have "non-deep integration" with any GPU-aware profiler, and still have the profilers correctly link the CPU and GPU zones. Tracy registers IDs for the GPU timer queries, and associates those with the timed scope of the CPU work. Imagine you want to forbid these "deep" integrations: how would you be able to correctly associate CPU and GPU time for any profiler framework? One potential "generic interface" might work for one framework, but might require a whole lot of workaround-bookkeeping to manage the information correctly before it can be sent to the profiler framework, simply because it didn't fit the style the data is recorded and passed in the "generic interface".

In general, I think the 3 macros: PROFILER_(BEGIN; BEGIN_LITERAL; END) are a decent abstraction, and you can tell by the fact that this works, because your profiler interface, Tracy, and Remotery all fitted it. That's why I think a profiler-specific integration is useful: it can be mapped to the macros, but the macros can hide a lot more interesting behavior that would be otherwise hard (cpu-gpu association) or impossible (not copying over names, and line numbers, and filenames, etc in case of begin_literal) to achieve. For example, Tracy uses a highly optimized strategy of communicating which scope is referred to. That is possible due to macros that cleverly insert static constexpr that contains source code location information, which Tracy sends over the network connection once, and afterwards refers to it using the address of the static variable. Tricks like this require deep integration.

Additionally, I invite you to take a look at the work that Tracy does regarding calibrating the GPU timestamps in the Vulkan implementation (public/tracy/TracyVulkan.hpp). That's the sort of work I wouldn't want to have to rethink and reimplement.

Regarding the code fragment you highlighted: this is just a copy-paste from a part of the Tracy code, but adapted painfully, because bgfx doesn't nicely use scoped blocks to be profiled. Making everything profiled with scoped profiling blocks would remove pretty much all of the glue code I added now (which you highlighted). The reason I did that is because the goal was to not change bgfx.

It's very easy to make a few changes to bgfx and have the profiling follow the scoping approach; in which case stuff would become much simpler.

Most of stuff that this function implements is already available in bx+bgfx and no need to use Tracy's implementation of it.

Well, someone would have to do it. Tracy is an excellent profiler in my opinion, and I think that using it makes sense. Reinventing the wheel here is a lot more work than this 2 days I worked on this.

@bkaradzic
Copy link
Owner

I would like to bring to your attention that it's absolutely non-trivial to have "non-deep integration" with any GPU-aware profiler, and still have the profilers correctly link the CPU and GPU zones.

CPU/GPU "linked" profiling already exist here:

bgfx/src/renderer.h

Lines 497 to 532 in b66f60c

void begin(uint16_t _view)
{
if (m_enabled)
{
ViewStats& viewStats = m_frame->m_perfStats.viewStats[m_numViews];
viewStats.cpuTimeBegin = bx::getHPCounter();
m_queryIdx = m_gpuTimer.begin(_view, m_frame->m_frameNum);
viewStats.view = ViewId(_view);
bx::strCopy(viewStats.name
, BGFX_CONFIG_MAX_VIEW_NAME
, &m_viewName[_view][BGFX_CONFIG_MAX_VIEW_NAME_RESERVED]
);
}
}
void end()
{
if (m_enabled
&& UINT32_MAX != m_queryIdx)
{
m_gpuTimer.end(m_queryIdx);
ViewStats& viewStats = m_frame->m_perfStats.viewStats[m_numViews];
const typename Ty::Result& result = m_gpuTimer.m_result[viewStats.view];
viewStats.cpuTimeEnd = bx::getHPCounter();
viewStats.gpuTimeBegin = result.m_begin;
viewStats.gpuTimeEnd = result.m_end;
viewStats.gpuFrameNum = result.m_frameNum;
++m_numViews;
m_queryIdx = UINT32_MAX;
}
}

That is possible due to macros that cleverly insert static constexpr that contains source code location information, which Tracy sends over the network connection once, and afterwards refers to it using the address of the static variable. Tricks like this require deep integration.

That doesn't require deep integration. _filePath and _line are always literals. And bgfx also has concept where name is literal too.

bgfx/include/bgfx/bgfx.h

Lines 490 to 526 in b66f60c

/// Profiler region begin.
///
/// @param[in] _name Region name, contains dynamic string.
/// @param[in] _abgr Color of profiler region.
/// @param[in] _filePath File path where `profilerBegin` was called.
/// @param[in] _line Line where `profilerBegin` was called.
///
/// @remarks
/// Not thread safe and it can be called from any thread.
///
/// @attention C99's equivalent binding is `bgfx_callback_vtbl.profiler_begin`.
///
virtual void profilerBegin(
const char* _name
, uint32_t _abgr
, const char* _filePath
, uint16_t _line
) = 0;
/// Profiler region begin with string literal name.
///
/// @param[in] _name Region name, contains string literal.
/// @param[in] _abgr Color of profiler region.
/// @param[in] _filePath File path where `profilerBeginLiteral` was called.
/// @param[in] _line Line where `profilerBeginLiteral` was called.
///
/// @remarks
/// Not thread safe and it can be called from any thread.
///
/// @attention C99's equivalent binding is `bgfx_callback_vtbl.profiler_begin_literal`.
///
virtual void profilerBeginLiteral(
const char* _name
, uint32_t _abgr
, const char* _filePath
, uint16_t _line
) = 0;

Well, someone would have to do it. Tracy is an excellent profiler in my opinion, and I think that using it makes sense. Reinventing the wheel here is a lot more work than this 2 days I worked on this.

This is not argument about Tracy. It's about integration with bgfx...

@mcourteaux
Copy link
Contributor Author

That doesn't require deep integration. _filePath and _line are always literals.

That still doesn't allow you to do optimized Tracy profiling. Tracy's ZoneScopedNC does this:

#define ZoneNamedNC( varname, name, color, active )                                          \
   static constexpr tracy::SourceLocationData TracyConcat(__tracy_source_location,TracyLine) \
      { name, TracyFunction,  TracyFile, (uint32_t)TracyLine, color };                       \
   tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,TracyLine), active )

You see how the information about what is profiled, is kept separately in a static constexpr SourceLocationData, which is later only referenced by address.

This sort of trick, you can't do without macros expanding into something more interesting than a virtual function call.

CPU/GPU "linked" profiling already exist here:

I don't know what that does, as it looks conceptually broken... How can a GPU timer be ended, and the result immediately stored in a ViewStats struct? The Vulkan backend has 3 frames in flight, and so when a timer is ended on CPU, it'll end only 50ms later on GPU. This can't work...


What do you suggest? Define an interface that will work with any profiler you'd like to hook up, that doesn't require profiler-specific macros, and I'll make it.

@bkaradzic
Copy link
Owner

@mcourteaux See this: https://github.com/simco50/TimelineProfiler

Just an idea... It would be great to have this timeline in bgfx, with option to swap with Tracy.

@mcourteaux
Copy link
Contributor Author

That would still require macros expanding to Profiler specific code. You didn't like that last time.

@bkaradzic
Copy link
Owner

What I didn't like is that it's not easy to drop in Tracy. And internals were too dependent on Tracy.

I wouldn't mind if there was alternative macro that provide timeline profiler in bgfx repo, and then ability for user to declare macros in the way that it reroutes to their profiler.

@mcourteaux
Copy link
Contributor Author

If we put some effort in this, I'd like to not have Tracy source be in this repo. Something like the way it's the users responsibility to clone bx and bimg next to bgfx, I'd like to have them clone Tracy or simco50/TimeProfiler the same way. Then a single flag in GENie to enable a profiler. And one file that manages the profiler macros to redirect stuff to the appropriate profiler. Thoughts?

@bkaradzic
Copy link
Owner

I'll look into integrating this TimelineProfiler UI to bgfx.

@bkaradzic
Copy link
Owner

bkaradzic commented Nov 21, 2025

I added Superluminal support here: #3504

No changes to API are needed (at least for CPU profiler).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants