Tracy Integration. #3308
Conversation
How do you build & run the client?
That's your task, separately. When using Tracy, you typically clone it, compile the UI once, and install it. Compiling the UI is quite a long process and produces a big build folder. I use Tracy in several projects, and I have one compiled Tracy UI installed on my system. An explanation of this in the bgfx documentation would probably be welcome. But, to be complete: this is documented in the Tracy manual, in section 2.3, "Building the server". Note that the Tracy UI (what you called the client) is referred to in the Tracy project as "the server", which is confusing, because it doesn't run a server from a networking perspective.
The preferred integration is that bgfx takes all responsibility for collecting data, and then has a way to pass all that data to Tracy or another profiler, without any modifications to bgfx. Most of the stuff that this function implements is already available in bx+bgfx, and there is no need to use Tracy's implementation of it:

```cpp
tracy_force_inline void __bgfx_tracy_vulkan_begin_literal( tracy::VkCtx* m_ctx, const tracy::SourceLocationData* srcloc, VkCommandBuffer cmdbuf)
{
	using namespace tracy;

	// Write a GPU timestamp for this zone into the context's query pool.
	const auto queryId = m_ctx->NextQueryId();
	CONTEXT_VK_FUNCTION_WRAPPER( vkCmdWriteTimestamp( cmdbuf, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, m_ctx->GetQueryPool(), queryId ) );

	// Queue a serialized zone-begin event that pairs the CPU-side time and
	// thread with the GPU query, so Tracy can link the CPU and GPU zones.
	auto item = Profiler::QueueSerial();
	MemWrite( &item->hdr.type, QueueType::GpuZoneBeginSerial );
	MemWrite( &item->gpuZoneBegin.cpuTime, Profiler::GetTime() );
	MemWrite( &item->gpuZoneBegin.srcloc, (uint64_t)srcloc );
	MemWrite( &item->gpuZoneBegin.thread, GetThreadHandle() );
	MemWrite( &item->gpuZoneBegin.queryId, uint16_t( queryId ) );
	MemWrite( &item->gpuZoneBegin.context, m_ctx->GetId() );
	Profiler::QueueSerialFinish();
}
```

That, instead of integrating Tracy or other profilers directly into bgfx.
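A minimal sketch of that direction, assuming bgfx's existing `bgfx::CallbackI` profiler hooks (`profilerBegin`, `profilerBeginLiteral`, `profilerEnd`) keep their current shape; `MyProfiler` is a hypothetical stand-in for whatever the user hooks up:

```cpp
#include <bgfx/bgfx.h>

// Sketch: bgfx collects the events and hands them out through the callback
// interface; the user routes them to the profiler of their choice. Signatures
// follow bgfx::CallbackI as of recent versions; MyProfiler is hypothetical.
struct ProfilerCallback : public bgfx::CallbackI
{
	void profilerBegin(const char* _name, uint32_t _abgr, const char* _filePath, uint16_t _line) override
	{
		MyProfiler::beginZone(_name, _abgr, _filePath, _line); // hypothetical API
	}

	void profilerBeginLiteral(const char* _name, uint32_t _abgr, const char* _filePath, uint16_t _line) override
	{
		// _name is a string literal here, so the profiler may keep the pointer
		// instead of copying the string.
		MyProfiler::beginZone(_name, _abgr, _filePath, _line); // hypothetical API
	}

	void profilerEnd() override
	{
		MyProfiler::endZone(); // hypothetical API
	}

	// The remaining pure virtuals of bgfx::CallbackI (fatal, traceVargs,
	// screenShot, capture*, cache*) are omitted here for brevity.
};
```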
I already had a similar style of integration with Remotery (see: f0971ed), and I found that it's a mistake to do deep integrations like that. It becomes too dependent on the specific profiler, and it's inflexible. That's why I think it's better to collect all the data in bgfx, and then pass it to whatever profiler the user chooses to use.
I'm a bit puzzled by your concern. The goal was to not change bgfx, which I didn't, apart from a very few specific places. Of course, if you want to optionally enable something and let the user choose, you will need a few macros. The place where some more specific glue code is needed is nicely tucked away, far outside of the main bgfx codebase. Do you want to delete Tracy? Just delete the folder, the GENie option, and the 4 macros in the code that conditionally enable it. That's how lightly coupled I tried to make it.
I would like to bring to your attention that it's absolutely non-trivial to have a "non-deep integration" with any GPU-aware profiler and still have the profiler correctly link the CPU and GPU zones. Tracy registers IDs for the GPU timer queries and associates those with the timed scope of the CPU work. Imagine you want to forbid these "deep" integrations: how would you correctly associate CPU and GPU time for any profiler framework? One potential "generic interface" might work for one framework, but might require a whole lot of workaround bookkeeping to manage the information correctly before it can be sent to the profiler framework, simply because it doesn't fit the style in which the data is recorded and passed in the "generic interface".

In general, I think the 3 macros PROFILER_(BEGIN; BEGIN_LITERAL; END) are a decent abstraction, and you can tell by the fact that this works: your profiler interface, Tracy, and Remotery all fit it. That's why I think a profiler-specific integration is useful: it can be mapped to the macros, but the macros can hide a lot more interesting behavior that would otherwise be hard (CPU-GPU association) or impossible (not copying over names, line numbers, filenames, etc. in the case of begin_literal) to achieve. For example, Tracy uses a highly optimized strategy of communicating which scope is referred to. That is possible due to macros that cleverly insert static source-location data (more on this below).

Additionally, I invite you to take a look at the work that Tracy does regarding calibrating the GPU timestamps in the Vulkan implementation.

Regarding the code fragment you highlighted: this is just a copy-paste from a part of the Tracy code, but adapted painfully, because bgfx doesn't nicely use scoped blocks to be profiled. Making everything profiled with scoped profiling blocks would remove pretty much all of the glue code I added now (which you highlighted). The reason I did it this way is because the goal was to not change bgfx. It's very easy to make a few changes to bgfx and have the profiling follow the scoping approach, in which case stuff would become much simpler.
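As a rough sketch of what that calibration involves (variable names and error handling are simplified assumptions, not Tracy's actual code), the idea is to sample the GPU clock and the CPU clock as close together as possible via the VK_EXT_calibrated_timestamps extension:

```cpp
#include <vulkan/vulkan.h>

// Sketch: pair a GPU-domain timestamp with a CPU-domain timestamp taken at
// (almost) the same instant, so GPU query results can be mapped onto the CPU
// timeline. The device and the loaded extension entry point are assumed.
void calibrateClocks(VkDevice device, PFN_vkGetCalibratedTimestampsEXT getCalibratedTimestamps)
{
	VkCalibratedTimestampInfoEXT infos[2] = {};
	infos[0].sType      = VK_STRUCTURE_TYPE_CALIBRATED_TIMESTAMP_INFO_EXT;
	infos[0].timeDomain = VK_TIME_DOMAIN_DEVICE_EXT;              // GPU clock
	infos[1].sType      = VK_STRUCTURE_TYPE_CALIBRATED_TIMESTAMP_INFO_EXT;
	infos[1].timeDomain = VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT; // CPU clock (Linux)

	uint64_t timestamps[2]; // [0] = GPU ticks, [1] = CPU ticks
	uint64_t maxDeviation;  // upper bound on how far apart the two samples were taken
	getCalibratedTimestamps(device, 2, infos, timestamps, &maxDeviation);

	// The (gpuTicks, cpuTicks) pair anchors the GPU timeline to the CPU timeline.
}
```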
Well, someone would have to do it. Tracy is an excellent profiler in my opinion, and I think that using it makes sense. Reinventing the wheel here is a lot more work than the 2 days I worked on this.
CPU/GPU "linked" profiling already exists here: Lines 497 to 532 in b66f60c

That doesn't require deep integration: Lines 490 to 526 in b66f60c
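Those lines cover bgfx's per-view stats. As a rough sketch of consuming them without any profiler-specific hooks (field names assume the current bgfx::Stats/bgfx::ViewStats layout):

```cpp
#include <bgfx/bgfx.h>
#include <cstdio>

// Sketch: read the per-view CPU and GPU begin/end timestamps that bgfx
// already collects each frame, and print them as milliseconds.
void dumpViewTimings()
{
	const bgfx::Stats* stats = bgfx::getStats();
	const double toCpuMs = 1000.0 / double(stats->cpuTimerFreq);
	const double toGpuMs = 1000.0 / double(stats->gpuTimerFreq);

	for (uint16_t ii = 0; ii < stats->numViews; ++ii)
	{
		const bgfx::ViewStats& vs = stats->viewStats[ii];
		printf("view %3d %-16s cpu %.3fms gpu %.3fms\n"
			, vs.view
			, vs.name
			, double(vs.cpuTimeEnd - vs.cpuTimeBegin) * toCpuMs
			, double(vs.gpuTimeEnd - vs.gpuTimeBegin) * toGpuMs
			);
	}
}
```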
This is not an argument about Tracy. It's about integration with bgfx...
That still doesn't allow you to do optimized Tracy profiling. Tracy's ZoneNamedNC macro, for example:

```cpp
#define ZoneNamedNC( varname, name, color, active ) \
	static constexpr tracy::SourceLocationData TracyConcat(__tracy_source_location,TracyLine) \
		{ name, TracyFunction, TracyFile, (uint32_t)TracyLine, color }; \
	tracy::ScopedZone varname( &TracyConcat(__tracy_source_location,TracyLine), active )
```

You see how the information about what is profiled is kept separately in a static constexpr tracy::SourceLocationData structure, so at runtime only a pointer to it needs to be passed around. This sort of trick you can't do without macros expanding into something more interesting than a virtual function call.
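For example, a use like the following (hypothetical call site, assuming TRACY_ENABLE is defined) bakes the name, file, line, and function into one static constexpr structure at compile time, and the runtime cost is just constructing a ScopedZone that holds a pointer to it:

```cpp
#include <tracy/Tracy.hpp>

void updatePhysics()
{
	// Expands to a local static constexpr tracy::SourceLocationData plus a
	// tracy::ScopedZone referencing it; no strings are copied at runtime.
	ZoneNamedNC(physicsZone, "updatePhysics", 0x00aa00, true);

	// ... work measured by the zone; it ends when physicsZone leaves scope.
}
```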
I don't know what that does, as it looks conceptually broken... How can a GPU timer be ended and the result immediately stored in a ViewStats struct? The Vulkan backend has 3 frames in flight, so when a timer is ended on the CPU, it will only actually end on the GPU some 50ms later. This can't work... What do you suggest? Define an interface that will work with any profiler you'd like to hook up, that doesn't require profiler-specific macros, and I'll make it.
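To make the latency concrete, here is a minimal Vulkan sketch (the device, query pool, and per-frame indexing are assumed, not taken from this PR) of non-blocking timestamp readback with frames in flight:

```cpp
#include <vulkan/vulkan.h>

// Sketch: per-frame timestamp readback with kFramesInFlight frames in flight.
// The queries written in frame N are only guaranteed available around frame
// N + kFramesInFlight, so results must be buffered, not consumed immediately.
static const uint32_t kFramesInFlight = 3;

bool readFrameTimestamps(VkDevice device, VkQueryPool queryPool, uint32_t frameIndex, double timestampPeriodNs, double* gpuMs)
{
	// Each in-flight frame owns a pair of queries: begin and end.
	const uint32_t firstQuery = (frameIndex % kFramesInFlight) * 2;

	uint64_t ticks[2];
	VkResult result = vkGetQueryPoolResults(
		  device, queryPool
		, firstQuery, 2
		, sizeof(ticks), ticks, sizeof(uint64_t)
		, VK_QUERY_RESULT_64_BIT // no WAIT flag: don't stall the CPU
		);

	if (result == VK_NOT_READY)
	{
		return false; // the GPU hasn't finished that frame yet
	}

	*gpuMs = double(ticks[1] - ticks[0]) * timestampPeriodNs * 1e-6;
	return true;
}
```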
@mcourteaux See this: https://github.com/simco50/TimelineProfiler Just an idea... It would be great to have this timeline in bgfx, with the option to swap it with Tracy.
That would still require macros expanding to profiler-specific code. You didn't like that last time.
What I didn't like is that it's not easy to drop in Tracy, and the internals were too dependent on Tracy. I wouldn't mind if there were an alternative macro that provides a timeline profiler in the bgfx repo, and then the ability for the user to declare the macros in a way that reroutes to their own profiler.
If we put some effort into this, I'd like to not have the Tracy source in this repo. Similar to how it's the user's responsibility to clone bx and bimg next to bgfx, I'd like to have them clone Tracy or simco50/TimelineProfiler the same way. Then a single flag in GENie to enable a profiler, and one file that manages the profiler macros to redirect everything to the appropriate profiler. Thoughts?
I'll look into integrating this TimelineProfiler UI into bgfx.
I added Superluminal support here: #3504. No changes to the API are needed (at least for the CPU profiler).
- Enabled with --with-profiler-tracy.
- Works alongside --with-profiler, and also works without any profiling.
- Tracy lives in 3rdparty/tracy, containing only 3rdparty/tracy/public, totally stripped down to what we need: the client.

Vulkan screenshot:

OpenGL screenshot:

Tracy associates GPU zones to CPU zones:
In the above screenshot, I'm hovering over the rendererSubmit GPU zone with the cursor, and it highlights the time-zone from where this came in the CPU timeline.