Add trace points in xrt classes #9734
Conversation
Force-pushed from c68c43a to ca9ca60
clang-tidy review says "All clean, LGTM! 👍"
stsoe left a comment
Why all these tracepoints? How about adding tracepoints strategically where we want to measure bottlenecks? Tracepoints are not meant to show function calls. The dtor tracepoints are completely useless; they measure nothing at all.
Okay Soren, I will make changes accordingly and keep tracepoints only at the needed places. Thanks
Hi @stsoe,
Thanks @chvamshi-xilinx. It sounds like you are adding trace points for debugging function calls; I don't think that is the intended purpose. My understanding is that tracepoints are there to locate bottlenecks. Yes, they are no-ops (at least on Linux), but my concern is the noise generated by adding uninteresting tracepoints to the trace log.
Got it. Thanks @stsoe |
Signed-off-by: rahul <rbramand@amd.com>
Force-pushed from ca9ca60 to 90dd121
Hi @stsoe, I have added trace points only at the places where we need to identify bottlenecks, as suggested. Please review. Thanks
clang-tidy review says "All clean, LGTM! 👍"
stsoe left a comment
When I look at all these tracepoints, I think they were added without much thought about what we want to measure. There are tracepoints in obsolete Alveo code, e.g. the update arg functions; why were they added?
I am dubious about the value of many of these tracepoints. Profiling of user-space code is not done through tracepoints, but with a profiler, e.g. valgrind.
In the past we have added tracepoints to expose bottlenecks through kernel module code paths. It makes a lot of sense to trace API overhead in UMD code paths that involve critical KMD code paths. Is this the thinking behind all the tracepoints added here? Otherwise, just use valgrind or some other profiling tool to optimize the code if we think there are bottlenecks.
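To make the distinction concrete, here is a minimal, self-contained sketch of "strategic" placement. trace_scope and submit_command are stand-ins invented for illustration; the real XRT_TRACE_POINT_SCOPE used in this PR presumably expands to platform tracepoints that cost nothing unless a tracer is attached, while this stand-in just prints. The point is that a scoped trace point brackets an interval worth measuring, such as a UMD call that enters a critical KMD code path, whereas a bare dtor tracepoint marks an instant and measures nothing.

#include <chrono>
#include <cstdio>

// Stand-in for XRT_TRACE_POINT_SCOPE (illustration only): emits an
// entry/exit pair so the interval in between can be measured.
struct trace_scope
{
  const char* name;
  std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

  explicit trace_scope(const char* n) : name(n) {}

  ~trace_scope()
  {
    auto us = std::chrono::duration_cast<std::chrono::microseconds>
      (std::chrono::steady_clock::now() - start).count();
    std::printf("%s: %lld us\n", name, static_cast<long long>(us));
  }
};
#define XRT_TRACE_POINT_SCOPE(name) trace_scope trace_scope_##name{#name}

// Strategic placement: the scope brackets a UMD call that crosses into
// the kernel driver, which is where a bottleneck would actually show up.
// submit_command is a hypothetical stand-in.
void
submit_command()
{
  XRT_TRACE_POINT_SCOPE(xrt_hwqueue_submit);
  // ... the ioctl into the KMD would go here ...
}

int
main()
{
  submit_command();
  return 0;
}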
void
dump_uc_log_buffer()
{
  XRT_TRACE_POINT_SCOPE(xrt_hw_context_dump_uc_log_buffer);
Isn't it the dump() function you want to trace?
static std::shared_ptr<runlist_impl>
alloc_runlist(xrt::hw_context hwctx)
{
  XRT_TRACE_POINT_SCOPE(xrt_runlist_alloc);
What are we measuring? The constructor is trivial with 0 overhead. I understand why alloc_runlist was created: it was to ensure that you can measure the time it takes to create runlist_impl, which does all initialization in its initializer list. This makes sense if we truly want to measure the construction time, but do we?
The function is kind of like an observer of hwctx, where the sink is really the runlist_impl ctor. Shouldn't alloc_runlist take hwctx by const ref and leave the copying to the runlist_impl ctor?
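A minimal sketch of the suggested signature change; the types are from the diff above, but the body (a plain forward to make_shared) is an assumption:

static std::shared_ptr<runlist_impl>
alloc_runlist(const xrt::hw_context& hwctx)  // by const ref, not by value
{
  // The scope now brackets exactly the runlist_impl construction; the
  // single copy of hwctx happens in the runlist_impl initializer list.
  XRT_TRACE_POINT_SCOPE(xrt_runlist_alloc);
  return std::make_shared<runlist_impl>(hwctx);
}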
runlist::
add(const xrt::run& run)
{
  XRT_TRACE_POINT_SCOPE(xrt_runlist_add);
Why are we measuring here and not in the && overload? If we want to trace anything here, then it should be runlist_impl::add(); see the sketch after the execute() comment below.
runlist::
execute()
{
  XRT_TRACE_POINT_SCOPE(xrt_runlist_execute);
This wasn't added in this PR, but it probably shouldn't even be here; it should be in runlist_impl::execute().
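A sketch of the relocation this and the previous comment ask for: the trace points move into the impl methods that do the actual work, and the public runlist facade stays untraced. The method bodies are placeholders, not code from this PR:

void
runlist_impl::
add(const xrt::run& run)
{
  XRT_TRACE_POINT_SCOPE(xrt_runlist_add);
  // ... validate the run object and append it to the list ...
}

void
runlist_impl::
execute()
{
  XRT_TRACE_POINT_SCOPE(xrt_runlist_execute);
  // ... submit the command list for execution ...
}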
Problem solved by the commit
Added new trace points in xrt classes and their member functions
Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered
It's an EOU/Enhancement change
How problem was solved, alternative solutions (if any) and why they were rejected
Added trace points in xrt object alloc functions (with _ctor in the name) and in impl destructors (with _dtor in the name); an illustrative example follows.
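A hypothetical instance of this naming convention; bo_impl and alloc_bo are stand-ins, not names from this PR:

static std::shared_ptr<bo_impl>
alloc_bo(size_t size)
{
  XRT_TRACE_POINT_SCOPE(xrt_bo_ctor);   // alloc function: _ctor suffix
  return std::make_shared<bo_impl>(size);
}

bo_impl::
~bo_impl()
{
  XRT_TRACE_POINT_SCOPE(xrt_bo_dtor);   // impl destructor: _dtor suffix
}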
Risks (if any) associated with the changes in the commit
Low
What has been tested and how, request additional testing if necessary
Tested with the perf tool on ve2 and used a Python script to pretty-print the statistics; it works as expected
Documentation impact (if any)
NA