Skip to content

Commit e4045f4

Browse files
dgaliffiAMDkcossett-amdmarantic-amdmradosav-amdprbasyal-amd
authored
[7.1] Fix rocPD data collection (#1439)
* [rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd (#1016) * Add OMPT to ROCpd * Use correct category * Added wrapper functions for future control * Formatting * Fix naming * Comment change * Remove ompt_get_cb_args * Switched to using region_sample for OMPT * Remove relic function * Remove get_use_rocpd that was used in this pr (one still remains) * Rename ompt_get_args_string and reuse in tool_tracing_callback_stop * Make lock init and destroy cb instant * [Prototype] ROCPD Name fix * [Prototype] ROCPD Name fix P1 * [Prototype] ROCPD Name fix P2 * ROCPD Name fix * Var name changes * Rewrite cb overwrite to single function * [Important] Use parallel_data as key for parallel callback map * Fix workflow failure * Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0 * Add missing ROCPROFILER_VERSION check * Improve readability * Make ompt storage maps thread local * Part 1: Variable name fix, memory cleanup, and fixed asserts * Part 2: Add comments * Part 3: Add CI_THROW * Part 4: Formatting * Part 5: Move #include to cpp * Add missing counter events handling for ROCPD (#1305) * Add missing counter events handling for ROCPD * Update projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp * Update projects/rocprofiler-systems/source/lib/rocprof-sys/library/rocprofiler-sdk/counters.cpp * Fixed formatting Signed-off-by: David Galiffi <David.Galiffi@amd.com> --------- Signed-off-by: Marjan Antic <Marjan.Antic@amd.com> Co-authored-by: David Galiffi <David.Galiffi@amd.com> * Update VERSION to 1.2.1 * Update CHANGELOG.md * Add caching of category region for rocpd (#1420) * Add caching of category region Fix vaapi traces Remove region_with_name * Applied suggestions from code review * Add some simple rocpd testing Signed-off-by: David Galiffi <David.Galiffi@amd.com> * Adjust `rocpd_string` validation parameters * Adjust `rocm_marker_api` validation parameters * Update projects/rocprofiler-systems/CHANGELOG.md Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * Update projects/rocprofiler-systems/CHANGELOG.md Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> * Update projects/rocprofiler-systems/CHANGELOG.md Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> --------- Signed-off-by: Marjan Antic <Marjan.Antic@amd.com> Signed-off-by: David Galiffi <David.Galiffi@amd.com> Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com> Co-authored-by: marantic-amd <marantic@amd.com> Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com> Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
1 parent 9a59a07 commit e4045f4

37 files changed

Lines changed: 2973 additions & 92 deletions

.pre-commit-config.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,9 +30,12 @@ repos:
3030
- repo: https://github.com/pre-commit/pre-commit-hooks
3131
rev: v5.0.0
3232
hooks:
33+
- id: check-json # Check JSON files for syntax errors
3334
- id: check-yaml # Check YAML files for syntax errors
3435
- id: trailing-whitespace # Remove trailing whitespace
3536
- id: end-of-file-fixer # Fix files to have a newline at the end
37+
- id: pretty-format-json # Pretty-format JSON files
38+
args: ['--indent', '4', '--autofix']
3639

3740
- repo: https://github.com/pre-commit/mirrors-clang-format
3841
rev: v18.1.8 # Version 18 as specified in contributor guide

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@
44

55
Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/).
66

7+
## ROCm Systems Profiler 1.2.1 for ROCm 7.1.1
8+
9+
### Resolved issues
10+
11+
- Fixed an issue of OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events failing to be collected in the `rocpd` output.
12+
713
## ROCm Systems Profiler 1.2.0 for ROCm 7.1.0
814

915
### Added

VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
1.2.0
1+
1.2.1

source/lib/core/trace_cache/buffer_storage.cpp

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -96,8 +96,6 @@ buffer_storage::buffer_storage(pid_t _pid)
9696
}
9797
};
9898

99-
ROCPROFSYS_DEBUG("Starting buffered storage flushing thread for pid %d",
100-
static_cast<int>(_pid));
10199
m_created_process = _pid;
102100
std::mutex _shutdown_condition_mutex;
103101
while(m_running)

source/lib/core/trace_cache/metadata_registry.cpp

Lines changed: 113 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
// SOFTWARE.
2222

2323
#include "metadata_registry.hpp"
24+
#include "core/debug.hpp"
2425
#include <algorithm>
2526
#include <cstdint>
2627

@@ -210,7 +211,7 @@ metadata_registry::get_string_list() const
210211
return result;
211212
}
212213

213-
#if ROCPROFSYS_USE_ROCM
214+
#if ROCPROFSYS_USE_ROCM > 0
214215

215216
void
216217
metadata_registry::add_code_object(
@@ -278,6 +279,102 @@ metadata_registry::get_kernel_symbol_list() const
278279
return result;
279280
}
280281

282+
// As the underlying implementation of callback_name_info_t resizes the category storage
283+
// during emplace, this special method is required
284+
void
285+
metadata_registry::overwrite_callback_names(
286+
std::initializer_list<
287+
std::pair<rocprofiler_callback_tracing_kind_t, callback_rename_map_t>>
288+
rename_table)
289+
{
290+
if(rename_table.size() == 0) return;
291+
292+
using callback_kind_t = rocprofiler_callback_tracing_kind_t;
293+
using operation_names_t = std::vector<std::string_view>;
294+
295+
auto category_names = std::vector<std::string_view>{};
296+
auto modified_ops = std::map<callback_kind_t, operation_names_t>{};
297+
298+
auto extract_operations = [&](callback_kind_t cat) -> operation_names_t {
299+
auto items = m_callback_tracing_info.items();
300+
const auto* target_category = items[static_cast<size_t>(cat)];
301+
302+
auto operations_data = target_category->items();
303+
operation_names_t operation_names;
304+
operation_names.reserve(operations_data.size());
305+
306+
for(const auto& [op_idx, op_name] : operations_data)
307+
operation_names.push_back(*op_name);
308+
309+
return operation_names;
310+
};
311+
312+
// Store category names
313+
category_names.resize(ROCPROFILER_CALLBACK_TRACING_LAST);
314+
for(callback_kind_t i = ROCPROFILER_CALLBACK_TRACING_NONE;
315+
i < ROCPROFILER_CALLBACK_TRACING_LAST;
316+
i = static_cast<callback_kind_t>(static_cast<int>(i) + 1))
317+
{
318+
category_names[i] = m_callback_tracing_info.at(i);
319+
}
320+
321+
// Process list
322+
for(const auto& category_info : rename_table)
323+
{
324+
auto callback_kind = category_info.first;
325+
// Store operations of all following categories
326+
// as they will be deleted
327+
for(callback_kind_t i =
328+
static_cast<callback_kind_t>(static_cast<int>(callback_kind) + 1);
329+
i < ROCPROFILER_CALLBACK_TRACING_LAST;
330+
i = static_cast<callback_kind_t>(static_cast<int>(i) + 1))
331+
{
332+
if(modified_ops.find(i) != modified_ops.end()) break;
333+
modified_ops[i] = extract_operations(i);
334+
}
335+
336+
ROCPROFSYS_CI_THROW(modified_ops.find(callback_kind) != modified_ops.end(),
337+
"Overwriting a previously overwritten entry is forbidden");
338+
339+
ROCPROFSYS_CI_THROW(!modified_ops.empty() &&
340+
callback_kind >= modified_ops.begin()->first,
341+
"Category must have a larger enum value than all previously "
342+
"modified_ops categories");
343+
344+
// Overwrite desired category
345+
auto operation_names = extract_operations(callback_kind);
346+
for(const auto& [index, new_value] : category_info.second)
347+
{
348+
ROCPROFSYS_CI_THROW(index < 0 ||
349+
static_cast<size_t>(index) >= operation_names.size(),
350+
"Index is invalid");
351+
operation_names[index] = new_value;
352+
}
353+
modified_ops[callback_kind] = std::move(operation_names);
354+
}
355+
if(modified_ops.empty()) return;
356+
357+
// Emplace the changed category operations
358+
for(callback_kind_t i = modified_ops.begin()->first;
359+
i < ROCPROFILER_CALLBACK_TRACING_LAST;
360+
i = static_cast<callback_kind_t>(static_cast<int>(i) + 1))
361+
{
362+
auto renaming_entry = modified_ops.find(i);
363+
364+
ROCPROFSYS_CI_THROW(renaming_entry == modified_ops.end(),
365+
"A category that needs to be emplaced is missing");
366+
367+
const auto& operations_vec = renaming_entry->second;
368+
m_callback_tracing_info.emplace(i, category_names.at(i).data());
369+
for(size_t op_idx = 0; op_idx < operations_vec.size(); ++op_idx)
370+
{
371+
m_callback_tracing_info.emplace(
372+
i, static_cast<rocprofiler_tracing_operation_t>(op_idx),
373+
operations_vec[op_idx].data());
374+
}
375+
}
376+
}
377+
281378
rocprofiler::sdk::buffer_name_info_t<const char*>
282379
metadata_registry::get_buffer_name_info() const
283380
{
@@ -292,5 +389,20 @@ metadata_registry::get_callback_tracing_info() const
292389

293390
#endif
294391

392+
metadata_registry::metadata_registry()
393+
{
394+
#if ROCPROFSYS_USE_ROCM > 0
395+
overwrite_callback_names({
396+
# if(ROCPROFILER_VERSION >= 600)
397+
{ ROCPROFILER_CALLBACK_TRACING_OMPT,
398+
{ { ROCPROFILER_OMPT_ID_thread_begin, "omp_thread" },
399+
{ ROCPROFILER_OMPT_ID_thread_end, "omp_thread" },
400+
{ ROCPROFILER_OMPT_ID_parallel_begin, "omp_parallel" },
401+
{ ROCPROFILER_OMPT_ID_parallel_end, "omp_parallel" } } }
402+
# endif
403+
});
404+
#endif
405+
}
406+
295407
} // namespace trace_cache
296408
} // namespace rocprofsys

source/lib/core/trace_cache/metadata_registry.hpp

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,8 @@
3535
# include <rocprofiler-sdk/callback_tracing.h>
3636
# include <rocprofiler-sdk/cxx/name_info.hpp>
3737
#endif
38+
#include <initializer_list>
39+
#include <map>
3840
#include <set>
3941
#include <sstream>
4042
#include <stdint.h>
@@ -215,7 +217,7 @@ struct metadata_registry
215217

216218
private:
217219
friend class cache_manager;
218-
metadata_registry() = default;
220+
metadata_registry();
219221
common::synchronized<info::process> m_process;
220222
common::synchronized<
221223
std::unordered_set<info::pmc, info::pmc_info_hash, info::pmc_info_equal>>
@@ -240,6 +242,14 @@ struct metadata_registry
240242
rocprofiler::sdk::callback_name_info_t<const char*> m_callback_tracing_info{
241243
rocprofiler::sdk::get_callback_tracing_names<const char*>()
242244
};
245+
246+
using callback_rename_map_t =
247+
std::map<rocprofiler_tracing_operation_t, std::string_view>;
248+
249+
void overwrite_callback_names(
250+
std::initializer_list<
251+
std::pair<rocprofiler_callback_tracing_kind_t, callback_rename_map_t>>
252+
rename_table);
243253
#endif
244254
};
245255

source/lib/core/trace_cache/rocpd_post_processing.cpp

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -279,6 +279,11 @@ rocpd_post_processing::get_region_callback() const
279279
return tokens;
280280
};
281281

282+
if(arg_str.empty())
283+
{
284+
return args;
285+
}
286+
282287
auto tokens = split(arg_str, delimiter);
283288

284289
// Ensure the number of tokens is a multiple of 4
@@ -307,12 +312,7 @@ rocpd_post_processing::get_region_callback() const
307312
auto thread_primary_key =
308313
data_processor.map_thread_id_to_primary_key(_rs.thread_id);
309314

310-
auto callback_tracing_info = m_metadata.get_callback_tracing_info();
311-
auto _name = std::string{ callback_tracing_info.at(
312-
static_cast<rocprofiler_callback_tracing_kind_t>(_rs.kind),
313-
static_cast<rocprofiler_tracing_operation_t>(_rs.operation)) };
314-
auto name_primary_key = data_processor.insert_string(_name.c_str());
315-
315+
auto name_primary_key = data_processor.insert_string(_rs.name.c_str());
316316
auto category_primary_key = data_processor.insert_string(_rs.category.c_str());
317317

318318
size_t stack_id = _rs.correlation_id_internal;

source/lib/core/trace_cache/sample_type.hpp

Lines changed: 4 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -131,13 +131,12 @@ struct memory_allocate_sample : storage_parsed_type_base
131131
struct region_sample : storage_parsed_type_base
132132
{
133133
region_sample() = default;
134-
region_sample(uint64_t _thread_id, int32_t _kind, int32_t _operation,
134+
region_sample(uint64_t _thread_id, std::string _name,
135135
uint64_t _correlation_id_internal, uint64_t _correlation_id_ancestor,
136136
uint64_t _start_timestamp, uint64_t _end_timestamp,
137137
std::string _call_stack, std::string _args_str, std::string _category)
138138
: thread_id(_thread_id)
139-
, kind(_kind)
140-
, operation(_operation)
139+
, name(std::move(_name))
141140
, correlation_id_internal(_correlation_id_internal)
142141
, correlation_id_ancestor(_correlation_id_ancestor)
143142
, start_timestamp(_start_timestamp)
@@ -147,10 +146,8 @@ struct region_sample : storage_parsed_type_base
147146
, category(std::move(_category))
148147
{}
149148

150-
// Identification fields
151-
uint64_t thread_id;
152-
int32_t kind;
153-
int32_t operation;
149+
uint64_t thread_id;
150+
std::string name;
154151

155152
// Correlation fields
156153
uint64_t correlation_id_internal;

source/lib/core/trace_cache/storage_parser.cpp

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -164,8 +164,7 @@ storage_parser::consume_storage()
164164
case entry_type::region:
165165
{
166166
region_sample _region_sample;
167-
parse_data(sample.data(), _region_sample.thread_id, _region_sample.kind,
168-
_region_sample.operation,
167+
parse_data(sample.data(), _region_sample.thread_id, _region_sample.name,
169168
_region_sample.correlation_id_internal,
170169
_region_sample.correlation_id_ancestor,
171170
_region_sample.start_timestamp, _region_sample.end_timestamp,

source/lib/rocprof-sys/library/components/category_region.hpp

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,18 +26,99 @@
2626
#include "core/defines.hpp"
2727
#include "core/state.hpp"
2828
#include "core/timemory.hpp"
29+
#include "core/trace_cache/cache_manager.hpp"
2930
#include "library/causal/data.hpp"
3031
#include "library/runtime.hpp"
32+
#include "library/thread_info.hpp"
3133
#include "library/tracing.hpp"
3234
#include "library/tracing/annotation.hpp"
3335

36+
#include <map>
37+
#include <thread>
3438
#include <timemory/components/gotcha/backends.hpp>
3539
#include <timemory/hash/types.hpp>
3640
#include <timemory/mpl/concepts.hpp>
3741
#include <timemory/mpl/types.hpp>
3842
#include <timemory/utility/types.hpp>
3943

4044
#include <string_view>
45+
#include <utility>
46+
47+
namespace
48+
{
49+
50+
void
51+
cache_region(uint64_t thread_id, const std::string& name, uint64_t start_ts,
52+
uint64_t end_ts, const std::string& category)
53+
{
54+
constexpr size_t NO_CORRELATION_ID = 0;
55+
constexpr const char* CALLSTACK = "";
56+
constexpr const char* ARGUMENTS = "";
57+
rocprofsys::trace_cache::get_buffer_storage().store(
58+
rocprofsys::trace_cache::entry_type::region, thread_id, name.c_str(),
59+
NO_CORRELATION_ID, NO_CORRELATION_ID, start_ts, end_ts, CALLSTACK, ARGUMENTS,
60+
category.c_str());
61+
}
62+
63+
struct entry_key
64+
{
65+
std::string name;
66+
std::string category;
67+
68+
friend bool operator<(const entry_key& lhs, const entry_key& rhs)
69+
{
70+
if(lhs.name != rhs.name)
71+
{
72+
return lhs.name < rhs.name;
73+
}
74+
75+
return lhs.category < rhs.category;
76+
}
77+
};
78+
79+
using timestamp_t = uint64_t;
80+
81+
thread_local std::map<entry_key, timestamp_t> map_name_to_args;
82+
83+
template <typename CategoryT, typename... Args>
84+
void
85+
cache_start(const char* name)
86+
{
87+
const auto start_ts =
88+
static_cast<timestamp_t>(rocprofsys::comp::wall_clock::record());
89+
map_name_to_args[{ name, rocprofsys::trait::name<CategoryT>::value }] = start_ts;
90+
}
91+
92+
template <typename CategoryT>
93+
void
94+
cache_stop(const char* name)
95+
{
96+
entry_key key{ name, rocprofsys::trait::name<CategoryT>::value };
97+
auto x = map_name_to_args.find(key);
98+
if(x != map_name_to_args.end())
99+
{
100+
map_name_to_args.erase(key);
101+
auto timestamp = x->second;
102+
103+
const auto end_ts =
104+
static_cast<timestamp_t>(rocprofsys::comp::wall_clock::record());
105+
uint64_t thread_id = 0;
106+
107+
const auto& extended_info =
108+
rocprofsys::thread_info::get(std::this_thread::get_id());
109+
if(extended_info.has_value() && extended_info->index_data.has_value())
110+
{
111+
constexpr size_t UNKNOWN_TIME = 0;
112+
thread_id = extended_info->index_data->system_value;
113+
rocprofsys::trace_cache::get_metadata_registry().add_thread_info(
114+
{ getppid(), getpid(), thread_id, UNKNOWN_TIME, UNKNOWN_TIME, "{}" });
115+
}
116+
117+
cache_region(thread_id, name, timestamp, end_ts,
118+
rocprofsys::trait::name<CategoryT>::value);
119+
}
120+
}
121+
} // namespace
41122

42123
namespace tim
43124
{
@@ -192,6 +273,8 @@ category_region<CategoryT>::start(std::string_view name, Args&&... args)
192273
tracing::push_perfetto(CategoryT{}, name.data(), std::forward<Args>(args)...);
193274
}
194275
}
276+
277+
cache_start<CategoryT>(name.data());
195278
}
196279

197280
template <typename CategoryT>
@@ -257,6 +340,8 @@ category_region<CategoryT>::stop(std::string_view name, Args&&... args)
257340
if(get_use_causal()) causal::pop_progress_point(name);
258341
}
259342
}
343+
344+
cache_stop<CategoryT>(name.data());
260345
}
261346
else
262347
{

0 commit comments

Comments
 (0)