* IWYU
* Add version notes
* Fix rst parse error
* REVERTME: work around doxygen 1.14.0 bug
See doxygen/doxygen#11607, to be fixed in 1.14.1
* Improve profiling documentation
Line 2 specifies the APIs to be captured: in this case, CUDA calls, NVTX
+ranges, and OS runtime libraries.
 To use the NVTX ranges, you must enable the ``CELER_ENABLE_PROFILING`` variable
-and use the NVTX "capture" option (lines 1 and 3). The ``celer-sim`` range in
-the ``celeritas`` domain (line 4) enables profiling over the whole application.
-Additional system backtracing is specified in line 5; line 6 writes (and
-overwrites) to a particular output file; the final line invokes the
+in addition to using the NVTX "trace" option (lines 1 and 2).
+The capture domain in line 3 restricts profiling to the Celeritas application.
+(You can use, e.g., ``--nvtx-capture celer-sim@celeritas`` to capture a smaller
+range.)
+Additional frame-pointer-based backtracing is specified in line 5; line 6
+writes (and overwrites) to a particular output file; the final line invokes the
 application.
 
 On AMD hardware using the ROCProfiler_, here's an example that writes out timeline information:
@@ -103,12 +108,13 @@ the `Perfetto documentation`_. Root access on the system is required.
 Integration with user applications
 ----------------------------------
 
-When using a CUDA or HIP backend, there is nothing that needs to be done on the user side.
-The commands shown in the previous sections can be used to profile your application. If your application
-already uses NVTX, or ROCTX, you can exclude Celeritas events by excluding the "celeritas" domain.
+When using a CUDA or HIP backend, **no additional code is needed in the user
+application**.
+The commands shown in the previous sections can be used to profile your application.
+If your application already uses NVTX, or ROCTX, you can exclude Celeritas events by excluding the ``celeritas`` domain.
 
-When using Perfetto, you need to create a ``TracingSession``
-instance. The profiling session needs to be explictitly started, and will end when the object goes out of scope,
+When using Perfetto for CPU profiling, you need to create a ``TracingSession``
+instance. The profiling session needs to be explicitly started, and will end when the object goes out of scope,
 but it can be moved to extend its lifetime.
 
 .. sourcecode:: cpp
@@ -129,7 +135,7 @@ The system-level profiling requires starting external services. Details on how t
 When the tracing session is started with a filename, the application-level profiling is used and written to the specified file.
 Omitting the filename will use the system-level profiling, in which case you must have the external Perfetto tracing processes started. The container in ``scripts/docker/interactive`` provides an example Perfetto configuration for tracing both system-level and celeritas events.
 
-As with NVTX and ROCTX, if your application already uses Perfetto, you can exclude Celeritas events by excluding the "celeritas" category.
+As with NVTX and ROCTX, if your application already uses Perfetto, you can exclude Celeritas events by excluding the ``celeritas`` category.
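
Note on the session lifetime described in the changed text: the documentation says the tracing session must be started explicitly, ends when the object goes out of scope, and can be moved to extend its lifetime. The sketch below illustrates that general RAII pattern only. ``DemoSession`` and ``run_demo`` are hypothetical names invented for this illustration; this is not the Celeritas ``TracingSession`` class, and it does not call Perfetto. It simply records events in a log so the ordering can be inspected.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracing session (NOT the Celeritas class).
// It records "start" and "stop" events in a caller-provided log.
class DemoSession
{
  public:
    explicit DemoSession(std::vector<std::string>* log) : log_(log) {}

    // Moving transfers ownership; the moved-from object becomes inert.
    DemoSession(DemoSession&& other) noexcept
        : log_(std::exchange(other.log_, nullptr))
        , started_(std::exchange(other.started_, false))
    {
    }

    DemoSession(DemoSession const&) = delete;
    DemoSession& operator=(DemoSession const&) = delete;
    DemoSession& operator=(DemoSession&&) = delete;

    // The session ends when the owning object is destroyed.
    ~DemoSession()
    {
        if (log_ && started_)
        {
            log_->push_back("stop");
        }
    }

    // The session does nothing until it is explicitly started.
    void start()
    {
        if (log_ && !started_)
        {
            started_ = true;
            log_->push_back("start");
        }
    }

  private:
    std::vector<std::string>* log_;
    bool started_{false};
};

// Start a session in an inner scope, then move it outward so that it
// outlives the scope in which it was created.
std::vector<std::string> run_demo()
{
    std::vector<std::string> log;
    {
        std::optional<DemoSession> outer;
        {
            DemoSession inner(&log);
            inner.start();
            outer.emplace(std::move(inner));
            log.push_back("leaving inner scope");
        }  // 'inner' is moved-from here, so its destructor records nothing
        log.push_back("work continues");
    }  // 'outer' is destroyed here and records the end of the session
    return log;
}
```

Calling ``run_demo()`` returns the recorded events in order. The point of the design is that the end of the session is tied to whichever object currently owns it, so moving the object is enough to keep the session alive past the creating scope.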