Commit 31d39b6

gpu: Add InstrumentedSamplingConfig to GpuCounterConfig (#5358)
Add counter_names field to GpuCounterConfig as an alternative to counter_ids for selecting counters by name.

Add InstrumentedSamplingConfig with activity filtering support:
- ActivityNameFilter message with per-filter NameBase enum and name_glob field for matching mangled or demangled kernel names.
- TX range include/exclude globs for filtering by in-process annotations (e.g. NVTX ranges for CUDA).
- ActivityRange for skip/count based sampling of matching activities.
- Three-step filtering pipeline: activity name filters, TX range include/exclude, then range-based sampling.

Add instrumented sampling documentation to docs/data-sources/gpu.md.
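
The three-step filtering pipeline named in the commit message can be modelled with a short Python sketch. This is an illustration only, not the producer's implementation. The `Activity`, `NameFilter`, and `ActivityRange` classes and the `select_instrumented` function are hypothetical names. Glob matching uses Python's `fnmatch`, which may differ from the glob semantics a real producer applies. The sketch also assumes that `skip` and `count` index into the sequence of activities that passed the first two steps, and that an activity is instrumented if it falls inside any listed range; the commit does not spell out how multiple ranges combine.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

UINT32_MAX = 2**32 - 1  # documented default for ActivityRange.count


@dataclass
class Activity:
    """Hypothetical model of one GPU activity (e.g. a kernel launch)."""
    mangled_name: str
    demangled_name: str
    tx_ranges: tuple = ()  # names of all enclosing TX ranges (nesting hierarchy)


@dataclass
class NameFilter:
    name_glob: str
    name_base: str = "MANGLED_KERNEL_NAME"  # documented default


@dataclass
class ActivityRange:
    skip: int = 0
    count: int = UINT32_MAX


def passes_name_filters(activity, filters):
    # Step 1: an empty filter list lets every activity through.
    if not filters:
        return True
    for f in filters:
        if f.name_base == "DEMANGLED_KERNEL_NAME":
            name = activity.demangled_name
        else:
            name = activity.mangled_name
        if fnmatchcase(name, f.name_glob):
            return True
    return False


def passes_tx_filters(activity, include_globs, exclude_globs):
    # Step 2: any range in the nesting hierarchy may match a glob.
    def any_match(globs):
        return any(fnmatchcase(r, g) for r in activity.tx_ranges for g in globs)

    if exclude_globs and any_match(exclude_globs):
        return False  # excludes take precedence over includes
    if include_globs:
        return any_match(include_globs)
    return True


def select_instrumented(activities, name_filters=(), include_globs=(),
                        exclude_globs=(), ranges=()):
    selected = []
    index = 0  # ASSUMPTION: counts only activities that passed steps 1 and 2
    for activity in activities:
        if not passes_name_filters(activity, name_filters):
            continue
        if not passes_tx_filters(activity, include_globs, exclude_globs):
            continue
        # Step 3: an empty range list instruments everything that got here.
        if not ranges or any(r.skip <= index < r.skip + r.count for r in ranges):
            selected.append(activity)
        index += 1
    return selected
```

With one demangled-name filter `myKernel*`, one include glob `training*`, and a single range of `skip: 10`, `count: 5`, this model selects the 11th through 15th matching activities, which mirrors the example configuration in `docs/data-sources/gpu.md`.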
1 parent 502c730 commit 31d39b6

5 files changed

Lines changed: 338 additions & 9 deletions

docs/data-sources/gpu.md

Lines changed: 56 additions & 0 deletions
@@ -87,6 +87,12 @@ data_sources: {
8787

8888
`counter_period_ns` sets the desired sampling interval.
8989

90+
Alternatively, counters can be selected by name using `counter_names`. Use one
91+
or the other, not both. Not all producers support this — check
92+
`supports_counter_names` in the `GpuCounterDescriptor` data source descriptor.
93+
Glob patterns may be used in `counter_names` to match multiple counters by
94+
name; check `supports_counter_name_globs` in the descriptor for support.
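
A client that builds trace configs could use the two descriptor flags to decide between `counter_names` and `counter_ids`. The following Python sketch shows one way to do that. It is a hypothetical helper, not part of Perfetto: `choose_counter_selection` is an invented name, the descriptor is modelled as a plain dict whose `specs` entries are assumed to carry `counter_id` and `name` keys mirroring `GpuCounterSpec`, and the local glob expansion uses Python's `fnmatch` rather than whatever matching the producer applies.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["  # ASSUMPTION: characters that mark a pattern as a glob


def choose_counter_selection(descriptor, wanted):
    """Return {'counter_names': [...]} or {'counter_ids': [...]}, never both.

    descriptor: dict modelling GpuCounterDescriptor (hypothetical shape).
    wanted: counter names or glob patterns requested by the user.
    """
    names_ok = bool(descriptor.get("supports_counter_names"))
    globs_ok = bool(descriptor.get("supports_counter_name_globs"))
    has_glob = any(c in pattern for pattern in wanted for c in GLOB_CHARS)

    # Producer can take the request verbatim.
    if names_ok and (not has_glob or globs_ok):
        return {"counter_names": list(wanted)}

    # Otherwise resolve the patterns locally against the advertised specs.
    matched = [
        spec for spec in descriptor.get("specs", [])
        if any(fnmatchcase(spec["name"], pattern) for pattern in wanted)
    ]
    if names_ok:
        # Names are supported but globs are not: send expanded exact names.
        return {"counter_names": [spec["name"] for spec in matched]}
    # Names are not supported at all: fall back to numeric IDs.
    return {"counter_ids": [spec["counter_id"] for spec in matched]}
```

Returning a dict with exactly one key keeps the "one or the other, not both" rule visible at the call site.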
95+
9096
### GPU memory
9197

9298
Total GPU memory usage per process is collected via ftrace:
@@ -167,6 +173,56 @@ data_sources: {
167173
}
168174
```
169175

176+
For more control over which GPU activities are instrumented, use
177+
`instrumented_sampling_config` instead of the `instrumented_sampling` bool.
178+
This enables a pipeline of filters applied in the following order:
179+
180+
1. **Activity name filtering**: If `activity_name_filters` is non-empty, the
181+
activity must match at least one filter. Each filter requires a `name_glob`
182+
pattern and an optional `name_base` (defaults to `MANGLED_KERNEL_NAME` if
183+
not specified). If empty, all activities pass this step.
184+
185+
2. **TX range filtering**: If `activity_tx_include_globs` is non-empty, the
186+
activity must fall within a TX range (e.g. NVTX range for CUDA) matching
187+
one of the include globs. Activities in TX ranges matching
188+
`activity_tx_exclude_globs` are excluded (excludes take precedence over
189+
includes). TX ranges can be nested, and an activity matches if any range
190+
in its nesting hierarchy matches. If both are empty, all activities pass
191+
this step.
192+
193+
3. **Range-based sampling**: If `activity_ranges` is non-empty, only
194+
activities within the specified skip/count ranges are instrumented.
195+
`skip` defaults to 0 and `count` defaults to UINT32\_MAX (all remaining
196+
activities) when not specified. If empty, all activities that passed the
197+
previous steps are instrumented.
198+
199+
Example configuration that instruments only activities with demangled kernel
200+
names matching `"myKernel*"` within TX ranges matching `"training*"`,
201+
skipping the first 10 matching activities and then instrumenting 5:
202+
203+
```
204+
data_sources: {
205+
config {
206+
name: "gpu.counters"
207+
gpu_counter_config {
208+
counter_names: "sm__cycles_elapsed.avg"
209+
counter_names: "sm__cycles_active.avg"
210+
instrumented_sampling_config {
211+
activity_name_filters {
212+
name_glob: "myKernel*"
213+
name_base: DEMANGLED_KERNEL_NAME
214+
}
215+
activity_tx_include_globs: "training*"
216+
activity_ranges {
217+
skip: 10
218+
count: 5
219+
}
220+
}
221+
}
222+
}
223+
}
224+
```
225+
170226
Counter descriptor mode 2 is recommended for GPGPU use-cases: the producer
171227
emits an `InternedGpuCounterDescriptor` referenced by IID, giving each
172228
trusted sequence its own scoped counter IDs. This avoids the global

protos/perfetto/common/gpu_counter_descriptor.proto

Lines changed: 9 additions & 0 deletions
@@ -100,6 +100,15 @@ message GpuCounterDescriptor {
100100
// command buffer.
101101
optional bool supports_instrumented_sampling = 5;
102102

103+
// optional. The producer supports selecting counters by name via
104+
// GpuCounterConfig.counter_names. Not all producers support this; Android
105+
// GPU producers typically do not.
106+
optional bool supports_counter_names = 7;
107+
108+
// optional. The producer supports glob patterns in
109+
// GpuCounterConfig.counter_names for matching multiple counters by name.
110+
optional bool supports_counter_name_globs = 8;
111+
103112
// next id: 41
104113
enum MeasureUnit {
105114
NONE = 0;

protos/perfetto/config/gpu/gpu_counter_config.proto

Lines changed: 85 additions & 3 deletions
@@ -22,12 +22,94 @@ message GpuCounterConfig {
2222
// Desired sampling interval for counters.
2323
optional uint64 counter_period_ns = 1;
2424

25-
// List of counters to be sampled. Counter IDs correspond to the ones
26-
// described in GpuCounterSpec in the data source descriptor.
25+
// Selects which counters to sample. Use either counter_ids or counter_names,
26+
// not both. Counter IDs and names correspond to the ones described in
27+
// GpuCounterSpec in the data source descriptor.
28+
29+
// List of counter IDs to be sampled.
2730
repeated uint32 counter_ids = 2;
2831

32+
// List of counter names to be sampled. Requires producer support; check
33+
// GpuCounterDescriptor.supports_counter_names in the data source descriptor.
34+
// Glob patterns may be used to match multiple counters by name; check
35+
// GpuCounterDescriptor.supports_counter_name_globs for support.
36+
repeated string counter_names = 6;
37+
38+
// Configuration for sampling counters by instrumenting command buffers.
39+
//
40+
// When instrumented_sampling_config is used (instead of the
41+
// instrumented_sampling bool), the following steps determine whether
42+
// instrumented counters are enabled for a given GPU activity:
43+
//
44+
// 1. Activity name filtering: If activity_name_filters is non-empty, the
45+
// activity must match at least one filter. If empty, all activities
46+
// pass this step.
47+
// 2. TX range filtering: If activity_tx_include_globs is non-empty, the
48+
// activity must fall within a matching TX range. Activities in TX
49+
// ranges matching activity_tx_exclude_globs are excluded (excludes
50+
// take precedence over includes). If both are empty, all activities
51+
// pass this step.
52+
// 3. Range-based sampling: If activity_ranges is non-empty, only
53+
// activities within the specified skip/count ranges are instrumented.
54+
// If empty, all activities that passed the previous steps are
55+
// instrumented.
56+
message InstrumentedSamplingConfig {
57+
// Filters GPU activities by name. Each filter specifies a glob pattern
58+
// and the basis for matching (mangled or demangled kernel name).
59+
message ActivityNameFilter {
60+
enum NameBase {
61+
MANGLED_KERNEL_NAME = 0;
62+
DEMANGLED_KERNEL_NAME = 1;
63+
}
64+
65+
// required. Glob pattern to use for GPU activity name filtering.
66+
optional string name_glob = 1;
67+
68+
// Basis for name filtering. Defaults to MANGLED_KERNEL_NAME if not
69+
// specified.
70+
optional NameBase name_base = 2;
71+
}
72+
73+
// GPU activity name filters. An activity matches if it matches any filter.
74+
repeated ActivityNameFilter activity_name_filters = 3;
75+
76+
// Glob patterns to use for including GPU activities in TX ranges. TX
77+
// ranges are in-process annotations that mark different sections of GPU
78+
// work (e.g. NVTX ranges for CUDA). TX ranges can be nested, and an
79+
// activity is included if any range in its nesting hierarchy matches.
80+
// Only activities that fall within a matching TX range will be
81+
// instrumented.
82+
repeated string activity_tx_include_globs = 6;
83+
84+
// Glob patterns to use for excluding GPU activities from TX ranges.
85+
// TX ranges can be nested, and an activity is excluded if any range
86+
// in its nesting hierarchy matches. Excludes take precedence over
87+
// includes.
88+
repeated string activity_tx_exclude_globs = 7;
89+
90+
// Defines a range of GPU activities to instrument.
91+
message ActivityRange {
92+
// Number of GPU activities to skip before starting to instrument
93+
// command buffers. Defaults to 0 if not specified.
94+
optional uint32 skip = 1;
95+
96+
// Limit for the number of GPU activities to sample counters for by
97+
// instrumenting command buffers. Defaults to UINT32_MAX (all
98+
// remaining activities) if not specified.
99+
optional uint32 count = 2;
100+
}
101+
102+
// Ranges of GPU activities to instrument. Applied after activity name
103+
// and TX range filters. If empty, all activities that passed the
104+
// previous filters are instrumented.
105+
repeated ActivityRange activity_ranges = 5;
106+
}
107+
29108
// Sample counters by instrumenting command buffers.
30-
optional bool instrumented_sampling = 3;
109+
oneof instrumented_sampling_mode {
110+
bool instrumented_sampling = 3;
111+
InstrumentedSamplingConfig instrumented_sampling_config = 5;
112+
}
31113

32114
// Fix gpu clock rate during trace session.
33115
optional bool fix_gpu_clock = 4;

protos/perfetto/config/perfetto_config.proto

Lines changed: 94 additions & 3 deletions
@@ -113,6 +113,15 @@ message GpuCounterDescriptor {
113113
// command buffer.
114114
optional bool supports_instrumented_sampling = 5;
115115

116+
// optional. The producer supports selecting counters by name via
117+
// GpuCounterConfig.counter_names. Not all producers support this; Android
118+
// GPU producers typically do not.
119+
optional bool supports_counter_names = 7;
120+
121+
// optional. The producer supports glob patterns in
122+
// GpuCounterConfig.counter_names for matching multiple counters by name.
123+
optional bool supports_counter_name_globs = 8;
124+
116125
// next id: 41
117126
enum MeasureUnit {
118127
NONE = 0;
@@ -1688,12 +1697,94 @@ message GpuCounterConfig {
16881697
// Desired sampling interval for counters.
16891698
optional uint64 counter_period_ns = 1;
16901699

1691-
// List of counters to be sampled. Counter IDs correspond to the ones
1692-
// described in GpuCounterSpec in the data source descriptor.
1700+
// Selects which counters to sample. Use either counter_ids or counter_names,
1701+
// not both. Counter IDs and names correspond to the ones described in
1702+
// GpuCounterSpec in the data source descriptor.
1703+
1704+
// List of counter IDs to be sampled.
16931705
repeated uint32 counter_ids = 2;
16941706

1707+
// List of counter names to be sampled. Requires producer support; check
1708+
// GpuCounterDescriptor.supports_counter_names in the data source descriptor.
1709+
// Glob patterns may be used to match multiple counters by name; check
1710+
// GpuCounterDescriptor.supports_counter_name_globs for support.
1711+
repeated string counter_names = 6;
1712+
1713+
// Configuration for sampling counters by instrumenting command buffers.
1714+
//
1715+
// When instrumented_sampling_config is used (instead of the
1716+
// instrumented_sampling bool), the following steps determine whether
1717+
// instrumented counters are enabled for a given GPU activity:
1718+
//
1719+
// 1. Activity name filtering: If activity_name_filters is non-empty, the
1720+
// activity must match at least one filter. If empty, all activities
1721+
// pass this step.
1722+
// 2. TX range filtering: If activity_tx_include_globs is non-empty, the
1723+
// activity must fall within a matching TX range. Activities in TX
1724+
// ranges matching activity_tx_exclude_globs are excluded (excludes
1725+
// take precedence over includes). If both are empty, all activities
1726+
// pass this step.
1727+
// 3. Range-based sampling: If activity_ranges is non-empty, only
1728+
// activities within the specified skip/count ranges are instrumented.
1729+
// If empty, all activities that passed the previous steps are
1730+
// instrumented.
1731+
message InstrumentedSamplingConfig {
1732+
// Filters GPU activities by name. Each filter specifies a glob pattern
1733+
// and the basis for matching (mangled or demangled kernel name).
1734+
message ActivityNameFilter {
1735+
enum NameBase {
1736+
MANGLED_KERNEL_NAME = 0;
1737+
DEMANGLED_KERNEL_NAME = 1;
1738+
}
1739+
1740+
// required. Glob pattern to use for GPU activity name filtering.
1741+
optional string name_glob = 1;
1742+
1743+
// Basis for name filtering. Defaults to MANGLED_KERNEL_NAME if not
1744+
// specified.
1745+
optional NameBase name_base = 2;
1746+
}
1747+
1748+
// GPU activity name filters. An activity matches if it matches any filter.
1749+
repeated ActivityNameFilter activity_name_filters = 3;
1750+
1751+
// Glob patterns to use for including GPU activities in TX ranges. TX
1752+
// ranges are in-process annotations that mark different sections of GPU
1753+
// work (e.g. NVTX ranges for CUDA). TX ranges can be nested, and an
1754+
// activity is included if any range in its nesting hierarchy matches.
1755+
// Only activities that fall within a matching TX range will be
1756+
// instrumented.
1757+
repeated string activity_tx_include_globs = 6;
1758+
1759+
// Glob patterns to use for excluding GPU activities from TX ranges.
1760+
// TX ranges can be nested, and an activity is excluded if any range
1761+
// in its nesting hierarchy matches. Excludes take precedence over
1762+
// includes.
1763+
repeated string activity_tx_exclude_globs = 7;
1764+
1765+
// Defines a range of GPU activities to instrument.
1766+
message ActivityRange {
1767+
// Number of GPU activities to skip before starting to instrument
1768+
// command buffers. Defaults to 0 if not specified.
1769+
optional uint32 skip = 1;
1770+
1771+
// Limit for the number of GPU activities to sample counters for by
1772+
// instrumenting command buffers. Defaults to UINT32_MAX (all
1773+
// remaining activities) if not specified.
1774+
optional uint32 count = 2;
1775+
}
1776+
1777+
// Ranges of GPU activities to instrument. Applied after activity name
1778+
// and TX range filters. If empty, all activities that passed the
1779+
// previous filters are instrumented.
1780+
repeated ActivityRange activity_ranges = 5;
1781+
}
1782+
16951783
// Sample counters by instrumenting command buffers.
1696-
optional bool instrumented_sampling = 3;
1784+
oneof instrumented_sampling_mode {
1785+
bool instrumented_sampling = 3;
1786+
InstrumentedSamplingConfig instrumented_sampling_config = 5;
1787+
}
16971788

16981789
// Fix gpu clock rate during trace session.
16991790
optional bool fix_gpu_clock = 4;
