Commit 3d04676

Add pprof profile visualization tool
- Introduced a new tool for symbolizing and visualizing pprof profiles using debug symbols from S3.
- Updated `requirements.txt` to include `protobuf==5.29.3`.
1 parent eb41914 commit 3d04676

6 files changed, +531 -0 lines changed
ci/builder/requirements.txt

+1
@@ -36,6 +36,7 @@ pdoc==15.0.3
 # We can revert back to standard pg8000 versions once https://github.com/tlocke/pg8000/pull/161 is released
 pg8000@git+https://github.com/tlocke/pg8000@46c00021ade1d19466b07ed30392386c5f0a6b8e
 prettytable==3.16.0
+protobuf==5.29.3
 psutil==7.0.0
 # psycopg 3.2.8 causes Scalability test failures
 psycopg==3.2.7

ci/deploy_mz-debug/README.md

+2
@@ -22,13 +22,15 @@ You can manually deploy by following steps 1-2 above and running the following c
 ```bash
 # Set a tag version.
 export BUILDKITE_TAG=mz-debug-vx.y.z
+export AWS_PROFILE=...
 
 # macOS
 bin/pyactivate -m ci.deploy_mz-debug.macos
 
 # Linux
 bin/pyactivate -m ci.deploy_mz-debug.linux
 ```
+where AWS_PROFILE is the profile with access to the materialize-binaries S3 bucket in the Materialize Core account.
 
 **Important Notes:**
 - When running on macOS, modify `linux.py` to use `target` instead of `target-xcompile`
@@ -0,0 +1,53 @@
+# Materialize pprof Profile Visualization Tool
+
+This tool allows you to symbolize and visualize pprof profiles offline using debug symbols stored in Materialize's S3 bucket.
+
+## Prerequisites
+
+- Docker installed and running on your system
+- AWS credentials with access to the materialize-debuginfo S3 bucket in the Materialize Core account
+
+## Setup
+
+1. Set up your AWS credentials:
+```bash
+export AWS_PROFILE=<your-profile>
+```
+Where `<your-profile>` is your AWS profile with access to the materialize-debuginfo S3 bucket in the Materialize Core account.
+
+## Usage
+
+```bash
+python3 visualize_pprof_profile.py <path-to-profile> [--port PORT]
+```
+
+### Arguments
+
+- `<path-to-profile>`: Path to your pprof.gz profile file (required)
+- `--port`: Port number to run the pprof web UI (optional, defaults to 8080)
+
+## How It Works
+
+1. The tool reads your pprof profile and extracts the build ID
+2. It automatically fetches the corresponding debug symbols from S3
+3. Creates a Docker container with the necessary tools (pprof, graphviz)
+4. Starts a web UI where you can analyze the profile
+
+## Important Notes
+
+- The web UI will be available at `http://localhost:<port>` (default: http://localhost:8080)
+- Initial symbolization might take a few moments - wait until you see "Serving web UI on http://localhost:8080" message
+- The Docker container continues running even after you quit the program
+- The profile.proto file is sourced from: https://raw.githubusercontent.com/google/pprof/main/proto/profile.proto
+
+## Example
+
+```bash
+# Set up AWS credentials
+export AWS_PROFILE=mz-cloud-production-engineering-on-call
+
+# Run the visualization tool
+python3 visualize_pprof_profile.py /path/to/your/profile.pprof.gz
+```
+
+After running the command, open your web browser and navigate to http://localhost:8080 to view the profile visualization.
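
Reviewer sketch (not part of the commit): step 1 of "How It Works" says the tool reads the profile and extracts the build ID. The commit ships a generated `profile_pb2.py` for this, and that code is not shown here. The snippet below is only a dependency-free illustration of the same idea, decoding the protobuf wire format by hand. It relies on the field numbers from the vendored `profile.proto` (`Profile.mapping = 3`, `Profile.string_table = 6`, `Mapping.build_id = 6`) and assumes the build ID of interest is on the first mapping, which the proto comments describe as the main binary. The function names `read_varint`, `iter_fields`, and `extract_build_id` are hypothetical.

```python
import gzip


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field of one message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (strings, sub-messages, packed)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field, wire_type, value


def extract_build_id(profile_gz):
    """Return the build ID string of the first mapping, or "" if there is none."""
    data = gzip.decompress(profile_gz)  # profiles are gzip-compressed on disk
    mappings, strings = [], []
    for field, wire_type, value in iter_fields(data):
        if field == 3 and wire_type == 2:  # Profile.mapping
            mappings.append(value)
        elif field == 6 and wire_type == 2:  # Profile.string_table
            strings.append(value.decode("utf-8"))
    if not mappings:
        return ""
    build_id_index = 0  # unset fields default to 0, i.e. the empty string
    for field, wire_type, value in iter_fields(mappings[0]):
        if field == 6 and wire_type == 0:  # Mapping.build_id (string table index)
            build_id_index = value
    return strings[build_id_index] if build_id_index < len(strings) else ""
```

With a real profile, `extract_build_id(open(path, "rb").read())` would give the identifier used to look up the matching debug symbols. How the symbols are keyed in the S3 bucket is not stated in this commit view.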
@@ -0,0 +1,233 @@
+// Copyright 2016 Google Inc. All Rights Reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Profile is a common stacktrace profile format.
+//
+// Measurements represented with this format should follow the
+// following conventions:
+//
+// - Consumers should treat unset optional fields as if they had been
+//   set with their default value.
+//
+// - When possible, measurements should be stored in "unsampled" form
+//   that is most useful to humans. There should be enough
+//   information present to determine the original sampled values.
+//
+// - On-disk, the serialized proto must be gzip-compressed.
+//
+// - The profile is represented as a set of samples, where each sample
+//   references a sequence of locations, and where each location belongs
+//   to a mapping.
+// - There is a N->1 relationship from sample.location_id entries to
+//   locations. For every sample.location_id entry there must be a
+//   unique Location with that id.
+// - There is an optional N->1 relationship from locations to
+//   mappings. For every nonzero Location.mapping_id there must be a
+//   unique Mapping with that id.
+
+syntax = "proto3";
+
+package perftools.profiles;
+
+option java_package = "com.google.perftools.profiles";
+option java_outer_classname = "ProfileProto";
+
+message Profile {
+  // A description of the samples associated with each Sample.value.
+  // For a cpu profile this might be:
+  //   [["cpu","nanoseconds"]] or [["wall","seconds"]] or [["syscall","count"]]
+  // For a heap profile, this might be:
+  //   [["allocations","count"], ["space","bytes"]],
+  // If one of the values represents the number of events represented
+  // by the sample, by convention it should be at index 0 and use
+  // sample_type.unit == "count".
+  repeated ValueType sample_type = 1;
+  // The set of samples recorded in this profile.
+  repeated Sample sample = 2;
+  // Mapping from address ranges to the image/binary/library mapped
+  // into that address range. mapping[0] will be the main binary.
+  repeated Mapping mapping = 3;
+  // Locations referenced by samples.
+  repeated Location location = 4;
+  // Functions referenced by locations.
+  repeated Function function = 5;
+  // A common table for strings referenced by various messages.
+  // string_table[0] must always be "".
+  repeated string string_table = 6;
+  // frames with Function.function_name fully matching the following
+  // regexp will be dropped from the samples, along with their successors.
+  int64 drop_frames = 7; // Index into string table.
+  // frames with Function.function_name fully matching the following
+  // regexp will be kept, even if it matches drop_frames.
+  int64 keep_frames = 8; // Index into string table.
+
+  // The following fields are informational, do not affect
+  // interpretation of results.
+
+  // Time of collection (UTC) represented as nanoseconds past the epoch.
+  int64 time_nanos = 9;
+  // Duration of the profile, if a duration makes sense.
+  int64 duration_nanos = 10;
+  // The kind of events between sampled occurrences.
+  // e.g [ "cpu","cycles" ] or [ "heap","bytes" ]
+  ValueType period_type = 11;
+  // The number of events between sampled occurrences.
+  int64 period = 12;
+  // Free-form text associated with the profile. The text is displayed as is
+  // to the user by the tools that read profiles (e.g. by pprof). This field
+  // should not be used to store any machine-readable information, it is only
+  // for human-friendly content. The profile must stay functional if this field
+  // is cleaned.
+  repeated int64 comment = 13; // Indices into string table.
+  // Index into the string table of the type of the preferred sample
+  // value. If unset, clients should default to the last sample value.
+  int64 default_sample_type = 14;
+  // Documentation link for this profile. The URL must be absolute,
+  // e.g., http://pprof.example.com/cpu-profile.html
+  //
+  // The URL may be missing if the profile was generated by older code or code
+  // that did not bother to supply a link.
+  int64 doc_url = 15; // Index into string table.
+}
+
+// ValueType describes the semantics and measurement units of a value.
+message ValueType {
+  int64 type = 1; // Index into string table.
+  int64 unit = 2; // Index into string table.
+}
+
+// Each Sample records values encountered in some program
+// context. The program context is typically a stack trace, perhaps
+// augmented with auxiliary information like the thread-id, some
+// indicator of a higher level request being handled etc.
+message Sample {
+  // The ids recorded here correspond to a Profile.location.id.
+  // The leaf is at location_id[0].
+  repeated uint64 location_id = 1;
+  // The type and unit of each value is defined by the corresponding
+  // entry in Profile.sample_type. All samples must have the same
+  // number of values, the same as the length of Profile.sample_type.
+  // When aggregating multiple samples into a single sample, the
+  // result has a list of values that is the element-wise sum of the
+  // lists of the originals.
+  repeated int64 value = 2;
+  // label includes additional context for this sample. It can include
+  // things like a thread id, allocation size, etc.
+  //
+  // NOTE: While possible, having multiple values for the same label key is
+  // strongly discouraged and should never be used. Most tools (e.g. pprof) do
+  // not have good (or any) support for multi-value labels. And an even more
+  // discouraged case is having a string label and a numeric label of the same
+  // name on a sample. Again, possible to express, but should not be used.
+  repeated Label label = 3;
+}
+
+message Label {
+  // Index into string table. An annotation for a sample (e.g.
+  // "allocation_size") with an associated value.
+  // Keys with "pprof::" prefix are reserved for internal use by pprof.
+  int64 key = 1;
+
+  // At most one of the following must be present
+  int64 str = 2; // Index into string table
+  int64 num = 3;
+
+  // Should only be present when num is present.
+  // Specifies the units of num.
+  // Use arbitrary string (for example, "requests") as a custom count unit.
+  // If no unit is specified, consumer may apply heuristic to deduce the unit.
+  // Consumers may also interpret units like "bytes" and "kilobytes" as memory
+  // units and units like "seconds" and "nanoseconds" as time units,
+  // and apply appropriate unit conversions to these.
+  int64 num_unit = 4; // Index into string table
+}
+
+message Mapping {
+  // Unique nonzero id for the mapping.
+  uint64 id = 1;
+  // Address at which the binary (or DLL) is loaded into memory.
+  uint64 memory_start = 2;
+  // The limit of the address range occupied by this mapping.
+  uint64 memory_limit = 3;
+  // Offset in the binary that corresponds to the first mapped address.
+  uint64 file_offset = 4;
+  // The object this entry is loaded from. This can be a filename on
+  // disk for the main binary and shared libraries, or virtual
+  // abstractions like "[vdso]".
+  int64 filename = 5; // Index into string table
+  // A string that uniquely identifies a particular program version
+  // with high probability. E.g., for binaries generated by GNU tools,
+  // it could be the contents of the .note.gnu.build-id field.
+  int64 build_id = 6; // Index into string table
+
+  // The following fields indicate the resolution of symbolic info.
+  bool has_functions = 7;
+  bool has_filenames = 8;
+  bool has_line_numbers = 9;
+  bool has_inline_frames = 10;
+}
+
+// Describes function and line table debug information.
+message Location {
+  // Unique nonzero id for the location. A profile could use
+  // instruction addresses or any integer sequence as ids.
+  uint64 id = 1;
+  // The id of the corresponding profile.Mapping for this location.
+  // It can be unset if the mapping is unknown or not applicable for
+  // this profile type.
+  uint64 mapping_id = 2;
+  // The instruction address for this location, if available. It
+  // should be within [Mapping.memory_start...Mapping.memory_limit]
+  // for the corresponding mapping. A non-leaf address may be in the
+  // middle of a call instruction. It is up to display tools to find
+  // the beginning of the instruction if necessary.
+  uint64 address = 3;
+  // Multiple line indicates this location has inlined functions,
+  // where the last entry represents the caller into which the
+  // preceding entries were inlined.
+  //
+  // E.g., if memcpy() is inlined into printf:
+  //    line[0].function_name == "memcpy"
+  //    line[1].function_name == "printf"
+  repeated Line line = 4;
+  // Provides an indication that multiple symbols map to this location's
+  // address, for example due to identical code folding by the linker. In that
+  // case the line information above represents one of the multiple
+  // symbols. This field must be recomputed when the symbolization state of the
+  // profile changes.
+  bool is_folded = 5;
+}
+
+message Line {
+  // The id of the corresponding profile.Function for this line.
+  uint64 function_id = 1;
+  // Line number in source code.
+  int64 line = 2;
+  // Column number in source code.
+  int64 column = 3;
+}
+
+message Function {
+  // Unique nonzero id for the function.
+  uint64 id = 1;
+  // Name of the function, in human-readable form if available.
+  int64 name = 2; // Index into string table
+  // Name of the function, as identified by the system.
+  // For instance, it can be a C++ mangled name.
+  int64 system_name = 3; // Index into string table
+  // Source file containing the function.
+  int64 filename = 4; // Index into string table
+  // Line number in source file.
+  int64 start_line = 5;
+}
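
Reviewer sketch (not part of the commit): the comments in this proto describe three levels of indirection. A `Sample` lists `location_id`s with the leaf first, each `Location` holds one or more `Line`s with inlined callees before their caller, and each `Function.name` is an index into `string_table`. The Python below illustrates how a consumer might turn one sample into a leaf-first list of function names. Plain dicts and lists stand in for the generated protobuf classes, so the `resolve_stack` helper and its argument shapes are hypothetical rather than the API of `profile_pb2.py`.

```python
def resolve_stack(location_ids, locations, functions, string_table):
    """Return function names for one sample, leaf frame first.

    location_ids: Sample.location_id, leaf at index 0.
    locations:    {Location.id: [Line.function_id, ...]}, inlined callees first.
    functions:    {Function.id: Function.name}, where name is a string table index.
    string_table: Profile.string_table, with "" at index 0.
    """
    frames = []
    for location_id in location_ids:
        # One location can expand to several frames when functions were inlined.
        for function_id in locations[location_id]:
            frames.append(string_table[functions[function_id]])
    return frames


# Example data mirroring the memcpy-inlined-into-printf case from the comments.
string_table = ["", "memcpy", "printf", "main"]
functions = {1: 1, 2: 2, 3: 3}
locations = {10: [1, 2], 20: [3]}
stack = resolve_stack([10, 20], locations, functions, string_table)
```

Here `stack` is `["memcpy", "printf", "main"]`: the inlined `memcpy` frame comes first, then its caller `printf` from the same location, then `main` from the next location up the stack.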

misc/python/materialize/visualize_pprof_profile/profile_pb2.py

+49
Generated file, not rendered by default.