Add pprof profile offline symbolization tool #32496

Closed
wants to merge 2 commits into from

Conversation

SangJunBak
Contributor

@SangJunBak SangJunBak commented May 15, 2025

See commit messages for details

Motivation

Follow-up to #32423 and https://github.com/MaterializeInc/database-issues/issues/8908. The debug tool scrapes the profiles, and we can then run this Python script internally to spin up a Docker container with a UI similar to pprof.me.

To easily test:

# Start the emulator with the internal HTTP port exposed.
# `;` rather than `&&` so that `docker run` still executes when no old container exists.
docker rm -f materialized; docker run -d --name materialized -p 6878:6878 materialize/materialized:latest

# Download the heap profile
curl -o envd.pprof.gz http://localhost:6878/prof/heap
# Log in to AWS
aws sso login --profile mz-cloud-production-engineering-on-call
export AWS_PROFILE=mz-cloud-production-engineering-on-call

# Run the script and wait for "Serving web UI on http://localhost:8080"
bin/pyactivate -m materialize.visualize_pprof_profile.visualize_pprof_profile envd.pprof.gz

Open to suggestions given it's my first time writing a script like this!

Note: We still need to implement CPU profiling in an unsymbolized format and then backport it to LTS. For now we only have memory profiles.

We also have to figure out why the latest binary for the emulator is over a gigabyte when it's supposed to be stripped 😅

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@SangJunBak SangJunBak requested review from def- and teskje May 15, 2025 01:34
@SangJunBak SangJunBak marked this pull request as ready for review May 15, 2025 01:44
@SangJunBak SangJunBak requested a review from a team as a code owner May 15, 2025 01:44
@SangJunBak SangJunBak force-pushed the jun/#8908/symbolize-offline branch from 0dc3c62 to 3d04676 Compare May 15, 2025 01:45
@@ -36,6 +36,7 @@ pdoc==15.0.3
# We can revert back to standard pg8000 versions once https://github.com/tlocke/pg8000/pull/161 is released
pg8000@git+https://github.com/tlocke/pg8000@46c00021ade1d19466b07ed30392386c5f0a6b8e
prettytable==3.16.0
protobuf==5.29.3
Contributor Author

Specifically used this version since that's the version used in profile.proto

Contributor

This will be auto-upgraded by dependabot. Is that a problem?


```bash
# Set up AWS credentials
export AWS_PROFILE=mz-cloud-production-engineering-on-call
Contributor Author

So for me, this is the only AWS profile with access to that bucket. For some reason, mz-core-admin doesn't work. Let me know if I should change it, but I assume this is the same for most people. Although it's kinda suspicious to use an overprivileged profile, I'd rather have one that works than one that doesn't.

Contributor

I guess some devs won't have access to this profile. We should probably make the S3 bucket more available.

@@ -0,0 +1,233 @@
// Copyright 2016 Google Inc. All Rights Reserved.
Contributor Author

@SangJunBak SangJunBak May 15, 2025

As noted in the README, this file was fetched with curl. Not sure if we need to do something to account for its Apache license.

A potential worry is that the provided pprof.gz doesn't match this proto. But given that protobufs are supposed to be backwards compatible, and that this .proto file (the one used for pprof files) is most likely stable, I thought it'd be okay.

Contributor Author

I noticed this is failing the check-copyright lint test. I can fix it by changing the header to

// Copyright Materialize, Inc. and contributors. All rights reserved.
//
// Use of this software is governed by the Business Source License
// included in the LICENSE file.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0.

but it seems quite suspicious to just override a copyright header like that. Any advice?

Contributor

I have added an ignore for the copyright check in the linters.

@SangJunBak SangJunBak force-pushed the jun/#8908/symbolize-offline branch 2 times, most recently from 647f017 to 6bbea6f Compare May 15, 2025 01:53
@SangJunBak
Contributor Author

SangJunBak commented May 15, 2025

Would appreciate a review from @def- since you know this code the best and @teskje for an optional stakeholder/code review!

- Introduced a new tool for symbolizing and visualizing pprof profiles using debug symbols from S3.
- Updated `requirements.txt` to include `protobuf==5.29.3`.
@SangJunBak SangJunBak force-pushed the jun/#8908/symbolize-offline branch from 6bbea6f to fa2e919 Compare May 15, 2025 03:06
@def- def- force-pushed the jun/#8908/symbolize-offline branch from 3ee9379 to e352783 Compare May 15, 2025 03:41
Contributor

@def- def- left a comment

Maybe I'm a bit confused about what the purpose of this is. Is it just doing the same as the instructions in the "Analyzing pprof for a release build" section of https://www.notion.so/materialize/analyzing-pprof-for-a-release-build-3fa5a68aef994d90b3c94bca6eea4da8 ? That always worked well for me, so I'm not sure we need another tool for it.


# macOS
bin/pyactivate -m ci.deploy_mz-debug.macos

# Linux
bin/pyactivate -m ci.deploy_mz-debug.linux
```
where AWS_PROFILE is the profile with access to the materialize-binaries S3 bucket in the Materialize Core account.
Contributor

I don't think devs have access to the Materialize Core account normally? At least I don't.

Contributor

What I've always used is https://debuginfo.dev.materialize.com/buildid/d6a8b86c62ce2b7a6fd146a048ebba77/executable, which is publicly available. I think that's easier than using S3.
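
A minimal sketch of fetching the executable from that public endpoint, assuming the URL pattern of the single example link above holds for any build ID. The helper names `debuginfo_url` and `fetch_executable` are hypothetical and not part of this PR.

```shell
# Hypothetical helper: prints the debuginfo URL for a build ID.
# Assumption: every build follows the pattern of the example link above.
debuginfo_url() (
  build_id="$1"
  printf 'https://debuginfo.dev.materialize.com/buildid/%s/executable\n' "$build_id"
)

# Hypothetical helper: downloads the executable for a build ID into a directory.
fetch_executable() (
  build_id="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir" || exit 1
  # --fail turns HTTP errors (e.g. an unknown build ID) into a non-zero exit.
  curl --fail --silent --show-error --location \
    --output "$dest_dir/executable" \
    "$(debuginfo_url "$build_id")"
)
```

For example, `fetch_executable d6a8b86c62ce2b7a6fd146a048ebba77 ./bin` would place the download at `./bin/executable`, with no AWS credentials involved.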

)

# Install graphviz in the container
subprocess.run(
Contributor

Shouldn't this just be a Dockerfile?

@SangJunBak
Contributor Author

SangJunBak commented May 15, 2025

> Maybe I'm a bit confused about what the purpose of this is. Is it just doing the same as the instructions in the "Analyzing pprof for a release build" section of notion.so/materialize/analyzing-pprof-for-a-release-build-3fa5a68aef994d90b3c94bca6eea4da8 ? That always worked well for me, so I'm not sure we need another tool for it.

@def- oh no way I didn't know this existed!

I think if the public URL didn't exist, I'd argue for a tool. But given we'll most likely need a doc that points to this tool after someone generates the mz-debug zip anyways, it makes sense to close this in favor of the doc.

Another thing that isn't clear: the binaries in the debuginfo S3 bucket don't seem to be stripped (e.g. the materialized:latest binary is over a gigabyte). Also, the guide doesn't account for an extra step: supplying not only the binary at $PPROF_BINARY_PATH/buildid/binary, but also the debuginfo at $PPROF_BINARY_PATH/debuginfo.debug. Regardless, we can probably add this step as a code block in the Notion doc.

@SangJunBak
Contributor Author

SangJunBak commented May 15, 2025

On second thought: I think a good middle-ground solution would be to generate a bash script per profile in the debug tool and expect users (us, when debugging internally) to have pprof, graphviz, and curl installed. That way users can just run the script and don't have to look up a Notion doc.

I have an ongoing Notion doc for using the debug tool, and I'll be sure to document the script as well as the manual way (the Notion doc you posted). I'll create a separate PR for this and close this one out!
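
A rough sketch of what such a per-profile script generator could look like, not the implementation that ended up in the follow-up PR. The function name `emit_symbolize_script` is hypothetical. It assumes the public debuginfo URL mentioned earlier in this thread works for any build ID and that the executable served there carries enough symbol information. As noted above, a separate debuginfo file may also be needed.

```shell
# Hypothetical generator: prints a per-profile symbolization script to stdout.
emit_symbolize_script() (
  build_id="$1"   # build ID of the binary that produced the profile
  profile="$2"    # path to the scraped profile, e.g. envd.pprof.gz
  # Unquoted heredoc: ${build_id} and ${profile} are filled in now, while
  # \$-escaped variables are left for the generated script to expand later.
  cat <<EOF
#!/usr/bin/env bash
set -euo pipefail

# Fail early if a required tool is missing (dot is provided by graphviz).
for tool in pprof dot curl; do
  command -v "\$tool" >/dev/null || { echo "missing: \$tool" >&2; exit 1; }
done

workdir="\$(mktemp -d)"
curl --fail --silent --show-error --location \\
  --output "\$workdir/executable" \\
  "https://debuginfo.dev.materialize.com/buildid/${build_id}/executable"

# Serve the pprof web UI on http://localhost:8080.
pprof -http=localhost:8080 "\$workdir/executable" "${profile}"
EOF
)
```

The debug tool could write the output next to each profile, e.g. `emit_symbolize_script "$BUILD_ID" envd.pprof.gz > symbolize-envd.sh`, so that running the script is the only step left to the user.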

@SangJunBak SangJunBak closed this May 15, 2025
@def-
Contributor

def- commented May 15, 2025

One thing that would be nice in the script is the ability to pass through any parameters that pprof supports.
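
One way this could work, sketched under assumptions: a wrapper that forwards any extra arguments to pprof unchanged and falls back to the web UI only when none are given. The name `run_pprof` and the `PPROF_BIN` override are hypothetical. The override exists only so the wrapper can be dry-run without pprof installed.

```shell
# Hypothetical wrapper: forwards caller-supplied options to pprof unchanged.
run_pprof() (
  binary="$1"
  profile="$2"
  shift 2
  # Default to the web UI only when the caller passed no options of their own.
  if [ "$#" -eq 0 ]; then
    set -- -http=localhost:8080
  fi
  # pprof expects options before the binary and the profile source.
  "${PPROF_BIN:-pprof}" "$@" "$binary" "$profile"
)
```

With this shape, `run_pprof ./executable envd.pprof.gz -top` would run `pprof -top ./executable envd.pprof.gz`, while calling it with no extra arguments starts the web UI on localhost:8080.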
