test: add vectorizer benchmark tools #570

Open · wants to merge 1 commit into main
Conversation

alejandrodnm (Contributor) commented Mar 18, 2025

New benchmark `just` recipes for the vectorizer:

- `just pgai benchmark-mem`: runs the vectorizer on a queue of 500
  items, then generates a memray profile and a memory usage flamegraph.

- `just pgai benchmark-cpu`: runs the vectorizer on a queue of 500
  items, then generates a CPU usage flamegraph using py-spy.

- `just pgai benchmark-cpu-top`: runs the vectorizer on a queue of 500
  items, displaying a top-like interface of CPU usage.

- `just pgai benchmark-queue-count`: shows the queue count of the
  running benchmark (a sketch of what this amounts to follows the
  list). It should be executed in a separate terminal; the count is
  refreshed at a regular interval.
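
For illustration only, here is a minimal sketch of what a queue-count watcher like the last recipe could boil down to. This is not the PR's code: the connection string and the queue table name (pgai generates one per vectorizer) are assumptions.

```python
# Hypothetical benchmark-queue-count style watcher: poll the vectorizer
# queue table and print the remaining item count at a fixed interval.
import time

import psycopg

DB_URL = "postgres://postgres@localhost:5432/benchmark"  # assumed connection string


def watch_queue(interval_s: float = 2.0) -> None:
    with psycopg.connect(DB_URL) as conn:
        while True:
            # Queue table name is an assumption; pgai creates one per vectorizer.
            (count,) = conn.execute(
                "SELECT count(*) FROM ai._vectorizer_q_1"
            ).fetchone()
            print(f"queue count: {count}")
            if count == 0:
                break
            time.sleep(interval_s)


if __name__ == "__main__":
    watch_queue()
```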

The benchmark DB is loaded from the `wiki.dump` stored in the repository. The recipes create an OpenAI vectorizer and run the benchmark tool, storing the results in `projects/pgai/benchmark/results`.

There's a new CLI command, `pgai vectorizer worker-benchmark`. It's the same as the regular worker, but it's wrapped in a VCR client that replays recorded OpenAI requests. The CPU benchmarks use this command so that request/response times to the API stay constant, which makes the timing results more deterministic. The cassette file is tracked with git-lfs.
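
Conceptually, the VCR wrapping amounts to running the worker's API traffic inside a replay-only cassette. A minimal sketch with vcrpy follows; the cassette path and the embedding call are illustrative assumptions, not this PR's actual wiring, and `record_mode="none"` makes VCR serve recorded responses only and raise on anything unrecorded.

```python
# Sketch of replay-only VCR usage; not the PR's actual worker-benchmark code.
import vcr
from openai import OpenAI

# "none" = replay from the cassette only; unrecorded requests raise an error.
REPLAY = vcr.VCR(record_mode="none")


def embed_batch(texts: list[str]) -> None:
    # Dummy key: requests are served from the cassette and never hit the API.
    client = OpenAI(api_key="test-key")
    client.embeddings.create(model="text-embedding-3-small", input=texts)


if __name__ == "__main__":
    # Constant, local response times make CPU profiles far more reproducible.
    with REPLAY.use_cassette("projects/pgai/benchmark/cassette.yaml"):  # assumed path
        embed_batch(["first chunk", "second chunk"])
```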

The memory benchmarks don't use the VCR-wrapped command, because VCR itself uses considerable memory to handle the API calls and would pollute the memory profile.
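
For reference, memray can capture a profile from the CLI (`memray run` followed by `memray flamegraph`) or programmatically. The sketch below shows the programmatic form under assumed names; the workload and output path are placeholders, not the recipe's contents.

```python
# Minimal programmatic memray capture; the just recipe most likely drives
# the memray CLI instead of this API.
from memray import Tracker


def workload() -> list[bytes]:
    # Stand-in allocation-heavy work in place of the real vectorizer worker.
    return [bytes(1024) for _ in range(100_000)]


if __name__ == "__main__":
    with Tracker("projects/pgai/benchmark/results/mem.bin"):  # assumed output path
        workload()
    # Then render it: memray flamegraph projects/pgai/benchmark/results/mem.bin
```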

Demo recordings:

https://www.loom.com/share/5538c8fa4bb243d88a246ca6b82e7cc7?sid=89bf1ed6-8def-4e6e-8fb2-711f42660b7f

https://www.loom.com/share/733fca2f5008460f9d763a72544db5fb?sid=e56f1a72-69f3-41c8-9b3f-36f859f556fc

alejandrodnm marked this pull request as ready for review March 18, 2025 16:28
alejandrodnm requested a review from a team as a code owner March 18, 2025 16:28
```diff
@@ -1 +1 @@
-3.10
+3.12.9
```
Member


Why did this change?

Contributor Author


I'm reverting it back; I had updated it for a test.

We are using 3.12 for the Docker container and I wanted the local version to match.

alejandrodnm (Contributor, Author) commented:

@JamesGuthrie I moved the VCR part to the benchmark repo, and updated the recipes and `create_vectorizer.sql` to be more customizable.
