test: add vectorizer benchmark tools #570
projects/pgai/.python-version
@@ -1 +1 @@
-3.10
+3.12.9
Why did this change?
I'm reverting it. I changed it for a test.
We are using 3.12 for the Docker container and I wanted the same version here.
New benchmark `just` recipes for the vectorizer:

- `just pgai benchmark-mem`: runs the vectorizer on a queue of 500 items, then generates a memray profile and a memory usage flamegraph.
- `just pgai benchmark-cpu`: runs the vectorizer on a queue of 500 items and generates a CPU usage flamegraph using py-spy.
- `just pgai benchmark-cpu-top`: runs the vectorizer on a queue of 500 items and displays a top-like interface of CPU usage.
- `just pgai benchmark-queue-count`: shows the queue count of the running benchmark. It should be executed in a separate terminal. The count is refreshed periodically.

The benchmark DB uses the wiki.dump stored in the repository, creates an OpenAI vectorizer, and runs the benchmark tool, storing the results in `projects/pgai/benchmark/results`.

There's a new CLI command, `pgai vectorizer worker-benchmark`. It's the same as the regular worker, but it's wrapped in a vcr client that replays OpenAI requests. The CPU benchmarks use this command so that request/response times to the API stay constant and the timing results are more deterministic. The cassette file is tracked using git-lfs.

Memory benchmarks don't use the vcr-wrapped command, because VCR uses considerable memory to handle the API calls and would pollute the results.
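To illustrate the replay idea behind the vcr-wrapped worker, here is a minimal sketch using only the standard library. It is not the vcr API and not the code in this PR: `ReplayClient`, `fake_embedding_api`, and the in-memory `cassette` dict are hypothetical names chosen for the example. The point is only that a replayed response never touches the network, so its latency no longer varies between runs.

```python
import json
from typing import Callable, Dict, List, Optional


class ReplayClient:
    """Hypothetical stand-in for a cassette-backed client (not the vcr API)."""

    def __init__(
        self,
        live_call: Callable[[str], List[float]],
        cassette: Optional[Dict[str, List[float]]] = None,
        replay_only: bool = False,
    ) -> None:
        self._live_call = live_call
        # The cassette maps a serialized request to its recorded response.
        self.cassette = {} if cassette is None else cassette
        self._replay_only = replay_only
        self.live_calls = 0

    def embed(self, text: str) -> List[float]:
        key = json.dumps({"input": text}, sort_keys=True)
        if key in self.cassette:
            # Replay: return the recorded response without a live call.
            return self.cassette[key]
        if self._replay_only:
            raise LookupError("no recorded response for " + key)
        # Record: perform the live call once and store the response.
        self.live_calls += 1
        response = self._live_call(text)
        self.cassette[key] = response
        return response


def fake_embedding_api(text: str) -> List[float]:
    # Placeholder for a network call; a real API has variable latency.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


# First pass records, second pass replays from the cassette only.
recorder = ReplayClient(fake_embedding_api)
recorded = [recorder.embed(t) for t in ("alpha", "beta")]

replayer = ReplayClient(
    fake_embedding_api, cassette=recorder.cassette, replay_only=True
)
replayed = [replayer.embed(t) for t in ("alpha", "beta")]
```

In this sketch every recorded response stays resident in the `cassette` dict, which hints at why a replay layer can distort memory measurements even though it helps timing measurements.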
@JamesGuthrie I moved the vcr part to the benchmark repo, and updated the recipes and create_vectorizer.sql to be more customizable.
https://www.loom.com/share/5538c8fa4bb243d88a246ca6b82e7cc7?sid=89bf1ed6-8def-4e6e-8fb2-711f42660b7f
https://www.loom.com/share/733fca2f5008460f9d763a72544db5fb?sid=e56f1a72-69f3-41c8-9b3f-36f859f556fc