Add metadata benchmarks #1055

Open
turan18 wants to merge 1 commit into awslabs:main from turan18:add_metadata_benchmarks

Conversation

@turan18
Contributor

@turan18 turan18 commented Jan 29, 2024

Issue #, if available:

Description of changes:

Add benchmarks that measure metadata DB insertion performance. Also add a helper function that generates a random TAR file (TOC) with a given number of files/entries.

Testing performed:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@turan18 turan18 force-pushed the add_metadata_benchmarks branch 15 times, most recently from c7bfdda to d5bba30 Compare February 5, 2024 16:48
@turan18 turan18 marked this pull request as ready for review February 5, 2024 16:53
@turan18 turan18 requested a review from a team as a code owner February 5, 2024 16:53
Comment thread util/testutil/util.go Outdated
Contributor Author

File names must be random, since we use them to traverse the fs tree when creating the DB, so RandString does not use our seeded random source. We still use a fixed size, so there shouldn't be any variance between runs.

I don't understand this statement.

Contributor Author

If we used the seeded random, all our filenames would be the same and the only thing differentiating them would be depth level (e.g. file vs file/file vs file/file/file).

When looping through the TOC, we maintain a map of node ID to metadata entry. The metadata entry has a map of the node's children, where the child name is the key. When we add children to the metadata entry of a node, we end up overwriting any existing child that shares the same name. That never happens in practice, since you cannot have multiple child nodes/files with the same name under a single parent node/directory. With identical names, however, our metadata/nodes bucket would not be fully populated, since each parent would effectively end up with only one child.

To avoid this, we use rand so we can get an actual pseudo-random string. We still use a fixed length of 10 for the filename, to ensure there isn't any variance in bbolt write performance between benchmark runs. (bbolt doesn't care about the content of a KV pair, since keys and values are just interpreted as byte slices; the length, however, does matter, since it controls how nodes/pages are split before writing to disk.)
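As a rough illustration of the overwrite described above, here is a minimal Go sketch. The `entry` type, `addChildren`, and `randName` are hypothetical stand-ins, not the actual code in this PR, and the use of `crypto/rand` is an assumption made only to get an unseeded source.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// entry is a hypothetical stand-in for a metadata entry: it keeps a
// node's children in a map keyed by child name.
type entry struct {
	children map[string]int // child name -> child node ID
}

// addChildren registers every name under a single parent entry.
func addChildren(names []string) *entry {
	e := &entry{children: make(map[string]int)}
	for id, name := range names {
		// A repeated name overwrites the child stored earlier.
		e.children[name] = id
	}
	return e
}

const alphabet = "abcdefghijklmnopqrstuvwxyz"

// randName returns a name of exactly n lowercase letters drawn from an
// unseeded source (crypto/rand is an assumption of this sketch).
func randName(n int) string {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		panic(err)
	}
	for i, b := range buf {
		buf[i] = alphabet[int(b)%len(alphabet)]
	}
	return string(buf)
}

func main() {
	// Identical names collapse into a single child.
	same := addChildren([]string{"file", "file", "file"})
	fmt.Println("children with identical names:", len(same.children))

	// Random names of a fixed length keep the children distinct while
	// the key size stays constant.
	names := make([]string, 3)
	for i := range names {
		names[i] = randName(10)
	}
	distinct := addChildren(names)
	fmt.Println("children with random names:", len(distinct.children))
}
```

The names differ in content but never in length, which is the property the comment above relies on for stable write performance.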

@turan18 turan18 force-pushed the add_metadata_benchmarks branch from d5bba30 to df9ec63 Compare February 7, 2024 14:26
Comment thread metadata/reader_test.go Outdated
Contributor Author

Note: Since we want to measure write performance to disk, we have to write to a non-tmpfs location.
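One way to do this, sketched below, is to create the scratch directory under an explicit base path instead of the default temp directory, which may be a tmpfs mount on some systems. `newBenchDir` and the `metadata.db` file name are hypothetical, and the sketch assumes the current working directory is on a disk-backed filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// newBenchDir creates a scratch directory under base rather than under
// os.TempDir(). The caller is responsible for picking a base that is
// not on tmpfs; this function does not verify the filesystem type.
func newBenchDir(base string) (dir string, cleanup func(), err error) {
	dir, err = os.MkdirTemp(base, "metadata-bench-")
	if err != nil {
		return "", nil, err
	}
	return dir, func() { os.RemoveAll(dir) }, nil
}

func main() {
	// Assumption: the working directory is disk-backed.
	dir, cleanup, err := newBenchDir(".")
	if err != nil {
		panic(err)
	}
	defer cleanup()

	dbPath := filepath.Join(dir, "metadata.db")
	f, err := os.Create(dbPath)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if _, err := f.Write([]byte("payload")); err != nil {
		panic(err)
	}
	// Sync flushes the file to stable storage, so the write reaches the
	// underlying device instead of staying in the page cache.
	if err := f.Sync(); err != nil {
		panic(err)
	}
	fmt.Println("wrote", dbPath)
}
```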

@turan18 turan18 force-pushed the add_metadata_benchmarks branch from df9ec63 to 8f8eb12 Compare February 7, 2024 14:36
Contributor

@sondavidb sondavidb left a comment

Generally LGTM, a lot of minor changes that I just want a bit of attention on before approving. Overall the functionality looks great and I think it's a pretty cool addition to our testing suite.

Comment thread metadata/reader_test.go Outdated
Comment thread metadata/reader_test.go Outdated
Comment thread metadata/reader_test.go Outdated
Comment thread metadata/reader_test.go Outdated
Comment thread metadata/util_test.go Outdated
Comment thread metadata/reader_test.go Outdated
@turan18 turan18 force-pushed the add_metadata_benchmarks branch 2 times, most recently from f2d02b9 to 692b984 Compare February 13, 2024 21:36
Comment thread metadata/reader_test.go Outdated
Comment thread util/testutil/util.go Outdated
Comment thread util/testutil/tar.go Outdated
@turan18 turan18 force-pushed the add_metadata_benchmarks branch from 692b984 to 86bf1fa Compare February 14, 2024 02:00
Add benchmark functions that measure sequential and concurrent
writes to the underlying metadata DB.

Signed-off-by: Yasin Turan <turyasin@amazon.com>
@turan18 turan18 force-pushed the add_metadata_benchmarks branch from 86bf1fa to 9ffd6e7 Compare February 16, 2024 19:50
3 participants