
Use Bytes instead of Vec<u8> to reduce memory allocations and improve performance #55


Open · lonless9 wants to merge 10 commits into master from bytes-optimization
Conversation

lonless9
Contributor

@lonless9 lonless9 commented Apr 4, 2025

This PR contains several commits that replace Vec<u8> with Bytes to reduce memory allocations and improve performance.

Optimization Rationale

In the original code, the extensive use of Vec<u8> leads to frequent memory allocations and copies, which is especially costly in a key-value store where these operations happen constantly. By using Bytes, we can:

  • Reduce unnecessary memory allocations
  • Use reference counting instead of deep copies
  • Mainly improve iteration, batch writes, and compaction (a minimal sketch of the difference follows this list)
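To illustrate the second point, here is a minimal sketch (not code from this PR; buffer sizes and names are only for illustration) of why cloning Bytes is cheaper than cloning Vec<u8>:

    use bytes::Bytes;

    fn main() {
        let data: Vec<u8> = vec![0u8; 4096];

        // Cloning a Vec<u8> allocates a fresh buffer and copies all 4096 bytes.
        let deep_copy: Vec<u8> = data.clone();

        // Converting into Bytes takes ownership of the allocation once...
        let shared = Bytes::from(data);
        // ...and every clone afterwards is a reference-count increment plus a
        // pointer/length copy; no byte data is copied.
        let cheap_clone = shared.clone();
        // Slicing is also zero-copy: it is a view into the same buffer.
        let view = shared.slice(0..128);

        assert_eq!(deep_copy.len(), 4096);
        assert_eq!(cheap_clone.len(), 4096);
        assert_eq!(view.len(), 128);
    }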

@dermesser
Owner

That's amazing! In the write-a-lot example, do you also have numbers regarding the throughput? How many writes per second are enabled by this change compared to the status quo?

@lonless9
Contributor Author

That's amazing! In the write-a-lot example, do you also have numbers regarding the throughput? How many writes per second are enabled by this change compared to the status quo?

I am writing some Criterion-based benchmarks to quantify this behavior; a proper evaluation metric is probably needed to measure the following series of changes.

The numbers in the screenshot above were compiled in debug mode, so they show very large differences. Although the optimizing compiler drastically reduces the gap between the two versions, I still see a clear improvement while writing the benchmarks.

This PR is actually a rather superficial first attempt. In fact, the data flow through BlockContents, the various iterators, the internal key handling, and the memtable is all involved in this issue; this change only covers part of it, and the rest is left unchanged for now.

I need to write a benchmark suite covering different kinds of reads and writes, with and without snapshots, and so on.

But at least for now, the most obvious improvement shows up in iteration. It will probably take some time to understand how the data flows, decide which changes are unnecessary, and improve the benchmark suite.
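For reference, a minimal sketch of the kind of Criterion harness that produces output like the numbers below; the group name, item counts, and database setup are assumptions modeled on that output, not the PR's actual benchmark code.

    use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

    // Hypothetical benchmark skeleton; the database setup inside b.iter() is
    // elided and stands in for whatever the real benchmarks do.
    fn bench_iteration(c: &mut Criterion) {
        let mut group = c.benchmark_group("DB_Iteration");
        for &items in &[1_000u64, 10_000] {
            // Report throughput in elements per second (the Melem/s figures below).
            group.throughput(Throughput::Elements(items));
            group.bench_with_input(BenchmarkId::from_parameter(items), &items, |b, &_n| {
                b.iter(|| {
                    // let mut it = db.new_iter().unwrap();
                    // while it.advance() { /* read the current entry */ }
                });
            });
        }
        group.finish();
    }

    criterion_group!(benches, bench_iteration);
    criterion_main!(benches);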

Benchmarking DB_Iteration/Items1000_Snappy: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 21.1s, or reduce sample count to 20.
DB_Iteration/Items1000_Snappy
                        time:   [450.56 µs 464.12 µs 480.16 µs]
                        thrpt:  [2.0827 Melem/s 2.1546 Melem/s 2.2195 Melem/s]
                 change:
                        time:   [−37.185% −34.622% −31.818%] (p = 0.00 < 0.05)
                        thrpt:  [+46.666% +52.956% +59.197%]
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
DB_Iteration/Items10000_None
                        time:   [4.9273 ms 5.0103 ms 5.1019 ms]
                        thrpt:  [1.9601 Melem/s 1.9959 Melem/s 2.0295 Melem/s]
                 change:
                        time:   [−31.101% −28.893% −26.712%] (p = 0.00 < 0.05)
                        thrpt:  [+36.447% +40.633% +45.139%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

@lonless9
Contributor Author

lonless9 commented Jun 2, 2025

Performance Improvements

DB_WriteBatch/Batch1000_Snappy
time: [1.2751 ms 1.3000 ms 1.3299 ms]
thrpt: [751.92 Kelem/s 769.25 Kelem/s 784.25 Kelem/s]
change:
time: [−17.295% −12.279% −7.1801%] (p = 0.00 < 0.05)
thrpt: [+7.7356% +13.998% +20.912%]
Performance has improved.

DB_Iteration/Items1000_None
time: [551.40 µs 575.29 µs 608.49 µs]
thrpt: [1.6434 Melem/s 1.7382 Melem/s 1.8136 Melem/s]
change:
time: [−28.737% −21.714% −12.548%] (p = 0.00 < 0.05)
thrpt: [+14.348% +27.736% +40.326%]
Performance has improved.

DB_Iteration/Items1000_Snappy
time: [547.03 µs 1.2029 ms 2.6748 ms]
thrpt: [373.87 Kelem/s 831.29 Kelem/s 1.8281 Melem/s]
change:
time: [−35.759% +10.912% +98.954%] (p = 0.83 > 0.05)
thrpt: [−49.737% −9.8385% +55.665%]
No change in performance detected.

DB_Iteration/Items10000_None
time: [5.0404 ms 6.2258 ms 7.9744 ms]
thrpt: [1.2540 Melem/s 1.6062 Melem/s 1.9840 Melem/s]
change:
time: [−28.330% −11.980% +11.367%] (p = 0.34 > 0.05)
thrpt: [−10.207% +13.611% +39.529%]
No change in performance detected.

DB_Iteration/Items10000_Snappy
time: [4.9483 ms 5.0085 ms 5.0752 ms]
thrpt: [1.9704 Melem/s 1.9966 Melem/s 2.0209 Melem/s]
change:
time: [−46.298% −35.816% −28.197%] (p = 0.00 < 0.05)
thrpt: [+39.270% +55.803% +86.212%]
Performance has improved.

DB_Snapshot_Get/None time: [164.00 µs 170.36 µs 177.92 µs]
change: [−22.067% −13.808% −6.5875%] (p = 0.00 < 0.05)
Performance has improved.

DB_Snapshot_Get/Snappy time: [180.69 µs 184.57 µs 188.30 µs]
change: [−6.0673% −2.3115% +1.6904%] (p = 0.27 > 0.05)
No change in performance detected.

DB_CompactRange/None time: [6.2198 ms 6.3014 ms 6.3856 ms]
change: [−7.5711% −5.7784% −4.1254%] (p = 0.00 < 0.05)
Performance has improved.

DB_CompactRange/Snappy time: [7.9395 ms 8.0341 ms 8.1310 ms]
change: [−5.1079% −3.6891% −2.2758%] (p = 0.00 < 0.05)
Performance has improved.

@lonless9 lonless9 force-pushed the bytes-optimization branch from 11745e1 to f9ef19c on June 2, 2025, 19:06
@dermesser
Owner

Just a first couple of comments, but overall good work! I will review the rest hopefully quite soon.

  use integer_encoding::FixedInt;
  use integer_encoding::VarInt;

- pub type BlockContents = Vec<u8>;
+ pub type BlockContents = Bytes;
Owner

Can the block field in struct Block now just be a bare BlockContents? IIUC, Bytes is already cloneable, so it does not need to be wrapped in an Rc<>.
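A minimal sketch of what this suggestion amounts to; the field name and the Rc wrapper in the "before" version are assumptions about the current layout, not verified against the crate.

    use bytes::Bytes;

    pub type BlockContents = Bytes;

    // Before (assumed): the buffer was wrapped in Rc so that Block could be
    // cloned without copying the data:
    //     struct Block { block: std::rc::Rc<Vec<u8>> }

    // After: Bytes is already reference-counted internally, so a bare field
    // gives the same cheap clone without the extra indirection.
    struct Block {
        block: BlockContents,
    }

    impl Block {
        fn cheap_clone(&self) -> Block {
            // Only bumps a reference count; the underlying buffer is shared.
            Block { block: self.block.clone() }
        }
    }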

Contributor Author

I have tried this, but it resulted in a strange, minor performance regression, and I'm not quite sure of the reason. From the perspective of Rust's general design philosophy, though, this is clearly the right thing to do.

The CRC change significantly improves write performance, and this PR primarily improves iteration performance, so I think the minor regression I mentioned earlier no longer needs to be taken into account.

@@ -34,7 +34,7 @@ fn iter(db: &mut DB) {
      let (mut k, mut v) = (vec![], vec![]);
      let mut out = io::BufWriter::new(io::stdout());
      while it.advance() {
-         it.current(&mut k, &mut v);
+         it.current();
Owner

I believe this line forgets to update k and v, meaning the writes below are pointless. The returned values should probably be stored and written instead.
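A minimal sketch of the suggested fix; the return type of current() here (Option of a key/value pair) is an assumption, not the verified signature in this PR.

    // Hypothetical shape of the corrected loop; adjust to the real signature
    // of `current()`.
    while it.advance() {
        if let Some((k, v)) = it.current() {
            out.write_all(&k).unwrap();
            out.write_all(b" => ").unwrap();
            out.write_all(&v).unwrap();
            out.write_all(b"\n").unwrap();
        }
    }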

Contributor Author

I previously focused mainly on the library's copy overhead and didn't pay much attention to the changes in the examples. The examples may only have been run through clippy, without a careful review of the type conversions.

@@ -100,5 +100,5 @@ fn main() {
      let mut db = DB::open(PATH, opt).unwrap();
      db.put(b"~local_player", b"NBT data goes here").unwrap();
      let value = db.get(b"~local_player").unwrap();
-     assert_eq!(&value, b"NBT data goes here")
+     assert_eq!(&*value, b"NBT data goes here")
Owner

Is the & operator necessary here? value.deref() should already produce a &[u8]?
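For comparison, a small sketch of the equivalent spellings, assuming value is a Bytes (the from_static setup is just for illustration):

    use bytes::Bytes;
    use std::ops::Deref;

    fn main() {
        let value = Bytes::from_static(b"NBT data goes here");

        // `&*value` derefs Bytes to [u8] and re-borrows it as &[u8].
        assert_eq!(&*value, b"NBT data goes here");
        // `value.deref()` yields the same &[u8] without the extra operator.
        assert_eq!(value.deref(), b"NBT data goes here");
        // `as_ref()` is another common way to get the &[u8] view.
        assert_eq!(value.as_ref(), b"NBT data goes here");
    }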

src/filter.rs Outdated
@@ -139,7 +139,7 @@ impl FilterPolicy for BloomPolicy {
// Add all keys to the filter.
offset_data_iterate(keys, key_offsets, |key| {
let mut h = self.bloom_hash(key);
let delta = (h >> 17) | (h << 15);
let delta = h.rotate_left(15);
Owner

Is this equivalent? If not, what's the rationale? In the original implementation, this is supposed to swap the first two bytes and the last two bytes (dropping two bits in the middle), but now the second two bytes will always be 0x8000 or 0x0000, right?

Contributor Author

rust-analyzer complains about this, and I think that for a u32
(h >> 17) | (h << 15)
= (h << 15) | (h >> 17)
= h.rotate_left(15),
because the two shift amounts add up to the word size (17 + 15 = 32), so the bits shifted out on one side are exactly the bits shifted in on the other.

The standard library source (stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/uint_macros.rs) explains that

    /// Shifts the bits to the left by a specified amount, `n`,
    /// wrapping the truncated bits to the end of the resulting integer.
    ///
    /// Please note this isn't the same operation as the `<<` shifting operator!
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    #[doc = concat!("let n = ", $rot_op, stringify!($SelfT), ";")]
    #[doc = concat!("let m = ", $rot_result, ";")]
    ///
    #[doc = concat!("assert_eq!(n.rotate_left(", $rot, "), m);")]
    /// ```

A quick check is here
https://gist.github.com/lonless9/5a0a55c67ed773f76f9d9cfdbf8e02e2
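Inlined here, a self-contained version of that check (a sketch in the same spirit as the gist, not a verbatim copy of it):

    fn main() {
        // Spot check that (h >> 17) | (h << 15) equals h.rotate_left(15) for u32.
        let samples: [u32; 5] = [0, 1, 0x8000_0000, 0xDEAD_BEEF, u32::MAX];
        for &h in &samples {
            let manual = (h >> 17) | (h << 15);
            assert_eq!(manual, h.rotate_left(15));
        }
        println!("equivalence holds for all samples");
    }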

Even when changed to this more idiomatic form, I don't think it brings any advantage or disadvantage in itself; it just removes a warning that the tooling would otherwise raise.

Contributor Author

I think this clippy-related change is not relevant to this PR, so I will revert it. There are actually many clippy warnings at the moment; maybe we can focus on cleaning them up next time.

src/version.rs Outdated
@@ -592,12 +593,20 @@ pub mod testutil {
      largest: &[u8],
      largestix: u64,
  ) -> FileMetaHandle {
+     let smallest_key = LookupKey::new(smallest, smallestix);
Owner

These two variables are unused; you can remove them.

@lonless9 lonless9 force-pushed the bytes-optimization branch from 6af07a6 to 02a394c on June 12, 2025, 20:12
@lonless9 lonless9 force-pushed the bytes-optimization branch from 02a394c to 0e2afab on June 12, 2025, 20:14